These are my notes for Part 3 of Adam B Kashlak's Probability and Measure Theory course.
Probability Theory
$L^p$ Spaces
**$L^p$ Space** Let $(E, \mathcal{E}, \mu)$ be a measure space and $f: E \to \mathbb{R}$ a measurable function. Then we say $f \in L^p(E, \mathcal{E}, \mu)$, for $p \in [1, \infty)$, if $\int_E |f|^p \, d\mu < \infty.$
For $p = \infty$, we say $f \in L^\infty$ if there exists a $\lambda \ge 0$ such that $|f| \le \lambda$ $\mu$-a.e.
This definition allows us to write down the $L^p$ norm, which is defined as $\|f\|_p = \left( \int_E |f|^p \, d\mu \right)^{1/p}$ for $p < \infty$, and $\|f\|_\infty = \inf\{ \lambda \ge 0 : |f| \le \lambda \ \mu\text{-a.e.} \}$.
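As a quick numerical companion to these definitions (a toy sketch, not from the notes: the measure space is a hypothetical uniform probability measure on four points), the $L^p$ norm reduces to a finite sum, and on a probability space the norms are non-decreasing in $p$:

```python
# L^p norm of a function on a finite uniform probability space
# ({0, ..., n-1}, each point with mass 1/n) -- a toy stand-in for
# the integral definition above.
def lp_norm(values, p):
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)

f = [1.0, -2.0, 3.0, 0.5]
# On a probability space, p <= q implies ||f||_p <= ||f||_q
# (a consequence of Jensen's inequality).
assert lp_norm(f, 1) <= lp_norm(f, 2) <= lp_norm(f, 4)
# As p grows, the L^p norm approaches the sup norm, here 3.
assert abs(lp_norm(f, 200) - 3.0) < 0.05
```

For large $p$ the computed value creeps toward $\max_i |f_i| = 3$, matching the fact that $\|f\|_p \to \|f\|_\infty$ on a finite measure space.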
Markov / Chebyshev and Jensen’s Inequalities
**Markov’s Inequality** Let $f$ be a non-negative measurable function and $\lambda > 0$. Then, denoting $A_\lambda = \{x : f(x) \ge \lambda\}$, $\mu(A_\lambda) \le \frac{1}{\lambda} \int_E f \, d\mu.$
**Proof** Noting that $\lambda \mathbb{1}_{A_\lambda} \le f$, then by monotonicity of the integral, $\lambda \mu(A_\lambda) = \int_E \lambda \mathbb{1}_{A_\lambda} \, d\mu \le \int_E f \, d\mu,$
which proves the theorem.
There are two useful ways to use Markov’s inequality:
- Chebyshev’s Inequality: For measurable $f$ and $\lambda > 0$, $\mu(|f| \ge \lambda) \le \frac{1}{\lambda^2} \int_E f^2 \, d\mu.$
- Chernoff’s Inequality: For measurable $f$ and $\lambda, t > 0$, $\mu(f \ge \lambda) \le e^{-t\lambda} \int_E e^{tf} \, d\mu.$ In probability theory, the right hand side becomes the moment generating function or the Laplace transform.
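A small numerical illustration (a sketch, with an assumed Exponential(1) sample): Markov's inequality holds exactly for the empirical measure of any non-negative sample, since $\frac{1}{n}\sum_i \mathbb{1}\{x_i \ge \lambda\} \le \frac{1}{n}\sum_i x_i / \lambda$ term by term.

```python
import random

random.seed(0)

# Samples from a non-negative distribution (Exponential(1), so E[f] = 1).
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

lam = 2.0
tail = sum(1 for x in samples if x >= lam) / len(samples)

# Markov: P(f >= lam) <= E[f] / lam; this is exact for the
# empirical measure, not just an asymptotic statement.
assert tail <= mean / lam
```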
**Convex Functions on $\mathbb{R}$** Let $I \subseteq \mathbb{R}$ be an interval. A function $\varphi: I \to \mathbb{R}$ is convex if for all $x, y \in I$ and all $t \in [0, 1]$, $\varphi(tx + (1 - t)y) \le t\varphi(x) + (1 - t)\varphi(y).$
**Jensen’s Inequality** Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $X$ an integrable random variable (i.e. a measurable function in $L^1$) such that $X \in I$ a.s. For any convex $\varphi: I \to \mathbb{R}$, $\mathbb{E}[\varphi(X)] \ge \varphi(\mathbb{E}[X]),$
which is Jensen’s inequality.
**Proof** For some constant $c$, if $X = c$, $\mathbb{P}$-a.e., then the result is immediate. Otherwise, let $m = \mathbb{E}[X]$ be the mean of $X$, which lies in the interior of the interval $I$. Then, we can choose $a, b \in \mathbb{R}$ such that $\varphi(x) \ge ax + b$ for all $x \in I$ with equality at $x = m$. Then, $\varphi(X) \ge aX + b$, and $\mathbb{E}[\varphi(X)] \ge a\,\mathbb{E}[X] + b = am + b = \varphi(m) = \varphi(\mathbb{E}[X]).$
Lastly, to check that $\mathbb{E}[\varphi(X)]$ is well defined (i.e. not of the form $\infty - \infty$), we note that $\varphi(X)^- \le |aX + b|$, where $\varphi(X)^-$ is the negative part of $\varphi(X)$, and $\mathbb{E}|aX + b| < \infty$. Hence, $\mathbb{E}[\varphi(X)] \in (-\infty, \infty]$.
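The finite-sample analogue of Jensen's inequality is exact, since an empirical average is itself a convex combination of point masses. A minimal sketch with the convex function $\varphi(x) = x^2$ (the Gaussian sample is an arbitrary choice):

```python
import random

random.seed(1)

samples = [random.gauss(0.0, 1.0) for _ in range(50_000)]
n = len(samples)

def phi(x):
    return x * x  # convex on all of R

mean = sum(samples) / n
# Jensen: phi(E[X]) <= E[phi(X)]; exact for the empirical measure
# because it is a finite convex combination of point masses.
assert phi(mean) <= sum(phi(x) for x in samples) / n
```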
Hölder and Minkowski’s Inequalities
**Hölder’s Inequality** Let $p, q \in (1, \infty)$ be conjugate indices (i.e. $\frac{1}{p} + \frac{1}{q} = 1$) and $f \in L^p$ and $g \in L^q$ be measurable, then $\|fg\|_1 \le \|f\|_p \|g\|_q.$
**Proof** If either $\|f\|_p = 0$ or $\|g\|_q = 0$, then the result is immediate. Hence, for $f$ and $g$ such that $\|f\|_p, \|g\|_q > 0$, we can normalize and without loss of generality assume that $\|f\|_p = 1$. Then, we can define a probability measure $\mathbb{P}$ on $E$ such that for any $A \in \mathcal{E}$, $\mathbb{P}(A) = \int_A |f|^p \, d\mu.$
Then, using Jensen’s Inequality with $\varphi(x) = x^q$ and $X = |g||f|^{1-p}\mathbb{1}_{\{f \ne 0\}}$, $\|fg\|_1^q = \mathbb{E}[X]^q \le \mathbb{E}[X^q] = \int_{\{f \ne 0\}} |g|^q \, d\mu \le \|g\|_q^q,$ since $q(1 - p) + p = 0$ for conjugate indices; taking $q$th roots gives $\|fg\|_1 \le \|g\|_q = \|f\|_p \|g\|_q$.
From Hölder’s inequality with $p = q = 2$, we can derive the Cauchy-Schwarz Inequality:
**Cauchy-Schwarz Inequality** For measurable $f$ and $g$, $\|fg\|_1 \le \|f\|_2 \|g\|_2.$
**Minkowski’s Inequality** Let $p \in [1, \infty)$ and $f$, $g$ be measurable functions, then $\|f + g\|_p \le \|f\|_p + \|g\|_p.$
**Proof** If either $\|f + g\|_p = 0$ or $\|f\|_p = \infty$ or $\|g\|_p = \infty$, then we are done. If $p = 1$, then $|f + g| \le |f| + |g|$ and the result follows quickly. For $p > 1$ and $q$ its conjugate, we note that $|f + g|^p \le (|f| + |g|)\,|f + g|^{p-1}.$
Then using the above inequality and Hölder’s inequality, we have $\|f + g\|_p^p \le \left( \|f\|_p + \|g\|_p \right) \left\| |f + g|^{p-1} \right\|_q = \left( \|f\|_p + \|g\|_p \right) \|f + g\|_p^{p/q},$ since $(p - 1)q = p$.
Dividing both sides by $\|f + g\|_p^{p/q}$ finishes the proof.
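With counting measure on a finite set, integrals are plain sums, so these inequalities become finite-sum statements that can be checked directly. A sketch (random vectors are an arbitrary choice):

```python
import math
import random

random.seed(2)

# Counting measure on {0, ..., 999}: the integral is a plain sum, so
# Cauchy-Schwarz and Minkowski reduce to inequalities between finite sums.
f = [random.uniform(-1.0, 1.0) for _ in range(1000)]
g = [random.uniform(-1.0, 1.0) for _ in range(1000)]

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

# Cauchy-Schwarz: |<f, g>| <= ||f||_2 ||g||_2.
assert abs(sum(a * b for a, b in zip(f, g))) <= norm2(f) * norm2(g)

# Minkowski (triangle inequality): ||f + g||_2 <= ||f||_2 + ||g||_2.
assert norm2([a + b for a, b in zip(f, g)]) <= norm2(f) + norm2(g)
```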
**Approximation Theorem** Let $(E, \mathcal{E}, \mu)$ be a measure space, and let $\mathcal{A} \subseteq \mathcal{E}$ be a $\pi$-system such that $\sigma(\mathcal{A}) = \mathcal{E}$, $\mu(A) < \infty$ for all $A \in \mathcal{A}$, and there exists a sequence $A_n \in \mathcal{A}$ with $A_n \uparrow E$. Let the collection of simple functions be $V = \operatorname{span}\{ \mathbb{1}_A : A \in \mathcal{A} \}.$
For $p \in [1, \infty)$ and for all $f \in L^p(\mu)$ and all $\varepsilon > 0$, there exists a $v \in V$ such that $\|f - v\|_p \le \varepsilon$.
**Proof** For any $A \in \mathcal{A}$, $\|\mathbb{1}_A\|_p^p = \mu(A) < \infty$. Thus $\mathbb{1}_A \in L^p$ for all $A \in \mathcal{A}$. Since $L^p$ is a linear space, $V \subseteq L^p$.
Next, let $W$ be the set of all $f \in L^p$ that can be approximated as above by some $v \in V$ and $\varepsilon > 0$. Let $f_1, f_2 \in W$ be approximated by $v_1, v_2$; then by Minkowski’s Inequality, $\|(f_1 + f_2) - (v_1 + v_2)\|_p \le \|f_1 - v_1\|_p + \|f_2 - v_2\|_p \le 2\varepsilon.$
Hence, $W$ is also a linear space.
Now, assume $\mu$ is finite (i.e. $\mu(E) < \infty$). Let $\mathcal{D} = \{A \in \mathcal{E} : \mathbb{1}_A \in W\}$, which we will show is, in fact, a $d$-system. We know that $\mathbb{1}_{A_n} \to \mathbb{1}_E$ in $L^p$ and thus $E \in \mathcal{D}$. For $A \subseteq B$ such that $A, B \in \mathcal{D}$, then $\mathbb{1}_{B \setminus A} = \mathbb{1}_B - \mathbb{1}_A$; since $W$ is linear, $\mathbb{1}_{B \setminus A} \in W$, so $B \setminus A \in \mathcal{D}$. Lastly, for pairwise disjoint $B_i \in \mathcal{D}$ with $B = \bigcup_i B_i$, let $C_n = \bigcup_{i \le n} B_i$, so $\mathbb{1}_{C_n} \in W$. Then, $\mathbb{1}_{C_n} \to \mathbb{1}_B$ pointwise and $|\mathbb{1}_{C_n}| \le 1 \in L^p$ as $\mu$ is finite; hence $\|\mathbb{1}_{C_n} - \mathbb{1}_B\|_p \to 0$, so $B \in \mathcal{D}$. Therefore, $\mathcal{D}$ is a $d$-system. By the Dynkin $\pi$-$d$ Theorem, $\mathcal{D} \supseteq \sigma(\mathcal{A}) = \mathcal{E}$, and thus $\mathbb{1}_A \in W$ for any $A \in \mathcal{E}$. Therefore, for any non-negative $f \in L^p$, we can construct simple functions $f_n$ such that $0 \le f_n \uparrow f$. Then $f_n \to f$ pointwise and $|f_n| \le |f| \in L^p$. Hence, by the Dominated Convergence Theorem, $\|f_n - f\|_p \to 0$. Thus $f \in W$, and by the linearity of $W$, all of $L^p$ is in $W$.
Lastly, for a general ($\sigma$-finite) $\mu$, we have by assumption a sequence $A_n \uparrow E$ with $\mu(A_n) < \infty$. Hence, for any $f \in L^p$, we have that $f\mathbb{1}_{A_n}$ lives on a finite measure space, and similarly to above, $f\mathbb{1}_{A_n} \to f$ pointwise and $|f\mathbb{1}_{A_n}| \le |f|$. Therefore, $\|f\mathbb{1}_{A_n} - f\|_p \to 0$ by dominated convergence. Thus, $f \in W$.
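The dyadic staircase construction used in the proof can be sketched numerically: approximate $f(x) = x^2$ on $[0, 1]$ (an arbitrary test function) by the simple functions $f_k = 2^{-k}\lfloor 2^k f \rfloor$ and watch the $L^1$ error shrink; a Riemann sum on a fine grid stands in for the Lebesgue integral.

```python
# Dyadic simple-function approximation f_k = floor(2^k f) / 2^k,
# as in the monotone construction used in the proof.
def simple_approx(fx, k):
    return int(fx * 2**k) / 2**k

# L^1 distance approximated by a Riemann sum on a fine grid.
n = 10_000
grid = [(i + 0.5) / n for i in range(n)]

def l1_err(k):
    return sum(abs(x * x - simple_approx(x * x, k)) for x in grid) / n

# The pointwise error is below 2^-k, so the L^1 error vanishes as k grows.
assert l1_err(6) < l1_err(3) < l1_err(1)
assert l1_err(10) < 2 ** -10
```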
Convergence in Probability & Measure
Convergence of Measure
**Weak Convergence of Measure** Let $(M, d)$ be a metric space and $\mathcal{B}(M)$ be the Borel $\sigma$-field on $M$. Then, for a measure $\mu$ and a sequence $(\mu_n)_{n \ge 1}$, we say that $\mu_n$ converges weakly to $\mu$, i.e. $\mu_n \Rightarrow \mu$, if $\int f \, d\mu_n \to \int f \, d\mu$
for all $f \in C_b(M)$, all continuous bounded real-valued functions on $M$.
**Portmanteau Theorem** For probability measures $\mu$ and $\mu_n$, $n \ge 1$, on a metric space $M$, the following are equivalent:
1. $\mu_n \Rightarrow \mu$
2. $\int f \, d\mu_n \to \int f \, d\mu$ for all bounded uniformly continuous functions $f$
3. $\limsup_n \mu_n(F) \le \mu(F)$ for all closed sets $F$
4. $\liminf_n \mu_n(G) \ge \mu(G)$ for all open sets $G$
5. $\mu_n(A) \to \mu(A)$ for all sets $A$ with $\mu(\partial A) = 0$
Proof
(1)⇒(2) If convergence holds for every $f \in C_b(M)$, then it certainly holds for all bounded uniformly continuous $f$.
(2)⇒(3) For any closed $F$ and $\varepsilon > 0$, there exists a $\delta > 0$ such that for $F^\delta = \{x \in M : d(x, F) < \delta\}$, we have $\mu(F^\delta) \le \mu(F) + \varepsilon$, as $F^\delta \downarrow F$ as $\delta \downarrow 0$. Then, we can define an $f$ such that $f = 1$ on $F$, $f = 0$ on $M \setminus F^\delta$, and $0 \le f \le 1$. Then $f$ is bounded and uniformly continuous (by Urysohn’s Lemma; e.g. $f(x) = (1 - d(x, F)/\delta)^+$) and $\mathbb{1}_F \le f \le \mathbb{1}_{F^\delta}$. Then, by (2), we have that $\limsup_n \mu_n(F) \le \lim_n \int f \, d\mu_n = \int f \, d\mu \le \mu(F^\delta) \le \mu(F) + \varepsilon.$
Thus, taking $\varepsilon$ to zero gives $\limsup_n \mu_n(F) \le \mu(F)$.
(3)⇒(1) Let $f \in C_b(M)$. Our goal is to show that $\limsup_n \int f \, d\mu_n \le \int f \, d\mu$, and similarly for the $\liminf$, to show (1) holds. As $f$ is bounded, we can shift and scale it, and without loss of generality, we assume that $0 \le f \le 1$. Then, for any choice of $k \in \mathbb{N}$, we define nested closed sets $F_i = \{x : f(x) \ge i/k\}$ for all $i = 0, 1, \ldots, k$ and cut the integral into pieces to get $\int f \, d\mu_n \le \frac{1}{k} \sum_{i=1}^{k} \mu_n(F_i) + \frac{1}{k}.$
Also, applying (3) to each $F_i$ and using $\frac{1}{k} \sum_{i=1}^{k} \mu(F_i) \le \int f \, d\mu$, the above becomes
$\limsup_n \int f \, d\mu_n \le \frac{1}{k} + \frac{1}{k} \sum_{i=1}^{k} \mu(F_i) \le \frac{1}{k} + \int f \, d\mu.$
Taking $k \to \infty$ gives $\limsup_n \int f \, d\mu_n \le \int f \, d\mu$. Replacing $f$ with $1 - f$ gives $\liminf_n \int f \, d\mu_n \ge \int f \, d\mu$. Thus the $\limsup$ and $\liminf$ coincide, proving that (3)⇒(1).
(3)⇔(4) Let $G$ be the complement of the closed set $F$. Then, $\mu_n(G) = 1 - \mu_n(F)$ and $\mu(G) = 1 - \mu(F)$. Thus, $\limsup_n \mu_n(F) \le \mu(F)$ is equivalent to $\liminf_n \mu_n(G) \ge \mu(G)$.
(3)&(4)⇒(5) For any set $A$ with $\mu(\partial A) = 0$, the interior $A^\circ$ is an open set and the closure $\bar{A}$ is a closed set. If (3) and (4) hold, we have that $\mu(A^\circ) \le \liminf_n \mu_n(A^\circ) \le \liminf_n \mu_n(A) \le \limsup_n \mu_n(A) \le \limsup_n \mu_n(\bar{A}) \le \mu(\bar{A}).$
Since $\mu(\partial A) = 0$, we have that $\mu(A^\circ) = \mu(\bar{A}) = \mu(A)$. Hence $\lim_n \mu_n(A) = \mu(A)$, which is (5).
Convergence of Random Variables
In contrast to convergence of measures, let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(M, \mathcal{B}(M))$ be a metric space with Borel sets as above. Then, for a random variable (i.e. measurable function) $X: \Omega \to M$, we can define a probability measure $\mu_X(B) = \mathbb{P}(X \in B) = \mathbb{P}(X^{-1}(B)), \quad B \in \mathcal{B}(M).$
This is the distribution of $X$. Then the expectation of a random variable can be written in multiple ways due to change of variables: $\mathbb{E}[f(X)] = \int_\Omega f(X(\omega)) \, d\mathbb{P}(\omega) = \int_M f(x) \, d\mu_X(x).$
Note that the above Portmanteau Theorem can be rephrased for random variables as well.
**Convergence in Distribution** For a sequence of random variables $(X_n)_{n \ge 1}$, we say that $X_n$ converges to $X$ in distribution (denoted $X_n \xrightarrow{d} X$) if $\mu_{X_n} \Rightarrow \mu_X$.
**Convergence in Probability** For a sequence of random variables $(X_n)_{n \ge 1}$, we say that $X_n$ converges to $X$ in probability (denoted $X_n \xrightarrow{p} X$) if for all $\varepsilon > 0$, $\mathbb{P}\left( d(X_n, X) > \varepsilon \right) \to 0 \text{ as } n \to \infty.$
In short, $\mathbb{P}(d(X_n, X) > \varepsilon) \to 0$.
This means that the measure of the set of $\omega \in \Omega$ where $X_n$ and $X$ differ by more than $\varepsilon$ goes to zero as $n \to \infty$. Convergence in probability is closely connected to the metric $d$ on $M$.
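A Monte Carlo sketch of the definition, using the assumed toy sequence $X_n = X + Z_n/n$ with $Z_n$ standard normal (so $d(X_n, X) = |Z_n|/n$): the deviation probability shrinks as $n$ grows.

```python
import random

random.seed(3)

eps = 0.1
trials = 20_000

def tail_prob(n):
    # Monte Carlo estimate of P(|X_n - X| > eps) where X_n - X = Z/n.
    return sum(abs(random.gauss(0.0, 1.0)) / n > eps
               for _ in range(trials)) / trials

probs = [tail_prob(n) for n in (1, 5, 25)]
# The deviation probability decreases toward zero as n grows.
assert probs[0] > probs[1] > probs[2]
```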
**Convergence Almost Surely** For a sequence of random variables $(X_n)_{n \ge 1}$, we say that $X_n$ converges almost surely to $X$ (denoted $X_n \xrightarrow{a.s.} X$) if $\mathbb{P}\left( \{ \omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty \} \right) = 1,$
i.e. pointwise convergence almost everywhere.
**Convergence in $L^p$** For a sequence of random variables $(X_n)_{n \ge 1}$, we say that $X_n$ converges to $X$ in $L^p$ if $\mathbb{E}\left[ d(X_n, X)^p \right] \to 0.$
Here, we can think of $d(X_n, X)$ as a function from $\Omega$ to $[0, \infty)$. In the case that we have real valued random variables, i.e. $M = \mathbb{R}$ with $d(x, y) = |x - y|$, then this is $\mathbb{E}|X_n - X|^p \to 0$.
Hierarchy of Convergence Types
- Conv a.s. ⇒ conv in probability
- Conv in probability ⇒ conv in distribution
- For $1 \le q < p < \infty$, conv in $L^p$ ⇒ conv in $L^q$
- For any $p \ge 1$, conv in $L^p$ ⇒ conv in probability
Borel-Cantelli Lemmas
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. For $A_n \in \mathcal{F}$, $n \ge 1$, we define $\limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m \quad \text{and} \quad \liminf_n A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m.$
The set $\limsup_n A_n$ is sometimes referred to as $\{A_n \text{ infinitely often}\}$ or i.o. This is because $\omega \in \limsup_n A_n$ implies that for any $n$ there exists an $m \ge n$ such that $\omega \in A_m$. Similarly, some write $\{A_n \text{ eventually}\}$ or ev. for $\liminf_n A_n$. This is because for $\omega \in \liminf_n A_n$ there exists an $n$ large enough such that $\omega \in A_m$ for all $m \ge n$.
**First Borel-Cantelli Lemma** Let $A_n \in \mathcal{F}$ with $A = \limsup_n A_n$. If $\sum_n \mathbb{P}(A_n) < \infty$, then $\mathbb{P}(A) = 0$.
**Proof** As the summation converges, the tail sum has to tend to zero, and we have simply that $\mathbb{P}(A) \le \mathbb{P}\left( \bigcup_{m \ge n} A_m \right) \le \sum_{m \ge n} \mathbb{P}(A_m) \to 0$
as $n \to \infty$.
**Second Borel-Cantelli Lemma** Let $A_n \in \mathcal{F}$ be an independent collection with $A = \limsup_n A_n$. If $\sum_n \mathbb{P}(A_n) = \infty$, then $\mathbb{P}(A) = 1$.
**Proof** Note that for all $a \ge 0$ we have $1 - a \le e^{-a}$, and the independence of the $A_n$ implies the independence of the $A_n^c$. Therefore, for any $n$ and $N \ge n$, $\mathbb{P}\left( \bigcap_{m=n}^{N} A_m^c \right) = \prod_{m=n}^{N} \left( 1 - \mathbb{P}(A_m) \right) \le \exp\left( -\sum_{m=n}^{N} \mathbb{P}(A_m) \right).$
Taking $N \to \infty$ takes the right hand side to zero. Hence, $\mathbb{P}\left( \bigcap_{m \ge n} A_m^c \right) = 0$ for all $n$. Thus, $\mathbb{P}(A) = 1 - \mathbb{P}\left( \bigcup_n \bigcap_{m \ge n} A_m^c \right) = 1,$
which is the desired result.
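Both lemmas can be sketched with assumed toy event sequences: a summable case $\mathbb{P}(A_n) = 1/n^2$, where events should stop occurring, and an independent divergent case $\mathbb{P}(A_n) = 1/n$, where they keep occurring (in particular $A_1$ occurs with probability one).

```python
import random

random.seed(4)

N, paths = 5000, 200

# First lemma: P(A_n) = 1/n^2 is summable, so along each sample path
# only finitely many A_n should occur; record the last occurrence.
last_hits = []
for _ in range(paths):
    last = 0
    for n in range(1, N + 1):
        if random.random() < 1.0 / (n * n):
            last = n
    last_hits.append(last)

# Union bound: P(some A_n occurs with n > 10) <= sum_{n>10} 1/n^2 < 0.1,
# so very few paths should see an event beyond n = 10.
late_fraction = sum(1 for L in last_hits if L > 10) / paths
assert late_fraction < 0.5

# Second lemma: independent A_n with P(A_n) = 1/n (divergent sum)
# keep occurring; A_1 has probability 1, so every path sees at least one.
counts = [sum(random.random() < 1.0 / n for n in range(1, N + 1))
          for _ in range(paths)]
assert min(counts) >= 1
```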
Law of Large Numbers
Let $X_1, X_2, \ldots$ be random variables from $(\Omega, \mathcal{F}, \mathbb{P})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Hence, for any $B \in \mathcal{B}(\mathbb{R})$, we write $\mathbb{P}(X_i \in B) = \mathbb{P}(X_i^{-1}(B)) = \mathbb{P}\left( \{ \omega \in \Omega : X_i(\omega) \in B \} \right),$
and $\mathbb{E}[X_i] = \int_\Omega X_i \, d\mathbb{P}$. Furthermore, we define the partial sum $S_n = X_1 + \cdots + X_n$, which is also a measurable random variable.
**Independence** For random variables $X$ and $Y$ on the same probability space but possibly with different codomains, $(E, \mathcal{E})$ and $(G, \mathcal{G})$ respectively, we say that $X$ and $Y$ are independent if $\mathbb{P}(X \in A, Y \in B) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B)$
for all $A \in \mathcal{E}$ and $B \in \mathcal{G}$.
This definition can be extended to a finite collection of random variables $X_1, \ldots, X_n$, implying $\mathbb{P}(X_1 \in B_1, \ldots, X_n \in B_n) = \prod_{i=1}^{n} \mathbb{P}(X_i \in B_i)$
for all measurable $B_1, \ldots, B_n$. We say that an infinite collection of random variables is independent if all finite collections are independent.
Note that since $\{X \in B\}$ is shorthand for $X^{-1}(B)$, random variables $X$ and $Y$ are independent if and only if the $\sigma$-fields $\sigma(X)$ and $\sigma(Y)$ are independent (defined here).
**Identically Distributed** For a random variable $X$, the distribution of $X$ is the measure $\mu_X$ induced by $X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, i.e., $\mu_X(B) = \mathbb{P}(X \in B)$ for $B \in \mathcal{B}(\mathbb{R})$. We say that $X$ and $Y$ are identically distributed if $\mu_X$ and $\mu_Y$ coincide.
Weak Law of Large Numbers
**Weak Law of Large Numbers** Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $X_1, X_2, \ldots$ be random variables (measurable functions) from $\Omega$ to $\mathbb{R}$ such that $\mathbb{E}[X_i] = \mu$ and $\operatorname{Var}(X_i) = \sigma^2 < \infty$ for all $i$, and $X_i$ is independent of $X_j$ for all $i \ne j$. Then $S_n/n \xrightarrow{p} \mu$.
**Proof** Without loss of generality, we assume $\mu = 0$. Otherwise, we can replace $X_i$ with $X_i - \mu$. Then, for any $\varepsilon > 0$, Chebyshev’s inequality implies that $\mathbb{P}\left( \left| \frac{S_n}{n} \right| > \varepsilon \right) \le \frac{\operatorname{Var}(S_n)}{n^2 \varepsilon^2} = \frac{\sigma^2}{n \varepsilon^2} \to 0$
as $n \to \infty$.
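A simulation sketch of the statement (with an assumed Uniform(0, 1) sample, so $\mu = 1/2$ and $\sigma^2 = 1/12$): the frequency of large deviations of $S_n/n$ drops as $n$ grows, as the Chebyshev bound $\sigma^2 / (n\varepsilon^2)$ predicts.

```python
import random

random.seed(5)

eps, trials = 0.05, 2000

def sample_mean(n):
    return sum(random.uniform(0.0, 1.0) for _ in range(n)) / n

def deviation_freq(n):
    # Fraction of independent runs with |S_n/n - 1/2| > eps.
    return sum(abs(sample_mean(n) - 0.5) > eps
               for _ in range(trials)) / trials

f10, f1000 = deviation_freq(10), deviation_freq(1000)
# Chebyshev bounds the frequency by sigma^2 / (n eps^2), shrinking in n.
assert f1000 < f10
assert f1000 < 0.05
```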
Note that in the above proof, we only require that the $X_i$ be uncorrelated (i.e. $\mathbb{E}[X_i X_j] = \mathbb{E}[X_i]\mathbb{E}[X_j]$ for $i \ne j$) and not independent. In the next theorem, we require independence, but remove all second moment conditions.
Strong Law of Large Numbers
**Strong Law of Large Numbers** Let $X_1, X_2, \ldots$ be i.i.d. random variables from $\Omega$ to $\mathbb{R}$.
- If $\mathbb{E}|X_1| = \infty$, then $S_n/n$ does not converge to any finite value almost surely.
- If $\mathbb{E}|X_1| < \infty$, then $S_n/n \xrightarrow{a.s.} \mu$ for $\mu = \mathbb{E}[X_1]$.
Proof
Assume that $S_n/n \to c$ for some finite $c$, but also that $\mathbb{E}|X_1| = \infty$, and note that $\frac{X_n}{n} = \frac{S_n}{n} - \frac{n-1}{n} \cdot \frac{S_{n-1}}{n-1} \to c - c = 0.$ Since $\mathbb{E}|X_1| = \infty$, then $\sum_n \mathbb{P}(|X_n| \ge n) = \infty$, and the 2nd Borel-Cantelli Lemma says that $|X_n| \ge n$ for infinitely many $n$. Thus $\mathbb{P}(X_n/n \to 0) = 0$.
Thus $\mathbb{P}(S_n/n \to c) = 0$.
Assume that $\mathbb{E}|X_1| < \infty$. Without loss of generality, we assume $X_i \ge 0$ for all $i$. Otherwise, we can write $X_i = X_i^+ - X_i^-$, and independence of $X_i$ and $X_j$ implies independence of $X_i^+$ and $X_j^+$ and of $X_i^-$ and $X_j^-$. Also, we use $m$ to denote the distribution of $X_1$, i.e. $m(B) = \mathbb{P}(X_1 \in B)$.
We define $Y_n = X_n \mathbb{1}\{X_n \le n\}$ and $T_n = Y_1 + \cdots + Y_n$. For any $\alpha > 1$, we can define a non-decreasing integer sequence $k_n = \lfloor \alpha^n \rfloor$. Then $k_n \to \infty$ and $\sum_{n : k_n \ge m} k_n^{-2} \le C m^{-2}$. Therefore, $\sum_n \frac{\operatorname{Var}(T_{k_n})}{k_n^2} \le C \sum_m \frac{\operatorname{Var}(Y_m)}{m^2}$
for some constant $C$ depending only on $\alpha$. We also note that $\sum_m \operatorname{Var}(Y_m)/m^2 \le \sum_m \mathbb{E}[Y_m^2]/m^2 \le 4\,\mathbb{E}[X_1] < \infty$. By Chebyshev’s inequality, there is a constant $C'$, depending on $\alpha$ and $\varepsilon$, such that $\sum_n \mathbb{P}\left( \left| \frac{T_{k_n} - \mathbb{E}[T_{k_n}]}{k_n} \right| > \varepsilon \right) \le C'\,\mathbb{E}[X_1] < \infty.$
And thus the sum is finite. Hence, by the 1st Borel-Cantelli Lemma, $\frac{T_{k_n} - \mathbb{E}[T_{k_n}]}{k_n} \to 0$ a.s. Since $\mathbb{E}[Y_n] \to \mathbb{E}[X_1] = \mu$, we have that $\frac{\mathbb{E}[T_{k_n}]}{k_n} \to \mu$, and in turn that $\frac{T_{k_n}}{k_n} \to \mu$ a.s.
To get back to $S_n$ from $T_n$, we note that $X_n \ne Y_n$ if and only if $X_n > n$. Thus $\sum_n \mathbb{P}(X_n \ne Y_n) = \sum_n \mathbb{P}(X_1 > n) \le \mathbb{E}[X_1] < \infty$, and the 1st Borel-Cantelli Lemma says that $\mathbb{P}(X_n \ne Y_n \text{ i.o.}) = 0$, so for $n$ large enough, $X_n = Y_n$ a.s. We define "large enough" to be $n \ge N(\omega)$ (i.e. $X_n(\omega) = Y_n(\omega)$ for all $n \ge N(\omega)$). Furthermore, $S_n - T_n$ is eventually constant while $n \to \infty$, meaning that the contribution of the terms where $X_n$ and $Y_n$ may not coincide becomes negligible. Hence, $\frac{S_{k_n}}{k_n} \to \mu$ a.s., so we have almost sure convergence of a subsequence.
Finally, since $k_n = \lfloor \alpha^n \rfloor$, for every $m$ there exists an $n$ large enough such that $k_n \le m < k_{n+1}$. Thus, for non-negative $X_i$, $\frac{k_n}{k_{n+1}} \cdot \frac{S_{k_n}}{k_n} \le \frac{S_m}{m} \le \frac{k_{n+1}}{k_n} \cdot \frac{S_{k_{n+1}}}{k_{n+1}},$ so almost surely $\frac{\mu}{\alpha} \le \liminf_m \frac{S_m}{m} \le \limsup_m \frac{S_m}{m} \le \alpha \mu.$
Thus, taking $\alpha \downarrow 1$ concludes the proof.
Central Limit Theorem
**Gaussian Measure on $\mathbb{R}$** A Borel measure $\mu$ on $\mathbb{R}$ is said to be Gaussian with mean $a$ and variance $\sigma^2$ if for all $B \in \mathcal{B}(\mathbb{R})$, $\mu(B) = \int_B \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - a)^2}{2\sigma^2} \right) dx.$
**Gaussian Measure on $\mathbb{R}^d$** A Borel measure $\mu$ on $\mathbb{R}^d$ is said to be Gaussian if for all linear functionals $\theta: \mathbb{R}^d \to \mathbb{R}$, the induced measure $\mu \circ \theta^{-1}$ on $\mathbb{R}$ is Gaussian.
**Gaussian Random Variable** A random variable $X$ from a probability space to $\mathbb{R}^d$ is said to be Gaussian if $\mu_X$ is a Gaussian measure on $\mathbb{R}^d$.
**Characteristic Function** For a probability measure $\mu$ on $\mathbb{R}^d$, the characteristic function (Fourier transform) $\hat{\mu}: \mathbb{R}^d \rightarrow \mathbb{C}$ is defined as $\hat{\mu}(t) = \int_{\mathbb{R}^d} e^{i \langle t, x \rangle} \, d\mu(x).$
We can also invert the above transformation. That is, if $\hat{\mu}$ is integrable with respect to Lebesgue measure on $\mathbb{R}^d$, then $\mu$ has density $f(x) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} e^{-i \langle t, x \rangle} \hat{\mu}(t) \, dt$ with respect to Lebesgue measure.
**Convolution** For two measures $\mu$ and $\nu$ on $\mathbb{R}^d$, the convolution measure $\mu * \nu$ is defined as $(\mu * \nu)(B) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \mathbb{1}_B(x + y) \, d\mu(x) \, d\nu(y)$
for all $B \in \mathcal{B}(\mathbb{R}^d)$, where $\mathbb{1}_B$ is the indicator function of $B$.
Note that the convolution operator is commutative and associative. Also, for two independent random variables $X$ and $Y$ with corresponding measures $\mu_X$ and $\mu_Y$, the measure of $X + Y$ is $\mu_X * \mu_Y$.
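The companion identity $\widehat{\mu * \nu} = \hat{\mu}\,\hat{\nu}$ can be checked empirically (a sketch with an assumed Gaussian sample and an independent exponential sample, and a loose Monte Carlo tolerance):

```python
import cmath
import random

random.seed(7)

m = 50_000
X = [random.gauss(0.0, 1.0) for _ in range(m)]
Y = [random.expovariate(1.0) for _ in range(m)]

def ecf(samples, t):
    # Empirical characteristic function: (1/m) sum_j exp(i t x_j).
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

t = 0.7
lhs = ecf([x + y for x, y in zip(X, Y)], t)  # char. fn of mu_X * mu_Y
rhs = ecf(X, t) * ecf(Y, t)                  # product of the transforms
# Both estimate the same limit; Monte Carlo error is O(1/sqrt(m)).
assert abs(lhs - rhs) < 0.05
```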
**Uniqueness of Characteristic Function Theorem** Let $\mu$ and $\nu$ be probability measures on $\mathbb{R}^d$. If $\hat{\mu} = \hat{\nu}$ then $\mu = \nu$.
**Proof** Let $\gamma_\sigma$ be a mean zero Gaussian measure on $\mathbb{R}^d$ with covariance $\sigma^2 I$. We denote $\mu_\sigma = \mu * \gamma_\sigma$ and similarly $\nu_\sigma = \nu * \gamma_\sigma$. It can be shown that the corresponding density functions for $\mu_\sigma$ and $\nu_\sigma$ are $f_{\mu_\sigma}(x) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} e^{-i \langle t, x \rangle} \hat{\mu}(t)\, e^{-\sigma^2 |t|^2 / 2} \, dt,$ and similarly with $\hat{\nu}$ in place of $\hat{\mu}$.
Since $\hat{\mu} = \hat{\nu}$, we have that $\mu_\sigma = \nu_\sigma$ for all $\sigma > 0$.
Let $X$ be a random variable corresponding to $\mu$, and let $Z$ be an independent standard Gaussian random variable. Then, the measure $\mu_\sigma$ is paired with the random variable $X + \sigma Z$. Thus $X + \sigma Z \to X$ as $\sigma \to 0$, that is, pointwise for almost all $\omega$. Thus, this convergence holds in probability and thus in distribution, i.e. $\mu_\sigma \Rightarrow \mu$ as $\sigma \to 0$.
Lastly, we have that $\mu_\sigma \Rightarrow \mu$ and $\mu_\sigma = \nu_\sigma \Rightarrow \nu$. Since the weak limit is unique, $\mu = \nu$.
**Central Limit Theorem** Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $X_1, X_2, \ldots$ be i.i.d. random variables on $\mathbb{R}^d$ such that $\mathbb{E}[X_i] = 0$ and $\mathbb{E}|X_i|^2 < \infty$. Let $S_n = X_1 + \cdots + X_n$. Then, $S_n/\sqrt{n} \xrightarrow{d} Z$ where $Z$ is a Gaussian random variable with zero mean and covariance $\Sigma$ with $(i, j)$th entry $\Sigma_{ij} = \mathbb{E}[X_{1,i} X_{1,j}]$.
**Proof** As the random vectors are mean zero and independent, $\mathbb{E}\langle X_i, X_j \rangle = 0$ for $i \ne j$. In turn, for any $n$, $\mathbb{E}\left| \frac{S_n}{\sqrt{n}} \right|^2 = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}|X_i|^2 = \mathbb{E}|X_1|^2 < \infty.$
For any $\varepsilon > 0$, there exists an $R$ such that $\mathbb{E}|X_1|^2 / R^2 < \varepsilon$. Thus, from Chebyshev’s inequality, we have that $\sup_n \mathbb{P}(|S_n/\sqrt{n}| > R) < \varepsilon$. This implies that the sequence $(S_n/\sqrt{n})_{n \ge 1}$ is "uniformly tight".
For a vector $\theta \in \mathbb{R}^d$, the random variables $\langle \theta, X_i \rangle$ are i.i.d. real-valued with $\mathbb{E}\langle \theta, X_1 \rangle = 0$ and $\operatorname{Var}(\langle \theta, X_1 \rangle) = \theta^\top \Sigma \theta$. Let $\varphi(t) = \mathbb{E}[e^{it\langle \theta, X_1 \rangle}]$ be the characteristic function of $\langle \theta, X_1 \rangle$. Then, $\varphi(0) = 1$, $\varphi'(0) = 0$, and $\varphi''(0) = -\theta^\top \Sigma \theta$. Thus, by Taylor’s Theorem, we have $\varphi(t) = 1 - \frac{t^2}{2}\, \theta^\top \Sigma \theta + o(t^2).$
Thus, for any fixed vector $\theta$, $\mathbb{E}\left[ e^{i \langle \theta, S_n/\sqrt{n} \rangle} \right] = \varphi\left( \frac{1}{\sqrt{n}} \right)^n = \left( 1 - \frac{\theta^\top \Sigma \theta}{2n} + o(n^{-1}) \right)^n \to e^{-\theta^\top \Sigma \theta / 2}$
as $n \to \infty$. Thus, by the Uniqueness of Characteristic Function Theorem, we have that $S_n/\sqrt{n} \xrightarrow{d} Z$ where $Z$ is a Gaussian random variable with zero mean and covariance $\Sigma$.
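A one-dimensional simulation sketch (with an assumed Uniform(0, 1) sample, centered and scaled): standardized sums should land inside $\pm 1.96$ with probability close to the Gaussian value $0.95$.

```python
import random

random.seed(6)

n, trials = 400, 4000
mu, sigma = 0.5, (1.0 / 12.0) ** 0.5  # mean and sd of Uniform(0, 1)

def standardized_sum():
    s = sum(random.uniform(0.0, 1.0) for _ in range(n))
    return (s - n * mu) / (sigma * n ** 0.5)

zs = [standardized_sum() for _ in range(trials)]
# For a standard Gaussian limit, P(|Z| <= 1.96) is about 0.95.
inside = sum(abs(z) <= 1.96 for z in zs) / trials
assert abs(inside - 0.95) < 0.03
```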
Ergodic Theorem
**Measure Preserving Map** Let $(E, \mathcal{E}, \mu)$ be a measure space. A measurable mapping $\theta: E \to E$ is called measure preserving if $\mu(\theta^{-1}(A)) = \mu(A) \text{ for all } A \in \mathcal{E}.$
**Invariant Set and Function** For a measurable mapping $\theta: E \to E$,
A set $A \in \mathcal{E}$ is $\theta$-invariant if $\theta^{-1}(A) = A$. The set of all $\theta$-invariant sets forms a $\sigma$-field $\mathcal{E}_\theta$.
A measurable function $f$ is $\theta$-invariant if $f = f \circ \theta$. $f$ is $\theta$-invariant if and only if $f$ is $\mathcal{E}_\theta$-measurable.
**Ergodic Map** A measure preserving mapping $\theta$ is said to be ergodic if for any $A \in \mathcal{E}_\theta$, $\mu(A) = 0$ or $\mu(A^c) = 0$.
Example
For Lebesgue measure on $[0, 1)$, two examples of measure preserving maps are the shift map $\theta_a(x) = x + a \bmod 1, \quad a \in [0, 1),$
and Baker’s Map $\theta(x) = 2x \bmod 1.$
Furthermore, it can be shown that
- If $f$ is integrable and $\theta$ is measure preserving, then $f \circ \theta$ is integrable and $\int_E f \, d\mu = \int_E f \circ \theta \, d\mu$.
- If $\theta$ is ergodic and $f$ is invariant, then $f = c$ $\mu$-a.e. for some constant $c$.
Birkhoff and von Neumann’s Theorems
In what follows, we let $(E, \mathcal{E}, \mu)$ be a measure space, $\theta: E \to E$ be a measure preserving transformation, $f$ a measurable function, and $S_n = S_n(f) = f + f \circ \theta + \cdots + f \circ \theta^{n-1},$
where $S_0 = 0$.
**Maximal Ergodic Lemma** Let $f$ be integrable and $S^* = \sup_{n \ge 0} S_n$ (element-wise maximum). Then, $\int_{\{S^* > 0\}} f \, d\mu \ge 0.$
**Proof** Let $S_n^* = \max_{0 \le m \le n} S_m$ and $A_n = \{S_n^* > 0\}$. Then, for $1 \le m \le n$, $S_m = f + S_{m-1} \circ \theta \le f + S_n^* \circ \theta.$
Furthermore, on the set $A_n$, $S_n^* = \max_{1 \le m \le n} S_m \le f + S_n^* \circ \theta.$
On the set $A_n^c$, since $S_n^* = 0$ and $S_n^* \circ \theta \ge 0$, we have that $S_n^* \le S_n^* \circ \theta.$ Thus, integrating both sides of the above gives $\int_E S_n^* \, d\mu \le \int_{A_n} f \, d\mu + \int_E S_n^* \circ \theta \, d\mu.$
Since $S_n^*$ is integrable and $\theta$ is measure preserving, $\int S_n^* \circ \theta \, d\mu = \int S_n^* \, d\mu$. Thus, we have $\int_{A_n} f \, d\mu \ge 0$. As $A_n \uparrow \{S^* > 0\}$, we have that $\int_{\{S^* > 0\}} f \, d\mu \ge 0$
due to the Dominated Convergence Theorem with $|f|$ as the dominating function.
**Birkhoff’s Ergodic Theorem** Let $(E, \mathcal{E}, \mu)$ be a $\sigma$-finite measure space and $f \in L^1(\mu)$. Then, there exists a $\theta$-invariant $\bar{f} \in L^1(\mu)$ such that $\int |\bar{f}| \, d\mu \le \int |f| \, d\mu,$
and $S_n(f)/n \to \bar{f}$ as $n \to \infty$, $\mu$-a.e.
**Proof** Both $\bar{g} = \limsup_n S_n/n$ and $\underline{g} = \liminf_n S_n/n$ are $\theta$-invariant. Indeed, $\frac{S_n}{n} = \frac{f}{n} + \frac{n-1}{n} \cdot \frac{S_{n-1} \circ \theta}{n-1},$ so the $\limsup$ and $\liminf$ are unchanged under composition with $\theta$. Thus, we can define a set $D = D(a, b) = \{ \underline{g} < a < b < \bar{g} \}$ for rationals $a < b$,
which means that the $\liminf$ and $\limsup$ are separated, and this set is $\theta$-invariant. The goal of the proof is to show that $\mu(D) = 0$. Without loss of generality, we take $b > 0$. Otherwise, $a < 0$ and we multiply everything by $-1$.
For some $B \in \mathcal{E}$ such that $B \subseteq D$ and $\mu(B) < \infty$, we set $g = f - b\mathbb{1}_B$. The function $g$ is integrable, and for each $x \in D$, there is an $n$ such that $S_n(g)(x) \ge S_n(f)(x) - nb > 0$, since $\bar{g} > b$ on $D$. Thus, $D \subseteq \{ \sup_n S_n(g) > 0 \}$, and the Maximal Ergodic Lemma (applied on the invariant set $D$) says that $0 \le \int_D g \, d\mu = \int_D f \, d\mu - b\,\mu(B).$
As $\mu$ is $\sigma$-finite, there exists a sequence of sets $B_n$ such that $B_n \uparrow D$ and $\mu(B_n) < \infty$ for all $n$. Thus, $b\,\mu(B_n) \le \int_D f \, d\mu \text{ for all } n.$
This implies that $b\,\mu(D) \le \int_D f \, d\mu$. Redoing the above argument for $-f$ and $-a$ results in $-a\,\mu(D) \le -\int_D f \, d\mu$. Therefore, $b\,\mu(D) \le \int_D f \, d\mu \le a\,\mu(D),$
and since $a < b$, we have that $\mu(D) = 0$.
Next, let $\Delta = \{ \bar{g} > \underline{g} \} = \bigcup_{a < b,\; a, b \in \mathbb{Q}} D(a, b).$
Then, $\Delta$ is $\theta$-invariant as the $D(a, b)$ are. Furthermore, $\mu(\Delta) = 0$ as a countable union of null sets. Thus, $\bar{g} = \underline{g}$ $\mu$-a.e.
This means that $S_n/n$ converges in $[-\infty, \infty]$ on $\Delta^c$. Therefore, we define $\bar{f} = \lim_n \frac{S_n}{n} \mathbb{1}_{\Delta^c}.$
Lastly, $\int |f \circ \theta^k| \, d\mu = \int |f| \, d\mu$ ($\theta$ is measure preserving), and thus $\int |S_n/n| \, d\mu \le \int |f| \, d\mu$ for all $n$. Applying Fatou’s Lemma gives $\int |\bar{f}| \, d\mu \le \liminf_n \int \left| \frac{S_n}{n} \right| d\mu \le \int |f| \, d\mu < \infty,$
finishing the proof.
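Birkhoff's theorem can be illustrated with a classical ergodic map (an example of my own choosing, not necessarily the one from the course): the irrational rotation $\theta(x) = x + \alpha \bmod 1$ on $[0, 1)$ with Lebesgue measure. The time average of $f(x) = \cos(2\pi x)$ must converge to its space average, which is $0$:

```python
import math

# Irrational rotation theta(x) = (x + alpha) mod 1 preserves Lebesgue
# measure on [0, 1) and is ergodic when alpha is irrational.
alpha = math.sqrt(2.0) - 1.0

def f(x):
    return math.cos(2.0 * math.pi * x)  # space average over [0, 1) is 0

x, n, total = 0.123, 100_000, 0.0
for _ in range(n):
    total += f(x)
    x = (x + alpha) % 1.0

time_avg = total / n  # S_n(f) / n
# Birkhoff: the time average converges to the space average, here 0.
assert abs(time_avg) < 0.01
```

For this map the Birkhoff sum is a geometric-type sum bounded by $1/|\sin(\pi\alpha)|$, so the time average decays like $1/n$ here, much faster than the general theorem guarantees.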
**Von Neumann’s Ergodic Theorem** Let $\mu(E) < \infty$ and $p \in [1, \infty)$. Then, for all $f \in L^p(\mu)$, there exists a $\theta$-invariant $\bar{f} \in L^p(\mu)$ such that $S_n(f)/n \to \bar{f}$ $\mu$-a.e. and $\left\| \frac{S_n(f)}{n} - \bar{f} \right\|_p \to 0.$
Proof
We begin by noting that, since $\theta$ is measure preserving, $\|f \circ \theta^k\|_p = \|f\|_p \text{ for all } k \ge 0.$
By the above and Minkowski’s Inequality, $\|S_n/n\|_p \le \|f\|_p$. Since $\mu(E) < \infty$, given an $\varepsilon > 0$, we can choose an $M$ such that, with $g = (f \wedge M) \vee (-M)$, $\|f - g\|_p < \varepsilon/3,$
i.e. $g$ is $f$ truncated so that it is bounded above and below by $M$ and $-M$. By the Birkhoff Ergodic Theorem, $S_n(g)/n \to \bar{g}$ $\mu$-a.e.
Next, we note that $|S_n(g)/n| \le M$ for all $n$, and thus by the [Dominated Convergence Theorem](#dominated-convergence-theorem), there exists an $N$ such that for all $n \ge N$, $\left\| \frac{S_n(g)}{n} - \bar{g} \right\|_p < \varepsilon/3.$
Applying Fatou’s Lemma gives that $\|\bar{f} - \bar{g}\|_p \le \liminf_n \left\| \frac{S_n(f - g)}{n} \right\|_p \le \|f - g\|_p < \varepsilon/3.$
Thus, for $n \ge N$, $\left\| \frac{S_n(f)}{n} - \bar{f} \right\|_p \le \left\| \frac{S_n(f - g)}{n} \right\|_p + \left\| \frac{S_n(g)}{n} - \bar{g} \right\|_p + \| \bar{g} - \bar{f} \|_p < \varepsilon.$
Since $\varepsilon$ was arbitrary, the convergence must hold in $L^p$ as well. Thus, we have $\left\| \frac{S_n(f)}{n} - \bar{f} \right\|_p \to 0,$
which gives the desired result.
Law of Large Numbers, Again
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space with i.i.d. real-valued random variables $X_1, X_2, \ldots$ with common distribution $m$. We define a map $X: \Omega \to \mathbb{R}^{\mathbb{N}}$ by $X(\omega) = (X_1(\omega), X_2(\omega), \ldots).$
Let $\mu = \mathbb{P} \circ X^{-1}$ be the corresponding probability measure on $\mathbb{R}^{\mathbb{N}}$.
Because the variables are independent, $\mu$ has the product form $\mu_1 \otimes \mu_2 \otimes \cdots$, and because they are identically distributed, all the marginal distributions $\mu_i$ are the same, so in fact $\mu = m^{\otimes \mathbb{N}}$ for the probability distribution $m$ on $\mathbb{R}$.
For a sequence $x = (x_1, x_2, x_3, \ldots) \in \mathbb{R}^{\mathbb{N}}$, we can define the shift map $\Theta$ to be $\Theta(x_1, x_2, x_3, \ldots) = (x_2, x_3, \ldots).$
Then the shift map is measure preserving, and it is ergodic by Kolmogorov’s zero-one law.
**Strong Law of Large Numbers, Again** Let $X_1, X_2, \ldots$ be i.i.d. random variables from $\Omega$ to $\mathbb{R}$. If $\mathbb{E}|X_1| < \infty$, then $S_n/n \xrightarrow{a.s.} \mu$ for $\mu = \mathbb{E}[X_1]$.
Proof
Let $f: \mathbb{R}^{\mathbb{N}} \to \mathbb{R}$ be defined by taking the first coordinate, that is, for $x = (x_1, x_2, \ldots)$, $f(x) = x_1$. Then, for $\Theta$ being the shift map and $S_n(f) = f + f \circ \Theta + \cdots + f \circ \Theta^{n-1}$, we have $S_n(f)(X(\omega)) = X_1(\omega) + \cdots + X_n(\omega).$
Thus, von Neumann’s Ergodic Theorem (with $p = 1$) says that there exists an invariant $\bar{f} \in L^1$ such that $\frac{S_n(f)}{n} \to \bar{f}$
$\mu$-a.e. and in $L^1$.
Since $\Theta$ is ergodic, the result from the beginning of this section states that $\bar{f} = c$, a constant, almost surely. Thus, $c = \lim_n \mathbb{E}\left[ \frac{S_n(f)}{n} \right] = \mathbb{E}[X_1] = \mu$
gives the desired result.