Probability and Measure — Part 3
2025-04-23
2025-12-19

This post is Part 3 of my notes from Adam B Kashlak's course on Probability and Measure Theory.

Probability Theory#

$L^p$ Spaces#

$L^p$ Space

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and $f : \Omega \rightarrow [-\infty, \infty]$ a measurable function. Then we say $f \in L^p(\Omega,\mathcal{F}, \mu)$, for $1 \leq p < \infty$, if

$$\int |f|^p \,\mathrm{d}\mu < \infty$$

For $p = \infty$, we say $f \in L^\infty(\Omega,\mathcal{F}, \mu)$ if $\inf \{t\in [-\infty, \infty] : |f| \leq t ~~\mu\text{-a.e.}\} < \infty$.

This definition allows us to write down the $L^p$ norm, which is defined as:

$$\begin{aligned} \|f\|_p &= \left(\int |f|^p \,\mathrm{d}\mu\right)^{1/p} \quad \text{for} \quad 1 \leq p < \infty \\ \|f\|_\infty &= \inf \{t\in [-\infty, \infty] : |f| \leq t ~~\mu\text{-a.e.}\} \\ &= \inf \{t\in [-\infty, \infty] : \mu(\{|f(x)| > t\}) = 0\} \end{aligned}$$

Markov / Chebyshev and Jensen’s Inequalities#

Markov’s Inequality

Let $f$ be a non-negative measurable function and $t > 0$. Then, denoting $\{f > t\} := \{\omega \in \Omega : f(\omega) > t\}$,

$$\mu(\{f > t\}) \leq t^{-1} \int f \,\mathrm{d}\mu$$
Proof

Noting that $t\mathbf{1}_{\{f>t\}} \leq f$, by monotonicity of the integral,

$$t\,\mu(\{f > t\}) = \int t\mathbf{1}_{\{f>t\}} \,\mathrm{d}\mu \leq \int f \,\mathrm{d}\mu$$

which proves the theorem. $\square$

There are two useful ways to use Markov’s inequality:

  1. Chebyshev's Inequality: For $f$ measurable and $m \in \mathbb{R}$,
     $$\mu(\{|f - m| > t\}) = \mu(\{(f - m)^2 > t^2\}) \leq t^{-2} \int (f - m)^2 \,\mathrm{d}\mu$$
     In probability notation, with $m = \mathbb{E}[X]$, this reads $P(|X - \mathbb{E}[X]| > t) \leq \operatorname{Var}(X)/t^2$.
  2. Chernoff's Inequality: For $f$ measurable and $m > 0$,
     $$\mu(\{f > t\}) \leq e^{-tm} \int e^{mf} \,\mathrm{d}\mu$$
     In probability theory, the integral on the right hand side becomes the moment generating function, i.e. the Laplace transform. A quick numerical check of these bounds follows below.
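As a sanity check (my own addition, not from the lecture), the following minimal numpy sketch compares the Markov and Chebyshev bounds against Monte Carlo tail estimates for $X \sim \mathrm{Exp}(1)$, where $\mathbb{E}[X] = \operatorname{Var}(X) = 1$:

```python
import numpy as np

# Sanity check of Markov's and Chebyshev's inequalities for X ~ Exp(1),
# where E[X] = 1 and Var(X) = 1. A minimal sketch, not part of the notes.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

for t in [2.0, 4.0, 8.0]:
    tail = np.mean(x > t)                    # Monte Carlo P(X > t)
    markov = 1.0 / t                         # t^{-1} E[X]
    centered = np.mean(np.abs(x - 1.0) > t)  # Monte Carlo P(|X - E[X]| > t)
    chebyshev = 1.0 / t**2                   # Var(X) / t^2
    print(f"t={t}: P(X>t)={tail:.5f} <= Markov {markov:.3f};  "
          f"P(|X-1|>t)={centered:.5f} <= Chebyshev {chebyshev:.4f}")
```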
Convex Functions on $\mathbb{R}$

Let $I \subset \mathbb{R}$ be an interval. A function $\phi : I \rightarrow \mathbb{R}$ is convex if for all $t \in [0, 1]$ and all $x, y \in I$,

$$\phi(tx + (1 - t)y) \leq t\phi(x) + (1 - t)\phi(y)$$
Jensen’s Inequality

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ an integrable random variable (i.e. a measurable function in $L^1(\Omega, \mathcal{F}, P)$) such that $X : \Omega \rightarrow I \subset \mathbb{R}$. For any convex $\phi : I \rightarrow \mathbb{R}$,

$$\phi\left(\int X \,\mathrm{d}P\right) \leq \int \phi(X) \,\mathrm{d}P$$

that is, $\phi(\mathbb{E}[X]) \leq \mathbb{E}[\phi(X)]$.

Proof

If $X = c$ $P$-a.e. for some $c \in I$, then the result is immediate. Otherwise, let $m = \mathbb{E}[X]$ be the mean of $X$, which lies in the interior of $I$. Then, we can choose $a, b \in \mathbb{R}$ such that $\phi(x) \geq ax + b$ for all $x \in I$ with equality at $x = m$ (a supporting line of $\phi$ at $m$). Then $\phi(X) \geq aX + b$, and

$$\phi(\mathbb{E}[X]) = am + b = \mathbb{E}[aX + b] \leq \mathbb{E}[\phi(X)]$$

Lastly, to check that $\mathbb{E}[\phi(X)]$ is well defined (i.e. not $\infty - \infty$), we write $\phi = \phi^+ - \phi^-$ and note that, since $\phi(x) \geq ax + b$, we have $\phi^-(x) \leq |a||x| + |b|$. Hence, $\mathbb{E}[\phi^-(X)] \leq |a|\mathbb{E}|X| + |b| < \infty$. $\square$
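A quick numerical illustration (my own sketch, not from the notes): take the convex map $\phi(x) = e^x$ and $X \sim N(0,1)$, for which $\mathbb{E}[e^X] = e^{1/2}$ exactly, so the gap in Jensen's inequality is visible.

```python
import numpy as np

# Jensen's inequality with phi(x) = exp(x) and X ~ N(0, 1):
# phi(E[X]) = exp(0) = 1, while E[phi(X)] = exp(1/2) ~ 1.6487.
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)

lhs = np.exp(x.mean())   # phi(E[X])
rhs = np.exp(x).mean()   # E[phi(X)]
print(f"phi(E[X]) = {lhs:.4f} <= E[phi(X)] = {rhs:.4f}")
```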

Hölder and Minkowski’s Inequalities#

Hölder’s Inequality

Let $p, q \in [1, \infty]$ be conjugate indices (i.e. $\frac{1}{p} + \frac{1}{q} = 1$) and let $f$ and $g$ be measurable. Then $\|fg\|_1 \leq \|f\|_p\|g\|_q$.

Proof

If either $\|f\|_p \in \{0, \infty\}$ or $\|g\|_q \in \{0, \infty\}$, then the result is immediate. Hence, for $f$ such that $0 < \|f\|_p < \infty$, we can normalize and without loss of generality assume that $\|f\|_p = 1$. Then, we can define a probability measure $P$ on $\mathcal{F}$ such that for any $A \in \mathcal{F}$,

$$P(A) := \int_A |f|^p\,\mathrm{d}\mu$$

Then, using Jensen's Inequality with $\phi(x) = x^q$ and $q(p - 1) = p$,

$$\begin{aligned} \|fg\|_1 &= \int |fg| \,\mathrm{d}\mu = \int \frac{|g|}{|f|^{p - 1}}\mathbf{1}_{|f| > 0}\underbrace{|f|^p\,\mathrm{d}\mu}_{\text{prob. measure }\mathrm{d}\nu} \\ &\leq \left[\int \left(\frac{|g|}{|f|^{p-1}}\mathbf{1}_{|f| > 0}\right)^q|f|^p\,\mathrm{d}\mu\right]^{\frac{1}{q}} \\ &\leq \left[\int |g|^q\,\mathrm{d}\mu\right]^\frac{1}{q} = \|f\|_p\|g\|_q \end{aligned}$$

From Hölder's inequality with $p=q=2$, we can derive the Cauchy-Schwarz Inequality:

Cauchy-Schwarz Inequality

For measurable $f$ and $g$, $\|fg\|_1\leq\|f\|_2\|g\|_2$.
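On the counting measure over $\{1, \ldots, n\}$, Hölder's inequality is a statement about vectors, which makes it easy to check numerically. A small sketch of my own (the $p = 2$ row is Cauchy-Schwarz):

```python
import numpy as np

# Hölder on the counting measure: sum |f_i g_i| <= ||f||_p ||g||_q
# with 1/p + 1/q = 1; p = 2 recovers Cauchy-Schwarz.
rng = np.random.default_rng(2)
f, g = rng.normal(size=1000), rng.normal(size=1000)

for p in [1.5, 2.0, 3.0]:
    q = p / (p - 1.0)                        # conjugate index
    lhs = np.sum(np.abs(f * g))              # ||fg||_1
    rhs = np.sum(np.abs(f)**p)**(1/p) * np.sum(np.abs(g)**q)**(1/q)
    print(f"p={p}: ||fg||_1 = {lhs:.2f} <= ||f||_p ||g||_q = {rhs:.2f}")
```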

Minkowski’s Inequality

Let $p \in [1, \infty]$ and $f$, $g$ be measurable functions. Then $\|f+g\|_p\leq \|f\|_p + \|g\|_p$.

Proof

If either $\|f\|_p = \infty$ or $\|g\|_p = \infty$ or $\|f+g\|_p=0$, then we are done. If $p=1$, then $|(f + g)(\omega)| \leq |f(\omega)| + |g(\omega)|$ and the result follows quickly; the same pointwise bound handles $p = \infty$. For $p\in (1, \infty)$ with $q$ its conjugate, we note that

$$\left\||f+g|^{p-1}\right\|_q = \left[\int |f+g|^{(p-1)q}\,\mathrm{d}\mu\right]^\frac{1}{q} = \left[\int|f+g|^p\,\mathrm{d}\mu\right]^{\frac{p-1}{p}} = \|f + g\|^{p-1}_p$$

Then, using the above equality and Hölder's inequality, we have

$$\begin{aligned} \|f+g\|^p_p &= \int |f+g|^p\,\mathrm{d}\mu \leq \int |f||f+g|^{p-1}\,\mathrm{d}\mu + \int |g||f+g|^{p-1}\,\mathrm{d}\mu \\ &\leq \|f\|_p\left\||f+g|^{p-1}\right\|_q + \|g\|_p\left\||f+g|^{p-1}\right\|_q \\ &= (\|f\|_p + \|g\|_p)\|f+g\|^{p-1}_p \end{aligned}$$

Dividing both sides by $\|f + g\|^{p-1}_p$ finishes the proof; the division is valid since $|f+g|^p \leq 2^p(|f|^p + |g|^p)$ gives $\|f+g\|_p < \infty$. $\square$

$L^p$ Approximation Theorem

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $\mathcal{A}$ be a $\pi$-system such that $\sigma(\mathcal{A}) = \mathcal{F}$, $\mu(A) < \infty$ for all $A \in \mathcal{A}$, and there exists $A_i \uparrow \Omega$ with $A_i \in \mathcal{A}$. Let the collection of simple functions be

$$V_0 := \left\{\sum_{i=1}^n a_i\mathbf{1}_{A_i} : a_i \in \mathbb{R}, A_i\in\mathcal{A}, n\in\mathbb{N}\right\}$$

For $p\in[1,\infty)$, $V_0\subset L^p$, and for all $f\in L^p$ and all $\epsilon > 0$, there exists a $v\in V_0$ such that $\|f-v\|_p < \epsilon$.

Proof

For any $A \in \mathcal{A}$, $\|\mathbf{1}_A\|_p = \left(\int\mathbf{1}_A\,\mathrm{d}\mu\right)^\frac{1}{p} = \mu(A)^\frac{1}{p} < \infty$. Thus $\mathbf{1}_A\in L^p$ for all $A\in\mathcal{A}$. Since $L^p$ is a linear space, $V_0 \subset L^p$.

Next, let $V\subseteq L^p$ be the set of all $f\in L^p$ that can be approximated as above: for every $\epsilon > 0$ there is some $v\in V_0$ with $\|f - v\|_p < \epsilon$. Let $f,g\in V$ be approximated by $v_f, v_g$; then by Minkowski's Inequality,

$$\|f+g - (v_f + v_g)\|_p \leq \|f - v_f\|_p + \|g - v_g\|_p < 2\epsilon$$

Hence, VV is also a linear space.

Now, assume $\Omega\in\mathcal{A}$ (i.e. $\mu(\Omega) < \infty$). Let $\mathcal{L} = \{B \in \mathcal{F} : \mathbf{1}_B\in V\}$, which we will show is, in fact, a $\lambda$-system. We know that $\mathcal{A} \subset \mathcal{L}$ and thus $\Omega \in \mathcal{L}$. For $A, B \in\mathcal{L}$ such that $A\subseteq B$, we have $\mathbf{1}_{B\setminus A} = \mathbf{1}_B - \mathbf{1}_A \in V$ since $V$ is linear, so $B\setminus A\in\mathcal{L}$. Lastly, for $\{A_i\}_{i=1}^\infty$ pairwise disjoint with $A_i \in \mathcal{L}$, let $A = \bigcup_{i=1}^\infty A_i$ and $B_j = \bigcup_{i=1}^j A_i$. Then, $B_j \uparrow A$ and $\|\mathbf{1}_A - \mathbf{1}_{B_j}\|_p = \mu(A \setminus B_j)^\frac{1}{p} \rightarrow 0$. Therefore, $A \in \mathcal{L}$, and thus $\mathcal{L}$ is a $\lambda$-system. By the Dynkin $\pi$-$\lambda$ Theorem, $\mathcal{F}\subset\mathcal{L}$ and thus $\mathbf{1}_B\in V$ for any $B\in \mathcal{F}$. Therefore, for any non-negative $f\in L^p$, we can construct simple functions $f_n = \min\{n, 2^{-n}\lfloor 2^n f\rfloor\}$ such that $f_n\uparrow f$. Then $|f - f_n|^p\rightarrow 0$ pointwise and $|f - f_n|^p \leq |f|^p$. Hence, by the Dominated Convergence Theorem, $\|f - f_n\|_p\rightarrow 0$. Thus $f \in V$ and, by the linearity of $V$, $V = L^p$.

Lastly, for general $\Omega$, we have by assumption a sequence $A_i \uparrow \Omega$. Hence, for any $f \in L^p$, we have that $f\mathbf{1}_{A_i} \in V$ and, similarly to above, $|f - f\mathbf{1}_{A_i}|^p \rightarrow 0$ pointwise and $|f - f\mathbf{1}_{A_i}|^p \leq |f|^p$. Therefore, $\|f - f\mathbf{1}_{A_i}\|_p \rightarrow 0$ by dominated convergence. Thus, $f \in V$. $\square$

Convergence in Probability & Measure#

Convergence of Measure#

Weak Convergence of Measure

Let $S$ be a metric space and $\mathcal{S}$ be the Borel $\sigma$-field on $S$. Then, for a measure $P$ and a sequence $\{P_i\}_{i=1}^\infty$, we say that $P_i$ converges weakly to $P$, written $P_i \Rightarrow P$, if

$$\int f \,\mathrm{d}P_i \rightarrow \int f \,\mathrm{d}P$$

for all $f\in \mathscr{C}^0_B(S)$, the continuous bounded real-valued functions on $S$.

Portmanteau Theorem

For $P$ and $P_i$ on a metric space $(S, \mathcal{S})$, the following are equivalent:

  1. $P_i \Rightarrow P$
  2. $\int f\,\mathrm{d}P_i\rightarrow\int f\,\mathrm{d}P$ for all bounded uniformly continuous functions $f$
  3. $\limsup_i P_i(C) \leq P(C)$ for all closed sets $C$
  4. $\liminf_i P_i(U) \geq P(U)$ for all open sets $U$
  5. $\lim_i P_i(A) = P(A)$ for all sets $A\in\mathcal{S}$ with $P(\partial A) = 0$
Proof
  • (1) $\rightarrow$ (2): If convergence holds for every $f \in \mathscr{C}^0_B$, then it certainly holds for all bounded uniformly continuous $f$.

  • (2) $\rightarrow$ (3): For any closed $C$ and $\epsilon > 0$ there exists a $\delta > 0$ such that for $C_\delta = \{x \in S : d(x, C) < \delta\}$, we have $P(C_\delta) < P(C) + \epsilon$, since $C_\delta \downarrow C$ as $\delta\rightarrow 0^+$. Then, we can define an $f$ such that $f = 1$ on $C$ and $f = 0$ on $S \setminus C_\delta$, with $f$ uniformly continuous (by Urysohn's Lemma) and $0 \leq f \leq 1$. Then, by (2), we have that

    $$P_i(C) \leq \int f\,\mathrm{d}P_i\rightarrow\int f\,\mathrm{d}P \leq P(C_\delta) < P(C) + \epsilon$$

    Thus, taking the $\limsup$ in $i$ and letting $\epsilon \to 0$ gives $\limsup_i P_i(C) \leq P(C)$.

  • (3) $\rightarrow$ (1): Let $f \in \mathscr{C}^0_B(S)$. Our goal is to show that $\limsup_i\int f \,\mathrm{d}P_i \leq\int f \,\mathrm{d}P$, and similarly for the $\liminf$, so that (1) holds. As $f$ is bounded, we can shift and scale it, so without loss of generality we assume that $0 < f < 1$. Then, for any choice of $n \in \mathbb{N}$, we define nested closed sets $C_j = \{x\in S : f(x)\geq j/n\}$ for $j = 0, 1, \ldots, n$ (so $C_0 = S$) and cut $f$ into pieces to get

    $$\sum_{j=1}^n\frac{j-1}{n}P(C_{j-1}\setminus C_j) \leq \int f\,\mathrm{d}P \leq \sum_{j=1}^n\frac{j}{n}P(C_{j-1}\setminus C_j)$$

    Also, since $P(C_{j-1}\setminus C_j) = P(C_{j-1}) - P(C_j)$, the above becomes

    $$\frac{1}{n}\sum_{j=1}^n P(C_j) \leq \int f\,\mathrm{d}P \leq \frac{1}{n} + \frac{1}{n}\sum_{j=1}^n P(C_{j})$$

    Thus,

    $$\limsup_i \int f\,\mathrm{d}P_i \leq \frac{1}{n} + \frac{1}{n}\sum_{j=1}^n \limsup_i P_i(C_j) \leq \frac{1}{n} + \frac{1}{n}\sum_{j=1}^n P(C_j) \leq \frac{1}{n} + \int f\,\mathrm{d}P$$

    Taking $n \rightarrow \infty$ gives $\limsup_i \int f \,\mathrm{d}P_i \leq \int f \,\mathrm{d}P$. Replacing $f$ with $-f$ gives $\liminf_i \int f \,\mathrm{d}P_i \geq \int f \,\mathrm{d}P$. Thus the $\limsup$ and $\liminf$ coincide, proving that (3) $\rightarrow$ (1).

  • (3) $\Leftrightarrow$ (4): Let $U$ be the complement of $C$. Then $P(U) = 1 - P(C)$, so $\limsup_i P_i(C) \leq P(C)$ is equivalent to $\liminf_i P_i(U) \geq P(U)$.

  • (3)(4) $\Rightarrow$ (5): For any set $A\in\mathcal{S}$ with $P(\partial A) = 0$, $A^\circ$ is an open set and $\bar{A}$ is a closed set. If (3) and (4) hold, we have that

    $$P(A^\circ) \leq \liminf_i P_i(A^\circ) \leq \liminf_i P_i(A) \leq \limsup_i P_i(A) \leq \limsup_i P_i(\bar{A}) \leq P(\bar{A})$$

    Since $P(\partial A)=0$, we have that $P(A^\circ) = P(A) = P(\bar{A})$, so the chain collapses and $\lim_i P_i(A) = P(A)$, which is (5). $\square$
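To see the definition in action, here is a small Monte Carlo sketch of my own (not from the notes): $P_i = N(1/i,\, 1 + 1/i)$ converges weakly to $P = N(0, 1)$, and we check the defining property with the bounded continuous test function $f = \arctan$.

```python
import numpy as np

# Weak convergence check: integrals of a bounded continuous test
# function against P_i = N(1/i, 1 + 1/i) approach the integral
# against the limit P = N(0, 1), which is 0 by symmetry.
rng = np.random.default_rng(3)
target = np.arctan(rng.standard_normal(1_000_000)).mean()

for i in [1, 10, 100, 1000]:
    xi = rng.normal(loc=1.0 / i, scale=np.sqrt(1.0 + 1.0 / i), size=1_000_000)
    print(f"i={i}: E_i[arctan] = {np.arctan(xi).mean():+.4f}  "
          f"(target {target:+.4f})")
```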

Convergence of Random Variables#

In contrast to convergence of measures, let $(\Omega, \mathcal{F}, \mu)$ be a probability space and $(S, \mathcal{S})$ be a metric space with Borel sets as above. Then, for a random variable (i.e. measurable function) $X : \Omega \to S$, we can define a probability measure

$$P(A) := \mu(X^{-1}(A)), \quad A\in\mathcal{S}$$

This is the distribution of $X$. Then the expectation of a random variable can be written in multiple ways due to the change of variables formula:

$$E[X] = \int_\Omega X(\omega) \,\mathrm{d}\mu(\omega) = \int_S x \,\mathrm{d}P(x)$$

Note that the Portmanteau Theorem above can be rephrased for random variables as well.

Convergence in Distribution

For a sequence of random variables $\{X_i\}_{i=1}^\infty$ with distributions $P_i$ and a random variable $X$ with distribution $P$, we say that $X_i$ converges to $X$ in distribution (denoted $X_i\overset{d}{\longrightarrow}X$) if $P_i \Rightarrow P$.

Convergence in Probability

For a sequence of random variables $\{X_i\}_{i=1}^\infty$, we say that $X_i$ converges to $X$ in probability (denoted $X_i\overset{P}{\longrightarrow}X$) if for all $\epsilon > 0$,

$$\mu(\{\omega \in \Omega : d(X_i(\omega), X(\omega)) > \epsilon\}) \rightarrow 0$$

In short, $P(d(X_i, X)>\epsilon)\rightarrow 0$.

This means that the measure of the set of $\omega$ where $X_i(\omega)$ and $X(\omega)$ differ by more than $\epsilon$ goes to zero as $i \rightarrow \infty$. Convergence in probability is closely connected to the metric $d$ on $(S, \mathcal{S})$.

Convergence Almost Surely

For a sequence of random variables $\{X_i\}_{i=1}^\infty$, we say that $X_i$ converges almost surely to $X$ (denoted $X_i\overset{\text{a.s.}}{\longrightarrow}X$) if

$$\mu(\{\omega \in \Omega : X_i(\omega) \rightarrow X(\omega)\}) = 1$$

i.e. pointwise convergence almost everywhere.

Convergence in $L^p$

For a sequence of random variables $\{X_i\}_{i=1}^\infty$, we say that $X_i$ converges to $X$ in $L^p$ if

$$E[d(X_i, X)^p] = \int d(X_i(\omega), X(\omega))^p\,\mathrm{d}\mu(\omega) \rightarrow 0$$

Here, we can think of $d(X_i(\omega), X(\omega))$ as a function from $\Omega$ to $\mathbb{R}^+$. In the case of real-valued random variables, i.e. $S = \mathbb{R}$, this is

$$\int |X_i - X|^p\,\mathrm{d}\mu \rightarrow 0$$

Hierarchy of Convergence Types#

  • Conv. a.s. $\Rightarrow$ conv. in probability (the converse fails; see the sketch below this list)
  • Conv. in probability $\Rightarrow$ conv. in distribution
  • For $1\leq p < q \leq \infty$, conv. in $L^q$ $\Rightarrow$ conv. in $L^p$
  • For any $p\in [1, \infty]$, conv. in $L^p$ $\Rightarrow$ conv. in probability
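The classic counterexample separating these modes is the "typewriter" sequence on $([0,1], \mathcal{B}, \lambda)$: indicators of intervals of length $2^{-k}$ that sweep across $[0,1]$. It converges to $0$ in probability and in every $L^p$, but at every $\omega$ it takes the value $1$ infinitely often, so it does not converge a.s. A small sketch of my own:

```python
import numpy as np

# Typewriter sequence: for i = 2^k + j with 0 <= j < 2^k, X_i is the
# indicator of [j/2^k, (j+1)/2^k). Then P(X_i != 0) = 2^{-k} -> 0
# (convergence in probability and L^p), yet each fixed w lies in one
# interval of every block, so X_i(w) = 1 infinitely often (no a.s. limit).
def typewriter(i, w):
    k = int(np.floor(np.log2(i)))   # block index: i = 2^k + j
    j = i - 2**k                    # position within block
    return float(j / 2**k <= w < (j + 1) / 2**k)

w = 0.3141  # a fixed sample point
vals = [typewriter(i, w) for i in range(1, 2**12)]
print("P(X_i != 0) shrinks like 2^{-k}, yet for w =", w,
      "the last index <= 4095 with X_i(w) = 1 is",
      max(i + 1 for i, v in enumerate(vals) if v))
```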

Borel-Cantelli Lemmas#

Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. For $\{A_i\}_{i=1}^\infty$ with $A_i\in \mathcal{F}$, we define

$$\limsup_i A_i = \bigcap_{i=1}^\infty\bigcup_{j > i} A_j \quad\text{and}\quad \liminf_i A_i = \bigcup_{i=1}^\infty\bigcap_{j > i} A_j$$

The set $\limsup_i A_i$ is sometimes referred to as "$A_i$ infinitely often" or "$A_i$ i.o." This is because $\omega \in \limsup_i A_i$ implies that for any $N \in \mathbb{N}$ there exists an $n > N$ such that $\omega\in A_n$. Similarly, some write "$A_i$ eventually" or "$A_i$ ev." for $\liminf_i A_i$. This is because for $\omega \in \liminf_i A_i$, there exists an $N$ large enough such that $\omega \in A_n$ for all $n \geq N$.

1st Borel-Cantelli Lemma

Let $\{A_i\}_{i=1}^\infty$ with $A_i \in \mathcal{F}$. If $\sum_{i=1}^\infty \mu(A_i) < \infty$, then $\mu(\limsup_i A_i) = 0$.

Proof

As the summation converges, the tail sums must tend to zero, so we have simply that

$$\mu\left(\limsup_i A_i\right) = \mu\left(\bigcap_{i=1}^\infty\bigcup_{j > i} A_j\right) \leq \mu\left(\bigcup_{j > i} A_j\right) \leq \sum_{j > i} \mu(A_j) \rightarrow 0$$

as $i\rightarrow \infty$. $\square$

2nd Borel-Cantelli Lemma

Let $\{A_i\}_{i=1}^\infty$ be an independent collection with $A_i \in \mathcal{F}$. If $\sum_{i=1}^\infty \mu(A_i) = \infty$, then $\mu(\limsup_i A_i) = 1$.

Proof

Note that $1 - t \leq e^{-t}$ for all $t \in \mathbb{R}$, and the independence of $\{A_i\}_{i=1}^\infty$ implies the independence of $\{A^c_i\}_{i=1}^\infty$. Therefore, for any $i \in \mathbb{N}$ and $k \geq i$,

$$\mu\left(\bigcap_{j=i}^k A^c_j\right) = \prod_{j=i}^k (1 - \mu(A_j)) \leq \exp\left(-\sum_{j=i}^k \mu(A_j)\right)$$

Taking $k \rightarrow \infty$ sends the right hand side to zero. Hence, $\mu\left(\bigcap_{j>i} A^c_j\right) = 0$ for all $i$. Thus,

$$\mu\left(\limsup_i A_i\right) = \mu\left(\bigcap_{i=1}^\infty\bigcup_{j > i} A_j\right) = 1 - \mu\left(\bigcup_{i=1}^\infty \bigcap_{j > i} A^c_j\right) = 1$$

which is the desired result. $\square$
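Both lemmas are easy to watch along a single simulated sample path; the sketch below (my own, not from the notes) uses independent events with $P(A_i) = i^{-2}$ (summable, so only finitely many occur a.s.) versus $P(A_i) = i^{-1}$ (divergent, so the events keep recurring).

```python
import numpy as np

# One sample path of independent events A_i with P(A_i) = 1/i^2
# (Borel-Cantelli I: finitely many occur) versus P(A_i) = 1/i
# (Borel-Cantelli II: infinitely many occur).
rng = np.random.default_rng(4)
n = 100_000
i = np.arange(1, n + 1)

occ_sq = rng.random(n) < 1.0 / i**2
occ_lin = rng.random(n) < 1.0 / i

print("p_i = 1/i^2: occurrences =", occ_sq.sum(),
      ", last at i =", i[occ_sq][-1])
print("p_i = 1/i  : occurrences =", occ_lin.sum(),
      ", last at i =", i[occ_lin][-1])
```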

Law of Large Numbers#

Let $\{X_i\}_{i=1}^\infty$ be random variables from $(\Omega, \mathcal{F}, P)$ to $(\mathbb{R}, \mathcal{B})$. Hence, for any $A \in \mathcal{B}$, we write

$$P(X \in A) := P(\{\omega \in \Omega : X(\omega) \in A\})$$

and $E[X] = \int X(\omega)\,\mathrm{d}P$. Furthermore, we define the partial sum $S_n = \sum^n_{i=1} X_i$, which is also a measurable random variable.

Independence

For random variables $X$ and $Y$ on the same probability space $(\Omega, \mathcal{F}, P)$ but possibly with different codomains, $(\mathbb{X}, \mathcal{X})$ and $(\mathbb{Y}, \mathcal{Y})$ respectively, we say that $X$ and $Y$ are independent if

$$P(\{X \in A\}\cap \{Y \in B\}) = P(X \in A)P(Y \in B)$$

for all $A \in \mathcal{X}$ and $B \in \mathcal{Y}$.

This definition extends to a finite collection of random variables $\{X_i\}^n_{i=1}$ by requiring

$$P\left(\bigcap_{i=1}^n \{X_i \in A_i\}\right) = \prod^n_{i=1} P(X_i \in A_i)$$

for all AiA_i. We say that an infinite collection of random variables is independent if all finite collections are independent.

Note that since $\{X \in A\}$ is shorthand for $\{\omega \in \Omega : X(\omega) \in A\} = X^{-1}(A)$, random variables $X$ and $Y$ are independent if and only if the $\sigma$-fields $\sigma(X)$ and $\sigma(Y)$ are independent (as defined previously).

Identically Distributed

For $X : \Omega \to \mathbb{R}$, the distribution of $X$ is the measure induced by $X$ on $\mathbb{R}$, i.e., $P\circ X^{-1}(A)$ for $A\in \mathcal{B}$. We say that $X$ and $Y$ are identically distributed if the measures $P \circ X^{-1}$ and $P \circ Y^{-1}$ coincide.

Weak Law of Large Numbers#

Weak Law of Large Numbers

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{X_i\}_{i=1}^\infty$ be random variables (measurable functions) from $\Omega$ to $\mathbb{R}$ such that $E[X_i] = c \in \mathbb{R}$ and $E[(X_i - c)^2]=1$ for all $i$, and $E[(X_i - c)(X_j - c)] = 0$ for all $i \neq j$. Then $\frac{S_n}{n} \overset{P}{\longrightarrow} c$.

Proof

Without loss of generality, we assume $c = 0$; otherwise, we can replace $X_i$ with $X_i - c$. Then, for any $t > 0$, Chebyshev's inequality implies that

$$P\left(\frac{|S_n|}{n} \geq t\right) \leq \frac{E[S_n^2]}{n^2t^2} = \frac{1}{n^2t^2}E\left[\left(\sum^n_{i=1} X_i\right)^2\right] = \frac{1}{n^2t^2}E\left[\sum^n_{i=1} X_i^2\right] = \frac{1}{nt^2} \to 0$$

as $n \to \infty$. $\square$

Note that in the above proof, we only required that the $X_i$ be uncorrelated (i.e. $E[(X_i - c)(X_j - c)] = 0$), not independent. In the next theorem, we require independence but remove all second moment conditions.
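Here is a small Monte Carlo sketch of my own (not from the notes) showing the weak law together with the Chebyshev bound $\operatorname{Var}(X)/(n t^2)$ from the proof, for i.i.d. $\mathrm{Uniform}(0,1)$ (so $c = 1/2$ and $\operatorname{Var}(X) = 1/12$):

```python
import numpy as np

# Weak law of large numbers: P(|S_n/n - 1/2| >= t) versus the
# Chebyshev bound Var(X) / (n t^2) for Uniform(0,1) samples.
rng = np.random.default_rng(5)
t, reps = 0.02, 1_000

for n in [100, 1_000, 10_000]:
    means = rng.random((reps, n)).mean(axis=1)   # reps copies of S_n/n
    prob = np.mean(np.abs(means - 0.5) >= t)
    bound = (1.0 / 12.0) / (n * t**2)
    print(f"n={n}: P(|S_n/n - 1/2| >= {t}) = {prob:.4f} <= bound {bound:.4f}")
```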

Strong Law of Large Numbers#

Strong Law of Large Numbers

Let $\{X_i\}_{i=1}^\infty$ be i.i.d. random variables from $\Omega$ to $\mathbb{R}$.

  1. If $E[|X_i|] = \infty$, then $\frac{S_n}{n}$ does not converge to any finite value.
  2. If $E[|X_i|] < \infty$, then $\frac{S_n}{n} \overset{\text{a.s.}}{\longrightarrow} c$ for $c = E[X_i]$.
Proof
  1. Assume that $\frac{S_n}{n} \longrightarrow c \in \mathbb{R}$ but also that $E[|X_i|] = \infty$, and note that $\frac{X_n}{n} = \frac{S_n - S_{n-1}}{n} \to 0$ along any such path. Since $E[|X_i|] = \infty$, we have $\sum_{n=0}^\infty P(|X_i| > n) = \infty$, and the 2nd Borel-Cantelli Lemma says that $|X_n| > n$ for infinitely many $n$, almost surely. Thus

    $$P\left(\left\{\omega\in\Omega : \frac{S_n - S_{n-1}}{n}\to 0\right\}\right) = 0$$

    Thus, almost surely, $\frac{S_n}{n}\nrightarrow c$ for any $c \in \mathbb{R}$.

  2. Assume that $E[X_i] = c \in \mathbb{R}$. Without loss of generality, we assume $X_i \geq 0$ for all $i$; otherwise, we can write $X = X^+ - X^-$, and independence of $X$ and $Y$ implies independence of $X^+$ and $Y^+$. Also, we use $F$ to denote the distribution of $X$, i.e. $F(x) = P(X \leq x)$.

    We define $Y_i = X_i\mathbf{1}_{X_i\leq i}$ and $T_n = \sum_{i=1}^n Y_i$. For any $\delta > 1$, we can define a non-decreasing integer sequence $k_n = \lfloor \delta^n \rfloor$. Then $1\leq k_n\leq \delta^n < k_n + 1 \leq 2k_n$ and $k_n^{-2}\leq 4\delta^{-2n}$. Therefore,

    $$\sum_{n=1}^\infty k_n^{-2}\mathbf{1}_{k_n \geq i}\leq 4\sum_{n = 1}^\infty \delta^{-2n}\mathbf{1}_{\delta^n\geq i} \leq \frac{4}{i^2(1 - \delta^{-2})}\leq c_0 i^{-2} \tag{1}$$

    for some constant $c_0$. We also note that $\sum_{i=k+1}^\infty i^{-2} < \int_k^\infty x^{-2}\,\mathrm{d}x = \frac{1}{k}$. By Chebyshev's inequality, for all $t > 0$ there exists a $c_1$ depending on $t$ and $\delta$ such that

    $$\begin{aligned} \sum_{n=1}^\infty P(|T_{k_n} - E[T_{k_n}]| \geq tk_n) &\leq c_1\sum_{n=1}^\infty k_n^{-2}\operatorname{Var}(T_{k_n}) = c_1\sum_{n=1}^\infty k_n^{-2}\sum_{i=1}^{k_n}\operatorname{Var}(Y_i) \\ &= c_1\sum_{i=1}^\infty \operatorname{Var}(Y_i)\sum_{n : k_n\geq i}k_n^{-2} \leq c_2\sum_{i=1}^\infty i^{-2}\operatorname{Var}(Y_i) \quad\text{by (1)}\\ &\leq c_2\sum_{i=1}^\infty i^{-2}\int_0^i x^2\,\mathrm{d}F(x) = c_2\sum_{i=1}^\infty i^{-2}\sum_{k=0}^{i-1}\int_k^{k+1}x^2\,\mathrm{d}F(x) \\ &\leq c_3\sum_{k=0}^\infty \frac{1}{k+1}\int_k^{k+1}x^2\,\mathrm{d}F(x) \leq c_3\sum_{k=0}^\infty \int_k^{k+1}x\,\mathrm{d}F(x) = c_3E[X_i] < \infty \end{aligned}$$

    And thus $\sum_{n=1}^\infty P(|T_{k_n} - E[T_{k_n}]| \geq tk_n) < \infty$. Hence, by the 1st Borel-Cantelli Lemma, $k_n^{-1}(T_{k_n} - E[T_{k_n}])\overset{\text{a.s.}}\longrightarrow 0$. Since $E[Y_n]\uparrow E[X_i]$, we have that $k_n^{-1}E[T_{k_n}]\to E[X_i]$ and in turn that $k_n^{-1}T_{k_n}\overset{\text{a.s.}}\longrightarrow E[X_i]$.

    To get back to $X_i$ and $S_n$, we note that $E[X] <\infty$ if and only if $\sum_{i=0}^\infty P(X > i) < \infty$. Thus $\sum_{i=1}^\infty P(X_i \neq Y_i) = \sum_{i=1}^\infty P(X_i > i) < \infty$, and the 1st Borel-Cantelli Lemma says that $P(\limsup\{X_i \neq Y_i\}) = 0$, so for $i$ large enough, $X_i = Y_i$ a.s. We define "large enough" to be $i > m(\omega)$ (i.e. $X_i(\omega) = Y_i(\omega)$ for all $i > m(\omega)$). Furthermore, $k_n^{-1} S_{m(\omega)} \to 0$ and $k_n^{-1}T_{m(\omega)} \to 0$ as $n \to \infty$, meaning that the contribution of the terms where $X_i$ and $Y_i$ may not coincide becomes negligible. Hence, $k_n^{-1}S_{k_n}\overset{\text{a.s.}}\longrightarrow E[X_i]$, so we have almost sure convergence along a subsequence.

    Finally, since $k_{n+1}/k_n \to \delta$, there exists an $n$ large enough such that $1 \leq k_{n+1}/k_n < \delta^2$. Thus, for $k_n < i < k_{n+1}$,

    $$k_n^{-1}S_{k_n}\leq \delta^2\frac{S_i}{i}\leq \delta^4 k_{n+1}^{-1}S_{k_{n+1}}$$
    and hence, letting $i \to \infty$,
    $$\delta^{-2}E[X_i] \leq \liminf_{i\to\infty}\frac{S_i}{i} \leq \limsup_{i\to\infty}\frac{S_i}{i} \leq \delta^2 E[X_i]$$

    Thus, taking $\delta\to 1$ concludes the proof. $\square$
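Both cases of the theorem are visible along a single sample path. A sketch of my own (not from the notes): $\mathrm{Uniform}(0,1)$ has $E|X| < \infty$, so $S_n/n$ settles at $1/2$, while the standard Cauchy has $E|X| = \infty$ and $S_n/n$ never settles.

```python
import numpy as np

# Strong law vs. its failure: running means of Uniform(0,1) converge
# to 1/2; running means of standard Cauchy samples keep wandering.
rng = np.random.default_rng(6)
n = 10**6
unif = rng.random(n)
cauchy = rng.standard_cauchy(n)

for m in [10**3, 10**4, 10**5, 10**6]:
    print(f"n={m:>7}: uniform S_n/n = {unif[:m].mean():.4f}, "
          f"cauchy S_n/n = {cauchy[:m].mean():+.3f}")
```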

Central Limit Theorem#

Gaussian Measure on $\mathbb{R}$

A Borel measure $\gamma$ on $(\mathbb{R}, \mathcal{B})$ is said to be Gaussian with mean $m$ and variance $\sigma^2$ if

$$\gamma((a, b]) = \frac{1}{\sigma\sqrt{2\pi}}\int_a^b e^{-(x-m)^2/2\sigma^2}\,\mathrm{d}\lambda(x)$$
Gaussian Measure on $\mathbb{R}^d$

A Borel measure $\gamma$ on $(\mathbb{R}^d, \mathcal{B})$ is said to be Gaussian if for all linear functionals $f : \mathbb{R}^d \rightarrow \mathbb{R}$, the induced measure $\gamma \circ f^{-1}$ on $(\mathbb{R}, \mathcal{B})$ is Gaussian.

Gaussian Random Variable

A random variable $Z$ from a probability space $(\Omega, \mathcal{F}, \mu)$ to $(\mathbb{R}^d, \mathcal{B})$ is said to be Gaussian if $\gamma := \mu \circ Z^{-1}$ is a Gaussian measure on $(\mathbb{R}^d, \mathcal{B})$.

Characteristic Function

For a probability measure $\mu$ on $(\mathbb{R}^d, \mathcal{B})$, the characteristic function (Fourier transform) $\tilde{\mu} : \mathbb{R}^d \rightarrow \mathbb{C}$ is defined as

$$\tilde{\mu}(t) := \int \exp\{i\langle x, t\rangle\}\,\mathrm{d}\mu(x)$$

We can also invert the above transformation. That is, if $\tilde{\mu}$ is integrable with respect to Lebesgue measure on $\mathbb{R}^d$, then $\mu$ has a density $p$ with

$$p(x) = (2\pi)^{-d}\int \tilde{\mu}(t)\exp\{-i\langle x, t\rangle\}\,\mathrm{d}\lambda(t) \quad \lambda\text{-a.e.}$$
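As a quick numerical aside of my own (not from the notes): the empirical characteristic function $\frac{1}{n}\sum_j e^{itX_j}$ of $N(0,1)$ samples should approach the known Gaussian transform $e^{-t^2/2}$.

```python
import numpy as np

# Empirical characteristic function of N(0,1) samples versus the
# exact transform exp(-t^2 / 2); the imaginary part should vanish.
rng = np.random.default_rng(7)
x = rng.standard_normal(200_000)

for t in [0.5, 1.0, 2.0]:
    emp = np.exp(1j * t * x).mean()
    print(f"t={t}: empirical {emp.real:+.4f}{emp.imag:+.4f}i, "
          f"exact {np.exp(-t**2 / 2):+.4f}")
```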
Convolution

For two measures $\mu$ and $\nu$ on $(\mathbb{R}^d, \mathcal{B})$, the convolution measure is defined as

$$(\mu * \nu)(B) := \int \nu(B - x)\,\mathrm{d}\mu(x)$$

for all $B \in \mathcal{B}$, where $B - x = \{y \in \mathbb{R}^d : y + x \in B\}$.

Note that the convolution operator $*$ is commutative and associative. Also, for two independent random variables $X$ and $Y$ with corresponding measures $\mu$ and $\nu$, the measure of $X + Y$ is $\mu * \nu$.

Uniqueness of Characteristic Function Theorem

Let $\mu$ and $\nu$ be probability measures on $(\mathbb{R}^d, \mathcal{B})$. If $\tilde{\mu} = \tilde{\nu}$, then $\mu = \nu$.

Proof

Let $\gamma_\sigma$ be a mean zero Gaussian measure on $\mathbb{R}^d$ with covariance $\sigma^2 I$. We denote $\mu^{(\sigma)} = \mu * \gamma_\sigma$ and similarly for $\nu^{(\sigma)}$. It can be shown that the corresponding density functions for $\mu^{(\sigma)}$ and $\nu^{(\sigma)}$ are

$$p^{(\sigma)}(x) = \frac{1}{(2\pi)^d}\int \tilde{\mu}(t)\exp\left\{-i\langle x, t\rangle - \frac{1}{2}\sigma^2|t|^2\right\}\mathrm{d}\lambda(t)$$
$$q^{(\sigma)}(x) = \frac{1}{(2\pi)^d}\int \tilde{\nu}(t)\exp\left\{-i\langle x, t\rangle - \frac{1}{2}\sigma^2|t|^2\right\}\mathrm{d}\lambda(t)$$

Since $\tilde{\mu} = \tilde{\nu}$, we have that $p^{(\sigma)} = q^{(\sigma)}$, and hence $\mu^{(\sigma)} = \nu^{(\sigma)}$, for all $\sigma > 0$.

Let $X$ be a random variable corresponding to $\mu$ and $Z$ to $\gamma_1$, independent of $X$. Then, the measure $\mu^{(\sigma)}$ is paired with the random variable $X + \sigma Z$. Thus $X+\sigma Z\overset{\text{a.s.}}{\longrightarrow}X$ as $\sigma\downarrow 0$, that is, pointwise for almost all $\omega$. Thus, this convergence holds in probability and thus in distribution, i.e. $\mu^{(\sigma)}\Rightarrow\mu$ as $\sigma\downarrow 0$.

Lastly, we have that $\mu^{(\sigma)} \Rightarrow \mu$ and $\nu^{(\sigma)} \Rightarrow \nu$ with $\mu^{(\sigma)} = \nu^{(\sigma)}$. Since the weak limit is unique, $\mu = \nu$. $\square$

Central Limit Theorem

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{X_i\}_{i=1}^\infty$ be i.i.d. random variables taking values in $(\mathbb{R}^d, \mathcal{B})$ such that $E[X_i] = 0$ and $E[|X_i|^2] < \infty$. Let $S_n = \sum_{j=1}^n X_j$. Then, $n^{-1/2}S_n\overset{d}{\longrightarrow}Z$ where $Z$ is a Gaussian random variable with zero mean and covariance $\Sigma$ with $jk$th entry $\Sigma_{jk} = E[X_{ij}X_{ik}]$.

Proof

As the random vectors $X_i$ are mean zero and independent, $E[\langle X_j, X_k \rangle] = 0$ for $j\neq k$. In turn, for any $n$,

$$E\left[|n^{-1/2}S_n|^2\right] = n^{-1}E\left[\sum_{j,k=1}^n\langle X_j, X_k \rangle\right] = E\left[|X_j|^2\right]$$

For any $\epsilon > 0$, there exists an $M_\epsilon > 0$ such that $E[|X_j|^2]/M_\epsilon^2 < \epsilon$. Thus, from Chebyshev's inequality, we have that $P(|n^{-1/2}S_n| > M_\epsilon) < \epsilon$. This implies that the sequence $n^{-1/2}S_n$ is "uniformly tight".

For a vector $v \in \mathbb{R}^d$, the random variables $\langle X_j, v \rangle$ are i.i.d. real-valued with $E[\langle X_j, v\rangle] = 0$ and $E[\langle X_j, v\rangle^2] < \infty$. Let $h(v) := E[\exp(i\langle X_j, v\rangle)]$ be the characteristic function of $X_j$. Then, $h(0) = 1$, $\nabla h(0)=0$, and $\nabla^2 h(0) = -\Sigma$. Thus, by Taylor's Theorem, we have

$$h(v) = 1 - \frac{1}{2}v^\mathrm{T}\Sigma v + o(\|v\|_2^2)$$

Thus, for any fixed vector $v$,

$$E\left[\exp\left\{i\langle n^{-1/2}S_n, v\rangle \right\}\right] = h(n^{-1/2}v)^n = \left[1 - \frac{v^\mathrm{T}\Sigma v}{2n} + o\left(\frac{\|v\|_2^2}{n}\right)\right]^n \to \exp\left\{-\frac{1}{2}v^\mathrm{T}\Sigma v\right\}$$

as $n\to\infty$. Thus, by uniform tightness and the Uniqueness of Characteristic Function Theorem, we have that $n^{-1/2}S_n\overset{d}{\longrightarrow}Z$ where $Z$ is a Gaussian random variable with zero mean and covariance $\Sigma$. $\square$
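A Monte Carlo sketch of my own (not from the notes): $X_i = \mathrm{Exp}(1) - 1$ has mean $0$ and variance $1$, so the distribution function of $n^{-1/2}S_n$ should match the Gaussian CDF $\Phi$.

```python
import numpy as np
from math import erf, sqrt

# CLT check: compare the empirical CDF of n^{-1/2} S_n with Phi(z)
# for centered exponential increments (mean 0, variance 1).
rng = np.random.default_rng(8)
n, reps = 1_000, 20_000
s = (rng.exponential(size=(reps, n)) - 1.0).sum(axis=1) / np.sqrt(n)

for z in [-1.0, 0.0, 1.0, 2.0]:
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))      # Gaussian CDF
    print(f"z={z:+.1f}: P(n^(-1/2) S_n <= z) = {np.mean(s <= z):.4f}, "
          f"Phi(z) = {phi:.4f}")
```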

Ergodic Theorem#

Measure Preserving Map

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. A mapping $T : \Omega \to \Omega$ is called measure preserving if

$$\mu(T^{-1}(A)) = \mu(A) \quad \forall A\in \mathcal{F}$$
Invariant Set and Function

For a mapping $T : \Omega\to\Omega$,

  • A set $A \in \mathcal{F}$ is $T$-invariant if $T^{-1}(A) = A$. The set of all $T$-invariant sets forms a $\sigma$-field $\mathcal{F}_T$.

  • A measurable function $f : \Omega\to\mathbb{R}$ is $T$-invariant if $f = f \circ T$. $f$ is $T$-invariant if and only if $f$ is $\mathcal{F}_T$-measurable, i.e. $\forall B \in\mathcal{B}(\mathbb{R})$, $f^{-1}(B)\in \mathcal{F}_T$.

Ergodic Map

A mapping $T$ is said to be ergodic if for any $A \in \mathcal{F}_T$,

$$\mu(A) = 0 \quad\text{or}\quad \mu(A^c) = 0$$

Example

For Lebesgue measure on $(0, 1]$, two examples of measure preserving maps are the shift map

$$T(x) = x + a \mod 1$$

and Baker’s Map

$$T(x) = 2x - \lfloor 2x \rfloor$$

Furthermore, it can be shown that

  • If $f$ is integrable and $T$ is measure preserving, then $f \circ T$ is integrable and
$$\int f\,\mathrm{d}\mu = \int f\circ T\,\mathrm{d}\mu$$
  • If $T$ is ergodic and $f$ is invariant, then $f = c$ $\mu$-a.e. for some constant $c$ (a numerical illustration follows this list).
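Anticipating Birkhoff's theorem below, here is a sketch of my own (not from the notes) computing time averages along an orbit of the shift map $T(x) = x + a \bmod 1$ with irrational $a$, which is measure preserving and ergodic for Lebesgue measure; the time average should match the space average $\int f\,\mathrm{d}\lambda$. (The doubling map is a poor numerical test: in binary floating point, $2x \bmod 1$ shifts mantissa bits out, so its orbits collapse to $0$ after roughly 53 steps.)

```python
import numpy as np

# Time averages along an orbit of the irrational rotation
# T(x) = x + a mod 1 should match space averages (Lebesgue integrals).
a = (np.sqrt(5.0) - 1.0) / 2.0               # irrational rotation number
n = 1_000_000
orbit = (0.1234 + a * np.arange(n)) % 1.0    # T^k(x) = x + k a mod 1

def f(x):
    return np.cos(2 * np.pi * x)

print("time average of cos(2 pi x):", f(orbit).mean())  # integral = 0
print("time average of x:          ", orbit.mean())     # integral = 1/2
```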

Birkhoff and von Neumann’s Theorems#

In what follows, we let $(\Omega, \mathcal{F}, \mu)$ be a measure space, $T$ a measure preserving transformation, $f : \Omega \rightarrow\mathbb{R}$ a measurable function, and

$$S_n = S_n(f) = f + f \circ T + \cdots + f \circ T^{n-1}$$

where $S_0 = 0$.

Maximal Ergodic Lemma

Let $f$ be integrable and $S^* = \sup_{n\geq 0} S_n(f)$ (pointwise supremum). Then,

$$\int_{\{S^* > 0\}} f \,\mathrm{d}\mu \geq 0$$
Proof

Let $S^*_n = \max_{0\leq m\leq n} S_m(f)$ and $A_n = \{\omega \in \Omega : S^*_n(\omega) > 0\}$. Then, for $1 \leq m \leq n$,

$$S_m = f + S_{m - 1}\circ T \leq f + S^*_n \circ T$$

Furthermore, on the set $A_n$, the maximum is attained at some $m\geq 1$, so

$$S_n^* = \max_{1\leq m\leq n} S_m(f) \leq f + S^*_n \circ T$$

On the set $A_n^c$, $S^*_n = 0$ since $S_0=0$, and we have that $S^*_n = 0 \leq S^*_n \circ T$. Thus, integrating both sides of the above gives

$$\int_{\Omega} S^*_n \,\mathrm{d}\mu \leq \int_{A_n} f \,\mathrm{d}\mu + \int_{A_n} S^*_n \circ T \,\mathrm{d}\mu$$

Since $S^*_n$ is integrable and $T$ is measure preserving, $\int S^*_n\,\mathrm{d}\mu = \int S^*_n \circ T \,\mathrm{d}\mu < \infty$, while $S^*_n \circ T \geq 0$ gives $\int_{A_n} S^*_n \circ T \,\mathrm{d}\mu \leq \int S^*_n\,\mathrm{d}\mu$. Thus, we have $\int_{A_n} f\,\mathrm{d}\mu \geq 0$. As $n \rightarrow \infty$, $A_n\uparrow \{S^* > 0\}$, so we have that

$$\int_{\{S^* > 0\}} f \,\mathrm{d}\mu = \lim_{n \rightarrow \infty}\int_{A_n} f \,\mathrm{d}\mu \geq 0$$

due to the Dominated Convergence Theorem with $|f|$ as the dominating function. $\square$

Birkhoff’s Ergodic Theorem

Let $(\Omega, \mathcal{F}, \mu)$ be a $\sigma$-finite measure space and $f \in L^1(\Omega, \mathcal{F}, \mu)$. Then, there exists a $T$-invariant $\bar{f} \in L^1(\Omega, \mathcal{F}, \mu)$ such that

$$\int |\bar{f}| \,\mathrm{d}\mu \leq \int |f| \,\mathrm{d}\mu$$

and $n^{-1}S_n(f) \rightarrow \bar{f}$ $\mu$-a.e. as $n \rightarrow \infty$.

Proof

Both $\liminf_{n\rightarrow \infty}n^{-1}S_n(f)$ and $\limsup_{n\rightarrow \infty} n^{-1}S_n(f)$ are $T$-invariant. Indeed, $n^{-1}S_n(f)\circ T = n^{-1}[S_{n + 1}(f) - f] = \frac{n+1}{n}(n + 1)^{-1}S_{n + 1}(f) - n^{-1}f$, and the last term vanishes in the limit. Thus, for $a < b$, we can define a set

$$D_{a, b} = \left\{\omega\in\Omega : \liminf_{n\rightarrow\infty}\frac{S_n(f)(\omega)}{n} < a < b < \limsup_{n\rightarrow\infty}\frac{S_n(f)(\omega)}{n}\right\}$$

on which the $\liminf$ and $\limsup$ are separated; this set $D_{a,b}$ is $T$-invariant. The goal of the proof is to show that $\mu(D_{a,b}) = 0$. Without loss of generality, we take $b > 0$; otherwise, $a < 0$ and we multiply everything by $-1$.

For some $B \in \mathcal{F}$ with $B\subseteq D_{a,b}$ and $\mu(B) < \infty$, we set $g = f - b\mathbf{1}_B$. The function $g$ is integrable, and for each $x \in D_{a,b}$ there is an $n$ such that $S_n(g)(x) \geq S_n(f)(x) - nb \geq 0$, since $b < \limsup_{n\rightarrow \infty} n^{-1}S_n(f)$. Thus, $S^*(g) > 0$ on $D_{a,b}$, and the Maximal Ergodic Lemma (applied on the $T$-invariant set $D_{a,b}$) says that

$$0\leq\int_{D_{a,b}} (f - b\mathbf{1}_B)\,\mathrm{d}\mu = \int_{D_{a,b}} f\,\mathrm{d}\mu - b\mu(B)$$

As $\mu$ is $\sigma$-finite, there exists a sequence of sets $B_n \in \mathcal{F}$ such that $B_n \uparrow D_{a,b}$ and $\mu(B_n) < \infty$ for all $n$. Thus,

$$b\mu(D_{a, b}) = \lim_{n\rightarrow\infty}b\mu(B_n) \leq \int_{D_{a, b}} f\,\mathrm{d}\mu$$

This implies that $\mu(D_{a,b}) < \infty$. Redoing the above argument with $-a$ and $-f$ results in $-a\mu(D_{a,b})\leq \int_{D_{a,b}}(-f)\,\mathrm{d}\mu$, i.e. $\int_{D_{a,b}} f\,\mathrm{d}\mu \leq a\mu(D_{a,b})$. Therefore,

$$b\mu(D_{a,b}) \leq \int_{D_{a,b}} f\,\mathrm{d}\mu \leq a\mu(D_{a,b})$$

and since $a < b$, we must have $\mu(D_{a,b}) = 0$.

Next, let

$$E = \left\{\omega\in\Omega : \liminf_{n\rightarrow\infty}n^{-1}S_n(f) < \limsup_{n\rightarrow \infty}n^{-1}S_n(f)\right\}$$

Then, $E$ is $T$-invariant, as the $\liminf$ and $\limsup$ are. Furthermore, $E = \bigcup\{D_{a,b} : a, b\in\mathbb{Q},\ a<b\}$, a countable union of null sets. Thus, $\mu(E) = 0$.

This means that $n^{-1}S_n(f)$ converges in $[-\infty, \infty]$ on $E^c$. Therefore, we define

$$\bar{f} = \begin{cases} \lim_{n\rightarrow\infty}n^{-1}S_n(f) & \omega\in E^c \\ 0 & \omega\in E \end{cases}$$

Lastly, $\int f\circ T^n \,\mathrm{d}\mu = \int f \,\mathrm{d}\mu$ (as $T$ is measure preserving), and thus $\int |S_n(f)|\,\mathrm{d}\mu \leq n \int |f|\,\mathrm{d}\mu$ for all $n$. Applying Fatou's Lemma gives

$$\int |\bar{f}|\,\mathrm{d}\mu = \int \lim_{n\to\infty}|n^{-1}S_n(f)|\,\mathrm{d}\mu \leq \liminf_{n\rightarrow\infty}\int |n^{-1}S_n(f)|\,\mathrm{d}\mu \leq \int |f|\,\mathrm{d}\mu$$

finishing the proof. $\square$

Von Neumann’s Ergodic Theorem

Let $\mu(\Omega) < \infty$ and $p \in [1, \infty)$. Then, for all $f \in L^p(\Omega, \mathcal{F}, \mu)$, there exists an $\bar{f} \in L^p$ such that $n^{-1}S_n(f) \overset{L^p}{\longrightarrow} \bar{f}$ and

$$\int \bar{f} \,\mathrm{d}\mu = \int f \,\mathrm{d}\mu$$

Proof

We begin by noting that

$$\|f\circ T^n\|^p_p = \int |f|^p\circ T^n\,\mathrm{d}\mu = \|f\|^p_p$$

By the above and Minkowski's Inequality, $\|n^{-1}S_n(f)\|_p \leq \|f\|_p$. Since $f\in L^p$, given an $\epsilon > 0$, we can choose a $C > 0$ such that $\|f - g\|_p \leq \epsilon/3$ with

$$g(x) = \begin{cases} C & f(x) > C \\ f(x) & -C \leq f(x) \leq C \\ -C & f(x) < -C \end{cases}$$

i.e. $g$ is $f$ truncated above and below at $C$ and $-C$. By the Birkhoff Ergodic Theorem, $n^{-1}S_n(g)\to \bar{g}$ $\mu$-a.e.

Next, we note that $|n^{-1}S_n(g)| \leq C$ for all $n$, and thus by the Dominated Convergence Theorem (recall $\mu(\Omega) < \infty$), there exists an $N$ such that for all $n > N$,

$$\|n^{-1}S_n(g) - \bar{g}\|_p \leq \frac{\epsilon}{3}$$

Applying Fatou’s Lemma gives that

$$\|\bar{f} - \bar{g}\|_p^p = \int \liminf_{n\to\infty} |n^{-1}S_n(f - g)|^p\,\mathrm{d}\mu \leq \liminf_{n\rightarrow\infty}\int |n^{-1}S_n(f - g)|^p\,\mathrm{d}\mu \leq \|f - g\|^p_p$$

Thus, for n>Nn > N,

$$\|n^{-1}S_n(f) - \bar{f}\|_p \leq \|n^{-1}S_n(f - g)\|_p + \|n^{-1}S_n(g) - \bar{g}\|_p + \|\bar{g} - \bar{f}\|_p <\epsilon$$

Since $n^{-1}S_n(f) \overset{L^p}{\longrightarrow} \bar{f}$ and $\mu(\Omega) < \infty$, the convergence also holds in $L^1$. Thus, we have

$$\int \bar{f}\,\mathrm{d}\mu = \int \lim_{n\to\infty}n^{-1}S_n(f)\,\mathrm{d}\mu = \lim_{n\to\infty}\int n^{-1}S_n(f)\,\mathrm{d}\mu = \int f\,\mathrm{d}\mu$$

which gives the desired result. $\square$

Law of Large Numbers, Again#

Let $(\Omega, \mathcal{F}, P)$ be a probability space with i.i.d. real-valued random variables $\{X_i\}_{i=1}^\infty$ with distribution function $F$. We define a map $\pi : \Omega \to S := \mathbb{R}^\mathbb{N}$ by

$$\pi(\omega) = (X_1(\omega), X_2(\omega), \ldots)$$

Let $\nu=P\circ\pi^{-1}$ be the corresponding probability measure on $S$.

Because the variables $\{X_i\}$ are independent, $\nu$ has product form $\nu = \mu_1\times\mu_2\times\cdots$, and because they are identically distributed, all the marginal distributions $\mu_j$ are the same, so in fact $\nu = \mu^\mathbb{N}$ for some probability distribution $\mu$ on $\mathbb{R}$.

For a sequence $(x_1, x_2, \ldots) \in S$, we define the shift map $T : S\to S$ by

$$T(x_1, x_2, x_3, \ldots) = (x_2, x_3, x_4, \ldots)$$

Then the shift map is measure preserving, and it is ergodic by Kolmogorov's zero-one law.

Strong Law of Large Numbers, Again

Let $\{X_i\}_{i=1}^\infty$ be i.i.d. random variables from $\Omega$ to $\mathbb{R}$. If $E[|X_i|] < \infty$, then $\frac{S_n}{n} \overset{\text{a.s.}}{\longrightarrow} c$ for $c = E[X_i]$.

Proof

Define $f : S \to \mathbb{R}$ by taking the first coordinate; that is, for $x = (x_1, x_2, \ldots)\in S$, $f(x) = x_1$. Then, for $T$ the shift map and $x = \pi(\omega)$, we have

$$S_n(f)(x) = f(x) + (f\circ T)(x) + \cdots + (f\circ T^{n-1})(x) = X_1(\omega) + X_2(\omega) + \cdots + X_n(\omega)$$

Thus, Birkhoff's and von Neumann's Ergodic Theorems say that there exists an invariant $\bar{f}\in L^1$ such that

$$n^{-1}S_n(f)(x)\overset{\text{a.s.}}{\longrightarrow} \bar{f}(x)$$

for $\nu$-a.e. $x\in S$ and

$$\int \bar{f}\,\mathrm{d}\nu = \int f\,\mathrm{d}\nu$$

Since $T$ is ergodic, the result from the beginning of this section states that $\bar{f} = c$, a constant, almost surely. Thus,

$$c = \int \bar{f}\,\mathrm{d}\nu = \lim_{n\to \infty}\int n^{-1}S_n(f)\,\mathrm{d}\nu = \int f\,\mathrm{d}\nu = E[X_i]$$

which gives the desired result. $\square$

Author: 关怀他人
License: CC BY-NC-SA 4.0