Probability and Measure

These are lecture notes for Adam B Kashlak's course on Probability and Measure Theory.

Measure Theory#

Measures and σ\sigma-Fields#

σ\sigma-Field #

Question : What sets (subsets of Ω\Omega) am I allowed to measure?

Definition

For some set \Omega, a \sigma-Field \mathcal{F} is a collection of subsets A\subseteq \Omega s.t.

  1. , ΩF\varnothing, ~\Omega \in \mathcal{F}
  2. If AFA\in \mathcal{F} then AcFA^c \in \mathcal{F}
  3. F\mathcal{F} is closed under union: for a countable collection of sets {Ai}i=1\{A_i\}_{i=1}^\infty s.t. AiFA_i \in \mathcal{F} for all iNi\in \mathbb{N} then   i=1AiF\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}

    Equivalence : \mathcal{F} is also closed under countable intersections because \bigcap_{i=1}^{\infty} A_i = \left(\bigcup_{i=1}^{\infty} A_i^c\right)^c \in \mathcal{F}
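For a finite \Omega, countable unions reduce to finite ones, so the axioms can be checked mechanically. The following is a minimal sketch (the helper name is my own, not from the notes):

```python
# Sketch: verify the sigma-field axioms for a collection F of subsets of a
# *finite* Omega, where countable unions reduce to finite unions.
def is_sigma_field(omega, F):
    F = {frozenset(s) for s in F}
    omega = frozenset(omega)
    if frozenset() not in F or omega not in F:
        return False                              # must contain empty set and Omega
    if any(omega - A not in F for A in F):
        return False                              # closed under complements
    if any(A | B not in F for A in F for B in F):
        return False                              # closed under (finite) unions
    return True

omega = {1, 2, 3, 4}
print(is_sigma_field(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
print(is_sigma_field(omega, [set(), {1}, omega]))             # False: {2,3,4} missing
```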

Measure#

Definition

A measure μ : FR+\mu~:~\mathcal{F}\rightarrow\mathbb{R}_+ is a mapping from a σ\sigma-field F\mathcal{F} to the non-negative real numbers s.t.

  1. μ()=0\mu(\varnothing) = 0
  2. For a pairwise disjoint collection \{A_i\}_{i=1}^\infty, \mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i)  (Countable Additivity)

Special Cases for a Measure Space (Ω,F,μ)(\Omega, \mathcal{F}, \mu):

  • If \mu(\Omega) = 1 then we say that \mu is a Probability Measure and (\Omega, \mathcal{F}, \mu) a Probability Space
  • If \mu(A) < \infty for every A\in\mathcal{F} (equivalently, \mu(\Omega) < \infty) then we say that \mu is a Finite Measure
  • If \Omega=\bigcup_{i=1}^{\infty} A_i with \mu(A_i) < \infty ~~ \forall i, then \mu is a \sigma-Finite Measure
    Example

    Ω=R \Omega=\mathbb{R}~,  μ([a,b])=ba\mu([a,b])=b-a and R=i=1[i1,i][i,i+1]\mathbb{R}= \bigcup_{i=1}^{\infty} [i-1,i]\cup[-i,-i+1] then μ\mu is a σ\sigma-Finite Measure

Power Set#

Definition

Power set P(Ω)\mathscr{P}(\Omega) or 2Ω2^\Omega is the set of all subsets of Ω\Omega

Semi-Ring of Sets#

Definition

A\mathcal{A}, a collection of subsets of Ω \Omega~, is called a Semi-Ring when

  1. A\varnothing \in \mathcal{A}
  2. If A,BAA, B \in \mathcal{A} then ABAA\cap B \in \mathcal{A}
  3. If A, B \in \mathcal{A} then there exists a finite number of pairwise disjoint sets C_i \in \mathcal{A} for i=1,\cdots,n s.t. A\setminus B = \bigcup_{i=1}^{n} C_i
Example

A\mathcal{A} : all intervals (a,b](a,b] in R\mathbb{R} is a semi-ring
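Condition 3 can be seen concretely for this example: removing one half-open interval from another leaves at most two disjoint half-open intervals. A small sketch (the helper is my own illustration):

```python
# Sketch: in the semi-ring of half-open intervals, (a, b] \ (c, d] is a
# union of at most two disjoint half-open intervals, illustrating
# condition 3 of the semi-ring definition.
def interval_difference(a, b, c, d):
    """Return (a, b] \\ (c, d] as a list of disjoint (lo, hi] pairs."""
    pieces = []
    if min(b, c) > a:          # left remainder (a, min(b, c)]
        pieces.append((a, min(b, c)))
    if b > max(a, d):          # right remainder (max(a, d), b]
        pieces.append((max(a, d), b))
    return pieces

print(interval_difference(0, 10, 3, 5))    # [(0, 3), (5, 10)]
print(interval_difference(0, 10, -1, 20))  # [] -- fully removed
```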

Ring of Sets#

Definition

A\mathcal{A}, a collection of subsets of Ω \Omega~, is called a Ring when

  1. A\varnothing \in \mathcal{A}
  2. If A,BAA, B \in \mathcal{A} then ABAA\setminus B \in \mathcal{A}
  3. If A,BAA, B \in \mathcal{A} then ABA A\cup B\in \mathcal{A}~(This implies finite unions are also in A\mathcal{A})

From the definition, we know that the intersection of two sets is also in \mathcal{A} because A\cap B = A\setminus (A\setminus B)

Example

All finite unions of half open intervals (a,b](a,b] in R\mathbb{R} is a ring

Field#

Definition

A\mathcal{A} is a Field if it is a Ring and ΩA\Omega \in \mathcal{A}

Equivalence

Since the whole set Ω\Omega is in A\mathcal{A}, then

  • the second condition of Ring is equivalent to \mathcal{A} being closed under complementation as A \setminus B = A \cap B^c and A^c = \Omega \setminus A
  • the third condition of Ring is equivalent to A\mathcal{A} being closed under finite unions as AB=(AcBc)cA\cup B = (A^c \cap B^c)^c

Note : a Field that is also closed under countable unions is a \sigma-Field

Set Function#

Definition

A set function \mu: \mathcal{A}\rightarrow \mathbb{R}^+ (not necessarily a measure) is a mapping from a collection of sets \mathcal{A} to \mathbb{R}^+. For A, B \in \mathcal{A}, we say that

  • μ\mu is monotone if ABA\subset B implies μ(A)μ(B)\mu(A)\leq \mu(B)
  • μ\mu is additive if μ(AB)=μ(A)+μ(B)\mu(A\cup B) = \mu(A) + \mu(B) for A,BA, B disjoint
  • μ\mu is countably additive if μ(i=1Ai)=i=1μ(Ai)\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i) for {Ai}i=1\{A_i\}_{i=1}^\infty pairwise disjoint and i=1AiA\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}
  • \mu is countably sub-additive if \mu\left(\bigcup_{i=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} \mu(A_i) for \bigcup_{i=1}^{\infty} A_i \in \mathcal{A} (not necessarily pairwise disjoint)
  • μ\mu is a pre-measure if μ\mu is countably additive, μ()=0\mu(\varnothing)=0 and A\mathcal{A} is a ring. For a pre-measure, it is also
    • monotone because for ABA\subset B we have μ(B)=μ(A(BA))=μ(A)+μ(BA)μ(A)\mu(B)=\mu(A\cup (B\setminus A)) = \mu(A) + \mu(B\setminus A) \geq \mu(A)
    • sub-additive because for i=1AiA\bigcup_{i=1}^{\infty} A_i \in \mathcal{A} let Bi=Ai(j=1i1Aj)B_i=A_i \setminus \left(\bigcup_{j=1}^{i-1} A_j\right) then BiB_i are pairwise disjoint and by monotonicity, μ(i=1Ai)=μ(i=1Bi)=i=1μ(Bi)i=1μ(Ai)\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \mu\left(\bigcup_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} \mu(B_i) \leq \sum_{i=1}^{\infty} \mu(A_i)

Outer Measure#

Definition

For a pre-measure \mu on a ring \mathcal{A}, the outer measure (not necessarily a Measure) \mu^* : 2^\Omega \rightarrow \mathbb{R}^+ is defined as

μ(E)=inf{i=1μ(Ai) : Ei=1Ai, AiA}\mu^*(E) = \inf \left\{ \sum_{i=1}^{\infty} \mu(A_i) ~:~ E\subset \bigcup_{i=1}^{\infty} A_i, ~A_i\in \mathcal{A} \right\}

for any EΩE\subseteq \Omega. This is equal to the smallest possible sum of the pre-measure over all finite or countable collections of AiA_i in A\mathcal{A} that cover EE.
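The infimum over covers can be illustrated numerically. A sketch of my own, taking the pre-measure of an interval (a, b] to be b - a and comparing a few covers of a fixed set:

```python
# Sketch: mu*(E) is the infimum of the total pre-measure over covers of E.
# Here we compare a few interval covers of E = (0.25, 0.75].
def cover_total(intervals):
    return sum(b - a for a, b in intervals)

covers = [
    [(0.0, 1.0)],              # crude cover, total length 1.0
    [(0.2, 0.5), (0.5, 0.8)],  # tighter cover, total length 0.6
    [(0.25, 0.75)],            # the set itself, total length 0.5
]
print(min(cover_total(c) for c in covers))  # 0.5, attained by the tight cover
```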

Can we “Measure” (Outer Measure) any EΩE\subset \Omega ? Not necessarily

Denote \mathcal{M} to be the collection of all \mu^*-measurable sets, where we say that B\subseteq \Omega is \mu^*-measurable when

μ(EB)+μ(EBc)=μ(E)  EΩ\mu^*(E\cap B) + \mu^*(E\cap B^c) = \mu^*(E) ~~ \forall E\subseteq \Omega

Caratheodory Extension Theorem#

Caratheodory Extension Theorem

Let A\mathcal{A} be a ring on Ω\Omega and μ\mu be a pre-measure, then μ\mu extends to a measure on σ(A)\sigma(\mathcal{A}).

Note
  • \sigma(\mathcal{A}) is the smallest \sigma-field containing \mathcal{A}, obtained by closing \mathcal{A} under countable unions and adding \Omega itself.
  • The “correct” extension is the outer measure \mu^*.
Proof

Assume BΩB\subset \Omega and μ(B)<\mu^*(B)<\infty

  1. Prove stuff about μ\mu^*
    • μ()=0\mu^*(\varnothing)=0 as μ\mu is a pre-measure

    • μ\mu^* is non-negative as μ\mu is non-negative

    • μ\mu^* is monotone (increasing)

      Let B_1, B_2 \subset \Omega with B_1\subset B_2. Any cover \{A_i\} of B_2 (i.e. B_2 \subseteq \bigcup_i A_i) is also a cover of B_1, so taking the infimum over covers gives \mu^*(B_1)\leq \mu^*(B_2)

    • μ\mu^* is countably sub-additive

      For \{B_i\}_{i=1}^\infty and a given \epsilon > 0, let B_i\subseteq \bigcup_j A_{ij} for A_{ij}\in \mathcal{A} s.t.

      \sum_j \mu(A_{ij}) \leq \mu^*(B_i) + \epsilon\cdot 2^{-i}

      This is possible because \mu^* is defined as an infimum. As \bigcup_{i=1}^\infty B_i \subseteq \bigcup_{i,j} A_{ij} and the A_{ij} form a countable cover by elements of \mathcal{A}, the definition of \mu^* gives

      \begin{aligned} \mu^*\left(\bigcup_{i=1}^\infty B_i\right) &\leq \sum_{i,j} \mu(A_{ij}) \\ &\leq \sum_i \left(\mu^*(B_i) + \epsilon\cdot 2^{-i}\right) = \sum_i \mu^*(B_i) + \epsilon \end{aligned}

      Take ϵ0\epsilon\rightarrow 0 to get that μ\mu^* is countably sub-additive

  2. Check that μ\mu and μ\mu^* coincide for all AAA\in \mathcal{A}
    • For any AAA\in \mathcal{A} we have μ(A)μ(A)\mu^*(A) \leq \mu(A) because {A}\{A\} is a cover of AA and μ\mu^* is the infimum
    • For the reverse, if AiAiA\subset \bigcup_i A_i then by countable sub-additivity and monotonicity
    \mu(A) \leq \sum_i \mu(A\cap A_i) \leq \sum_i \mu(A_i). Since this holds for every countable cover \{A_i\} of A, taking the infimum gives \mu(A)\leq \mu^*(A).
    • Thus μ(A)=μ(A)  AA\mu(A)=\mu^*(A) ~~ \forall A\in \mathcal{A}
  3. Check that the ring AM\mathcal{A}\subset \mathcal{M} (all μ\mu^* measurable sets)
    • i.e. AA\forall A \in \mathcal{A} we want to show that AA is μ\mu^*-measurable: μ(EA)+μ(EAc)=μ(E)  EΩ\mu^*(E\cap A) + \mu^*(E\cap A^c) = \mu^*(E) ~~ \forall E\subseteq \Omega
    • Note μ(EA)+μ(EAc)μ(E)\mu^*(E \cap A) + \mu^*(E\cap A^c) \geq \mu^*(E) as μ\mu^* is sub-additive.
    • Next, for some ϵ>0 \epsilon>0~, choose {Ai}\{A_i\} s.t. EiAiE\subset \bigcup_i A_i and iμ(Ai)μ(E)+ϵ\sum_i \mu(A_i) \leq \mu^*(E) + \epsilon
    • Furthermore, EAi(AAi)E\cap A\subset \bigcup_i(A\cap A_i) and EAci(AcAi)E\cap A^c \subset \bigcup_i(A^c\cap A_i) thus μ(EA)+μ(EAc)iμ(AAi)+iμ(AcAi)=iμ(Ai)μ(E)+ϵ\begin{aligned} \mu^*(E\cap A) + \mu^*(E\cap A^c) &\leq \sum_i \mu(A\cap A_i) + \sum_i \mu(A^c\cap A_i) \\ &= \sum_i \mu(A_i) \leq \mu^*(E) + \epsilon \end{aligned} and take ϵ0\epsilon\rightarrow 0.
  4. Show that M\mathcal{M} is a σ\sigma-Field
    • M\varnothing \in \mathcal{M} since A\varnothing \in \mathcal{A}
    • ΩM\Omega \in \mathcal{M} since EΩ  μ(EΩ)+μ(E)=μ(E)\forall E\subset \Omega ~~ \mu^*(E\cap \Omega) + \mu^*(E\cap\varnothing)=\mu^*(E)
    • \mathcal{M} is closed under complementation: the defining condition is symmetric in B and B^c, so if B\in\mathcal{M} then \mu^*(E\cap B^c) + \mu^*(E\cap (B^c)^c) = \mu^*(E) for all E, which implies that B^c \in \mathcal{M}
    • M\mathcal{M} is closed under finite intersections since for B1,B2MB_1, B_2\in\mathcal{M} and any EΩE \subset \Omega μ(E)=μ(EB1)+μ(EB1c)=μ(EB1B2)+μ(EB1cB2)+μ(EB1B2c)+μ(EB1cB2c)μ(EB1B2)+μ({EB1cB2}{EB1B2c}{EB1cB2c})=μ(E{B1B2})+μ(E{B1B2}c)μ(E)     (sub-additivity)\begin{aligned} \mu^*(E)&=\mu^*(E\cap B_1) + \mu^*(E\cap B_1^c) \\ &= \mu^*(E\cap B_1\cap B_2) + \mu^*(E\cap B_1^c\cap B_2) + \mu^*(E\cap B_1\cap B_2^c) + \mu^*(E\cap B_1^c\cap B_2^c) \\ &\geq \mu^*(E\cap B_1\cap B_2) + \mu^*(\{E\cap B_1^c\cap B_2\}\cup\{E\cap B_1\cap B_2^c\}\cup \{E\cap B_1^c\cap B_2^c\}) \\ &=\mu^*(E\cap \{B_1\cap B_2\}) + \mu^*(E\cap \{B_1\cap B_2\}^c) \geq \mu^*(E) ~~~~~(\text{sub-additivity}) \end{aligned} thus B1B2MB_1\cap B_2 \in \mathcal{M}. Properties above show that M\mathcal{M} is a Field.
    • To get to a \sigma-Field, let \{B_i\} be a countable pairwise disjoint collection in \mathcal{M} and set B = \bigcup_{i=1}^\infty B_i; we show B\in\mathcal{M}. For any E, \begin{aligned} \mu^*(E) &= \mu^*(E\cap B_1) + \mu^*(E\cap B_1^c) \\ &= \mu^*(E\cap B_1) + \mu^*(E\cap B_2) + \mu^*(E\cap B_1^c\cap B_2^c) \\ &~~\vdots \\ &= \sum_{i=1}^n \mu^*(E\cap B_i) + \mu^*\left(E\cap\bigg\{\bigcap_{i=1}^n B_i^c\bigg\}\right) \geq \sum_{i=1}^n \mu^*(E\cap B_i) + \mu^*(E\cap B^c) \end{aligned} where the last step is monotonicity, since B^c \subseteq \bigcap_{i=1}^n B_i^c. Letting n\rightarrow \infty and using sub-additivity, \begin{aligned} \mu^*(E) &\geq \sum_{i=1}^{\infty} \mu^*(E\cap B_i) + \mu^*(E\cap B^c) \\ &\geq \mu^*(E\cap B) + \mu^*(E\cap B^c) \\ &\geq \mu^*(E) \end{aligned} Thus \mathcal{M} is closed under countable disjoint unions and, combined with the Field properties above, under all countable unions (\mathcal{M} is a \sigma-Field!). Choosing E = B gives \mu^*(B)=\sum_{i=1}^\infty \mu^*(B_i), so \mu^* is countably additive on \mathcal{M}.
  5. Conclusion: μ\mu^* is a set function 2ΩR+2^\Omega\rightarrow \mathbb{R}^+ and it’s also a measure on M\mathcal{M}. Since AM\mathcal{A}\subset \mathcal{M}, then σ(A)M\sigma(\mathcal{A})\subseteq \mathcal{M}. Lastly, as μ\mu^* is a measure on M\mathcal{M}, it is also a measure on σ(A)\sigma(\mathcal{A}). \square

π\pi-System and λ\lambda-System #

π\pi-system

A collection of sets A\mathcal{A} is a π\pi-system if

  • A\varnothing \in \mathcal{A}
  • A,BA\forall A, B \in \mathcal{A}, ABAA\cap B \in \mathcal{A}.
λ\lambda-system

A collection of sets L\mathcal{L} is a λ\lambda-system if

  • ΩL\Omega \in \mathcal{L}
  • If A,BLA, B \in \mathcal{L} and ABA\subset B then BALB\setminus A \in \mathcal{L}.
  • If {Ai}i=1\{A_i\}_{i=1}^\infty is a sequence of pairwise disjoint sets in L\mathcal{L} then i=1AiL\bigcup_{i=1}^\infty A_i \in \mathcal{L}.

Note : a Field is a π\pi-system

Example

Let Ω={1,2,3,4}\Omega=\{1,2,3,4\} and L\mathcal{L} contains all subsets with an even number of elements. Then L\mathcal{L} is a λ\lambda-system but not a σ\sigma-Field since {1,2},{2,3}L\{1,2\}, \{2,3\}\in \mathcal{L} but {1,2}{2,3}={1,2,3}L\{1,2\}\cup\{2,3\}=\{1,2,3\}\notin \mathcal{L}
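The example above is small enough to verify exhaustively. A sketch checking the λ-system axioms for the even-sized subsets (for a finite Ω, countable disjoint unions reduce to finite ones):

```python
from itertools import combinations

# Check the example: even-sized subsets of Omega = {1, 2, 3, 4} form a
# lambda-system but are not closed under arbitrary unions.
omega = frozenset({1, 2, 3, 4})
L = {frozenset(c) for r in (0, 2, 4) for c in combinations(sorted(omega), r)}

assert omega in L                                            # contains Omega
assert all(B - A in L for A in L for B in L if A <= B)       # proper differences
assert all(A | B in L for A in L for B in L if not (A & B))  # disjoint unions
print(frozenset({1, 2}) | frozenset({2, 3}) in L)  # False: {1,2,3} has odd size
```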

Dynkin π\pi-λ\lambda Theorem #

Dynkin π\pi-λ\lambda Theorem

Let A\mathcal{A} be a π\pi-system, L\mathcal{L} be a λ\lambda-system and AL\mathcal{A}\subseteq \mathcal{L}. Then σ(A)L\sigma(\mathcal{A})\subseteq \mathcal{L}.

Proof

Let L0\mathcal{L}_0 be the smallest λ\lambda-system such that AL0\mathcal{A}\subset\mathcal{L}_0, then L0L\mathcal{L}_0\subseteq\mathcal{L}. Our goal is to show that L0\mathcal{L}_0 is also a π\pi-system and a collection of sets that is both a π\pi-system and a λ\lambda-system is a σ\sigma-Field. Then we necessarily have that σ(A)L0L\sigma(\mathcal{A})\subseteq\mathcal{L}_0\subseteq\mathcal{L}.

We only need to show that L0\mathcal{L}_0 is closed under intersections.

Let L={BL0:BAL0,AA}\mathcal{L}'=\{B\in\mathcal{L}_0 : B\cap A\in\mathcal{L}_0, \forall A\in\mathcal{A}\} then AL\mathcal{A}\subset\mathcal{L}' as A\mathcal{A} is a π\pi-system and let’s show that L\mathcal{L}' is also a λ\lambda-system:

  • \Omega\in\mathcal{L}' since \Omega\cap A = A \in\mathcal{L}_0 for every A\in\mathcal{A}
  • If B1,B2LB_1, B_2\in\mathcal{L}' s.t. B1B2B_1\subset B_2 then for any AAA\in\mathcal{A} we have that B1A,B2AL0B_1\cap A, B_2\cap A\in\mathcal{L}_0. Thus (B2A)(B1A)=(B2B1)AL0(B_2\cap A)\setminus (B_1\cap A) = (B_2\setminus B_1)\cap A\in\mathcal{L}_0, which implies that B2B1LB_2\setminus B_1\in\mathcal{L}'
  • If \{B_i\}_{i=1}^\infty\subset\mathcal{L}' are pairwise disjoint, then for any A\in\mathcal{A}, the sets A\cap B_i\in\mathcal{L}_0 are pairwise disjoint, thus \bigcup_{i=1}^\infty (A\cap B_i)\in\mathcal{L}_0. This implies that A\cap\left(\bigcup_{i=1}^\infty B_i\right)=\bigcup_{i=1}^\infty (A\cap B_i)\in\mathcal{L}_0. Hence, \bigcup_{i=1}^\infty B_i\in\mathcal{L}'.

By definition, \mathcal{L}'\subseteq \mathcal{L}_0; but \mathcal{L}' is a \lambda-system containing \mathcal{A} and \mathcal{L}_0 is the smallest such, so \mathcal{L}'=\mathcal{L}_0. Therefore, \mathcal{L}_0 is closed under intersections with elements of \mathcal{A}.

Lastly, let \mathcal{L}''=\{B\in\mathcal{L}_0 : B\cap C\in\mathcal{L}_0 ~\forall~ C\in\mathcal{L}_0\}. Since \mathcal{L}_0=\mathcal{L}', we have \mathcal{A}\subset \mathcal{L}''. Repeating the argument above shows that \mathcal{L}'' is a \lambda-system and thus \mathcal{L}''=\mathcal{L}_0.

Therefore, L0\mathcal{L}_0 is closed under intersections and thus a σ\sigma-Field. This implies that σ(A)L0L\sigma(\mathcal{A})\subseteq\mathcal{L}_0\subseteq\mathcal{L}. \square

Uniqueness of Extension Theorem#

Uniqueness of Extension

Let μ1,μ2\mu_1, \mu_2 be σ\sigma-Finite measures on σ(A)\sigma(\mathcal{A}) where A\mathcal{A} is a π\pi-system. Then, if AA μ1(A)=μ2(A)\forall A\in\mathcal{A} ~\mu_1(A)=\mu_2(A) then μ1\mu_1 and μ2\mu_2 are equal on σ(A)\sigma(\mathcal{A}).

Proof (Finite Measures)

Assuming μ1(Ω)=μ2(Ω)<\mu_1(\Omega)=\mu_2(\Omega)<\infty

Let \mathcal{L}=\{B\in\sigma(\mathcal{A}) : \mu_1(B)=\mu_2(B)\}. We only need to show that \mathcal{L} is a \lambda-system: since \mathcal{A}\subset\mathcal{L}, the Dynkin \pi-\lambda Theorem then gives \sigma(\mathcal{A})\subseteq\mathcal{L}, which means \mu_1 and \mu_2 coincide on \sigma(\mathcal{A}).

  • ΩL\Omega\in\mathcal{L}
  • If A,BLA, B\in\mathcal{L} with ABA\subset B then μ1(BA)+μ1(A)=μ1(B)=μ2(B)=μ2(BA)+μ2(A)<\begin{aligned} \mu_1(B\setminus A) + \mu_1(A)&=\mu_1(B)\\ &=\mu_2(B)\\ &=\mu_2(B\setminus A) + \mu_2(A) < \infty \end{aligned} hence BALB\setminus A\in\mathcal{L}
  • {Ai}i=1\{A_i\}_{i=1}^\infty are pairwise disjoint and AiLA_i\in\mathcal{L} μ1(i=1Ai)=i=1μ1(Ai)=i=1μ2(Ai)=μ2(i=1Ai)<\begin{aligned} \mu_1(\bigcup_{i=1}^\infty A_i)=\sum_{i=1}^\infty \mu_1(A_i)&=\sum_{i=1}^\infty\mu_2(A_i)\\ &=\mu_2(\bigcup_{i=1}^\infty A_i) < \infty \end{aligned} hence i=1AiL\bigcup_{i=1}^\infty A_i\in\mathcal{L}.

L\mathcal{L} is a λ\lambda-system! \square

Proof (σ\sigma-Finite Measures)
  • For any AAA\in\mathcal{A} s.t. μ1(A)=μ2(A)<\mu_1(A)=\mu_2(A)<\infty, we define LA\mathcal{L}_A to be all BΩB\subseteq\Omega s.t. μ1(AB)=μ2(AB)\mu_1(A\cap B)=\mu_2(A\cap B).

    Proceeding as in the proof above, we can show that LA\mathcal{L}_A is a λ\lambda-system and thus σ(A)LA AA\sigma(A)\subset\mathcal{L}_A ~ \forall A\in\mathcal{A}.

  • By σ\sigma-Finiteness we decompose Ω=i=1Ai\Omega=\bigcup_{i=1}^\infty A_i, AiAA_i\in\mathcal{A} and μ1(Ai)=μ2(Ai)<\mu_1(A_i)=\mu_2(A_i) < \infty. For any Bσ(A)B\in\sigma(\mathcal{A}) and any nNn\in\mathbb{N}

    μ1(i=1n(BAi))=i=1nμ1(BAi)i<jμ1(BAiAj)+\mu_1\left(\bigcup_{i=1}^n(B\cap A_i)\right)=\sum_{i=1}^n \mu_1(B\cap A_i)-\sum_{i < j}\mu_1(B\cap A_i \cap A_j) + \cdots

    here we use inclusion-exclusion formula. This also works for μ2\mu_2.

    Since \mathcal{A} is a \pi-system, A_i\cap A_j\in\mathcal{A}, as well as further intersections, thus

    μ1(i=1n(BAi))=μ2(i=1n(BAi)) nN\mu_1\left(\bigcup_{i=1}^n(B\cap A_i)\right)=\mu_2\left(\bigcup_{i=1}^n(B\cap A_i)\right) ~\forall n\in\mathbb{N}

    Let nn\rightarrow\infty we get μ1(B)=μ2(B) Bσ(A)\mu_1(B)=\mu_2(B)~ \forall B\in\sigma(\mathcal{A}) \square

Borel σ\sigma-Field#

Definition

B(R)=σ(open sets in R)\mathcal{B}(\mathbb{R})=\sigma(\text{open sets in }\mathbb{R})

Lebesgue Measure#

Definition

For any interval (a,b](a,b] in R\mathbb{R}, the Lebesgue measure is defined as λ((a,b])=ba\lambda((a,b])=b-a.

Are there any A\in \mathscr{P}(\mathbb{R}) s.t. A is not \lambda-measurable? Yes: the Vitali set constructed below.

Example: Vitali Set
  • Let Ω=(0,1]\Omega=(0, 1], for x,y(0,1]x,y \in (0,1] define addition mod 11

    x+y={x+yif x+y1x+y1if x+y>1x+y= \begin{cases} x+y & \text{if } x+y\leq 1 \\ x+y-1 & \text{if } x+y>1 \end{cases}

    Define \mathcal{L} to contain all \lambda-measurable sets A\subseteq (0,1] s.t. \lambda(A)=\lambda(A+x) for any x \in (0,1], where A+x=\{x+y\in(0,1] : y \in A\} (the shift of A by x, mod 1). We claim that \mathcal{L} is a \lambda-system.

    Let \mathcal{A} be the collection of all intervals (a, b]\subseteq \Omega. Then \mathcal{A} \subset \mathcal{L} because \lambda((a, b])=b-a and the shifted set (a, b]+x is an interval, or a union of two disjoint intervals when it wraps around 1, of total measure b-a.

    By Dynkin π\pi-λ\lambda Theorem, σ(A)=BL\sigma(\mathcal{A})=\mathcal{B}\subset\mathcal{L}

    i.e. Every Borel subset of (0,1](0, 1] is shift invariant w.r.t λ\lambda

  • Next, we say that xyx\sim y if xyQx-y\in\mathbb{Q} then we can decompose (0,1](0, 1] into disjoint Equivalence classes.

    Define H\subset (0,1] s.t. H contains exactly one element from each equivalence class (possible by the Axiom of Choice). Then no two points in H are equivalent, i.e. for r_1, r_2\in\mathbb{Q}\cap(0,1] with r_1\neq r_2, (H+r_1)\cap(H+r_2)=\varnothing

    Thus (0, 1]=\bigcup_{r\in\mathbb{Q}\cap(0,1]}(H+r) and, if H were measurable, by countable additivity

    1 = \lambda((0, 1]) = \sum_{r\in\mathbb{Q}\cap(0,1]}\lambda(H+r)

    since \lambda is translation invariant, \lambda(H+r_1)=\lambda(H+r_2)=\lambda(H).

    • If λ(H)=0\lambda(H)=0, then 1=rQλ(H)=01=\sum_{r\in\mathbb{Q}}\lambda(H)=0.
    • If λ(H)>0\lambda(H)>0, then 1=rQλ(H)=1=\sum_{r\in\mathbb{Q}}\lambda(H)=\infty

    which yields a contradiction in either case. Thus H is not \lambda-measurable, and it is called the Vitali Set.

  • Fun fact: up to a scaling constant, Lebesgue Measure is the only translation invariant \sigma-finite measure on \mathcal{B}(\mathbb{R}).

    • The same holds for \mathbb{R}^n
    • There is no \infty-dimensional Lebesgue Measure, i.e. no nontrivial \sigma-finite translation invariant measure in infinite dimensions.

Product Measure#

Definition

For two Measure Spaces (X,X,μ)(\mathbb{X}, \mathcal{X}, \mu) and (Y,Y,ν)(\mathbb{Y}, \mathcal{Y}, \nu), define (X×Y,X×Y,π)(\mathbb{X}\times\mathbb{Y}, \mathcal{X}\times\mathcal{Y}, \pi) where π(A×B)=μ(A)ν(B)\pi(A\times B)=\mu(A)\nu(B) for AXA\in\mathcal{X} and BYB\in\mathcal{Y}.

Question : How are these related? B(X)×B(Y)\mathcal{B}(\mathbb{X})\times\mathcal{B}(\mathbb{Y}) and B(X×Y)\mathcal{B}(\mathbb{X}\times\mathbb{Y})

From Dudley 4.1.7 B(X)×B(Y)B(X×Y)\mathcal{B}(\mathbb{X})\times\mathcal{B}(\mathbb{Y})\subset\mathcal{B}(\mathbb{X}\times\mathbb{Y}), but these two are “usually” equal to each other. e.g. X=Y=R\mathbb{X}=\mathbb{Y}=\mathbb{R}
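The rectangle formula \pi(A\times B)=\mu(A)\nu(B) is easy to exercise on finite spaces. A sketch with two small measure spaces of my own choosing, given as point masses:

```python
from fractions import Fraction

# Sketch: product measure of a rectangle A x B is pi(A x B) = mu(A) * nu(B),
# for two finite measure spaces given as point masses.
mu = {'a': Fraction(1, 2), 'b': Fraction(3, 2)}  # measure on X = {a, b}
nu = {1: Fraction(2), 2: Fraction(1)}            # measure on Y = {1, 2}

def pi(A, B):
    return sum(mu[x] for x in A) * sum(nu[y] for y in B)

print(pi({'a'}, {1, 2}))    # 1/2 * 3 = 3/2
print(pi({'a', 'b'}, {2}))  # 2 * 1 = 2
```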

Independence#

Independence for sets

For a countable collection of sets Ai,iIA_i, i\in \mathcal{I}, we say that the collection is independent if for all finite subsets JIJ \subset I, we have

μ(jJAj)=jJμ(Aj)\mu\left(\bigcap_{j\in \mathcal{J}}A_j\right)=\prod_{j\in \mathcal{J}}\mu(A_j)
Independence for σ\sigma-Fields

For a countable collection of \sigma-Fields \mathcal{F}_i \subseteq \mathcal{F}, i \in \mathcal{I}, we say that this collection of \sigma-Fields is independent if every choice of sets \{A_i \in \mathcal{F}_i : i \in \mathcal{I}\} is independent in the sense of the previous definition

Theorem

Let \mathcal{A}_1, \mathcal{A}_2 \subset \mathcal{F} be \pi-systems. If \mu(A_1 \cap A_2) = \mu(A_1)\mu(A_2) for any A_1 \in \mathcal{A}_1 and A_2 \in \mathcal{A}_2, then \sigma(\mathcal{A}_1) and \sigma(\mathcal{A}_2) are independent.

Proof

For a fixed A1A1A_1 \in \mathcal{A}_1, we can define two measures for BFB \in \mathcal{F} as

ν1(B)=μ(A1B)  and  ν2(B)=μ(A1)μ(B).\nu_1(B) = \mu(A_1 \cap B) ~~\text{and}~~ \nu_2(B) = \mu(A_1)\mu(B).

By assumption, ν1(A2)=ν2(A2)\nu_1(A_2) = \nu_2(A_2) for any A2A2A_2 \in \mathcal{A}_2. Hence, by Uniqueness of Extension Theorem, they must coincide on σ(A2)\sigma(A_2). Therefore, μ(A1B2)=μ(A1)μ(B2)\mu(A_1 \cap B_2) = \mu(A_1)\mu(B_2) for a fixed A1A_1 and any B2σ(A2)B_2 \in \sigma(A_2).

This argument can be repeated by fixing an element B2σ(A2)B_2 \in \sigma(A_2) to get that μ(B1B2)=μ(B1)μ(B2)\mu(B_1 \cap B_2) = \mu(B_1)\mu(B_2) for Biσ(Ai),i=1,2B_i \in \sigma(A_i), i=1,2. \square
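For a concrete check of the definition of independent events, a sketch on the uniform probability measure over two fair-die rolls (the events are my own example):

```python
from fractions import Fraction
from itertools import product

# Sketch: verify P(A1 & A2) = P(A1) * P(A2) under the uniform measure on
# two fair die rolls.
omega = list(product(range(1, 7), repeat=2))

def P(A):
    return Fraction(len(A), len(omega))

A1 = {w for w in omega if w[0] % 2 == 0}  # first roll is even
A2 = {w for w in omega if w[1] <= 2}      # second roll is at most 2
print(P(A1 & A2) == P(A1) * P(A2))  # True: the coordinates are independent
```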

Functions, Random Variables and Integration#

Simple Functions and Random Variables#

Simple Random Variable

Let (Ω,F,P)(\Omega, \mathcal{F}, P) be a Probability Space i.e. P(Ω)=1P(\Omega)=1. A simple random variable X:ΩRX : \Omega\rightarrow\mathbb{R} is a real valued function that only takes on a finite number of values x1,,xpx_1, \dots , x_p and such that the set

{ωΩ:X(ω)=xi}F\{\omega\in\Omega : X(\omega)=x_i\}\in\mathcal{F}

One way to write such a function is to finitely partition Ω\Omega into disjoint sets {Ai}i=1p\{A_i\}_{i=1}^p i.e. i=1pAi=Ω\bigcup_{i=1}^p A_i=\Omega and AiAj=A_i\cap A_j=\varnothing and write

X(ω)=i=1pxi1[ωAi]X(\omega) = \sum_{i=1}^px_i \mathbf{1}[\omega\in A_i]

Then we can say that the probability that XX is equal to xix_i is

P(X=xi)=P({ωΩ:X(ω)=xi})=P(Ai)P(X=x_i)=P(\{\omega\in\Omega : X(\omega)=x_i\})=P(A_i)

Furthermore, this allows us to define the expectation of the simple random variable XX to be

EX=i=1pxiP(X=xi)\mathbb{E}X = \sum_{i=1}^p x_iP(X=x_i)
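The expectation formula above can be computed directly. A minimal sketch on a fair die roll (the partition and values are my own example):

```python
from fractions import Fraction

# Sketch: a simple random variable X = sum_i x_i 1[omega in A_i] on a fair
# die roll, with expectation E X = sum_i x_i P(A_i).
def P(A):
    return Fraction(len(A), 6)

partition = [({1, 2, 3}, 0), ({4, 5}, 1), ({6}, 10)]  # pairs (A_i, x_i)
EX = sum(x * P(A) for A, x in partition)
print(EX)  # 0*1/2 + 1*1/3 + 10*1/6 = 2
```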
Simple Measurable Function

Let (\Omega, \mathcal{F}, \mu) be a Measure Space. A simple function F:\Omega\rightarrow\mathbb{R} is s.t.

F(ω)=i=1pxi1[ωBi]   BiFF(\omega)=\sum_{i=1}^p x_i\mathbf{1}[\omega\in B_i]~~ ~B_i\in\mathcal{F}

which is the linear combination of indicator functions. The sets BiB_i need not be disjoint, but given a simple function, we can define it in terms of disjoint BiB_i.

Then, we define the integral of a simple function to be

Fdμ:=i=1pxiμ(Bi)\int F\mathrm{d}\mu \vcentcolon= \sum_{i=1}^p x_i\mu(B_i)
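The integral of a simple function is a finite weighted sum of measures. A sketch against Lebesgue measure, with the sets B_i given as finite unions of disjoint intervals (example values are my own):

```python
# Sketch: integral of a simple function F = sum_i x_i 1[B_i] against
# Lebesgue measure.
def lam(intervals):
    # Lebesgue measure of a finite union of disjoint intervals
    return sum(b - a for a, b in intervals)

terms = [(2.0, [(0.0, 1.0)]),              # x_1 = 2 on B_1 = (0, 1]
         (5.0, [(1.0, 1.5), (3.0, 3.5)])]  # x_2 = 5 on B_2, measure 1
integral = sum(x * lam(B) for x, B in terms)
print(integral)  # 2*1 + 5*1 = 7.0
```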

Measurable Functions and Random Variables#

To extend the above idea of a simple random variable, we want to allow values in arbitrary Borel sets B \in \mathcal{B}(\mathbb{R}) rather than finitely many points x_i. We need two Measurable Spaces (\mathbb{X}, \mathcal{X}) and (\mathbb{Y}, \mathcal{Y}).

Measurable Function

A function f : \mathbb{X}\rightarrow\mathbb{Y} is said to be measurable (with respect to \mathcal{X}/\mathcal{Y}) if f^{-1}(B)\in\mathcal{X} for any B\in\mathcal{Y}. If \mathbb{Y}=\mathbb{R}, then we say that f is \mathcal{X}-measurable.

Typically, the \sigma-Fields of interest are the Borel \sigma-Fields, sometimes written as (\mathbb{X}, \mathcal{B}(\mathbb{X})) when we have a topological space. Moreover, the target space (\mathbb{Y}, \mathcal{Y}) is typically taken to be (\mathbb{R}, \mathcal{B}(\mathbb{R})) or (\mathbb{R}^+, \mathcal{B}(\mathbb{R}^+)). In this case, we say that f is Borel Measurable.

If we replace B(R)\mathcal{B}(\mathbb{R}) with Mλ(R)\mathcal{M}_λ(\mathbb{R}), the set of Lebesgue measurable subsets of R\mathbb{R}, then we say ff is Lebesgue Measurable.

Cool facts about Measurable Functions#

  1. Inverse images of set functions preserve set operations. i.e. for f:XYf:\mathbb{X}\rightarrow\mathbb{Y} and A,AiY,iIA, A_i\subset\mathbb{Y}, i\in \mathcal{I},

    f1(iIAi)=iIf1(Ai)  and  f1(YA)=Xf1(A)f^{-1}\left(\bigcup_{i\in \mathcal{I}} A_i\right) = \bigcup_{i\in \mathcal{I}} f^{-1}(A_i) ~~ \text{and} ~~ f^{-1}(\mathbb{Y}\setminus A) = \mathbb{X} \setminus f^{-1}(A)

    For a measurable function f, this implies that \{f^{-1}(B) : B \in \mathcal{Y}\} is a \sigma-Field contained in \mathcal{X}. Hence, we want \mathcal{Y} to be no larger than \mathcal{X} to have measurable functions. Furthermore, this can be used to show that the measurability of f can be established by looking only at a collection of sets \mathcal{A} \subset \mathcal{Y} that generates \mathcal{Y}.

    Example

    Let \mathcal{A} be the set of all half-lines A_t = (-\infty, t] for t \in \mathbb{R}; then \sigma(\mathcal{A})=\mathcal{B}(\mathbb{R}). Thus, f is measurable as long as the sets \{x : f(x) \leq t\} are measurable.

  2. For any AXA \in \mathcal{X}, the indicator functions f(x)=1[xA]f(x) = \mathbf{1}[x \in A] are measurable. The σ\sigma-Field generated by f1f^{−1} is simply {,A,Ac,X}X\{\varnothing, A, A^c, \mathbb{X}\}\subset\mathcal{X}.

  3. For measurable functions f,g:XRf, g : \mathbb{X} \rightarrow \mathbb{R}, the functions f+gf + g and fgfg are measurable.

  4. For measurable functions, {fi}i=1\{f_i\}_{i=1}^\infty from X\mathbb{X} to R\mathbb{R}, the following are also measurable: supifi\sup_i f_i, infifi\inf_i f_i, lim supifi\limsup_i f_i, lim infifi\liminf_i f_i and limifi\lim_i f_i if it exists.

    Proof

    In set notation, \{x : \sup_i f_i(x)\leq t\}=\bigcap_{i=1}^\infty\{x : f_i(x)\leq t\}, where the right-hand side is a countable intersection of measurable sets and hence measurable. Similarly, \{x : \inf_i f_i(x) < t\}=\bigcup_{i=1}^\infty\{x : f_i(x) < t\}. Then \limsup_i f_i = \inf_i \sup_{j\geq i} f_j and \liminf_i f_i = \sup_i\inf_{j\geq i} f_j are measurable by combining the above. If \lim_i f_i exists then \limsup_i f_i=\lim_i f_i=\liminf_i f_i. \square

  5. Let f:XRf : \mathbb{X}\rightarrow\mathbb{R} be a continuous function, then it is measurable.

    Proof

    If U is an open set in \mathbb{R}, then f^{-1}(U) is open in \mathbb{X} (definition of a continuous function between two topological spaces). Thus, the set f^{-1}(U) is measurable. Since the open sets of \mathbb{R} generate \mathcal{B}(\mathbb{R}), the function f is measurable. \square

  6. Given a collection of functions fi:XY,iIf_i: \mathbb{X}\rightarrow\mathbb{Y}, i\in \mathcal{I}, we can make them measurable by constructing the measurable space (X,X)(\mathbb{X},\mathcal{X}) where σ({fi}iI)X\sigma(\{f_i\}_{i\in \mathcal{I}})\subseteq\mathcal{X} is the σ\sigma-field generated by the sets fi1(B)f^{-1}_i(B) for all ii and BYB\in\mathcal{Y}.

Almost Surely / Almost Everywhere

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space. For two functions f,g:ΩRf, g : \Omega\rightarrow\mathbb{R}, we say that f=gf = g a.e. (almost everywhere) when the set N={ω:f(ω)g(ω)}N = \{\omega : f(\omega) \neq g(\omega)\} has measure μ(N)=0\mu(N) = 0.

In probability theory, “almost everywhere” is replaced with “almost surely”, abbreviated a.s., and it is equivalently written “with probability 1” or w.p.1.

Example

Let ([0,1],B,λ)([0, 1], \mathcal{B}, \lambda) be the standard measure space of Borel sets on the unit interval with Lebesgue measure. Let f(t)=0f(t)=0 for all t(0,1]t\in(0,1] and g(t)=0g(t) = 0 on (0,1]Q(0, 1]\setminus\mathbb{Q} and g(t)=1g(t)=1 on (0,1]Q(0, 1]\cap\mathbb{Q}.

Then f=g a.e. because \lambda((0, 1]\cap\mathbb{Q}) = 0. To prove that \lambda(\mathbb{Q})=0, enumerate the rationals as \{q_m\}_{m=1}^\infty and surround each q_m with the interval (q_m-\frac{\epsilon}{2^m}, q_m + \frac{\epsilon}{2^m}). For any \epsilon > 0, countable sub-additivity gives \lambda(\mathbb{Q}) \leq \sum_{m=1}^\infty \lambda\left(\left(q_m-\frac{\epsilon}{2^m}, q_m + \frac{\epsilon}{2^m}\right)\right)=\sum_{m=1}^\infty\frac{\epsilon}{2^{m-1}}=2\epsilon. Since \epsilon is arbitrary, \lambda(\mathbb{Q})=0.
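The covering argument can be checked numerically: each interval has length 2\epsilon/2^m, so the total length of the cover is 2\epsilon. A sketch with the geometric series truncated at 60 terms:

```python
# Numeric sketch of the covering argument: the m-th rational gets an
# interval of length 2*eps/2^m, so the total cover length is 2*eps,
# arbitrarily small.
def total_cover_length(eps, n_terms=60):
    return sum(2 * eps / 2**m for m in range(1, n_terms + 1))

for eps in (1.0, 0.1, 0.001):
    print(total_cover_length(eps))  # ~2.0, ~0.2, ~0.002
```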

Integration#

We will consider measurable functions mapping from (Ω,F)(\Omega, \mathcal{F}) to [,][-\infty, \infty], which is called the extended real line. This allows us to handle sets such as f1()f^{-1}(\infty).

Notation : fiff_i\uparrow f

We use fiff_i\uparrow f to represent a sequence of functions {fi}\{f_i\} that are increasing and converge to ff for every ω\omega. This implies that fi(ω)f(ω)f_i(\omega)\rightarrow f(\omega) and fi(ω)fi+1(ω)f_i(\omega)\leq f_{i+1}(\omega) for all ω\omega.

Theorem

Let (Ω,F)(\Omega, \mathcal{F}) be a measurable space and A\mathcal{A} a π\pi-system that generates F\mathcal{F} ( σ(A)=F ~\sigma(\mathcal{A})=\mathcal{F}~). Let V\mathcal{V} be a linear space that contains

  1. all indicators 1Ω\mathbf{1}_\Omega and 1A\mathbf{1}_A for each AAA\in\mathcal{A}
  2. all functions ff such that fiV\exists f_i\in\mathcal{V} s.t. fiff_i\uparrow f

Then V\mathcal{V} contains all measurable functions.

Proof

First, \mathbf{1}_\Omega\in\mathcal{V} and \mathbf{1}_A\in\mathcal{V} for each A\in\mathcal{A}. Let \mathcal{L}=\{B\in\mathcal{F} : \mathbf{1}_B\in\mathcal{V}\}; we claim that \mathcal{L} is a \lambda-system. Since \mathcal{A}\subset\mathcal{L}, by the Dynkin \pi-\lambda Theorem, \sigma(\mathcal{A})=\mathcal{F}\subseteq\mathcal{L}. As \mathcal{L}\subseteq\mathcal{F} by construction, \mathcal{L}=\mathcal{F}, so every indicator function \mathbf{1}_B with B\in\mathcal{F} is in \mathcal{V}.

Let f be a non-negative measurable function and define f_i=\min\left(2^{-i}\lfloor 2^i f\rfloor, i\right), capping at i so that each f_i takes only finitely many values. Each f_i is then a finite linear combination of indicator functions and hence f_i \in \mathcal{V}. Furthermore, f_i\uparrow f and thus f\in\mathcal{V}.

Lastly, for a general measurable ff, we write f=f+ff=f^+-f^- where

f+(ω)={f(ω)if f(ω)00otherwisef(ω)={f(ω)if f(ω)<00otherwise\begin{aligned} f^+(\omega)&= \begin{cases} f(\omega) & \text{if } f(\omega)\geq 0 \\ 0 & \text{otherwise} \end{cases} \\ f^-(\omega)&= \begin{cases} -f(\omega) & \text{if } f(\omega)< 0 \\ 0 & \text{otherwise} \end{cases} \end{aligned}

which are both non-negative measurable functions. \square
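The dyadic approximation step of the proof is easy to see numerically. A sketch (f and the evaluation point are my own example; the cap at i is the standard refinement that keeps each f_i finitely valued):

```python
import math

# Sketch: the dyadic simple functions f_i = min(2^-i * floor(2^i f), i)
# increase pointwise to a non-negative f.
def f(x):
    return x * x

def f_i(i, x):
    return min(math.floor(2**i * f(x)) / 2**i, i)

x = 0.7  # f(x) = 0.49
approx = [f_i(i, x) for i in range(1, 9)]
print(approx)  # nondecreasing, approaching 0.49
assert all(a <= b for a, b in zip(approx, approx[1:]))
```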

Integral of a Measurable Function

For a non-negative measurable function f:Ω[0,]f : \Omega\rightarrow [0,\infty] on the measure space (Ω,F,μ)(\Omega, \mathcal{F}, \mu), we define

fdμ=sup[iI{infωAif(ω)}μ(Ai)]\int f\mathrm{d}\mu = \sup\left[\sum_{i\in \mathcal{I}} \left\{\inf_{\omega\in A_i} f(\omega) \right\}\mu(A_i)\right]

where the supremum is taken over all finite partitions {Ai}iI\{A_i\}_{i\in \mathcal{I}} of Ω\Omega.

Inside the square brackets is the integral of the simple function that assigns a value of infωAif(ω)\inf_{\omega\in A_i} f(\omega) for the set A_i. Hence, for a non-negative ff, we consider all simple functions gg such that 0gf0 \leq g \leq f and define the integral to be the supremum of the integral of gg . To extend this to all measurable functions ff, we write f=f+ff=f^+-f^- then

fdμ=f+dμfdμ\int f\mathrm{d}\mu = \int f^+\mathrm{d}\mu - \int f^-\mathrm{d}\mu
Monotone Convergence Theorem for Simple Functions

Let (\Omega, \mathcal{F}, \mu) be a measure space, let f \geq 0 be measurable, and let f_n \geq 0, n \in \mathbb{N}, be a sequence of measurable simple functions such that f_n \uparrow f. Then, \int f_n\mathrm{d}\mu\uparrow\int f\mathrm{d}\mu.

Proof

Since fnff_n \uparrow f, the sequence fndμ\int f_n\mathrm{d}\mu is non-decreasing, so fndμc[0,]\int f_n\mathrm{d}\mu \uparrow c\in[0,\infty] with cfdμc\leq \int f\mathrm{d}\mu. To prove the reverse inequality, let gg be any simple measurable function s.t. 0gf0\leq g\leq f. Our goal is to show that gdμc\int g\mathrm{d}\mu\leq c, as taking the supremum over such gg will imply that cfdμc\geq \int f\mathrm{d}\mu.

First, we write g=iIai1Aig=\sum_{i\in \mathcal{I}}a_i \mathbf{1}_{A_i} where the AiA_i form a disjoint partition of Ω\Omega, and similarly fn=jJbj(n)1Bj(n)f_n=\sum_{j\in \mathcal{J}} b_j^{(n)} \mathbf{1}_{B_j^{(n)}}. Consequently,

fn=iIfn1Ai=i,jbj(n)1AiBj(n)f_n = \sum_{i\in \mathcal{I}} f_n \mathbf{1}_{A_i}=\sum_{i, j} b_j^{(n)} \mathbf{1}_{A_i\cap B_j^{(n)}}

and fndμ=iIAifndμ\int f_n\mathrm{d}\mu=\sum_{i\in\mathcal{I}}\int_{A_i}f_n\mathrm{d}\mu. Hence, we want to show for each iIi \in \mathcal{I} that

limnAifndμAigdμ=aiμ(Ai)\lim_{n\rightarrow \infty}\int_{A_i}f_n\mathrm{d}\mu\geq \int_{A_i} g\mathrm{d}\mu = a_i\mu(A_i)

First, if ai=0a_i=0, then the inequality above holds trivially. If ai>0a_i>0, we can divide by aia_i; without loss of generality, we take ai=1a_i=1 and consider g=1Ag=\mathbf{1}_A for some set AA.

For any ϵ>0\epsilon>0, let Cn={xA:fn(x)>1ϵ}C_n=\{x\in A : f_n(x) > 1 - \epsilon\}. Then CnAC_n\uparrow A, which means

C1C2A   and  n=1Cn=AC_1\subseteq C_2\subseteq \cdots \subseteq A ~~\text{ and }~ \bigcup_{n=1}^\infty C_n = A

By countable additivity of μ\mu

μ(Cn+1)=μ(C1)+i=1nμ(Ci+1Ci)\mu(C_{n+1}) = \mu(C_1) + \sum_{i=1}^n \mu(C_{i + 1} \setminus C_i)

and μ(Cn)μ(A)\mu(C_n)\uparrow \mu(A). Since fndμ(1ϵ)μ(Cn)\int f_n\mathrm{d}\mu \geq (1 - \epsilon)\mu(C_n), we have that c(1ϵ)μ(A)=(1ϵ)gdμc\geq (1 - \epsilon)\mu(A) = (1 - \epsilon)\int g\mathrm{d}\mu. Taking ϵ\epsilon to zero gives cgdμc\geq \int g\mathrm{d}\mu. \square

Theorem

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space, and let f,g:Ω[,]f, g : \Omega \rightarrow [-\infty, \infty] be measurable with f=gf = g a.e. Then, ff is integrable if and only if gg is integrable. If ff and gg are integrable, then fdμ=gdμ\int f\mathrm{d}\mu = \int g\mathrm{d}\mu.

Proof

Let f=gf = g on ΩN\Omega \setminus N where μ(N)=0\mu(N) = 0. For any measurable function h:Ω[,]h : \Omega \rightarrow [-\infty, \infty], we show that the integrals Ωhdμ\int_\Omega h\mathrm{d}\mu and ΩNhdμ\int_{\Omega\setminus N} h\mathrm{d}\mu coincide.

  • True if hh is an indicator function because μ(A)=μ(AN)\mu(A) = \mu(A\setminus N)
  • Also true for hh being a non-negative simple function.
  • From MCT for Simple Functions, we can pass from non-negative simple hh to any non-negative measurable function.
  • Lastly, writing hdμ=h+dμhdμ\int h\mathrm{d}\mu = \int h^+\mathrm{d}\mu - \int h^-\mathrm{d}\mu shows that the integrals coincide for any integrable measurable function.

To complete the proof, we note that

fdμ=ΩNfdμ=ΩNgdμ=gdμ\int f\mathrm{d}\mu = \int_{\Omega\setminus N}f \mathrm{d}\mu = \int_{\Omega\setminus N}g \mathrm{d}\mu = \int g\mathrm{d}\mu

This result tells us that we can modify a function on a set of measure zero without changing its integral. \square
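A toy numerical illustration of this theorem, under our own choices: a finite measure space in which one point carries zero measure, so modifying a function there is an almost-everywhere modification.

```python
# Sketch: f = g almost everywhere implies equal integrals. Here Omega is a
# four-point space and the point 3 is a null atom, so N = {3} is a null set.

omega = [0, 1, 2, 3]
mu = {0: 0.5, 1: 0.3, 2: 0.2, 3: 0.0}  # mu({3}) = 0

def integral(func):
    """Integral over a finite measure space: a weighted sum."""
    return sum(func(w) * mu[w] for w in omega)

f = lambda w: w * w
g = lambda w: w * w if w != 3 else -100.0  # differs from f only on the null set
# integral(f) and integral(g) coincide despite g being wildly different at 3
```

The same principle is what lets the notes freely redefine a function on a null set NN before integrating.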

Monotone Convergence Theorem#

Monotone Convergence Theorem

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space and let {fi}i=1\{f_i\}_{i=1}^\infty be measurable functions from Ω\Omega to [,][-\infty, \infty] such that fiff_i \uparrow f μ\mu-a.e. and f1dμ>\int f_1\mathrm{d}\mu > -\infty. Then fidμfdμ\int f_i\mathrm{d}\mu \uparrow \int f\mathrm{d}\mu. (This means that we can swap the limit and the integral)

Proof

First, we need to check that ff is measurable. For cRc\in\mathbb{R}, we consider the sets (c,](c, \infty], which generate the Borel σ\sigma-Field. Since fiff_i\uparrow f, f1((c,])=i=1fi1((c,])f^{-1}((c, \infty]) = \bigcup_{i=1}^\infty f_i^{-1}((c, \infty]) and fi1((c,])Ff_i^{-1}((c, \infty]) \in \mathcal{F}, thus ff is measurable. (by the first cool fact about measurable functions)

Next, assume that f10f_1 \geq 0, and for each fif_i we take simple functions gijg_{ij} such that gijfig_{ij} \uparrow f_i as jj\rightarrow\infty. Thus, by MCT for Simple Functions, gijdμfidμ\int g_{ij}\mathrm{d}\mu\uparrow\int f_i\mathrm{d}\mu. Furthermore, let gi=max{g1i,,gii}g_i^* = \max\{g_{1i}, \cdots , g_{ii}\}. These gig_i^* are simple functions and gifg_i^*\uparrow f. Once again, MCT for Simple Functions implies that gidμfdμ\int g_i^*\mathrm{d}\mu\uparrow\int f\mathrm{d}\mu. But since gifig_i^* \leq f_i by construction, gidμfidμfdμ\int g_i^*\mathrm{d}\mu \leq \int f_i\mathrm{d}\mu \leq \int f\mathrm{d}\mu. Thus, fidμfdμ\int f_i\mathrm{d}\mu\uparrow\int f\mathrm{d}\mu. This means we are done for non-negative functions (f10f_1\geq 0).

Now, we assume that f0f \leq 0. In this case fiff_i \uparrow f implies that fif-f_i \downarrow -f. Let h=fh=-f and hi=fih_i=-f_i, so 0hdμhidμ0\leq \int h\mathrm{d}\mu\leq \int h_i\mathrm{d}\mu. Next, note that 0h1hih1h0\leq h_1-h_i\uparrow h_1-h. Applying the above result gives that (h1hi)dμ(h1h)dμ\int (h_1-h_i)\mathrm{d}\mu\uparrow \int (h_1-h)\mathrm{d}\mu. Since all of the hh have finite integrals (as f1dμ>\int f_1\mathrm{d}\mu > -\infty), we are allowed to subtract to get that hidμhdμ\int h_i\mathrm{d}\mu\downarrow\int h\mathrm{d}\mu and thus fidμfdμ\int f_i\mathrm{d}\mu\uparrow\int f\mathrm{d}\mu.

For a general function f=f+ff = f^+ - f^-, we have fi+f+f_i^+\uparrow f^+ and fiff_i^-\downarrow f^- and fdμ<\int f^-\mathrm{d}\mu < \infty. So by the above special cases, fi+dμf+dμ\int f_i^+\mathrm{d}\mu\uparrow\int f^+\mathrm{d}\mu and fidμfdμ\int f_i^-\mathrm{d}\mu\downarrow\int f^-\mathrm{d}\mu. Finally, fidμfdμ\int f_i\mathrm{d}\mu \uparrow \int f\mathrm{d}\mu. \square

We only require fiff_i \uparrow f to hold almost everywhere to establish the result. Hence convergence can fail on a set of measure (probability) zero and we still have convergence of the integrals.

Secondly, we can redo the above proof for fiff_i \downarrow f with f1dμ<\int f_1\mathrm{d}\mu < \infty to get the analogous result for decreasing sequences.
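A concrete MCT instance on a countable discrete space; the counting measure on the naturals and the function f(k)=2kf(k)=2^{-k} below are our own example, where integrals reduce to sums.

```python
# Sketch: Monotone Convergence on the naturals with counting measure.
# f_n = f * 1_{{0,...,n}} increases pointwise to f, and the integrals
# (partial sums) increase to the full sum, which equals 2 for f(k) = 2^{-k}.

f = lambda k: 2.0 ** (-k)

def integral_fn(n):
    # integral of f_n with respect to counting measure = n-th partial sum
    return sum(f(k) for k in range(n + 1))

vals = [integral_fn(n) for n in range(30)]
# vals increases and converges to sum_{k >= 0} 2^{-k} = 2
```

Swapping limit and integral here is exactly the familiar fact that a monotone series can be summed term by term.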

Fatou’s Lemma #

Fatous’ Lemma

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space and let {fi}i=1\{f_i\}_{i=1}^\infty be non-negative measurable functions from Ω\Omega to [0,][0, \infty]. Then, lim inffidμlim inffidμ\int \liminf f_i\mathrm{d}\mu \leq \liminf \int f_i\mathrm{d}\mu.

Proof

Recall that lim infifi=supjinfijfi\liminf_{i\rightarrow\infty} f_i = \sup_j \inf_{i\geq j} f_i. Hence, let gj=infijfig_j=\inf_{i\geq j} f_i. Then, gjlim infifig_j\uparrow \liminf_{i\rightarrow\infty} f_i and g10g_1\geq 0 by assumption, so the Monotone Convergence Theorem says that gjdμlim infifidμ\int g_j\mathrm{d}\mu\uparrow\int\liminf_{i\rightarrow\infty}f_i\mathrm{d}\mu. By construction, gjfig_j\leq f_i for any iji\geq j, thus gjdμfidμ\int g_j\mathrm{d}\mu \leq \int f_i\mathrm{d}\mu for any iji\geq j and subsequently, gjdμinfijfidμ\int g_j\mathrm{d}\mu\leq \inf_{i\geq j}\int f_i\mathrm{d}\mu. Taking jj\rightarrow\infty gives limjgjdμ=lim inffidμlim inffidμ\lim_{j\rightarrow\infty}\int g_j\mathrm{d}\mu = \int\liminf f_i\mathrm{d}\mu\leq\liminf\int f_i\mathrm{d}\mu. \square
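The inequality in Fatou's Lemma can be strict, as the standard "moving bump" example shows; the specific bump fi=i1(0,1/i)f_i = i\,\mathbf{1}_{(0,1/i)} on [0,1][0, 1] with Lebesgue measure is our own illustrative choice.

```python
# Sketch: f_i = i * 1_{(0, 1/i)}. Pointwise, liminf f_i = 0 (the bump slides
# off every fixed x > 0), yet each integral equals i * (1/i) = 1, so
# ∫ liminf f_i dmu = 0 < 1 = liminf ∫ f_i dmu.

def f(i, x):
    return float(i) if 0.0 < x < 1.0 / i else 0.0

# exact Lebesgue integrals: height i times interval length 1/i
integrals = [i * (1.0 / i) for i in range(1, 101)]

# liminf at a few points x > 0: f_i(x) hits 0 once 1/i <= x
pointwise_liminf = [min(f(i, x) for i in range(1, 1001)) for x in (0.01, 0.5, 0.99)]
```

The mass escapes to a thinner and thinner set, which is exactly the behavior Fatou's one-sided inequality tolerates.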

Dominated Convergence Theorem#

Dominated Convergence Theorem

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space and let {fi}i=1\{f_i\}_{i=1}^\infty and gg be measurable functions and absolutely integrable. If fig|f_i| \leq g for all ii and fi(ω)f(ω)f_i(\omega) \rightarrow f(\omega) for each ωΩ\omega \in \Omega (i.e. pointwise convergence), then ff is absolutely integrable and fidμfdμ\int f_i\mathrm{d}\mu \rightarrow \int f\mathrm{d}\mu.

Proof

Let fi=infjifjf_i^\wedge = \inf_{j\geq i} f_j and fi=supjifjf_i^\vee = \sup_{j\geq i} f_j. Then fififif_i^\wedge \leq f_i \leq f_i^\vee. We have that fiff_i^\wedge\uparrow f and f1dμgdμ>\int f_1^\wedge\mathrm{d}\mu\geq -\int g\mathrm{d}\mu > -\infty. So the Monotone Convergence Theorem implies that fidμfdμ\int f_i^\wedge\mathrm{d}\mu\uparrow \int f\mathrm{d}\mu.

Doing the same for fif_i^\vee, we have that fiff_i^\vee\downarrow f and hence that fidμfdμ\int f_i^\vee\mathrm{d}\mu\downarrow\int f\mathrm{d}\mu. Since fidμfidμfidμ\int f_i^\wedge\mathrm{d}\mu\leq \int f_i\mathrm{d}\mu\leq \int f_i^\vee\mathrm{d}\mu, we have the desired result that fidμfdμ\int f_i\mathrm{d}\mu\rightarrow \int f\mathrm{d}\mu. \square

Lebesgue-Stieltjes Measure#

Let (X,X)(\mathbb{X}, \mathcal{X}) and (Y,Y)(\mathbb{Y}, \mathcal{Y}) be two measurable spaces. Let ψ:XY\psi : \mathbb{X}\rightarrow\mathbb{Y} be a measurable function and μ\mu be a measure on X\mathcal{X}. Then we can define ν=μψ1\nu = \mu \circ \psi^{-1} to be a measure on Y\mathcal{Y}. This allows us to turn Lebesgue measure into Lebesgue-Stieltjes measures.

Theorem

Let F:RRF : \mathbb{R} \rightarrow \mathbb{R} be non-constant, right-continuous, and non-decreasing. Then, there exists a unique measure dF\mathrm{d}F on R\mathbb{R} such that for all a,bRa, b \in \mathbb{R} with a<ba < b,

dF((a,b])=F(b)F(a)\mathrm{d}F((a, b]) = F(b) - F(a)
Proof

Let F()=limxF(x)F(\infty) = \lim_{x\rightarrow\infty} F(x) and F()=limxF(x)F(-\infty) = \lim_{x\rightarrow-\infty}F(x). We define an open interval I=(F(),F())I = (F(-\infty), F(\infty)) and define g(y)=inf{xR:yF(x)}g(y) = \inf\{x\in\mathbb{R} : y\leq F(x)\}. We want to define dF\mathrm{d}F to be λg1\lambda \circ g^{−1} where λ\lambda is Lebesgue Measure on R\mathbb{R}, so we need to show that this makes sense.

We first show that gg is left-continuous and non-decreasing, and that for yIy \in I and xRx \in \mathbb{R}, g(y)xg(y) \leq x if and only if yF(x)y \leq F(x). To show this, fix yIy\in I and let Jy={xR:yF(x)}J_y = \{x\in\mathbb{R} : y\leq F(x)\}. As FF is non-decreasing, if xJyx\in J_y and xxx'\geq x then xJyx'\in J_y. As FF is right-continuous, if xnJyx_n\in J_y and xnxx_n\downarrow x then xJyx\in J_y. Therefore, Jy=[g(y),)J_y = [g(y), \infty) and g(y)xg(y)\leq x if and only if yF(x)y\leq F(x). Secondly, for yyy\leq y', we have that JyJyJ_{y'}\subseteq J_y and thus g(y)g(y)g(y)\leq g(y'). So if ynyy_n\uparrow y, then Jy=n=1JynJ_y = \bigcap_{n=1}^\infty J_{y_n} and thus g(yn)g(y)g(y_n)\rightarrow g(y), which shows that gg is left-continuous and non-decreasing.

From the above, gg is Borel Measurable and thus defining dF=λg1\mathrm{d}F = \lambda \circ g^{-1} gives us that

dF((a,b])=λ({y:g(y)>a,g(y)b})=λ((F(a),F(b)])=F(b)F(a)\mathrm{d}F((a, b]) = \lambda(\{y : g(y) > a, g(y) \leq b\}) = \lambda((F(a), F(b)]) = F(b) - F(a)

Furthermore, this measure, dF\mathrm{d}F, is unique by using the same arguments as before for Lebesgue Measure. \square

In the case that F:R[0,1]F : \mathbb{R} \rightarrow [0, 1] with interval I=(0,1)I = (0, 1), we have a cumulative distribution function, which induces a measure on the real line. This allows us to do things like integrate with respect to such measures, i.e. take an expectation.
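The pushforward construction dF=λg1\mathrm{d}F = \lambda \circ g^{-1} can be checked numerically; the exponential CDF, the interval (a,b](a, b], and the grid size below are our own choices.

```python
import math

# Sketch: for F(x) = 1 - e^{-x}, the generalized inverse on (0, 1) is
# g(y) = inf{x : y <= F(x)} = -log(1 - y). Then dF((a, b]) should equal the
# Lebesgue measure of {y in (0, 1) : a < g(y) <= b}, which we estimate as
# the fraction of a fine y-grid landing in that set, and compare to F(b) - F(a).

F = lambda x: 1.0 - math.exp(-x)      # Exp(1) CDF
g = lambda y: -math.log(1.0 - y)      # its generalized inverse on (0, 1)

a, b = 0.5, 2.0
M = 200_000
count = sum(1 for k in range(1, M) if a < g(k / M) <= b)
estimate = count / M                  # approximates lambda({y : a < g(y) <= b})
exact = F(b) - F(a)
```

This is the same mechanism behind inverse-transform sampling: a uniform variable pushed through gg has distribution dF\mathrm{d}F.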

Radon Measure#

Definition

Let (Ω,B,μ)(\Omega, \mathcal{B}, \mu) be a measure space where B\mathcal{B} is the Borel σ\sigma-field. The measure μ\mu is said to be a Radon Measure if μ(K)<\mu(K) < \infty for all compact KBK \in \mathcal{B}.

  • dF\mathrm{d}F is a Radon Measure.

  • Every non-zero Radon Measure on B(R)\mathcal{B}(\mathbb{R}) can be written as dF=λg1\mathrm{d}F = \lambda \circ g^{−1} for some FF.

    If μ\mu is a Radon Measure on R\mathbb{R}, then we can define FF as

    F(x)={μ((0,x])if x>00if x=0μ((x,0])if x<0F(x) = \begin{cases} \mu((0, x]) & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -\mu((x, 0]) & \text{if } x < 0 \end{cases}

    Thus, F(b)F(a)=μ((a,b])F(b) − F(a) = \mu((a, b]) for a<ba < b and hence μ=dF\mu = \mathrm{d}F by uniqueness.

Product Measure#

Definition

Given two σ\sigma-fields X\mathcal{X} and Y\mathcal{Y}, we define the product σ\sigma-field to be X×Y\mathcal{X} \times \mathcal{Y}, the σ\sigma-field generated by the rectangles A×BA\times B where AXA\in\mathcal{X} and BYB\in\mathcal{Y}. The collection of all rectangles will be denoted as R\mathcal{R}.

Monotone Class#

Definition

A collection of subsets M\mathcal{M} of Ω\Omega is said to be monotone if

  1. for {Ai}i=1\{A_i\}_{i=1}^\infty s.t. AiMA_i \in \mathcal{M} and AiA=i=1AiA_i \uparrow A = \bigcup_{i=1}^\infty A_i, then AMA \in \mathcal{M}.
  2. for {Ai}i=1\{A_i\}_{i=1}^\infty s.t. AiMA_i \in \mathcal{M} and AiA=i=1AiA_i \downarrow A = \bigcap_{i=1}^\infty A_i, then AMA \in \mathcal{M}.

Note that if a field A\mathcal{A} is also monotone, then it is a σ\sigma-field (by definition of σ\sigma-field.)

Monotone Class Theorem

Let A\mathcal{A} be a field and M\mathcal{M} be monotone such that AM\mathcal{A}\subset\mathcal{M}. Then, σ(A)M\sigma(\mathcal{A}) \subseteq \mathcal{M}

Proof

This proof is similar to that of the Dynkin π\pi-λ\lambda Theorem.

Existence and Uniqueness of Product Measure#

Existence and Uniqueness Theorem of Product Measure

Let (X,X,μ)(\mathbb{X}, \mathcal{X} , \mu) and (Y,Y,ν)(\mathbb{Y}, \mathcal{Y}, \nu) be σ\sigma-finite measure spaces. We denote the product σ\sigma-Field by X×Y\mathcal{X} \times \mathcal{Y}. (the cartesian product of two σ\sigma-Fields may not be a σ\sigma-Field) Let π\pi be a set function on X×Y\mathcal{X}\times\mathcal{Y} such that for AXA \in \mathcal{X} and BYB \in \mathcal{Y}, π(A×B)=μ(A)ν(B)\pi(A \times B) = \mu(A)\nu(B). Then, π\pi extends uniquely to a measure on (X×Y,X×Y)(\mathbb{X} \times \mathbb{Y}, \mathcal{X}\times\mathcal{Y}) such that for any EX×YE \in \mathcal{X} \times \mathcal{Y},

π(E)=1E(x,y)dμ(x)dν(y)=1E(x,y)dν(y)dμ(x)\pi(E) = \int\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \int\int \mathbf{1}_E(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)
Lemma

Let (X,X,μ)(\mathbb{X}, \mathcal{X} , \mu) and (Y,Y,ν)(\mathbb{Y}, \mathcal{Y}, \nu) be finite measure spaces, and let

F={EX×Y:1E(x,y)dμ(x)dν(y)=1E(x,y)dν(y)dμ(x)}\mathcal{F} = \left\{E \subset \mathbb{X}\times\mathbb{Y} : \int\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \int\int \mathbf{1}_E(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)\right\}

Then X×YF\mathcal{X}\times\mathcal{Y} \subseteq \mathcal{F}. ( This means that (X×Y,X×Y)(Y×X,Y×X) (\mathbb{X}\times\mathbb{Y}, \mathcal{X}\times\mathcal{Y}) \equiv (\mathbb{Y}\times\mathbb{X}, \mathcal{Y}\times\mathcal{X})~)

Proof

Let E=A×BE = A \times B for AXA \in \mathcal{X} and BYB \in \mathcal{Y}, i.e. ERE \in \mathcal{R}. Then,

1E(x,y)dμ(x)dν(y)=μ(A)1B(y)dν(y)=μ(A)ν(B)=ν(B)μ(A)=ν(B)1A(x)dμ(x)=1E(x,y)dν(y)dμ(x)\begin{aligned} \int\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) &=\mu(A) \int \mathbf{1}_B(y)\mathrm{d}\nu(y) =\mu(A)\nu(B) =\nu(B)\mu(A)\\ &=\nu(B) \int \mathbf{1}_A(x)\mathrm{d}\mu(x) =\int\int \mathbf{1}_E(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) \end{aligned}

Therefore, RF\mathcal{R} \subset \mathcal{F}.

Also, for disjoint R1,R2RR_1, R_2 \in \mathcal{R}, 1R1R2=1R1+1R2\mathbf{1}_{R_1\cup R_2} = \mathbf{1}_{R_1} + \mathbf{1}_{R_2}. Hence, F\mathcal{F} contains finite disjoint unions of elements of R\mathcal{R}, which implies that the field A\mathcal{A} generated by the set of rectangles satisfies AF\mathcal{A} \subset \mathcal{F}. (Dudley 3.2.3)

Next, consider {Ei}i=1\{E_i\}_{i=1}^\infty with EiFE_i\in\mathcal{F}. If EiEE_i\uparrow E, then MCT says

1Ei(x,y)dμ(x)dν(y) 1E(x,y)dμ(x)dν(y)\int\int \mathbf{1}_{E_i}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) ~\big\uparrow \int\int \mathbf{1}_{E}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y)

and

1Ei(x,y)dν(y)dμ(x) 1E(x,y)dν(y)dμ(x)\int\int \mathbf{1}_{E_i}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) ~\big\uparrow \int\int \mathbf{1}_{E}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)

Thus, EFE \in \mathcal{F} and the same holds if EiEE_i \downarrow E. Therefore, F\mathcal{F} is a monotone class. Finally, applying the Monotone Class Theorem shows that X×Y=σ(A)F\mathcal{X} \times \mathcal{Y} = \sigma(\mathcal{A}) \subset \mathcal{F}. \square

Proof : Existence and Uniqueness of Product Measure

First, we consider the case that μ\mu and ν\nu are finite measures and π(A×B)=μ(A)ν(B)\pi(A\times B) = \mu(A)\nu(B). Then we extend to a set function

π(E):=1E(x,y)dμ(x)dν(y)\pi(E)\vcentcolon=\int\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y)

for any EX×YE\in \mathcal{X}\times\mathcal{Y}. The above lemma says that the definition makes sense and the order of integration can be reversed for any EX×YE \in \mathcal{X} \times \mathcal{Y}. Linearity of the integral implies that π\pi is finitely additive, and applying MCT shows that π\pi is countably additive. Thus, π\pi is a measure on X×Y\mathcal{X} \times \mathcal{Y}.

To show that π\pi is unique, let ρ\rho be some other set function such that ρ(A×B)=μ(A)ν(B)\rho(A \times B) = \mu(A)\nu(B) for A×BRA \times B \in \mathcal{R}. Let M={EX×Y:π(E)=ρ(E)}\mathcal{M} = \{E \subset \mathbb{X} \times \mathbb{Y} : \pi(E) = \rho(E)\}. Then, M\mathcal{M} is a monotone class because for EiE=i=1EiE_i \uparrow E =\bigcup_{i=1}^\infty E_i, we can rewrite E=i=1DiE = \bigcup_{i=1}^\infty D_i where D1=E1D_1 = E_1 and Di=EiEi1D_i = E_i \setminus E_{i−1} for i2i \geq 2 are disjoint. So by countable additivity, π(E)=ρ(E)\pi(E)=\rho(E) and we can do the same for EiEE_i \downarrow E. Thus, by Monotone Class Theorem X×YM\mathcal{X} \times \mathcal{Y} \subseteq \mathcal{M}. Therefore, π\pi is unique on X×Y\mathcal{X} \times \mathcal{Y} for finite measures μ\mu and ν\nu.

Now let μ\mu and ν\nu be σ\sigma-finite measures. Let {Ai}i=1\{A_i\}_{i=1}^\infty and {Bi}i=1\{B_i\}_{i=1}^\infty be disjoint partitions of X\mathbb{X} and Y\mathbb{Y}, respectively, such that μ(Ai)<\mu(A_i)<\infty and ν(Bi)<\nu(B_i)<\infty. Then, for any EX×YE \in \mathcal{X} \times \mathcal{Y}, we define Eij=E(Ai×Bj)E_{ij} = E\cap (A_i\times B_j). From the above finite measure case,

1Eij(x,y)dμ(x)dν(y)=1Eij(x,y)dν(y)dμ(x)\int\int \mathbf{1}_{E_{ij}}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \int\int \mathbf{1}_{E_{ij}}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)

Sum over all ii and jj and apply MCT again to get

π(E)=1E(x,y)dμ(x)dν(y)=1E(x,y)dν(y)dμ(x)\pi(E) = \int\int \mathbf{1}_{E}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \int\int \mathbf{1}_{E}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)

Furthermore, monotone convergence implies that π\pi is countably additive and hence a measure on X×Y\mathcal{X}\times\mathcal{Y}. For any other measure ρ\rho such that ρ(A×B)=μ(A)ν(B)\rho(A \times B) = \mu(A)\nu(B), countable additivity and uniqueness for finite measures imply that

π(E)=i,jπ(Ei,j)=i,jρ(Ei,j)=ρ(E)\pi(E) = \sum_{i,j}\pi(E_{i,j}) = \sum_{i,j}\rho(E_{i,j}) = \rho(E)

Hence, the extension of π\pi to X×Y\mathcal{X} \times\mathcal{Y} is unique. \square

Fubini-Tonelli#

Fubini-Tonelli Theorem

Let (X,X,μ)(\mathbb{X}, \mathcal{X} , \mu) and (Y,Y,ν)(\mathbb{Y}, \mathcal{Y}, \nu) be σ\sigma-finite measure spaces, and let f:X×YRf : \mathbb{X} \times \mathbb{Y} \rightarrow \mathbb{R} be measurable with respect to X×Y\mathcal{X} \times \mathcal{Y} such that either f0f\geq 0 (non-negative) or fd(μ×ν)<\int\int |f|\mathrm{d}(\mu\times\nu)<\infty (absolutely integrable). Then,

fd(μ×ν)=f(x,y)dμ(x)dν(y)=f(x,y)dν(y)dμ(x)\int\int f\mathrm{d}(\mu\times\nu) = \int\int f(x,y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \int\int f(x,y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)

Also, f(x,y)dμ(x)\int f(x, y)\mathrm{d}\mu(x) is Y\mathcal{Y}-measurable and f(x,y)dν(y)\int f(x, y)\mathrm{d}\nu(y) is X\mathcal{X}-measurable.

Proof

We have this for indicator functions from Existence and Uniqueness Theorem of Product Measure and thus the result for simple functions as integrals are linear.

Then, applying MCT to simple functions gives us that the above holds for non-negative measurable functions.

Instead, assume fd(μ×ν)<\int\int|f| \mathrm{d}(\mu\times\nu) < \infty. Then, we can write f=f+ff = f^+ - f^- and the above holds for f+f^+ and ff^- separately. i.e. f+(x,y)dμ(x)<\int f^+(x, y)\mathrm{d}\mu(x) < \infty for ν\nu-almost-every yy and f+(x,y)dν(y)<\int f^+(x, y)\mathrm{d}\nu(y) < \infty for μ\mu-almost-every xx and similarly for ff^-. Therefore,

f(x,y)dν(y)=f+(x,y)dν(y)+f(x,y)dν(y)<   (μ-a.e.)\int |f|(x, y)\mathrm{d}\nu(y) = \int f^+(x, y)\mathrm{d}\nu(y) + \int f^-(x, y)\mathrm{d}\nu(y) < \infty ~~~(\mu\text{-a.e.})

and thus,

f(x,y)dν(y)=f+(x,y)dν(y)f(x,y)dν(y)   (μ-a.e.)\int f(x, y)\mathrm{d}\nu(y) = \int f^+(x, y)\mathrm{d}\nu(y) - \int f^-(x, y)\mathrm{d}\nu(y) ~~~(\mu\text{-a.e.})

As we only require finiteness to occur almost everywhere to have the integral exist, we can integrate both sides of the above with respect to μ\mu to get

f(x,y)dν(y)dμ(x)=f+(x,y)dν(y)dμ(x)f(x,y)dν(y)dμ(x)\int\int f(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) = \int\int f^+(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) - \int\int f^-(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)

Do the same swapping μ\mu and ν\nu to conclude the theorem. \square

The above theorem lets us swap the order of integration for the product of two measure spaces. This can be extended by induction to the finite product of nn measure spaces.
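In the simplest setting the theorem is just the fact that a finite double sum can be computed in either order; the two discrete measures and the function ff below are our own toy example.

```python
# Sketch: Fubini-Tonelli for two finite discrete measures, where each
# integral is a finite weighted sum and the order of summation is swapped.

X = [0, 1, 2]
Y = [0, 1]
mu = {0: 0.2, 1: 0.5, 2: 0.3}
nu = {0: 0.6, 1: 0.4}
f = lambda x, y: (x + 1) * (y + 2)

# integrate over x first, then over y
dnu_last = sum(nu[y] * sum(f(x, y) * mu[x] for x in X) for y in Y)
# integrate over y first, then over x
dmu_last = sum(mu[x] * sum(f(x, y) * nu[y] for y in Y) for x in X)
# both orders give the same value of the double integral
```

For infinite sums the same swap is justified by the theorem's non-negativity or absolute-integrability hypothesis, which is what fails in the classic counterexamples.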

Probability Theory#

LpL^p Spaces#

LpL^p Space

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space and f:Ω[,]f : \Omega \rightarrow [−\infty, \infty] a measurable function, then we say fLp(Ω,F,μ)f \in L^p(\Omega,\mathcal{F}, \mu), for 1p<1 \leq p < \infty if

fpdμ<\int |f|^p d\mu < \infty

For p=p = \infty, we say fL(Ω,F,μ)f \in L^\infty(\Omega,\mathcal{F}, \mu) if inf{t[,]:ft  μ-a.e.}<\inf \{t\in [-\infty, \infty] : |f| \leq t ~~\mu\text{-a.e.}\} < \infty

This definition allows us to write down the LpL^p norm, which is defined as:

fp=(fpdμ)1/pfor1p<f=inf{t[,]:ft  μ-a.e.}=inf{t[,]:μ({f(x)>t})=0}\begin{aligned} ||f||_p &= \left(\int |f|^p d\mu\right)^{1/p} \quad \text{for} \quad 1 \leq p < \infty \\ ||f||_\infty &= \inf \{t\in [-\infty, \infty] : |f| \leq t ~~\mu\text{-a.e.}\} \\ &= \inf \{t\in [-\infty, \infty] : \mu(\{|f(x)| > t\}) = 0\} \end{aligned}
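These norms are easy to compute on a finite measure space; the four-point space, weights, and ff below are our own example, chosen so the essential supremum visibly differs from the plain supremum.

```python
# Sketch of the L^p norms on a finite measure space. The p = infinity case is
# an essential supremum: the point 'd' carries zero measure, so it is ignored
# even though |f| is largest there.

omega = ['a', 'b', 'c', 'd']
mu = {'a': 0.4, 'b': 0.4, 'c': 0.2, 'd': 0.0}

def lp_norm(f, p):
    if p == float('inf'):
        # essential sup: points of measure zero do not count
        return max(abs(f[w]) for w in omega if mu[w] > 0)
    return sum(abs(f[w]) ** p * mu[w] for w in omega) ** (1.0 / p)

f = {'a': 1.0, 'b': -2.0, 'c': 3.0, 'd': 100.0}
# lp_norm(f, 1) = 1.8, while the essential sup is 3, not 100
```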

Markov / Chebyshev and Jensen’s Inequalities#

Markov’s Inequality

Let ff be a non-negative measurable function and t>0t > 0. Then, denoting {f>t}:={ωΩ:f(ω)>t}\{f > t\} \vcentcolon= \{\omega \in \Omega : f(\omega) > t\},

μ({f>t})t1fdμ\mu(\{f > t\}) \leq t^{-1} \int f \mathrm{d}\mu
Proof

Noting that t1{f>t}ft\mathbf{1}_{\{f>t\}} \leq f, then by monotonicity of the integral,

tμ({f>t})=t1{f>t}dμfdμt\mu(\{f > t\}) = \int t\mathbf{1}_{\{f>t\}} \mathrm{d}\mu \leq \int f \mathrm{d}\mu

which proves the theorem. \square

There are two useful ways to use Markov’s inequality:

  1. Chebyshev’s Inequality: For ff measurable and mRm \in \mathbb{R}, μ({fm>t})=μ({(fm)2>t2})t2(fm)2dμ\mu(\{|f - m| > t\}) = \mu(\{(f - m)^2 > t^2\})\leq t^{-2} \int (f - m)^2 \mathrm{d}\mu. In probability notation, taking f=Xf = X and m=E[X]m = \mathbb{E}[X] gives P(XE[X]>t)Var(X)t2P(|X - \mathbb{E}[X]| > t) \leq \frac{\text{Var}(X)}{t^2}
  2. Chernoff’s Inequality: For ff measurable and m>0m > 0, μ({f>t})etmemfdμ\mu(\{f > t\}) \leq e^{-tm} \int e^{mf} \mathrm{d}\mu. In probability theory, the right hand side becomes the moment generating function or the Laplace transform.
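The three bounds can be compared on a concrete distribution; the choice of Binomial(20,1/2)(20, 1/2), the threshold 1515, and the grid over the Chernoff parameter mm are all our own.

```python
import math

# Sketch: exact tail P(X >= 15) for X ~ Binomial(20, 0.5) versus the Markov,
# Chebyshev, and Chernoff bounds. Every value of m > 0 gives a valid Chernoff
# bound, so minimizing over a grid of m still yields a valid bound.

n, p, t = 20, 0.5, 15
mean, var = n * p, n * p * (1 - p)

pmf = lambda k: math.comb(n, k) * p ** k * (1 - p) ** (n - k)
tail = sum(pmf(k) for k in range(t, n + 1))           # exact P(X >= 15)

markov = mean / t                                      # E[X] / t
chebyshev = var / (t - mean) ** 2                      # Var(X) / (t - E[X])^2
chernoff = min(
    math.exp(-m * t) * (1 - p + p * math.exp(m)) ** n  # e^{-mt} E[e^{mX}]
    for m in [0.1 * j for j in range(1, 30)]
)
# for this tail: exact < Chernoff < Chebyshev < Markov
```

The ordering reflects how much of the distribution each bound uses: the mean only, the variance, and the full moment generating function respectively.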
Convex Functions on R\mathbb{R}

Let IRI \subset \mathbb{R} be an interval. A function ϕ:IR\phi : I \rightarrow \mathbb{R} is convex if for all t[0,1]t \in [0, 1] and all x,yIx, y \in I,

ϕ(tx+(1t)y)tϕ(x)+(1t)ϕ(y)\phi(tx + (1 − t)y) \leq t\phi(x) + (1 − t)\phi(y)
Jensen’s Inequality

Let (Ω,F,P)(\Omega, \mathcal{F}, P) be a probability space and XX an integrable random variable ( i.e. a measurable function in L1(Ω,F,P) L^1(\Omega, \mathcal{F}, P)~) such that X:ΩIRX : \Omega \rightarrow I \subset \mathbb{R}. For any convex ϕ:IR\phi : I \rightarrow \mathbb{R},

ϕ(XdP)ϕ(X)dP\phi(\int X \mathrm{d}P) \leq \int \phi(X) \mathrm{d}P

which is ϕ(E[X])E[ϕ(X)]\phi(\mathbb{E}[X]) \leq \mathbb{E}[\phi(X)]

Proof

For some cIc \in I, if X=cX = c, PP-a.e., then the result is immediate. Otherwise, let m=E[X]m = \mathbb{E}[X] be the mean of XX, which lies in the interior of the interval II. Then, we can choose a,bRa, b \in \mathbb{R} such that ϕ(x)ax+b\phi(x) \geq ax + b for all xIx \in I with equality at x=mx = m. Then, ϕ(X)aX+b\phi(X) \geq aX + b, and

ϕ(E[X])=am+b=E[aX+b]E[ϕ(X)]\phi(\mathbb{E}[X]) = am + b = \mathbb{E}[aX + b] \leq \mathbb{E}[\phi(X)]

Lastly, to check that E[ϕ(X)]\mathbb{E}[\phi(X)] is well defined (i.e. not \infty-\infty), we note that ϕ(X)aX+b\phi(X) \geq aX + b implies ϕ(X)aX+baX+b\phi(X)^- \leq |aX + b| \leq |a||X| + |b|. Hence, E[ϕ(X)]aEX+b<\mathbb{E}[\phi(X)^-] \leq |a|\mathbb{E}|X| + |b| < \infty. \square
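A quick check of Jensen's inequality with the convex function ϕ(x)=x2\phi(x) = x^2; the small discrete distribution below is our own example, and the gap between the two sides is exactly Var(X)\text{Var}(X).

```python
# Sketch: phi(E[X]) <= E[phi(X)] for phi(x) = x^2 on a four-point
# distribution; the difference E[X^2] - (E[X])^2 is the variance.

xs = [-1.0, 0.0, 2.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.2]

mean = sum(x * q for x, q in zip(xs, probs))      # E[X] = 1.5
phi = lambda x: x * x
lhs = phi(mean)                                    # phi(E[X]) = 2.25
rhs = sum(phi(x) * q for x, q in zip(xs, probs))   # E[phi(X)] = 6.3
```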

Hölder and Minkowski’s Inequalities#

Hölder’s Inequality

Let p,q[1,]p, q \in [1, \infty] be conjugate indices ( i.e. 1p+1q=1\frac{1}{p} + \frac{1}{q} = 1 ) and ff and gg be measurable, then fg1fpgq||fg||_1 \leq ||f||_p||g||_q

Proof

If either fp=0,||f||_p = 0, \infty or gq=0,||g||_q = 0, \infty, then the result is immediate. Hence, for ff such that 0<fp<0 < ||f||_p < \infty, we can normalize and without loss of generality assume that fp=1||f||_p = 1. Then, we can define a probability measure PP on F\mathcal{F} such that for any AFA \in \mathcal{F},

P(A):=AfpdμP(A) \vcentcolon= \int_A |f|^p\mathrm{d}\mu

Then, using Jensen’s Inequality with ϕ(x)=xq\phi(x) = x^q and q(p1)=pq(p − 1) = p,

fg1=fgdμ=gfp11f>0fpdμprob measure dν[(gfp11f>0)qfpdμ]1q[gqdμ]1q=fpgq\begin{aligned} ||fg||_1 &= \int |fg| \mathrm{d}\mu = \int \frac{|g|}{|f|^{p - 1}}\mathbf{1}_{|f| > 0}\underbrace{|f|^p\mathrm{d}\mu}_{\color{#b984df}\text{prob measure }\mathrm{d}\nu} \\ &\leq \left[\int \left(\frac{|g|}{|f|^{p-1}}\mathbf{1}_{|f| > 0}\right)^q|f|^p\mathrm{d}\mu\right]^{\frac{1}{q}} \\ &\leq \left[\int |g|^q\mathrm{d}\mu\right]^\frac{1}{q} = ||f||_p||g||_q \end{aligned}

From Hölder’s inequality with p=q=2p=q=2, we can derive Cauchy-Schwarz Inequality:

Cauchy-Schwarz Inequality

For measurable ff and gg, fg1f2g2||fg||_1\leq||f||_2||g||_2

Minkowski’s Inequality

Let p[1,]p \in [1, \infty] and ff, gg be measurable functions, then f+gpfp+gp||f+g||_p\leq ||f||_p + ||g||_p

Proof

If either fp=||f||_p = \infty or gp=||g||_p = \infty or f+gp=0||f+g||_p=0, then we are done. If p=1p=1, then (f+g)(ω)f(ω)+g(ω)|(f + g)(\omega)| \leq |f(\omega)| + |g(\omega)| and the result follows quickly. For p(1,]p\in (1, \infty] and p,qp, q conjugates, we note that

f+gp1q=[f+g(p1)qdμ]1q=[f+gpdμ]p1p=f+gpp1|||f+g|^{p-1}||_q = \left[\int |f+g|^{(p-1)q}\mathrm{d}\mu\right]^\frac{1}{q} = \left[\int|f+g|^p\mathrm{d}\mu\right]^{\frac{p-1}{p}} = ||f + g||^{p-1}_p

Then using the above equality, we have

f+gpp=f+gpdμff+gp1dμ+gf+gp1dμfpf+gp1q+gpf+gp1q=(fp+gp)f+gpp1\begin{aligned} ||f+g||^p_p &= \int |f+g|^p\mathrm{d}\mu \leq \int |f||f+g|^{p-1}\mathrm{d}\mu + \int |g||f+g|^{p-1}\mathrm{d}\mu \\ &\leq ||f||_p|||f+g|^{p-1}||_q + ||g||_p|||f+g|^{p-1}||_q \\ &= (||f||_p + ||g||_p)||f+g||^{p-1}_p \end{aligned}

Dividing both sides by f+gpp1||f + g||^{p-1}_p finishes the proof. \square
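Both inequalities can be verified numerically under counting measure, where the LpL^p norms are ordinary vector pp-norms; the vectors and the conjugate pair p=3p = 3, q=3/2q = 3/2 (since 1/3+2/3=11/3 + 2/3 = 1) are our own choices.

```python
# Sketch: Hölder (||fg||_1 <= ||f||_p ||g||_q) and Minkowski
# (||f + g||_p <= ||f||_p + ||g||_p) for vectors under counting measure.

def norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
p, q = 3.0, 1.5                                       # conjugate exponents

fg_l1 = sum(abs(a * b) for a, b in zip(f, g))         # ||fg||_1
holder_rhs = norm(f, p) * norm(g, q)                  # ||f||_p ||g||_q
mink_lhs = norm([a + b for a, b in zip(f, g)], p)     # ||f + g||_p
mink_rhs = norm(f, p) + norm(g, p)
```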

LpL^p Approximation Theorem

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space, and let A\mathcal{A} be a π\pi-system such that σ(A)=F\sigma(\mathcal{A}) = \mathcal{F} and μ(A)<\mu(A) < \infty for all AAA \in \mathcal{A}, and suppose there exist AiΩA_i \uparrow \Omega with AiAA_i \in \mathcal{A}. Let the collection of simple functions be

V0:={i=1nai1Ai:aiR,AiA,nN}V_0 \vcentcolon= \left\{\sum_{i=1}^n a_i\mathbf{1}_{A_i} : a_i \in \mathbb{R}, A_i\in\mathcal{A}, n\in\mathbb{N}\right\}

For p[1,)p\in[1,\infty), V0LpV_0\subset L^p and for all fLpf\in L^p and all ϵ>0\epsilon > 0, there exists a vV0v\in V_0 such that fvp<ϵ||f-v||_p < \epsilon.

Proof

For any AA,1Ap=(1Adμ)1p=μ(A)1p<A \in \mathcal{A}, ||\mathbf{1}_A||_p = (\int\mathbf{1}_A\mathrm{d}\mu)^\frac{1}{p} = \mu(A)^\frac{1}{p} < \infty. Thus 1ALp\mathbf{1}_A\in L^p for all AAA\in\mathcal{A}. Since LpL^p is a linear space, V0LpV_0 \subset L^p.

Next, let VLpV\subseteq L^p be the set of all fLpf\in L^p that can be approximated as above: for each ϵ>0\epsilon > 0 there is some vV0v\in V_0 with fvp<ϵ||f-v||_p < \epsilon. Let f,gVf,g\in V be approximated by vf,vgv_f, v_g; then by Minkowski’s Inequality,

f+g(vf+vg)pfvfp+gvgp<2ϵ||f+g - (v_f + v_g)||_p \leq ||f - v_f||_p + ||g - v_g||_p < 2\epsilon

Hence, VV is also a linear space.

Now, assume ΩA\Omega\in\mathcal{A} ( i.e. μ(Ω)<\mu(\Omega) < \infty ). Let L={BF:1BV}\mathcal{L} = \{B \in \mathcal{F} : \mathbf{1}_B\in V\}, which we will show is, in fact, a λ\lambda-system. We know that AL\mathcal{A} \subset \mathcal{L} and thus ΩL\Omega \in \mathcal{L}. For A,BLA, B \in\mathcal{L} such that ABA\subseteq B, 1BA=1B1AV\mathbf{1}_{B\setminus A} = \mathbf{1}_B - \mathbf{1}_A \in V since VV is linear, so BALB\setminus A\in\mathcal{L}. Lastly, for {Ai}i=1\{A_i\}_{i=1}^\infty pairwise disjoint with AiLA_i \in \mathcal{L}, let A=i=1AiA = \bigcup_{i=1}^\infty A_i and Bj=i=1jAiB_j = \bigcup_{i=1}^j A_i. Then, BjAB_j \uparrow A and 1A1Bjp=μ(ABj)1p0||\mathbf{1}_A - \mathbf{1}_{B_j}||_p = \mu(A \setminus B_j)^\frac{1}{p} \rightarrow 0. Therefore, ALA \in \mathcal{L}, and thus L\mathcal{L} is a λ\lambda-system. By the Dynkin π\pi-λ\lambda Theorem, FL\mathcal{F}\subset\mathcal{L} and thus 1BV\mathbf{1}_B\in V for any BFB\in \mathcal{F}. Therefore, for any non-negative fLpf\in L^p, we can construct simple functions fn=min{n,2n2nf}f_n = \min\{n, 2^{-n}\lfloor 2^n f\rfloor\} such that fnff_n\uparrow f. Then ffnp0|f - f_n|^p\rightarrow 0 pointwise and ffnpfp|f - f_n|^p \leq |f|^p. Hence, by the Dominated Convergence Theorem, ffnp0||f - f_n||_p\rightarrow 0. Thus fVf \in V and by the linearity of VV, V=LpV = L^p.

Lastly, for general Ω\Omega, we have by assumption a sequence AiΩA_i \uparrow \Omega. Hence, for any fLpf \in L^p, we have that f1AiVf\mathbf{1}_{A_i} \in V and, similarly to above, ff1Aip0|f - f\mathbf{1}_{A_i}|^p \rightarrow 0 pointwise and ff1Aipfp|f - f\mathbf{1}_{A_i}|^p \leq |f|^p. Therefore, ff1Aip0||f - f\mathbf{1}_{A_i}||_p \rightarrow 0 by dominated convergence. Thus, fVf \in V. \square

Convergence in Probability & Measure#

Convergence of Measure#

Weak Convergence of Measure

Let SS be a metric space and S\mathcal{S} be the Borel σ\sigma-Field on SS. Then, for a measure PP and a sequence {Pi}i=1\{P_i\}_{i=1}^\infty, we say that PiP_i converges weakly to PP, i.e. PiPP_i \Rightarrow P, if

fdPifdP\int f \mathrm{d}P_i \rightarrow \int f \mathrm{d}P

for all fCB0(S)f\in \mathscr{C}^0_B(S), the set of all continuous bounded real-valued functions on SS.
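A minimal example of weak convergence, entirely our own choice: point masses Pi=δ1/iP_i = \delta_{1/i} converge weakly to P=δ0P = \delta_0, since integrating a bounded continuous ff against a point mass just evaluates ff at the point.

```python
import math

# Sketch: for P_i = delta_{1/i} and P = delta_0, the integral of any bounded
# continuous f against P_i is f(1/i), which converges to f(0). By contrast,
# P_i({0}) = 0 for every i while P({0}) = 1, so set-wise convergence fails on
# {0}, a set whose boundary carries P-mass.

f = lambda x: math.cos(x) / (1.0 + x * x)   # a bounded continuous test function

integrals = [f(1.0 / i) for i in range(1, 2001)]  # integral of f w.r.t. P_i
limit = f(0.0)                                     # integral of f w.r.t. P
```

The failure on {0}\{0\} is exactly why condition (5) of the Portmanteau Theorem below restricts to sets whose boundary has measure zero.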

Portmanteau Theorem

For PP and PiP_i on a metric space (S,S)(S, \mathcal{S}), the following are equivalent:

  1. PiPP_i \Rightarrow P
  2. fdPifdP\int f\mathrm{d}P_i\rightarrow\int f\mathrm{d}P for all bounded uniformly continuous functions ff
  3. lim supiPi(C)P(C)\limsup_i P_i(C) \leq P(C) for all closed sets CC
  4. lim infiPi(U)P(U)\liminf_i P_i(U) \geq P(U) for all open sets UU
  5. limiPi(A)=P(A)\lim_i P_i(A) = P(A) for all sets ASA\in\mathcal{S} with P(A)=0P(\partial A) = 0
Proof
  • (1)\rightarrow(2) If convergence holds for every fCB0f \in \mathscr{C}^0_B, then it certainly holds for all bounded uniformly continuous ff.

  • (2)\rightarrow(3) For any closed CC and ϵ>0\epsilon > 0, there exists a δ>0\delta > 0 such that for Cδ={xS:d(x,C)<δ}C_\delta = \{x \in S : d(x, C) < \delta\}, we have P(Cδ)<P(C)+ϵP(C_\delta) < P(C) + \epsilon, as CδCC_\delta \downarrow C as δ0+\delta\rightarrow 0^+. Then, we can define an ff such that f=1f = 1 on CC and f=0f = 0 on SCδS \setminus C_\delta. Such an ff is uniformly continuous (by Urysohn’s Lemma) with 0f10 \leq f \leq 1. Then, by (2), we have that

    Pi(C)fdPifdPP(Cδ)<P(C)+ϵP_i(C) \leq \int f\mathrm{d}P_i\rightarrow\int f\mathrm{d}P \leq P(C_\delta) < P(C) + \epsilon

    Thus, taking the lim sup\limsup and ϵ\epsilon to zero gives lim supiPi(C)P(C)\limsup_i P_i(C) \leq P(C)

  • (3)\rightarrow(1) Let fCB0(S)f \in \mathscr{C}^0_B(S). Our goal is to show that lim supifdPifdP\limsup_i\int f \mathrm{d}P_i \leq\int f \mathrm{d}P and similarly for the lim inf\liminf so that (1) holds. As ff is bounded, we can shift and scale it, and without loss of generality, we assume that 0<f<10 < f < 1. Then, for any choice of nNn \in \mathbb{N}, we define nested closed sets Cj={xS:f(x)j/n}C_j = \{x\in S : f(x)\geq j/n\} for all j=1,2,,nj= 1, 2, \ldots, n, set C0=SC_0 = S, and cut ff into pieces to get

    j=1nj1nP(Cj1Cj)fdPj=1njnP(Cj1Cj)\sum_{j=1}^n\frac{j-1}{n}P(C_{j-1}\setminus C_j) \leq \int f\mathrm{d}P \leq \sum_{j=1}^n\frac{j}{n}P(C_{j-1}\setminus C_j)

    Also, since P(Cj1Cj)=P(Cj1)P(Cj)P(C_{j-1}\setminus C_j) = P(C_{j-1}) - P(C_j), the above becomes

    1nj=1nP(Cj)fdP1n+1nj=1nP(Cj)\frac{1}{n}\sum_{j=1}^n P(C_j) \leq \int f\mathrm{d}P \leq \frac{1}{n} + \frac{1}{n}\sum_{j=1}^n P(C_{j})

    Thus,

    lim supifdPi1n+1nj=1nlim supiPi(Cj)1n+1nj=1nP(Cj)1n+fdP\limsup_i \int f\mathrm{d}P_i \leq \frac{1}{n} + \frac{1}{n}\sum_{j=1}^n \limsup_i P_i(C_j) \leq \frac{1}{n} + \frac{1}{n}\sum_{j=1}^n P(C_j) \leq \frac{1}{n} + \int f\mathrm{d}P

    Taking nn \rightarrow \infty gives lim supifdPifdP\limsup_i \int f \mathrm{d}P_i \leq \int f \mathrm{d}P. Replacing ff with −f-f gives lim infifdPifdP\liminf_i \int f \mathrm{d}P_i \geq \int f \mathrm{d}P. Thus the lim sup\limsup and lim inf\liminf coincide, proving that (3)\rightarrow(1).

  • (3)\Leftrightarrow(4) Let UU be the complement of CC. Then, P(U)=1P(C)P(U) = 1 - P(C). Thus, lim supiPi(C)P(C)\limsup_i P_i(C) \leq P(C) is equivalent to lim infiPi(U)P(U)\liminf_i P_i(U) \geq P(U).

  • (3)+(4)\rightarrow(5) For any set ASA\in\mathcal{S} with P(A)=0P(\partial A) = 0, AA^\circ is an open set and Aˉ\bar{A} is a closed set. If (3) and (4) hold, we have that

    P(A)lim infiPi(A)lim infiPi(A)lim supiPi(A)lim supiPi(Aˉ)P(Aˉ)P(A^\circ) \leq \liminf_i P_i(A^\circ) \leq \liminf_i P_i(A) \leq \limsup_i P_i(A) \leq \limsup_i P_i(\bar{A}) \leq P(\bar{A})

    Since P(A)=0P(\partial A)=0, we have that P(A)=P(A)=P(Aˉ)P(A^\circ) = P(A) = P(\bar{A}), so the chain above collapses to equalities, giving limiPi(A)=P(A)\lim_i P_i(A) = P(A), which is (5). \square

Convergence of Random Variables#

Turning from convergence of measures to convergence of random variables, let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a probability space and (S,S)(S, \mathcal{S}) be a metric space with Borel sets as above. Then, for a random variable (i.e. measurable function) X:ΩSX : \Omega → S, we can define a probability measure

P(A):=μ(X1(A))   ASP(A) \vcentcolon= \mu(X^{-1}(A)) ~~~ A\in\mathcal{S}

This is the distribution of XX . Then the expectation of a random variable can be written in multiple ways due to change of variables:

E[X]=ΩX(ω)dμ(ω)=SxdP(x)E[X] = \int_\Omega X(\omega) \mathrm{d}\mu(\omega) = \int_S x \mathrm{d}P(x)

Note that the above Portmanteau Theorem can be rephrased for random variables as well.

Convergence in Distribution

For a sequence of random variables {Xi}i=1\{X_i\}_{i=1}^\infty, we say that XiX_i converges to XX in distribution ( denoted XidXX_i\overset{d}{\longrightarrow}X ) if PiPP_i \Rightarrow P.

Convergence in Probability

For a sequence of random variables {Xi}i=1\{X_i\}_{i=1}^\infty, we say that XiX_i converges to XX in probability ( denoted XiPXX_i\overset{P}{\longrightarrow}X ) if for all ϵ>0\epsilon > 0

μ({ωΩ:d(Xi(ω),X(ω))>ϵ})0\mu(\{\omega \in \Omega : d(X_i(\omega), X(\omega)) > \epsilon\}) \rightarrow 0

In short, P(d(Xi,X)>ϵ)0P(d(X_i, X)>\epsilon)\rightarrow 0.

This means that the measure of the set of ω\omega where Xi(ω)X_i(\omega) and X(ω)X(\omega) differ by more than ϵ\epsilon goes to zero as ii \rightarrow \infty. Convergence in probability is closely connected to the metric dd on (S,S)(S, \mathcal{S}).

Convergence Almost Surely

For a sequence of random variables {Xi}i=1\{X_i\}_{i=1}^\infty, we say that XiX_i converges almost surely to XX ( denoted Xia.s.XX_i\overset{\text{a.s.}}{\longrightarrow}X ) if

μ({ωΩ:Xi(ω)X(ω)})=1\mu(\{\omega \in \Omega : X_i(\omega) \rightarrow X(\omega)\}) = 1

i.e. pointwise convergence almost everywhere.

Convergence in LpL^p

For a sequence of random variables {Xi}i=1\{X_i\}_{i=1}^\infty, we say that XiX_i converges to XX in LpL^p if

E[d(Xi,X)p]=d(Xi(ω),X(ω))pdμ(ω)0E[d(X_i, X)^p] = \int d(X_i(\omega), X(\omega))^p\mathrm{d}\mu(\omega) \rightarrow 0

Here, we can think of d(Xi(ω),X(ω))d(X_i(\omega), X(\omega)) as a function from Ω\Omega to R+\mathbb{R}^+. In the case that we have real valued random variables, i.e. S=RS = \mathbb{R}, then this is

XiXpdμ0\int |X_i - X|^p\mathrm{d}\mu \rightarrow 0
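A standard counterexample separating these modes (sketched with hypothetical helper names, not from the notes): on ([0, 1], B, λ), the spikes X_n = n·1_{[0, 1/n)} converge to 0 at every ω > 0, hence almost surely and in probability, yet ∫|X_n| dλ = 1 for all n, so there is no L¹ convergence:

```python
def spike(n, omega):
    """X_n = n on [0, 1/n) and 0 elsewhere, so X_n(omega) -> 0 for every omega > 0."""
    return float(n) if omega < 1.0 / n else 0.0

def l1_norm(n, grid=100_000):
    """Midpoint Riemann-sum approximation of the integral of |X_n| over [0, 1]."""
    return sum(spike(n, (k + 0.5) / grid) for k in range(grid)) / grid
```

The L¹ norm stays pinned at 1 even though, for any fixed ω, `spike(n, omega)` is eventually 0: the mass escapes into an ever-taller, ever-thinner spike near the origin.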

Hierarchy of Convergence Types#

  • Conv a.s. \Rightarrow conv in probability
  • Conv in probability \Rightarrow conv in distribution
  • For 1p<q1\leq p < q \leq \infty, conv in LqL^q \Rightarrow conv in LpL^p
  • For any p[1,]p\in [1, \infty], conv in LpL^p \Rightarrow conv in probability
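The gap between convergence in probability and almost sure convergence is visible in the classic "typewriter" example on ((0, 1], B, λ) (a standard counterexample, sketched here with my own function name): X_n is the indicator of a dyadic interval of width 2^{-k} that slides across [0, 1), so λ(X_n ≠ 0) → 0, yet every ω falls in exactly one interval of each block, so X_n(ω) converges for no ω:

```python
import math

def typewriter(n, omega):
    """Indicator of the j-th dyadic interval of width 2**-k, where block k
    holds indices n = 2**k - 1, ..., 2**(k+1) - 2 and j slides across [0, 1)."""
    k = int(math.log2(n + 1))
    j = (n + 1) - 2**k                       # position of the interval within block k
    return 1.0 if j / 2**k <= omega < (j + 1) / 2**k else 0.0
```

The measure of {typewriter(n, ·) = 1} is 2^{-k} → 0 (convergence in probability to 0), but each block of indices hits every ω exactly once, so the sequence never settles pointwise.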

Borel-Cantelli Lemmas#

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a probability space. For {Ai}i=1,AiF\{A_i\}_{i=1}^\infty, A_i\in \mathcal{F}, then we define

lim supiAi=i=1j>iAj   and   lim infiAi=i=1j>iAj\limsup_i A_i = \bigcap_{i=1}^\infty\bigcup_{j > i} A_j ~~~\text{and}~~~ \liminf_i A_i = \bigcup_{i=1}^\infty\bigcap_{j > i} A_j

The set lim supiAi\limsup_i A_i is sometimes referred to as AiA_i infinitely often or AiA_i i.o. This is because ωlim supiAi\omega \in \limsup_i A_i implies that for any NNN \in \mathbb{N} there exists an n>Nn > N such that ωAn\omega\in A_n. Similarly, some write AiA_i eventually or AiA_i ev. for lim infiAi\liminf_i A_i. This is because for ωlim infiAi\omega \in \liminf_i A_i then there exists an NN large enough such that ωAn\omega \in A_n for all nNn \geq N.
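The i.o./ev. reading can be sketched with a finite-horizon check (heuristic only, since the true definitions quantify over all indices; the helper names `in_limsup` and `in_liminf` are mine):

```python
def in_limsup(omega, events, horizon=300):
    """omega ∈ limsup A_n iff omega ∈ A_n infinitely often; here checked
    only up to a finite horizon, so this is a heuristic illustration."""
    return all(any(omega in events(j) for j in range(i, horizon))
               for i in range(horizon // 2))

def in_liminf(omega, events, horizon=300):
    """omega ∈ liminf A_n iff omega ∈ A_n for every n past some index."""
    return any(all(omega in events(j) for j in range(i, horizon))
               for i in range(horizon // 2))

cycling = lambda n: {n % 3}                      # A_n cycles through {0}, {1}, {2}
settling = lambda n: {0} if n < 10 else {0, 1}   # 1 ∈ A_n for all n >= 10
```

Here 1 lies in `cycling(n)` infinitely often but not eventually, while it lies in `settling(n)` eventually (hence also infinitely often), matching liminf A_n ⊆ limsup A_n.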

1st Borel-Cantelli Lemma

Let {Ai}i=1\{A_i\}_{i=1}^\infty with AiFA_i \in \mathcal{F}. If i=1μ(Ai)<\sum_{i=1}^\infty \mu(A_i) < \infty, then μ(lim supiAi)=0\mu(\limsup_i A_i) = 0.

Proof

As the summation converges, the tail sum must tend to zero, and we have simply that

μ(lim supiAi)=μ(i=1j>iAj)μ(j>iAj)j>iμ(Aj)0\mu(\limsup_i A_i) = \mu\left(\bigcap_{i=1}^\infty\bigcup_{j > i} A_j\right) \leq \mu\left(\bigcup_{j > i} A_j\right) \leq \sum_{j > i} \mu(A_j) \rightarrow 0

as ii\rightarrow \infty. \square

2nd Borel-Cantelli Lemma

Let {Ai}i=1\{A_i\}_{i=1}^\infty be an independent collection with AiFA_i \in \mathcal{F}. If i=1μ(Ai)=\sum_{i=1}^\infty \mu(A_i) = \infty, then μ(lim supiAi)=1\mu(\limsup_i A_i) = 1.

Proof

Note that 1tet1 − t \leq e^{−t} for all tRt \in \mathbb{R} and that the independence of the {Ai}i=1\{A_i\}_{i=1}^\infty implies the independence of {Aic}i=1\{A^c_i\}_{i=1}^\infty. Therefore, for any iNi \in \mathbb{N} and kik \geq i,

μ(j=ikAjc)=j=ik(1μ(Aj))exp(j=ikμ(Aj))\mu(\bigcap_{j=i}^k A^c_j) = \prod_{j=i}^k (1 - \mu(A_j)) \leq \exp\left(-\sum_{j=i}^k \mu(A_j)\right)

Taking kk \rightarrow \infty sends the right hand side to zero, since the series j=iμ(Aj)\sum_{j=i}^\infty \mu(A_j) diverges. Hence, μ(j>iAjc)=0\mu(\cap_{j>i} A^c_j) = 0 for all ii. Thus,

μ(lim supiAi)=μ(i=1j>iAj)=1μ(i=1j>iAjc)=1\mu(\limsup_i A_i) = \mu\left(\bigcap_{i=1}^\infty\bigcup_{j > i} A_j\right) = 1 - \mu\left(\bigcup_{i=1}^\infty \bigcap _{j > i} A^c_j\right) = 1

which is the desired result. \square
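Both lemmas can be illustrated with a seeded Monte Carlo sketch (illustrative only; the window and probabilities are arbitrary demo choices). With independent A_n and P(A_n) = n^{-2}, the probability that any A_n fires for 100 < n ≤ 200 is at most the tail sum, about 0.005, while with P(A_n) = 1/n the same window fires with probability exactly 1 − ∏_{n=101}^{200}(1 − 1/n) = 1/2:

```python
import random

def window_fires(p, lo=100, hi=200, reps=20_000, seed=0):
    """Estimate P(some A_n occurs, lo < n <= hi) for independent events
    with P(A_n) = p(n). Constants are arbitrary demo choices."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < p(n) for n in range(lo + 1, hi + 1))
        for _ in range(reps)
    )
    return hits / reps

summable = window_fires(lambda n: n ** -2)   # summable case: fires rarely (1st BC)
divergent = window_fires(lambda n: 1.0 / n)  # divergent case: fires ~half the time (2nd BC)
```

In the summable case the windows go quiet as `lo` grows (only finitely many A_n ever occur, a.s.); in the divergent independent case every window keeps firing, consistent with μ(limsup A_n) = 1.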

Law of Large Numbers#

Let {Xi}i=1\{X_i\}_{i=1}^\infty be random variables from (Ω,F,P)(\Omega, \mathcal{F}, P) to (R,B)(\mathbb{R}, \mathcal{B}). Hence, for any ABA \in \mathcal{B}, we write

P(XA):=P({ωΩ:X(ω)A}).P(X \in A)\vcentcolon= P(\{\omega \in \Omega : X(\omega) \in A\}).

and E[X]=X(ω)dPE[X] = \int X(\omega)\mathrm{d}P. Furthermore, we define the partial sum Sn=i=1nXiS_n = \sum^n_{i=1} X_i, which is also a measurable random variable.

Independence

For random variables XX and YY on the same probability space (Ω,F,P)(\Omega, \mathcal{F}, P) but possibly with different codomains, (X,X)(\mathbb{X}, \mathcal{X}) and (Y,Y)(\mathbb{Y}, \mathcal{Y}) respectively, we say that XX and YY are independent if

P({XA}{YB})=P(XA)P(YB)P(\{X \in A\}\cap \{Y \in B\}) = P(X \in A)P(Y \in B)

for all AXA \in \mathcal{X} and BYB \in \mathcal{Y}.

This definition can be extended to a finite collection of random variables {Xi}i=1n\{X_i\}^n_{i=1} implying

P(i=1n{XiAi})=i=1nP(XiAi)P(\bigcap_{i=1}^n \{X_i \in A_i\}) = \prod^n_{i=1} P(X_i \in A_i)

for all AiA_i. We say that an infinite collection of random variables is independent if all finite collections are independent.

Note that since {XA}\{X \in A\} is shorthand for {ωΩ:X(ω)A}=X1(A)\{\omega \in \Omega : X(\omega) \in A\} = X^{−1}(A), random variables XX and YY are independent if and only if the σ\sigma-fields σ(X)\sigma(X) and σ(Y)\sigma(Y) are independent ( defined here ).

Identically Distributed

For X:ΩRX : \Omega → \mathbb{R}, the distribution of XX is the measure induced by XX on R\mathbb{R}, i.e., PX1(A)P\circ X^{-1}(A) for ABA\in \mathcal{B}. We say that XX and YY are identically distributed if the measures PX1P \circ X^{−1} and PY1P \circ Y^{-1} coincide.

Weak Law of Large Numbers#

Weak Law of Large Numbers

Let (Ω,F,P)(\Omega, \mathcal{F}, P) be a probability space and {Xi}i=1\{X_i\}_{i=1}^\infty be random variables (measurable functions) from Ω\Omega to R\mathbb{R} such that E[Xi]=cRE[X_i] = c \in \mathbb{R} and E[(Xic)2]=1E[(X_i-c)^2]=1 for all ii and E[(Xic)(Xjc)]=0E[(X_i − c)(X_j − c)] = 0 for all iji \neq j. Then SnnPc\frac{S_n}{n} \overset{P}{\longrightarrow} c.

Proof

Without loss of generality, we assume c=0c = 0. Otherwise, we can replace XiX_i with XicX_i − c. Then, for any t>0t > 0, Chebyshev’s inequality implies that

P(Snnt)E[Sn2]n2t2=1n2t2E[(i=1nXi)2]=1n2t2E[i=1nXi2]=1nt20P(\frac{|S_n|}{n} \geq t) \leq \frac{E[S_n^2]}{n^2t^2} = \frac{1}{n^2t^2}E[(\sum^n_{i=1} X_i)^2] = \frac{1}{n^2t^2}E[\sum^n_{i=1} X_i^2] = \frac{1}{nt^2} \to 0

as nn \to \infty. \square

Note that in the above proof, we only require that the XiX_i be uncorrelated (i.e. E[(Xic)(Xjc)]=0E[(X_i −c)(X_j −c)] = 0 ) and not independent. In the next theorem, we require independence, but remove all second moment conditions.
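The Chebyshev bound in the proof, P(|S_n/n| ≥ t) ≤ 1/(nt²), can be checked by a seeded simulation (a sketch with arbitrary choices of n, t, and replication counts; standard normal summands, so c = 0):

```python
import random

def tail_prob(n, t, reps=2_000, seed=1):
    """Monte Carlo estimate of P(|S_n / n| >= t) for i.i.d. N(0, 1) summands.
    The sample sizes and threshold are arbitrary demo choices."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(rng.gauss(0.0, 1.0) for _ in range(n))
        if abs(s / n) >= t:
            hits += 1
    return hits / reps
```

At n = 100 with t = 0.1 the true probability is about 0.32 (S_n/n is N(0, 1/100)), comfortably under the Chebyshev bound 1/(nt²) = 1; by n = 10000 the event essentially never occurs, as the bound 0.01 already suggests.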

Strong Law of Large Numbers#

Strong Law of Large Numbers

Let {Xi}i=1\{X_i\}_{i=1}^\infty be i.i.d. random variables from Ω\Omega to R\mathbb{R}.

  1. If E[Xi]=E[|X_i|] = \infty, then Snn\frac{S_n}{n} almost surely does not converge to any finite value.
  2. If E[Xi]<E[|X_i|] < \infty, then Snna.s.c\frac{S_n}{n} \overset{a.s.}{\longrightarrow} c for c=E[Xi]c = E[X_i].
Proof
  1. Assume that SnncR\frac{S_n}{n} \longrightarrow c \in \mathbb{R} but also that E[Xi]=E[|X_i|] = \infty, and note that Xnn=SnSn1n0\frac{X_n}{n} = \frac{S_n − S_{n−1}}{n} \to 0. Since E[Xi]=E[|X_i|] = \infty, then n=0P(Xi>n)=\sum_{n=0}^\infty P(|X_i| > n) = \infty and the 2nd Borel-Cantelli Lemma says that Xn>n|X_n| > n for infinitely many nn almost surely. Thus

    P({ωΩ:SnSn1n0})=0P(\{\omega\in\Omega : \frac{S_n - S_{n-1}}{n}\to 0\}) = 0

    Thus SnncR\frac{S_n}{n}\nrightarrow c \in \mathbb{R}.

  2. Assume that E[Xi]=cRE[X_i] = c \in \mathbb{R}. Without loss of generality, we assume Xi0X_i \geq 0 for all ii. Otherwise, we can write X=X+XX = X^+ − X^− and independence of XX and YY implies independence for X+X^+ and Y+Y^+. Also, we use FF to denote the distribution of XX, i.e. F(x)=P(Xx)F(x) = P(X \leq x).

    We define Yi=Xi1XiiY_i = X_i\mathbf{1}_{X_i\leq i} and Tn=i=1nYiT_n = \sum_{i=1}^n Y_i. For any δ>1\delta > 1, we can define a non-decreasing integer sequence kn=δnk_n = \lfloor \delta^n \rfloor. Then 1knδn<kn+12kn1\leq k_n\leq \delta^n < k_n + 1 \leq 2k_n and kn24δ2nk_n^{-2}\leq 4\delta^{-2n}. Therefore,

    n=1kn21kni4n=1δ2n1δni4i2(1δ2)c0i2(1)\sum_{n=1}^\infty k_n^{-2}\mathbf{1}_{k_n \geq i}\leq 4\sum_{n = 1}^\infty \delta^{-2n}\mathbf{1}_{\delta^n\geq i} \leq \frac{4}{i^2(1 - \delta^{-2})}\leq c_0 i^{-2} \tag{1}

    for some constant c0c_0. We also note that i=k+1i2<kx2dx=1k\sum_{i=k+1}^\infty i^{−2} < \int_k^\infty x^{-2}\mathrm{d}x = \frac{1}{k}. By Chebyshev’s inequality, t>0\forall t > 0,  c1\exists~ c_1 depending on tt and δ\delta such that

    n=1P(TknE[Tkn]tkn)c1n=1kn2Var(Tkn)=c1n=1kn2i=1knVar(Yi)=c1i=1Var(Yi)n:knikn2c2i=1i2Var(Yi)   by (1)c2i=1i20ix2dF(x)=c2i=1i2{k=0i1kk+1x2dF(x)}c3k=01k+1kk+1x2dF(x)c3k=0kk+1xdF(x)=c3E[Xi]<\begin{aligned} \sum_{n=1}^\infty P(|T_{k_n} - E[T_{k_n}]| \geq tk_n) &\leq c_1\sum_{n=1}^\infty k_n^{-2}\text{Var}(T_{k_n}) \\ &= c_1\sum_{n=1}^\infty k_n^{-2}\sum_{i=1}^{k_n}\text{Var}(Y_i) \\ &= c_1\sum_{i=1}^\infty \text{Var}(Y_i)\sum_{n : k_n\geq i}k_n^{-2} \\ &\leq c_2\sum_{i=1}^\infty i^{-2}\text{Var}(Y_i) ~~~\text{by (1)}\\ &\leq c_2\sum_{i=1}^\infty i^{-2}\int_0^i x^2\mathrm{d}F(x) \\ &= c_2\sum_{i=1}^\infty i^{-2}\left\{\sum_{k=0}^{i-1}\int_k^{k+1}x^2\mathrm{d}F(x)\right\} \\ &\leq c_3\sum_{k=0}^\infty \frac{1}{k+1}\int_k^{k+1}x^2\mathrm{d}F(x) \\ &\leq c_3\sum_{k=0}^\infty \int_k^{k+1}x\mathrm{d}F(x) = c_3E[X_i] < \infty \end{aligned}

    And thus n=1P(TknE[Tkn]tkn)<\sum_{n=1}^\infty P(|T_{k_n} - E[T_{k_n}]| \geq tk_n) < \infty. Hence, by the 1st Borel-Cantelli Lemma, kn1(TknE[Tkn])a.s.0k_n^{-1}(T_{k_n} - E[T_{k_n}])\overset{a.s.}\longrightarrow0. Since E[Yn]E[Xi]E[Y_n]\uparrow E[X_i], we have that kn1E[Tkn]E[Xi]{k_n}^{-1}E[T_{k_n}]\uparrow E[X_i] and in turn that kn1Tkna.s.E[Xi]k_n^{-1}T_{k_n}\overset{a.s.}\longrightarrow E[X_i].

    To get back to XiX_i and SnS_n, we note that E[X]<E[X] <\infty if and only if i=0P(X>i)<\sum_{i=0}^\infty P (X > i) < \infty. Thus i=1P(XiYi)=i=1P(Xi>i)<\sum_{i=1}^\infty P (X_i \neq Y_i) = \sum_{i=1}^\infty P (X_i > i) < \infty and the 1st Borel-Cantelli Lemma says that P(lim sup{XiYi})=0P (\limsup\{X_i \neq Y_i\}) = 0, so for ii large enough, Xi=YiX_i = Y_i a.s. We define “large enough” to be i>m(ω)i > m(\omega) ( i.e. Xi(ω)=Yi(ω)  i>m(ω)X_i(\omega) = Y_i(\omega) ~~ \forall i > m(\omega) ). Furthermore, kn1Sm(ω)0k_n^{−1} S_{m(\omega)} \to 0 and kn1Tm(ω)0k_n^{−1}T_{m(\omega)} \to 0 as nn \to \infty, meaning that the contribution of the terms where XiX_i and YiY_i may not coincide becomes negligible. Hence, kn1Skna.s.E[Xi]k_n^{-1}S_{k_n}\overset{a.s.}\longrightarrow E[X_i], so we have almost sure convergence of a subsequence.

    Finally, since kn+1/knδk_{n+1}/k_n \to \delta, there exists an nn large enough such that 1kn+1/kn<δ21 \leq k_{n+1}/k_n < \delta^2. Thus, for kn<i<kn+1k_n < i < k_{n+1},

    kn1Sknδ2Siiδ4kn+11Skn+1δ2E[Xi]lim infiSiilim supiSiiδ2E[Xi]k_n^{-1}S_{k_n}\leq \delta^2\frac{S_i}{i}\leq \delta^4 k_{n+1}^{-1}S_{k_{n+1}} \\ \delta^{-2}E[X_i] \leq \liminf_{i\to\infty}\frac{S_i}{i} \leq \limsup_{i\to\infty}\frac{S_i}{i} \leq \delta^2 E[X_i]

    Thus, taking δ1\delta\to 1 concludes the proof. \square
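A quick seeded check of the statement (a sketch; the sample sizes are arbitrary): along a single path of i.i.d. Exponential(1) variables, which have mean 1, the running average S_n/n settles near 1:

```python
import random

def running_mean(n, seed=42):
    """S_n / n along one sample path of i.i.d. Exponential(1) variables (mean 1)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(1.0) for _ in range(n)) / n
```

By the SLLN this is within roughly n^{-1/2} of 1 for large n on almost every path; rerunning with other seeds changes the path but not the limit.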

Central Limit Theorem#

Gaussian Measure on R\mathbb{R}

A Borel measure γ\gamma on (R,B)(\mathbb{R}, \mathcal{B}) is said to be Gaussian with mean mm and variance σ2\sigma^2 if

γ((a,b])=1σ2πabe(xm)2/2σ2dλ(x)\gamma((a, b]) = \frac{1}{\sigma\sqrt{2\pi}}\int_a^b e^{-(x-m)^2/2\sigma^2}\mathrm{d}\lambda(x)
Gaussian Measure on Rd\mathbb{R}^d

A Borel measure γ\gamma on (Rd,B)(\mathbb{R}^d, \mathcal{B}) is said to be Gaussian if for all linear functionals f:RdRf : \mathbb{R}^d \rightarrow \mathbb{R}, the induced measure γf1\gamma \circ f^{−1} on (R,B)(\mathbb{R}, \mathcal{B}) is Gaussian.

Gaussian Random Variable

A random variable ZZ from a probability space (Ω,F,μ)(\Omega, \mathcal{F}, \mu) to (Rd,B)(\mathbb{R}^d, \mathcal{B}) is said to be Gaussian if γ:=μZ1\gamma \vcentcolon= \mu \circ Z^{−1} is a Gaussian measure on (Rd,B)(\mathbb{R}^d, \mathcal{B}).

Characteristic Function

For a probability measure μ\mu on (Rd,B)(\mathbb{R}^d, \mathcal{B}), the characteristic function (Fourier transform) μ~:RdC\tilde{\mu} : \mathbb{R}^d \rightarrow \mathbb{C} is defined as

μ~(t):=exp{ix,t}dμ(x)\tilde{\mu}(t) \vcentcolon= \int \exp\{i\langle x, t\rangle\}\mathrm{d}\mu(x)

We can also invert the above transformation. That is, if μ~\tilde{\mu} is integrable with respect to Lebesgue measure on Rd\mathbb{R}^d, then μ\mu has a density pp with respect to λ\lambda given by

p(x)=(2π)dμ~(t)exp{ix,t}dλ(t)   λa.e.p(x) = (2\pi)^{-d}\int \tilde{\mu}(t)\exp\{-i\langle x, t\rangle\}\mathrm{d}\lambda(t)~~~\lambda-a.e.
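As a numeric sanity check (a sketch using a plain trapezoid rule; none of this is from the notes), the characteristic function of the standard Gaussian measure comes out as μ̃(t) = e^{-t²/2}:

```python
import cmath
import math

def gaussian_cf(t, lo=-10.0, hi=10.0, steps=20_000):
    """Trapezoid-rule approximation of the characteristic function of the
    standard Gaussian measure; truncation range and step are demo choices."""
    h = (hi - lo) / steps
    def integrand(x):
        # e^{itx} times the N(0, 1) density
        return cmath.exp(1j * t * x) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + k * h) for k in range(1, steps))
    return total * h
```

The trapezoid rule is extremely accurate here because the integrand and all its derivatives essentially vanish at the truncation points ±10.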
Convolution

For two measures μ\mu and ν\nu on (Rd,B)(\mathbb{R}^d, \mathcal{B}), the convolution measure is defined as

(μν)(B):=ν(Bx)dμ(x)(\mu * \nu)(B) \vcentcolon= \int \nu(B − x)\mathrm{d}\mu(x)

for all BBB \in \mathcal{B} where Bx={yRd:y+xB}B - x = \{y \in \mathbb{R}^d : y +x \in B\}.

Note that the convolution operator * is commutative and associative. Also, for two independent random variables XX and YY with corresponding measures μ\mu and ν\nu, the measure of X+YX + Y is μν\mu * \nu.
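For discrete measures the convolution formula reduces to a finite sum over the support, and exact rational arithmetic makes the classic two-dice example checkable (a small sketch; the dict representation of a measure is mine):

```python
from fractions import Fraction

def convolve(mu, nu):
    """(mu * nu)({b}) = sum over x of nu({b - x}) mu({x}) for finitely
    supported measures, represented as dicts mapping point -> mass."""
    out = {}
    for x, px in mu.items():
        for y, py in nu.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * py
    return out

die = {k: Fraction(1, 6) for k in range(1, 7)}   # law of one fair die roll
two_dice = convolve(die, die)                    # law of the sum of two independent rolls
```

The result is the familiar triangular distribution on {2, …, 12}, and swapping the arguments gives the same measure, illustrating commutativity of *.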

Uniqueness of Characteristic Function Theorem

Let μ\mu and ν\nu be probability measures on (Rd,B)(\mathbb{R}^d, \mathcal{B}). If μ~=ν~\tilde{\mu} = \tilde{\nu} then μ=ν\mu = \nu.

Proof

Let γσ\gamma_\sigma be a mean zero Gaussian measure on Rd\mathbb{R}^d with variance σ2I\sigma^2I. We denote μ(σ)=μγσ\mu^{(\sigma)} = \mu * \gamma_\sigma and similarly for ν(σ)\nu^{(\sigma)}. It can be shown that the corresponding density functions for μ(σ)\mu^{(\sigma)} and ν(σ)\nu^{(\sigma)} are

p(σ)(x)=1(2π)dμ~(t)exp{ix,t12σ2t2}dλ(t)q(σ)(x)=1(2π)dν~(t)exp{ix,t12σ2t2}dλ(t)p^{(\sigma)}(x) = \frac{1}{(2\pi)^d}\int \tilde{\mu}(t)\exp\left\{-i\langle x, t\rangle - \frac{1}{2}\sigma^2|t|^2\right\}\mathrm{d}\lambda(t) \\ q^{(\sigma)}(x) = \frac{1}{(2\pi)^d}\int \tilde{\nu}(t)\exp\left\{-i\langle x, t\rangle - \frac{1}{2}\sigma^2|t|^2\right\}\mathrm{d}\lambda(t)

Since μ~=ν~\tilde{\mu} = \tilde{\nu}, we have that p(σ)=q(σ)p^{(\sigma)} = q^{(\sigma)} for all σ>0\sigma > 0.

Let XX be a random variable corresponding to μ\mu and ZZ to γ1\gamma_1. Then, the measure μ(σ)\mu^{(\sigma)} is paired with the random variable X+σZX + \sigma Z. As σ0\sigma\downarrow 0, X+σZa.s.XX+\sigma Z\overset{a.s.}{\longrightarrow}X, that is, pointwise for almost all ω\omega. Hence, this convergence holds in probability and thus in distribution, i.e. μ(σ)μ\mu^{(\sigma)}\Rightarrow\mu as σ0\sigma\downarrow 0.

Lastly, we have that μ(σ)μ\mu^{(\sigma)} \Rightarrow \mu and ν(σ)ν\nu^{(\sigma)} \Rightarrow \nu. Since μ(σ)=ν(σ)\mu^{(\sigma)} = \nu^{(\sigma)} and weak limits are unique, μ=ν\mu = \nu. \square

Central Limit Theorem

Let (Ω,F,P)(\Omega, \mathcal{F}, P) be a probability space, {Xi}i=1\{X_i\}_{i=1}^\infty be i.i.d. random variables on (Rd,B)(\mathbb{R}^d, \mathcal{B}) such that E[Xi]=0E[X_i] = 0 and E[Xi2]<E[|X_i|^2] < \infty. Let Sn=j=1nXjS_n = \sum_{j=1}^n X_j. Then, n1/2SndZn^{-1/2}S_n\overset{d}{\longrightarrow}Z where ZZ is a Gaussian random variable with zero mean and covariance Σ\Sigma whose jkjk-th entry is Σjk=E[XijXik]\Sigma_{jk} = E[X_{ij}X_{ik}].

Proof

As the random vectors XiX_i are mean zero and independent, E[Xj,Xk]=0E[\langle X_j, X_k \rangle] = 0 for jkj\neq k. In turn, for any nn,

E[n1/2Sn2]=n1E[j,k=1nXj,Xk]=E[Xj2]E\left[|n^{-1/2}S_n|^2\right] = n^{-1}E\left[\sum_{j,k=1}^n\langle X_j, X_k \rangle\right] = E\left[|X_j|^2\right]

For any ϵ>0\epsilon > 0, there exists an Mϵ>0M_\epsilon > 0 such that E[Xj2]/Mϵ2<ϵE[|X_j|^2]/M_\epsilon^2 < \epsilon. Thus, from Chebyshev’s inequality, we have that P(n1/2Sn>Mϵ)<ϵP(|n^{−1/2}S_n| > M_\epsilon) < \epsilon. This implies that the sequence n1/2Snn^{−1/2}S_n is “uniformly tight”.

For a vector vRdv \in \mathbb{R}^d, the random variables Xj,v\langle X_j, v \rangle are i.i.d. real-valued with E[Xj,v]=0E[\langle X_j, v\rangle] = 0 and E[Xj,v2]<E[\langle X_j, v\rangle^2] < \infty. Let h(v):=E[exp(iXj,v)]h(v) \vcentcolon= E[\exp(i\langle X_j, v\rangle)] be the characteristic function of XjX_j. Then, h(0)=1h(0) = 1, h(0)=0\nabla h(0)=0, and 2h(0)=Σ\nabla^2 h(0) = -\Sigma. Thus, by Taylor’s Theorem, we have

h(v)=112vTΣv+o(v22)h(v) = 1 - \frac{1}{2}v^\mathrm{T}\Sigma v + o(||v||_2^2)

Thus, for any fixed vector vv,

E[exp{in1/2Sn,v}]=h(n1/2v)n=[1vTΣv2n+o(v22n)]nexp{12vTΣv}E\left[\exp\left\{i\langle n^{-1/2}S_n, v\rangle \right\}\right] = h(n^{-1/2}v)^n = \left[1 - \frac{v^\mathrm{T}\Sigma v}{2n} + o\left(\frac{||v||_2^2}{n}\right)\right]^n \to \exp\left\{-\frac{1}{2}v^\mathrm{T}\Sigma v\right\}

as nn\to\infty. Thus, by the Uniqueness of Characteristic Function Theorem, we have that n1/2SndZn^{−1/2}S_n\overset{d}{\longrightarrow}Z where ZZ is a Gaussian random variable with zero mean and covariance Σ\Sigma. \square
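A seeded simulation of the statement (a sketch; all constants are arbitrary demo choices): for X_i uniform on (-1, 1), which are mean zero with variance 1/3, the scaled sums n^{-1/2}S_n should look approximately N(0, 1/3):

```python
import random

def scaled_sums(n=1_000, reps=2_000, seed=7):
    """Draw `reps` samples of n**-0.5 * S_n for i.i.d. Uniform(-1, 1) summands."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n ** 0.5
        for _ in range(reps)
    ]

samples = scaled_sums()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sample mean hovers near 0, the sample variance near Var(X_i) = 1/3, and about half the draws are negative, as the symmetric Gaussian limit predicts.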

Ergodic Theorem#

Measure Preserving Map

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space. A mapping T:ΩΩT : \Omega \to \Omega is called measure preserving if

μ(T1(A))=μ(A)   AF\mu(T^{-1}(A)) = \mu(A)~~~ \forall A\in \mathcal{F}
Invariant Set and Function

For a mapping T:ΩΩT : \Omega\to\Omega,

  • A set AFA \in \mathcal{F} is TT-invariant if T1(A)=AT^{−1}(A) = A. The set of all TT-invariant sets forms a σ\sigma-field FT\mathcal{F}_T.

  • A measurable function f:ΩRf : \Omega\to\mathbb{R} is TT-invariant if f=fTf = f \circ T. ff is TT-invariant if and only if ff is FT\mathcal{F}_T-measurable, i.e. BB(R),f1(B)FT\forall B \in\mathcal{B}(\mathbb{R}), f^{-1}(B)\in \mathcal{F}_T

Ergodic Map

A mapping TT is said to be ergodic if for any AFTA \in \mathcal{F}_T,

μ(A)=0   or   μ(Ac)=0\mu(A) = 0 ~~~\text{or}~~~ \mu(A^c) = 0

Example

For Lebesgue measure on (0,1](0, 1], two examples of measure preserving maps are the shift map

T(x)=x+amod1T(x) = x + a \mod 1

and Baker’s Map

T(x)=2x2xT(x) = 2x - \lfloor 2x \rfloor

Furthermore, it can be shown that

  • If ff is integrable and TT is measure preserving then fTf \circ T is integrable and
fdμ=fTdμ\int f\mathrm{d}\mu = \int f\circ T\mathrm{d}\mu
  • If TT is ergodic and ff is invariant, then f=cf = c μ\mu-a.e. for some constant cc.
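A quick numeric illustration of the time averages studied below (a sketch; the rotation amount and horizon are arbitrary): the shift map with irrational a is ergodic for Lebesgue measure on (0, 1], so the time average n^{-1}(f + f∘T + ⋯ + f∘T^{n-1}) of f(x) = x tends to ∫₀¹ x dx = 1/2. The Baker's map is awkward to iterate in floating point (it shifts binary digits, so a double collapses to 0 after ~53 steps), so the rotation is used here:

```python
import math

def time_average(x0, a, n):
    """n**-1 * S_n(f) for f(x) = x under the rotation T(x) = x + a mod 1."""
    total, x = 0.0, x0
    for _ in range(n):
        total += x
        x = (x + a) % 1.0
    return total / n
```

With the rational angle a = 1/2 the orbit of 0.1 is periodic ({0.1, 0.6}) and the average stays at 0.35, not 1/2: rational rotations are measure preserving but not ergodic.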

Birkhoff and von Neumann’s Theorems#

In what follows, we let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a measure space, TT be a measure preserving transformation, f:ΩRf : \Omega \rightarrow\mathbb{R} a measurable function, and

Sn=Sn(f)=f+fT++fTn1S_n = S_n(f) = f + f \circ T + \cdots + f \circ T^{n−1}

where S0=0S_0 = 0.

Maximal Ergodic Lemma

Let ff be integrable and S=supn0Sn(f)S^* = \sup_{n\geq 0} S_n(f) (pointwise supremum). Then,

S>0fdμ0\int_{S^∗ > 0} f \mathrm{d}\mu \geq 0
Proof

Let Sn=max0mnSm(f)S^∗_n = \max_{0\leq m\leq n} S_m(f) and An={ωΩ:Sn(ω)>0}A_n = \{\omega \in \Omega : S^∗_n(ω) > 0\}. Then, for 1mn1 \leq m \leq n,

Sm=f+Sm1Tf+SnTS_m = f + S_{m - 1}\circ T \leq f + S^*_n \circ T

Furthermore, on the set AnA_n,

Sn=max1mnSm(f)f+SnTS_n^* = \max_{1\leq m\leq n} S_m(f) \leq f + S^*_n \circ T

On the set AncA_n^c, Sn=0S^∗_n = 0 since S0=0S_0=0, and we have that Sn=0SnTS^∗_n = 0 \leq S^∗_n \circ T. Thus, integrating both sides of the above gives

ΩSndμAnfdμ+ΩSnTdμ\int_{\Omega} S^∗_n \mathrm{d}\mu \leq \int_{A_n} f \mathrm{d}\mu + \int_{\Omega} S^∗_n \circ T \mathrm{d}\mu

Since SnS^∗_n is integrable and TT is measure preserving, Sndμ=SnTdμ<\int S^∗_n\mathrm{d}\mu = \int S^*_n \circ T \mathrm{d}\mu < \infty. Thus, we have Anfdμ0\int_{A_n} f\mathrm{d}\mu \geq 0. As nn \rightarrow \infty, An{S>0}A_n\uparrow \{S^* > 0\}, so we have that

Sn>0fdμ0=limn0Anfdμ0\int_S{n^* > 0} f \mathrm{d}\mu \geq 0 = \lim_{n \rightarrow 0}\int_{A_n} f \mathrm{d}\mu \geq 0

due to the Dominated Convergence Theorem with f|f| as the dominating function. \square

Birkhoff’s Ergodic Theorem

Let (Ω,F,μ)(\Omega, \mathcal{F}, \mu) be a σ\sigma-finite measure space and fL1(Ω,F,μ)f \in L^1(\Omega, \mathcal{F}, \mu). Then, there exists a TT-invariant fˉL1(Ω,F,μ)\bar{f} \in L^1(\Omega, \mathcal{F}, \mu) such that

fˉdμfdμ\int |\bar{f}| \mathrm{d}\mu \leq \int |f| \mathrm{d}\mu

and n1Sn(f)fˉn^{−1}S_n(f) \rightarrow \bar{f} as nn \rightarrow \infty μ\mu-a.e.

Proof

Both lim infnn1Sn(f)\liminf_{n\rightarrow \infty}n^{-1}S_n(f) and lim supnn1Sn(f)\limsup_{n\rightarrow \infty} n^{−1}S_n(f) are TT-invariant. Indeed, n1Sn(f)T=n1[Sn+1(f)f]=[(n+1)/n](n+1)1Sn+1(f)n1fn^{-1}S_n(f)\circ T = n^{-1}[S_{n + 1}(f) - f] = [(n + 1) / n](n + 1)^{-1}S_{n + 1}(f) - n^{-1}f. Thus, for a<ba < b, we can define the set

Da,b={ωΩ:lim infnSn(f)(ω)n<a<b<lim supnSn(f)(ω)n}D_{a, b} = \left\{\omega\in\Omega : \liminf_{n\rightarrow\infty}\frac{S_n(f)(\omega)}{n} < a < b < \limsup_{n\rightarrow\infty}\frac{S_n(f)(\omega)}{n}\right\}

which means that the lim inf\liminf and lim sup\limsup are separated, and this set Da,bD_{a,b} is TT-invariant. The goal of the proof is to show that μ(Da,b)=0\mu(D_{a,b}) = 0. Without loss of generality, we take b>0b > 0. Otherwise, a<0a < 0 and we multiply everything by 1-1.

For some BFB \in \mathcal{F} with BDa,bB\subseteq D_{a,b} and μ(B)<\mu(B) < \infty, we set g=fb1Bg = f − b\mathbf{1}_B. The function gg is integrable and, for each xDa,bx \in D_{a,b}, there is an nn such that Sn(g)(x)Sn(f)(x)nb0S_n(g)(x) \geq S_n(f)(x) − nb \geq 0 since b<lim supnn1Sn(f)b < \limsup_{n\rightarrow \infty} n^{−1}S_n(f). Thus, S(g)>0S^∗(g) > 0 on Da,bD_{a,b} and the Maximal Ergodic Lemma says that

0Da,b(fb1B)dμ=Da,bfdμbμ(B)0\leq\int_{D_{a,b}} (f - b\mathbf{1}_B)\mathrm{d}\mu = \int_{D_{a,b}} f\mathrm{d}\mu - b\mu(B)

As μ\mu is σ\sigma-finite, there exists a sequence of sets BnFB_n \in \mathcal{F} such that BnDa,bB_n \uparrow D_{a,b} and μ(Bn)<\mu(B_n) < \infty for all nn. Thus,

bμ(Da,b)=limnbμ(Bn)Da,bfdμb\mu(D_{a, b}) = \lim_{n\rightarrow\infty}b\mu(B_n) \leq \int_{D_{a, b}} f\mathrm{d}\mu

This implies that μ(Da,b)<\mu(D_{a,b}) < \infty. Redoing the above argument for a−a and f−f results in aμ(Da,b)Da,b(f)dμ-a\mu(D_{a,b})\leq \int_{D_{a,b}}(-f)\mathrm{d}\mu. Therefore,

bμ(Da,b)Da,bfdμaμ(Da,b)b\mu(D_{a,b}) \leq \int_{D_{a,b}} f\mathrm{d}\mu \leq -a\mu(D_{a,b})

and since a<ba < b, we have that μ(Da,b)=0\mu(D_{a,b}) = 0.

Next, let

E={ωΩ:lim infnn1Sn(f)<lim supnn1Sn(f)}E = \{\omega\in\Omega : \liminf_{n\rightarrow\infty}n^{-1}S_n(f) < \limsup_{n\rightarrow \infty}n^{-1}S_n(f)\}

Then, EE is TT-invariant as the lim inf\liminf and lim sup\limsup are. Furthermore, E=a,bQ,a<bDa,bE = \bigcup_{a, b\in\mathbb{Q},\,a<b} D_{a,b}, a countable union of null sets. Thus, μ(E)=0\mu(E) = 0.

This means that n1Sn(f)n^{−1}S_n(f) converges in [,][-\infty, \infty] on EcE^c. Therefore, we define

fˉ={limnn1Sn(f) ωEc0 ωE\bar{f} = \begin{cases} \lim_{n\rightarrow\infty}n^{-1}S_n(f) &~ \omega\in E^c \\ 0 &~ \omega\in E \end{cases}

Lastly, fTndμ=fdμ\int f\circ T^n \mathrm{d}\mu = \int f \mathrm{d}\mu ( TT is measure preserving ) and thus Sn(f)dμnfdμ\int |S_n(f)|\mathrm{d}\mu \leq n \int |f|\mathrm{d}\mu for all nn. Applying Fatou’s Lemma gives

fˉdμ=limnn1Sn(f)dμlim infnn1Sn(f)dμfdμ\int |\bar{f}|\mathrm{d}\mu = \int \lim_{n\to\infty}|n^{-1}S_n(f)|\mathrm{d}\mu \leq \liminf_{n\rightarrow\infty}\int |n^{-1}S_n(f)|\mathrm{d}\mu \leq \int |f|\mathrm{d}\mu

finishing the proof. \square

Von Neumann’s Ergodic Theorem

Let μ(Ω)<\mu(\Omega) < \infty and p[1,)p \in [1, \infty). Then, for all fLp(Ω,F,μ)f \in L^p(\Omega, \mathcal{F}, \mu), there exists an fˉLp\bar{f} \in L^p such that n1Sn(f)Lpfˉn^{-1}S_n(f) \overset{L^p}{\longrightarrow} \bar{f} and

fˉdμ=fdμ\int \bar{f} \mathrm{d}\mu = \int f \mathrm{d}\mu

Proof

We begin by noting that

fTnpp=fpTndμ=fpp||f\circ T^n||^p_p = \int |f|^p\circ T^n\mathrm{d}\mu = ||f||^p_p

By the above and Minkowski’s Inequality, n1Sn(f)pfp||n^{-1}S_n(f)||_p \leq ||f||_p. Since fLpf\in L^p, given an ϵ>0\epsilon > 0, we can choose a C>0C > 0 such that fgpϵ/3||f - g||_p \leq \epsilon/3 with

g(x)={Cf(x)>Cf(x)Cf(x)CCf(x)<Cg(x) = \begin{cases} C & f(x) > C \\ f(x) & -C \leq f(x) \leq C \\ -C & f(x) < -C \end{cases}

i.e. gg is ff truncated at ±C\pm C. By the Birkhoff Ergodic Theorem, n1Sn(g)gˉn^{-1}S_n(g)\to \bar{g} μ\mu-a.e.

Next, we note that n1Sn(g)C|n^{−1}S_n(g)| \leq C for all nn, and thus by the Dominated Convergence Theorem, there exists an NN such that for all n>Nn > N,

n1Sn(g)gˉpϵ3||n^{-1}S_n(g) - \bar{g}||_p \leq \frac{\epsilon}{3}

Applying Fatou’s Lemma gives that

fˉgˉpp=lim infnn1Sn(fg)pdμlim infnn1Sn(fg)pdμfgpp||\bar{f} - \bar{g}||_p^p = \int \liminf_{n\rightarrow\infty} |n^{-1}S_n(f - g)|^p\mathrm{d}\mu \leq \liminf_{n\rightarrow\infty}\int |n^{-1}S_n(f - g)|^p\mathrm{d}\mu \leq ||f - g||^p_p

Thus, for n>Nn > N,

n1Sn(f)fˉpn1Sn(fg)p+n1Sn(g)gˉp+gˉfˉp<ϵ||n^{-1}S_n(f) - \bar{f}||_p \leq ||n^{-1}S_n(f - g)||_p + ||n^{-1}S_n(g) - \bar{g}||_p + ||\bar{g} - \bar{f}||_p <\epsilon

Since n1Sn(f)Lpfˉn^{-1}S_n(f) \overset{L^p}{\longrightarrow} \bar{f} and μ(Ω)<\mu(\Omega) < \infty, the convergence must hold in L1L^1 as well. Thus, we have

fˉdμ=limnn1Sn(f)dμ=limnn1Sn(f)dμ=fdμ\int \bar{f}\mathrm{d}\mu = \int \lim_{n\to\infty}n^{-1}S_n(f)\mathrm{d}\mu = \lim_{n\to\infty}\int n^{-1}S_n(f)\mathrm{d}\mu = \int f\mathrm{d}\mu

which gives the desired result. \square

Law of Large Numbers, Again#

Let (Ω,F,P)(\Omega, \mathcal{F}, P) be a probability space with i.i.d. real-valued random variables {Xi}i=1\{X_i\}_{i=1}^\infty with common distribution function FF. We define a map π:ΩS:=RN\pi : \Omega \to S \vcentcolon= \mathbb{R}^\mathbb{N} by

π(ω)=(X1(ω),X2(ω),)\pi(\omega) = (X_1(\omega), X_2(\omega), \ldots)

Let ν=Pπ1\nu=P\circ\pi^{-1} be the corresponding probability measure on SS.

Because the variables {Xi}\{X_i\} are independent, ν\nu has the form ν=μ1×μ2×\nu = \mu_1\times\mu_2\times\cdots , and because they are identically distributed, all the marginal distributions μj\mu_j are the same, so in fact ν=μN\nu = \mu^\mathbb{N} for some probability distribution μ\mu on R\mathbb{R}.

For a sequence (x1,x2,)S(x_1, x_2, \ldots) \in S, we can define the shift map T:SST : S\to S to be

T(x1,x2,x3,)=(x2,x3,x4,)T(x_1, x_2, x_3, \ldots) = (x_2, x_3, x_4, \ldots)

Then the shift map is measure preserving and ergodic by Kolmogorov’s zero-one law.

Strong Law of Large Numbers, Again

Let {Xi}i=1\{X_i\}_{i=1}^\infty be i.i.d. random variables from Ω\Omega to R\mathbb{R}. If E[Xi]<E[|X_i|] < \infty, then Snna.s.c\frac{S_n}{n} \overset{a.s.}{\longrightarrow} c for c=E[Xi]c = E[X_i].

Proof

Let f:SRf : S \to \mathbb{R} by taking the first coordinate, that is, for x=(x1,x2,)Sx = (x_1, x_2, \ldots)\in S, f(x)=x1f(x) = x_1. Then, for TT being the shift map and x=π(ω)x = \pi(\omega), we have

Sn(f)(x)=f(x)+(fT)(x)++(fTn1)(x)=X1(ω)+X2(ω)++Xn(ω)S_n(f)(x) = f(x) + (f\circ T)(x) + \cdots + (f\circ T^{n-1})(x) = X_1(\omega) + X_2(\omega) + \cdots + X_n(\omega)

Thus, the Birkhoff and von Neumann Ergodic Theorems say that there exists an invariant fˉL1\bar{f}\in L^1 such that

n1Sn(f)(x)a.s.fˉn^{-1}S_n(f)(x)\overset{a.s.}{\longrightarrow} \bar{f}

for xSx\in S and

fˉdν=fdν\int \bar{f}\mathrm{d}\nu = \int f\mathrm{d}\nu

Since TT is ergodic, the result from the beginning of this section states that fˉ=c\bar{f} = c, a constant, almost surely. Thus,

c=fˉdν=limnn1Sn(f)dν=fdν=E[Xi]c = \int \bar{f}\mathrm{d}\nu = \lim_{n\to \infty}\int n^{-1}S_n(f)\mathrm{d}\nu = \int f\mathrm{d}\nu = E[X_i]

which gives the desired result. \square

Probability and Measure
https://astronaut.github.io/posts/measure-theory/
Author
关怀他人
Published at
2025-02-07