Probability and Measure — Part 2
2025-03-10
2025-12-19

This post is Part 2 of my notes for Adam B Kashlak's course on Probability and Measure Theory.

Functions, Random Variables and Integration#

Simple Functions and Random Variables#

Simple Random Variable

Let $(\Omega, \mathcal{F}, P)$ be a probability space, i.e. $P(\Omega)=1$. A simple random variable $X : \Omega\rightarrow\mathbb{R}$ is a real-valued function that takes on only a finite number of values $x_1, \dots, x_p$ and is such that, for each $i$, the set

$$\{\omega\in\Omega : X(\omega)=x_i\}\in\mathcal{F}$$

One way to write such a function is to finitely partition $\Omega$ into disjoint sets $\{A_i\}_{i=1}^p$, i.e. $\bigcup_{i=1}^p A_i=\Omega$ and $A_i\cap A_j=\varnothing$ for $i\neq j$, and write

$$X(\omega) = \sum_{i=1}^p x_i \mathbf{1}[\omega\in A_i]$$

Then we can say that the probability that $X$ is equal to $x_i$ is

$$P(X=x_i)=P(\{\omega\in\Omega : X(\omega)=x_i\})=P(A_i)$$

Furthermore, this allows us to define the expectation of the simple random variable $X$ to be

$$\mathbb{E}X = \sum_{i=1}^p x_i P(X=x_i)$$
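
To make the definition concrete, here is a minimal Python sketch (a hypothetical example, not from the lecture notes): a fair six-sided die modelled on $\Omega=\{0,\dots,5\}$ with the uniform probability, and $X$ the indicator of an even roll.

```python
import numpy as np

# Hypothetical example: Omega = {0, ..., 5} with uniform P (a fair die),
# X = 1 on the event "the roll is even" and 0 otherwise, so the partition
# is {A_1, A_2} and EX = sum_i x_i * P(A_i).
omega = np.arange(6)
P = np.full(6, 1 / 6)                       # P({k}) = 1/6 for each outcome k
X = np.where(omega % 2 == 0, 1.0, 0.0)      # X(omega) = 1[omega is even]

# Expectation via the partition formula EX = sum_i x_i P(X = x_i)
EX = sum(x * P[X == x].sum() for x in np.unique(X))
print(EX)   # 0.5 = 1 * P(X = 1) + 0 * P(X = 0)
```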
Simple Measurable Function

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. A simple function $F:\Omega\rightarrow\mathbb{R}$ is a function of the form

$$F(\omega)=\sum_{i=1}^p x_i\mathbf{1}[\omega\in B_i], \qquad B_i\in\mathcal{F}$$

which is a linear combination of indicator functions. The sets $B_i$ need not be disjoint, but given a simple function, we can always rewrite it in terms of disjoint $B_i$.

Then, we define the integral of a simple function to be

$$\int F\,\mathrm{d}\mu := \sum_{i=1}^p x_i\mu(B_i)$$
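
As a sanity check, here is a small Python sketch (a hypothetical finite measure space, not from the notes) showing that the integral computed from an overlapping representation agrees with the one computed from the disjoint refinement.

```python
import numpy as np

# Hypothetical example: Omega = {0, ..., 9} with mu({k}) = 1, so mu(B) = |B|.
# F = 2*1_{B1} + 3*1_{B2} with overlapping B1 and B2.
mu = np.ones(10)
B1 = np.zeros(10, dtype=bool)
B1[0:6] = True          # B1 = {0, ..., 5}
B2 = np.zeros(10, dtype=bool)
B2[4:9] = True          # B2 = {4, ..., 8}

# Integral from the non-disjoint representation: 2*mu(B1) + 3*mu(B2)
integral_raw = 2 * mu[B1].sum() + 3 * mu[B2].sum()

# Same integral after rewriting F over the disjoint refinement:
# F = 5 on B1 ∩ B2, 2 on B1 \ B2, and 3 on B2 \ B1
integral_refined = (5 * mu[B1 & B2].sum()
                    + 2 * mu[B1 & ~B2].sum()
                    + 3 * mu[B2 & ~B1].sum())
print(integral_raw, integral_refined)   # both 27.0
```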

Measurable Functions and Random Variables#

To extend the above idea of a simple random variable, we want to move from the finitely many values $x_i$ to preimages of arbitrary Borel sets $B \subset \mathbb{R}$. We need two measurable spaces $(\mathbb{X}, \mathcal{X})$ and $(\mathbb{Y}, \mathcal{Y})$.

Measurable Function

A function $f : \mathbb{X}\rightarrow\mathbb{Y}$ is said to be measurable (with respect to $\mathcal{X}/\mathcal{Y}$) if $f^{-1}(B)\in\mathcal{X}$ for any $B\in\mathcal{Y}$. If $\mathbb{Y}=\mathbb{R}$, then we say that $f$ is $\mathcal{X}$-measurable.

Typically, the $\sigma$-fields of interest are the Borel $\sigma$-fields, and the measurable space is sometimes written as $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ when $\mathbb{X}$ is a topological space. Moreover, the space $(\mathbb{X}, \mathcal{X})$ is typically taken to be $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ or $(\mathbb{R}^+, \mathcal{B}(\mathbb{R}^+))$. In this case, we say that $f$ is Borel measurable.

If we replace $\mathcal{B}(\mathbb{R})$ with $\mathcal{M}_\lambda(\mathbb{R})$, the set of Lebesgue measurable subsets of $\mathbb{R}$, then we say $f$ is Lebesgue measurable.

Cool facts about Measurable Functions#

  1. Inverse images of set functions preserve set operations, i.e. for $f:\mathbb{X}\rightarrow\mathbb{Y}$ and $A, A_i\subset\mathbb{Y}$, $i\in \mathcal{I}$,

    $$f^{-1}\left(\bigcup_{i\in \mathcal{I}} A_i\right) = \bigcup_{i\in \mathcal{I}} f^{-1}(A_i) \quad \text{and} \quad f^{-1}(\mathbb{Y}\setminus A) = \mathbb{X} \setminus f^{-1}(A)$$

    For a measurable function $f$, this implies that $\{f^{-1}(B) : B \in \mathcal{Y}\}$ is a $\sigma$-field and is contained in $\mathcal{X}$. Hence, we want $\mathcal{Y}$ to be no larger than $\mathcal{X}$ to have measurable functions. Furthermore, this can be used to show that the measurability of $f$ can be established by looking only at a collection of sets $\mathcal{A} \subset \mathcal{Y}$ that generates $\mathcal{Y}$.

    Example

    Let $\mathcal{A}$ be the set of all half-lines $A_t = (-\infty, t]$ for $t \in \mathbb{R}$, which generates $\mathcal{B}(\mathbb{R})$. Thus, $f$ is measurable as long as the sets $\{x : f(x) \leq t\}$ are measurable.

  2. For any $A \in \mathcal{X}$, the indicator function $f(x) = \mathbf{1}[x \in A]$ is measurable. The $\sigma$-field generated by $f^{-1}$ is simply $\{\varnothing, A, A^c, \mathbb{X}\}\subset\mathcal{X}$.

  3. For measurable functions $f, g : \mathbb{X} \rightarrow \mathbb{R}$, the functions $f + g$ and $fg$ are measurable.

  4. For measurable functions $\{f_i\}_{i=1}^\infty$ from $\mathbb{X}$ to $\mathbb{R}$, the following are also measurable: $\sup_i f_i$, $\inf_i f_i$, $\limsup_i f_i$, $\liminf_i f_i$, and $\lim_i f_i$ if it exists.

    Proof

    In set notation, $\{x : \sup_i f_i(x)\leq t\}=\bigcap_{i=1}^\infty\{x : f_i(x)\leq t\}$, where the right-hand side is a countable intersection of measurable sets and hence measurable. Similarly, $\{x : \inf_i f_i(x)< t\}=\bigcup_{i=1}^\infty\{x : f_i(x)< t\}$ is a countable union of measurable sets, and $\limsup_i f_i = \inf_i \sup_{j\geq i} f_j$ and $\liminf_i f_i = \sup_i\inf_{j\geq i} f_j$. If $\lim_i f_i$ exists, then $\limsup_i f_i=\lim_i f_i=\liminf_i f_i$. $\square$

  5. Let $f : \mathbb{X}\rightarrow\mathbb{R}$ be a continuous function; then it is measurable.

    Proof

    If $U$ is an open set in $\mathbb{R}$, then $f^{-1}(U)$ is open in $\mathbb{X}$ (definition of a continuous function between two topological spaces). Thus, the set $f^{-1}(U)$ is measurable. Since the open sets of $\mathbb{R}$ generate $\mathcal{B}(\mathbb{R})$, the function $f$ is measurable. $\square$

  6. Given a collection of functions $f_i: \mathbb{X}\rightarrow\mathbb{Y}$, $i\in \mathcal{I}$, we can make them measurable by constructing the measurable space $(\mathbb{X},\mathcal{X})$ where $\sigma(\{f_i\}_{i\in \mathcal{I}})\subseteq\mathcal{X}$ is the $\sigma$-field generated by the sets $f^{-1}_i(B)$ for all $i$ and $B\in\mathcal{Y}$.

Almost Surely / Almost Everywhere

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. For two functions $f, g : \Omega\rightarrow\mathbb{R}$, we say that $f = g$ a.e. (almost everywhere) when the set $N = \{\omega : f(\omega) \neq g(\omega)\}$ has measure $\mu(N) = 0$.

In probability theory, “almost everywhere” is replaced with “almost surely”, abbreviated a.s., and it is equivalently written “with probability 1” or w.p.1.

Example

Let $([0, 1], \mathcal{B}, \lambda)$ be the standard measure space of Borel sets on the unit interval with Lebesgue measure. Let $f(t)=0$ for all $t\in(0,1]$, and let $g(t) = 0$ on $(0, 1]\setminus\mathbb{Q}$ and $g(t)=1$ on $(0, 1]\cap\mathbb{Q}$.

Then we have $f=g$ a.e. because $\lambda((0, 1]\cap\mathbb{Q}) = 0$. To prove that $\lambda(\mathbb{Q})=0$, we enumerate the rational numbers as $\{q_m\}_{m=1}^\infty$ and surround each $q_m$ with an interval $(q_m-\frac{\epsilon}{2^m}, q_m + \frac{\epsilon}{2^m})$. For any $\epsilon > 0$, $\sum_{m=1}^\infty \lambda\left(\left(q_m-\frac{\epsilon}{2^m}, q_m + \frac{\epsilon}{2^m}\right)\right)=\sum_{m=1}^\infty\frac{\epsilon}{2^{m-1}}=2\epsilon$. Since $\epsilon$ is arbitrary, we have $\lambda(\mathbb{Q})=0$.
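
The covering argument is easy to check numerically; the sketch below (illustration only) just sums the interval lengths.

```python
# Numeric check of the covering bound: the m-th interval has length
# 2 * eps / 2^m = eps / 2^(m-1), and these lengths sum to 2 * eps
# no matter how small eps is.
eps = 1e-3
total_length = sum(eps / 2 ** (m - 1) for m in range(1, 60))
print(total_length)   # ~0.002 = 2 * eps (the tail beyond m = 60 is negligible)
```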

Integration#

We will consider measurable functions mapping from $(\Omega, \mathcal{F})$ to $[-\infty, \infty]$, which is called the extended real line. This allows us to handle sets such as $f^{-1}(\infty)$.

Notation: $f_i\uparrow f$

We use $f_i\uparrow f$ to represent a sequence of functions $\{f_i\}$ that is increasing and converges to $f$ for every $\omega$. That is, $f_i(\omega)\rightarrow f(\omega)$ and $f_i(\omega)\leq f_{i+1}(\omega)$ for all $\omega$.

Theorem

Let $(\Omega, \mathcal{F})$ be a measurable space and $\mathcal{A}$ a $\pi$-system that generates $\mathcal{F}$ ($\sigma(\mathcal{A})=\mathcal{F}$). Let $\mathcal{V}$ be a linear space of functions that contains

  1. all indicators $\mathbf{1}_\Omega$ and $\mathbf{1}_A$ for each $A\in\mathcal{A}$;
  2. all functions $f$ such that there exist $f_i\in\mathcal{V}$ with $f_i\uparrow f$.

Then $\mathcal{V}$ contains all measurable functions.

Proof

First, $\mathbf{1}_A\in\mathcal{V}$ for each $A\in\mathcal{A}$. Let $\mathcal{L}=\{B\in\mathcal{F} : \mathbf{1}_B\in\mathcal{V}\}$; we claim that $\mathcal{L}$ is a $\lambda$-system (this uses the linearity of $\mathcal{V}$ and its closure under increasing limits). Since $\mathcal{A}\subset\mathcal{L}$, by the Dynkin $\pi$-$\lambda$ Theorem, $\sigma(\mathcal{A})\subseteq\mathcal{L}$. By the definition of $\mathcal{L}$, we also have $\mathcal{L}\subseteq\mathcal{F}$; thus $\mathcal{L}=\mathcal{F}$, so every indicator function $\mathbf{1}_B$ with $B\in\mathcal{F}$ is in $\mathcal{V}$.

Let $f$ be a non-negative measurable function; we can define $f_i=\min\left(2^{-i}\lfloor 2^i f\rfloor,\, i\right)$ (the truncation at $i$ keeps the number of values finite). Each $f_i$ is a finite linear combination of indicator functions of sets in $\mathcal{F}$ and hence $f_i \in \mathcal{V}$. Furthermore, $f_i\uparrow f$ and thus $f\in\mathcal{V}$.

Lastly, for a general measurable $f$, we write $f=f^+-f^-$ where

$$\begin{aligned} f^+(\omega)&= \begin{cases} f(\omega) & \text{if } f(\omega)\geq 0 \\ 0 & \text{otherwise} \end{cases} \\ f^-(\omega)&= \begin{cases} -f(\omega) & \text{if } f(\omega)< 0 \\ 0 & \text{otherwise} \end{cases} \end{aligned}$$

which are both non-negative measurable functions, so $f^+, f^-\in\mathcal{V}$ and, by linearity, $f=f^+-f^-\in\mathcal{V}$. $\square$
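
The dyadic approximation $f_i=\min(2^{-i}\lfloor 2^i f\rfloor, i)$ is easy to visualise numerically; the sketch below (illustration only, with $f(x)=x^2$ on a grid over $[0,1]$) shows that it increases in $i$ and that the sup error shrinks like $2^{-i}$.

```python
import numpy as np

# Illustration of the dyadic approximation: f_i = min(2^{-i} * floor(2^i * f), i)
# is simple, non-decreasing in i, and converges to f pointwise.
x = np.linspace(0.0, 1.0, 1001)
f = x ** 2

def dyadic(i):
    return np.minimum(np.floor(2.0 ** i * f) / 2.0 ** i, i)

for i in (1, 2, 4, 8):
    print(i, np.max(np.abs(f - dyadic(i))))   # sup error on the grid is at most 2^{-i}

assert np.all(dyadic(4) <= dyadic(8))         # monotonicity in i on the grid
```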

Integral of a Measurable Function

For a non-negative measurable function $f : \Omega\rightarrow [0,\infty]$ on the measure space $(\Omega, \mathcal{F}, \mu)$, we define

$$\int f\,\mathrm{d}\mu = \sup\left[\sum_{i\in \mathcal{I}} \left\{\inf_{\omega\in A_i} f(\omega) \right\}\mu(A_i)\right]$$

where the supremum is taken over all finite measurable partitions $\{A_i\}_{i\in \mathcal{I}}$ of $\Omega$.

Inside the square brackets is the integral of the simple function that takes the value $\inf_{\omega\in A_i} f(\omega)$ on the set $A_i$. Hence, for a non-negative $f$, we consider all simple functions $g$ such that $0 \leq g \leq f$ and define the integral to be the supremum of the integrals of such $g$. To extend this to all measurable functions $f$, we write $f=f^+-f^-$ and then

$$\int f\,\mathrm{d}\mu = \int f^+\,\mathrm{d}\mu - \int f^-\,\mathrm{d}\mu$$

which is defined whenever at least one of the two terms on the right is finite.
Monotone Convergence Theorem for Simple Functions

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, let $f \geq 0$ be measurable, and let $f_n \geq 0$, $n \in \mathbb{N}$, be a sequence of measurable simple functions such that $f_n \uparrow f$. Then $\int f_n\,\mathrm{d}\mu\uparrow\int f\,\mathrm{d}\mu$.

Proof

Since $f_n \uparrow f$, the integrals $\int f_n\,\mathrm{d}\mu$ form a non-decreasing sequence, so $\int f_n\,\mathrm{d}\mu \uparrow c$ for some $c\in[0,\infty]$, and since each $f_n\leq f$, we have $c\leq \int f\,\mathrm{d}\mu$. To prove the other side of the inequality, let $g$ be any simple measurable function s.t. $0\leq g\leq f$. Our goal is to show that $\int g\,\mathrm{d}\mu\leq c$, as taking the supremum over all such $g$ then implies that $c\geq \int f\,\mathrm{d}\mu$.

First, we write $g=\sum_{i\in \mathcal{I}}a_i \mathbf{1}_{A_i}$ where the $A_i$ form a disjoint partition of $\Omega$, and similarly $f_n=\sum_{j\in \mathcal{J}} b_j^{(n)} \mathbf{1}_{B_j^{(n)}}$. Consequently,

$$f_n = \sum_{i\in \mathcal{I}} f_n \mathbf{1}_{A_i}=\sum_{i, j} b_j^{(n)} \mathbf{1}_{A_i\cap B_j^{(n)}}$$

and $\int f_n\,\mathrm{d}\mu=\sum_{i\in\mathcal{I}}\int_{A_i}f_n\,\mathrm{d}\mu$. Hence, we want to show for each $i \in \mathcal{I}$ that

$$\lim_{n\rightarrow \infty}\int_{A_i}f_n\,\mathrm{d}\mu\geq \int_{A_i} g\,\mathrm{d}\mu = a_i\mu(A_i)$$

First, if $a_i=0$, then the inequality above must hold. If $a_i>0$, we can divide by $a_i$, and so without loss of generality we take $a_i=1$ and consider $g=\mathbf{1}_A$ for some set $A$.

For any $\epsilon>0$, let $C_n=\{x\in A : f_n(x) > 1 - \epsilon\}$. Then $C_n\uparrow A$, which means

$$C_1\subseteq C_2\subseteq \cdots \subseteq A \quad\text{ and }\quad \bigcup_{n=1}^\infty C_n = A$$

By countable additivity of $\mu$,

$$\mu(C_{n+1}) = \mu(C_1) + \sum_{i=1}^n \mu(C_{i + 1} \setminus C_i)$$

and $\mu(C_n)\uparrow \mu(A)$. Since $\int_A f_n\,\mathrm{d}\mu \geq (1 - \epsilon)\mu(C_n)$, we have that $\lim_n \int_A f_n\,\mathrm{d}\mu \geq (1 - \epsilon)\mu(A) = (1 - \epsilon)\int_A g\,\mathrm{d}\mu$. Taking $\epsilon$ to zero and summing over $i$ gives $c\geq \int g\,\mathrm{d}\mu$. $\square$

Theorem

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f, g : \Omega \rightarrow [-\infty, \infty]$ be measurable with $f = g$ a.e. Then $f$ is integrable if and only if $g$ is integrable. If $f$ and $g$ are integrable, then $\int f\,\mathrm{d}\mu = \int g\,\mathrm{d}\mu$.

Proof

Let $f = g$ on $\Omega \setminus N$ where $\mu(N) = 0$. For any measurable function $h : \Omega \rightarrow [-\infty, \infty]$, we check that $\int_\Omega h\,\mathrm{d}\mu$ and $\int_{\Omega\setminus N} h\,\mathrm{d}\mu$ coincide.

  • This is true if $h$ is an indicator function, because $\mu(A) = \mu(A\setminus N)$.
  • It is also true for $h$ a non-negative simple function, by linearity.
  • From the MCT for Simple Functions, we can pass from non-negative simple functions $h$ to any non-negative measurable function.
  • Lastly, writing $\int h\,\mathrm{d}\mu = \int h^+\,\mathrm{d}\mu - \int h^-\,\mathrm{d}\mu$ shows that the integrals coincide for any integrable measurable function.

To complete the proof, we note that

$$\int f\,\mathrm{d}\mu = \int_{\Omega\setminus N}f \,\mathrm{d}\mu = \int_{\Omega\setminus N}g \,\mathrm{d}\mu = \int g\,\mathrm{d}\mu$$

This result basically tells us that we can modify functions on a set of measure zero without breaking anything. $\square$

Monotone Convergence Theorem#

Monotone Convergence Theorem

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $\{f_i\}_{i=1}^\infty$ be measurable functions from $\Omega$ to $[-\infty, \infty]$ such that $f_i \uparrow f$ $\mu$-a.e. and $\int f_1\,\mathrm{d}\mu > -\infty$. Then $\int f_i\,\mathrm{d}\mu \uparrow \int f\,\mathrm{d}\mu$. (This means that we can swap the limit and the integral.)

Proof

First, we need to check that $f$ is measurable. For $c\in\mathbb{R}$, we consider the sets $(c, \infty]$, which generate the Borel $\sigma$-field. Since $f_i\uparrow f$, $f^{-1}((c, \infty]) = \bigcup_{i=1}^\infty f_i^{-1}((c, \infty])$ and $f_i^{-1}((c, \infty]) \in \mathcal{F}$; thus $f$ is measurable (by the first cool fact about measurable functions).

Next, assume that $f_1 \geq 0$, and for each $f_i$ take simple functions $g_{ij}$ such that $g_{ij} \uparrow f_i$ as $j\rightarrow\infty$. Thus, by the MCT for Simple Functions, $\int g_{ij}\,\mathrm{d}\mu\uparrow\int f_i\,\mathrm{d}\mu$. Furthermore, let $g_i^* = \max\{g_{1i}, \cdots , g_{ii}\}$. These $g_i^*$ are simple functions and $g_i^*\uparrow f$. Once again, the MCT for Simple Functions implies that $\int g_i^*\,\mathrm{d}\mu\uparrow\int f\,\mathrm{d}\mu$. But since $g_i^* \leq f_i$ by construction, $\int g_i^*\,\mathrm{d}\mu \leq \int f_i\,\mathrm{d}\mu \leq \int f\,\mathrm{d}\mu$. Thus, $\int f_i\,\mathrm{d}\mu\uparrow\int f\,\mathrm{d}\mu$. This means we are done for non-negative functions ($f_1\geq 0$).

Now, we assume that $f \leq 0$. In this case $f_i \uparrow f$ implies that $-f_i \downarrow -f$. Let $h=-f$ and $h_i=-f_i$; then $0\leq \int h\,\mathrm{d}\mu\leq \int h_i\,\mathrm{d}\mu$. Next, note that $0\leq h_1-h_i\uparrow h_1-h$. Applying the above result gives that $\int (h_1-h_i)\,\mathrm{d}\mu\uparrow \int (h_1-h)\,\mathrm{d}\mu$. Since all of the $h_i$ have finite integrals (as $\int f_1\,\mathrm{d}\mu > -\infty$), we are allowed to subtract to get that $\int h_i\,\mathrm{d}\mu\downarrow\int h\,\mathrm{d}\mu$ and thus $\int f_i\,\mathrm{d}\mu\uparrow\int f\,\mathrm{d}\mu$.

For a general function $f = f^+ - f^-$, we have $f_i^+\uparrow f^+$, $f_i^-\downarrow f^-$, and $\int f^-\,\mathrm{d}\mu \leq \int f_1^-\,\mathrm{d}\mu < \infty$. So by the above special cases, $\int f_i^+\,\mathrm{d}\mu\uparrow\int f^+\,\mathrm{d}\mu$ and $\int f_i^-\,\mathrm{d}\mu\downarrow\int f^-\,\mathrm{d}\mu$. Finally, $\int f_i\,\mathrm{d}\mu \uparrow \int f\,\mathrm{d}\mu$. $\square$

We only require $f_i \uparrow f$ to hold almost everywhere to establish the result. Hence, convergence can fail on a set of measure (probability) zero and we still have convergence of the integrals.

Secondly, we can redo the above proof for $f_i \downarrow f$ with $\int f_1\,\mathrm{d}\mu < \infty$ to get a similar result for decreasing sequences.
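
A quick numerical illustration (not from the notes) of both the theorem and the almost-everywhere remark: on $([0,1],\lambda)$ the functions $f_n(x)=x^2\,\mathbf{1}[x\leq 1-1/n]$ increase to $f(x)=x^2$ everywhere except at the single point $x=1$, and the integrals increase to $1/3$.

```python
import numpy as np

# Illustration of MCT: f_n(x) = x^2 * 1[x <= 1 - 1/n] increases to f(x) = x^2
# except at x = 1 (a Lebesgue-null set, so the a.e. hypothesis holds), and the
# integrals (approximated by Riemann sums on a fine grid) increase to 1/3.
x = np.linspace(0.0, 1.0, 100_001)
dx = x[1] - x[0]
f = x ** 2

for n in (2, 5, 10, 100):
    f_n = np.where(x <= 1 - 1 / n, x ** 2, 0.0)
    print(n, np.sum(f_n) * dx)     # increases toward 1/3

print(np.sum(f) * dx)              # ~0.3333, the integral of the limit
```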

Fatou’s Lemma#

Fatou’s Lemma

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $\{f_i\}_{i=1}^\infty$ be non-negative measurable functions from $\Omega$ to $[-\infty, \infty]$. Then, $\int \liminf_i f_i\,\mathrm{d}\mu \leq \liminf_i \int f_i\,\mathrm{d}\mu$.

Proof

Recall that $\liminf_{i\rightarrow\infty} f_i = \sup_j \inf_{i\geq j} f_i$. Hence, let $g_j=\inf_{i\geq j} f_i$. Then $g_j\uparrow \liminf_{i\rightarrow\infty} f_i$ and $g_j\geq 0$ since the $f_i$ are non-negative by assumption. So the MCT says that $\int g_j\,\mathrm{d}\mu\uparrow\int\liminf_{i\rightarrow\infty}f_i\,\mathrm{d}\mu$. By construction, $g_j\leq f_i$ for any $i\geq j$; thus $\int g_j\,\mathrm{d}\mu \leq \int f_i\,\mathrm{d}\mu$ for any $i\geq j$ and subsequently $\int g_j\,\mathrm{d}\mu\leq \inf_{i\geq j}\int f_i\,\mathrm{d}\mu$. Taking $j\rightarrow\infty$ gives $\lim_{j\rightarrow\infty}\int g_j\,\mathrm{d}\mu = \int\liminf_i f_i\,\mathrm{d}\mu\leq\liminf_i\int f_i\,\mathrm{d}\mu$. $\square$
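
The inequality can be strict. A standard example (sketched numerically below, not from the notes): on $([0,1],\lambda)$ take $f_n = n\,\mathbf{1}_{(0,1/n)}$, so that $\liminf_n f_n = 0$ pointwise while $\int f_n\,\mathrm{d}\lambda = 1$ for every $n$.

```python
import numpy as np

# Strict inequality in Fatou's Lemma: f_n = n * 1_{(0, 1/n)} on [0, 1].
# liminf_n f_n = 0 pointwise (left side 0), yet every integral equals 1 (right side 1).
x = np.linspace(0.0, 1.0, 1_000_001)
dx = x[1] - x[0]

for n in (10, 100, 1000):
    f_n = np.where((x > 0) & (x < 1 / n), float(n), 0.0)
    print(n, np.sum(f_n) * dx)   # ~1 for every n, even though f_n -> 0 pointwise
```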

Dominated Convergence Theorem#

Dominated Convergence Theorem

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $\{f_i\}_{i=1}^\infty$ and $g$ be measurable and absolutely integrable functions. If $|f_i| \leq g$ for all $i$ and $f_i(\omega) \rightarrow f(\omega)$ for each $\omega \in \Omega$ (i.e. pointwise convergence), then $f$ is absolutely integrable and $\int f_i\,\mathrm{d}\mu \rightarrow \int f\,\mathrm{d}\mu$.

Proof

Let $f_i^\wedge = \inf_{j\geq i} f_j$ and $f_i^\vee = \sup_{j\geq i} f_j$. Then $f_i^\wedge \leq f_i \leq f_i^\vee$. We have that $f_i^\wedge\uparrow f$ and $\int f_1^\wedge\,\mathrm{d}\mu\geq -\int g\,\mathrm{d}\mu > -\infty$. So the MCT implies that $\int f_i^\wedge\,\mathrm{d}\mu\uparrow \int f\,\mathrm{d}\mu$. Note also that $|f|\leq g$, so $f$ is absolutely integrable.

Doing the same for $f_i^\vee$, we have that $f_i^\vee\downarrow f$ and hence that $\int f_i^\vee\,\mathrm{d}\mu\downarrow\int f\,\mathrm{d}\mu$ (by the decreasing version of the MCT, since $\int f_1^\vee\,\mathrm{d}\mu\leq \int g\,\mathrm{d}\mu<\infty$). Since $\int f_i^\wedge\,\mathrm{d}\mu\leq \int f_i\,\mathrm{d}\mu\leq \int f_i^\vee\,\mathrm{d}\mu$, we have the desired result that $\int f_i\,\mathrm{d}\mu\rightarrow \int f\,\mathrm{d}\mu$. $\square$
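
The dominating function $g$ is essential. Here is a small numerical contrast (illustration only): $f_n(x)=\sin(nx)/n$ is dominated by the integrable constant $1$ and its integrals vanish as the DCT predicts, whereas $f_n = n\,\mathbf{1}_{(0,1/n)}$ converges to $0$ pointwise but has no integrable dominating function, and its integrals do not converge to $0$.

```python
import numpy as np

# DCT versus the undominated case on ([0, 1], Lebesgue):
#   dominated:   f_n(x) = sin(n x)/n, |f_n| <= 1, integrals -> 0
#   undominated: f_n = n * 1_{(0, 1/n)}, f_n -> 0 pointwise, integrals stay at 1
x = np.linspace(0.0, 1.0, 1_000_001)
dx = x[1] - x[0]

for n in (1, 10, 100):
    dominated = np.sin(n * x) / n
    undominated = np.where((x > 0) & (x < 1 / n), float(n), 0.0)
    print(n, np.sum(dominated) * dx, np.sum(undominated) * dx)
```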

Lebesgue-Stieltjes Measure#

Let $(\mathbb{X}, \mathcal{X})$ and $(\mathbb{Y}, \mathcal{Y})$ be two measurable spaces. Let $\psi : \mathbb{X}\rightarrow\mathbb{Y}$ be a measurable function and $\mu$ be a measure on $\mathcal{X}$. Then we can define $\nu = \mu \circ \psi^{-1}$, which is a measure on $\mathcal{Y}$. This allows us to turn Lebesgue measure into Lebesgue-Stieltjes measures.

Theorem

Let $F : \mathbb{R} \rightarrow \mathbb{R}$ be non-constant, right-continuous, and non-decreasing. Then, there exists a unique measure $\mathrm{d}F$ on $\mathbb{R}$ such that for all $a, b \in \mathbb{R}$ with $a < b$,

$$\mathrm{d}F((a, b]) = F(b) - F(a)$$
Proof

Let $F(\infty) = \lim_{x\rightarrow\infty} F(x)$ and $F(-\infty) = \lim_{x\rightarrow-\infty}F(x)$. We define the open interval $I = (F(-\infty), F(\infty))$ and define $g(y) = \inf\{x\in\mathbb{R} : y\leq F(x)\}$ for $y\in I$. We want to define $\mathrm{d}F$ to be $\lambda \circ g^{-1}$ where $\lambda$ is Lebesgue measure on $\mathbb{R}$, so we need to show that this makes sense.

We first show that $g$ is left-continuous and non-decreasing and that, for $y \in I$ and $x \in \mathbb{R}$, $g(y) \leq x$ if and only if $y \leq F(x)$. To show this, fix $y\in I$ and let $J_y = \{x\in\mathbb{R} : y\leq F(x)\}$. As $F$ is non-decreasing, if $x\in J_y$ and $x'\geq x$ then $x'\in J_y$. As $F$ is right-continuous, if $x_n\in J_y$ and $x_n\downarrow x$ then $x\in J_y$. Therefore, $J_y = [g(y), \infty)$ and $g(y)\leq x$ if and only if $y\leq F(x)$. Secondly, for $y\leq y'$, we have that $J_{y'}\subseteq J_y$ and thus $g(y)\leq g(y')$. So if $y_n\uparrow y$, then $J_y = \bigcap_{n=1}^\infty J_{y_n}$ and thus $g(y_n)\rightarrow g(y)$, which implies that $g$ is left-continuous and non-decreasing.

From the above, $g$ is Borel measurable, and thus defining $\mathrm{d}F = \lambda \circ g^{-1}$ gives us that

$$\mathrm{d}F((a, b]) = \lambda(\{y : g(y) > a,\ g(y) \leq b\}) = \lambda((F(a), F(b)]) = F(b) - F(a)$$

Furthermore, this measure $\mathrm{d}F$ is unique by the same arguments used before for Lebesgue measure. $\square$

In the case that $F : \mathbb{R} \rightarrow [0, 1]$, so that the interval $I = (0, 1)$, we have a cumulative distribution function, which induces a measure on the real line. This allows us to do things like integrate with respect to such measures, i.e. take an expectation.
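
A small numerical sketch (a hypothetical CDF, not from the notes): for $F(x)=1-e^{-x}$ on $x\geq 0$, the mass that $\mathrm{d}F$ assigns to $(a,b]$ is $F(b)-F(a)$, which agrees with integrating the density $e^{-x}$ over $(a,b]$.

```python
import math

# Hypothetical CDF: F(x) = 1 - exp(-x) for x >= 0, 0 otherwise (the Exp(1) law).
# dF((a, b]) = F(b) - F(a) matches a numerical integral of the density over (a, b].
def F(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

a, b = 0.5, 2.0
dF_ab = F(b) - F(a)

# midpoint Riemann sum of the density exp(-x) over (a, b] for comparison
n = 100_000
h = (b - a) / n
density_integral = sum(math.exp(-(a + (k + 0.5) * h)) for k in range(n)) * h
print(dF_ab, density_integral)   # both approximately 0.4712
```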

Radon Measure#

Definition

Let $(\Omega, \mathcal{B}, \mu)$ be a measure space where $\mathcal{B}$ is the Borel $\sigma$-field. The measure $\mu$ is said to be a Radon measure if $\mu(K) < \infty$ for all compact $K \in \mathcal{B}$.

  • $\mathrm{d}F$ is a Radon measure.

  • Every non-zero Radon measure on $\mathcal{B}(\mathbb{R})$ can be written as $\mathrm{d}F = \lambda \circ g^{-1}$ for some $F$.

    If $\mu$ is a Radon measure on $\mathbb{R}$, then we can define $F$ as

    $$F(x) = \begin{cases} \mu((0, x]) & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -\mu((x, 0]) & \text{if } x < 0 \end{cases}$$

    Thus, $F(b) - F(a) = \mu((a, b])$ for $a < b$, and hence $\mu = \mathrm{d}F$ by uniqueness.
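
As a quick check (a short verification not spelled out above), splitting on the signs of $a$ and $b$ and using additivity of $\mu$ together with $F(0)=0$:

$$\begin{aligned} 0 \leq a < b &: \quad F(b) - F(a) = \mu((0, b]) - \mu((0, a]) = \mu((a, b]) \\ a < 0 \leq b &: \quad F(b) - F(a) = \mu((0, b]) + \mu((a, 0]) = \mu((a, b]) \\ a < b < 0 &: \quad F(b) - F(a) = -\mu((b, 0]) + \mu((a, 0]) = \mu((a, b]) \end{aligned}$$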

Product Measure#

Definition

Given two $\sigma$-fields $\mathcal{X}$ and $\mathcal{Y}$, we define the product $\sigma$-field to be $\mathcal{X} \times \mathcal{Y}$, which is the $\sigma$-field generated by the rectangles $A\times B$ where $A\in\mathcal{X}$ and $B\in\mathcal{Y}$. The collection of all rectangles will be denoted by $\mathcal{R}$.

Monotone Class#

Definition

A collection of subsets $\mathcal{M}$ of $\Omega$ is said to be monotone if

  1. for $\{A_i\}_{i=1}^\infty$ s.t. $A_i \in \mathcal{M}$ and $A_i \uparrow A = \bigcup_{i=1}^\infty A_i$, we have $A \in \mathcal{M}$;
  2. for $\{A_i\}_{i=1}^\infty$ s.t. $A_i \in \mathcal{M}$ and $A_i \downarrow A = \bigcap_{i=1}^\infty A_i$, we have $A \in \mathcal{M}$.

Note that if a field $\mathcal{A}$ is also monotone, then it is a $\sigma$-field (by the definition of a $\sigma$-field).

Monotone Class Theorem

Let $\mathcal{A}$ be a field and $\mathcal{M}$ be monotone such that $\mathcal{A}\subset\mathcal{M}$. Then, $\sigma(\mathcal{A}) \subseteq \mathcal{M}$.

Proof

This proof is similar to that of the Dynkin $\pi$-$\lambda$ Theorem.

Existence and Uniqueness of Product Measure#

Existence and Uniqueness Theorem of Product Measure

Let $(\mathbb{X}, \mathcal{X}, \mu)$ and $(\mathbb{Y}, \mathcal{Y}, \nu)$ be $\sigma$-finite measure spaces. We denote the product $\sigma$-field by $\mathcal{X} \times \mathcal{Y}$ (the Cartesian product of two $\sigma$-fields may not be a $\sigma$-field). Let $\pi$ be a set function on $\mathcal{X}\times\mathcal{Y}$ such that for $A \in \mathcal{X}$ and $B \in \mathcal{Y}$, $\pi(A \times B) = \mu(A)\nu(B)$. Then, $\pi$ extends uniquely to a measure on $(\mathbb{X} \times \mathbb{Y}, \mathcal{X}\times\mathcal{Y})$ such that for any $E \in \mathcal{X} \times \mathcal{Y}$,

$$\pi(E) = \int\int \mathbf{1}_E(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y) = \int\int \mathbf{1}_E(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x)$$
Lemma

Let $(\mathbb{X}, \mathcal{X}, \mu)$ and $(\mathbb{Y}, \mathcal{Y}, \nu)$ be finite measure spaces, and let

$$\mathcal{F} = \left\{E \subset \mathbb{X}\times\mathbb{Y} : \int\int \mathbf{1}_E(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y) = \int\int \mathbf{1}_E(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x)\right\}$$

Then $\mathcal{X}\times\mathcal{Y} \subseteq \mathcal{F}$. (This means that $(\mathbb{X}\times\mathbb{Y}, \mathcal{X}\times\mathcal{Y}) \equiv (\mathbb{Y}\times\mathbb{X}, \mathcal{Y}\times\mathcal{X})$.)

Proof

Let $E = A \times B$ for $A \in \mathcal{X}$ and $B \in \mathcal{Y}$, i.e. $E \in \mathcal{R}$. Then,

$$\begin{aligned} \int\int \mathbf{1}_E(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y) &=\mu(A) \int \mathbf{1}_B(y)\,\mathrm{d}\nu(y) = \mu(A)\nu(B) =\nu(B)\mu(A)\\ &=\nu(B) \int \mathbf{1}_A(x)\,\mathrm{d}\mu(x) =\int\int \mathbf{1}_E(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x) \end{aligned}$$

Therefore, $\mathcal{R} \subset \mathcal{F}$.

Also, for disjoint $R_1, R_2 \in \mathcal{R}$, $\mathbf{1}_{R_1\cup R_2} = \mathbf{1}_{R_1} + \mathbf{1}_{R_2}$. Hence, $\mathcal{F}$ contains finite disjoint unions of rectangles. This implies that the field $\mathcal{A}$ generated by the set of rectangles satisfies $\mathcal{A} \subset \mathcal{F}$ (Dudley 3.2.3).

Next, consider $\{E_i\}_{i=1}^\infty$ with $E_i\in\mathcal{F}$. If $E_i\uparrow E$, then the MCT says

$$\int\int \mathbf{1}_{E_i}(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y) \uparrow \int\int \mathbf{1}_{E}(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y)$$

and

$$\int\int \mathbf{1}_{E_i}(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x) \uparrow \int\int \mathbf{1}_{E}(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x)$$

Thus, $E \in \mathcal{F}$, and the same holds if $E_i \downarrow E$ (using the finiteness of the measures for the decreasing case). Therefore, $\mathcal{F}$ is a monotone class. Finally, applying the Monotone Class Theorem shows that $\mathcal{X} \times \mathcal{Y} = \sigma(\mathcal{A}) \subset \mathcal{F}$. $\square$

Proof : Existence and Uniqueness of Product Measure

First, we consider the case where $\mu$ and $\nu$ are finite measures and $\pi(A\times B) = \mu(A)\nu(B)$. Then we extend $\pi$ to a set function

$$\pi(E):=\int\int \mathbf{1}_E(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y)$$

for any $E\in \mathcal{X}\times\mathcal{Y}$. The above lemma says that this definition makes sense and that the order of integration can be reversed for any $E \in \mathcal{X} \times \mathcal{Y}$. Linearity of the integral implies that $\pi$ is finitely additive, and applying the MCT shows that $\pi$ is countably additive. Thus, $\pi$ is a measure on $\mathcal{X} \times \mathcal{Y}$.

To show that $\pi$ is unique, let $\rho$ be some other measure such that $\rho(A \times B) = \mu(A)\nu(B)$ for $A \times B \in \mathcal{R}$. Let $\mathcal{M} = \{E \in \mathcal{X} \times \mathcal{Y} : \pi(E) = \rho(E)\}$. Then, $\mathcal{M}$ is a monotone class because for $E_i \uparrow E =\bigcup_{i=1}^\infty E_i$, we can rewrite $E = \bigcup_{i=1}^\infty D_i$ where $D_1 = E_1$ and $D_i = E_i \setminus E_{i-1}$ for $i \geq 2$ are disjoint. So by countable additivity, $\pi(E)=\rho(E)$, and we can do the same for $E_i \downarrow E$. Thus, by the Monotone Class Theorem, $\mathcal{X} \times \mathcal{Y} \subseteq \mathcal{M}$. Therefore, $\pi$ is unique on $\mathcal{X} \times \mathcal{Y}$ for finite measures $\mu$ and $\nu$.

Now let $\mu$ and $\nu$ be $\sigma$-finite measures. Let $\{A_i\}_{i=1}^\infty$ and $\{B_i\}_{i=1}^\infty$ be disjoint partitions of $\mathbb{X}$ and $\mathbb{Y}$, respectively, such that $\mu(A_i)<\infty$ and $\nu(B_i)<\infty$. Then, for any $E \in \mathcal{X} \times \mathcal{Y}$, we define $E_{ij} = E\cap (A_i\times B_j)$. From the finite measure case above,

$$\int\int \mathbf{1}_{E_{ij}}(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y) = \int\int \mathbf{1}_{E_{ij}}(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x)$$

Summing over all $i$ and $j$ and applying the MCT again gives

$$\pi(E) = \int\int \mathbf{1}_{E}(x, y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y) = \int\int \mathbf{1}_{E}(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x)$$

Furthermore, monotone convergence implies that $\pi$ is countably additive and hence a measure on $\mathcal{X}\times\mathcal{Y}$. For any other measure $\rho$ such that $\rho(A \times B) = \mu(A)\nu(B)$, countable additivity and uniqueness for finite measures imply that

$$\pi(E) = \sum_{i,j}\pi(E_{ij}) = \sum_{i,j}\rho(E_{ij}) = \rho(E)$$

Hence, the extension of $\pi$ to $\mathcal{X} \times\mathcal{Y}$ is unique. $\square$

Fubini-Tonelli#

Fubini-Tonelli Theorem

Let $(\mathbb{X}, \mathcal{X}, \mu)$ and $(\mathbb{Y}, \mathcal{Y}, \nu)$ be $\sigma$-finite measure spaces, and let $f : \mathbb{X} \times \mathbb{Y} \rightarrow \mathbb{R}$ be measurable with respect to $\mathcal{X} \times \mathcal{Y}$ such that either $f\geq 0$ (non-negative) or $\int |f|\,\mathrm{d}(\mu\times\nu)<\infty$ (absolutely integrable). Then,

$$\int f\,\mathrm{d}(\mu\times\nu) = \int\int f(x,y)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y) = \int\int f(x,y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x)$$

Also, $\int f(x, y)\,\mathrm{d}\mu(x)$ is $\mathcal{Y}$-measurable and $\int f(x, y)\,\mathrm{d}\nu(y)$ is $\mathcal{X}$-measurable.

Proof

We have this for indicator functions from the Existence and Uniqueness Theorem of Product Measure, and thus the result holds for simple functions since integrals are linear.

Then, applying the MCT to simple functions gives us that the above holds for non-negative measurable functions.

Instead, assume $\int|f|\,\mathrm{d}(\mu\times\nu) < \infty$. Then, we can write $f = f^+ - f^-$ and the above holds for $f^+$ and $f^-$ separately, i.e. $\int f^+(x, y)\,\mathrm{d}\mu(x) < \infty$ for $\nu$-almost-every $y$ and $\int f^+(x, y)\,\mathrm{d}\nu(y) < \infty$ for $\mu$-almost-every $x$, and similarly for $f^-$. Therefore,

$$\int |f(x, y)|\,\mathrm{d}\nu(y) = \int f^+(x, y)\,\mathrm{d}\nu(y) + \int f^-(x, y)\,\mathrm{d}\nu(y) < \infty \quad(\mu\text{-a.e.})$$

and thus,

$$\int f(x, y)\,\mathrm{d}\nu(y) = \int f^+(x, y)\,\mathrm{d}\nu(y) - \int f^-(x, y)\,\mathrm{d}\nu(y) \quad(\mu\text{-a.e.})$$

As we only require finiteness to hold almost everywhere for the integral to exist, we can integrate both sides of the above with respect to $\mu$ to get

$$\int\int f(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x) = \int\int f^+(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x) - \int\int f^-(x, y)\,\mathrm{d}\nu(y)\,\mathrm{d}\mu(x)$$

Doing the same with $\mu$ and $\nu$ swapped concludes the theorem. $\square$

The above theorem lets us swap the order of integration for the product of two measure spaces. This can be extended by induction to finite products of $n$ measure spaces.
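
The hypotheses matter. A minimal sketch (a standard counterexample, not from the notes) with counting measure on $\mathbb{N}\times\mathbb{N}$, which is $\sigma$-finite: take $a(m,n)=1$ if $m=n$, $-1$ if $m=n+1$, and $0$ otherwise. Then $a$ is neither non-negative nor absolutely summable, and the two iterated sums disagree; each inner sum below is exact because every row and column has only two nonzero entries.

```python
# Counting measure on N x N with a(m, n) = 1 if m == n, -1 if m == n + 1, 0 otherwise.
# Neither hypothesis of Fubini-Tonelli holds, and the iterated sums differ (1 vs 0).
def a(m, n):
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

def rowsum(m):   # sum over n of a(m, n); exact, only n = m - 1 and n = m contribute
    return sum(a(m, n) for n in range(1, m + 1))

def colsum(n):   # sum over m of a(m, n); exact, only m = n and m = n + 1 contribute
    return sum(a(m, n) for m in range(1, n + 2))

N = 500
print(sum(rowsum(m) for m in range(1, N + 1)))   # 1, and it stays 1 as N grows
print(sum(colsum(n) for n in range(1, N + 1)))   # 0, and it stays 0 as N grows
```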
