Probability and Measure — Part 2

2025-03-10

2025-12-19

This post is Part 2 of my course notes for Professor Adam B Kashlak's Probability and Measure Theory.

Functions, Random Variables and Integration#

Simple Functions and Random Variables#

Simple Random Variable
Let $(\Omega, \mathcal{F}, P)$ be a Probability Space i.e. $P(\Omega)=1$ . A simple random variable $X : \Omega\rightarrow\mathbb{R}$ is a real valued function that takes on only a finite number of values $x_1, \dots , x_p$ and is such that each set
$\{\omega\in\Omega : X(\omega)=x_i\}\in\mathcal{F}$ for $i = 1, \dots, p$ .

One way to write such a function is to finitely partition $\Omega$ into disjoint sets $\{A_i\}_{i=1}^p$ i.e. $\displaystyle\bigcup_{i=1}^p A_i=\Omega$ and $A_i\displaystyle\cap A_j=\varnothing$ for $i\neq j$ , and write

$X(\omega) = \displaystyle\sum_{i=1}^px_i \mathbf{1}[\omega\in A_i]$

Then we can say that the probability that $X$ is equal to $x_i$ is

$P(X=x_i)=P(\{\omega\in\Omega : X(\omega)=x_i\})=P(A_i)$

Furthermore, this allows us to define the expectation of the simple random variable $X$ to be

$\mathbb{E}X = \displaystyle\sum_{i=1}^p x_iP(X=x_i)$
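As a minimal sketch of this definition, assume a hypothetical fair six-sided die as the probability space, with $X$ equal to 1 on even outcomes and 0 otherwise. The names `omega`, `prob`, and `expectation` are illustrative and do not come from the course notes.

```python
from fractions import Fraction

# Hypothetical finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Simple random variable: 1 on even outcomes, 0 on odd ones."""
    return 1 if w % 2 == 0 else 0

def expectation(X, prob):
    """E[X] = sum_i x_i * P(X = x_i), grouping outcomes by the value X takes."""
    dist = {}
    for w, p in prob.items():
        dist[X(w)] = dist.get(X(w), 0) + p  # accumulates P(A_i), A_i = {X = x_i}
    return sum(x * p for x, p in dist.items())

EX = expectation(X, prob)
print(EX)  # 1/2
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand computation $0\cdot\frac12 + 1\cdot\frac12$.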

Simple Measurable Function
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. A simple function $F:\Omega\rightarrow\mathbb{R}$ is s.t.
$F(\omega)=\displaystyle\sum_{i=1}^p x_i\mathbf{1}[\omega\in B_i]~~ ~B_i\in\mathcal{F}$
which is the linear combination of indicator functions. The sets $B_i$ need not be disjoint, but given a simple function, we can define it in terms of disjoint $B_i$ .

Then, we define the integral of a simple function to be

$\displaystyle\int F\mathrm{d}\mu \vcentcolon= \displaystyle\sum_{i=1}^p x_i\mu(B_i)$
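The remark that the $B_i$ need not be disjoint can be checked on a toy case. The sketch below assumes counting measure on a six-point set (an assumption made only for illustration) and compares the integral computed from an overlapping representation with the one computed from the disjoint level sets of $F$.

```python
# Hypothetical measure space: counting measure on {0, ..., 5}.
omega = set(range(6))

def mu(B):
    """Counting measure: the number of points in B."""
    return len(B)

# F = 2 * 1[B1] + 3 * 1[B2] with overlapping sets B1 and B2.
terms = [(2, {0, 1, 2, 3}), (3, {2, 3, 4})]

def F(w):
    return sum(x for x, B in terms if w in B)

# Integral from the (non-disjoint) representation: sum_i x_i * mu(B_i).
integral_overlapping = sum(x * mu(B) for x, B in terms)

# Disjoint representation: group points by the value F takes on them.
level_sets = {}
for w in omega:
    level_sets.setdefault(F(w), set()).add(w)
integral_disjoint = sum(v * mu(B) for v, B in level_sets.items())

print(integral_overlapping, integral_disjoint)  # 17 17
```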

Measurable Functions and Random Variables#

To extend the above idea of a simple random variable, we want to replace the finite set of values $\{x_i\}$ with any Borel set $B \in \mathcal{B}(\mathbb{R})$ . We need two Measurable Spaces $(\mathbb{X}, \mathcal{X})$ and $(\mathbb{Y}, \mathcal{Y})$ .

Measurable Function
A function $f : \mathbb{X}\rightarrow\mathbb{Y}$ is said to be measurable (with respect to $\mathcal{X}/\mathcal{Y}$ ) if $f^{-1}(B)\in\mathcal{X}$ for any $B\in\mathcal{Y}$ . If $\mathbb{Y}=\mathbb{R}$ , then we say that $f$ is $\mathcal{X}$ -measurable.
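On finite spaces the definition can be checked by brute force. The following sketch assumes a four-point domain with a small hand-written $\sigma$ -field; the helper names `preimage` and `is_measurable` are hypothetical.

```python
def preimage(f, domain, B):
    """f^{-1}(B) as a frozenset of domain points."""
    return frozenset(x for x in domain if f(x) in B)

def is_measurable(f, domain, sigma_X, sigma_Y):
    """Check that f^{-1}(B) lies in sigma_X for every B in sigma_Y."""
    return all(preimage(f, domain, B) in sigma_X for B in sigma_Y)

# Hypothetical example: X = {1, 2, 3, 4} with the sigma-field generated by {1, 2}.
domain = {1, 2, 3, 4}
sigma_X = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(domain)}
# Y = {0, 1} with its power set.
sigma_Y = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}

f = lambda x: 1 if x <= 2 else 0   # constant on {1, 2} and on {3, 4}
g = lambda x: x % 2                # separates 1 from 2

print(is_measurable(f, domain, sigma_X, sigma_Y))  # True
print(is_measurable(g, domain, sigma_X, sigma_Y))  # False
```

The function `g` fails because $g^{-1}(\{0\}) = \{2, 4\}$ is not one of the four sets in the domain's $\sigma$ -field.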

Typically, the $\sigma$ -Fields of interest are the Borel $\sigma$ -Fields, and the space is sometimes written as $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ when we have a topological space. Moreover, the space $(\mathbb{X}, \mathcal{X})$ is typically taken to be $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ or $(\mathbb{R}^+, \mathcal{B}(\mathbb{R}^+))$ . In this case, we say that $f$ is Borel Measurable.

If we replace $\mathcal{B}(\mathbb{R})$ with $\mathcal{M}_λ(\mathbb{R})$ , the set of Lebesgue measurable subsets of $\mathbb{R}$ , then we say $f$ is Lebesgue Measurable.

Cool facts about Measurable Functions#

Inverse images preserve set operations, i.e. for $f:\mathbb{X}\rightarrow\mathbb{Y}$ and $A, A_i\subset\mathbb{Y}, i\in \mathcal{I}$ ,
$f^{-1}\left(\displaystyle\bigcup_{i\in \mathcal{I}} A_i\right) = \displaystyle\bigcup_{i\in \mathcal{I}} f^{-1}(A_i) ~~ \text{and} ~~ f^{-1}(\mathbb{Y}\setminus A) = \mathbb{X} \setminus f^{-1}(A)$
For a measurable function $f$ , this implies that $\{f^{-1}(B) : B \in \mathcal{Y}\}$ is a $\sigma$ -Field and is contained in $\mathcal{X}$ . Informally, the more sets $\mathcal{Y}$ contains relative to $\mathcal{X}$ , the fewer functions are measurable. Furthermore, this can be used to show that the measurability of $f$ can be established by looking only at a collection of sets $\mathcal{A} \subset \mathcal{Y}$ that generates $\mathcal{Y}$ .

Example
The collection $\mathcal{A}$ of all half-lines $A_t = (-\infty, t]$ for $t \in \mathbb{R}$ generates $\mathcal{B}(\mathbb{R})$ . Thus, $f$ is measurable as long as the sets $\{x : f(x) \leq t\}$ are measurable for every $t \in \mathbb{R}$ .
For any $A \in \mathcal{X}$ , the indicator function $f(x) = \mathbf{1}[x \in A]$ is measurable. The $\sigma$ -Field generated by $f^{-1}$ is simply $\{\varnothing, A, A^c, \mathbb{X}\}\subset\mathcal{X}$ .
For measurable functions $f, g : \mathbb{X} \rightarrow \mathbb{R}$ , the functions $f + g$ and $fg$ are measurable.
For measurable functions, $\{f_i\}_{i=1}^\infty$ from $\mathbb{X}$ to $\mathbb{R}$ , the following are also measurable: $\sup_i f_i$ , $\inf_i f_i$ , $\displaystyle\limsup_i f_i$ , $\displaystyle\liminf_i f_i$ and $\displaystyle\lim_i f_i$ if it exists.

Proof
In set notation, $\{x : \sup_i f_i(x)\leq t\}=\displaystyle\bigcap_{i=1}^\infty\{x : f_i(x)\leq t\}$ where the righthand side is a countable intersection of measurable sets and hence measurable. Similarly, $\{x : \inf_i f_i(x)< t\}=\displaystyle\bigcup_{i=1}^\infty\{x : f_i(x)< t\}$ , and the open half-lines $(-\infty, t)$ also generate $\mathcal{B}(\mathbb{R})$ . Next, $\displaystyle\limsup_i f_i = \inf_i \sup_{j\geq i} f_j$ and $\displaystyle\liminf_i f_i = \sup_i\inf_{j\geq i} f_j$ . If $\displaystyle\lim_i f_i$ exists then $\displaystyle\limsup_i f_i=\displaystyle\lim_i f_i=\displaystyle\liminf_i f_i$ . $\square$
Let $f : \mathbb{X}\rightarrow\mathbb{R}$ be a continuous function, then it is measurable.

Proof
If $U$ is an open set in $\mathbb{R}$ , then $f^{-1}(U)$ is open in $\mathbb{X}$ (definition of a continuous function between two topological spaces). Thus, the set $f^{-1}(U)$ is measurable with respect to the Borel $\sigma$ -Field on $\mathbb{X}$ . Since the open sets of $\mathbb{R}$ generate $\mathcal{B}(\mathbb{R})$ , the function $f$ is measurable. $\square$
Given a collection of functions $f_i: \mathbb{X}\rightarrow\mathbb{Y}, i\in \mathcal{I}$ , we can make them measurable by constructing the measurable space $(\mathbb{X},\mathcal{X})$ where $\sigma(\{f_i\}_{i\in \mathcal{I}})\subseteq\mathcal{X}$ is the $\sigma$ -field generated by the sets $f^{-1}_i(B)$ for all $i$ and $B\in\mathcal{Y}$ .
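For a single function on a finite domain, the generated $\sigma$ -field can be listed explicitly. The sketch below assumes the codomain carries its power set as $\sigma$ -field, and recovers the four-set $\sigma$ -field of an indicator function mentioned earlier. The helper `generated_sigma_field` is a hypothetical name.

```python
from itertools import combinations

def generated_sigma_field(f, domain):
    """sigma(f) = { f^{-1}(B) : B a subset of the range of f } on a finite domain."""
    values = sorted(set(f(x) for x in domain))
    sigma = set()
    for r in range(len(values) + 1):
        for B in combinations(values, r):
            sigma.add(frozenset(x for x in domain if f(x) in B))
    return sigma

domain = {1, 2, 3, 4}
A = {1, 2}
indicator = lambda x: 1 if x in A else 0

sigma_f = generated_sigma_field(indicator, domain)
print(sorted(sorted(s) for s in sigma_f))
# [[], [1, 2], [1, 2, 3, 4], [3, 4]]
```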

Almost Surely / Almost Everywhere
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. For two functions $f, g : \Omega\rightarrow\mathbb{R}$ , we say that $f = g$ a.e. (almost everywhere) when the set $N = \{\omega : f(\omega) \neq g(\omega)\}$ has measure $\mu(N) = 0$ .

In probability theory, “almost everywhere” is replaced with “almost surely”, abbreviated a.s.; equivalently, one writes “with probability 1” or w.p.1.

Example
Let $([0, 1], \mathcal{B}, \lambda)$ be the standard measure space of Borel sets on the unit interval with Lebesgue measure. Let $f(t)=0$ for all $t\in(0,1]$ and $g(t) = 0$ on $(0, 1]\setminus\mathbb{Q}$ and $g(t)=1$ on $(0, 1]\displaystyle\cap\mathbb{Q}$ .
Then we have $f=g$ a.e. because $\lambda((0, 1]\displaystyle\cap\mathbb{Q}) = 0$ . To prove that $\lambda(\mathbb{Q})=0$ , we enumerate the rational numbers as $\{q_m\}_{m=1}^\infty$ and surround each $q_m$ with an interval $(q_m-\frac{\epsilon}{2^m}, q_m + \frac{\epsilon}{2^m})$ . For any $\epsilon > 0$ , these intervals cover $\mathbb{Q}$ , so by countable subadditivity $\lambda(\mathbb{Q})\leq\displaystyle\sum_{m=1}^\infty \lambda((q_m-\frac{\epsilon}{2^m}, q_m + \frac{\epsilon}{2^m}))=\displaystyle\sum_{m=1}^\infty\frac{\epsilon}{2^{m-1}}=2\epsilon$ . Since $\epsilon$ is arbitrary, we have $\lambda(\mathbb{Q})=0$ , and hence $\lambda((0, 1]\displaystyle\cap\mathbb{Q}) = 0$ by monotonicity.
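The covering argument only uses the lengths of the intervals, not the particular enumeration of $\mathbb{Q}$ . As a small sanity check (a sketch, with the hypothetical helper `cover_length`), the partial sums of the lengths stay strictly below $2\epsilon$ :

```python
from fractions import Fraction

def cover_length(epsilon, n_terms):
    """Total length of the first n_terms covering intervals.

    The m-th interval (q_m - eps/2^m, q_m + eps/2^m) has length eps / 2^(m-1),
    regardless of which rational q_m it is centred on.
    """
    return sum(epsilon / 2 ** (m - 1) for m in range(1, n_terms + 1))

eps = Fraction(1, 100)
partial = [cover_length(eps, n) for n in (1, 5, 20)]
print([float(p) for p in partial])
```

Each partial sum equals $2\epsilon(1 - 2^{-n})$ , which increases to $2\epsilon$ as $n$ grows.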

Integration#

We will consider measurable functions mapping from $(\Omega, \mathcal{F})$ to $[-\infty, \infty]$ , which is called the extended real line. This allows us to handle sets such as $f^{-1}(\infty)$ .

Notation : $f_i\uparrow f$
We use $f_i\uparrow f$ to represent a sequence of functions $\{f_i\}$ that are increasing and converge to $f$ for every $\omega$ . This implies that $f_i(\omega)\rightarrow f(\omega)$ and $f_i(\omega)\leq f_{i+1}(\omega)$ for all $\omega$ .

Theorem
Let $(\Omega, \mathcal{F})$ be a measurable space and $\mathcal{A}$ a $\pi$ -system that generates $\mathcal{F}$ ( $~\sigma(\mathcal{A})=\mathcal{F}~$ ). Let $\mathcal{V}$ be a linear space that contains

all indicators $\mathbf{1}_\Omega$ and $\mathbf{1}_A$ for each $A\in\mathcal{A}$

all functions $f$ such that $\exists f_i\in\mathcal{V}$ s.t. $f_i\uparrow f$

Then $\mathcal{V}$ contains all measurable functions.

Proof
First, $\mathbf{1}_A\in\mathcal{V}$ for each $A\in\mathcal{A}$ by assumption. Let $\mathcal{L}=\{B\in\mathcal{F} : \mathbf{1}_B\in\mathcal{V}\}$ , we claim that $\mathcal{L}$ is a $\lambda$ -system (it contains $\Omega$ , is closed under complements since $\mathbf{1}_{B^c}=\mathbf{1}_\Omega-\mathbf{1}_B$ , and is closed under increasing limits by the second property of $\mathcal{V}$ ). Since $\mathcal{A}\subset\mathcal{L}$ , by Dynkin $\pi$ - $\lambda$ Theorem, $\sigma(\mathcal{A})\subseteq\mathcal{L}$ . By the definition of $\mathcal{L}$ , we have that $\mathcal{L}\subseteq\mathcal{F}$ , thus $\mathcal{L}=\mathcal{F}$ , so every indicator function $\mathbf{1}_B$ with $B\in\mathcal{F}$ is in $\mathcal{V}$ .
Let $f$ be a non-negative measurable function, we can define $f_i=\min\{2^{-i}\lfloor2^if\rfloor, i\}$ . Each $f_i$ takes finitely many values, so it is a finite linear combination of indicator functions and hence $f_i \in \mathcal{V}$ . Furthermore, $f_i\uparrow f$ and thus $f\in\mathcal{V}$ .
Lastly, for a general measurable $f$ , we write $f=f^+-f^-$ where
$\begin{aligned} f^+(\omega)&= \begin{cases} f(\omega) & \text{if } f(\omega)\geq 0 \\ 0 & \text{otherwise} \end{cases} \\ f^-(\omega)&= \begin{cases} -f(\omega) & \text{if } f(\omega)< 0 \\ 0 & \text{otherwise} \end{cases} \end{aligned}$
which are both non-negative measurable functions and hence in $\mathcal{V}$ ; since $\mathcal{V}$ is a linear space, $f\in\mathcal{V}$ . $\square$
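The dyadic approximation used in the proof is easy to experiment with. The sketch below uses the truncated variant $\min\{2^{-i}\lfloor 2^i f\rfloor, i\}$ , where the truncation at level $i$ is an assumption of this sketch that keeps every $f_i$ finite-valued even for unbounded $f$ . The test function and sample points are hypothetical.

```python
import math

def dyadic_approx(f, i):
    """Return f_i = min(2^{-i} * floor(2^i * f), i) for a non-negative f."""
    def f_i(w):
        return min(math.floor(2 ** i * f(w)) / 2 ** i, i)
    return f_i

f = lambda w: w * w          # hypothetical non-negative function on [0, 2]
points = [0.0, 0.3, 1.0, 1.7, 2.0]

for i in (1, 2, 4, 8):
    print(i, [dyadic_approx(f, i)(w) for w in points])
```

At each sample point the printed values are non-decreasing in $i$ and stay below $f$ , matching $f_i\uparrow f$ .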

Integral of a Measurable Function
For a non-negative measurable function $f : \Omega\rightarrow [0,\infty]$ on the measure space $(\Omega, \mathcal{F}, \mu)$ , we define
$\displaystyle\int f\mathrm{d}\mu = \sup\left[\displaystyle\sum_{i\in \mathcal{I}} \left\{\inf_{\omega\in A_i} f(\omega) \right\}\mu(A_i)\right]$
where the supremum is taken over all finite partitions $\{A_i\}_{i\in \mathcal{I}}$ of $\Omega$ .

Inside the square brackets is the integral of the simple function that assigns a value of $\inf_{\omega\in A_i} f(\omega)$ on the set $A_i$ . Hence, for a non-negative $f$ , we consider all simple functions $g$ such that $0 \leq g \leq f$ and define the integral to be the supremum of the integrals of such $g$ . To extend this to all measurable functions $f$ , we write $f=f^+-f^-$ and, provided at least one of the two integrals is finite, set

$\displaystyle\int f\mathrm{d}\mu = \displaystyle\int f^+\mathrm{d}\mu - \displaystyle\int f^-\mathrm{d}\mu$
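To get a feel for the supremum over finite partitions in the definition, the sketch below evaluates the bracketed sum for $f(x)=x$ on $[0,1]$ with Lebesgue measure, restricted to uniform partitions into $n$ intervals. This restriction is an assumption of the sketch; the definition ranges over all finite measurable partitions.

```python
from fractions import Fraction

def lower_sum(n):
    """Sum of inf_{A_k} f * lambda(A_k) for f(x) = x and A_k = [k/n, (k+1)/n)."""
    # On A_k the infimum of f(x) = x is the left endpoint k/n, and lambda(A_k) = 1/n.
    return sum(Fraction(k, n) * Fraction(1, n) for k in range(n))

for n in (1, 2, 10, 100):
    print(n, lower_sum(n))
```

The sums equal $\frac{n-1}{2n}$ and approach $\frac12$ from below, which is the value of $\int_0^1 x\,\mathrm{d}x$ .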

Monotone Convergence Theorem for Simple Functions
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f \geq 0$ be measurable, and $f_n \geq 0~\forall n \in \mathbb{N}$ be a sequence of measurable simple functions such that $f_n \uparrow f$ . Then, $\displaystyle\int f_n\mathrm{d}\mu\uparrow\displaystyle\int f\mathrm{d}\mu$ .

Proof
Since $f_n \uparrow f$ , the integrals $\displaystyle\int f_n\mathrm{d}\mu$ are non-decreasing and bounded above by $\displaystyle\int f\mathrm{d}\mu$ , so $\displaystyle\int f_n\mathrm{d}\mu \uparrow c\in[0,\infty]$ and $c\leq \displaystyle\int f\mathrm{d}\mu$ . To prove the other side of the inequality, let $g$ be any simple measurable function s.t. $0\leq g\leq f$ . Our goal is to show that $\displaystyle\int g\mathrm{d}\mu\leq c$ as this will imply that $c\geq \displaystyle\int f\mathrm{d}\mu$ .
First, we write $g=\displaystyle\sum_{i\in \mathcal{I}}a_i \mathbf{1}_{A_i}$ where the $A_i$ form a disjoint partition of $\Omega$ and similarly $f_n=\displaystyle\sum_{j\in \mathcal{J}} b_j^{(n)} \mathbf{1}_{B_j^{(n)}}$ . Consequently,
$f_n = \displaystyle\sum_{i\in \mathcal{I}} f_n \mathbf{1}_{A_i}=\displaystyle\sum_{i, j} b_j^{(n)} \mathbf{1}_{A_i\displaystyle\cap B_j^{(n)}}$
and $\displaystyle\int f_n\mathrm{d}\mu=\displaystyle\sum_{i\in\mathcal{I}}\displaystyle\int_{A_i}f_n\mathrm{d}\mu$ . Hence, we want to show that for each $i \in \mathcal{I}$ that
$\displaystyle\lim_{n\rightarrow \infty}\displaystyle\int_{A_i}f_n\mathrm{d}\mu\geq \displaystyle\int_{A_i} g\mathrm{d}\mu = a_i\mu(A_i)$
First, if $a_i=0$ , then the inequality above must hold. If $a_i>0$ , we can divide by $a_i$ and without loss of generality, we take $a_i=1$ and consider $g=\mathbf{1}_A$ for some set $A$ .
For any $\epsilon>0$ , let $C_n=\{x\in A : f_n(x) > 1 - \epsilon\}$ . Then $C_n\uparrow A$ , which means
$C_1\subseteq C_2\subseteq \cdots \subseteq A ~~\text{ and }~ \displaystyle\bigcup_{n=1}^\infty C_n = A$
By countable additivity of $\mu$
$\mu(C_{n+1}) = \mu(C_1) + \displaystyle\sum_{i=1}^n \mu(C_{i + 1} \setminus C_i)$
and $\mu(C_n)\uparrow \mu(A)$ . Since $\displaystyle\int f_n\mathrm{d}\mu \geq (1 - \epsilon)\mu(C_n)$ , we have that $c\geq (1 - \epsilon)\mu(A) = (1 - \epsilon)\displaystyle\int g\mathrm{d}\mu$ . Taking $\epsilon$ to zero gives $c\geq \displaystyle\int g\mathrm{d}\mu$ . $\square$

Theorem
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f, g : \Omega \rightarrow [-\infty, \infty]$ be measurable and $f = g$ a.e. Then, $f$ is integrable if and only if $g$ is integrable. If $f$ and $g$ are integrable, then $\displaystyle\int f\mathrm{d}\mu = \displaystyle\int g\mathrm{d}\mu$ .

Proof
Let $f = g$ on $\Omega \setminus N$ where $\mu(N) = 0$ . For any measurable function $h : \Omega \rightarrow [-\infty, \infty]$ , we check in steps that $\displaystyle\int_\Omega h\mathrm{d}\mu$ and $\displaystyle\int_{\Omega\setminus N} h\mathrm{d}\mu$ coincide.

This is true if $h = \mathbf{1}_A$ is an indicator function, because $\mu(A) = \mu(A\setminus N)$ .

By linearity, it is also true for $h$ being a non-negative simple function.

By the MCT for Simple Functions, the identity extends from non-negative simple functions to any non-negative measurable function $h$ .

Lastly, writing $\displaystyle\int h\mathrm{d}\mu = \displaystyle\int h^+\mathrm{d}\mu - \displaystyle\int h^-\mathrm{d}\mu$ shows that the integrals coincide for any integrable measurable function.

To complete the proof, we note that
$\displaystyle\int f\mathrm{d}\mu = \displaystyle\int_{\Omega\setminus N}f \mathrm{d}\mu = \displaystyle\int_{\Omega\setminus N}g \mathrm{d}\mu = \displaystyle\int g\mathrm{d}\mu$
This result basically tells us that we can modify functions on a set of measure zero without breaking anything. $\square$

Monotone Convergence Theorem#

Monotone Convergence Theorem
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $\{f_i\}_{i=1}^\infty$ be measurable functions from $\Omega$ to $[-\infty, \infty]$ such that $f_i \uparrow f$ $\mu$ -a.e. and $\displaystyle\int f_1\mathrm{d}\mu > -\infty$ . Then $\displaystyle\int f_i\mathrm{d}\mu \uparrow \displaystyle\int f\mathrm{d}\mu$ . (This means that we can swap the limit and the integral)

Proof
First, we need to check that $f$ is measurable. For $c\in\mathbb{R}$ , we consider the sets $(c, \infty]$ , which generate the Borel $\sigma$ -Field. Since $f_i\uparrow f$ , $f^{-1}((c, \infty]) = \displaystyle\bigcup_{i=1}^\infty f_i^{-1}((c, \infty])$ and $f_i^{-1}((c, \infty]) \in \mathcal{F}$ , thus $f$ is measurable. (by the first cool fact about measurable functions)
Next, assume that $f_1 \geq 0$ , and for each $f_i$ we take simple functions $g_{ij}$ such that $g_{ij} \uparrow f_i$ as $j\rightarrow\infty$ . Thus, by MCT for Simple Functions, $\displaystyle\int g_{ij}\mathrm{d}\mu\uparrow\displaystyle\int f_i\mathrm{d}\mu$ . Furthermore, let $g_i^* = \max\{g_{1i}, \cdots , g_{ii}\}$ . These $g_i^*$ are simple functions and $g_i^*\uparrow f$ . Once again, MCT for Simple Functions implies that $\displaystyle\int g_i^*\mathrm{d}\mu\uparrow\displaystyle\int f\mathrm{d}\mu$ . But since $g_i^* \leq f_i$ by construction, $\displaystyle\int g_i^*\mathrm{d}\mu \leq \displaystyle\int f_i\mathrm{d}\mu \leq \displaystyle\int f\mathrm{d}\mu$ . Thus, $\displaystyle\int f_i\mathrm{d}\mu\uparrow\displaystyle\int f\mathrm{d}\mu$ . This means we are done for non-negative functions ( $f_1\geq 0$ ).
Now, we assume that $f \leq 0$ . In this case $f_i \uparrow f$ implies that $-f_i \downarrow -f$ . Let $h=-f$ and $h_i=-f_i$ , we have $0\leq \displaystyle\int h\mathrm{d}\mu\leq \displaystyle\int h_i\mathrm{d}\mu$ . Next, note that $0\leq h_1-h_i\uparrow h_1-h$ . Applying the above result gives that $\displaystyle\int (h_1-h_i)\mathrm{d}\mu\uparrow \displaystyle\int (h_1-h)\mathrm{d}\mu$ . Since $h$ and all of the $h_i$ have finite integrals, we are allowed to subtract to get that $\displaystyle\int h_i\mathrm{d}\mu\downarrow\displaystyle\int h\mathrm{d}\mu$ and thus $\displaystyle\int f_i\mathrm{d}\mu\uparrow\displaystyle\int f\mathrm{d}\mu$ .
For a general function $f = f^+ - f^-$ , we have $f_i^+\uparrow f^+$ and $f_i^-\downarrow f^-$ and $\displaystyle\int f^-\mathrm{d}\mu < \infty$ . So by the above special cases, $\displaystyle\int f_i^+\mathrm{d}\mu\uparrow\displaystyle\int f^+\mathrm{d}\mu$ and $\displaystyle\int f_i^-\mathrm{d}\mu\downarrow\displaystyle\int f^-\mathrm{d}\mu$ . Finally, $\displaystyle\int f_i\mathrm{d}\mu \uparrow \displaystyle\int f\mathrm{d}\mu$ . $\square$

We only require $f_i \uparrow f$ to hold almost everywhere to establish the result. Hence convergence can fail on a set of measure (probability) zero and we still have convergence of the integrals.

Secondly, we can redo the above proof for $f_i \downarrow f$ with $\displaystyle\int f_1\mathrm{d}\mu < \infty$ to get a similar result for decreasing sequences.
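A toy instance of the theorem, under the assumption that the measure space is $\mathbb{N}$ with counting measure (so integrals are sums): take $f(k)=2^{-k}$ and the truncations $f_n = f\,\mathbf{1}[k\leq n]$ , which increase pointwise to $f$ . The function names below are hypothetical.

```python
from fractions import Fraction

def f(k):
    """Limit function on N = {1, 2, ...}: f(k) = 2^{-k}."""
    return Fraction(1, 2 ** k)

def f_n(n):
    """Truncation f_n = f * 1[k <= n], so f_n increases pointwise to f."""
    return lambda k: f(k) if k <= n else Fraction(0)

def integral_counting(g, support_bound):
    """Integral with respect to counting measure of g vanishing beyond support_bound."""
    return sum(g(k) for k in range(1, support_bound + 1))

integrals = [integral_counting(f_n(n), n) for n in range(1, 11)]
print([str(v) for v in integrals[:4]])  # ['1/2', '3/4', '7/8', '15/16']
```

The integrals $1-2^{-n}$ increase to $1=\sum_{k\geq 1}2^{-k}=\int f\,\mathrm{d}\mu$ , as the theorem predicts.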

Fatou’s Lemma#

Fatou’s Lemma
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $\{f_i\}_{i=1}^\infty$ be non-negative measurable functions from $\Omega$ to $[0, \infty]$ . Then, $\displaystyle\int \displaystyle\liminf f_i\mathrm{d}\mu \leq \displaystyle\liminf \displaystyle\int f_i\mathrm{d}\mu$

Proof
Recall that $\displaystyle\liminf_{i\rightarrow\infty} f_i = \sup_j \inf_{i\geq j} f_i$ . Hence, let $g_j=\inf_{i\geq j} f_i$ . Then, $g_j\uparrow \displaystyle\liminf_{i\rightarrow\infty} f_i$ and $g_1\geq 0$ by assumption. So the Monotone Convergence Theorem says that $\displaystyle\int g_j\mathrm{d}\mu\uparrow\displaystyle\int\displaystyle\liminf_{i\rightarrow\infty}f_i\mathrm{d}\mu$ . By construction, $g_j\leq f_i$ for any $i\geq j$ , thus $\displaystyle\int g_j\mathrm{d}\mu \leq \displaystyle\int f_i\mathrm{d}\mu$ for any $i\geq j$ and subsequently, $\displaystyle\int g_j\mathrm{d}\mu\leq \inf_{i\geq j}\displaystyle\int f_i\mathrm{d}\mu$ . Taking $j\rightarrow\infty$ gives $\displaystyle\lim_{j\rightarrow\infty}\displaystyle\int g_j\mathrm{d}\mu = \displaystyle\int\displaystyle\liminf f_i\mathrm{d}\mu\leq\displaystyle\liminf\displaystyle\int f_i\mathrm{d}\mu$ . $\square$
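The inequality in Fatou's Lemma can be strict. A standard illustration, sketched here under the assumption of Lebesgue measure on $[0,1]$ , is $f_i = i\,\mathbf{1}_{(0,1/i)}$ : every $f_i$ has integral 1, yet $f_i(x)\rightarrow 0$ at every fixed $x$ . The integral is computed from the closed form (height times length) rather than numerically.

```python
from fractions import Fraction

def f(i, x):
    """f_i = i * 1[0 < x < 1/i] on [0, 1]."""
    return i if 0 < x < Fraction(1, i) else 0

def integral_f(i):
    """Lebesgue integral of f_i: height i times the length 1/i of (0, 1/i)."""
    return i * Fraction(1, i)

x = Fraction(1, 7)  # any fixed point of (0, 1]
tail_values = [f(i, x) for i in range(7, 50)]   # f_i(x) = 0 once 1/i <= x
print(set(tail_values), [integral_f(i) for i in (1, 10, 100)])
```

Here $\int\liminf f_i\,\mathrm{d}\lambda = 0 < 1 = \liminf\int f_i\,\mathrm{d}\lambda$ . The same sequence has no integrable dominating function, so it does not contradict the Dominated Convergence Theorem.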

Dominated Convergence Theorem#

Dominated Convergence Theorem
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $\{f_i\}_{i=1}^\infty$ and $g$ be measurable functions and absolutely integrable. If $|f_i| \leq g$ for all $i$ and $f_i(\omega) \rightarrow f(\omega)$ for each $\omega \in \Omega$ (i.e. pointwise convergence), then $f$ is absolutely integrable and $\displaystyle\int f_i\mathrm{d}\mu \rightarrow \displaystyle\int f\mathrm{d}\mu$ .

Proof
Since $|f_i|\leq g$ for all $i$ , the pointwise limit satisfies $|f|\leq g$ , so $f$ is absolutely integrable. Let $f_i^\wedge = \inf_{j\geq i} f_j$ and $f_i^\vee = \sup_{j\geq i} f_j$ . Then $f_i^\wedge \leq f_i \leq f_i^\vee$ . We have that $f_i^\wedge\uparrow f$ and $\displaystyle\int f_1^\wedge\mathrm{d}\mu\geq -\displaystyle\int g\mathrm{d}\mu > -\infty$ . So the Monotone Convergence Theorem implies that $\displaystyle\int f_i^\wedge\mathrm{d}\mu\uparrow \displaystyle\int f\mathrm{d}\mu$ .
Doing the same for $f_i^\vee$ , we have that $f_i^\vee\downarrow f$ with $\displaystyle\int f_1^\vee\mathrm{d}\mu\leq \displaystyle\int g\mathrm{d}\mu < \infty$ , and hence that $\displaystyle\int f_i^\vee\mathrm{d}\mu\downarrow\displaystyle\int f\mathrm{d}\mu$ . Since $\displaystyle\int f_i^\wedge\mathrm{d}\mu\leq \displaystyle\int f_i\mathrm{d}\mu\leq \displaystyle\int f_i^\vee\mathrm{d}\mu$ , we have the desired result that $\displaystyle\int f_i\mathrm{d}\mu\rightarrow \displaystyle\int f\mathrm{d}\mu$ . $\square$

Lebesgue-Stieltjes Measure#

Let $(\mathbb{X}, \mathcal{X})$ and $(\mathbb{Y}, \mathcal{Y})$ be two measurable spaces. Let $\psi : \mathbb{X}\rightarrow\mathbb{Y}$ be a measurable function and $\mu$ be a measure on $\mathcal{X}$ . Then we can define $\nu = \mu \circ \psi^{-1}$ to be a measure on $\mathcal{Y}$ , the pushforward of $\mu$ under $\psi$ . This allows us to turn Lebesgue measure into Lebesgue-Stieltjes measures.

Theorem
Let $F : \mathbb{R} \rightarrow \mathbb{R}$ be non-constant, right-continuous, and non-decreasing. Then, there exists a unique measure $\mathrm{d}F$ on $\mathbb{R}$ such that for all $a, b \in \mathbb{R}$ with $a < b$ ,
$\mathrm{d}F((a, b]) = F(b) - F(a)$

Proof
Let $F(\infty) = \displaystyle\lim_{x\rightarrow\infty} F(x)$ and $F(-\infty) = \displaystyle\lim_{x\rightarrow-\infty}F(x)$ . We define an open interval $I = (F(-\infty), F(\infty))$ and, for $y \in I$ , define $g(y) = \inf\{x\in\mathbb{R} : y\leq F(x)\}$ . We want to define $\mathrm{d}F$ to be $\lambda \circ g^{-1}$ where $\lambda$ is Lebesgue Measure on $\mathbb{R}$ restricted to $I$ , so we need to show that this makes sense.
We first show that $g$ is left-continuous and non-decreasing and for $y \in I$ and $x \in \mathbb{R}, g(y) \leq x$ if and only if $y \leq F(x)$ . To show this, fix a $y\in I$ and let $J_y = \{x\in\mathbb{R} : y\leq F(x)\}$ . As $F$ is non-decreasing, if $x\in J_y$ and $x'\geq x$ then $x'\in J_y$ . As $F$ is right-continuous, if $x_n\in J_y$ and $x_n\downarrow x$ then $x\in J_y$ . Therefore, $J_y = [g(y), \infty)$ and $g(y)\leq x$ if and only if $y\leq F(x)$ . Secondly, for $y\leq y'$ , we have that $J_{y'}\subseteq J_y$ and thus $g(y)\leq g(y')$ . So if $y_n\uparrow y$ , then $J_y = \displaystyle\bigcap_{n=1}^\infty J_{y_n}$ and thus $g(y_n)\rightarrow g(y)$ , which implies that $g$ is left continuous and non-decreasing.
From the above, $g$ is Borel Measurable and thus defining $\mathrm{d}F = \lambda \circ g^{-1}$ gives us that
$\mathrm{d}F((a, b]) = \lambda(\{y : g(y) > a, g(y) \leq b\}) = \lambda((F(a), F(b)]) = F(b) - F(a)$
Furthermore, this measure, $\mathrm{d}F$ , is unique by using the same arguments as before for Lebesgue Measure. $\square$
In the case that $F : \mathbb{R} \rightarrow [0, 1]$ with $F(-\infty) = 0$ and $F(\infty) = 1$ , so that the interval is $I = (0, 1)$ , we have a cumulative distribution function, which induces a probability measure on the real line. This allows us to do things like integrate with respect to such measures, i.e. take an expectation.
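The generalized inverse $g(y) = \inf\{x : y\leq F(x)\}$ from the proof can be made concrete for a step CDF. The sketch below assumes a hypothetical discrete distribution with three atoms and checks the key equivalence $g(y)\leq x \iff y\leq F(x)$ on a few points.

```python
import bisect

# Hypothetical step CDF of a random variable with P(X=0)=0.2, P(X=1)=0.5, P(X=3)=0.3.
atoms = [0, 1, 3]
cum_probs = [0.2, 0.7, 1.0]

def F(x):
    """Right-continuous, non-decreasing CDF: F(x) = P(X <= x)."""
    idx = bisect.bisect_right(atoms, x)       # number of atoms <= x
    return cum_probs[idx - 1] if idx > 0 else 0.0

def g(y):
    """Generalized inverse g(y) = inf{x : y <= F(x)} for y in (0, 1)."""
    idx = bisect.bisect_left(cum_probs, y)    # first atom whose cumulative prob >= y
    return atoms[idx]

print(g(0.1), g(0.2), g(0.21), g(0.7), g(0.95))  # 0 0 1 1 3
```

In probability this $g$ is usually called the quantile function. Note that $g(0.2)=0$ because $F(0)=0.2$ already reaches the level, which is where right-continuity of $F$ is used.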

Radon Measure#

Definition
Let $(\Omega, \mathcal{B}, \mu)$ be a measure space where $\mathcal{B}$ is the Borel $\sigma$ -field. The measure $\mu$ is said to be a Radon Measure if $\mu(K) < \infty$ for all compact $K \in \mathcal{B}$ .

$\mathrm{d}F$ is a Radon Measure.
Every non-zero Radon Measure on $\mathcal{B}(\mathbb{R})$ can be written as $\mathrm{d}F = \lambda \circ g^{-1}$ for some $F$ .

If $\mu$ is a Radon Measure on $\mathbb{R}$ , then we can define $F$ as
$F(x) = \begin{cases} \mu((0, x]) & \text{if } x > 0 \\ -\mu((x, 0]) & \text{if } x \leq 0 \\ \end{cases}$
Thus, $F(b) - F(a) = \mu((a, b])$ for $a < b$ and hence $\mu = \mathrm{d}F$ by uniqueness.

Product Measure#

Definition
Given two $\sigma$ -fields $\mathcal{X}$ and $\mathcal{Y}$ , we define the product $\sigma$ -field to be $\mathcal{X} \times \mathcal{Y}$ , which is the $\sigma$ -field generated by the rectangles $A\times B$ where $A\in\mathcal{X}$ and $B\in\mathcal{Y}$ . The collection of all rectangles will be denoted as $\mathcal{R}$ .

Monotone Class#

Definition
A collection of subsets $\mathcal{M}$ of $\Omega$ is said to be monotone if

for $\{A_i\}_{i=1}^\infty$ s.t. $A_i \in \mathcal{M}$ and $A_i \uparrow A = \displaystyle\bigcup_{i=1}^\infty A_i$ , then $A \in \mathcal{M}$ .

for $\{A_i\}_{i=1}^\infty$ s.t. $A_i \in \mathcal{M}$ and $A_i \downarrow A = \displaystyle\bigcap_{i=1}^\infty A_i$ , then $A \in \mathcal{M}$ .

Note that if a field $\mathcal{A}$ is also monotone, then it is a $\sigma$ -field (by definition of $\sigma$ -field.)

Monotone Class Theorem
Let $\mathcal{A}$ be a field and $\mathcal{M}$ be monotone such that $\mathcal{A}\subset\mathcal{M}$ . Then, $\sigma(\mathcal{A}) \subseteq \mathcal{M}$

Proof
This proof is similar to the one of Dynkin $\pi$ - $\lambda$ Theorem.

Existence and Uniqueness of Product Measure#

Existence and Uniqueness Theorem of Product Measure
Let $(\mathbb{X}, \mathcal{X} , \mu)$ and $(\mathbb{Y}, \mathcal{Y}, \nu)$ be $\sigma$ -finite measure spaces. We denote the product $\sigma$ -Field by $\mathcal{X} \times \mathcal{Y}$ (the cartesian product of two $\sigma$ -Fields may not itself be a $\sigma$ -Field, so this means the $\sigma$ -Field generated by the rectangles). Let $\pi$ be a set function on the rectangles such that for $A \in \mathcal{X}$ and $B \in \mathcal{Y}$ , $\pi(A \times B) = \mu(A)\nu(B)$ . Then, $\pi$ extends uniquely to a measure on $(\mathbb{X} \times \mathbb{Y}, \mathcal{X}\times\mathcal{Y})$ such that for any $E \in \mathcal{X} \times \mathcal{Y}$ ,
$\pi(E) = \displaystyle\int\displaystyle\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \displaystyle\int\displaystyle\int \mathbf{1}_E(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)$

Lemma
Let $(\mathbb{X}, \mathcal{X} , \mu)$ and $(\mathbb{Y}, \mathcal{Y}, \nu)$ be finite measure spaces, and let
$\mathcal{F} = \left\{E \subset \mathbb{X}\times\mathbb{Y} : \displaystyle\int\displaystyle\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \displaystyle\int\displaystyle\int \mathbf{1}_E(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)\right\}$
Then $\mathcal{X}\times\mathcal{Y} \subseteq \mathcal{F}$ . ( This means that $(\mathbb{X}\times\mathbb{Y}, \mathcal{X}\times\mathcal{Y}) \equiv (\mathbb{Y}\times\mathbb{X}, \mathcal{Y}\times\mathcal{X})~$ )

Proof
Let $E = A \times B$ for $A \in \mathcal{X}$ and $B \in \mathcal{Y}$ , i.e. $E \in \mathcal{R}$ . Then,
$\begin{aligned} \displaystyle\int\displaystyle\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) &=\mu(A) \displaystyle\int \mathbf{1}_B(y)\mathrm{d}\nu(y) =\mu(A)\nu(B) =\nu(B)\mu(A)\\ &=\nu(B) \displaystyle\int \mathbf{1}_A(x)\mathrm{d}\mu(x) =\displaystyle\int\displaystyle\int \mathbf{1}_E(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) \end{aligned}$
Therefore, $\mathcal{R} \subset \mathcal{F}$ .
Also, for disjoint $R_1, R_2 \in \mathcal{R}$ , $\mathbf{1}_{R_1\displaystyle\cup R_2} = \mathbf{1}_{R_1} + \mathbf{1}_{R_2}$ . Hence, $\mathcal{F}$ contains finite disjoint unions of rectangles in $\mathcal{R}$ . This implies that the field $\mathcal{A}$ generated by the rectangles satisfies $\mathcal{A} \subset \mathcal{F}$ . (Dudley 3.2.3)
Next, consider $\{E_i\}_{i=1}^\infty$ with $E_i\in\mathcal{F}$ . If $E_i\uparrow E$ , then MCT says
$\displaystyle\int\displaystyle\int \mathbf{1}_{E_i}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) ~\big\uparrow \displaystyle\int\displaystyle\int \mathbf{1}_{E}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y)$
and
$\displaystyle\int\displaystyle\int \mathbf{1}_{E_i}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) ~\big\uparrow \displaystyle\int\displaystyle\int \mathbf{1}_{E}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)$
Thus, $E \in \mathcal{F}$ and the same holds if $E_i \downarrow E$ , since $\mu$ and $\nu$ are finite. Therefore, $\mathcal{F}$ is a monotone class. Finally, applying the Monotone Class Theorem shows that $\mathcal{X} \times \mathcal{Y} = \sigma(\mathcal{A}) \subset \mathcal{F}$ . $\square$

Proof : Existence and Uniqueness of Product Measure
First, we consider the case that $\mu$ and $\nu$ are finite measures and $\pi(A\times B) = \mu(A)\nu(B)$ . Then we extend $\pi$ to a set function
$\pi(E)\vcentcolon=\displaystyle\int\displaystyle\int \mathbf{1}_E(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y)$
for any $E\in \mathcal{X}\times\mathcal{Y}$ . The above lemma says that the definition makes sense and the order of integration can be reversed for any $E \in \mathcal{X} \times \mathcal{Y}$ . Linearity of the integral implies that $\pi$ is finitely additive. Applying MCT shows that $\pi$ is countably additive. Thus, $\pi$ is a measure on $\mathcal{X} \times \mathcal{Y}$ .
To show that $\pi$ is unique, let $\rho$ be some other measure such that $\rho(A \times B) = \mu(A)\nu(B)$ for $A \times B \in \mathcal{R}$ . Let $\mathcal{M} = \{E \in \mathcal{X} \times \mathcal{Y} : \pi(E) = \rho(E)\}$ . Then, $\mathcal{M}$ is a monotone class because for $E_i \uparrow E =\displaystyle\bigcup_{i=1}^\infty E_i$ , we can rewrite $E = \displaystyle\bigcup_{i=1}^\infty D_i$ where $D_1 = E_1$ and $D_i = E_i \setminus E_{i-1}$ for $i \geq 2$ are disjoint. So by countable additivity, $\pi(E)=\rho(E)$ and we can do the same for $E_i \downarrow E$ . Thus, by Monotone Class Theorem $\mathcal{X} \times \mathcal{Y} \subseteq \mathcal{M}$ . Therefore, $\pi$ is unique on $\mathcal{X} \times \mathcal{Y}$ for finite measures $\mu$ and $\nu$ .
Now let $\mu$ and $\nu$ be $\sigma$ -finite measures. Let $\{A_i\}_{i=1}^\infty$ and $\{B_i\}_{i=1}^\infty$ be disjoint partitions of $\mathbb{X}$ and $\mathbb{Y}$ , respectively, such that $\mu(A_i)<\infty$ and $\nu(B_i)<\infty$ . Then, for any $E \in \mathcal{X} \times \mathcal{Y}$ , we define $E_{ij} = E\displaystyle\cap (A_i\times B_j)$ . From the above finite measure case,
$\displaystyle\int\displaystyle\int \mathbf{1}_{E_{ij}}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \displaystyle\int\displaystyle\int \mathbf{1}_{E_{ij}}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)$
Sum over all $i$ and $j$ and apply MCT again to get
$\pi(E) = \displaystyle\int\displaystyle\int \mathbf{1}_{E}(x, y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \displaystyle\int\displaystyle\int \mathbf{1}_{E}(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)$
Furthermore, monotone convergence implies that $\pi$ is countably additive and hence a measure on $\mathcal{X}\times\mathcal{Y}$ . For any other measure $\rho$ such that $\rho(A \times B) = \mu(A)\nu(B)$ , countable additivity and uniqueness for finite measures imply that
$\pi(E) = \displaystyle\sum_{i,j}\pi(E_{ij}) = \displaystyle\sum_{i,j}\rho(E_{ij}) = \rho(E)$
Hence, the extension of $\pi$ to $\mathcal{X} \times\mathcal{Y}$ is unique. $\square$
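On finite spaces both iterated integrals reduce to nested sums, so the theorem can be checked directly. The sketch below assumes two small weighted sets (the weights are hypothetical) and evaluates $\pi(E)$ in both orders for a set $E$ that is not a rectangle, and for a rectangle.

```python
from fractions import Fraction

# Hypothetical finite measure spaces (weights are illustrative).
mu = {"a": Fraction(1, 2), "b": Fraction(3, 2)}           # measure on X = {a, b}
nu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(1, 3)}  # measure on Y = {0, 1, 2}

def pi_x_first(E):
    """Integrate 1_E over x with respect to mu, then over y with respect to nu."""
    return sum(nu[y] * sum(mu[x] for x in mu if (x, y) in E) for y in nu)

def pi_y_first(E):
    """Integrate 1_E over y with respect to nu, then over x with respect to mu."""
    return sum(mu[x] * sum(nu[y] for y in nu if (x, y) in E) for x in mu)

E = {("a", 0), ("a", 2), ("b", 1)}               # not a rectangle
rect = {(x, y) for x in ["b"] for y in [0, 1]}   # rectangle {b} x {0, 1}

print(pi_x_first(E), pi_y_first(E))                 # 11/3 11/3
print(pi_x_first(rect), mu["b"] * (nu[0] + nu[1]))  # 9/2 9/2
```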

Fubini-Tonelli#

Fubini-Tonelli Theorem
Let $(\mathbb{X}, \mathcal{X} , \mu)$ and $(\mathbb{Y}, \mathcal{Y}, \nu)$ be $\sigma$ -finite measure spaces, and let $f : \mathbb{X} \times \mathbb{Y} \rightarrow \mathbb{R}$ be measurable with respect to $\mathcal{X} \times \mathcal{Y}$ such that either $f\geq 0$ (non-negative) or $\displaystyle\int\displaystyle\int |f|\mathrm{d}(\mu\times\nu)<\infty$ (absolutely integrable). Then,
$\displaystyle\int\displaystyle\int f\mathrm{d}(\mu\times\nu) = \displaystyle\int\displaystyle\int f(x,y)\mathrm{d}\mu(x)\mathrm{d}\nu(y) = \displaystyle\int\displaystyle\int f(x,y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)$
Also, $\displaystyle\int f(x, y)\mathrm{d}\mu(x)$ is $\mathcal{Y}$ -measurable and $\displaystyle\int f(x, y)\mathrm{d}\nu(y)$ is $\mathcal{X}$ -measurable.

Proof
We have this for indicator functions from Existence and Uniqueness Theorem of Product Measure and thus the result for simple functions as integrals are linear.
Then, applying MCT to simple functions gives us that the above holds for non-negative measurable functions.
Instead, assume $\displaystyle\int\displaystyle\int|f| \mathrm{d}(\mu\times\nu) < \infty$ . Then, we can write $f = f^+ - f^-$ and the above holds for $f^+$ and $f^-$ separately. i.e. $\displaystyle\int f^+(x, y)\mathrm{d}\mu(x) < \infty$ for $\nu$ -almost-every $y$ and $\displaystyle\int f^+(x, y)\mathrm{d}\nu(y) < \infty$ for $\mu$ -almost-every $x$ and similarly for $f^-$ . Therefore,
$\displaystyle\int |f|(x, y)\mathrm{d}\nu(y) = \displaystyle\int f^+(x, y)\mathrm{d}\nu(y) + \displaystyle\int f^-(x, y)\mathrm{d}\nu(y) < \infty ~~~(\mu\text{-a.e.})$
and thus,
$\displaystyle\int f(x, y)\mathrm{d}\nu(y) = \displaystyle\int f^+(x, y)\mathrm{d}\nu(y) - \displaystyle\int f^-(x, y)\mathrm{d}\nu(y) ~~~(\mu\text{-a.e.})$
As we only require finiteness to occur almost everywhere to have the integral exist, we can integrate both sides of the above with respect to $\mu$ to get
$\displaystyle\int\displaystyle\int f(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) = \displaystyle\int\displaystyle\int f^+(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x) - \displaystyle\int\displaystyle\int f^-(x, y)\mathrm{d}\nu(y)\mathrm{d}\mu(x)$
Do the same swapping $\mu$ and $\nu$ to conclude the theorem. $\square$
The above theorem lets us swap the order of integration for the product of two measure spaces. This can be extended by induction to the finite product of $n$ measure spaces.
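The hypotheses matter: if $f$ is neither non-negative nor absolutely integrable, the two iterated integrals can differ. A classical illustration, sketched here with counting measure on $\mathbb{N}\times\mathbb{N}$ (an assumption of this example, not taken from the notes), puts $+1$ on the diagonal and $-1$ just above it. Every inner sum has finite support, so the code evaluates it exactly.

```python
def a(i, j):
    """a(i, j) = 1 on the diagonal, -1 just above it, 0 elsewhere (i, j >= 1)."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def row_sum(i):
    """Inner sum over j for fixed i; only j = i and j = i + 1 contribute."""
    return sum(a(i, j) for j in range(1, i + 2))

def col_sum(j):
    """Inner sum over i for fixed j; only i = j - 1 and i = j contribute."""
    return sum(a(i, j) for i in range(1, j + 1))

N = 50
rows_then_outer = sum(row_sum(i) for i in range(1, N + 1))  # sum_i sum_j a(i, j)
cols_then_outer = sum(col_sum(j) for j in range(1, N + 1))  # sum_j sum_i a(i, j)
print(rows_then_outer, cols_then_outer)  # 0 1
```

The outer partial sums do not depend on `N`, so the two iterated sums are 0 and 1. This does not contradict the theorem because $\sum_{i,j}|a(i,j)|=\infty$ .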

https://astronaut.github.io/posts/measure-theory-part2/

Author

关怀他人

Published at

2025-03-10

License

CC BY-NC-SA 4.0
