# Math Misc

A grab bag of miscellaneous math notes, updated from time to time (allegedly).

## Circulant Matrix

An $n \times n$ circulant matrix $\mathbf{C}$ takes the form

$$\mathbf{C} = \left[\begin{array}{ccccc}c_0 & c_{n-1} & \cdots & c_2 & c_1 \\ c_1 & c_0 & c_{n-1} & & c_2 \\ \vdots & c_1 & c_0 & \ddots & \vdots \\ c_{n-2} & & \ddots & \ddots & c_{n-1} \\ c_{n-1} & c_{n-2} & \cdots & c_1 & c_0\end{array}\right]$$

Its determinant is $\det(\mathbf{C}) = \displaystyle \prod_{i=0}^{n-1} f(\omega^i)$, where $f(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$ and $\omega = e^{2\pi i/n}$.

Proof

Let $\mathbf{\Omega}=(\omega^{(i-1)(j-1)})_{1\leq i, j \leq n} \in \mathcal{M}_n(\mathbb{C})$; then

$$\begin{aligned} \det(\mathbf{C}\mathbf{\Omega}) &= \det\left(\left[\begin{array}{cccc}f(1) & f(\omega) & \cdots & f(\omega^{n-1}) \\ f(1) & \omega f(\omega) & \cdots & \omega^{n-1} f(\omega^{n-1}) \\ \vdots & \vdots & \ddots & \vdots \\ f(1) & \omega^{n-1}f(\omega) & \cdots & \omega^{(n-1)(n-1)}f(\omega^{n-1}) \end{array}\right]\right) \\ &=f(1)f(\omega)\cdots f(\omega^{n-1})\det(\mathbf{\Omega}) = \det(\mathbf{\Omega})\prod_{i=0}^{n-1} f(\omega^i) \end{aligned}$$

Therefore $\det(\mathbf{C})\det(\mathbf{\Omega}) = \det(\mathbf{\Omega})\displaystyle \prod_{i=0}^{n-1} f(\omega^i)$, and since $\mathbf{\Omega}$ is a nonsingular Vandermonde matrix ($\det(\mathbf{\Omega})\neq 0$), we conclude $\det(\mathbf{C}) = \displaystyle \prod_{i=0}^{n-1} f(\omega^i)$. $\quad \square$
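
As a quick numerical sanity check, here is a minimal sketch assuming NumPy and SciPy (`scipy.linalg.circulant` builds $\mathbf{C}$ from its first column):

```python
# Check det(C) = prod_i f(omega^i) on a small example.
import numpy as np
from scipy.linalg import circulant

c = np.array([2.0, 3.0, 5.0, 7.0])               # c_0, ..., c_{n-1}
n = len(c)
C = circulant(c)                                  # first column is (c_0, ..., c_{n-1})

omega = np.exp(2j * np.pi / n)                    # primitive n-th root of unity
f = lambda x: sum(ck * x**k for k, ck in enumerate(c))
prod = np.prod([f(omega**i) for i in range(n)])

print(np.linalg.det(C), prod.real)                # the two values agree
```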

## Gradient of Log-Determinant Function

Let $\mathbf{X}\in \mathscr{S}_{++}$ be a positive definite matrix; then $\nabla (\log\det\mathbf{X})=\mathbf{X}^{-1}$.

Proof

Let $\mathbf{Y}=\mathbf{X}+\Delta \mathbf{X}$; then

$$\begin{aligned} \log\det\mathbf{Y} &= \log\det\left(\mathbf{X}^{\frac{1}{2}}\left(\mathbf{I}+\mathbf{X}^{-\frac{1}{2}}\Delta\mathbf{X}\mathbf{X}^{-\frac{1}{2}}\right)\mathbf{X}^{\frac{1}{2}}\right) \\ &= \log\det\mathbf{X} + \log\det\left(\mathbf{I}+\mathbf{X}^{-\frac{1}{2}}\Delta\mathbf{X}\mathbf{X}^{-\frac{1}{2}}\right) \\ &= \log\det\mathbf{X} + \sum_{i=1}^n \log(1+\lambda_i) \end{aligned}$$

where $\lambda_i$ are the eigenvalues of $\mathbf{X}^{-\frac{1}{2}}\Delta\mathbf{X}\mathbf{X}^{-\frac{1}{2}}$. As $\Delta\mathbf{X}\to 0$ we have $\lambda_i = o(1)$, and thus $\log(1+\lambda_i)=\lambda_i + o(\lambda_i)$. Therefore

$$\begin{aligned} \log\det\mathbf{Y} &= \log\det\mathbf{X} + \sum_{i=1}^n \lambda_i + o(\|\Delta\mathbf{X}\|) \\ &=\log\det\mathbf{X}+\operatorname{tr}(\mathbf{X}^{-\frac{1}{2}}\Delta\mathbf{X}\mathbf{X}^{-\frac{1}{2}}) + o(\|\Delta\mathbf{X}\|) \\ &=\log\det\mathbf{X}+\operatorname{tr}(\mathbf{X}^{-1}\Delta\mathbf{X}) + o(\|\Delta\mathbf{X}\|) \end{aligned}$$

Thus $\mathrm{d}(\log\det\mathbf{X}) = \operatorname{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})\Longrightarrow \nabla (\log\det\mathbf{X})=\mathbf{X}^{-1}$. $\quad \square$
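
A minimal finite-difference sketch of this identity, assuming NumPy:

```python
# Verify log det(X + dX) - log det(X) ~ tr(X^{-1} dX) for a small perturbation.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)                 # a positive definite X

dX = 1e-6 * rng.standard_normal((4, 4))
dX = (dX + dX.T) / 2                        # symmetric perturbation

_, logdet_X = np.linalg.slogdet(X)
_, logdet_Y = np.linalg.slogdet(X + dX)

lhs = logdet_Y - logdet_X                   # actual change
rhs = np.trace(np.linalg.solve(X, dX))      # tr(X^{-1} dX)
print(lhs, rhs)                             # agree up to second-order terms
```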

## Barbalat's Lemma

If $f:\mathbb{R}^+ \rightarrow \mathbb{R}^+$ is uniformly continuous with $\displaystyle\int_a^{+\infty} f(t)\,dt < +\infty$, then $\displaystyle\lim_{t\rightarrow +\infty} f(t) = 0$.

For series we have $\displaystyle \sum_{n=1}^{+\infty} u_n < +\infty \Rightarrow \lim_{n\rightarrow +\infty} u_n = 0$. For functions, however, $\displaystyle\int_a^{+\infty} f(t)\,dt < +\infty$ does not imply $\displaystyle \lim_{t\rightarrow +\infty} f(t) = 0$; the uniform-continuity hypothesis cannot be dropped, as the following examples show.

- **Example 1.** $f(x)=\sin(x^2)$
  Since $\displaystyle\int_1^{+\infty} \sin(x^2)\,dx \overset{u=x^2}{=} \int_1^{+\infty} \frac{\sin u}{2\sqrt{u}}\,du$, the integral converges by Dirichlet's test. However, $\displaystyle \lim_{x\rightarrow +\infty} \sin(x^2)$ does not exist. (A numerical sketch follows this list.)
- **Example 2.** $\displaystyle f(x)=\frac{x}{1+x^6\sin^2 x} > 0$
  Define $\displaystyle F(u)=\int_0^u f(x)\,dx$ and $\displaystyle a_k=\int_{(k-1)\pi}^{k\pi} f(x)\,dx$; then $F$ is increasing on $\mathbb{R}^+$ and $\displaystyle F(n\pi)=\sum_{k=1}^{n}a_k$. For $k\geq 2$ we have
  $$\begin{aligned} a_k &\leq \int_{(k-1)\pi}^{k\pi} \frac{k\pi}{1+(k-1)^6\pi^6\sin^2 x}\,dx = 2k\pi \int_0^{\frac{\pi}{2}} \frac{dx}{1+(k-1)^6\pi^6\sin^2 x} \\ &\leq 2k\pi \int_0^{\frac{\pi}{2}} \frac{dx}{1+(k-1)^6\pi^6\left(\frac{2}{\pi} x\right)^2} = 2k\pi\int_0^{\frac{\pi}{2}} \frac{dx}{1+{\left[2(k-1)^3\pi^2 x\right]}^2} \\ &= \frac{k}{(k-1)^3\pi}\int_0^{(k-1)^3\pi^3}\frac{dt}{1+t^2} \leq \frac{k}{(k-1)^3\pi}\int_0^{+\infty}\frac{dt}{1+t^2} = \frac{k}{2(k-1)^3} \underset{+\infty}{\sim} \frac{1}{2k^2} \end{aligned}$$
  using the inequality $\sin x \geq \frac{2x}{\pi}$ on $[0, \frac{\pi}{2}]$ and the fact that $\displaystyle \int_0^{+\infty}\frac{dt}{1+t^2} = \frac{\pi}{2}$. Therefore $\displaystyle \lim_{n\rightarrow +\infty} F(n\pi)$ exists, and since $F$ is increasing, $\displaystyle \lim_{x\rightarrow +\infty} F(x) < +\infty$, i.e. the integral converges. Yet $\displaystyle\lim_{x\rightarrow +\infty} f(x)$ does not exist: indeed $f(k\pi)=k\pi\rightarrow+\infty$.
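
A small numerical illustration of Example 1 (a sketch assuming SciPy; it uses the identity $\int_0^X \sin(x^2)\,dx = \sqrt{\pi/2}\,S\!\left(X\sqrt{2/\pi}\right)$ in terms of the Fresnel sine integral $S$):

```python
# Partial integrals of sin(x^2) converge to sqrt(pi/8), yet sin(x^2) keeps oscillating.
import numpy as np
from scipy.special import fresnel

def partial_integral(X):
    """int_0^X sin(x^2) dx via the Fresnel sine integral S."""
    S, _ = fresnel(X * np.sqrt(2 / np.pi))
    return np.sqrt(np.pi / 2) * S

for X in [10.0, 100.0, 1000.0]:
    print(X, partial_integral(X))          # -> sqrt(pi/8) ~ 0.62666
print(np.sqrt(np.pi / 8))                  # the limit
```
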
Proof

Suppose $f(x)\nrightarrow 0$ as $x\rightarrow +\infty$; then there exist $\epsilon>0$ and a sequence $(x_n)$ with $x_n\rightarrow+\infty$ such that $\forall n\in \mathbb{N},\ f(x_n)>\epsilon$. Since $f$ is uniformly continuous, $\exists \delta>0$ such that $\forall n\in\mathbb{N}\ \forall x>0$, $|x_n-x|<\delta \Rightarrow |f(x_n)-f(x)|<\frac{\epsilon}{2}$. So for all $x\in[x_n, x_n+\delta]$ we have $f(x)>f(x_n)-|f(x_n)-f(x)|>\frac{\epsilon}{2}$. Therefore

$$\left|\int_a^{x_n+\delta} f(t)\,dt - \int_a^{x_n}f(t)\,dt\right| = \int_{x_n}^{x_n+\delta} f(t)\,dt > \frac{\epsilon\delta}{2}>0$$

However, since $\displaystyle\int_a^{+\infty} f(t)\,dt$ exists, the LHS converges to $0$ as $n\rightarrow +\infty$, yielding a contradiction. $\quad \square$

## Perron-Frobenius Theorem

Definition

### Positive Matrix

A positive (resp. non-negative) matrix $\mathbf{A}\in \mathcal{M}_{n,m}(\mathbb{R})$ is a matrix whose entries $a_{ij}$ are all positive (resp. non-negative). If $\mathbf{A}>0$, $\mathbf{\alpha} \geq 0$ and $\mathbf{\alpha}\neq 0$, then $\mathbf{A}\mathbf{\alpha}>0$.

Definition

### Spectral Radius

The spectral radius of a matrix $\mathbf{A}\in \mathcal{M}_n(\mathbb{R})$ is defined as $\rho(\mathbf{A})=\max\{|\lambda|:\lambda \text{ is an eigenvalue of } \mathbf{A}\}$.

Let $\mathbf{A}\in \mathcal{M}_n(\mathbb{R})$ be a positive matrix. Then

1. $\mathbf{A}$ has a positive eigenvalue $\rho(\mathbf{A})$ associated with a positive eigenvector $\mathbf{v}$ (the Perron–Frobenius eigenvalue).
2. The Perron–Frobenius eigenvalue is simple, i.e. the algebraic (and hence geometric) multiplicity of $\rho(\mathbf{A})$ is 1.
3. There are no positive eigenvectors other than positive multiples of $\mathbf{v}$.

Lemma

### Gelfand's Formula

If $\mathbf{A}\in \mathcal{M}_n(\mathbb{C})$ and $\mathbf{A}\neq 0$, then $\displaystyle \rho(\mathbf{A})=\lim_{k\rightarrow +\infty} \|\mathbf{A}^k\|_2^{\frac{1}{k}}$, where $\|\cdot\|_2$ is the spectral norm.
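
A quick numerical sanity check of the lemma (a sketch assuming NumPy; the matrix is rescaled so that its powers neither overflow nor underflow):

```python
# ||A^k||_2^(1/k) approaches the spectral radius as k grows.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A /= 2 * max(abs(np.linalg.eigvals(A)))     # rescale so rho(A) = 0.5
rho = max(abs(np.linalg.eigvals(A)))

for k in [5, 50, 500]:
    Ak = np.linalg.matrix_power(A, k)
    print(k, np.linalg.norm(Ak, 2) ** (1 / k))   # -> rho = 0.5
```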

Proof

We may assume that $\rho(\mathbf{A})=1$ (replace $\mathbf{A}$ by $\mathbf{A}/\rho(\mathbf{A})$; note $\rho(\mathbf{A})>0$ since $\mathbf{A}>0$).

1. Let $\lambda$ be an eigenvalue of $\mathbf{A}$ with $|\lambda|=1=\rho(\mathbf{A})$ and $\mathbf{v}$ a corresponding eigenvector. Consider $\mathbf{\alpha}=(|\mathbf{v}_i|)_i \geq 0$, $\mathbf{\alpha}\neq 0$; then

   $$(\mathbf{A}\mathbf{\alpha})_i = \sum_j a_{ij}|\mathbf{v}_j| \geq \Big|\sum_j a_{ij}\mathbf{v}_j\Big| = |\lambda \mathbf{v}_i| = |\lambda|\,|\mathbf{v}_i| = |\mathbf{v}_i| = \mathbf{\alpha}_i$$

   which gives $\mathbf{A}\mathbf{\alpha}\geq \mathbf{\alpha}$. We now show that $\mathbf{A}\mathbf{\alpha}\neq\mathbf{\alpha}$ is impossible. Suppose $\mathbf{A}\mathbf{\alpha}\neq\mathbf{\alpha}$; then $\mathbf{A}\mathbf{\alpha}-\mathbf{\alpha}\geq 0$ is nonzero, so

   $$\mathbf{A}(\mathbf{A}\mathbf{\alpha}-\mathbf{\alpha})>0 \Rightarrow \mathbf{A}^2\mathbf{\alpha}>\mathbf{A}\mathbf{\alpha}$$

   Thus there exists $\epsilon>0$ such that $\mathbf{A}^2\mathbf{\alpha}>(1+\epsilon)\mathbf{A}\mathbf{\alpha}$, and by induction $\mathbf{A}^k\mathbf{\alpha}>(1+\epsilon)^{k-1}\mathbf{A}\mathbf{\alpha}$, which implies $\|\mathbf{A}^k\mathbf{\alpha}\|_2 > (1+\epsilon)^{k-1}\|\mathbf{A}\mathbf{\alpha}\|_2$. Therefore

   $$\lim_{k\rightarrow +\infty}\|\mathbf{A}^k\|_2^{\frac{1}{k}} \geq \lim_{k\rightarrow +\infty}\left(\frac{\|\mathbf{A}^k\mathbf{\alpha}\|_2}{\|\mathbf{\alpha}\|_2}\right)^{\frac{1}{k}} \geq \lim_{k\rightarrow +\infty}\left((1+\epsilon)^{k-1}\,\frac{\|\mathbf{A}\mathbf{\alpha}\|_2}{\|\mathbf{\alpha}\|_2}\right)^{\frac{1}{k}} = 1+\epsilon > 1=\rho(\mathbf{A})$$

   which contradicts Gelfand's formula. Therefore $\mathbf{A}\mathbf{\alpha}=\mathbf{\alpha}$, and since $\mathbf{A}>0$ and $\mathbf{\alpha}\geq 0$ is nonzero, $\mathbf{\alpha}=\mathbf{A}\mathbf{\alpha}>0$: there exists a positive eigenvector $\mathbf{\alpha}$ for the eigenvalue $\rho(\mathbf{A})$.

2. Assume there exists an eigenvector $\mathbf{\beta}$ for the eigenvalue $\rho(\mathbf{A})=1$ that is linearly independent of $\mathbf{\alpha}$. Let $c=\min_{\mathbf{\beta}_i>0} \frac{\mathbf{\alpha}_i}{\mathbf{\beta}_i}=\frac{\mathbf{\alpha}_k}{\mathbf{\beta}_k}$ (if $\mathbf{\beta}$ has no positive entry, replace it with $-\mathbf{\beta}$); then $c>0$ and $\mathbf{\alpha}-c\mathbf{\beta}\geq 0$, $\mathbf{\alpha}-c\mathbf{\beta}\neq 0$. However, since $\mathbf{A}>0$, $\mathbf{\alpha}-c\mathbf{\beta}=\mathbf{A}(\mathbf{\alpha}-c\mathbf{\beta})>0$, which contradicts the fact that $\mathbf{\alpha}_k-c\mathbf{\beta}_k=0$. Therefore the geometric multiplicity of $\rho(\mathbf{A})$ is 1.

   Since $\mathbf{A}^\mathsf{T}$ is also positive, there exists a positive eigenvector $\mathbf{\gamma}$ for the eigenvalue $\rho(\mathbf{A}^\mathsf{T})=1$. Write $\mathcal{U}=\operatorname{span}(\mathbf{\gamma})$, so that $\mathbb{R}^n=\mathcal{U}^\perp \oplus\mathcal{U}$. Since $\mathbf{\alpha}>0$ and $\mathbf{\gamma}>0$, we have $\mathbf{\gamma}^\mathsf{T}\mathbf{\alpha}>0$, hence $\mathbf{\alpha}\notin\mathcal{U}^\perp$ and $\mathbb{R}^n=\operatorname{span}(\mathbf{\alpha})\oplus\mathcal{U}^\perp$. Moreover $\mathcal{U}^\perp$ is $\mathbf{A}$-invariant: $\mathbf{\gamma}^\mathsf{T}\mathbf{x}=0 \Rightarrow \mathbf{\gamma}^\mathsf{T}\mathbf{A}\mathbf{x}=(\mathbf{A}^\mathsf{T}\mathbf{\gamma})^\mathsf{T}\mathbf{x}=\mathbf{\gamma}^\mathsf{T}\mathbf{x}=0$. Choose a basis $\mathcal{E}=(\mathbf{\alpha}, e_1, \cdots, e_{n-1})$ of $\mathbb{R}^n$ with $(e_i)$ a basis of $\mathcal{U}^\perp$; the matrix of $\mathbf{A}$ under $\mathcal{E}$ is then

   $$\left(\begin{array}{cc}1 & * \\ \mathbf{0} & \mathbf{B}\end{array}\right)$$

   If $1$ were an eigenvalue of $\mathbf{B}$, there would be an eigenvector $\mathbf{\delta}\in\mathcal{U}^\perp$ of $\mathbf{A}$ for the eigenvalue $1$; since $\mathbf{\alpha}\notin\mathcal{U}^\perp$, $\mathbf{\delta}$ and $\mathbf{\alpha}$ would be linearly independent, contradicting the fact that the geometric multiplicity is 1. Hence $1$ is not an eigenvalue of $\mathbf{B}$, and the algebraic multiplicity of $\rho(\mathbf{A})$ is also 1.

3. Suppose there exists a positive eigenvector $\mathbf{\beta}$ for an eigenvalue $\lambda$; by the positivity of $\mathbf{A}$, $0<\lambda\leq\rho(\mathbf{A})=1$, and if $\lambda=1$ then by item 2, $\mathbf{\beta}$ is a multiple of $\mathbf{\alpha}$, so assume $\lambda<1$. Let $c=\max_i \frac{\mathbf{\alpha}_i}{\mathbf{\beta}_i}=\frac{\mathbf{\alpha}_k}{\mathbf{\beta}_k}$; then $c\mathbf{\beta}-\mathbf{\alpha}\geq 0$, and since $\lambda\neq 1$ forces $\mathbf{\beta}$ and $\mathbf{\alpha}$ to be linearly independent, $c\mathbf{\beta}-\mathbf{\alpha}\neq 0$. Since $\mathbf{A}>0$, $\lambda c\mathbf{\beta}-\mathbf{\alpha}=\mathbf{A}(c\mathbf{\beta}-\mathbf{\alpha})>0$; in particular $\lambda c\mathbf{\beta}_k>\mathbf{\alpha}_k=c\mathbf{\beta}_k$, so $\lambda>1$, a contradiction. Therefore there are no positive eigenvectors other than positive multiples of $\mathbf{\alpha}$. $\quad \square$
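
A short numerical sketch of the theorem, assuming NumPy: for a random entrywise-positive matrix, the eigenvalue of largest modulus is real and simple and carries a positive eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.1, 1.0, size=(5, 5))      # entrywise positive matrix

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(np.abs(eigvals))               # index of the spectral radius
rho, v = eigvals[k].real, eigvecs[:, k].real
v /= v[np.argmax(np.abs(v))]                 # sign-normalize the eigenvector

print(rho)                                   # the Perron-Frobenius eigenvalue
print(v)                                     # all entries are positive
print(sorted(np.abs(eigvals))[-2] < rho)     # strictly dominant (simple)
```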

## KL Divergence

Definition

If $P$ and $Q$ are two probability distributions, then the Kullback-Leibler divergence from $Q$ to $P$ is defined as

$$D_{KL}(P\|Q) = \sum_{x\in \mathcal{X}} P(x)\log\left(\frac{P(x)}{Q(x)}\right) = \mathbb{E}_{X \sim P}\left[\log\left(\frac{P(X)}{Q(X)}\right)\right]$$

Note that in general $D_{KL}(P\|Q)\neq D_{KL}(Q\|P)$.

### Positivity

$D_{KL}(P\|Q) \geq 0$

Proof
$$\begin{aligned} D_{KL}(P\|Q) &= \sum_{x\in \mathcal{X}} P(x)\log\left(\frac{P(x)}{Q(x)}\right) = -\sum_{x\in \mathcal{X}} P(x)\log\left(\frac{Q(x)}{P(x)}\right) \\ &\geq -\log\left(\sum_{x\in \mathcal{X}} P(x)\cdot \frac{Q(x)}{P(x)}\right) = -\log\left(\sum_{x\in \mathcal{X}} Q(x)\right) = -\log 1 = 0 \end{aligned}$$

Here we use Jensen's inequality, since $f(x)=-\log x$ is a convex function.
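
A quick check on random discrete distributions (a sketch assuming SciPy; `scipy.stats.entropy(p, q)` computes $D_{KL}(p\|q)$):

```python
# D_KL >= 0 and is asymmetric, on random distributions over 5 outcomes.
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(0)
for _ in range(3):
    P = rng.dirichlet(np.ones(5))
    Q = rng.dirichlet(np.ones(5))
    print(entropy(P, Q), entropy(Q, P))   # both nonnegative, generally unequal
```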

### Forward and Reverse KL Divergence

If we use the distribution $Q_\theta(X)$ to approximate the distribution $P(X)$, then

- Minimizing the forward KL divergence $\underset{\theta}{\argmin}\ D_{KL}(P\|Q_\theta)$ is equivalent to performing maximum likelihood estimation of $\theta$ under $P$.

  Proof
  $$\begin{aligned} \argmin_\theta D_{KL}(P\|Q_\theta) &= \argmin_\theta \sum_{x\in \mathcal{X}} P(x)\log P(x) - \sum_{x\in \mathcal{X}} P(x)\log Q_\theta(x) \\ &= \argmax_\theta \mathbb{E}_{X\sim P}[\log Q_\theta(X)] + \underbrace{\sum_{x\in \mathcal{X}} -P(x)\log P(x)}_{H(P)\text{: entropy of } P,\ \text{constant in }\theta} \\ &= \argmax_\theta \mathbb{E}_{X\sim P}[\log Q_\theta(X)] \approx \argmax_\theta \frac{1}{m}\sum_{i=1}^m \log Q_\theta(x_i) \\ &= \argmax_\theta \prod_{i=1}^{m}Q_\theta(x_i) = \argmax_\theta \mathcal{L}(\theta) = \theta^{\text{MLE}} \end{aligned}$$

  Here $P_\text{data}=\{x_1, \cdots, x_m\}$ denotes the collected data, used as an empirical approximation of $P$.

  Wherever $P(x)$ has high probability, $Q(x)$ must also have high probability.

  *[Figure: a unimodal $Q_\theta$ fitted to a bimodal $P$ by minimizing the forward KL divergence]*

  The figure above shows the effect of fitting a bimodal distribution $P$ with a unimodal distribution $Q_\theta$ under the forward KL divergence cost.

  This property of the forward KL divergence is often referred to as "zero avoiding", because it tends to avoid $Q(x)=0$ at any point where $P(x)>0$.

- Minimizing the reverse KL divergence $\underset{\theta}{\argmin}\ D_{KL}(Q_\theta\|P)$ is equivalent to requiring the fit to stay concentrated on a single mode as much as possible.

  Proof
  $$\argmin_\theta D_{KL}(Q_\theta\|P) = \argmin_\theta\ \mathbb{E}_{X\sim Q_\theta}[-\log P(X)] - H(Q_\theta)$$

  The first term is decisive: wherever $P(x)\approx 0$, $Q_\theta$ must also place almost no mass, for otherwise $\mathbb{E}_{X\sim Q_\theta}[-\log P(X)]$ blows up. A unimodal $Q_\theta$ therefore cannot spread across separated modes of $P$ and instead locks onto a single one of them. In this sense, the reverse KL divergence requires $Q_\theta$ to fit $P$ while remaining as unimodal as possible.

  Wherever $Q(x)$ has high probability, $P(x)$ must also have high probability.

  *[Figure: a unimodal $Q_\theta$ fitted to the same bimodal $P$ by minimizing the reverse KL divergence]*

  The figure above shows the effect of fitting the same bimodal distribution $P$ with a unimodal distribution $Q_\theta$ under the reverse KL divergence cost.

  The reverse KL divergence tends to minimize the difference between $Q_\theta(x)$ and $P(x)$ wherever $Q_\theta(x)>0$. (A toy numerical contrast of the two objectives follows this list.)
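
A toy grid-search sketch contrasting the two objectives, assuming NumPy; here $P$ is a bimodal mixture on a 1-D grid and $Q_\theta$ a discretized Gaussian whose mean $\theta$ is the only free parameter (all names and the setup are illustrative):

```python
import numpy as np

x = np.linspace(-6, 6, 601)
gauss = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2)
normalize = lambda p: p / p.sum()

P = normalize(gauss(-2, 0.5) + gauss(2, 0.5))        # bimodal target

def kl(p, q):
    return np.sum(p * np.log(np.where(p > 0, p / q, 1.0)))

thetas = np.linspace(-4, 4, 201)
Q = [normalize(gauss(t, 1.0)) for t in thetas]

t_fwd = thetas[np.argmin([kl(P, q) for q in Q])]     # mode-covering
t_rev = thetas[np.argmin([kl(q, P) for q in Q])]     # mode-seeking
print(t_fwd, t_rev)   # forward KL lands between the modes; reverse KL picks one
```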

## Orbit-Stabilizer Theorem

Let $G$ be a finite group acting on a finite set $X$. For any $x \in X$, let $O_x = \{g \cdot x \mid g \in G\}$ be the orbit of $x$ and $G_x = \{g \in G \mid g \cdot x = x\}$ the stabilizer of $x$. Then $|G| = |O_x| \cdot |G_x|$.

Proof

We construct a bijection between the orbit $O_x$ and the set of left cosets $G/G_x$ of the stabilizer. Define a map $\phi: G/G_x \to O_x$ by $\phi(gG_x) = g \cdot x$.

1. Well-defined: if $g_1 G_x = g_2 G_x$, then $g_2^{-1}g_1 \in G_x$, so $g_2^{-1}g_1 \cdot x = x \implies g_1 \cdot x = g_2 \cdot x$. Thus $\phi$ is well-defined.
2. Injective: if $\phi(g_1 G_x) = \phi(g_2 G_x)$, then $g_1 \cdot x = g_2 \cdot x \implies g_2^{-1}g_1 \cdot x = x \implies g_2^{-1}g_1 \in G_x$, so $g_1 G_x = g_2 G_x$.
3. Surjective: for any $y \in O_x$ there exists $g \in G$ such that $y = g \cdot x$; then $\phi(gG_x) = y$.

Since $\phi$ is a bijection, $\displaystyle |O_x| = [G : G_x] = \dfrac{|G|}{|G_x|}$. $\quad \square$
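
A tiny concrete check, sketched in Python: the dihedral group $D_4$, written as the 8 permutations of the square's vertices (4 rotations and 4 reflections, listed by hand), acting on the vertex set $\{0,1,2,3\}$.

```python
# Each group element g is a tuple sending vertex i to g[i].
rotations = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]
reflections = [(1, 0, 3, 2), (3, 2, 1, 0), (0, 3, 2, 1), (2, 1, 0, 3)]
G = rotations + reflections

x = 0
orbit = {g[x] for g in G}                       # O_x = {g.x : g in G}
stabilizer = [g for g in G if g[x] == x]        # G_x = {g : g.x = x}

assert len(G) == len(orbit) * len(stabilizer)   # 8 = 4 * 2
print(len(G), len(orbit), len(stabilizer))
```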

## Cauchy's Theorem

If $G$ is a finite group and $p$ is a prime number such that $p \mid |G|$, then $G$ contains an element of order $p$.

Proof

Consider the set $S = \{(a_1, \dots, a_p) \in G^p \mid a_1 \cdots a_p = e\}$. The first $p-1$ components can be chosen arbitrarily, and $a_p$ is then uniquely determined by $a_p = (a_1 \cdots a_{p-1})^{-1}$. Thus $|S| = |G|^{p-1}$. Since $p \mid |G|$, we have $p \mid |S|$.

Let the cyclic group $\mathbb{Z}_p = \langle \sigma \rangle$ act on $S$ by cyclic shift: $\sigma \cdot (a_1, \dots, a_p) = (a_2, \dots, a_p, a_1)$. Note that if $a_1 \cdots a_p = e$, then $a_1 (a_2 \cdots a_p) = e \implies a_2 \cdots a_p = a_1^{-1}$, so $a_2 \cdots a_p a_1 = a_1^{-1} a_1 = e$. Thus the action is well-defined on $S$.

By the Orbit-Stabilizer Theorem, the size of any orbit divides $|\mathbb{Z}_p| = p$, so the orbit sizes can only be $1$ or $p$. Let $k$ be the number of orbits of size 1; the elements of such orbits are exactly the constant tuples $a_1 = \dots = a_p = a$ with $a^p = e$. Since $S$ is the disjoint union of orbits, we have

$$|S| = k \cdot 1 + m \cdot p \equiv k \pmod p$$

where $m$ is the number of orbits of size $p$.

Since $p \mid |S|$, we must have $p \mid k$. The tuple $(e, \dots, e) \in S$ forms an orbit of size 1, so $k \geq 1$; together with $p \mid k$ this gives $k \geq p \geq 2$. Thus there exists at least one non-identity element $a \in G$ with $a^p = e$, i.e. an element of order $p$. $\quad \square$
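
A brute-force sketch of the counting argument on $S_3$ (order 6) with $p = 3$, assuming only Python's `itertools`:

```python
from itertools import permutations, product

def compose(f, g):                        # (f o g)(i) = f(g(i))
    return tuple(f[g[i]] for i in range(len(g)))

G = list(permutations(range(3)))          # S_3 as permutations of {0, 1, 2}
e = (0, 1, 2)

S = [t for t in product(G, repeat=3)
     if compose(compose(t[0], t[1]), t[2]) == e]
print(len(S), len(S) % 3)                 # |S| = 6^2 = 36, divisible by p = 3

# orbits of size 1 are the constant tuples (a, a, a) with a^3 = e
order_p = [a for a in G if a != e and compose(compose(a, a), a) == e]
print(order_p)                            # the two 3-cycles: elements of order 3
```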

## Fundamental Theorem of Finite Abelian Groups

Every finite abelian group $G$ is isomorphic to a direct sum of cyclic groups of prime power order, and the decomposition is unique up to reordering of the factors: $$G \cong \mathbb{Z}_{p_1^{e_1}} \oplus \mathbb{Z}_{p_2^{e_2}} \oplus \cdots \oplus \mathbb{Z}_{p_k^{e_k}}$$ where the $p_i$ are primes (not necessarily distinct) and $e_i \geq 1$.

Alternatively, it can be stated as: every finite abelian group $G$ is isomorphic to a direct sum of cyclic groups of orders $d_1, d_2, \dots, d_k$ with $d_1 \mid d_2 \mid \dots \mid d_k$ and $d_1 > 1$: $$G \cong \mathbb{Z}_{d_1} \oplus \mathbb{Z}_{d_2} \oplus \cdots \oplus \mathbb{Z}_{d_k}$$ The integers $d_1, \dots, d_k$ are called the invariant factors of $G$.

Proof

The proof consists of two main steps: decomposing the group into its primary components (Sylow $p$-subgroups) and then decomposing each $p$-group into cyclic groups.

- **Step 1: Primary Decomposition**

  Let $|G| = n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be the prime factorization of the order of $G$. For each prime factor $p_i$, define the $p_i$-primary component (or Sylow $p_i$-subgroup) as $$G(p_i) = \{x \in G \mid |x| = p_i^{m} \text{ for some } m \geq 0\}$$ Since $G$ is abelian, $G(p_i)$ is a subgroup of $G$. We claim that $G$ is the internal direct sum of these subgroups: $$G \cong G(p_1) \oplus G(p_2) \oplus \cdots \oplus G(p_k)$$

  Proof of decomposition: let $m_i = n / p_i^{e_i}$. Since $\gcd(m_1, \dots, m_k) = 1$, there exist integers $u_1, \dots, u_k$ such that $\sum_{i=1}^k u_i m_i = 1$. For any $x \in G$, we have $x = x^1 = x^{\sum u_i m_i} = \prod_{i=1}^k x^{u_i m_i}$. Let $x_i = x^{u_i m_i}$; note that $(x_i)^{p_i^{e_i}} = x^{u_i m_i p_i^{e_i}} = x^{u_i n} = (x^n)^{u_i} = e$, so $|x_i|$ divides $p_i^{e_i}$ and $x_i \in G(p_i)$. Hence $G = G(p_1) \cdots G(p_k)$. To show the sum is direct, suppose $x_1 \cdots x_k = e$ with $x_i \in G(p_i)$, and fix $j$. The order of $y = \prod_{i \neq j} x_i = x_j^{-1}$ divides $\prod_{i \neq j} |x_i|$, which is coprime to $p_j$; but $y = x_j^{-1} \in G(p_j)$ has order a power of $p_j$. The only element whose order divides two coprime numbers is the identity, so $x_j = e$.

  Thus, it suffices to prove the theorem for finite abelian $p$-groups.

- **Step 2: Structure of Finite Abelian $p$-groups**

  Let $G$ be a finite abelian $p$-group; we proceed by induction on $|G|$. Let $a \in G$ be an element of maximal order and $H = \langle a \rangle$. We claim that $H$ is a direct summand of $G$, i.e. there exists a subgroup $K$ such that $G = H \oplus K$. Consider the quotient group $G/H$; by induction it is a direct sum of cyclic groups, and by lifting the generators of these cyclic factors back to $G$ and adjusting them using the maximality of the order of $a$, one constructs the subgroup $K$. Since $|K| < |G|$, the induction hypothesis applies to $K$, so $K$ is a direct sum of cyclic groups, and therefore so is $G = H \oplus K$.

Combining these steps, any finite abelian group decomposes into a direct sum of cyclic $p$-groups; grouping these appropriately yields the invariant factor decomposition. $\quad \square$
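
A tiny numeric sketch of Step 1 for the cyclic group $\mathbb{Z}_{60}$: by the Chinese remainder theorem, $x \mapsto (x \bmod 4,\ x \bmod 3,\ x \bmod 5)$ is an isomorphism $\mathbb{Z}_{60} \cong \mathbb{Z}_4 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_5$ onto its primary components.

```python
n, parts = 60, (4, 3, 5)                      # 60 = 2^2 * 3 * 5

phi = lambda x: tuple(x % p for p in parts)   # the candidate isomorphism

# bijective: 60 distinct images out of 60 elements
assert len({phi(x) for x in range(n)}) == n

# additive: phi(x + y) = phi(x) + phi(y) componentwise
add = lambda u, v: tuple((a + b) % p for a, b, p in zip(u, v, parts))
assert all(phi((x + y) % n) == add(phi(x), phi(y))
           for x in range(n) for y in range(n))
print("Z_60 is isomorphic to Z_4 + Z_3 + Z_5")
```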
