随机算法 (Spring 2013)/Expander Graphs and Mixing
Spectral approach for symmetric chain
We consider the symmetric Markov chains defined on the state space [math]\displaystyle{ \Omega }[/math], where the transition matrix [math]\displaystyle{ P }[/math] is symmetric.
We have the following powerful spectral theorem for symmetric matrices.
Theorem (Spectral theorem) - Let [math]\displaystyle{ S }[/math] be a symmetric [math]\displaystyle{ n\times n }[/math] matrix, whose eigenvalues are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. There exist eigenvectors [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] such that [math]\displaystyle{ v_iS=\lambda_iv_i }[/math] for all [math]\displaystyle{ i=1,\ldots, n }[/math] and [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] are orthonormal.
A set of orthonormal vectors [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] satisfy that
- for any [math]\displaystyle{ i\neq j }[/math], [math]\displaystyle{ v_i }[/math] is orthogonal to [math]\displaystyle{ v_j }[/math], which means that [math]\displaystyle{ v_i^Tv_j=0 }[/math], denoted [math]\displaystyle{ v_i\bot v_j }[/math];
- all [math]\displaystyle{ v_i }[/math] have unit length, i.e., [math]\displaystyle{ \|v_i\|_2=1 }[/math].
Since the eigenvectors [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] are orthonormal, we can use them as orthogonal basis, so that any vector [math]\displaystyle{ x\in\mathbb{R}^n }[/math] can be expressed as [math]\displaystyle{ x=\sum_{i=1}^nc_iv_i }[/math] where [math]\displaystyle{ c_i=q^Tv_i }[/math], therefore
- [math]\displaystyle{ xS=\sum_{i=1}^nc_iv_iS=\sum_{i=1}^nc_i\lambda_iv_i. }[/math]
So multiplying by [math]\displaystyle{ S }[/math] corresponds to multiplying the length of [math]\displaystyle{ x }[/math] along the direction of every eigenvector by a factor of the corresponding eigenvalue.
Back to the symmetric Markov chain. Let [math]\displaystyle{ \Omega }[/math] be a finite state space of size [math]\displaystyle{ N }[/math], and [math]\displaystyle{ P }[/math] be a symmetric transition matrix, whose eigenvalues are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\ldots\ge \lambda_N }[/math]. The followings hold for a symmetric transition matrix [math]\displaystyle{ P }[/math]:
- Due to the spectral theorem, there exist orthonormal eigenvectors [math]\displaystyle{ v_1,v_2,\ldots,v_N }[/math] such that [math]\displaystyle{ v_iP=\lambda_iv_i }[/math] for [math]\displaystyle{ i=1,2,\ldots,N }[/math] and any distribution [math]\displaystyle{ q }[/math] over [math]\displaystyle{ \Omega }[/math] can be expressed as [math]\displaystyle{ q=\sum_{i=1}^Nc_iv_i }[/math] where [math]\displaystyle{ c_i=q^Tv_i }[/math].
- A symmetric [math]\displaystyle{ P }[/math] must be double stochastic, thus the stationary distribution [math]\displaystyle{ \pi }[/math] is the uniform distribution.
Recall that due to Perron-Frobenius theorem, [math]\displaystyle{ \lambda_1=1 }[/math]. And [math]\displaystyle{ \boldsymbol{1}P=\boldsymbol{1} }[/math] since [math]\displaystyle{ P }[/math] is double stochastic, thus [math]\displaystyle{ v_1=\frac{\boldsymbol{1}}{\|\boldsymbol{1}\|_2}=\left(\frac{1}{\sqrt{N}},\ldots,\frac{1}{\sqrt{N}}\right) }[/math].
When [math]\displaystyle{ q }[/math] is a distribution, i.e., [math]\displaystyle{ q }[/math] is a nonnegative vector and [math]\displaystyle{ \|q\|_1=1 }[/math], it holds that [math]\displaystyle{ c_1=q^Tv_1=\frac{1}{\sqrt{N}} }[/math] and [math]\displaystyle{ c_1v_1=\left(\frac{1}{N},\ldots,\frac{1}{N}\right)=\pi }[/math], thus
- [math]\displaystyle{ q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i, }[/math]
and the distribution at time [math]\displaystyle{ t }[/math] when the initial distribution is [math]\displaystyle{ q }[/math], is given by
- [math]\displaystyle{ qP^t=\pi P^t+\sum_{i=2}^Nc_iv_iP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i. }[/math]
It is easy to see that this distribution converges to [math]\displaystyle{ \pi }[/math] when the absolute values of [math]\displaystyle{ \lambda_2,\ldots,\lambda_N }[/math] are all less than 1. And the rate at which it converges to [math]\displaystyle{ \pi }[/math], namely the mixing rate, is determined by the quantity [math]\displaystyle{ \lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\, }[/math], which is the largest absolute eigenvalues other than [math]\displaystyle{ \lambda_1 }[/math].
Theorem - Let [math]\displaystyle{ P }[/math] be the transition matrix for a symmetric Markov chain on state space [math]\displaystyle{ \Omega }[/math] where [math]\displaystyle{ |\Omega|=N }[/math]. Let [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N }[/math] be the spectrum of [math]\displaystyle{ P }[/math] and [math]\displaystyle{ \lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\, }[/math]. The mixing rate of the Markov chain is
- [math]\displaystyle{ \tau(\epsilon)\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\text{max}}} }[/math].
- Let [math]\displaystyle{ P }[/math] be the transition matrix for a symmetric Markov chain on state space [math]\displaystyle{ \Omega }[/math] where [math]\displaystyle{ |\Omega|=N }[/math]. Let [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N }[/math] be the spectrum of [math]\displaystyle{ P }[/math] and [math]\displaystyle{ \lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\, }[/math]. The mixing rate of the Markov chain is
Proof. As analysed above, if [math]\displaystyle{ P }[/math] is symmetric, it has orthonormal eigenvectors [math]\displaystyle{ v_1,\ldots,v_N }[/math] such that any distribution [math]\displaystyle{ q }[/math] over [math]\displaystyle{ \Omega }[/math] can be expressed as
- [math]\displaystyle{ q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i }[/math]
with [math]\displaystyle{ c_i=q^Tv_i }[/math], and
- [math]\displaystyle{ qP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i. }[/math]
Thus,
- [math]\displaystyle{ \begin{align} \|qP^t-\pi\|_1 &= \left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_1\\ &\le \sqrt{N}\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_2 &\quad\text{(Cauchy-Schwarz)}\\ &= \sqrt{N}\sqrt{\sum_{i=2}^Nc_i^2\lambda_i^{2t}}\\ &\le \sqrt{N}\lambda_{\max}^t\sqrt{\sum_{i=2}^Nc_i^2}\\ &= \sqrt{N}\lambda_{\max}^t\|q\|_2\\ &\le \sqrt{N}\lambda_{\max}^t. \end{align} }[/math]
The last inequality is due to a universal relation [math]\displaystyle{ \|q\|_2\le\|q\|_1 }[/math] and the fact that [math]\displaystyle{ q }[/math] is a distribution.
Then for any [math]\displaystyle{ x\in\Omega }[/math], denoted by [math]\displaystyle{ \boldsymbol{1}_x }[/math] the indicator vector for [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ \boldsymbol{1}_x(x)=1 }[/math] and [math]\displaystyle{ \boldsymbol{1}_x(y)=0 }[/math] for [math]\displaystyle{ y\neq x }[/math], we have
- [math]\displaystyle{ \begin{align} \Delta_x(t) &=\left\|\boldsymbol{1}_x P^t-\pi\right\|_{TV}=\frac{1}{2}\left\|\boldsymbol{1}_x P^t-\pi\right\|_1\\ &\le\frac{\sqrt{N}}{2}\lambda_{\max}^t\le\frac{\sqrt{N}}{2}\mathrm{e}^{-t(1-\lambda_{\max})}. \end{align} }[/math]
Therefore, we have
- [math]\displaystyle{ \tau_x(\epsilon) =\min\{t\mid\Delta_x(t)\le\epsilon\} \le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\max}} }[/math]
for any [math]\displaystyle{ x\in\Omega }[/math], thus the bound holds for [math]\displaystyle{ \tau(\epsilon)=\max_{x}\tau_x(\epsilon) }[/math].
- [math]\displaystyle{ \square }[/math]