Spectral approach for symmetric chain

We consider the symmetric Markov chains defined on the state space ${\displaystyle \Omega }$, where the transition matrix ${\displaystyle P}$ is symmetric.

We have the following powerful spectral theorem for symmetric matrices.

 Theorem (Spectral theorem) Let ${\displaystyle S}$ be a symmetric ${\displaystyle n\times n}$ matrix, whose eigenvalues are ${\displaystyle \lambda _{1}\geq \lambda _{2}\geq \cdots \geq \lambda _{n}}$. There exist eigenvectors ${\displaystyle v_{1},v_{2},\ldots ,v_{n}}$ such that ${\displaystyle v_{i}S=\lambda _{i}v_{i}}$ for all ${\displaystyle i=1,\ldots ,n}$ and ${\displaystyle v_{1},v_{2},\ldots ,v_{n}}$ are orthonormal.

A set of orthonormal vectors ${\displaystyle v_{1},v_{2},\ldots ,v_{n}}$ satisfy that

• for any ${\displaystyle i\neq j}$, ${\displaystyle v_{i}}$ is orthogonal to ${\displaystyle v_{j}}$, which means that ${\displaystyle v_{i}^{T}v_{j}=0}$, denoted ${\displaystyle v_{i}\bot v_{j}}$;
• all ${\displaystyle v_{i}}$ have unit length, i.e., ${\displaystyle \|v_{i}\|_{2}=1}$.

Since the eigenvectors ${\displaystyle v_{1},v_{2},\ldots ,v_{n}}$ are orthonormal, we can use them as orthogonal basis, so that any vector ${\displaystyle x\in \mathbb {R} ^{n}}$ can be expressed as ${\displaystyle x=\sum _{i=1}^{n}c_{i}v_{i}}$ where ${\displaystyle c_{i}=q^{T}v_{i}}$, therefore

${\displaystyle xS=\sum _{i=1}^{n}c_{i}v_{i}S=\sum _{i=1}^{n}c_{i}\lambda _{i}v_{i}.}$

So multiplying by ${\displaystyle S}$ corresponds to multiplying the length of ${\displaystyle x}$ along the direction of every eigenvector by a factor of the corresponding eigenvalue.

Back to the symmetric Markov chain. Let ${\displaystyle \Omega }$ be a finite state space of size ${\displaystyle N}$, and ${\displaystyle P}$ be a symmetric transition matrix, whose eigenvalues are ${\displaystyle \lambda _{1}\geq \lambda _{2}\geq \ldots \geq \lambda _{N}}$. The followings hold for a symmetric transition matrix ${\displaystyle P}$:

• Due to the spectral theorem, there exist orthonormal eigenvectors ${\displaystyle v_{1},v_{2},\ldots ,v_{N}}$ such that ${\displaystyle v_{i}P=\lambda _{i}v_{i}}$ for ${\displaystyle i=1,2,\ldots ,N}$ and any distribution ${\displaystyle q}$ over ${\displaystyle \Omega }$ can be expressed as ${\displaystyle q=\sum _{i=1}^{N}c_{i}v_{i}}$ where ${\displaystyle c_{i}=q^{T}v_{i}}$.
• A symmetric ${\displaystyle P}$ must be double stochastic, thus the stationary distribution ${\displaystyle \pi }$ is the uniform distribution.

Recall that due to Perron-Frobenius theorem, ${\displaystyle \lambda _{1}=1}$. And ${\displaystyle {\boldsymbol {1}}P={\boldsymbol {1}}}$ since ${\displaystyle P}$ is double stochastic, thus ${\displaystyle v_{1}={\frac {\boldsymbol {1}}{\|{\boldsymbol {1}}\|_{2}}}=\left({\frac {1}{\sqrt {N}}},\ldots ,{\frac {1}{\sqrt {N}}}\right)}$.

When ${\displaystyle q}$ is a distribution, i.e., ${\displaystyle q}$ is a nonnegative vector and ${\displaystyle \|q\|_{1}=1}$, it holds that ${\displaystyle c_{1}=q^{T}v_{1}={\frac {1}{\sqrt {N}}}}$ and ${\displaystyle c_{1}v_{1}=\left({\frac {1}{N}},\ldots ,{\frac {1}{N}}\right)=\pi }$, thus

${\displaystyle q=\sum _{i=1}^{N}c_{i}v_{i}=\pi +\sum _{i=2}^{N}c_{i}v_{i},}$

and the distribution at time ${\displaystyle t}$ when the initial distribution is ${\displaystyle q}$, is given by

${\displaystyle qP^{t}=\pi P^{t}+\sum _{i=2}^{N}c_{i}v_{i}P^{t}=\pi +\sum _{i=2}^{N}c_{i}\lambda _{i}^{t}v_{i}.}$

It is easy to see that this distribution converges to ${\displaystyle \pi }$ when the absolute values of ${\displaystyle \lambda _{2},\ldots ,\lambda _{N}}$ are all less than 1. And the rate at which it converges to ${\displaystyle \pi }$, namely the mixing rate, is determined by the quantity ${\displaystyle \lambda _{\max }=\max\{|\lambda _{2}|,|\lambda _{N}|\}\,}$, which is the largest absolute eigenvalues other than ${\displaystyle \lambda _{1}}$.

 Theorem Let ${\displaystyle P}$ be the transition matrix for a symmetric Markov chain on state space ${\displaystyle \Omega }$ where ${\displaystyle |\Omega |=N}$. Let ${\displaystyle \lambda _{1}\geq \lambda _{2}\geq \cdots \geq \lambda _{N}}$ be the spectrum of ${\displaystyle P}$ and ${\displaystyle \lambda _{\max }=\max\{|\lambda _{2}|,|\lambda _{N}|\}\,}$. The mixing rate of the Markov chain is ${\displaystyle \tau (\epsilon )\leq {\frac {{\frac {1}{2}}\ln N+\ln {\frac {1}{2\epsilon }}}{1-\lambda _{\text{max}}}}}$.
Proof.
 As analysed above, if ${\displaystyle P}$ is symmetric, it has orthonormal eigenvectors ${\displaystyle v_{1},\ldots ,v_{N}}$ such that any distribution ${\displaystyle q}$ over ${\displaystyle \Omega }$ can be expressed as ${\displaystyle q=\sum _{i=1}^{N}c_{i}v_{i}=\pi +\sum _{i=2}^{N}c_{i}v_{i}}$ with ${\displaystyle c_{i}=q^{T}v_{i}}$, and ${\displaystyle qP^{t}=\pi +\sum _{i=2}^{N}c_{i}\lambda _{i}^{t}v_{i}.}$ Thus, {\displaystyle {\begin{aligned}\|qP^{t}-\pi \|_{1}&=\left\|\sum _{i=2}^{N}c_{i}\lambda _{i}^{t}v_{i}\right\|_{1}\\&\leq {\sqrt {N}}\left\|\sum _{i=2}^{N}c_{i}\lambda _{i}^{t}v_{i}\right\|_{2}&\quad {\text{(Cauchy-Schwarz)}}\\&={\sqrt {N}}{\sqrt {\sum _{i=2}^{N}c_{i}^{2}\lambda _{i}^{2t}}}\\&\leq {\sqrt {N}}\lambda _{\max }^{t}{\sqrt {\sum _{i=2}^{N}c_{i}^{2}}}\\&={\sqrt {N}}\lambda _{\max }^{t}\|q\|_{2}\\&\leq {\sqrt {N}}\lambda _{\max }^{t}.\end{aligned}}} The last inequality is due to a universal relation ${\displaystyle \|q\|_{2}\leq \|q\|_{1}}$ and the fact that ${\displaystyle q}$ is a distribution. Then for any ${\displaystyle x\in \Omega }$, denoted by ${\displaystyle {\boldsymbol {1}}_{x}}$ the indicator vector for ${\displaystyle x}$ such that ${\displaystyle {\boldsymbol {1}}_{x}(x)=1}$ and ${\displaystyle {\boldsymbol {1}}_{x}(y)=0}$ for ${\displaystyle y\neq x}$, we have {\displaystyle {\begin{aligned}\Delta _{x}(t)&=\left\|{\boldsymbol {1}}_{x}P^{t}-\pi \right\|_{TV}={\frac {1}{2}}\left\|{\boldsymbol {1}}_{x}P^{t}-\pi \right\|_{1}\\&\leq {\frac {\sqrt {N}}{2}}\lambda _{\max }^{t}\leq {\frac {\sqrt {N}}{2}}\mathrm {e} ^{-t(1-\lambda _{\max })}.\end{aligned}}} Therefore, we have ${\displaystyle \tau _{x}(\epsilon )=\min\{t\mid \Delta _{x}(t)\leq \epsilon \}\leq {\frac {{\frac {1}{2}}\ln N+\ln {\frac {1}{2\epsilon }}}{1-\lambda _{\max }}}}$ for any ${\displaystyle x\in \Omega }$, thus the bound holds for ${\displaystyle \tau (\epsilon )=\max _{x}\tau _{x}(\epsilon )}$.
${\displaystyle \square }$

Rapid mixing of expander walk

Let ${\displaystyle G(V,E)}$ be a ${\displaystyle d}$-regular graph on ${\displaystyle n}$ vertices. Let ${\displaystyle A}$ be its adjacency matrix. The transition matrix of the lazy random walk on ${\displaystyle G}$ is given by ${\displaystyle P={\frac {1}{2}}\left(I+{\frac {1}{d}}A\right)}$. Specifically,

${\displaystyle P(u,v)={\begin{cases}{\frac {1}{2}}&{\text{if }}u=v,\\{\frac {1}{2d}}&{\text{if }}uv\in E,\\0&{\text{otherwise.}}\end{cases}}}$

Obviously ${\displaystyle P}$ is symmetric.

Let ${\displaystyle \lambda _{1}\geq \lambda _{2}\geq \cdots \geq \lambda _{n}}$ be the eigenvalues of ${\displaystyle A}$, and ${\displaystyle \nu _{1}\geq \nu _{2}\geq \cdots \geq \nu _{n}}$ be the eigenvalues of ${\displaystyle P}$. It is easy to verify that

${\displaystyle \nu _{i}={\frac {1}{2}}\left(1+{\frac {\lambda _{i}}{d}}\right)}$.

We know that ${\displaystyle -d\leq \lambda _{i}\leq d}$, thus ${\displaystyle 0\leq \nu _{i}\leq 1}$. Therefore, ${\displaystyle \nu _{\max }=\max\{|\nu _{2}|,|\nu _{n}|\}=\nu _{2}\,}$. Due to the above analysis of symmetric Markov chain,

${\displaystyle \tau (\epsilon )\leq {\frac {{\frac {1}{2}}(\ln n+\ln {\frac {1}{2\epsilon }})}{1-\nu _{\max }}}={\frac {{\frac {1}{2}}(\ln n+\ln {\frac {1}{2\epsilon }})}{1-\nu _{2}}}={\frac {d(\ln n+\ln {\frac {1}{2\epsilon }})}{d-\lambda _{2}}}}$.

Thus we prove the following theorem for lazy random walk on ${\displaystyle d}$-regular graphs.

 Theorem Let ${\displaystyle G(V,E)}$ be a ${\displaystyle d}$-regular graph on ${\displaystyle n}$ vertices, with spectrum ${\displaystyle \lambda _{1}\geq \lambda _{2}\geq \cdots \geq \lambda _{n}}$. The mixing rate of lazy random walk on ${\displaystyle G}$ is ${\displaystyle \tau (\epsilon )\leq {\frac {d(\ln n+\ln {\frac {1}{2\epsilon }})}{d-\lambda _{2}}}}$.

Due to Cheeger's inequality, the spectral gap is bounded by expansion ratio as

${\displaystyle d-\lambda _{2}\geq {\frac {\phi ^{2}}{2d}},}$

where ${\displaystyle \phi =\phi (G)}$ is the expansion ratio of the graph ${\displaystyle G}$. Therefore, we have the following corollary which bounds the mixing time by graph expansion.

 Corollary Let ${\displaystyle G(V,E)}$ be a ${\displaystyle d}$-regular graph on ${\displaystyle n}$ vertices, whose expansion ratio is ${\displaystyle \phi =\phi (G)}$. The mixing rate of lazy random walk on ${\displaystyle G}$ is ${\displaystyle \tau (\epsilon )\leq {\frac {2d^{2}(\ln n+\ln {\frac {1}{2\epsilon }})}{\phi ^{2}}}}$. In particular, the mixing time is ${\displaystyle \tau _{\text{mix}}=\tau (1/2\mathrm {e} )\leq {\frac {2d^{2}(\ln n+2)}{\phi ^{2}}}}$.

For expander graphs, both ${\displaystyle d}$ and ${\displaystyle \phi }$ are constants. The mixing time of lazy random walk is ${\displaystyle \tau _{\text{mix}}=O(\ln n)}$ so the random walk is rapidly mixing.