Randomized Algorithms (Fall 2011)/Random Walk on Expander Graph


Spectral approach for symmetric chain

We consider the symmetric Markov chains defined on the state space <math>\Omega</math>, where the transition matrix <math>P</math> is symmetric.

We have the following powerful spectral theorem for symmetric matrices.

Theorem (Spectral theorem)
Let <math>S</math> be a symmetric <math>n\times n</math> matrix, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. There exist eigenvectors <math>v_1,v_2,\ldots,v_n</math> such that <math>v_iS=\lambda_iv_i</math> for all <math>i=1,\ldots,n</math> and <math>v_1,v_2,\ldots,v_n</math> are orthonormal.

A set of orthonormal vectors <math>v_1,v_2,\ldots,v_n</math> satisfies the following:

  • for any <math>i\neq j</math>, <math>v_i</math> is orthogonal to <math>v_j</math>, which means that <math>v_i^Tv_j=0</math>, denoted <math>v_i\bot v_j</math>;
  • all <math>v_i</math> have unit length, i.e., <math>\|v_i\|_2=1</math>.

Since the eigenvectors <math>v_1,v_2,\ldots,v_n</math> are orthonormal, we can use them as an orthonormal basis, so that any vector <math>x\in\mathbb{R}^n</math> can be expressed as <math>x=\sum_{i=1}^nc_iv_i</math> where <math>c_i=x^Tv_i</math>, therefore

<math>xS=\sum_{i=1}^nc_iv_iS=\sum_{i=1}^nc_i\lambda_iv_i.</math>

So multiplying by <math>S</math> scales the component of <math>x</math> along each eigenvector by a factor of the corresponding eigenvalue.
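To see this concretely, here is a minimal numerical sketch (the symmetric matrix <math>S</math> and the vector <math>x</math> below are made-up illustrative choices): it expresses <math>x</math> in the orthonormal eigenbasis and checks that multiplication by <math>S</math> scales each coefficient by the corresponding eigenvalue.

<pre>
import numpy as np

# a small symmetric matrix and a test vector (arbitrary illustrative choices)
S = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])
x = np.array([0.3, 0.5, 0.2])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors as columns;
# reverse both to match the convention lambda_1 >= ... >= lambda_n
lam, V = np.linalg.eigh(S)
lam, V = lam[::-1], V[:, ::-1]

c = V.T @ x                       # coefficients c_i = x^T v_i
xS = sum(c[i] * lam[i] * V[:, i] for i in range(len(x)))
assert np.allclose(x @ S, xS)     # x S = sum_i c_i lambda_i v_i
</pre>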


Back to the symmetric Markov chain. Let <math>\Omega</math> be a finite state space of size <math>N</math>, and <math>P</math> be a symmetric transition matrix, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\ldots\ge \lambda_N</math>. The following hold for a symmetric transition matrix <math>P</math>:

  • Due to the spectral theorem, there exist orthonormal eigenvectors <math>v_1,v_2,\ldots,v_N</math> such that <math>v_iP=\lambda_iv_i</math> for <math>i=1,2,\ldots,N</math>, and any distribution <math>q</math> over <math>\Omega</math> can be expressed as <math>q=\sum_{i=1}^Nc_iv_i</math> where <math>c_i=q^Tv_i</math>.
  • A symmetric <math>P</math> must be doubly stochastic, thus the stationary distribution <math>\pi</math> is the uniform distribution.

Recall that due to the Perron–Frobenius theorem, <math>\lambda_1=1</math>. And <math>\boldsymbol{1}P=\boldsymbol{1}</math> since <math>P</math> is doubly stochastic, thus <math>v_1=\frac{\boldsymbol{1}}{\|\boldsymbol{1}\|_2}=\left(\frac{1}{\sqrt{N}},\ldots,\frac{1}{\sqrt{N}}\right)</math>.

When <math>q</math> is a distribution, i.e., <math>q</math> is a nonnegative vector and <math>\|q\|_1=1</math>, it holds that <math>c_1=q^Tv_1=\frac{1}{\sqrt{N}}</math> and <math>c_1v_1=\left(\frac{1}{N},\ldots,\frac{1}{N}\right)=\pi</math>, thus

<math>q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i,</math>

and the distribution at time <math>t</math>, when the initial distribution is <math>q</math>, is given by

<math>qP^t=\pi P^t+\sum_{i=2}^Nc_iv_iP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i.</math>

It is easy to see that this distribution converges to <math>\pi</math> when the absolute values of <math>\lambda_2,\ldots,\lambda_N</math> are all less than 1. The rate at which it converges to <math>\pi</math>, namely the mixing rate, is determined by the quantity <math>\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\,</math>, which is the largest absolute value among the eigenvalues other than <math>\lambda_1</math>.
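This decay can be observed directly. The following sketch (the chain, a random walk on the complete graph <math>K_4</math>, is an arbitrary choice) compares <math>\|qP^t-\pi\|_2</math> with <math>\lambda_{\max}^t</math>:

<pre>
import numpy as np

# symmetric transition matrix of the random walk on the complete graph K_4
n = 4
P = (np.ones((n, n)) - np.eye(n)) / (n - 1)

lam = np.sort(np.linalg.eigvalsh(P))[::-1]
lam_max = max(abs(lam[1]), abs(lam[-1]))   # largest absolute eigenvalue except lambda_1 = 1

pi = np.full(n, 1.0 / n)
q = np.array([1.0, 0.0, 0.0, 0.0])         # start at a single state
for t in range(1, 6):
    q = q @ P
    print(t, np.linalg.norm(q - pi), lam_max ** t)   # the distance decays like lam_max^t
</pre>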

Theorem
Let <math>P</math> be the transition matrix for a symmetric Markov chain on state space <math>\Omega</math> where <math>|\Omega|=N</math>. Let <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N</math> be the spectrum of <math>P</math> and <math>\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\,</math>. The mixing rate of the Markov chain is
<math>\tau(\epsilon)\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\max}}</math>.
Proof.

As analysed above, if <math>P</math> is symmetric, it has orthonormal eigenvectors <math>v_1,\ldots,v_N</math> such that any distribution <math>q</math> over <math>\Omega</math> can be expressed as

<math>q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i</math>

with <math>c_i=q^Tv_i</math>, and

<math>qP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i.</math>

Thus,

<math>
\begin{align}
\|qP^t-\pi\|_1
&=\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_1\\
&\le\sqrt{N}\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_2 &\quad\text{(Cauchy-Schwarz)}\\
&=\sqrt{N}\sqrt{\sum_{i=2}^Nc_i^2\lambda_i^{2t}}\\
&\le\sqrt{N}\lambda_{\max}^t\sqrt{\sum_{i=2}^Nc_i^2}\\
&\le\sqrt{N}\lambda_{\max}^t\|q\|_2\\
&\le\sqrt{N}\lambda_{\max}^t.
\end{align}
</math>

The second-to-last inequality holds because <math>\sum_{i=2}^Nc_i^2\le\sum_{i=1}^Nc_i^2=\|q\|_2^2</math>; the last inequality is due to the universal relation <math>\|q\|_2\le\|q\|_1</math> and the fact that <math>q</math> is a distribution, so <math>\|q\|_1=1</math>.

Then for any <math>x\in\Omega</math>, denote by <math>\boldsymbol{1}_x</math> the indicator vector for <math>x</math> such that <math>\boldsymbol{1}_x(x)=1</math> and <math>\boldsymbol{1}_x(y)=0</math> for <math>y\neq x</math>. We have

<math>
\begin{align}
\Delta_x(t)
&=\left\|\boldsymbol{1}_x P^t-\pi\right\|_{TV}=\frac{1}{2}\left\|\boldsymbol{1}_x P^t-\pi\right\|_1\\
&\le\frac{\sqrt{N}}{2}\lambda_{\max}^t\le\frac{\sqrt{N}}{2}\mathrm{e}^{-t(1-\lambda_{\max})}.
\end{align}
</math>

Therefore, we have

<math>\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\max}}</math>

for any <math>x\in\Omega</math>, thus the bound holds for <math>\tau(\epsilon)=\max_{x}\tau_x(\epsilon)</math>.

<math>\square</math>
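As a sanity check of the theorem, the following sketch computes the exact <math>\tau_x(\epsilon)</math> of a concrete symmetric chain by iterating the walk and compares it with the bound (the lazy walk on the cycle <math>C_8</math> and <math>\epsilon=0.1</math> are arbitrary illustrative choices):

<pre>
import numpy as np

# lazy random walk on the cycle C_8: a symmetric transition matrix with lambda_max < 1
N = 8
A = np.zeros((N, N))
for u in range(N):
    A[u, (u + 1) % N] = A[u, (u - 1) % N] = 1.0
P = 0.5 * (np.eye(N) + A / 2)              # degree d = 2

lam = np.sort(np.linalg.eigvalsh(P))[::-1]
lam_max = max(abs(lam[1]), abs(lam[-1]))

eps = 0.1
pi = np.full(N, 1.0 / N)
q = np.zeros(N); q[0] = 1.0                # by symmetry every start vertex behaves the same
t = 0
while 0.5 * np.abs(q - pi).sum() > eps:    # total variation distance Delta_x(t)
    q = q @ P
    t += 1

bound = (0.5 * np.log(N) + np.log(1 / (2 * eps))) / (1 - lam_max)
print("tau_x(eps) =", t, " bound =", bound)
</pre>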

Rapid mixing of expander walk

Let <math>G(V,E)</math> be a <math>d</math>-regular graph on <math>n</math> vertices. Let <math>A</math> be its adjacency matrix. The transition matrix of the lazy random walk on <math>G</math> is given by <math>P=\frac{1}{2}\left(I+\frac{1}{d}A\right)</math>. Specifically,

<math>P(u,v)=\begin{cases}
\frac{1}{2} & \text{if }u=v,\\
\frac{1}{2d} & \text{if }uv\in E,\\
0 & \text{otherwise.}
\end{cases}</math>

Obviously <math>P</math> is symmetric.

Let <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math> be the eigenvalues of <math>A</math>, and <math>\nu_1\ge\nu_2\ge\cdots\ge\nu_n</math> be the eigenvalues of <math>P</math>. It is easy to verify that

<math>\nu_i=\frac{1}{2}\left(1+\frac{\lambda_i}{d}\right)</math>.
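This relation is easy to check numerically. The sketch below builds the lazy walk on the 3-dimensional hypercube (a 3-regular graph chosen only as an example) and verifies both the symmetry of <math>P</math> and the eigenvalue correspondence:

<pre>
import numpy as np

# 3-dimensional hypercube: d-regular with n = 8, d = 3; vertices are 3-bit strings
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(d):
        A[u, u ^ (1 << b)] = 1.0           # flip one bit to get a neighbour

P = 0.5 * (np.eye(n) + A / d)              # lazy random walk
assert np.allclose(P, P.T)                 # P is symmetric

lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # spectrum of A: 3, 1, 1, 1, -1, -1, -1, -3
nu = np.sort(np.linalg.eigvalsh(P))[::-1]
assert np.allclose(nu, 0.5 * (1 + lam / d))  # nu_i = (1 + lambda_i / d) / 2
</pre>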

We know that <math>-d\le\lambda_i\le d</math>, thus <math>0\le\nu_i\le1</math>. Therefore, <math>\nu_{\max}=\max\{|\nu_2|,|\nu_n|\}=\nu_2\,</math>. Due to the above analysis of symmetric Markov chains,

<math>\tau(\epsilon)\le\frac{\frac{1}{2}\ln n+\ln\frac{1}{2\epsilon}}{1-\nu_{\max}}=\frac{\frac{1}{2}\ln n+\ln\frac{1}{2\epsilon}}{1-\nu_2}=\frac{d\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{d-\lambda_2}</math>,
where the last equality uses <math>1-\nu_2=\frac{1}{2}\left(1-\frac{\lambda_2}{d}\right)=\frac{d-\lambda_2}{2d}</math>.

Thus we have proved the following theorem for the lazy random walk on <math>d</math>-regular graphs.

Theorem
Let <math>G(V,E)</math> be a <math>d</math>-regular graph on <math>n</math> vertices, with spectrum <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. The mixing rate of the lazy random walk on <math>G</math> is
<math>\tau(\epsilon)\le\frac{d\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{d-\lambda_2}</math>.

Due to Cheeger's inequality, the spectral gap is bounded from below in terms of the expansion ratio:

<math>d-\lambda_2\ge\frac{\phi^2}{2d},</math>

where <math>\phi=\phi(G)</math> is the expansion ratio of the graph <math>G</math>. Therefore, we have the following corollary, which bounds the mixing time by graph expansion.

Corollary
Let <math>G(V,E)</math> be a <math>d</math>-regular graph on <math>n</math> vertices, whose expansion ratio is <math>\phi=\phi(G)</math>. The mixing rate of the lazy random walk on <math>G</math> is
<math>\tau(\epsilon)\le\frac{2d^2\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{\phi^2}</math>.
In particular, the mixing time is
<math>\tau_{\text{mix}}=\tau(1/2\mathrm{e})\le\frac{2d^2(\ln n+2)}{\phi^2}</math>.
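The following sketch checks Cheeger's inequality on a small example (again the 3-dimensional hypercube; the expansion ratio is computed by brute force, assuming the edge-expansion definition <math>\phi(G)=\min_{|S|\le n/2}|\partial S|/|S|</math>, where <math>\partial S</math> is the set of edges leaving <math>S</math>):

<pre>
import numpy as np
from itertools import combinations

# 3-dimensional hypercube again: n = 8, d = 3
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(d):
        A[u, u ^ (1 << b)] = 1.0

lam2 = np.sort(np.linalg.eigvalsh(A))[::-1][1]

# brute-force edge expansion: minimise |edges leaving S| / |S| over all |S| <= n/2
phi = min(
    sum(A[u, v] for u in S for v in range(n) if v not in S) / len(S)
    for k in range(1, n // 2 + 1)
    for S in combinations(range(n), k)
)
print("spectral gap d - lambda_2 =", d - lam2)      # expected: 2.0
print("Cheeger lower bound phi^2 / (2d) =", phi ** 2 / (2 * d))
</pre>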

For expander graphs, both <math>d</math> and <math>\phi</math> are constants. The mixing time of the lazy random walk is <math>\tau_{\text{mix}}=O(\ln n)</math>, so the random walk is rapidly mixing.