= Randomized Algorithms (Spring 2013) / Expander Graphs and Mixing =
= Spectral approach for symmetric chain =
We consider the '''symmetric Markov chains''' defined on the state space <math>\Omega</math>, where the transition matrix <math>P</math> is symmetric.

We have the following powerful spectral theorem for symmetric matrices.
{{Theorem|Theorem (Spectral theorem)|
:Let <math>S</math> be a symmetric <math>n\times n</math> matrix, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. There exist eigenvectors <math>v_1,v_2,\ldots,v_n</math> such that <math>v_iS=\lambda_iv_i</math> for all <math>i=1,\ldots, n</math> and <math>v_1,v_2,\ldots,v_n</math> are '''orthonormal'''.
}}
A set of orthonormal vectors <math>v_1,v_2,\ldots,v_n</math> satisfies that
* for any <math>i\neq j</math>, <math>v_i</math> is orthogonal to <math>v_j</math>, which means that <math>v_i^Tv_j=0</math>, denoted <math>v_i\bot v_j</math>;
* all <math>v_i</math> have unit length, i.e., <math>\|v_i\|_2=1</math>.
Since the eigenvectors <math>v_1,v_2,\ldots,v_n</math> are orthonormal, we can use them as an orthonormal basis, so that any vector <math>x\in\mathbb{R}^n</math> can be expressed as <math>x=\sum_{i=1}^nc_iv_i</math> where <math>c_i=x^Tv_i</math>, therefore
:<math>xS=\sum_{i=1}^nc_iv_iS=\sum_{i=1}^nc_i\lambda_iv_i.</math>
So multiplying by <math>S</math> corresponds to multiplying the length of <math>x</math> along the direction of every eigenvector by a factor of the corresponding eigenvalue.
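The decomposition above can be checked numerically. The following sketch uses a small symmetric matrix chosen for illustration (it is not from the notes); its eigenvalues and orthonormal eigenvectors are known in closed form, so we can compare <math>xS</math> against <math>\sum_i c_i\lambda_i v_i</math> directly.

```python
import math

# Example symmetric matrix (an assumption for illustration, not from the notes):
# S = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with orthonormal
# eigenvectors (1,1)/sqrt(2) and (1,-1)/sqrt(2).
S = [[2.0, 1.0], [1.0, 2.0]]
eigs = [3.0, 1.0]
vecs = [
    [1 / math.sqrt(2), 1 / math.sqrt(2)],
    [1 / math.sqrt(2), -1 / math.sqrt(2)],
]

def mat_vec(x, M):
    # Row vector times matrix: (xM)_j = sum_i x_i * M[i][j]
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x = [0.3, 0.7]
# Coefficients in the eigenbasis: c_i = <x, v_i>
c = [dot(x, v) for v in vecs]
# xS computed directly ...
direct = mat_vec(x, S)
# ... and via the spectral decomposition: sum_i c_i * lambda_i * v_i
spectral = [sum(c[i] * eigs[i] * vecs[i][j] for i in range(2)) for j in range(2)]
assert all(abs(a - b) < 1e-12 for a, b in zip(direct, spectral))
```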
----- | |||
Back to the symmetric Markov chain. Let <math>\Omega</math> be a finite state space of size <math>N</math>, and <math>P</math> be a symmetric transition matrix, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\ldots\ge \lambda_N</math>. The followings hold for a symmetric transition matrix <math>P</math>: | |||
*Due to the spectral theorem, there exist orthonormal eigenvectors <math>v_1,v_2,\ldots,v_N</math> such that <math>v_iP=\lambda_iv_i</math> for <math>i=1,2,\ldots,N</math> and any distribution <math>q</math> over <math>\Omega</math> can be expressed as <math>q=\sum_{i=1}^Nc_iv_i</math> where <math>c_i=q^Tv_i</math>. | |||
*A symmetric <math>P</math> must be double stochastic, thus the stationary distribution <math>\pi</math> is the uniform distribution. | |||
Recall that due to Perron-Frobenius theorem, <math>\lambda_1=1</math>. And <math>\boldsymbol{1}P=\boldsymbol{1}</math> since <math>P</math> is double stochastic, thus <math>v_1=\frac{\boldsymbol{1}}{\|\boldsymbol{1}\|_2}=\left(\frac{1}{\sqrt{N}},\ldots,\frac{1}{\sqrt{N}}\right)</math>. | |||
When <math>q</math> is a distribution, i.e., <math>q</math> is a nonnegative vector and <math>\|q\|_1=1</math>, it holds that <math>c_1=q^Tv_1=\frac{1}{\sqrt{N}}</math> and | |||
<math>c_1v_1=\left(\frac{1}{N},\ldots,\frac{1}{N}\right)=\pi</math>, thus | |||
:<math> | |||
q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i, | |||
</math> | |||
and the distribution at time <math>t</math> when the initial distribution is <math>q</math>, is given by | |||
:<math> | |||
qP^t=\pi P^t+\sum_{i=2}^Nc_iv_iP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i. | |||
</math> | |||
It is easy to see that this distribution converges to <math>\pi</math> when the absolute values of <math>\lambda_2,\ldots,\lambda_N</math> are all less than 1. And the rate at which it converges to <math>\pi</math>, namely the mixing rate, is determined by the quantity <math>\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\,</math>, which is the largest absolute eigenvalues other than <math>\lambda_1</math>. | |||
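The convergence can be watched on a tiny example. The sketch below (an assumed example, not from the notes) takes the lazy random walk on the 4-cycle, whose transition matrix is symmetric, and checks that the total variation distance to the uniform distribution decays as the chain runs.

```python
# A small symmetric chain: the lazy random walk on the 4-cycle,
# P = (1/2)I + (1/4)A where A is the adjacency matrix of C_4.
N = 4
P = [[0.5 if u == v else (0.25 if (u - v) % N in (1, N - 1) else 0.0)
      for v in range(N)] for u in range(N)]
pi = [1.0 / N] * N  # stationary distribution: uniform, since P is symmetric

def step(q, P):
    # One step of the chain: q -> qP (q as a row vector)
    return [sum(q[u] * P[u][v] for u in range(len(q))) for v in range(len(q))]

def tv_dist(p, q):
    # Total variation distance = (1/2) * L1 distance
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

q = [1.0, 0.0, 0.0, 0.0]  # start concentrated on one state
dists = []
for _ in range(20):
    dists.append(tv_dist(q, pi))
    q = step(q, P)

# The distance to uniform is nonincreasing and becomes tiny
assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(dists, dists[1:]))
assert tv_dist(q, pi) < 1e-4
```

For this chain the eigenvalues of <math>P</math> are <math>1,\tfrac{1}{2},\tfrac{1}{2},0</math>, so the decay rate is governed by <math>\lambda_{\max}=\tfrac{1}{2}</math>, in line with the analysis above.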
{{Theorem|Theorem|
:Let <math>P</math> be the transition matrix for a symmetric Markov chain on state space <math>\Omega</math> where <math>|\Omega|=N</math>. Let <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N</math> be the spectrum of <math>P</math> and <math>\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\,</math>. The mixing rate of the Markov chain is
::<math>\tau(\epsilon)\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\text{max}}}</math>.
}}
{{Proof|
As analysed above, if <math>P</math> is symmetric, it has orthonormal eigenvectors <math>v_1,\ldots,v_N</math> such that any distribution <math>q</math> over <math>\Omega</math> can be expressed as
:<math>q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i</math>
with <math>c_i=q^Tv_i</math>, and
:<math>
qP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i.
</math>
Thus,
:<math>
\begin{align}
\|qP^t-\pi\|_1
&=
\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_1\\
&\le
\sqrt{N}\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_2 &\quad\text{(Cauchy-Schwarz)}\\
&=
\sqrt{N}\sqrt{\sum_{i=2}^Nc_i^2\lambda_i^{2t}}\\
&\le
\sqrt{N}\lambda_{\max}^t\sqrt{\sum_{i=2}^Nc_i^2}\\
&\le
\sqrt{N}\lambda_{\max}^t\|q\|_2\\
&\le
\sqrt{N}\lambda_{\max}^t.
\end{align}
</math>
The last two inequalities are due to the fact that <math>\sum_{i=2}^Nc_i^2\le\sum_{i=1}^Nc_i^2=\|q\|_2^2</math>, and the universal relation <math>\|q\|_2\le\|q\|_1=1</math> for a distribution <math>q</math>.

Then for any <math>x\in\Omega</math>, denote by <math>\boldsymbol{1}_x</math> the indicator vector for <math>x</math> such that <math>\boldsymbol{1}_x(x)=1</math> and <math>\boldsymbol{1}_x(y)=0</math> for <math>y\neq x</math>. We have
:<math>
\begin{align}
\Delta_x(t)
&=\left\|\boldsymbol{1}_x P^t-\pi\right\|_{TV}=\frac{1}{2}\left\|\boldsymbol{1}_x P^t-\pi\right\|_1\\
&\le\frac{\sqrt{N}}{2}\lambda_{\max}^t\le\frac{\sqrt{N}}{2}\mathrm{e}^{-t(1-\lambda_{\max})}.
\end{align}
</math>
Therefore, we have
:<math>
\tau_x(\epsilon)
=\min\{t\mid\Delta_x(t)\le\epsilon\}
\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\max}}
</math>
for any <math>x\in\Omega</math>, thus the bound holds for <math>\tau(\epsilon)=\max_{x}\tau_x(\epsilon)</math>.
}}
== Rapid mixing of expander walk ==
Let <math>G(V,E)</math> be a <math>d</math>-regular graph on <math>n</math> vertices. Let <math>A</math> be its adjacency matrix. The transition matrix of the '''lazy random walk''' on <math>G</math> is given by <math>P=\frac{1}{2}\left(I+\frac{1}{d}A\right)</math>. Specifically,
:<math>P(u,v)=\begin{cases}
\frac{1}{2} & \text{if }u=v,\\
\frac{1}{2d} & \text{if }uv\in E,\\
0 & \text{otherwise.}
\end{cases}</math>
Obviously <math>P</math> is symmetric.

Let <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math> be the eigenvalues of <math>A</math>, and <math>\nu_1\ge\nu_2\ge\cdots\ge\nu_n</math> be the eigenvalues of <math>P</math>. It is easy to verify that
:<math>\nu_i=\frac{1}{2}\left(1+\frac{\lambda_i}{d}\right)</math>.
We know that <math>-d\le\lambda_i\le d</math>, thus <math>0\le\nu_i\le1</math>. Therefore, <math>\nu_{\max}=\max\{|\nu_2|,|\nu_n|\}=\nu_2\,</math>. Due to the above analysis of the symmetric Markov chain, and since <math>1-\nu_2=\frac{d-\lambda_2}{2d}</math>,
:<math>\tau(\epsilon)\le\frac{\frac{1}{2}\ln n+\ln\frac{1}{2\epsilon}}{1-\nu_2}=\frac{d\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{d-\lambda_2}</math>.
Thus we prove the following theorem for the lazy random walk on <math>d</math>-regular graphs.
{{Theorem|Theorem|
:Let <math>G(V,E)</math> be a <math>d</math>-regular graph on <math>n</math> vertices, with spectrum <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. The mixing rate of the lazy random walk on <math>G</math> is
::<math>\tau(\epsilon)\le\frac{d\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{d-\lambda_2}</math>.
}}
Due to Cheeger's inequality, the spectral gap is bounded by the expansion ratio as
:<math>d-\lambda_2\ge\frac{\phi^2}{2d},</math>
where <math>\phi=\phi(G)</math> is the expansion ratio of the graph <math>G</math>. Therefore, we have the following corollary which bounds the mixing time by graph expansion.
{{Theorem|Corollary|
:Let <math>G(V,E)</math> be a <math>d</math>-regular graph on <math>n</math> vertices, whose expansion ratio is <math>\phi=\phi(G)</math>. The mixing rate of the lazy random walk on <math>G</math> is
::<math>\tau(\epsilon)\le\frac{2d^2\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{\phi^2}</math>.
:In particular, the mixing time is
::<math>\tau_{\text{mix}}=\tau(1/2\mathrm{e})\le\frac{2d^2(\ln n+2)}{\phi^2}</math>.
}}
For expander graphs, both <math>d</math> and <math>\phi</math> are constants. The mixing time of the lazy random walk is <math>\tau_{\text{mix}}=O(\ln n)</math>, so the random walk is rapidly mixing.
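The mixing-rate theorem can be sanity-checked on a small example. The sketch below (the graph is an assumed example, not from the notes) runs the lazy walk on the complete graph <math>K_5</math>, which is 4-regular with <math>\lambda_2=-1</math>, measures the empirical mixing rate, and compares it against the bound <math>d(\ln n+2\ln\frac{1}{2\epsilon})/(d-\lambda_2)</math>.

```python
import math

# Sanity check of the mixing-rate bound: the complete graph K_5 is
# 4-regular with lambda_2 = -1.
n, d, lam2 = 5, 4, -1
P = [[0.5 if u == v else 0.5 / d for v in range(n)] for u in range(n)]

def step(q):
    return [sum(q[u] * P[u][v] for u in range(n)) for v in range(n)]

def tv_to_uniform(q):
    return 0.5 * sum(abs(x - 1.0 / n) for x in q)

eps = 0.01
# Empirical mixing rate: worst case over starting vertices (all the same
# here by symmetry, but we check them all anyway)
tau_empirical = 0
for start in range(n):
    q = [1.0 if v == start else 0.0 for v in range(n)]
    t = 0
    while tv_to_uniform(q) > eps:
        q = step(q)
        t += 1
    tau_empirical = max(tau_empirical, t)

# The bound from the theorem: tau(eps) <= d(ln n + 2 ln(1/(2 eps)))/(d - lambda_2)
tau_bound = d * (math.log(n) + 2 * math.log(1 / (2 * eps))) / (d - lam2)
assert tau_empirical <= tau_bound
```

For <math>K_5</math> the nontrivial eigenvalues of the lazy walk all equal <math>\tfrac{3}{8}</math>, so the empirical mixing rate is small and comfortably below the bound.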
Latest revision as of 15:28, 8 June 2013
Expander Graphs
According to Wikipedia:
- "Expander graphs have found extensive applications in computer science, in designing algorithms, error correcting codes, extractors, pseudorandom generators, sorting networks and robust computer networks. They have also been used in proofs of many important results in computational complexity theory, such as SL=L and the PCP theorem. In cryptography too, expander graphs are used to construct hash functions."
We will not explore everything about expander graphs, but will focus on the performances of random walks on expander graphs.
Consider an undirected (multi-)graph <math>G(V,E)</math>, where parallel edges between two vertices are allowed.

Some notations:
- For <math>S,T\subset V</math>, let <math>E(S,T)=\{uv\in E\mid u\in S,v\in T\}</math>.
- The edge boundary of a set <math>S\subset V</math>, denoted <math>\partial S</math>, is <math>\partial S=E(S,\bar{S})</math>.

Definition (Graph expansion)
- The expansion ratio of an undirected graph <math>G</math> on <math>n</math> vertices, is defined as
  :<math>\phi(G)=\min_{S\subset V,|S|\le n/2}\frac{|\partial S|}{|S|}.</math>

Expander graphs are <math>d</math>-regular (multi)graphs with <math>d=O(1)</math> and <math>\phi(G)=\Omega(1)</math>.

This definition states the following properties of expander graphs:
- Expander graphs are sparse graphs. This is because the number of edges is <math>dn/2=O(n)</math>.
- Despite the sparsity, expander graphs have good connectivity. This is supported by the expansion ratio.
- This one is implicit: an expander graph is a family of graphs <math>\{G_n\}</math>, where <math>n</math> is the number of vertices. The asymptotic orders <math>O(1)</math> and <math>\Omega(1)</math> in the definition are relative to the number of vertices <math>n</math>, which grows to infinity.
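The definition of <math>\phi(G)</math> can be computed directly by brute force on tiny graphs, which makes the contrast between well- and poorly-expanding graphs concrete. The graphs below are assumed examples, not from the notes.

```python
from itertools import combinations

# Brute-force computation of the expansion ratio phi(G) from its definition:
# minimize |boundary(S)| / |S| over all S with |S| <= n/2.
# Only feasible for tiny graphs (exponentially many subsets).
def expansion_ratio(n, edges):
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            S = set(S)
            boundary = sum(1 for (u, v) in edges if (u in S) != (v in S))
            best = min(best, boundary / len(S))
    return best

# The cycle C_6: 2-regular, poorly expanding (cutting it in half crosses 2 edges)
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
# The complete graph K_6: best possible connectivity
k6 = [(u, v) for u in range(6) for v in range(u + 1, 6)]

assert expansion_ratio(6, cycle6) == 2 / 3  # S = half of the cycle: 2 / 3
assert expansion_ratio(6, k6) == 3.0        # S of size 3: 3*3 / 3 = 3
```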
The following fact is directly implied by the definition.
- An expander graph has diameter <math>O(\log n)</math>.

The proof is left as an exercise.
For a vertex set <math>S</math>, the size of the edge boundary <math>|\partial S|</math> can be seen as the "perimeter" of <math>S</math>, and <math>|S|</math> can be seen as the "volume" of <math>S</math>, so the expansion property can be seen as a combinatorial isoperimetric inequality.

Vertex expansion
- We can alternatively define the vertex expansion. For a vertex set <math>S\subset V</math>, its vertex boundary, denoted <math>\delta S</math>, is defined as
  :<math>\delta S=\{u\not\in S\mid uv\in E\text{ for some }v\in S\},</math>
- and the vertex expansion of a graph <math>G</math> is <math>\psi(G)=\min_{S\subset V,|S|\le n/2}\frac{|\delta S|}{|S|}</math>.
Existence of expander graph

We will show the existence of expander graphs by the probabilistic method. In order to do so, we need to generate random <math>d</math>-regular graphs.

Suppose that <math>d</math> is even. We can generate a random <math>d</math>-regular graph <math>G(V,E)</math> as follows:
- Let <math>V</math> be the vertex set. Uniformly and independently choose <math>\frac{d}{2}</math> cycles of <math>V</math>.
- For each vertex <math>v</math>, for every cycle, assuming that the two neighbors of <math>v</math> in that cycle are <math>w</math> and <math>u</math>, add two edges <math>wv</math> and <math>uv</math> to <math>E</math>.

The resulting <math>G(V,E)</math> is a <math>d</math>-regular multigraph; that is, it may have multiple edges between two vertices. One can show that it is an expander graph with high probability: for some constant <math>d</math> and constant <math>\alpha>0</math>,
:<math>\Pr[\phi(G)\ge\alpha]=1-o(1).</math>

By the probabilistic method, this shows that there exist expander graphs. In fact, the above probability bound shows something much stronger: it shows that almost every regular graph is an expander.

Recall that <math>\phi(G)=\min_{S:|S|\le\frac{n}{2}}\frac{|\partial S|}{|S|}</math>. Call a set <math>S\subset V</math> with <math>|S|\le\frac{n}{2}</math> "bad" if <math>|\partial S|<\alpha|S|</math>, so that <math>\phi(G)<\alpha</math> if and only if some bad set exists. Applying the union bound over all sizes <math>k\le\frac{n}{2}</math> and all <math>\binom{n}{k}</math> sets of each size, and estimating the probability that a fixed set of size <math>k</math> is bad for the random graph generated above, one can verify that <math>\Pr[\phi(G)<\alpha]=o(1)</math> for a suitable choice of the constants <math>d</math> and <math>\alpha</math>; the detailed calculation is omitted.
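The cycle-based construction above is easy to implement. The following sketch generates such a random <math>d</math>-regular multigraph (the parameters and the seeded RNG are assumptions for reproducibility) and verifies that every vertex indeed has degree <math>d</math>.

```python
import random

# The construction above: a random d-regular multigraph (d even) as the
# union of d/2 independent uniformly random cycles through all of V.
def random_regular_multigraph(n, d, rng):
    assert d % 2 == 0
    edges = []
    for _ in range(d // 2):
        perm = list(range(n))
        rng.shuffle(perm)  # a uniformly random cycle through all vertices
        for i in range(n):
            edges.append((perm[i], perm[(i + 1) % n]))
    return edges

rng = random.Random(42)  # fixed seed, chosen for reproducibility
n, d = 100, 6
edges = random_regular_multigraph(n, d, rng)

# Each cycle contributes degree 2 to every vertex, so the graph is d-regular
deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
assert all(x == d for x in deg)
assert len(edges) == n * d // 2
```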
Computing graph expansion

Computation of graph expansion seems hard, because the definition involves the minimum over exponentially many subsets of vertices. In fact, the problem of deciding whether a graph is an expander is co-NP-complete. For a non-expander <math>G</math>, a vertex set <math>S\subset V</math> with low expansion ratio <math>\frac{|\partial S|}{|S|}</math> is a certificate of the fact that <math>G</math> is not an expander, which can be verified in polynomial time.

The expansion ratio of a graph is closely related to the sparsest cut of the graph, which is the dual problem of the multicommodity flow problem, both NP-complete. Studies of these two problems revolutionized the area of approximation algorithms.

We will see right now that although it is hard to compute the expansion ratio exactly, the expansion ratio can be approximated by some efficiently computable algebraic quantity of the graph.
Spectral Graph Theory

Graph spectrum

The adjacency matrix of an <math>n</math>-vertex graph <math>G</math>, denoted <math>A=A(G)</math>, is an <math>n\times n</math> symmetric matrix where <math>A(u,v)</math> is the number of edges between vertex <math>u</math> and vertex <math>v</math>. Because <math>A</math> is symmetric, it has <math>n</math> real eigenvalues <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>, which are called the spectrum of the graph <math>G</math>.

The spectrum of a graph contains a lot of information about the graph. For example, suppose that <math>G</math> is <math>d</math>-regular.

Lemma
1. <math>|\lambda_i|\le d</math> for all <math>1\le i\le n</math>.
2. <math>\lambda_1=d</math> and the corresponding eigenvector is <math>\left(\frac{1}{\sqrt{n}},\frac{1}{\sqrt{n}},\ldots,\frac{1}{\sqrt{n}}\right)</math>.
3. <math>G</math> is connected if and only if <math>\lambda_1>\lambda_2</math>.
4. If <math>G</math> is bipartite then <math>\lambda_1=-\lambda_n</math>.

Proof. Let <math>A</math> be the adjacency matrix of <math>G</math>, with entries <math>a_{ij}</math>. It is obvious that <math>\sum_{j}a_{ij}=d</math> for any <math>i</math>.
- (1) Suppose that <math>Ax=\lambda x</math> with <math>x\neq\mathbf{0}</math>, and let <math>x_i</math> be an entry of <math>x</math> with the largest absolute value. Since <math>(Ax)_i=\lambda x_i</math>, we have
  :<math>\sum_{j}a_{ij}x_j=\lambda x_i,</math>
  and so
  :<math>|\lambda||x_i|=\left|\sum_{j}a_{ij}x_j\right|\le\sum_{j}a_{ij}|x_j|\le\sum_{j}a_{ij}|x_i|=d|x_i|.</math>
  Thus <math>|\lambda|\le d</math>.
- (2) is easy to check.
- (3) Let <math>x</math> be the nonzero vector for which <math>Ax=dx</math>, and let <math>x_i</math> be an entry of <math>x</math> with the largest absolute value. Since <math>(Ax)_i=dx_i</math>, we have
  :<math>\sum_{j}a_{ij}x_j=dx_i.</math>
  Since <math>\sum_{j}a_{ij}=d</math> and by the maximality of <math>x_i</math>, it follows that <math>x_j=x_i</math> for all <math>j</math> with <math>a_{ij}>0</math>. Thus, <math>x_i=x_j</math> if <math>i</math> and <math>j</math> are adjacent, which implies that <math>x_i=x_j</math> if <math>i</math> and <math>j</math> are connected. For connected <math>G</math>, all vertices are connected, thus all <math>x_i</math> are equal. This shows that if <math>G</math> is connected, the eigenvalue <math>d=\lambda_1</math> has multiplicity 1, thus <math>\lambda_1>\lambda_2</math>.
  If otherwise, <math>G</math> is disconnected, then for two different components, we have <math>Ax=dx</math> and <math>Ay=dy</math>, where the entries of <math>x</math> and <math>y</math> are nonzero only for the vertices in their respective components. Then any linear combination of <math>x</math> and <math>y</math> is also an eigenvector for <math>d</math>. Thus, the multiplicity of <math>d</math> is greater than 1, so <math>\lambda_1=\lambda_2</math>.
- (4) If <math>G</math> is bipartite, then the vertex set can be partitioned into two disjoint nonempty sets <math>V_1</math> and <math>V_2</math> such that all edges have one endpoint in each of <math>V_1</math> and <math>V_2</math>. Algebraically, this means that the adjacency matrix can be organized into the form
  :<math>PAP^{-1}=\begin{pmatrix}0 & B\\ B^T & 0\end{pmatrix},</math>
  where <math>P</math> is a permutation matrix, which has no change on the eigenvalues.
  If <math>x</math> is an eigenvector corresponding to the eigenvalue <math>\lambda</math>, then <math>x'</math>, which is obtained from <math>x</math> by changing the sign of the entries corresponding to vertices in <math>V_2</math>, is an eigenvector corresponding to the eigenvalue <math>-\lambda</math>. It follows that the spectrum of a bipartite graph is symmetric with respect to 0.
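The lemma can be checked on a concrete graph. The 4-cycle <math>C_4</math> (an assumed example) is 2-regular, connected, and bipartite; its spectrum is <math>2,0,0,-2</math>, and the eigenvectors are simple enough to write down by hand.

```python
# A concrete check of the lemma: the 4-cycle C_4 is 2-regular, connected
# and bipartite; its spectrum is 2, 0, 0, -2.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

# Explicit eigenpairs of C_4
pairs = [
    (2, [1, 1, 1, 1]),     # lambda_1 = d = 2, the all-ones direction
    (0, [1, 0, -1, 0]),
    (0, [0, 1, 0, -1]),
    (-2, [1, -1, 1, -1]),  # lambda_n = -d, as G is bipartite
]
for lam, v in pairs:
    # Verify A v = lambda v for each claimed eigenpair
    assert mat_vec(A, v) == [lam * vi for vi in v]
```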
Cheeger's Inequality

One of the most exciting results in spectral graph theory is the following theorem which relates the graph expansion to the spectral gap.

Theorem (Cheeger's inequality)
- Let <math>G</math> be a <math>d</math>-regular graph with spectrum <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. Then
  :<math>\frac{d-\lambda_2}{2}\le\phi(G)\le\sqrt{2d(d-\lambda_2)}.</math>

The theorem was first stated for Riemannian manifolds, and was proved by Cheeger and Buser (for different directions of the inequality). The discrete case was proved independently by Dodziuk and by Alon-Milman.

For a <math>d</math>-regular graph, the value <math>d-\lambda_2</math> is known as the spectral gap. The larger the gap is, the better the graph expands.

If we write <math>\alpha=1-\frac{\lambda_2}{d}</math> for the normalized spectral gap, Cheeger's inequality can be equivalently stated as
:<math>\frac{\alpha}{2}\le\frac{\phi(G)}{d}\le\sqrt{2\alpha},</math>
or equivalently <math>\frac{1}{2}\left(\frac{\phi(G)}{d}\right)^2\le\alpha\le\frac{2\phi(G)}{d}</math>.
Optimization Characterization of Eigenvalues

Theorem (Rayleigh-Ritz theorem)
- Let <math>A</math> be a symmetric <math>n\times n</math> matrix. Let <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math> be the eigenvalues of <math>A</math> and <math>v_1,v_2,\ldots,v_n</math> be the corresponding eigenvectors. Then
  :<math>\lambda_1=\max_{x\neq\mathbf{0}}\frac{x^TAx}{x^Tx}</math> and <math>\lambda_2=\max_{x\neq\mathbf{0},x\bot v_1}\frac{x^TAx}{x^Tx}.</math>

Proof. Without loss of generality, we may assume that <math>v_1,v_2,\ldots,v_n</math> are an orthonormal eigen-basis. Then it holds that
:<math>v_1^TAv_1=\lambda_1v_1^Tv_1=\lambda_1,</math>
thus we have <math>\max_{x\neq\mathbf{0}}\frac{x^TAx}{x^Tx}\ge\lambda_1</math>.

Let <math>x\in\mathbb{R}^n</math> be an arbitrary nonzero vector and let <math>y=\frac{x}{\|x\|_2}</math> be its normalization. Since <math>v_1,v_2,\ldots,v_n</math> are an orthonormal basis, <math>y</math> can be expressed as <math>y=\sum_{i=1}^nc_iv_i</math> with <math>\sum_{i=1}^nc_i^2=1</math>. Then
:<math>\frac{x^TAx}{x^Tx}=y^TAy=\sum_{i=1}^nc_i^2\lambda_i\le\lambda_1\sum_{i=1}^nc_i^2=\lambda_1.</math>
Therefore, <math>\max_{x\neq\mathbf{0}}\frac{x^TAx}{x^Tx}\le\lambda_1</math>. Altogether we have
:<math>\lambda_1=\max_{x\neq\mathbf{0}}\frac{x^TAx}{x^Tx}.</math>
It is similar to prove <math>\lambda_2=\max_{x\neq\mathbf{0},x\bot v_1}\frac{x^TAx}{x^Tx}</math>. In the first part take <math>x=v_2</math> to show that <math>\max_{x\neq\mathbf{0},x\bot v_1}\frac{x^TAx}{x^Tx}\ge\lambda_2</math>; and in the second part take an arbitrary <math>x\bot v_1</math> and <math>y=\frac{x}{\|x\|_2}</math>. Notice that <math>y\bot v_1</math>, thus <math>y=\sum_{i=2}^nc_iv_i</math> with <math>\sum_{i=2}^nc_i^2=1</math>.

The Rayleigh-Ritz Theorem is a special case of a fundamental theorem in linear algebra, called the Courant-Fischer theorem, which characterizes the eigenvalues of a symmetric matrix by a series of optimizations:

Theorem (Courant-Fischer theorem)
- Let <math>A</math> be a symmetric matrix with eigenvalues <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. Then
  :<math>\lambda_k=\max_{\substack{U\subset\mathbb{R}^n\\ \dim U=k}}\min_{\substack{x\in U\\ x\neq\mathbf{0}}}\frac{x^TAx}{x^Tx}
  =\min_{\substack{U\subset\mathbb{R}^n\\ \dim U=n-k+1}}\max_{\substack{x\in U\\ x\neq\mathbf{0}}}\frac{x^TAx}{x^Tx}.</math>
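The variational view also suggests a simple algorithm: power iteration drives a vector toward the top eigenvector, and its Rayleigh quotient toward <math>\lambda_1</math>. The sketch below (the matrix is an assumed example) illustrates both halves of the characterization: the iterate's Rayleigh quotient approaches the maximum, and no test vector exceeds it.

```python
# Power iteration illustrating the Rayleigh-Ritz characterization: every
# Rayleigh quotient is at most lambda_1, and iterating A drives a vector
# toward the top eigenvector.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def rayleigh(A, x):
    # The Rayleigh quotient x^T A x / x^T x
    num = sum(xi * yi for xi, yi in zip(x, mat_vec(A, x)))
    den = sum(xi * xi for xi in x)
    return num / den

x = [1.0, 1.0, 1.0]
for _ in range(200):
    y = mat_vec(A, x)
    norm = sum(v * v for v in y) ** 0.5
    x = [v / norm for v in y]
lam1 = rayleigh(A, x)  # converges to the largest eigenvalue

# Rayleigh quotients of arbitrary test vectors never exceed lambda_1
for test in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, -2, 3]):
    assert rayleigh(A, test) <= lam1 + 1e-9
```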
Graph Laplacian

Let <math>G(V,E)</math> be a <math>d</math>-regular graph of <math>n</math> vertices, and let <math>A</math> be its adjacency matrix. We define <math>L=dI-A</math> to be the Laplacian of the graph <math>G</math>.

Laplacian Property
- For any vector <math>x\in\mathbb{R}^n</math>, it holds that
  :<math>x^TLx=\sum_{uv\in E}(x_u-x_v)^2.</math>

Proof. On one hand,
:<math>x^TLx=d\sum_{v\in V}x_v^2-\sum_{u,v\in V}A(u,v)x_ux_v=\sum_{uv\in E}(x_u^2+x_v^2)-2\sum_{uv\in E}x_ux_v.</math>
On the other hand,
:<math>\sum_{uv\in E}(x_u-x_v)^2=\sum_{uv\in E}(x_u^2+x_v^2-2x_ux_v)=\sum_{uv\in E}(x_u^2+x_v^2)-2\sum_{uv\in E}x_ux_v.</math>

Applying the Rayleigh-Ritz theorem to the Laplacian matrix of the graph, we have the following "variational characterization" of the spectral gap <math>d-\lambda_2</math>.

Theorem (Variational Characterization)
- Let <math>G(V,E)</math> be a <math>d</math>-regular graph of <math>n</math> vertices. Suppose that its adjacency matrix is <math>A</math>, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. Let <math>L=dI-A</math> be the Laplacian matrix. Then
  :<math>d-\lambda_2=\min_{\substack{x\bot\boldsymbol{1}\\ x\neq\mathbf{0}}}\frac{x^TLx}{x^Tx}=\min_{\substack{x\bot\boldsymbol{1}\\ x\neq\mathbf{0}}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.</math>

Proof. For a <math>d</math>-regular graph, we know that <math>\lambda_1=d</math> and <math>A\boldsymbol{1}=d\boldsymbol{1}</math>, thus <math>\boldsymbol{1}</math> is the eigenvector of <math>\lambda_1</math>. Due to the Rayleigh-Ritz Theorem, it holds that <math>\lambda_2=\max_{x\bot\boldsymbol{1},x\neq\mathbf{0}}\frac{x^TAx}{x^Tx}</math>. Then
:<math>d-\lambda_2=\min_{\substack{x\bot\boldsymbol{1}\\ x\neq\mathbf{0}}}\frac{dx^Tx-x^TAx}{x^Tx}=\min_{\substack{x\bot\boldsymbol{1}\\ x\neq\mathbf{0}}}\frac{x^TLx}{x^Tx}.</math>
We know it holds for the graph Laplacian that <math>x^TLx=\sum_{uv\in E}(x_u-x_v)^2</math>. So the variational characterization of the second eigenvalue of the graph is proved.
Proof of Cheeger's Inequality

We will first give an informal explanation why Cheeger's inequality holds.

Recall that the expansion is defined as
:<math>\phi(G)=\min_{\substack{S\subset V\\ |S|\le n/2}}\frac{|\partial S|}{|S|}.</math>

Let <math>\chi_S</math> be the characteristic vector of the set <math>S</math> such that
:<math>\chi_S(v)=\begin{cases}1 & v\in S,\\ 0 & v\not\in S.\end{cases}</math>

It is easy to see that
:<math>\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\frac{|\partial S|}{|S|}.</math>

Thus, the expansion can be expressed algebraically as
:<math>\phi(G)=\min_{\substack{S\subset V\\ |S|\le n/2}}\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}.</math>

On the other hand, due to the variational characterization of the spectral gap, we have
:<math>d-\lambda_2=\min_{\substack{x\bot\boldsymbol{1}\\ x\neq\mathbf{0}}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.</math>

We can easily observe the similarity between the two formulas. Both the expansion ratio <math>\phi(G)</math> and the spectral gap <math>d-\lambda_2</math> minimize the same objective, except that the expansion minimizes over Boolean vectors <math>\chi_S</math> (with at most half the entries equal to 1), while the spectral gap minimizes over all vectors orthogonal to <math>\boldsymbol{1}</math>. Cheeger's inequality makes this similarity quantitative.

Notations

Throughout the proof, we assume that <math>G(V,E)</math> is the <math>d</math>-regular graph of <math>n</math> vertices, <math>A</math> is the adjacency matrix, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>, and <math>L=dI-A</math> is the graph Laplacian.

Large spectral gap implies high expansion

Cheeger's inequality (lower bound)
- <math>\phi(G)\ge\frac{d-\lambda_2}{2}.</math>

Proof. Let <math>S^*\subset V</math>, <math>|S^*|\le\frac{n}{2}</math>, be the vertex set achieving the optimal expansion ratio <math>\phi(G)=\frac{|\partial S^*|}{|S^*|}</math>, and let <math>x\in\mathbb{R}^n</math> be a vector defined as
:<math>x_v=\begin{cases}\frac{1}{|S^*|} & v\in S^*,\\ -\frac{1}{|V\setminus S^*|} & v\not\in S^*.\end{cases}</math>
Clearly, <math>\sum_{v\in V}x_v=0</math>, thus <math>x\bot\boldsymbol{1}</math>.

Due to the variational characterization of the second eigenvalue,
:<math>d-\lambda_2\le\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}
=\frac{|\partial S^*|\left(\frac{1}{|S^*|}+\frac{1}{|V\setminus S^*|}\right)^2}{\frac{1}{|S^*|}+\frac{1}{|V\setminus S^*|}}
=|\partial S^*|\left(\frac{1}{|S^*|}+\frac{1}{|V\setminus S^*|}\right)
\le\frac{2|\partial S^*|}{|S^*|}=2\phi(G),</math>
where the last inequality uses that <math>|V\setminus S^*|\ge|S^*|</math>.

High expansion implies large spectral gap

We next prove the upper bound direction of the Cheeger's inequality:

Cheeger's inequality (upper bound)
- <math>\phi(G)\le\sqrt{2d(d-\lambda_2)}.</math>

This direction is harder than the lower bound direction. But it is mathematically more interesting and also more useful to us for analyzing the mixing time of random walks.

We prove the following equivalent inequality:
:<math>\frac{\phi(G)^2}{2d}\le d-\lambda_2.</math>

Let <math>x\in\mathbb{R}^n</math> be a nonzero vector such that:
- <math>Ax=\lambda_2x</math>, i.e., it is an eigenvector for <math>\lambda_2</math>;
- <math>|\{v\mid x_v>0\}|\le\frac{n}{2}</math>, i.e., <math>x</math> has at most <math>n/2</math> positive entries. (We can always choose <math>x</math> to be <math>-x</math> if this is not satisfied.)

And let the nonnegative vector <math>y</math> be defined as
:<math>y_v=\begin{cases}x_v & x_v>0,\\ 0 & \text{otherwise.}\end{cases}</math>

We then prove the following inequalities:
1. <math>\frac{y^TLy}{y^Ty}\le d-\lambda_2</math>;
2. <math>\frac{y^TLy}{y^Ty}\ge\frac{\phi(G)^2}{2d}</math>.

The theorem is then a simple consequence by combining these two inequalities.

We prove the first inequality:

Lemma
- <math>\frac{y^TLy}{y^Ty}\le d-\lambda_2.</math>

Proof. If <math>x_u>0</math>, then, since <math>y_v\ge x_v</math> for all <math>v</math>,
:<math>(Ly)_u=dy_u-\sum_{v}A(u,v)y_v\le dx_u-\sum_{v}A(u,v)x_v=(Lx)_u=(d-\lambda_2)x_u=(d-\lambda_2)y_u.</math>
Then
:<math>y^TLy=\sum_{u}y_u(Ly)_u=\sum_{u:x_u>0}y_u(Ly)_u\le(d-\lambda_2)\sum_{u}y_u^2=(d-\lambda_2)y^Ty,</math>
which proves the lemma.

We then prove the second inequality:

Lemma
- <math>\frac{y^TLy}{y^Ty}\ge\frac{\phi(G)^2}{2d}.</math>

Proof. To prove this, we introduce a new quantity <math>\sum_{uv\in E}|y_u^2-y_v^2|</math> and show that
:<math>\phi(G)\cdot y^Ty\le\sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d}\cdot\sqrt{y^TLy}\cdot\sqrt{y^Ty}.</math>
This will give us the desired inequality
:<math>\frac{y^TLy}{y^Ty}\ge\frac{\phi(G)^2}{2d}.</math>

Lemma
- <math>\sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d}\cdot\sqrt{y^TLy}\cdot\sqrt{y^Ty}.</math>

Proof. By the Cauchy-Schwarz Inequality,
:<math>\sum_{uv\in E}|y_u^2-y_v^2|=\sum_{uv\in E}|y_u-y_v||y_u+y_v|\le\sqrt{\sum_{uv\in E}(y_u-y_v)^2}\cdot\sqrt{\sum_{uv\in E}(y_u+y_v)^2}.</math>
By the Laplacian property, the first term <math>\sqrt{\sum_{uv\in E}(y_u-y_v)^2}=\sqrt{y^TLy}</math>. By the Inequality of Arithmetic and Geometric Means, the second term
:<math>\sqrt{\sum_{uv\in E}(y_u+y_v)^2}\le\sqrt{2\sum_{uv\in E}(y_u^2+y_v^2)}=\sqrt{2d\sum_{v\in V}y_v^2}=\sqrt{2d}\cdot\sqrt{y^Ty}.</math>
Combining them together, we have
:<math>\sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d}\cdot\sqrt{y^TLy}\cdot\sqrt{y^Ty}.</math>

Lemma
- <math>\sum_{uv\in E}|y_u^2-y_v^2|\ge\phi(G)\cdot y^Ty.</math>

Proof. Suppose that <math>y</math> has <math>t</math> nonzero entries. We know that <math>t\le\frac{n}{2}</math> due to the definition of <math>y</math>. We enumerate the vertices <math>u_1,u_2,\ldots,u_n</math> such that
:<math>y_{u_1}\ge y_{u_2}\ge\cdots\ge y_{u_t}>y_{u_{t+1}}=\cdots=y_{u_n}=0,</math>
and denote <math>S_k=\{u_1,u_2,\ldots,u_k\}</math> for <math>k=1,2,\ldots,n</math>. Then
:<math>\sum_{uv\in E}|y_u^2-y_v^2|=\sum_{\substack{u_iu_j\in E\\ i<j}}\left(y_{u_i}^2-y_{u_j}^2\right)
=\sum_{\substack{u_iu_j\in E\\ i<j}}\sum_{k=i}^{j-1}\left(y_{u_k}^2-y_{u_{k+1}}^2\right)
=\sum_{k=1}^{n-1}|\partial S_k|\left(y_{u_k}^2-y_{u_{k+1}}^2\right),</math>
where the last equation uses the fact that an edge <math>u_iu_j</math> with <math>i<j</math> crosses the boundary of exactly those <math>S_k</math> with <math>i\le k<j</math>.
Notice that <math>|S_k|=k</math>, which is at most <math>\frac{n}{2}</math> for <math>k\le t</math>, thus <math>|\partial S_k|\ge\phi(G)\cdot k</math> for <math>k\le t</math>, while <math>y_{u_k}^2-y_{u_{k+1}}^2=0</math> for <math>k>t</math>. Therefore, combining these together, we have
:<math>\sum_{uv\in E}|y_u^2-y_v^2|\ge\phi(G)\sum_{k=1}^{t}k\left(y_{u_k}^2-y_{u_{k+1}}^2\right)
=\phi(G)\sum_{k=1}^{t}y_{u_k}^2=\phi(G)\cdot y^Ty,</math>
where the last equation is the Abel transformation (summation by parts) of the telescoping sum.
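The computation in the lower-bound proof can be replayed numerically. The sketch below (the graph and the set <math>S</math> are assumed examples) builds the test vector with entries <math>1/|S|</math> on <math>S</math> and <math>-1/|\bar{S}|</math> off <math>S</math>, and checks that its Laplacian Rayleigh quotient equals <math>|\partial S|\left(\frac{1}{|S|}+\frac{1}{|\bar{S}|}\right)</math>, which is at most <math>2\phi(G)</math>.

```python
# A check of the computation in the lower-bound proof: for any set S, the
# test vector with entries 1/|S| on S and -1/|comp(S)| off S has Laplacian
# Rayleigh quotient exactly |boundary(S)| * (1/|S| + 1/|comp(S)|).
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]  # the cycle C_6, d = 2

S = {0, 1, 2}
comp = set(range(n)) - S
x = [1 / len(S) if v in S else -1 / len(comp) for v in range(n)]

boundary = sum(1 for (u, v) in edges if (u in S) != (v in S))
num = sum((x[u] - x[v]) ** 2 for u, v in edges)   # x^T L x, by the Laplacian property
den = sum(xi * xi for xi in x)                    # x^T x

lhs = num / den
rhs = boundary * (1 / len(S) + 1 / len(comp))
assert abs(lhs - rhs) < 1e-12
# Consequently d - lambda_2 <= 2 |boundary(S)| / |S| for any |S| <= n/2
assert lhs <= 2 * boundary / len(S) + 1e-12
```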