Randomized Algorithms (Fall 2011) / The Spectral Gap

= Cheeger's Inequality =
One of the most exciting results in spectral graph theory is the following theorem, which relates graph expansion to the spectral gap.
{{Theorem
|Theorem (Cheeger's inequality)|
:Let <math>G</math> be a <math>d</math>-regular graph with spectrum <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. Then
::<math>
\frac{d-\lambda_2}{2}\le \phi(G) \le \sqrt{2d(d-\lambda_2)}.
</math>
}}
The theorem was first stated for Riemannian manifolds, and was proved by Cheeger and Buser (for different directions of the inequality). The discrete case was proved independently by Dodziuk and by Alon-Milman.


For a <math>d</math>-regular graph, the quantity <math>(d-\lambda_2)</math> is called the '''spectral gap'''. The name is due to the fact that it is the gap between the first and the second largest eigenvalues of a graph.


If we write <math>\alpha=1-\frac{\lambda_2}{d}</math> (sometimes called the normalized spectral gap), Cheeger's inequality takes a nicer form:
:<math>\frac{\alpha}{2}\le \frac{\phi}{d}\le\sqrt{2\alpha}</math> or equivalently <math>\frac{1}{2}\left(\frac{\phi}{d}\right)^2\le \alpha\le 2\left(\frac{\phi}{d}\right)</math>.
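To see what these quantities look like on concrete graphs, here is a small numerical sketch (not part of the lecture; it assumes NumPy and uses the 16-cycle and the complete graph <math>K_{16}</math> as examples). The cycle is a poor expander and has a tiny spectral gap, while the complete graph has a large one.

<pre>
import numpy as np

def cycle(n):
    # n-cycle: 2-regular, a poor expander
    A = np.zeros((n, n))
    for v in range(n):
        A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1
    return A, 2

def complete(n):
    # complete graph K_n: (n-1)-regular, an excellent expander
    return np.ones((n, n)) - np.eye(n), n - 1

for name, (A, d) in [("16-cycle", cycle(16)), ("K_16", complete(16))]:
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues in decreasing order
    gap = d - lam[1]                             # spectral gap d - lambda_2
    print(f"{name}: lambda_2 = {lam[1]:.3f}, gap = {gap:.3f}, alpha = {gap / d:.3f}")
</pre>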


----
== Optimization Characterization of Eigenvalues ==
{{Theorem
|Theorem (Rayleigh-Ritz theorem)|
:Let <math>A</math> be a symmetric <math>n\times n</math> matrix. Let <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math> be the eigenvalues of <math>A</math> and <math>v_1,v_2,\ldots,v_n</math> be the corresponding eigenvectors. Then
::<math>
\begin{align}
\lambda_1
&=\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}
\end{align}
</math>  and  <math>
\begin{align}
\lambda_2
&=\max_{x\bot v_1}\frac{x^TAx}{x^Tx}.
\end{align}
</math>
}}
{{Proof|
Without loss of generality, we may assume that <math>v_1,v_2,\ldots,v_n</math> form an orthonormal basis of eigenvectors.
Then it holds that
:<math>\frac{v_1^TAv_1}{v_1^Tv_1}=\lambda_1v_1^Tv_1=\lambda_1</math>,
thus we have <math>\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}\ge\lambda_1</math>.
 
Let <math>x\in\mathbb{R}^n</math> be an arbitrary vector and let <math>y=\frac{x}{\sqrt{x^Tx}}=\frac{x}{\|x\|}</math> be its normalization.
Since <math>v_1,v_2,\ldots,v_n</math> form an orthonormal basis, <math>y</math> can be expressed as <math>y=\sum_{i=1}^nc_iv_i</math>. Then
:<math>
\begin{align}
\frac{x^TAx}{x^Tx}
&=y^TAy
=\left(\sum_{i=1}^nc_iv_i\right)^TA\left(\sum_{i=1}^nc_iv_i\right)
=\left(\sum_{i=1}^nc_iv_i\right)^T\left(\sum_{i=1}^n\lambda_ic_iv_i\right)\\
&=\sum_{i=1}^n\lambda_ic_i^2
\le\lambda_1\sum_{i=1}^nc_i^2
=\lambda_1\|y\|^2
=\lambda_1.
\end{align}
</math>
Therefore, <math>\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}\le\lambda_1</math>. Altogether we have <math>\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}=\lambda_1</math>.
 
It is similar to prove <math>\max_{x\bot v_1}\frac{x^TAx}{x^Tx}=\lambda_2</math>. In the first part take <math>x=v_2</math> to show that <math>\max_{x\bot v_1}\frac{x^TAx}{x^Tx}\ge\lambda_2</math>; and in the second part take an arbitrary <math>x\bot v_1</math> and <math>y=\frac{x}{\|x\|}</math>. Notice that <math>y\bot v_1</math>, thus <math>y=\sum_{i=1}^nc_iv_i</math> with <math>c_1=0</math>.
}}
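The Rayleigh-Ritz characterization is easy to check numerically. The following sketch (an illustration only, assuming NumPy; the matrix is just random test data) verifies that no vector beats <math>\lambda_1</math> in the Rayleigh quotient, and that restricting to <math>x\bot v_1</math> brings the maximum down to <math>\lambda_2</math>.

<pre>
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2                     # a random symmetric matrix

lam, V = np.linalg.eigh(A)            # eigenvalues ascending, orthonormal eigenvectors in columns
lam1, lam2, v1 = lam[-1], lam[-2], V[:, -1]

def rayleigh(x):
    return x @ A @ x / (x @ x)

samples = rng.standard_normal((1000, 6))
assert max(rayleigh(x) for x in samples) <= lam1 + 1e-9   # no vector exceeds lambda_1
assert abs(rayleigh(v1) - lam1) < 1e-9                    # v_1 attains it

perp = samples - np.outer(samples @ v1, v1)               # project onto the complement of v_1
assert max(rayleigh(x) for x in perp) <= lam2 + 1e-9      # orthogonal to v_1: at most lambda_2
assert abs(rayleigh(V[:, -2]) - lam2) < 1e-9              # v_2 attains it
print("Rayleigh-Ritz characterization verified on this instance")
</pre>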


The Rayleigh-Ritz Theorem is a special case of a fundamental theorem in linear algebra, called the [http://en.wikipedia.org/wiki/Min-max_theorem Courant-Fischer theorem], which characterizes the eigenvalues of a symmetric matrix by a series of optimizations:
{{Theorem
|Theorem (Courant-Fischer theorem)|
:Let <math>A</math> be a symmetric matrix with eigenvalues <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. Then
::<math>
\begin{align}
\lambda_k
&=\max_{v_1,v_2,\ldots,v_{n-k}\in \mathbb{R}^n}\min_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{n-k}}}\frac{x^TAx}{x^Tx}\\
&=\min_{v_1,v_2,\ldots,v_{k-1}\in \mathbb{R}^n}\max_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{k-1}}}\frac{x^TAx}{x^Tx}.
\end{align}
</math>
}}
 
== Graph Laplacian ==
Let <math>G(V,E)</math> be a <math>d</math>-regular graph on <math>n</math> vertices and let <math>A</math> be its adjacency matrix. We define <math>L=dI-A</math> to be the '''Laplacian''' of the graph <math>G</math>. Viewing a vector <math>x\in \mathbb{R}^V</math> as a distribution over the vertices, the Laplacian quadratic form <math>x^TLx</math> measures the "smoothness" of <math>x</math> over the graph topology, just as the [http://en.wikipedia.org/wiki/Laplace_operator Laplacian operator] does for differentiable functions.
 
{{Theorem|Laplacian Property|
:For any vector <math>x\in\mathbb{R}^n</math>, it holds that
::<math>x^TLx=\sum_{uv\in E}(x_u-x_v)^2</math>.
}}
{{Proof|
:<math>\begin{align}
x^TLx
&=
\sum_{u,v\in V}x_u(dI-A)_{uv}x_v\\
&=
\sum_{u}\left(dx_u^2-\sum_{uv\in E}x_ux_v\right)\\
&=
\sum_{u\in V}\sum_{uv\in E}(x_u^2-x_ux_v).
\end{align}
</math>
On the other hand,
:<math>
\begin{align}
\sum_{uv\in E}(x_u-x_v)^2
&=
\sum_{uv\in E}\left(x_u^2-2x_ux_v+x_v^2\right)\\
&=
\sum_{uv\in E}\left((x_u^2-x_ux_v)+(x_v^2-x_vx_u)\right)\\
&=
\sum_{u\in V}\sum_{uv\in E}(x_u^2-x_ux_v).
\end{align}
</math>
}}
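The identity <math>x^TLx=\sum_{uv\in E}(x_u-x_v)^2</math> can be sanity-checked on a small graph. The sketch below (assuming NumPy; the 3-dimensional hypercube is just a convenient 3-regular example) evaluates both sides for a random vector.

<pre>
import numpy as np

rng = np.random.default_rng(1)

# 3-dimensional hypercube: 3-regular on 8 vertices
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(3):
        A[u, u ^ (1 << b)] = 1        # neighbors differ in exactly one bit
L = d * np.eye(n) - A
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if A[u, v]]

x = rng.standard_normal(n)
assert np.isclose(x @ L @ x, sum((x[u] - x[v]) ** 2 for u, v in edges))
print("x^T L x equals the sum over edges of (x_u - x_v)^2")
</pre>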
 
Applying the Rayleigh-Ritz theorem to the Laplacian matrix of the graph, we have the following "variational characterization" of the spectral gap <math>d-\lambda_2</math>.
 
{{Theorem
|Theorem (Variational Characterization)|
:Let <math>G(V,E)</math> be a <math>d</math>-regular graph of <math>n</math> vertices. Suppose that its adjacency matrix is <math>A</math>, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>. Let <math>L=dI-A</math> be the Laplacian matrix. Then
::<math>
\begin{align}
d-\lambda_2
&=\min_{x\bot \boldsymbol{1}}\frac{x^TLx}{x^Tx}
=\min_{x\bot \boldsymbol{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.
\end{align}
</math>
}}
{{proof|
For a <math>d</math>-regular graph, we know that <math>\lambda_1=d</math> and <math>A\boldsymbol{1}=d\boldsymbol{1}</math>, thus <math>\boldsymbol{1}</math> is an eigenvector of <math>\lambda_1</math>. Due to the Rayleigh-Ritz Theorem, it holds that <math>\lambda_2
=\max_{x\bot \boldsymbol{1}}\frac{x^TAx}{x^Tx}</math>. Then
:<math>
\begin{align}
\min_{x\bot \boldsymbol{1}}\frac{x^TLx}{x^Tx}
&=\min_{x\bot \boldsymbol{1}}\frac{x^T(dI-A)x}{x^Tx}\\
&=\min_{x\bot \boldsymbol{1}}\frac{dx^Tx-x^TAx}{x^Tx}\\
&=\min_{x\bot \boldsymbol{1}}\left(d-\frac{x^TAx}{x^Tx}\right)\\
&=d-\max_{x\bot \boldsymbol{1}}\frac{x^TAx}{x^Tx}\\
&=d-\lambda_2.
\end{align}
</math>
By the Laplacian property, <math>x^TLx=\sum_{uv\in E}(x_u-x_v)^2</math>, which completes the proof of the variational characterization of the spectral gap.
}}
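Continuing the hypercube example, the variational characterization can also be checked directly: projecting random vectors orthogonal to <math>\boldsymbol{1}</math> never makes the quotient <math>\frac{x^TLx}{x^Tx}</math> smaller than <math>d-\lambda_2</math>. (A sketch assuming NumPy, not part of the lecture.)

<pre>
import numpy as np

rng = np.random.default_rng(2)

n, d = 8, 3                            # 3-dimensional hypercube again
A = np.zeros((n, n))
for u in range(n):
    for b in range(3):
        A[u, u ^ (1 << b)] = 1
L = d * np.eye(n) - A

gap = d - np.sort(np.linalg.eigvalsh(A))[::-1][1]    # d - lambda_2

ones = np.ones(n)
samples = rng.standard_normal((2000, n))
samples -= np.outer(samples @ ones, ones) / n        # make each sample orthogonal to the all-ones vector

quotients = [x @ L @ x / (x @ x) for x in samples]
assert min(quotients) >= gap - 1e-9
print("d - lambda_2 =", gap, " smallest sampled quotient =", min(quotients))
</pre>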
 
== Proof of Cheeger's Inequality ==
We will first give an informal explanation why Cheeger's inequality holds.
 
Recall that the expansion is defined as
:<math>
\phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{|\partial S|}{|S|}.
</math>
Let <math>\chi_S</math> be the '''characteristic vector''' of the set <math>S</math> such that
:<math>\chi_S(v)=\begin{cases}
1 & v\in S,\\
0 & v\not\in S.
\end{cases}</math>
It is easy to see that
:<math>
\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\frac{|\partial S|}{|S|}.
</math>
Thus, the expansion can be expressed algebraically as
:<math>
\phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\min_{\overset{x\in\{0,1\}^n}{\|x\|_1\le\frac{n}{2}}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.
</math>
On the other hand, due to the variational characterization of the spectral gap, we have
:<math>
d-\lambda_2=\min_{x\bot\boldsymbol{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.
</math>


We can easily observe the similarity between the two formulas. Both the expansion ratio <math>\phi(G)</math> and the spectral gap <math>d-\lambda_2</math> can be characterized by optimizations of the same objective function <math>\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}</math> over different domains (for the spectral gap, the optimization is over all <math>x\bot\boldsymbol{1}</math>; and for the expansion ratio, it is over all vectors <math>x\in\{0,1\}^n</math> with at most <math>n/2</math> many 1-entries).
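On a graph small enough to brute-force, this similarity can be seen concretely: the sketch below (assuming NumPy; again the 3-dimensional hypercube) computes <math>\phi(G)</math> by enumerating all sets of size at most <math>n/2</math> and checks that it is sandwiched between the two sides of Cheeger's inequality.

<pre>
import numpy as np
from itertools import combinations

n, d = 8, 3                            # 3-dimensional hypercube
A = np.zeros((n, n))
for u in range(n):
    for b in range(3):
        A[u, u ^ (1 << b)] = 1

def boundary(S):
    # number of edges leaving the set S
    return sum(1 for u in S for v in range(n) if A[u, v] and v not in S)

phi = min(boundary(set(S)) / len(S)
          for k in range(1, n // 2 + 1)
          for S in combinations(range(n), k))

gap = d - np.sort(np.linalg.eigvalsh(A))[::-1][1]
print("phi =", phi, " gap/2 =", gap / 2, " sqrt(2d*gap) =", np.sqrt(2 * d * gap))
assert gap / 2 - 1e-9 <= phi <= np.sqrt(2 * d * gap) + 1e-9
</pre>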
 
-------
;Notations
Throughout the proof, we assume that <math>G(V,E)</math> is a <math>d</math>-regular graph on <math>n</math> vertices, <math>A</math> is its adjacency matrix, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>, and <math>L=(dI-A)</math> is the graph Laplacian.
 
=== Large spectral gap implies high expansion ===
{{Theorem
|Cheeger's inequality (lower bound)|
:<math>
\phi(G)\ge\frac{d-\lambda_2}{2}.
</math>
}}
{{Proof|
Let <math>S^*\subset V</math>, <math>|S^*|\le\frac{n}{2}</math>, be the vertex set achieving the optimal expansion ratio <math>\phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}} \frac{|\partial S|}{|S|}=\frac{|\partial S^*|}{|S^*|}</math>, and <math>x\in\mathbb{R}^n</math> be a vector defined as
:<math>
x_v=\begin{cases}
1/|S^*| & \text{if } v\in S^*,\\
-1/\left|\overline{S^*}\right| &\text{if }v\in\overline{S^*}.
\end{cases}
</math>
Clearly, <math>x\cdot \boldsymbol{1}=\sum_{v\in S^*}\frac{1}{|S^*|} -\sum_{v\in\overline{S^*}}\frac{1}{\left|\overline{S^*}\right|}=0</math>, thus <math>x\bot\boldsymbol{1}</math>.
 
Due to the variational characterization of the second eigenvalue,
:<math>
\begin{align}
d-\lambda_2
&\le\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}\\
&=\frac{\sum_{u\in S^*,v\in\overline{S^*},uv\in E}\left(1/|S^*|+1/|\overline{S^*}|\right)^2}{1/|S^*|+1/|\overline{S^*}|}\\
&=\left(\frac{1}{|S^*|}+\frac{1}{\left|\overline{S^*}\right|}\right)\cdot|\partial S^*|\\
&\le \frac{2|\partial S^*|}{|S^*|} &(\text{since }|S^*|\le\frac{n}{2})\\
&=2\phi(G).
\end{align}
</math>
}}
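The test vector used in this proof is easy to play with numerically. The sketch below (assuming NumPy) builds <math>x</math> from a facet of the 3-dimensional hypercube, which is in fact an optimal cut for this graph, and evaluates the quotient that upper-bounds <math>d-\lambda_2</math>.

<pre>
import numpy as np

n, d = 8, 3                            # 3-dimensional hypercube
A = np.zeros((n, n))
for u in range(n):
    for b in range(3):
        A[u, u ^ (1 << b)] = 1
L = d * np.eye(n) - A

S = [u for u in range(n) if u & 1 == 0]               # a facet: the 4 vertices with last bit 0
x = np.array([1 / len(S) if u in S else -1 / (n - len(S)) for u in range(n)])
assert abs(x.sum()) < 1e-12                           # x is orthogonal to the all-ones vector

quotient = x @ L @ x / (x @ x)
boundary = sum(1 for u in S for v in range(n) if A[u, v] and v not in S)
print("quotient =", quotient, " 2*|dS|/|S| =", 2 * boundary / len(S))
# d - lambda_2 <= quotient <= 2*phi(G); for this graph both sides equal 2
</pre>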
 
=== High expansion implies large spectral gap ===
We next prove the upper bound direction of Cheeger's inequality:
{{Theorem|Cheeger's inequality (upper bound)|
:<math>
\phi(G) \le \sqrt{2d(d-\lambda_2)}.
</math>
}}
 
This direction is harder to prove than the lower bound, but it is mathematically more interesting and also more useful for analyzing the mixing time of random walks.
 
We prove the following equivalent inequality:
:<math>
\frac{\phi^2}{2d} \le d-\lambda_2.
</math>
 
Let <math>x</math> be a vector such that
* <math>Ax=\lambda_2x</math>, i.e., it is an eigenvector for <math>\lambda_2</math>;
* <math>|\{v\in V\mid x_v>0\}|\le\frac{n}{2}</math>, i.e., <math>x</math> has at most <math>n/2</math> positive entries. (We can always replace <math>x</math> by <math>-x</math> if this is not satisfied.)
And let the nonnegative vector <math>y</math> be defined as
:<math>y_v=\begin{cases}
x_v & x_v>0,\\
0 & \text{otherwise.}
\end{cases}</math>
We then prove the following inequalities:
# <math>\frac{y^TLy}{y^Ty}\le d-\lambda_2</math>;
# <math>\frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty}</math>.
The theorem is then a simple consequence by combining these two inequalities.
 
We prove the first inequality:
{{Theorem|Lemma|
:<math>\frac{y^TLy}{y^Ty}\le d-\lambda_2</math>.
}}
{{Proof|
If <math>x_u\ge 0</math>, then
:<math>
\begin{align}
(Ly)_u
&=((dI-A)y)_u
=dy_u-\sum_{v}A(u,v)y_v
=dx_u-\sum_{v:x_v\ge 0}A(u,v)x_v\\
&\le dx_u-\sum_{v}A(u,v)x_v
=((dI-A)x)_u
=(d-\lambda_2)x_u.
\end{align}
</math>
Then
:<math>
\begin{align}
y^TLy
&=\sum_{u}y_u(Ly)_u
=\sum_{u:x_u\ge 0}y_u(Ly)_u
=\sum_{u:x_u\ge 0}x_u(Ly)_u\\
&\le (d-\lambda_2)\sum_{u:x_u\ge 0}x_u^2
=(d-\lambda_2)\sum_{u}y_u^2
=(d-\lambda_2)y^Ty,
\end{align}
</math>
which proves the lemma.
}}
 
We then prove the second inequality:
{{Theorem|Lemma|
:<math>\frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty}</math>.
}}
{{Proof|
To prove this, we introduce a new quantity <math>\frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty}</math> and show that
:<math>\phi\le\frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty}\le\sqrt{2d}\cdot\sqrt{\frac{y^TLy}{y^Ty}}</math>.
This will give us the desired inequality <math>\frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty}</math>.
 
{{Theorem|Lemma|
:<math>\frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty}\le\sqrt{2d}\cdot\sqrt{\frac{y^TLy}{y^Ty}}</math>.
}}
{{Proof|
By the Cauchy-Schwarz Inequality,
:<math>
\begin{align}
\sum_{uv\in E}|y_u^2-y_v^2|
&=\sum_{uv\in E}|y_u-y_v||y_u+y_v|\\
&\le\sqrt{\sum_{uv\in E}(y_u-y_v)^2}\cdot\sqrt{\sum_{uv\in E}(y_u+y_v)^2}.
\end{align}
</math>
By the Laplacian property, the first term <math>\sqrt{\sum_{uv\in E}(y_u-y_v)^2}=\sqrt{y^TLy}</math>. By the Inequality of Arithmetic and Geometric Means, the second term
:<math>
\sqrt{\sum_{uv\in E}(y_u+y_v)^2}
\le\sqrt{2\sum_{uv\in E}(y_u^2+y_v^2)}
=\sqrt{2d\sum_{u\in V}y_u^2}
=\sqrt{2dy^Ty}.
</math>
Combining them together, we have
:<math>\sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d}\cdot\sqrt{y^TLy}\cdot\sqrt{y^Ty}</math>.
}}
 
{{Theorem|Lemma|
:<math>\phi\le\frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty}</math>.
}}
{{Proof|
Suppose that <math>y</math> has <math>t</math> nonzero entries. We know that <math>t\le n/2</math> due to the definition of <math>y</math>. We enumerate the vertices <math>u_1,u_2,\ldots,u_n\in V</math> such that
:<math>y_{u_1}\ge y_{u_2}\ge\cdots\ge y_{u_t}>y_{u_{t+1}}=\cdots=y_{u_n}=0</math>.
Then
:<math>
\begin{align}
\sum_{uv\in E}|y_u^2-y_v^2|
&=\sum_{u_iu_j\in E\atop i<j}(y_{u_i}^2-y_{u_j}^2)
=\sum_{u_iu_j\in E\atop i<j}\sum_{k=i}^{j-1}(y_{u_k}^2-y_{u_{k+1}}^2)\\
&=\sum_{i=1}^n\sum_{j>i}A(u_i,u_j)\sum_{k=i}^{j-1}(y_{u_k}^2-y_{u_{k+1}}^2)
=\sum_{i=1}^n\sum_{j>i}\sum_{k=i}^{j-1}A(u_i,u_j)(y_{u_k}^2-y_{u_{k+1}}^2).
\end{align}
</math>
Exchanging the order of summation (only the terms with <math>k\le t</math> contribute, since <math>y_{u_k}=y_{u_{k+1}}=0</math> for <math>t<k<n</math>), we have:
:<math>
\begin{align}
\sum_{i=1}^n\sum_{j>i}\sum_{k=i}^{j-1}A(u_i,u_j)(y_{u_k}^2-y_{u_{k+1}}^2)
&=\sum_{k=1}^n\sum_{i\le k}\sum_{j>k}A(u_i,u_j)(y_{u_k}^2-y_{u_{k+1}}^2)\\
&=\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)\sum_{i\le k}\sum_{j>k}A(u_i,u_j)
\end{align}
</math>
Notice that <math>\sum_{i\le k}\sum_{j>k}A(u_i,u_j)=|\partial\{u_1,\ldots, u_k\}|</math>, which is at least <math>\phi k</math> since <math>k\le t\le n/2</math>. Since each factor <math>(y_{u_k}^2-y_{u_{k+1}}^2)</math> is nonnegative, combining these together, we have
:<math>
\begin{align}
\sum_{uv\in E}|y_u^2-y_v^2|
&=\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)\sum_{i\le k}\sum_{j>k}A(u_i,u_j)\\
&=\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)|\partial\{u_1,\ldots, u_k\}|\\
&\ge \phi\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)k\\
&=\phi\sum_{k=1}^ty_{u_k}^2\\
&=\phi y^Ty.
\end{align}
</math>
}}
}}
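The proof also suggests an algorithm, often called a sweep cut: sort the vertices by the entries of an eigenvector of <math>\lambda_2</math> and take the best prefix set <math>\{u_1,\ldots,u_k\}</math> with <math>k\le n/2</math>. This algorithmic reading is not spelled out in the lecture, but the following sketch (assuming NumPy, on a 16-cycle) shows that the cut it finds respects the upper bound <math>\sqrt{2d(d-\lambda_2)}</math>.

<pre>
import numpy as np

n, d = 16, 2                           # the 16-cycle
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1

lam, V = np.linalg.eigh(A)             # ascending eigenvalues, eigenvectors in columns
lam2 = lam[-2]
x = V[:, -2]                           # an eigenvector of lambda_2
if np.sum(x > 0) > n // 2:             # keep at most n/2 positive entries, as in the proof
    x = -x

order = np.argsort(-x)                 # vertices by decreasing eigenvector value
best, S = np.inf, set()
for k in range(1, n // 2 + 1):
    S.add(order[k - 1])
    boundary = sum(1 for u in S for v in range(n) if A[u, v] and v not in S)
    best = min(best, boundary / len(S))

print("best sweep cut ratio =", best, " sqrt(2d(d - lambda_2)) =", np.sqrt(2 * d * (d - lam2)))
assert best <= np.sqrt(2 * d * (d - lam2)) + 1e-9
</pre>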
