Randomized Algorithms (Fall 2011)/The Spectral Gap
It turns out that the second largest eigenvalue of a graph carries important information about the graph's expansion parameter. The following theorem is the celebrated Cheeger's inequality.
Theorem (Cheeger's inequality) - Let [math]\displaystyle{ G }[/math] be a [math]\displaystyle{ d }[/math]-regular graph with spectrum [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. Then
- [math]\displaystyle{ \frac{d-\lambda_2}{2}\le \phi(G) \le \sqrt{2d(d-\lambda_2)}. }[/math]
The theorem was first stated for Riemannian manifolds, and was proved by Cheeger and Buser (for the two directions of the inequality, respectively). The discrete case was proved independently by Dodziuk and by Alon and Milman.
For a [math]\displaystyle{ d }[/math]-regular graph, the value [math]\displaystyle{ (d-\lambda_2) }[/math] is known as the spectral gap. The name is due to the fact that it is the gap between the first and the second eigenvalue in the spectrum of a graph. The spectral gap provides an estimate on the expansion ratio of a graph. More precisely, a [math]\displaystyle{ d }[/math]-regular graph has large expansion ratio (thus being an expander) if the spectral gap is large.
If we write [math]\displaystyle{ \alpha=1-\frac{\lambda_2}{d} }[/math] (sometimes called the normalized spectral gap), Cheeger's inequality takes a nicer form:
- [math]\displaystyle{ \frac{\alpha}{2}\le \frac{\phi}{d}\le\sqrt{2\alpha} }[/math] or equivalently [math]\displaystyle{ \frac{1}{2}\left(\frac{\phi}{d}\right)^2\le \alpha\le 2\left(\frac{\phi}{d}\right) }[/math].
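The inequality is easy to check numerically on any small regular graph. Below is a minimal Python/NumPy sketch (ours, not part of the original notes): it takes the 2-regular cycle [math]\displaystyle{ C_8 }[/math], brute-forces [math]\displaystyle{ \phi(G) }[/math] over all vertex subsets, and verifies both sides of Cheeger's inequality. The helper names cycle_adjacency and expansion_ratio are our own.

```python
import itertools
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the cycle C_n, a 2-regular graph."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

def expansion_ratio(A):
    """Brute-force phi(G) = min over all S with |S| <= n/2 of |boundary(S)|/|S|."""
    n = A.shape[0]
    best = float('inf')
    for k in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), k):
            boundary = sum(A[u, v] for u in S for v in range(n) if v not in S)
            best = min(best, boundary / k)
    return best

n, d = 8, 2
A = cycle_adjacency(n)
lam2 = np.sort(np.linalg.eigvalsh(A))[::-1][1]  # second largest eigenvalue
phi = expansion_ratio(A)
gap = d - lam2
assert gap / 2 <= phi <= np.sqrt(2 * d * gap)
print(f"gap = {gap:.4f}, phi = {phi:.4f}, sqrt(2*d*gap) = {np.sqrt(2*d*gap):.4f}")
```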
We will not prove the whole theorem at once, but we will first explain briefly why it works.
For the spectra of graphs, Cheeger's inequality is proved via the Courant-Fischer theorem, a fundamental result of linear algebra which characterizes the eigenvalues of a symmetric matrix by a series of optimizations:
Theorem (Courant-Fischer theorem) - Let [math]\displaystyle{ A }[/math] be a symmetric matrix with eigenvalues [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. Then
- [math]\displaystyle{ \begin{align} \lambda_k &=\max_{v_1,v_2,\ldots,v_{n-k}\in \mathbb{R}^n}\min_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{n-k}}}\frac{x^TAx}{x^Tx}\\ &= \min_{v_1,v_2,\ldots,v_{k-1}\in \mathbb{R}^n}\max_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{k-1}}}\frac{x^TAx}{x^Tx}. \end{align} }[/math]
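To make the min-max formulas concrete, here is a small NumPy sanity check (our sketch, not from the notes). For a random symmetric positive semidefinite matrix, choosing [math]\displaystyle{ v_1,\ldots,v_{k-1} }[/math] in the min-max form as the eigenvectors of the top [math]\displaystyle{ k-1 }[/math] eigenvalues pins the constrained maximum of the Rayleigh quotient at exactly [math]\displaystyle{ \lambda_k }[/math].

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
B = rng.standard_normal((n, n))
A = B @ B.T                            # a random symmetric PSD matrix
w, V = np.linalg.eigh(A)               # ascending eigenvalues, orthonormal eigenvectors
lam = w[::-1]                          # lambda_1 >= lambda_2 >= ... >= lambda_n

# Optimal choice of v_1, ..., v_{k-1}: eigenvectors of the top k-1 eigenvalues.
# The max Rayleigh quotient on the subspace orthogonal to them is exactly lambda_k.
top = V[:, -(k - 1):]                  # eigenvectors of lambda_1, ..., lambda_{k-1}
P = np.eye(n) - top @ top.T            # projector onto their orthogonal complement
lam_k = np.linalg.eigvalsh(P @ A @ P)[-1]
assert np.isclose(lam_k, lam[k - 1])
```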
For a [math]\displaystyle{ d }[/math]-regular graph with adjacency matrix [math]\displaystyle{ A }[/math] and spectrum [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], the largest eigenvalue is [math]\displaystyle{ \lambda_1=d }[/math], with the all-ones vector [math]\displaystyle{ \mathbf{1} }[/math] as eigenvector, since [math]\displaystyle{ A\cdot\mathbf{1}=d\mathbf{1} }[/math]. According to the Courant-Fischer theorem, the second largest eigenvalue can be computed as
- [math]\displaystyle{ \lambda_2=\max_{x\bot \mathbf{1}}\frac{x^TAx}{x^Tx}, }[/math]
and
- [math]\displaystyle{ d-\lambda_2=\min_{x\bot \mathbf{1}}\frac{x^T(dI-A)x}{x^Tx}. }[/math]
The latter is an optimization problem which resembles the expansion ratio [math]\displaystyle{ \phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{|\partial S|}{|S|}=\min_{\chi_S}\frac{\chi_S^T(dI-A)\chi_S}{\chi_S^T\chi_S} }[/math], where [math]\displaystyle{ \chi_S }[/math] is the characteristic vector of the set [math]\displaystyle{ S }[/math], defined as [math]\displaystyle{ \chi_S(i)=1 }[/math] if [math]\displaystyle{ i\in S }[/math] and [math]\displaystyle{ \chi_S(i)=0 }[/math] if [math]\displaystyle{ i\not\in S }[/math]. It is not hard to verify that [math]\displaystyle{ \chi_S^T\chi_S=\sum_{i}\chi_S(i)=|S| }[/math] and [math]\displaystyle{ \chi_S^T(dI-A)\chi_S=\sum_{i\sim j}(\chi_S(i)-\chi_S(j))^2=|\partial S| }[/math].
Therefore, the spectral gap [math]\displaystyle{ d-\lambda_2 }[/math] and the expansion ratio [math]\displaystyle{ \phi(G) }[/math] both arise from optimizations of similar forms, which explains why each can be used to approximate the other.
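For instance, the two identities about [math]\displaystyle{ \chi_S }[/math] can be verified directly on the cycle [math]\displaystyle{ C_6 }[/math] with [math]\displaystyle{ S=\{0,1,2\} }[/math] (a NumPy sketch of ours):

```python
import numpy as np

n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1   # the cycle C_6
L = d * np.eye(n) - A

S = {0, 1, 2}                                   # an arbitrary subset with |S| <= n/2
chi = np.array([1.0 if v in S else 0.0 for v in range(n)])
print(chi @ chi)        # chi_S^T chi_S = |S| = 3
print(chi @ L @ chi)    # chi_S^T (dI - A) chi_S = |boundary(S)| = 2
```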
Let [math]\displaystyle{ G(V,E) }[/math] be a [math]\displaystyle{ d }[/math]-regular graph of [math]\displaystyle{ n }[/math] vertices and let [math]\displaystyle{ A }[/math] be its adjacency matrix. We define [math]\displaystyle{ L=dI-A }[/math] to be the Laplacian of the graph [math]\displaystyle{ G }[/math]. Regarding a vector [math]\displaystyle{ x\in \mathbb{R}^V }[/math] as a real-valued function over the vertices, its Laplacian quadratic form [math]\displaystyle{ x^TLx }[/math] measures the "smoothness" of [math]\displaystyle{ x }[/math] over the graph topology, just as the Laplacian operator does for differentiable functions.
Laplacian Property - For any vector [math]\displaystyle{ x\in\mathbb{R}^n }[/math], it holds that
- [math]\displaystyle{ x^TLx=\sum_{\{u,v\}\in E}(x(u)-x(v))^2 }[/math].
Proof. - [math]\displaystyle{ \begin{align} x^TLx &= \sum_{u,v\in V}x_u(dI-A)_{uv}x_v\\ &= \sum_{u}\left(dx_u^2-\sum_{v\sim u}x_ux_v\right)\\ &= \sum_{u\in V}\sum_{v\sim u}(x_u^2-x_ux_v). \end{align} }[/math]
On the other hand,
- [math]\displaystyle{ \begin{align} \sum_{\{u,v\}\in E}(x(u)-x(v))^2 &= \sum_{\{u,v\}\in E}\left(x_u^2-2x_ux_v+x_v^2\right)\\ &= \sum_{\{u,v\}\in E}\left((x_u^2-x_ux_v)+(x_v^2-x_vx_u)\right)\\ &= \sum_{u\in V}\sum_{v\sim u}(x_u^2-x_ux_v). \end{align} }[/math]
- [math]\displaystyle{ \square }[/math]
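The identity is also easy to confirm numerically for an arbitrary vector; a minimal NumPy check of ours, again on a small cycle:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 2
edges = [(i, (i + 1) % n) for i in range(n)]    # the cycle C_6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
L = d * np.eye(n) - A                           # the graph Laplacian

x = rng.standard_normal(n)                      # an arbitrary vector on the vertices
assert np.isclose(x @ L @ x, sum((x[u] - x[v]) ** 2 for u, v in edges))
```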
Theorem (Rayleigh-Ritz theorem) - Let [math]\displaystyle{ A }[/math] be a symmetric matrix. Suppose the eigenvalues of [math]\displaystyle{ A }[/math] are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math] and [math]\displaystyle{ v_1 }[/math] is the eigenvector of eigenvalue [math]\displaystyle{ \lambda_1 }[/math].
- Then
- [math]\displaystyle{ \begin{align} \lambda_2 &=\max_{x\bot v_1}\frac{x^TAx}{x^Tx}. \end{align} }[/math]
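As a quick numerical illustration of this theorem (our sketch, not from the notes): Rayleigh quotients of vectors orthogonal to [math]\displaystyle{ v_1 }[/math] never exceed [math]\displaystyle{ \lambda_2 }[/math], and the second eigenvector attains the maximum.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                      # a random symmetric matrix
w, V = np.linalg.eigh(A)               # ascending eigenvalues
lam2, v1 = w[-2], V[:, -1]             # second largest eigenvalue; top eigenvector

# No vector orthogonal to v_1 beats lambda_2 ...
for _ in range(10000):
    x = rng.standard_normal(n)
    x -= (x @ v1) * v1                 # project x onto the complement of v_1
    assert (x @ A @ x) / (x @ x) <= lam2 + 1e-9

# ... and the second eigenvector attains it.
u2 = V[:, -2]
assert np.isclose((u2 @ A @ u2) / (u2 @ u2), lam2)
```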
Theorem (Variational Characterization) - Let [math]\displaystyle{ G(V,E) }[/math] be a [math]\displaystyle{ d }[/math]-regular graph of [math]\displaystyle{ n }[/math] vertices. Suppose its adjacency matrix is [math]\displaystyle{ A }[/math], Laplacian matrix is [math]\displaystyle{ L=dI-A }[/math], and the eigenvalues of [math]\displaystyle{ A }[/math] are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math].
- Then
- [math]\displaystyle{ \begin{align} d-\lambda_2 &=\min_{x\bot \boldsymbol{1}}\frac{x^TLx}{x^Tx} =\min_{x\bot \boldsymbol{1}}\frac{\sum_{\{u,v\}\in E}(x(u)-x(v))^2}{\sum_{v\in V}x(v)^2}. \end{align} }[/math]
Proof. For a [math]\displaystyle{ d }[/math]-regular graph, we know that [math]\displaystyle{ \lambda_1=d }[/math] and [math]\displaystyle{ A\boldsymbol{1}=d\boldsymbol{1} }[/math], thus [math]\displaystyle{ \boldsymbol{1} }[/math] is an eigenvector of [math]\displaystyle{ \lambda_1 }[/math]. Due to the Rayleigh-Ritz theorem, it holds that [math]\displaystyle{ \lambda_2 =\max_{x\bot \boldsymbol{1}}\frac{x^TAx}{x^Tx} }[/math]. Then
- [math]\displaystyle{ \begin{align} \min_{x\bot \boldsymbol{1}}\frac{x^TLx}{x^Tx} &=\min_{x\bot \boldsymbol{1}}\frac{x^T(dI-A)x}{x^Tx}\\ &=\min_{x\bot \boldsymbol{1}}\frac{dx^Tx-x^TAx}{x^Tx}\\ &=\min_{x\bot \boldsymbol{1}}\left(d-\frac{x^TAx}{x^Tx}\right)\\ &=d-\max_{x\bot \boldsymbol{1}}\frac{x^TAx}{x^Tx}\\ &=d-\lambda_2. \end{align} }[/math]
By the Laplacian property proved above, [math]\displaystyle{ x^TLx=\sum_{\{u,v\}\in E}(x(u)-x(v))^2 }[/math]. So the variational characterization of the second eigenvalue of the graph is proved.
- [math]\displaystyle{ \square }[/math]
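Since [math]\displaystyle{ L=dI-A }[/math] has eigenvalues [math]\displaystyle{ d-\lambda_i }[/math], the constrained minimum above is simply the second smallest eigenvalue of [math]\displaystyle{ L }[/math]. A short NumPy confirmation (our sketch) on the cycle [math]\displaystyle{ C_8 }[/math]:

```python
import numpy as np

n, d = 8, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1   # the cycle C_8
L = d * np.eye(n) - A

lam2 = np.sort(np.linalg.eigvalsh(A))[::-1][1]  # second largest eigenvalue of A
mu = np.linalg.eigvalsh(L)                      # ascending: 0 = mu_1 <= mu_2 <= ...
# min over x ⊥ 1 of x^T L x / x^T x is the second smallest eigenvalue of L,
# which equals the spectral gap d - lambda_2:
assert np.isclose(mu[1], d - lam2)
```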
Cheeger's inequality (lower bound) - [math]\displaystyle{ \phi(G)\ge\frac{d-\lambda_2}{2}. }[/math]
Proof. Let [math]\displaystyle{ S^*\subset V }[/math], [math]\displaystyle{ |S^*|\le\frac{n}{2} }[/math], be the vertex set achieving the optimal expansion ratio [math]\displaystyle{ \phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}} \frac{|\partial S|}{|S|}=\frac{|\partial S^*|}{|S^*|} }[/math], and [math]\displaystyle{ x\in\mathbb{R}^n }[/math] be a vector defined as
- [math]\displaystyle{ x(v)=\begin{cases} 1/|S^*| & \text{if } v\in S^*,\\ -1/\left|\overline{S^*}\right| &\text{if }v\in\overline{S^*}. \end{cases} }[/math]
Clearly, [math]\displaystyle{ x\cdot \boldsymbol{1}=\sum_{v\in S^*}\frac{1}{|S^*|} -\sum_{v\in\overline{S^*}}\frac{1}{\left|\overline{S^*}\right|}=0 }[/math], thus [math]\displaystyle{ x\bot\boldsymbol{1} }[/math].
Due to the variational characterization of the second eigenvalue,
- [math]\displaystyle{ \begin{align} d-\lambda_2 &\le\frac{\sum_{\{u,v\}\in E}(x(u)-x(v))^2}{\sum_{v\in V}x(v)^2}\\ &=\frac{\sum_{u\in S^*,v\in\overline{S^*},\{u,v\}\in E}\left(1/|S^*|+1/|\overline{S^*}|\right)^2}{1/|S^*|+1/|\overline{S^*}|}\\ &=\left(\frac{1}{|S^*|}+\frac{1}{\left|\overline{S^*}\right|}\right)\cdot|\partial S^*|\\ &\le \frac{2|\partial S^*|}{|S^*|} &(\text{since }|S^*|\le\frac{n}{2})\\ &=2\phi(G). \end{align} }[/math]
- [math]\displaystyle{ \square }[/math]
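The proof's test vector can be instantiated on a concrete graph. Here is our NumPy sketch on [math]\displaystyle{ C_8 }[/math], with [math]\displaystyle{ S^* }[/math] taken as an arc of 4 vertices, checking each inequality in the chain:

```python
import numpy as np

n, d = 8, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1   # the cycle C_8
L = d * np.eye(n) - A

S = {0, 1, 2, 3}                                # a candidate S* with |S| <= n/2
x = np.array([1 / len(S) if v in S else -1 / (n - len(S)) for v in range(n)])
assert np.isclose(x.sum(), 0)                   # x is orthogonal to the all-ones vector

rayleigh = (x @ L @ x) / (x @ x)
gap = d - np.sort(np.linalg.eigvalsh(A))[::-1][1]
boundary = sum(A[u, v] for u in S for v in range(n) if v not in S)
# The chain of inequalities in the proof, instantiated numerically:
assert gap <= rayleigh + 1e-9 and rayleigh <= 2 * boundary / len(S) + 1e-9
```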
Cheeger's inequality (upper bound) - [math]\displaystyle{ \phi(G) \le \sqrt{2d(d-\lambda_2)}. }[/math]
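The standard proof of the upper bound is constructive, via the "sweep cut" of a second eigenvector: sort the vertices by their value in the eigenvector and take the best prefix or suffix cut. The sweep-cut framing is the standard technique rather than something spelled out in these notes; the following sketch of ours illustrates it on the cycle [math]\displaystyle{ C_{10} }[/math], where the best sweep ratio stays below the bound [math]\displaystyle{ \sqrt{2d(d-\lambda_2)} }[/math].

```python
import numpy as np

n, d = 10, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1   # the cycle C_10
w, V = np.linalg.eigh(A)                        # ascending eigenvalues
lam2, f = w[-2], V[:, -2]                       # lambda_2 and one of its eigenvectors

# Sweep cut: sort vertices by their value in f and try every prefix and
# suffix of size at most n/2 as the candidate set S.
order = np.argsort(f)
best = float('inf')
for k in range(1, n // 2 + 1):
    for S in (set(order[:k].tolist()), set(order[-k:].tolist())):
        cut = sum(A[u, v] for u in S for v in range(n) if v not in S)
        best = min(best, cut / k)
print(best, "<=", np.sqrt(2 * d * (d - lam2)))  # best sweep ratio vs. Cheeger bound
```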