Randomized Algorithms (Fall 2011)/Graph Spectrum
Graph spectrum
The adjacency matrix of an [math]\displaystyle{ n }[/math]-vertex graph [math]\displaystyle{ G }[/math], denoted [math]\displaystyle{ A = A(G) }[/math], is an [math]\displaystyle{ n\times n }[/math] matrix where [math]\displaystyle{ A(u,v) }[/math] is the number of edges in [math]\displaystyle{ G }[/math] between vertex [math]\displaystyle{ u }[/math] and vertex [math]\displaystyle{ v }[/math]. Because [math]\displaystyle{ A }[/math] is a symmetric matrix with real entries, by the spectral theorem it has real eigenvalues [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], which are associated with an orthonormal system of eigenvectors [math]\displaystyle{ v_1,v_2,\ldots, v_n\, }[/math] with [math]\displaystyle{ Av_i=\lambda_i v_i\, }[/math]. We call the eigenvalues of [math]\displaystyle{ A }[/math] the spectrum of the graph [math]\displaystyle{ G }[/math].
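As a quick illustration (not part of the original notes), the spectrum of a small graph can be computed with numpy; the 4-cycle [math]\displaystyle{ C_4 }[/math] below is an arbitrary choice of a 2-regular bipartite example:

```python
import numpy as np

# Adjacency matrix of the 4-cycle C4, a small 2-regular bipartite graph
# (an arbitrary choice for illustration).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# eigvalsh is for symmetric matrices and returns real eigenvalues in
# ascending order; reverse to get lambda_1 >= ... >= lambda_n.
spectrum = np.linalg.eigvalsh(A)[::-1]
print(spectrum)  # approximately [2, 0, 0, -2]
```

Note that the largest eigenvalue equals the degree and the smallest equals its negation, matching the lemma below for regular bipartite graphs.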
The spectrum of a graph contains a lot of information about the graph. For example, suppose that [math]\displaystyle{ G }[/math] is [math]\displaystyle{ d }[/math]-regular; then the following lemma holds.
Lemma - [math]\displaystyle{ |\lambda_i|\le d }[/math] for all [math]\displaystyle{ 1\le i\le n }[/math].
- [math]\displaystyle{ \lambda_1=d }[/math] and the corresponding eigenvector is [math]\displaystyle{ (\frac{1}{\sqrt{n}},\frac{1}{\sqrt{n}},\ldots,\frac{1}{\sqrt{n}}) }[/math].
- [math]\displaystyle{ G }[/math] is connected if and only if [math]\displaystyle{ \lambda_1\gt \lambda_2 }[/math].
- If [math]\displaystyle{ G }[/math] is bipartite then [math]\displaystyle{ \lambda_1=-\lambda_n }[/math].
Proof. Let [math]\displaystyle{ A }[/math] be the adjacency matrix of [math]\displaystyle{ G }[/math], with entries [math]\displaystyle{ a_{ij} }[/math]. It is obvious that [math]\displaystyle{ \sum_{j}a_{ij}=d\, }[/math] for any [math]\displaystyle{ i }[/math]. - (1) Suppose that [math]\displaystyle{ Ax=\lambda x, x\neq \mathbf{0} }[/math], and let [math]\displaystyle{ x_i }[/math] be an entry of [math]\displaystyle{ x }[/math] with the largest absolute value. Since [math]\displaystyle{ (Ax)_i=\lambda x_i }[/math], we have
- [math]\displaystyle{ \sum_{j}a_{ij}x_j=\lambda x_i,\, }[/math]
- and so
- [math]\displaystyle{ |\lambda||x_i|=\left|\sum_{j}a_{ij}x_j\right|\le \sum_{j}a_{ij}|x_j|\le \sum_{j}a_{ij}|x_i| \le d|x_i|. }[/math]
- Thus [math]\displaystyle{ |\lambda|\le d }[/math].
- (2) Since [math]\displaystyle{ \sum_{j}a_{ij}=d\, }[/math] for every [math]\displaystyle{ i }[/math], the normalized all-ones vector [math]\displaystyle{ (\frac{1}{\sqrt{n}},\ldots,\frac{1}{\sqrt{n}}) }[/math] is an eigenvector with eigenvalue [math]\displaystyle{ d }[/math]; by (1), no eigenvalue exceeds [math]\displaystyle{ d }[/math], so [math]\displaystyle{ \lambda_1=d }[/math].
- (3) Let [math]\displaystyle{ x }[/math] be a nonzero vector for which [math]\displaystyle{ Ax=dx }[/math], and let [math]\displaystyle{ x_i }[/math] be an entry of [math]\displaystyle{ x }[/math] with the largest absolute value. Since [math]\displaystyle{ (Ax)_i=d x_i }[/math], we have
- [math]\displaystyle{ \sum_{j}a_{ij}x_j=d x_i.\, }[/math]
- Since [math]\displaystyle{ \sum_{j}a_{ij}=d\, }[/math] and by the maximality of [math]\displaystyle{ x_i }[/math], it follows that [math]\displaystyle{ x_j=x_i }[/math] for all [math]\displaystyle{ j }[/math] such that [math]\displaystyle{ a_{ij}\gt 0 }[/math]. Thus, [math]\displaystyle{ x_i=x_j }[/math] if [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] are adjacent, which implies that [math]\displaystyle{ x_i=x_j }[/math] if [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] are connected. For connected [math]\displaystyle{ G }[/math], all vertices are connected, thus all [math]\displaystyle{ x_i }[/math] are equal. This shows that if [math]\displaystyle{ G }[/math] is connected, the eigenvalue [math]\displaystyle{ d=\lambda_1 }[/math] has multiplicity 1, thus [math]\displaystyle{ \lambda_1\gt \lambda_2 }[/math].
- If otherwise, [math]\displaystyle{ G }[/math] is disconnected, then for two different components, we have [math]\displaystyle{ Ax=dx }[/math] and [math]\displaystyle{ Ay=dy }[/math], where the entries of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] are nonzero only for the vertices in their respective components. Then [math]\displaystyle{ A(\alpha x+\beta y)=d(\alpha x+\beta y) }[/math]. Thus, the multiplicity of [math]\displaystyle{ d }[/math] is greater than 1, so [math]\displaystyle{ \lambda_1=\lambda_2 }[/math].
- (4) If [math]\displaystyle{ G }[/math] is bipartite, then the vertex set can be partitioned into two disjoint nonempty sets [math]\displaystyle{ V_1 }[/math] and [math]\displaystyle{ V_2 }[/math] such that all edges have one endpoint in each of [math]\displaystyle{ V_1 }[/math] and [math]\displaystyle{ V_2 }[/math]. Algebraically, this means that the adjacency matrix can be organized into the form
- [math]\displaystyle{ P^TAP=\begin{bmatrix} 0 & B\\ B^T & 0 \end{bmatrix} }[/math]
- where [math]\displaystyle{ P }[/math] is a permutation matrix; such a relabeling does not change the eigenvalues.
- If [math]\displaystyle{ x }[/math] is an eigenvector corresponding to the eigenvalue [math]\displaystyle{ \lambda }[/math], then [math]\displaystyle{ x' }[/math] which is obtained from [math]\displaystyle{ x }[/math] by changing the sign of the entries corresponding to vertices in [math]\displaystyle{ V_2 }[/math], is an eigenvector corresponding to the eigenvalue [math]\displaystyle{ -\lambda }[/math]. It follows that the spectrum of a bipartite graph is symmetric with respect to 0.
- [math]\displaystyle{ \square }[/math]
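A numerical sanity check of the lemma (a sketch added for illustration, not part of the original notes): two disjoint triangles form an arbitrary 2-regular, disconnected example witnessing part (3), since the eigenvalue [math]\displaystyle{ d }[/math] appears with multiplicity 2.

```python
import numpy as np

# A 2-regular but disconnected graph: two disjoint triangles.
# Each triangle has spectrum {2, -1, -1}, so the union has
# lambda_1 = lambda_2 = d = 2, illustrating part (3) of the lemma.
T = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
A = np.block([[T, np.zeros((3, 3))],
              [np.zeros((3, 3)), T]])

spec = np.linalg.eigvalsh(A)[::-1]
print(spec)  # approximately [2, 2, -1, -1, -1, -1]
```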
The spectral gap
It turns out that the second largest eigenvalue of a graph contains important information about the graph's expansion parameter. The following theorem is the so-called Cheeger's inequality.
Theorem (Cheeger's inequality) - Let [math]\displaystyle{ G }[/math] be a [math]\displaystyle{ d }[/math]-regular graph with spectrum [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. Then
- [math]\displaystyle{ \frac{d-\lambda_2}{2}\le \phi(G) \le \sqrt{2d(d-\lambda_2)} }[/math]
The theorem was first stated for Riemannian manifolds, and was proved by Cheeger and Buser (for the two directions of the inequality, respectively). The discrete case was proved independently by Dodziuk and by Alon-Milman.
For a [math]\displaystyle{ d }[/math]-regular graph, the value [math]\displaystyle{ (d-\lambda_2) }[/math] is known as the spectral gap. The name is due to the fact that it is the gap between the first and the second eigenvalue in the spectrum of a graph. The spectral gap provides an estimate on the expansion ratio of a graph. More precisely, a [math]\displaystyle{ d }[/math]-regular graph has large expansion ratio (thus being an expander) if the spectral gap is large.
If we write [math]\displaystyle{ \alpha=1-\frac{\lambda_2}{d} }[/math] (sometimes called the normalized spectral gap), Cheeger's inequality takes a nicer form:
- [math]\displaystyle{ \frac{\alpha}{2}\le \frac{\phi}{d}\le\sqrt{2\alpha} }[/math] or equivalently [math]\displaystyle{ \frac{1}{2}\left(\frac{\phi}{d}\right)^2\le \alpha\le 2\left(\frac{\phi}{d}\right) }[/math].
We will not prove the theorem, but we will explain briefly why it works.
For the spectra of graphs, Cheeger's inequality is proved via the Courant-Fischer theorem, a fundamental theorem of linear algebra which characterizes the eigenvalues of a symmetric matrix by a series of optimizations:
Theorem (Courant-Fischer theorem) - Let [math]\displaystyle{ A }[/math] be a symmetric matrix with eigenvalues [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. Then
- [math]\displaystyle{ \begin{align} \lambda_k &=\max_{v_1,v_2,\ldots,v_{n-k}\in \mathbb{R}^n}\min_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{n-k}}}\frac{x^TAx}{x^Tx}\\ &= \min_{v_1,v_2,\ldots,v_{k-1}\in \mathbb{R}^n}\max_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{k-1}}}\frac{x^TAx}{x^Tx}. \end{align} }[/math]
For a [math]\displaystyle{ d }[/math]-regular graph with adjacency matrix [math]\displaystyle{ A }[/math] and spectrum [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], the largest eigenvalue is [math]\displaystyle{ \lambda_1=d }[/math], with the all-ones vector [math]\displaystyle{ \mathbf{1} }[/math] as eigenvector, since [math]\displaystyle{ A\cdot\mathbf{1}=d\mathbf{1} }[/math]. According to the Courant-Fischer theorem, the second largest eigenvalue can be computed as
- [math]\displaystyle{ \lambda_2=\max_{x\bot \mathbf{1}}\frac{x^TAx}{x^Tx}, }[/math]
and
- [math]\displaystyle{ d-\lambda_2=\min_{x\bot \mathbf{1}}\frac{x^T(dI-A)x}{x^Tx}. }[/math]
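This variational characterization can be checked numerically. The sketch below (the 4-cycle is an arbitrary choice) samples random vectors orthogonal to [math]\displaystyle{ \mathbf{1} }[/math] and verifies that their Rayleigh quotients never exceed [math]\displaystyle{ \lambda_2 }[/math]:

```python
import numpy as np

# For a d-regular graph the top eigenvector is the all-ones vector, so
# by Courant-Fischer, lambda_2 is the maximum Rayleigh quotient over
# vectors orthogonal to 1. We check this on the 4-cycle C4.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
lam = np.linalg.eigvalsh(A)          # ascending order
lam2 = lam[-2]                       # second largest eigenvalue

rng = np.random.default_rng(2)
ones = np.ones(4) / 2                # normalized all-ones vector
for _ in range(1000):
    x = rng.standard_normal(4)
    x -= (x @ ones) * ones           # enforce x orthogonal to 1
    assert x @ A @ x / (x @ x) <= lam2 + 1e-9
print("lambda_2 =", lam2)  # approximately 0 for C4
```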
The latter is an optimization problem which resembles the expansion ratio [math]\displaystyle{ \phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{|\partial S|}{|S|}=\min_{\chi_S}\frac{\chi_S^T(dI-A)\chi_S}{\chi_S^T\chi_S} }[/math], where [math]\displaystyle{ \chi_S }[/math] is the characteristic vector of the set [math]\displaystyle{ S }[/math], defined by [math]\displaystyle{ \chi_S(i)=1 }[/math] if [math]\displaystyle{ i\in S }[/math] and [math]\displaystyle{ \chi_S(i)=0 }[/math] if [math]\displaystyle{ i\not\in S }[/math]. It is not hard to verify that [math]\displaystyle{ \chi_S^T\chi_S=\sum_{i}\chi_S(i)=|S| }[/math] and [math]\displaystyle{ \chi_S^T(dI-A)\chi_S=\sum_{i\sim j}(\chi_S(i)-\chi_S(j))^2=|\partial S| }[/math].
Therefore, the spectral gap [math]\displaystyle{ d-\lambda_2 }[/math] and the expansion ratio [math]\displaystyle{ \phi(G) }[/math] both arise from optimizations of similar forms, which explains why each can be used to approximate the other.
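To make the analogy concrete, one can compute both quantities for a small graph and confirm Cheeger's inequality; this sketch uses the 3-dimensional hypercube, an arbitrary choice whose expansion ratio is small enough to brute-force:

```python
import numpy as np
from itertools import combinations

# The 3-dimensional hypercube Q3: a 3-regular graph on n = 8 vertices.
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(3):
        A[u, u ^ (1 << b)] = 1  # edge = flip one coordinate bit

lam = np.linalg.eigvalsh(A)[::-1]
gap = d - lam[1]                # spectral gap d - lambda_2

# Brute-force phi(G) = min over 0 < |S| <= n/2 of |boundary(S)| / |S|,
# computing |boundary(S)| as chi_S^T (dI - A) chi_S, as in the text.
L = d * np.eye(n) - A
phi = min(
    chi @ L @ chi / len(S)
    for k in range(1, n // 2 + 1)
    for S in combinations(range(n), k)
    for chi in [np.isin(np.arange(n), S).astype(float)]
)

print(gap, phi)  # gap = 2, phi = 1; indeed gap/2 <= phi <= sqrt(2*d*gap)
```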
The expander mixing lemma
Given a [math]\displaystyle{ d }[/math]-regular graph [math]\displaystyle{ G }[/math] on [math]\displaystyle{ n }[/math] vertices with the spectrum [math]\displaystyle{ d=\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], we denote [math]\displaystyle{ \lambda_\max = \max(|\lambda_2|,|\lambda_n|)\, }[/math], which is the largest absolute value of an eigenvalue other than [math]\displaystyle{ \lambda_1=d }[/math]. Sometimes, the value of [math]\displaystyle{ (d-\lambda_\max) }[/math] is also referred to as the spectral gap, because it is the gap between the largest and the second largest absolute values of eigenvalues.
The next lemma is the so-called expander mixing lemma, which states a fundamental fact about expander graphs.
Lemma (expander mixing lemma) - Let [math]\displaystyle{ G }[/math] be a [math]\displaystyle{ d }[/math]-regular graph with [math]\displaystyle{ n }[/math] vertices. Then for all [math]\displaystyle{ S, T \subseteq V }[/math],
- [math]\displaystyle{ \left||E(S,T)|-\frac{d|S||T|}{n}\right|\le\lambda_\max\sqrt{|S||T|} }[/math]
The left-hand side measures the deviation between two quantities: one is [math]\displaystyle{ |E(S,T)| }[/math], the number of edges between the two sets [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math]; the other is [math]\displaystyle{ d|S||T|/n }[/math], the expected number of edges between [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math] in a random graph of edge density [math]\displaystyle{ d/n }[/math]. A small [math]\displaystyle{ \lambda_\max }[/math] (i.e. a large spectral gap) implies that this deviation (or discrepancy, as it is sometimes called) is small, so the graph looks random everywhere, even though it is deterministic.
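A brute-force check of the mixing lemma on a small example (the complete graph [math]\displaystyle{ K_5 }[/math], an arbitrary non-bipartite choice), under the convention that [math]\displaystyle{ |E(S,T)|=\chi_S^TA\chi_T }[/math], so edges inside [math]\displaystyle{ S\cap T }[/math] are counted twice:

```python
import numpy as np
from itertools import combinations

# Complete graph K5: 4-regular with spectrum {4, -1, -1, -1, -1},
# hence lambda_max = 1.
n, d = 5, 4
A = np.ones((n, n)) - np.eye(n)
lam = np.linalg.eigvalsh(A)[::-1]
lam_max = max(abs(lam[1]), abs(lam[-1]))

# Enumerate all nonempty subsets as characteristic vectors.
subsets = [np.isin(np.arange(n), S).astype(float)
           for k in range(1, n + 1)
           for S in combinations(range(n), k)]

# Largest violation of the bound over all pairs (S, T); |E(S,T)| is
# computed as chi_S^T A chi_T.
worst = max(
    abs(cs @ A @ ct - d * cs.sum() * ct.sum() / n)
    - lam_max * np.sqrt(cs.sum() * ct.sum())
    for cs in subsets for ct in subsets
)
print(worst <= 1e-9)  # True: the mixing bound holds for every S, T
```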
Random walks on expander graphs
Given a [math]\displaystyle{ d }[/math]-regular graph [math]\displaystyle{ G }[/math] on [math]\displaystyle{ n }[/math] vertices with the spectrum [math]\displaystyle{ d=\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], we denote [math]\displaystyle{ \lambda_\max = \max(|\lambda_2|,|\lambda_n|)\, }[/math], which is the largest absolute value of eigenvalues except [math]\displaystyle{ \lambda_1=d }[/math].
Let [math]\displaystyle{ A }[/math] be the adjacency matrix of [math]\displaystyle{ G }[/math] and let [math]\displaystyle{ P=\frac{1}{d}A }[/math]. We know that [math]\displaystyle{ P }[/math] defines a transition matrix of a random walk on [math]\displaystyle{ G }[/math], such that
- [math]\displaystyle{ P(u,v)=\begin{cases} \frac{1}{d} & u\sim v,\\ 0 & u\not\sim v. \end{cases} }[/math]
Let [math]\displaystyle{ \rho_1\ge\rho_2\ge\cdots\ge\rho_n }[/math] be the spectrum of [math]\displaystyle{ P }[/math]. Since [math]\displaystyle{ P=\frac{1}{d}A }[/math], it is obvious that
- [math]\displaystyle{ \rho_i=\frac{\lambda_i}{d} }[/math].
Similarly, let [math]\displaystyle{ \rho_\max= \max(|\rho_2|,|\rho_n|)\, }[/math].
Lemma - Let [math]\displaystyle{ \pi=(\frac{1}{n},\ldots,\frac{1}{n}) }[/math] be the uniform distribution. For any stochastic vector [math]\displaystyle{ q }[/math],
- [math]\displaystyle{ \|qP-\pi\|_2\le\rho_\max }[/math].
Proof. Since [math]\displaystyle{ G }[/math] is [math]\displaystyle{ d }[/math]-regular, the uniform distribution [math]\displaystyle{ \pi }[/math] is stationary for [math]\displaystyle{ P }[/math], i.e. [math]\displaystyle{ \pi P=\pi }[/math]. Then
- [math]\displaystyle{ \|qP-\pi\|_2=\|qP-\pi P\|_2=\|(q-\pi)P\|_2 }[/math]
and because [math]\displaystyle{ (q-\pi)\cdot\mathbf{1}=0 }[/math], i.e. [math]\displaystyle{ (q-\pi) }[/math] is orthogonal to [math]\displaystyle{ \mathbf{1} }[/math], its [math]\displaystyle{ \ell_2 }[/math]-norm shrinks by a factor of [math]\displaystyle{ \rho_\max }[/math] under the action of [math]\displaystyle{ P }[/math]. Consequently,
- [math]\displaystyle{ \|(q-\pi)P\|_2\le\rho_\max\|(q-\pi)\|_2\le\rho_\max. }[/math]
The last inequality holds because both [math]\displaystyle{ q }[/math] and [math]\displaystyle{ \pi }[/math] are stochastic vectors: indeed [math]\displaystyle{ \|q-\pi\|_2^2=\|q\|_2^2-\frac{1}{n}\le\|q\|_1^2-\frac{1}{n}\lt 1 }[/math].
- [math]\displaystyle{ \square }[/math]
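A quick numerical check of the lemma (a sketch for illustration), on [math]\displaystyle{ K_5 }[/math] (an arbitrary choice with [math]\displaystyle{ \rho_\max=1/4 }[/math]), over random stochastic vectors:

```python
import numpy as np

# Random walk on K5 (4-regular, lambda_max = 1): P = A/d, rho_max = 1/4.
n, d = 5, 4
P = (np.ones((n, n)) - np.eye(n)) / d
pi = np.full(n, 1 / n)
rho_max = 1 / d

rng = np.random.default_rng(0)
for _ in range(100):              # random stochastic vectors q
    q = rng.random(n)
    q /= q.sum()
    assert np.linalg.norm(q @ P - pi) <= rho_max + 1e-12
print("ok")
```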
Using this lemma, we can prove the following theorem.
Theorem - Let [math]\displaystyle{ A }[/math] be the adjacency matrix of a [math]\displaystyle{ d }[/math]-regular graph [math]\displaystyle{ G }[/math] with the spectrum [math]\displaystyle{ d=\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], and let [math]\displaystyle{ \lambda_\max = \max(|\lambda_2|,|\lambda_n|)\, }[/math].
- Let [math]\displaystyle{ P=\frac{1}{d}A }[/math] be the transition matrix of the random walk on [math]\displaystyle{ G }[/math], and [math]\displaystyle{ \pi=(\frac{1}{n},\ldots,\frac{1}{n}) }[/math] be the uniform distribution. For any initial distribution [math]\displaystyle{ q }[/math],
- [math]\displaystyle{ \|qP^t-\pi\|_2\le\rho_\max^t=\left(\frac{\lambda_\max}{d}\right)^t }[/math].
Proof. By induction on [math]\displaystyle{ t }[/math]: since [math]\displaystyle{ qP^{t-1} }[/math] is again a stochastic vector, the argument in the lemma gives [math]\displaystyle{ \|qP^{t}-\pi\|_2=\|(qP^{t-1}-\pi)P\|_2\le\rho_\max\|qP^{t-1}-\pi\|_2 }[/math], and iterating yields the bound.
- [math]\displaystyle{ \square }[/math]
The next corollary follows immediately from the above theorem by the Cauchy-Schwarz inequality.
Corollary - Let [math]\displaystyle{ \pi=(\frac{1}{n},\ldots,\frac{1}{n}) }[/math] be the uniform distribution. For any initial distribution [math]\displaystyle{ q }[/math],
- [math]\displaystyle{ \|qP^t-\pi\|_1\le\sqrt{n}\|qP^t-\pi\|_2\le\sqrt{n}\cdot\left(\frac{\lambda_\max}{d}\right)^t }[/math].
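The exponential convergence can be observed directly; the sketch below iterates the walk on [math]\displaystyle{ K_5 }[/math] (an arbitrary choice) from a point distribution and checks the corollary's bound at every step:

```python
import numpy as np

# Iterating the walk on K5: lambda_max/d = 1/4, so the corollary gives
# ||q P^t - pi||_1 <= sqrt(n) * (1/4)^t.
n, d = 5, 4
P = (np.ones((n, n)) - np.eye(n)) / d
pi = np.full(n, 1 / n)

q = np.zeros(n)
q[0] = 1.0                        # start concentrated on a single vertex
for t in range(1, 11):
    q = q @ P
    assert np.abs(q - pi).sum() <= np.sqrt(n) * (1 / d) ** t + 1e-12
print("l1 distance after 10 steps:", np.abs(q - pi).sum())
```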
So the random walk on a graph converges exponentially fast if [math]\displaystyle{ \frac{\lambda_\max}{d} }[/math] is a constant [math]\displaystyle{ \lt 1 }[/math]. This holds when [math]\displaystyle{ \lambda_2\ge|\lambda_n| }[/math] and the spectral gap [math]\displaystyle{ (d-\lambda_2) }[/math] is a constant. The former condition can fail: for a bipartite graph, [math]\displaystyle{ \lambda_n=-d }[/math] and thus [math]\displaystyle{ \lambda_\max=d }[/math], so the walk does not converge at all. This can be remedied by considering the lazy random walk.
Lazy random walk
Recall that for the lazy random walk, at each step, we first flip a fair coin to decide whether to walk or to stay. The corresponding transition matrix is
- [math]\displaystyle{ Q=\frac{1}{2}(I+P)=\frac{1}{2}\left(I+\frac{1}{d}A\right). }[/math]
The spectrum of [math]\displaystyle{ Q }[/math] is given by
- [math]\displaystyle{ \rho'_i=\frac{1+\frac{1}{d}\lambda_i}{2} }[/math]
where [math]\displaystyle{ \lambda_i }[/math] is the [math]\displaystyle{ i }[/math]th largest eigenvalue of the adjacency matrix [math]\displaystyle{ A }[/math]. For a [math]\displaystyle{ d }[/math]-regular graph, it holds that
- [math]\displaystyle{ 1=\rho_1'\ge \rho_2'\ge\cdots\ge\rho_n'\ge 0. }[/math]
The last inequality is due to the fact that [math]\displaystyle{ |\lambda_i|\le d }[/math].
Therefore, for the lazy walk, it always holds that [math]\displaystyle{ \rho_\max'=\max(|\rho_2'|,|\rho_n'|)=\rho_2'=\frac{1+\frac{1}{d}\lambda_2}{2} }[/math].
Theorem (lazy random walk on expander) - Given a [math]\displaystyle{ d }[/math]-regular graph [math]\displaystyle{ G }[/math] with the spectrum [math]\displaystyle{ d=\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], let [math]\displaystyle{ Q }[/math] be the transition matrix of the lazy random walk on [math]\displaystyle{ G }[/math], and [math]\displaystyle{ \pi=(\frac{1}{n},\ldots,\frac{1}{n}) }[/math] be the uniform distribution. For any initial distribution [math]\displaystyle{ q }[/math],
- [math]\displaystyle{ \|qQ^t-\pi\|_1\le\sqrt{n}\left(\frac{1+\frac{1}{d}\lambda_2}{2}\right)^t }[/math].
- In particular, the lazy random walk converges exponentially fast when the spectral gap [math]\displaystyle{ (d-\lambda_2) }[/math] is a constant [math]\displaystyle{ \gt 0 }[/math].
By Cheeger's inequality, expander graphs have constant spectral gap, and thus the lazy random walk on an expander graph converges exponentially fast.
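To see the difference laziness makes, the sketch below runs the lazy walk on the bipartite hypercube [math]\displaystyle{ Q_3 }[/math] (an arbitrary example where the plain walk would oscillate forever, since [math]\displaystyle{ \lambda_n=-d }[/math]) and checks the theorem's bound:

```python
import numpy as np

# Lazy walk on the bipartite cube Q3: the plain walk oscillates
# (lambda_n = -d gives rho_max = 1), but the lazy walk has
# rho'_2 = (1 + lambda_2/d)/2 = (1 + 1/3)/2 = 2/3 and converges.
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(3):
        A[u, u ^ (1 << b)] = 1
Q = (np.eye(n) + A / d) / 2        # lazy transition matrix
pi = np.full(n, 1 / n)

q = np.zeros(n)
q[0] = 1.0
for t in range(1, 21):
    q = q @ Q
    assert np.abs(q - pi).sum() <= np.sqrt(n) * (2 / 3) ** t + 1e-12
print("l1 distance after 20 steps:", np.abs(q - pi).sum())
```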