随机算法 (Spring 2013)/Expander Graphs and Mixing

From TCS Wiki
Revision as of 15:22, 8 June 2013 by imported>Etone

Spectral approach for symmetric chain

We consider symmetric Markov chains defined on a state space [math]\displaystyle{ \Omega }[/math], whose transition matrix [math]\displaystyle{ P }[/math] is symmetric.

We have the following powerful spectral theorem for symmetric matrices.

Theorem (Spectral theorem)
Let [math]\displaystyle{ S }[/math] be a symmetric [math]\displaystyle{ n\times n }[/math] matrix, whose eigenvalues are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. There exist eigenvectors [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] such that [math]\displaystyle{ v_iS=\lambda_iv_i }[/math] for all [math]\displaystyle{ i=1,\ldots, n }[/math] and [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] are orthonormal.

A set of orthonormal vectors [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] satisfies the following:

  • for any [math]\displaystyle{ i\neq j }[/math], [math]\displaystyle{ v_i }[/math] is orthogonal to [math]\displaystyle{ v_j }[/math], which means that [math]\displaystyle{ v_i^Tv_j=0 }[/math], denoted [math]\displaystyle{ v_i\bot v_j }[/math];
  • all [math]\displaystyle{ v_i }[/math] have unit length, i.e., [math]\displaystyle{ \|v_i\|_2=1 }[/math].

Since the eigenvectors [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] are orthonormal, we can use them as an orthonormal basis, so that any vector [math]\displaystyle{ x\in\mathbb{R}^n }[/math] can be expressed as [math]\displaystyle{ x=\sum_{i=1}^nc_iv_i }[/math] where [math]\displaystyle{ c_i=x^Tv_i }[/math], therefore

[math]\displaystyle{ xS=\sum_{i=1}^nc_iv_iS=\sum_{i=1}^nc_i\lambda_iv_i. }[/math]

So multiplying by [math]\displaystyle{ S }[/math] corresponds to scaling the component of [math]\displaystyle{ x }[/math] along each eigenvector by the corresponding eigenvalue.
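This decomposition is easy to verify numerically. The following sketch uses NumPy; the matrix [math]\displaystyle{ S }[/math] and the vector [math]\displaystyle{ x }[/math] are arbitrary examples chosen for illustration.

```python
import numpy as np

# A small symmetric matrix (an arbitrary example for illustration).
S = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is specialized for symmetric matrices: it returns real eigenvalues
# and orthonormal eigenvectors (the columns of V).
lam, V = np.linalg.eigh(S)

# Any vector x decomposes as x = sum_i c_i v_i with c_i = x^T v_i.
x = np.array([1.0, -2.0, 0.5])
c = V.T @ x

# Multiplying by S scales each coefficient c_i by lambda_i.
lhs = S @ x            # since S is symmetric, xS and Sx agree
rhs = V @ (c * lam)    # sum_i c_i lambda_i v_i
assert np.allclose(lhs, rhs)
```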


Back to the symmetric Markov chain. Let [math]\displaystyle{ \Omega }[/math] be a finite state space of size [math]\displaystyle{ N }[/math], and [math]\displaystyle{ P }[/math] be a symmetric transition matrix, whose eigenvalues are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\ldots\ge \lambda_N }[/math]. The following hold for a symmetric transition matrix [math]\displaystyle{ P }[/math]:

  • Due to the spectral theorem, there exist orthonormal eigenvectors [math]\displaystyle{ v_1,v_2,\ldots,v_N }[/math] such that [math]\displaystyle{ v_iP=\lambda_iv_i }[/math] for [math]\displaystyle{ i=1,2,\ldots,N }[/math] and any distribution [math]\displaystyle{ q }[/math] over [math]\displaystyle{ \Omega }[/math] can be expressed as [math]\displaystyle{ q=\sum_{i=1}^Nc_iv_i }[/math] where [math]\displaystyle{ c_i=q^Tv_i }[/math].
  • A symmetric [math]\displaystyle{ P }[/math] must be doubly stochastic, thus the stationary distribution [math]\displaystyle{ \pi }[/math] is the uniform distribution.

Recall that due to the Perron-Frobenius theorem, [math]\displaystyle{ \lambda_1=1 }[/math]. And [math]\displaystyle{ \boldsymbol{1}P=\boldsymbol{1} }[/math] since [math]\displaystyle{ P }[/math] is doubly stochastic, thus [math]\displaystyle{ v_1=\frac{\boldsymbol{1}}{\|\boldsymbol{1}\|_2}=\left(\frac{1}{\sqrt{N}},\ldots,\frac{1}{\sqrt{N}}\right) }[/math].

When [math]\displaystyle{ q }[/math] is a distribution, i.e., [math]\displaystyle{ q }[/math] is a nonnegative vector and [math]\displaystyle{ \|q\|_1=1 }[/math], it holds that [math]\displaystyle{ c_1=q^Tv_1=\frac{1}{\sqrt{N}} }[/math] and [math]\displaystyle{ c_1v_1=\left(\frac{1}{N},\ldots,\frac{1}{N}\right)=\pi }[/math], thus

[math]\displaystyle{ q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i, }[/math]

and the distribution at time [math]\displaystyle{ t }[/math] when the initial distribution is [math]\displaystyle{ q }[/math], is given by

[math]\displaystyle{ qP^t=\pi P^t+\sum_{i=2}^Nc_iv_iP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i. }[/math]

It is easy to see that this distribution converges to [math]\displaystyle{ \pi }[/math] when the absolute values of [math]\displaystyle{ \lambda_2,\ldots,\lambda_N }[/math] are all less than 1. The rate at which it converges to [math]\displaystyle{ \pi }[/math], namely the mixing rate, is determined by the quantity [math]\displaystyle{ \lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\, }[/math], which is the largest absolute value among the eigenvalues other than [math]\displaystyle{ \lambda_1 }[/math].
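The convergence at rate [math]\displaystyle{ \lambda_{\max}^t }[/math] can be observed on a small example. The sketch below (assuming NumPy) uses a lazy random walk on a 4-cycle, a hypothetical example whose transition matrix is symmetric with [math]\displaystyle{ \lambda_{\max}=\frac{1}{2} }[/math].

```python
import numpy as np

N = 4
# Lazy random walk on an N-cycle: stay with prob 1/2, move to each
# neighbour with prob 1/4.  P is symmetric and doubly stochastic.
P = 0.5 * np.eye(N)
for i in range(N):
    P[i, (i + 1) % N] += 0.25
    P[i, (i - 1) % N] += 0.25

lam = np.sort(np.linalg.eigvalsh(P))[::-1]   # lambda_1 >= ... >= lambda_N
lam_max = max(abs(lam[1]), abs(lam[-1]))     # largest |eigenvalue| besides lambda_1

pi = np.full(N, 1.0 / N)                     # uniform stationary distribution
q = np.zeros(N); q[0] = 1.0                  # start concentrated at state 0
for t in range(1, 20):
    q = q @ P
    # the L1 distance to pi shrinks at rate lam_max^t (see the theorem below)
    assert np.linalg.norm(q - pi, 1) <= np.sqrt(N) * lam_max**t + 1e-12
```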

Theorem
Let [math]\displaystyle{ P }[/math] be the transition matrix for a symmetric Markov chain on state space [math]\displaystyle{ \Omega }[/math] where [math]\displaystyle{ |\Omega|=N }[/math]. Let [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N }[/math] be the spectrum of [math]\displaystyle{ P }[/math] and [math]\displaystyle{ \lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\, }[/math]. The mixing rate of the Markov chain is
[math]\displaystyle{ \tau(\epsilon)\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\text{max}}} }[/math].
Proof.

As analysed above, if [math]\displaystyle{ P }[/math] is symmetric, it has orthonormal eigenvectors [math]\displaystyle{ v_1,\ldots,v_N }[/math] such that any distribution [math]\displaystyle{ q }[/math] over [math]\displaystyle{ \Omega }[/math] can be expressed as

[math]\displaystyle{ q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i }[/math]

with [math]\displaystyle{ c_i=q^Tv_i }[/math], and

[math]\displaystyle{ qP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i. }[/math]

Thus,

[math]\displaystyle{ \begin{align} \|qP^t-\pi\|_1 &= \left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_1\\ &\le \sqrt{N}\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_2 &\quad\text{(Cauchy-Schwarz)}\\ &= \sqrt{N}\sqrt{\sum_{i=2}^Nc_i^2\lambda_i^{2t}}\\ &\le \sqrt{N}\lambda_{\max}^t\sqrt{\sum_{i=2}^Nc_i^2}\\ &= \sqrt{N}\lambda_{\max}^t\|q\|_2\\ &\le \sqrt{N}\lambda_{\max}^t. \end{align} }[/math]

The last inequality is due to a universal relation [math]\displaystyle{ \|q\|_2\le\|q\|_1 }[/math] and the fact that [math]\displaystyle{ q }[/math] is a distribution.

Then for any [math]\displaystyle{ x\in\Omega }[/math], denoting by [math]\displaystyle{ \boldsymbol{1}_x }[/math] the indicator vector for [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ \boldsymbol{1}_x(x)=1 }[/math] and [math]\displaystyle{ \boldsymbol{1}_x(y)=0 }[/math] for [math]\displaystyle{ y\neq x }[/math], we have

[math]\displaystyle{ \begin{align} \Delta_x(t) &=\left\|\boldsymbol{1}_x P^t-\pi\right\|_{TV}=\frac{1}{2}\left\|\boldsymbol{1}_x P^t-\pi\right\|_1\\ &\le\frac{\sqrt{N}}{2}\lambda_{\max}^t\le\frac{\sqrt{N}}{2}\mathrm{e}^{-t(1-\lambda_{\max})}. \end{align} }[/math]

Therefore, we have

[math]\displaystyle{ \tau_x(\epsilon) =\min\{t\mid\Delta_x(t)\le\epsilon\} \le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\max}} }[/math]

for any [math]\displaystyle{ x\in\Omega }[/math], thus the bound holds for [math]\displaystyle{ \tau(\epsilon)=\max_{x}\tau_x(\epsilon) }[/math].

[math]\displaystyle{ \square }[/math]
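The bound in the theorem is easy to evaluate. The following sketch (assuming NumPy) computes it for the lazy walk on a 4-cycle, a hypothetical example with [math]\displaystyle{ \lambda_{\max}=\frac{1}{2} }[/math].

```python
import numpy as np

def mixing_time_bound(P, eps):
    """Upper bound tau(eps) <= (ln(N)/2 + ln(1/(2 eps))) / (1 - lam_max)
    for a symmetric transition matrix P, as in the theorem above."""
    N = P.shape[0]
    lam = np.sort(np.linalg.eigvalsh(P))[::-1]
    lam_max = max(abs(lam[1]), abs(lam[-1]))
    return (0.5 * np.log(N) + np.log(1.0 / (2 * eps))) / (1 - lam_max)

# Lazy random walk on a 4-cycle (lam_max = 1/2), eps = 0.01.
P = 0.5 * np.eye(4)
for i in range(4):
    P[i, (i + 1) % 4] += 0.25
    P[i, (i - 1) % 4] += 0.25
bound = mixing_time_bound(P, 0.01)
```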

Expander Graphs

According to wikipedia:

"Expander graphs have found extensive applications in computer science, in designing algorithms, error correcting codes, extractors, pseudorandom generators, sorting networks and robust computer networks. They have also been used in proofs of many important results in computational complexity theory, such as SL=L and the PCP theorem. In cryptography too, expander graphs are used to construct hash functions."

We will not explore everything about expander graphs, but will focus on the performances of random walks on expander graphs.

Consider an undirected (multi-)graph [math]\displaystyle{ G(V,E) }[/math], where parallel edges between two vertices are allowed.

Some notations:

  • For [math]\displaystyle{ S,T\subset V }[/math], let [math]\displaystyle{ E(S,T)=\{uv\in E\mid u\in S,v\in T\} }[/math].
  • The Edge Boundary of a set [math]\displaystyle{ S\subset V }[/math], denoted [math]\displaystyle{ \partial S\, }[/math], is [math]\displaystyle{ \partial S = E(S, \bar{S}) }[/math].
Definition (Graph expansion)
The expansion ratio of an undirected graph [math]\displaystyle{ G }[/math] on [math]\displaystyle{ n }[/math] vertices is defined as
[math]\displaystyle{ \phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}} \frac{|\partial S|}{|S|}. }[/math]

Expander graphs are [math]\displaystyle{ d }[/math]-regular (multi)graphs with [math]\displaystyle{ d=O(1) }[/math] and [math]\displaystyle{ \phi(G)=\Omega(1) }[/math].

This definition states the following properties of expander graphs:

  • Expander graphs are sparse graphs. This is because the number of edges is [math]\displaystyle{ dn/2=O(n) }[/math].
  • Despite the sparsity, expander graphs have good connectivity. This is supported by the expansion ratio.
  • This one is implicit: an expander graph is in fact a family of graphs [math]\displaystyle{ \{G_n\} }[/math], where [math]\displaystyle{ n }[/math] is the number of vertices. The asymptotic orders [math]\displaystyle{ O(1) }[/math] and [math]\displaystyle{ \Omega(1) }[/math] in the definition are relative to the number of vertices [math]\displaystyle{ n }[/math], which grows to infinity.

The following fact is directly implied by the definition.

An expander graph has diameter [math]\displaystyle{ O(\log n) }[/math].

The proof is left for an exercise.

For a vertex set [math]\displaystyle{ S }[/math], the size of the edge boundary [math]\displaystyle{ |\partial S| }[/math] can be seen as the "perimeter" of [math]\displaystyle{ S }[/math], and [math]\displaystyle{ |S| }[/math] can be seen as the "volume" of [math]\displaystyle{ S }[/math]. The expansion property can be interpreted as a combinatorial version of isoperimetric inequality.

Vertex expansion
We can alternatively define the vertex expansion. For a vertex set [math]\displaystyle{ S\subset V }[/math], its vertex boundary, denoted [math]\displaystyle{ \delta S\, }[/math], is defined as
[math]\displaystyle{ \delta S=\{u\not\in S\mid uv\in E \mbox{ and }v\in S\} }[/math],
and the vertex expansion of a graph [math]\displaystyle{ G }[/math] is [math]\displaystyle{ \psi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}} \frac{|\delta S|}{|S|} }[/math].
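For small graphs, both [math]\displaystyle{ \phi(G) }[/math] and [math]\displaystyle{ \psi(G) }[/math] can be computed directly from the definitions by enumerating all candidate sets [math]\displaystyle{ S }[/math]. The sketch below is exponential in [math]\displaystyle{ n }[/math] and only meant to make the definitions concrete; the 6-cycle at the end is a hypothetical example, where an arc of three consecutive vertices minimizes both ratios.

```python
from itertools import combinations

def expansions(n, edges):
    """Brute-force the edge expansion phi(G) and vertex expansion psi(G)
    by enumerating every S with |S| <= n/2 (exponential; tiny graphs only)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    phi = psi = float('inf')
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            S = set(S)
            # |edge boundary|: edges with exactly one endpoint in S
            edge_boundary = sum(1 for u in S for v in adj[u] if v not in S)
            # |vertex boundary|: vertices outside S adjacent to S
            vertex_boundary = len({v for u in S for v in adj[u] if v not in S})
            phi = min(phi, edge_boundary / k)
            psi = min(psi, vertex_boundary / k)
    return phi, psi

# The 6-cycle: S = {0,1,2} gives |edge boundary| = |vertex boundary| = 2.
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
phi, psi = expansions(6, cycle6)   # both equal 2/3
```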

Existence of expander graph

We will show the existence of expander graphs by the probabilistic method. In order to do so, we need to generate random [math]\displaystyle{ d }[/math]-regular graphs.

Suppose that [math]\displaystyle{ d }[/math] is even. We can generate a random [math]\displaystyle{ d }[/math]-regular graph [math]\displaystyle{ G(V,E) }[/math] as follows:

  • Let [math]\displaystyle{ V }[/math] be the vertex set. Uniformly and independently choose [math]\displaystyle{ \frac{d}{2} }[/math] cycles of [math]\displaystyle{ V }[/math], i.e., [math]\displaystyle{ \frac{d}{2} }[/math] uniformly random Hamiltonian cycles on the vertex set [math]\displaystyle{ V }[/math].
  • For each vertex [math]\displaystyle{ v }[/math] and each cycle, assuming that the two neighbors of [math]\displaystyle{ v }[/math] in that cycle are [math]\displaystyle{ w }[/math] and [math]\displaystyle{ u }[/math], add the two edges [math]\displaystyle{ wv }[/math] and [math]\displaystyle{ uv }[/math] to [math]\displaystyle{ E }[/math].

The resulting [math]\displaystyle{ G(V,E) }[/math] is a multigraph. That is, it may have multiple edges between two vertices. We will show that [math]\displaystyle{ G(V,E) }[/math] is an expander graph with high probability. Formally, for some constant [math]\displaystyle{ d }[/math] and constant [math]\displaystyle{ \alpha }[/math],

[math]\displaystyle{ \Pr[\phi(G)\ge \alpha]=1-o(1) }[/math].

By the probabilistic method, this shows that there exist expander graphs. In fact, the above probability bound shows something much stronger: it shows that almost every regular graph is an expander.
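The sampling procedure above can be sketched as follows (a minimal Python version; each cycle edge is added once, so every vertex gets degree 2 from each of the [math]\displaystyle{ \frac{d}{2} }[/math] cycles).

```python
import random

def random_regular_multigraph(n, d, seed=None):
    """Sample a d-regular multigraph (d even) as the union of d/2
    uniformly random Hamiltonian cycles on {0, ..., n-1}, following
    the construction described above."""
    assert d % 2 == 0 and n >= 3
    rng = random.Random(seed)
    edges = []
    for _ in range(d // 2):
        perm = list(range(n))
        rng.shuffle(perm)                 # a uniformly random cycle on V
        for i in range(n):
            edges.append((perm[i], perm[(i + 1) % n]))
    return edges

edges = random_regular_multigraph(10, 4, seed=0)
# The result is a multigraph: parallel edges between two cycles may coincide,
# but every vertex has degree exactly d.
```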

Recall that [math]\displaystyle{ \phi(G)=\min_{S:|S|\le\frac{n}{2}}\frac{|\partial S|}{|S|} }[/math]. We call a set [math]\displaystyle{ S\subset V }[/math] with [math]\displaystyle{ \frac{|\partial S|}{|S|}\lt \alpha }[/math] a "bad [math]\displaystyle{ S }[/math]". Then [math]\displaystyle{ \phi(G)\lt \alpha }[/math] if and only if there exists a bad [math]\displaystyle{ S }[/math] of size at most [math]\displaystyle{ \frac{n}{2} }[/math]. Therefore,

[math]\displaystyle{ \begin{align} \Pr[\phi(G)\lt \alpha] &= \Pr\left[\min_{S:|S|\le\frac{n}{2}}\frac{|\partial S|}{|S|}\lt \alpha\right]\\ &= \sum_{k=1}^\frac{n}{2}\Pr[\,\exists \mbox{bad }S\mbox{ of size }k\,]\\ &\le \sum_{k=1}^\frac{n}{2}\sum_{S\in{V\choose k}}\Pr[\,S\mbox{ is bad}\,] \end{align} }[/math]

Let [math]\displaystyle{ R\subset S }[/math] be the set of vertices in [math]\displaystyle{ S }[/math] which have neighbors in [math]\displaystyle{ \bar{S} }[/math], and let [math]\displaystyle{ r=|R| }[/math]. It is obvious that [math]\displaystyle{ |\partial S|\ge r }[/math], thus, for a bad [math]\displaystyle{ S }[/math], [math]\displaystyle{ r\lt \alpha k }[/math]. Therefore, there are at most [math]\displaystyle{ \sum_{r=1}^{\alpha k}{k \choose r} }[/math] possible choices of such [math]\displaystyle{ R }[/math]. For any fixed choice of [math]\displaystyle{ R }[/math], the probability that an edge picked by a vertex in [math]\displaystyle{ S-R }[/math] connects to a vertex in [math]\displaystyle{ S }[/math] is at most [math]\displaystyle{ k/n }[/math], and there are [math]\displaystyle{ d(k-r) }[/math] such edges. For any fixed [math]\displaystyle{ S }[/math] of size [math]\displaystyle{ k }[/math] and [math]\displaystyle{ R }[/math] of size [math]\displaystyle{ r }[/math], the probability that all neighbors of all vertices in [math]\displaystyle{ S-R }[/math] are in [math]\displaystyle{ S }[/math] is at most [math]\displaystyle{ \left(\frac{k}{n}\right)^{d(k-r)} }[/math]. Due to the union bound, for any fixed [math]\displaystyle{ S }[/math] of size [math]\displaystyle{ k }[/math],

[math]\displaystyle{ \begin{align} \Pr[\,S\mbox{ is bad}\,] &\le \sum_{r=1}^{\alpha k}{k \choose r}\left(\frac{k}{n}\right)^{d(k-r)} \le \alpha k {k \choose \alpha k}\left(\frac{k}{n}\right)^{dk(1-\alpha)} \end{align} }[/math]

Therefore,

[math]\displaystyle{ \begin{align} \Pr[\phi(G)\lt \alpha] &\le \sum_{k=1}^\frac{n}{2}\sum_{S\in{V\choose k}}\Pr[\,S\mbox{ is bad}\,]\\ &\le \sum_{k=1}^\frac{n}{2}{n\choose k}\alpha k {k \choose \alpha k}\left(\frac{k}{n}\right)^{dk(1-\alpha)} \\ &\le \sum_{k=1}^\frac{n}{2}\left(\frac{en}{k}\right)^k\alpha k \left(\frac{ek}{\alpha k}\right)^{\alpha k}\left(\frac{k}{n}\right)^{dk(1-\alpha)}&\quad (\mbox{Stirling formula }{n\choose k}\le\left(\frac{en}{k}\right)^k)\\ &\le \sum_{k=1}^\frac{n}{2}\exp(O(k))\left(\frac{k}{n}\right)^{k(d(1-\alpha)-1)}. \end{align} }[/math]

The last line is [math]\displaystyle{ o(1) }[/math] when [math]\displaystyle{ d\ge\frac{2}{1-\alpha} }[/math]. Therefore, [math]\displaystyle{ G }[/math] is an expander graph with expansion ratio at least [math]\displaystyle{ \alpha }[/math] with high probability, for suitable choices of the constants [math]\displaystyle{ d }[/math] and [math]\displaystyle{ \alpha }[/math].

Computing Graph Expansion

Computing graph expansion seems hard, because the definition involves a minimum over exponentially many subsets of vertices. In fact, the problem of deciding whether a graph is an expander is co-NP-complete. For a non-expander [math]\displaystyle{ G }[/math], a vertex set [math]\displaystyle{ S\subset V }[/math] with low expansion ratio is a certificate that [math]\displaystyle{ G }[/math] is not an expander, which can be verified in polynomial time. However, there is no efficient algorithm for computing [math]\displaystyle{ \phi(G) }[/math] unless NP=P.

The expansion ratio of a graph is closely related to the sparsest cut of the graph, the dual problem of the multicommodity flow problem; computing the sparsest cut is NP-hard. Studies of these two problems revolutionized the area of approximation algorithms.

We will see right now that although it is hard to compute the expansion ratio exactly, it can be approximated by an efficiently computable algebraic quantity of the graph.