# Randomized Algorithms (Spring 2014)/Expander Graphs and Mixing

# Expander Graphs

According to Wikipedia:

- "Expander graphs have found extensive applications in computer science, in designing algorithms, error correcting codes, extractors, pseudorandom generators, sorting networks and robust computer networks. They have also been used in proofs of many important results in computational complexity theory, such as SL=L and the PCP theorem. In cryptography too, expander graphs are used to construct hash functions."

We will not explore everything about expander graphs, but will focus on the behavior of random walks on expander graphs.

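Consider an undirected (multi-)graph $G(V,E)$, where parallel edges between two vertices are allowed.

Some notation:

- For $S,T\subset V$, let $E(S,T)=\{uv\in E\mid u\in S, v\in T\}$.
- The **edge boundary** of a set $S\subset V$, denoted $\partial S$, is $\partial S=E(S,\bar{S})$.

**Definition (Graph expansion)**
- The **expansion ratio** of an undirected graph $G$ on $n$ vertices is defined as
  - $\phi(G)=\min_{S\subset V,\ 1\le|S|\le\frac{n}{2}}\frac{|\partial S|}{|S|}$.

**Expander graphs** are **$d$-regular** (multi)graphs with $d=O(1)$ and $\phi(G)=\Omega(1)$.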

This definition states the following properties of expander graphs:

- Expander graphs are sparse graphs. This is because the number of edges is $dn/2=O(n)$.
- Despite the sparsity, expander graphs have good connectivity. This is guaranteed by the expansion ratio.
- This one is implicit: an expander graph is really a *family of graphs* $\{G_n\}$, where $n$ is the number of vertices. The asymptotic orders $O(1)$ and $\Omega(1)$ in the definition are relative to the number of vertices $n$, which grows to infinity.

The following fact is directly implied by the definition.

- An expander graph has diameter $O(\log n)$.

The proof is left as an exercise.

For a vertex set $S$, the size of the edge boundary $|\partial S|$ can be seen as the "perimeter" of $S$, and $|S|$ can be seen as the "volume" of $S$. The expansion property can be interpreted as a combinatorial version of the isoperimetric inequality.
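As a small illustration of the definition, the following sketch computes the expansion ratio (the minimum of boundary size over set size, taken over nonempty vertex sets of size at most half the graph) by brute force. The function names are hypothetical, the graph is assumed to be given as a list of edges over vertices `0..n-1`, and the enumeration takes exponential time, so it is only usable on very small graphs.

```python
from itertools import combinations


def edge_boundary_size(edges, S):
    """Number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u, v in edges if (u in S) != (v in S))


def expansion_ratio(n, edges):
    """Brute-force minimum of |boundary(S)| / |S| over 1 <= |S| <= n/2."""
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            best = min(best, edge_boundary_size(edges, S) / k)
    return best


def cycle_edges(n):
    """Edges of the cycle on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete_edges(n):
    """Edges of the complete graph on vertices 0..n-1."""
    return [(u, v) for u in range(n) for v in range(u + 1, n)]
```

On the 8-cycle the minimum is attained by an arc of 4 consecutive vertices (2 boundary edges over 4 vertices), while on the complete graph on 6 vertices every set of size 3 has 9 boundary edges, so the two graphs have expansion ratio 0.5 and 3 respectively.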

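- Vertex expansion
- We can alternatively define the **vertex expansion**. For a vertex set $S\subseteq V$, its **vertex boundary**, denoted $\delta S$, is defined as
  - $\delta S=\{u\notin S\mid uv\in E\text{ and }v\in S\}$,
- and the **vertex expansion** of a graph $G$ is $\psi(G)=\min_{S\subseteq V,\ 1\le|S|\le\frac{n}{2}}\frac{|\delta S|}{|S|}$.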

## Existence of expander graph

We will show the existence of expander graphs by the probabilistic method. In order to do so, we need to generate random $d$-regular graphs.

Suppose that $d$ is even. We can generate a random $d$-regular graph $G(V,E)$ as follows:

- Let $V$ be the vertex set. Uniformly and independently choose $\frac{d}{2}$ cycles, each passing through all vertices of $V$.
- For each vertex $v$ and every cycle, if the two neighbors of $v$ in that cycle are $w$ and $u$, add the two edges $wv$ and $uv$ to $E$.
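The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, each random cycle is obtained by reading a uniformly shuffled vertex order cyclically, and each cycle edge is added once (so that every cycle contributes degree 2 to every vertex). It assumes at least 3 vertices so that no self-loops arise.

```python
import random
from collections import Counter


def random_regular_multigraph(n, d, rng=random):
    """Union of d/2 independent random cycles on vertices 0..n-1.

    Returns a list of edges; parallel edges may occur, so the result
    is a d-regular multigraph. Assumes n >= 3 and d even.
    """
    if d % 2 != 0:
        raise ValueError("d must be even")
    edges = []
    for _ in range(d // 2):
        order = list(range(n))
        rng.shuffle(order)  # a uniformly random cyclic order of the vertices
        for i in range(n):
            edges.append((order[i], order[(i + 1) % n]))
    return edges


def degrees(n, edges):
    """Degree of every vertex, counting parallel edges with multiplicity."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[v] for v in range(n)]
```

Passing a seeded `random.Random` instance as `rng` makes the construction reproducible, which is convenient when experimenting with the expansion of the sampled graphs.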

The resulting $G(V,E)$ is a multigraph. That is, it may have multiple edges between two vertices. We will show that $G$ is an expander graph with high probability. Formally, for some constant $d$ and constant $\alpha$,

- $\Pr[\phi(G)\ge\alpha]=1-o(1)$.

By the probabilistic method, this shows that there exist expander graphs. In fact, the above probability bound shows something much stronger: it shows that almost every regular graph is an expander.

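Recall that $\phi(G)=\min_{S:|S|\le\frac{n}{2}}\frac{|\partial S|}{|S|}$. We call a set $S\subset V$ with $\frac{|\partial S|}{|S|}<\alpha$ a "bad $S$". Then $\phi(G)<\alpha$ if and only if there exists a bad $S$ of size at most $\frac{n}{2}$. Therefore,

$$\Pr[\phi(G)<\alpha]=\Pr\left[\exists S:\ |S|\le\frac{n}{2},\ \frac{|\partial S|}{|S|}<\alpha\right]\le\sum_{k=1}^{n/2}\sum_{S\in\binom{V}{k}}\Pr\left[\frac{|\partial S|}{|S|}<\alpha\right].$$

Let $R\subseteq S$ be the set of vertices in $S$ which have neighbors in $\bar{S}$, and let $r=|R|$. It is obvious that $|\partial S|\ge r$, thus, for a bad $S$ of size $k$, $r<\alpha k$. Therefore, there are at most $\sum_{r\le\alpha k}\binom{k}{r}$ possible choices of such $R$. For any fixed choice of $R$, the probability that an edge picked by a vertex in $S\setminus R$ connects to a vertex in $S$ is at most $k/n$, and there are $d(k-r)$ such edges. For any fixed $S$ of size $k$ and $R$ of size $r$, the probability that all neighbors of all vertices in $S\setminus R$ are in $S$ is at most $\left(\frac{k}{n}\right)^{d(k-r)}$. Due to the union bound, for any fixed $S$ of size $k$ (and assuming $\alpha\le\frac{1}{2}$),

$$\Pr\left[\frac{|\partial S|}{|S|}<\alpha\right]\le\sum_{r\le\alpha k}\binom{k}{r}\left(\frac{k}{n}\right)^{d(k-r)}\le(\alpha k+1)\binom{k}{\alpha k}\left(\frac{k}{n}\right)^{dk(1-\alpha)}.$$

Therefore,

$$\Pr[\phi(G)<\alpha]\le\sum_{k=1}^{n/2}\binom{n}{k}(\alpha k+1)\binom{k}{\alpha k}\left(\frac{k}{n}\right)^{dk(1-\alpha)}\le\sum_{k=1}^{n/2}(\alpha k+1)\left[\mathrm{e}\left(\frac{\mathrm{e}}{\alpha}\right)^{\alpha}\left(\frac{k}{n}\right)^{d(1-\alpha)-1}\right]^{k},$$

where the second inequality uses $\binom{n}{k}\le\left(\frac{\mathrm{e}n}{k}\right)^k$ and $\binom{k}{\alpha k}\le\left(\frac{\mathrm{e}}{\alpha}\right)^{\alpha k}$.

The last line is $o(1)$ when $d$ is a sufficiently large constant relative to $\alpha$. Therefore, $G$ is an expander graph with expansion ratio $\alpha$ with high probability for suitable choices of constant $d$ and constant $\alpha$.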

## Computing graph expansion

Computation of graph expansion seems hard, because the definition involves the minimum over exponentially many subsets of vertices. In fact, the problem of deciding whether a graph is an expander is **co-NP-complete**. For a non-expander $G$, a vertex set $S\subset V$ with low expansion ratio is a proof of the fact that $G$ is not an expander, which can be verified in polynomial time. However, there is no efficient algorithm for computing $\phi(G)$ unless **NP**=**P**.

The expansion ratio of a graph is closely related to the sparsest cut of the graph, an NP-hard problem that is dual to the multicommodity flow problem. Studies of these two problems revolutionized the area of approximation algorithms.

We will now see that although it is hard to compute the expansion ratio exactly, it can be approximated by an efficiently computable algebraic quantity of the graph.

# Spectral Graph Theory

## Graph spectrum

The **adjacency matrix** of an $n$-vertex graph $G$, denoted $A=A(G)$, is an $n\times n$ matrix where $A(u,v)$ is the number of edges in $G$ between vertex $u$ and vertex $v$. Because $A$ is a symmetric matrix with real entries, due to the spectral theorem, it has real eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$, which are associated with an orthonormal system of eigenvectors $v_1,v_2,\ldots,v_n$ with $Av_i=\lambda_iv_i$. We call the eigenvalues of $A$ the **spectrum** of the graph $G$.

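The spectrum of a graph contains a lot of information about the graph. For example, suppose that $G$ is $d$-regular. Then the following lemma holds.

**Lemma**
- $|\lambda_i|\le d$ for all $1\le i\le n$.
- $\lambda_1=d$ and the corresponding eigenvector is $\left(\frac{1}{\sqrt{n}},\frac{1}{\sqrt{n}},\ldots,\frac{1}{\sqrt{n}}\right)$.
- $G$ is connected if and only if $\lambda_1>\lambda_2$.
- If $G$ is bipartite then $\lambda_1=-\lambda_n$.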

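**Proof.** Let $A$ be the adjacency matrix of $G$, with entries $a_{ij}$. It is obvious that $\sum_{j}a_{ij}=d$ for any $i$.

- (1) Suppose that $Ax=\lambda x$ with $x\neq\mathbf{0}$, and let $x_i$ be an entry of $x$ with the largest absolute value. Since $(Ax)_i=\lambda x_i$, we have
  - $\sum_{j}a_{ij}x_j=\lambda x_i$,
- and so
  - $|\lambda||x_i|=\left|\sum_{j}a_{ij}x_j\right|\le\sum_{j}a_{ij}|x_j|\le\sum_{j}a_{ij}|x_i|=d|x_i|$.
- Thus $|\lambda|\le d$.

- (2) is easy to check.
- (3) Let $x$ be a nonzero vector for which $Ax=dx$, and let $x_i$ be an entry of $x$ with the largest absolute value. Since $(Ax)_i=dx_i$, we have
  - $\sum_{j}a_{ij}x_j=dx_i$.
- Since $\sum_{j}a_{ij}=d$ and by the maximality of $x_i$, it follows that $x_j=x_i$ for all $j$ with $a_{ij}>0$. Every such $j$ then also attains the largest absolute value, so the same argument applies to it. Thus, $x_i=x_j$ if $i$ and $j$ are adjacent, which implies that $x_i=x_j$ if $i$ and $j$ are connected by a path. For connected $G$, all vertices are connected, thus all $x_i$ are equal. This shows that if $G$ is connected, the eigenvalue $d=\lambda_1$ has multiplicity 1, thus $\lambda_1>\lambda_2$.

- Otherwise, $G$ is disconnected. Then for two different components, we have $Ax=dx$ and $Ay=dy$, where the entries of $x$ and $y$ are nonzero only for the vertices in their respective components. Then $A(\alpha x+\beta y)=d(\alpha x+\beta y)$ for all scalars $\alpha,\beta$. Thus, the multiplicity of $d$ is greater than 1, so $\lambda_1=\lambda_2$.

- (4) If $G$ is bipartite, then the vertex set can be partitioned into two disjoint nonempty sets $V_1$ and $V_2$ such that all edges have one endpoint in each of $V_1$ and $V_2$. Algebraically, this means that the adjacency matrix can be organized into the form
  - $P^TAP=\begin{bmatrix}0 & B\\ B^T & 0\end{bmatrix}$,
- where $P$ is a permutation matrix, which does not change the eigenvalues.
- If $x$ is an eigenvector corresponding to the eigenvalue $\lambda$, then $x'$, which is obtained from $x$ by changing the sign of the entries corresponding to vertices in $V_2$, is an eigenvector corresponding to the eigenvalue $-\lambda$. It follows that the spectrum of a bipartite graph is symmetric with respect to 0, and in particular $\lambda_n=-\lambda_1$.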
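Parts (2) and (4) of the lemma can be checked directly on a small example. The sketch below (hypothetical helper names, plain Python lists instead of a linear algebra library) builds the adjacency matrix of the 6-cycle, which is 2-regular and bipartite, and multiplies it by the all-ones vector and by the vector that alternates in sign between the two sides of the bipartition.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of a loop-free multigraph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] += 1
        A[v][u] += 1
    return A


def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


n = 6
A = adjacency_matrix(n, [(i, (i + 1) % n) for i in range(n)])

ones = [1] * n                               # candidate eigenvector for eigenvalue d = 2
alternating = [(-1) ** i for i in range(n)]  # sign flipped on odd vertices

A_ones = mat_vec(A, ones)                # equals 2 * ones
A_alternating = mat_vec(A, alternating)  # equals -2 * alternating
```

The alternating vector is exactly the all-ones eigenvector with the signs flipped on one side of the bipartition, which is the sign-change construction used in part (4) of the proof.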

## Cheeger's Inequality

One of the most exciting results in spectral graph theory is the following theorem, which relates the graph expansion to the spectral gap.

**Theorem (Cheeger's inequality)**
- Let $G$ be a $d$-regular graph with spectrum $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$. Then
  - $\frac{d-\lambda_2}{2}\le\phi(G)\le\sqrt{2d(d-\lambda_2)}$.

The theorem was first stated for Riemannian manifolds, and was proved by Cheeger and Buser (for different directions of the inequalities). The discrete case was proved independently by Dodziuk and Alon-Milman.

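For a $d$-regular graph, the quantity $(d-\lambda_2)$ is called the **spectral gap**. The name is due to the fact that it is the gap between the first and the second largest eigenvalues of the graph.

If we write $\alpha=1-\frac{\lambda_2}{d}$ (sometimes called the normalized spectral gap), Cheeger's inequality takes a nicer form:

- $\frac{\alpha}{2}\le\frac{\phi}{d}\le\sqrt{2\alpha}$ or equivalently $\frac{1}{2}\left(\frac{\phi}{d}\right)^2\le\alpha\le 2\left(\frac{\phi}{d}\right)$.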
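To get a feeling for the two sides of Cheeger's inequality, here is a small numeric sketch on cycles. It relies on two standard facts that are not derived in these notes and are therefore assumptions of the example: the second largest adjacency eigenvalue of the cycle on $n$ vertices is $2\cos(2\pi/n)$, and its expansion ratio is $2/\lfloor n/2\rfloor$ (attained by an arc of $\lfloor n/2\rfloor$ consecutive vertices). The function name is hypothetical.

```python
import math


def cheeger_bounds_cycle(n):
    """Return (lower bound, expansion ratio, upper bound) for the n-cycle.

    lower = (d - lambda_2) / 2 and upper = sqrt(2 d (d - lambda_2)),
    with d = 2 and lambda_2 = 2 cos(2 pi / n).
    """
    d = 2
    lambda2 = 2 * math.cos(2 * math.pi / n)
    gap = d - lambda2                 # spectral gap
    phi = 2 / (n // 2)                # expansion ratio of the n-cycle
    return gap / 2, phi, math.sqrt(2 * d * gap)


for n in (4, 8, 64, 1024):
    lower, phi, upper = cheeger_bounds_cycle(n)
    print(n, lower, phi, upper)
```

For large $n$ the spectral gap of the cycle is of order $1/n^2$ while the expansion ratio is of order $1/n$, so on this family the upper bound is tight up to a constant factor and the lower bound is far from tight.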

### Optimization Characterization of Eigenvalues

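**Theorem (Rayleigh-Ritz theorem)**
- Let $A$ be a symmetric $n\times n$ matrix. Let $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$ be the eigenvalues of $A$ and $v_1,v_2,\ldots,v_n$ be the corresponding eigenvectors. Then
  - $\lambda_1=\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}$
- and
  - $\lambda_2=\max_{x\bot v_1}\frac{x^TAx}{x^Tx}$,
- where both maxima are taken over nonzero vectors $x$.

**Proof.** Without loss of generality, we may assume that $v_1,v_2,\ldots,v_n$ form an orthonormal eigenbasis. Then it holds that

- $\frac{v_1^TAv_1}{v_1^Tv_1}=\lambda_1v_1^Tv_1=\lambda_1$,

thus we have $\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}\ge\lambda_1$.

Let $x\in\mathbb{R}^n$ be an arbitrary nonzero vector and let $y=\frac{x}{\sqrt{x^Tx}}=\frac{x}{\|x\|}$ be its normalization. Since $v_1,v_2,\ldots,v_n$ form an orthonormal basis, $y$ can be expressed as $y=\sum_{i=1}^nc_iv_i$. Then

$$\frac{x^TAx}{x^Tx}=y^TAy=\left(\sum_{i=1}^nc_iv_i\right)^TA\left(\sum_{i=1}^nc_iv_i\right)=\sum_{i=1}^n\lambda_ic_i^2\le\lambda_1\sum_{i=1}^nc_i^2=\lambda_1\|y\|^2=\lambda_1.$$

Therefore, $\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}\le\lambda_1$. Altogether we have

$$\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}=\lambda_1.$$

The proof of $\lambda_2=\max_{x\bot v_1}\frac{x^TAx}{x^Tx}$ is similar. In the first part take $x=v_2$ to show that $\max_{x\bot v_1}\frac{x^TAx}{x^Tx}\ge\lambda_2$; and in the second part take an arbitrary nonzero $x\bot v_1$ and $y=\frac{x}{\|x\|}$. Notice that $y\bot v_1$, thus $y=\sum_{i=1}^nc_iv_i$ with $c_1=0$.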
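The first statement of the theorem can be observed numerically: the quotient $x^TAx/x^Tx$ never exceeds the largest eigenvalue and attains it at the top eigenvector. The sketch below uses the adjacency matrix of the 5-cycle, whose largest eigenvalue is 2 with the all-ones vector as eigenvector; the helper name and the random sampling are illustrative assumptions, not part of the theorem.

```python
import random


def rayleigh_quotient(A, x):
    """Compute x^T A x / x^T x for a nonzero vector x."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)


# Adjacency matrix of the 5-cycle: i and j are adjacent iff they differ by 1 mod 5.
n = 5
A = [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]

rng = random.Random(0)
samples = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(1000)]
largest_sampled = max(rayleigh_quotient(A, x) for x in samples)
at_ones = rayleigh_quotient(A, [1] * n)  # the quotient at the top eigenvector
```

Random sampling only gives a lower estimate of the maximum, so `largest_sampled` stays below `at_ones`; the point of the example is that no sample exceeds the largest eigenvalue.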

The Rayleigh-Ritz Theorem is a special case of a fundamental theorem in linear algebra, called the Courant-Fischer theorem, which characterizes the eigenvalues of a symmetric matrix by a series of optimizations:

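**Theorem (Courant-Fischer theorem)**
- Let $A$ be a symmetric $n\times n$ matrix with eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$. Then
  - $\lambda_k=\max_{v_1,\ldots,v_{n-k}\in\mathbb{R}^n}\ \min_{x\neq\mathbf{0},\ x\bot v_1,\ldots,v_{n-k}}\frac{x^TAx}{x^Tx}=\min_{v_1,\ldots,v_{k-1}\in\mathbb{R}^n}\ \max_{x\neq\mathbf{0},\ x\bot v_1,\ldots,v_{k-1}}\frac{x^TAx}{x^Tx}$.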

### Graph Laplacian

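Let $G(V,E)$ be a $d$-regular graph on $n$ vertices and let $A$ be its adjacency matrix. We define $L=dI-A$ to be the **Laplacian** of the graph $G$. Take $x\in\mathbb{R}^V$ as a distribution over vertices; its Laplacian quadratic form $x^TLx$ measures the "smoothness" of $x$ over the graph topology, just as the Laplacian operator does for differentiable functions.

**Laplacian Property**
- For any vector $x\in\mathbb{R}^n$, it holds that
  - $x^TLx=\sum_{uv\in E}(x_u-x_v)^2$.

**Proof.**

$$x^TLx=\sum_{u,v\in V}x_u(dI-A)_{uv}x_v=\sum_{u\in V}dx_u^2-\sum_{u,v\in V}A(u,v)x_ux_v=d\sum_{u\in V}x_u^2-2\sum_{uv\in E}x_ux_v.$$

On the other hand,

$$\sum_{uv\in E}(x_u-x_v)^2=\sum_{uv\in E}\left(x_u^2+x_v^2\right)-2\sum_{uv\in E}x_ux_v=d\sum_{u\in V}x_u^2-2\sum_{uv\in E}x_ux_v,$$

where the last equation holds because every vertex is an endpoint of exactly $d$ edges.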
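The Laplacian property is an identity between two ways of evaluating the quadratic form of $L=dI-A$: through the matrix, and as a sum of squared differences over the edges. The following sketch (hypothetical function names, integer vectors so that the comparison is exact) evaluates both sides on the 5-cycle.

```python
def quadratic_form_via_matrix(n, d, edges, x):
    """Evaluate x^T (dI - A) x from the adjacency matrix of a d-regular graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] += 1
        A[v][u] += 1
    return sum(
        x[u] * (d * x[u] - sum(A[u][v] * x[v] for v in range(n)))
        for u in range(n)
    )


def quadratic_form_via_edges(edges, x):
    """Evaluate the sum over edges uv of (x_u - x_v)^2."""
    return sum((x[u] - x[v]) ** 2 for u, v in edges)


n, d = 5, 2
edges = [(i, (i + 1) % n) for i in range(n)]  # the 5-cycle
x = [3, -1, 4, 1, -5]

lhs = quadratic_form_via_matrix(n, d, edges, x)
rhs = quadratic_form_via_edges(edges, x)
```

For this vector both sides evaluate to 150. The edge form makes it evident that the quadratic form is nonnegative and vanishes exactly on vectors that are constant on every connected component.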

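Applying the Rayleigh-Ritz theorem to the Laplacian matrix of the graph, we have the following "variational characterization" of the spectral gap $d-\lambda_2$.

**Theorem (Variational Characterization)**
- Let $G(V,E)$ be a $d$-regular graph on $n$ vertices. Suppose that its adjacency matrix is $A$, whose eigenvalues are $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$. Let $L=dI-A$ be the Laplacian matrix. Then
  - $d-\lambda_2=\min_{x\bot\mathbf{1}}\frac{x^TLx}{x^Tx}=\min_{x\bot\mathbf{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}$.

**Proof.** For a $d$-regular graph, we know that $\lambda_1=d$ and $A\mathbf{1}=d\mathbf{1}$, thus $\mathbf{1}$ is an eigenvector of $\lambda_1$. Due to the Rayleigh-Ritz theorem, it holds that $\lambda_2=\max_{x\bot\mathbf{1}}\frac{x^TAx}{x^Tx}$. Then

$$\min_{x\bot\mathbf{1}}\frac{x^TLx}{x^Tx}=\min_{x\bot\mathbf{1}}\frac{x^T(dI-A)x}{x^Tx}=\min_{x\bot\mathbf{1}}\left(d-\frac{x^TAx}{x^Tx}\right)=d-\max_{x\bot\mathbf{1}}\frac{x^TAx}{x^Tx}=d-\lambda_2.$$

We know that the graph Laplacian satisfies $x^TLx=\sum_{uv\in E}(x_u-x_v)^2$. So the variational characterization of the second eigenvalue of the graph is proved.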

### Proof of Cheeger's Inequality

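We will first give an informal explanation of why Cheeger's inequality holds.

Recall that the expansion is defined as

$$\phi(G)=\min_{S\subset V,\ 1\le|S|\le\frac{n}{2}}\frac{|\partial S|}{|S|}.$$

Let $\chi_S$ be the **characteristic vector** of the set $S$ such that

$$\chi_S(v)=\begin{cases}1 & v\in S,\\ 0 & \text{otherwise}.\end{cases}$$

It is easy to see that

$$\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\frac{|\partial S|}{|S|}.$$

Thus, the expansion can be expressed algebraically as

$$\phi(G)=\min_{x\in\{0,1\}^n,\ 1\le\|x\|_1\le\frac{n}{2}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.$$

On the other hand, due to the variational characterization of the spectral gap, we have

$$d-\lambda_2=\min_{x\bot\mathbf{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.$$

We can easily observe the similarity between the two formulas. Both the expansion ratio and the spectral gap can be characterized by optimizations of the same objective function over different domains (for the spectral gap, the optimization is over all $x\bot\mathbf{1}$; and for the expansion ratio, it is over all vectors $x\in\{0,1\}^n$ with at most $n/2$ many 1-entries).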
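The identity behind this comparison is that, for the 0/1 characteristic vector of a set $S$, the sum of squared differences over the edges counts the boundary edges of $S$ and the sum of squared entries counts the vertices of $S$. A small sketch with hypothetical helper names, using exact fractions:

```python
from fractions import Fraction


def characteristic_vector(n, S):
    """0/1 vector with a 1 exactly at the vertices of S."""
    S = set(S)
    return [1 if v in S else 0 for v in range(n)]


def edge_quotient(edges, x):
    """Sum over edges of (x_u - x_v)^2, divided by the sum of x_v^2."""
    numerator = sum((x[u] - x[v]) ** 2 for u, v in edges)
    denominator = sum(xv * xv for xv in x)
    return Fraction(numerator, denominator)


n = 8
edges = [(i, (i + 1) % n) for i in range(n)]  # the 8-cycle
S = [0, 1, 2]                                 # an arc with 2 boundary edges

chi = characteristic_vector(n, S)
ratio = edge_quotient(edges, chi)             # boundary size / set size = 2/3
```

Minimizing this quotient over 0/1 vectors with at most $n/2$ ones gives the expansion ratio, while minimizing the same quotient over all real vectors orthogonal to the all-ones vector gives the spectral gap.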

- Notation

Throughout the proof, we assume that $G(V,E)$ is a $d$-regular graph on $n$ vertices, $A$ is the adjacency matrix, whose eigenvalues are $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$, and $L=dI-A$ is the graph Laplacian.

#### Large spectral gap implies high expansion

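**Cheeger's inequality (lower bound)**
- $\frac{d-\lambda_2}{2}\le\phi(G)$.

**Proof.** Let $S^*\subset V$, $|S^*|\le\frac{n}{2}$, be the vertex set achieving the optimal expansion ratio $\phi(G)=\frac{|\partial S^*|}{|S^*|}$, and let $x\in\mathbb{R}^n$ be the vector defined as

$$x_v=\begin{cases}\frac{1}{|S^*|} & \text{if }v\in S^*,\\ -\frac{1}{|\overline{S^*}|} & \text{if }v\in\overline{S^*}.\end{cases}$$

Clearly, $x\cdot\mathbf{1}=\sum_{v\in S^*}\frac{1}{|S^*|}-\sum_{v\in\overline{S^*}}\frac{1}{|\overline{S^*}|}=0$, thus $x\bot\mathbf{1}$.

Due to the variational characterization of the second eigenvalue,

$$d-\lambda_2\le\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}=\frac{|\partial S^*|\left(\frac{1}{|S^*|}+\frac{1}{|\overline{S^*}|}\right)^2}{\frac{1}{|S^*|}+\frac{1}{|\overline{S^*}|}}=\left(\frac{1}{|S^*|}+\frac{1}{|\overline{S^*}|}\right)|\partial S^*|\le\frac{2|\partial S^*|}{|S^*|}=2\phi(G),$$

where the last inequality uses $|\overline{S^*}|\ge|S^*|$.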

#### High expansion implies large spectral gap

We next prove the upper bound direction of Cheeger's inequality:

**Cheeger's inequality (upper bound)**
- $\phi(G)\le\sqrt{2d(d-\lambda_2)}$.

This direction is harder than the lower bound direction. But it is mathematically more interesting and also more useful to us for analyzing the mixing time of random walks.

We prove the following equivalent inequality:

$$\frac{\phi^2}{2d}\le d-\lambda_2.$$

Let $x$ satisfy that

- $Ax=\lambda_2x$, i.e., it is an eigenvector for $\lambda_2$;
- $|\{v\in V\mid x_v>0\}|\le\frac{n}{2}$, i.e., $x$ has at most $\frac{n}{2}$ positive entries. (We can always choose $x$ to be $-x$ if this is not satisfied.)

And let the nonnegative vector $y$ be defined as

$$y_v=\begin{cases}x_v & x_v>0,\\ 0 & \text{otherwise}.\end{cases}$$

We then prove the following inequalities:

- $\frac{y^TLy}{y^Ty}\le d-\lambda_2$;
- $\frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty}$.

The theorem is then a simple consequence of combining these two inequalities.

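We prove the first inequality:

**Lemma**
- $\frac{y^TLy}{y^Ty}\le d-\lambda_2$.

**Proof.** If $x_u>0$ (so that $y_u=x_u$), then

$$(Ly)_u=((dI-A)y)_u=dy_u-\sum_{v}A(u,v)y_v=dx_u-\sum_{v:x_v>0}A(u,v)x_v\le dx_u-\sum_{v}A(u,v)x_v=((dI-A)x)_u=(d-\lambda_2)x_u.$$

Then

$$y^TLy=\sum_{u}y_u(Ly)_u=\sum_{u:x_u>0}x_u(Ly)_u\le(d-\lambda_2)\sum_{u:x_u>0}x_u^2=(d-\lambda_2)\sum_{u}y_u^2=(d-\lambda_2)y^Ty,$$

which proves the lemma.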

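We then prove the second inequality:

**Lemma**
- $\frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty}$.

**Proof.** To prove this, we introduce a new quantity $\sum_{uv\in E}|y_u^2-y_v^2|$ and show that

- $\phi\,y^Ty\le\sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d\cdot y^TLy\cdot y^Ty}$.

Squaring both ends and dividing by $2d\,(y^Ty)^2$ gives the desired inequality $\frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty}$.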

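**Lemma**
- $\sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d\cdot y^TLy\cdot y^Ty}$.

**Proof.** By the Cauchy-Schwarz inequality,

$$\sum_{uv\in E}|y_u^2-y_v^2|=\sum_{uv\in E}|y_u-y_v||y_u+y_v|\le\sqrt{\sum_{uv\in E}(y_u-y_v)^2}\sqrt{\sum_{uv\in E}(y_u+y_v)^2}.$$

By the Laplacian property, the first term is $\sqrt{\sum_{uv\in E}(y_u-y_v)^2}=\sqrt{y^TLy}$. By the inequality of arithmetic and geometric means, the second term satisfies

$$\sqrt{\sum_{uv\in E}(y_u+y_v)^2}=\sqrt{\sum_{uv\in E}(y_u^2+2y_uy_v+y_v^2)}\le\sqrt{2\sum_{uv\in E}(y_u^2+y_v^2)}=\sqrt{2d\sum_{u\in V}y_u^2}=\sqrt{2d\,y^Ty}.$$

Combining them together, we have

- $\sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d\cdot y^TLy\cdot y^Ty}$.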

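**Lemma**
- $\phi\,y^Ty\le\sum_{uv\in E}|y_u^2-y_v^2|$.

**Proof.** Suppose that $y$ has $t$ nonzero entries. We know that $t\le n/2$ due to the definition of $y$. We enumerate the vertices $u_1,u_2,\ldots,u_n\in V$ such that

- $y_{u_1}\ge y_{u_2}\ge\cdots\ge y_{u_t}>y_{u_{t+1}}=y_{u_{t+2}}=\cdots=y_{u_n}=0$.

Then

$$\sum_{uv\in E}|y_u^2-y_v^2|=\sum_{u_iu_j\in E,\ i<j}\left(y_{u_i}^2-y_{u_j}^2\right)=\sum_{u_iu_j\in E,\ i<j}\ \sum_{k=i}^{j-1}\left(y_{u_k}^2-y_{u_{k+1}}^2\right)=\sum_{i=1}^{t}\left(y_{u_i}^2-y_{u_{i+1}}^2\right)|\partial\{u_1,\ldots,u_i\}|,$$

where the last equation holds because the term $y_{u_i}^2-y_{u_{i+1}}^2$ is counted once for every edge $u_ju_k$ with $j\le i<k$, and the terms with $i>t$ vanish.

We have the following universal equation for sums:

$$\sum_{i=1}^{t}i\,(a_i-a_{i+1})=\sum_{i=1}^{t}a_i-t\,a_{t+1}.$$

Notice that $|\{u_1,\ldots,u_i\}|=i$, which is at most $n/2$ since $i\le t\le n/2$, so $|\partial\{u_1,\ldots,u_i\}|\ge\phi\,i$ by the definition of the expansion ratio. Therefore, combining these together (with $a_i=y_{u_i}^2$ and $a_{t+1}=0$), we have

$$\sum_{uv\in E}|y_u^2-y_v^2|=\sum_{i=1}^{t}\left(y_{u_i}^2-y_{u_{i+1}}^2\right)|\partial\{u_1,\ldots,u_i\}|\ge\phi\sum_{i=1}^{t}i\left(y_{u_i}^2-y_{u_{i+1}}^2\right)=\phi\sum_{i=1}^{t}y_{u_i}^2=\phi\,y^Ty.$$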

# Mixing Time

The **mixing time** of a Markov chain gives the rate at which a Markov chain converges to the stationary distribution. To rigorously define this notion, we need a way of measuring the closeness between two distributions.

## Total Variation Distance

In probability theory, the **total variation distance** measures the difference between two probability distributions.

**Definition (total variation distance)**
- Let $p$ and $q$ be two probability distributions over the same finite state space $\Omega$. The **total variation distance** between $p$ and $q$ is defined as
  - $\|p-q\|_{TV}=\frac{1}{2}\sum_{x\in\Omega}|p(x)-q(x)|=\frac{1}{2}\|p-q\|_1$,
- where $\|\cdot\|_1$ is the $\ell_1$-norm of vectors.

It can be verified (left as an exercise) that

- $\max_{A\subseteq\Omega}|p(A)-q(A)|=\frac{1}{2}\sum_{x\in\Omega}|p(x)-q(x)|$,

thus the total variation distance can be equivalently defined as

- $\|p-q\|_{TV}=\max_{A\subseteq\Omega}|p(A)-q(A)|$.

So the total variation distance between two distributions gives an upper bound on the difference between the probabilities of the same event according to the two distributions.
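The two equivalent definitions can be compared on a small example. The sketch below (hypothetical function names, distributions given as lists over the states `0..n-1`, exact fractions to avoid rounding) computes half of the $\ell_1$ distance and, by brute force over all events, the largest difference in the probability of an event.

```python
from fractions import Fraction
from itertools import combinations


def total_variation(p, q):
    """Half of the l1 distance between two distributions on the same states."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def max_event_gap(p, q):
    """Largest |p(A) - q(A)| over all events A, by exhaustive enumeration."""
    n = len(p)
    best = 0
    for k in range(n + 1):
        for A in combinations(range(n), k):
            gap = abs(sum(p[i] for i in A) - sum(q[i] for i in A))
            best = max(best, gap)
    return best


p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q = [Fraction(1, 3)] * 3

tv = total_variation(p, q)
gap = max_event_gap(p, q)
```

Here both quantities equal $1/6$; the maximizing event is the set of states where $p$ exceeds $q$, which in this example is the single state 0.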

## Mixing Time

**Definition (mixing time)**
- Let $\pi$ be the stationary distribution of the chain, and $p_x^{(t)}$ be the distribution after $t$ steps when the initial state is $x$.
- $\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}$ is the distance to the stationary distribution after $t$ steps, started at state $x$.
- $\Delta(t)=\max_{x\in\Omega}\Delta_x(t)$ is the maximum distance to the stationary distribution after $t$ steps.
- $\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}$ is the time until the total variation distance to the stationary distribution, started at the initial state $x$, reaches $\epsilon$.
- $\tau(\epsilon)=\max_{x\in\Omega}\tau_x(\epsilon)$ is the time until the total variation distance to the stationary distribution, started at the worst possible initial state, reaches $\epsilon$.

We note that $\Delta_x(t)$ is monotonically non-increasing in $t$. So the definition of $\tau_x(\epsilon)$ makes sense, and it is actually the inverse of $\Delta_x(t)$.

**Definition (mixing time)**
- The **mixing time** $\tau_{\mathrm{mix}}$ of a Markov chain is $\tau(1/2\mathrm{e})$.

The mixing time is the time until the total variation distance to the stationary distribution, starting from the worst possible initial state $x\in\Omega$, reaches $\frac{1}{2\mathrm{e}}$. The value $\frac{1}{2\mathrm{e}}$ is chosen just for the convenience of calculation. The next proposition says that $\tau(\epsilon)$ with general $\epsilon$ can be estimated from $\tau_{\mathrm{mix}}$.

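**Proposition**
- $\Delta(k\cdot\tau_{\mathrm{mix}})\le\mathrm{e}^{-k}$ for any integer $k\ge1$.
- $\tau(\epsilon)\le\tau_{\mathrm{mix}}\cdot\left\lceil\ln\frac{1}{\epsilon}\right\rceil$.

So the distance to the stationary distribution decays exponentially in multiples of $\tau_{\mathrm{mix}}$.

The formal proofs of both the monotonicity of $\Delta_x(t)$ and the above proposition use the coupling technique and are postponed to the next section.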

# Spectral approach for symmetric chain

We consider the **symmetric Markov chains** defined on the state space $\Omega$, where the transition matrix $P$ is symmetric.

We have the following powerful spectral theorem for symmetric matrices.

**Theorem (Spectral theorem)**
- Let $A$ be a symmetric $n\times n$ matrix, whose eigenvalues are $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$. There exist eigenvectors $v_1,v_2,\ldots,v_n$ such that $v_iA=\lambda_iv_i$ for all $i=1,\ldots,n$ and $v_1,v_2,\ldots,v_n$ are **orthonormal**.

A set of orthonormal vectors $v_1,v_2,\ldots,v_n$ satisfies that

- for any $i\neq j$, $v_i$ is orthogonal to $v_j$, which means that $v_i^Tv_j=0$, denoted $v_i\bot v_j$;
- all $v_i$ have unit length, i.e., $\|v_i\|_2=1$.

Since the eigenvectors $v_1,v_2,\ldots,v_n$ are orthonormal, we can use them as an orthogonal basis, so that any vector $x\in\mathbb{R}^n$ can be expressed as $x=\sum_{i=1}^nc_iv_i$ where $c_i=x^Tv_i$, therefore

$$xA=\sum_{i=1}^nc_iv_iA=\sum_{i=1}^nc_i\lambda_iv_i.$$

So multiplying by $A$ corresponds to multiplying the length of $x$ along the direction of every eigenvector by a factor of the corresponding eigenvalue.

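Back to the symmetric Markov chain. Let $\Omega$ be a finite state space of size $N$, and $P$ be a symmetric transition matrix, whose eigenvalues are $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N$. The following hold for a symmetric transition matrix $P$:

- Due to the spectral theorem, there exist orthonormal eigenvectors $v_1,v_2,\ldots,v_N$ such that $v_iP=\lambda_iv_i$ for $i=1,2,\ldots,N$, and any distribution $q$ over $\Omega$ can be expressed as $q=\sum_{i=1}^Nc_iv_i$ where $c_i=q^Tv_i$.
- A symmetric $P$ must be doubly stochastic, thus the stationary distribution $\pi$ is the uniform distribution.

Recall that due to the Perron-Frobenius theorem, $\lambda_1=1$. And $\mathbf{1}P=\mathbf{1}$ since $P$ is doubly stochastic, thus $v_1=\frac{\mathbf{1}}{\|\mathbf{1}\|_2}=\left(\frac{1}{\sqrt{N}},\ldots,\frac{1}{\sqrt{N}}\right)$.

When $q$ is a distribution, i.e., $q$ is a nonnegative vector and $\|q\|_1=1$, it holds that $c_1=q^Tv_1=\frac{1}{\sqrt{N}}$ and $c_1v_1=\left(\frac{1}{N},\ldots,\frac{1}{N}\right)=\pi$, thus

$$q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i,$$

and the distribution at time $t$ when the initial distribution is $q$ is given by

$$qP^t=\pi P^t+\sum_{i=2}^Nc_iv_iP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i.$$

It is easy to see that this distribution converges to $\pi$ when the absolute values of $\lambda_2,\ldots,\lambda_N$ are all less than 1. And the rate at which it converges to $\pi$, namely the mixing rate, is determined by the quantity $\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}$, which is the largest absolute value of an eigenvalue other than $\lambda_1$.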

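**Theorem**
- Let $P$ be the transition matrix for a symmetric Markov chain on state space $\Omega$ where $|\Omega|=N$. Let $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N$ be the spectrum of $P$ and $\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}$. The mixing rate of the Markov chain is
  - $\tau(\epsilon)\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\max}}$.

**Proof.** As analysed above, if $P$ is symmetric, it has orthonormal eigenvectors $v_1,\ldots,v_N$ such that any distribution $q$ over $\Omega$ can be expressed as

$$q=\sum_{i=1}^Nc_iv_i=\pi+\sum_{i=2}^Nc_iv_i$$

with $c_i=q^Tv_i$, and

$$qP^t=\pi+\sum_{i=2}^Nc_i\lambda_i^tv_i.$$

Thus,

$$\|qP^t-\pi\|_1=\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_1\le\sqrt{N}\left\|\sum_{i=2}^Nc_i\lambda_i^tv_i\right\|_2=\sqrt{N}\sqrt{\sum_{i=2}^Nc_i^2\lambda_i^{2t}}\le\sqrt{N}\lambda_{\max}^t\sqrt{\sum_{i=2}^Nc_i^2}\le\sqrt{N}\lambda_{\max}^t\|q\|_2\le\sqrt{N}\lambda_{\max}^t.$$

The first inequality is the Cauchy-Schwarz inequality. The last inequality is due to the universal relation $\|q\|_2\le\|q\|_1$ and the fact that $q$ is a distribution.

Then for any $x\in\Omega$, denoting by $\mathbf{e}_x$ the indicator vector for $x$ such that $\mathbf{e}_x(x)=1$ and $\mathbf{e}_x(y)=0$ for $y\neq x$, we have

$$\Delta_x(t)=\|\mathbf{e}_xP^t-\pi\|_{TV}=\frac{1}{2}\|\mathbf{e}_xP^t-\pi\|_1\le\frac{\sqrt{N}}{2}\lambda_{\max}^t\le\frac{\sqrt{N}}{2}\mathrm{e}^{-(1-\lambda_{\max})t}.$$

Therefore, we have

$$\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}\le\frac{\frac{1}{2}\ln N+\ln\frac{1}{2\epsilon}}{1-\lambda_{\max}}$$

for any $x\in\Omega$, thus the bound holds for $\tau(\epsilon)=\max_x\tau_x(\epsilon)$.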
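The simplest symmetric chain makes the role of the second eigenvalue explicit. The sketch below is an illustration chosen for this purpose and not taken from the notes: a two-state chain that switches state with probability `a` at every step. Its transition matrix has eigenvalues $1$ and $1-2a$, the stationary distribution is uniform, and the distance to stationarity after $t$ steps from either state is exactly $\frac{1}{2}|1-2a|^t$, which is within the spectral bound $\frac{\sqrt{N}}{2}|1-2a|^t$ with $N=2$.

```python
import math


def two_state_distance(a, t):
    """Total variation distance to uniform after t steps, started at state 0.

    The chain on {0, 1} switches state with probability a at each step,
    so its transition matrix [[1 - a, a], [a, 1 - a]] is symmetric.
    """
    p = [1.0, 0.0]
    for _ in range(t):
        p = [p[0] * (1 - a) + p[1] * a, p[0] * a + p[1] * (1 - a)]
    return (abs(p[0] - 0.5) + abs(p[1] - 0.5)) / 2


a = 0.3
second_eigenvalue = 1 - 2 * a  # the only eigenvalue other than 1

for t in range(6):
    exact = two_state_distance(a, t)
    spectral_bound = math.sqrt(2) / 2 * abs(second_eigenvalue) ** t
    print(t, exact, spectral_bound)
```

The exact distance and the bound decay at the same geometric rate and differ only by the constant factor $\sqrt{2}$, which shows that the dependence on the largest non-trivial eigenvalue in the bound cannot be improved in general.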

## Rapid mixing of expander walk

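Let $G(V,E)$ be a $d$-regular graph on $n$ vertices. Let $A$ be its adjacency matrix. The transition matrix of the lazy random walk on $G$ is given by $P=\frac{1}{2}\left(I+\frac{1}{d}A\right)$. Specifically,

$$P(u,v)=\begin{cases}\frac{1}{2} & \text{if }u=v,\\ \frac{A(u,v)}{2d} & \text{if }u\neq v.\end{cases}$$

Obviously $P$ is symmetric.

Let $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$ be the eigenvalues of $A$, and $\nu_1\ge\nu_2\ge\cdots\ge\nu_n$ be the eigenvalues of $P$. It is easy to verify that

- $\nu_i=\frac{1}{2}\left(1+\frac{1}{d}\lambda_i\right)$.

We know that $-d\le\lambda_i\le d$, thus $0\le\nu_i\le1$. Therefore, $\nu_{\max}=\max\{|\nu_2|,|\nu_n|\}=\nu_2=\frac{1}{2}\left(1+\frac{1}{d}\lambda_2\right)$. Due to the above analysis of symmetric Markov chains,

- $\tau(\epsilon)\le\frac{\frac{1}{2}\ln n+\ln\frac{1}{2\epsilon}}{1-\nu_{\max}}=\frac{d\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{d-\lambda_2}$.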

Thus we have proved the following theorem for the lazy random walk on $d$-regular graphs.

**Theorem**
- Let $G(V,E)$ be a $d$-regular graph on $n$ vertices, with spectrum $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$. The mixing rate of the lazy random walk on $G$ is
  - $\tau(\epsilon)\le\frac{d\left(\ln n+2\ln\frac{1}{2\epsilon}\right)}{d-\lambda_2}$.

Due to Cheeger's inequality, the spectral gap is bounded by the expansion ratio as

$$d-\lambda_2\ge\frac{\phi^2}{2d},$$

where $\phi=\phi(G)$ is the expansion ratio of the graph $G$. Therefore, we have the following corollary which bounds the mixing time by graph expansion.

**Corollary**
- Let $G(V,E)$ be a $d$-regular graph on $n$ vertices, whose expansion ratio is $\phi=\phi(G)$. The mixing rate of the lazy random walk on $G$ is
  - $\tau(\epsilon)\le\frac{2d^2}{\phi^2}\left(\ln n+2\ln\frac{1}{2\epsilon}\right)$.
- In particular, the mixing time is
  - $\tau_{\mathrm{mix}}=\tau(1/2\mathrm{e})\le\frac{2d^2}{\phi^2}\left(\ln n+2\right)$.

For expander graphs, both $d$ and $\phi$ are constants. The mixing time of the lazy random walk is $\tau_{\mathrm{mix}}=O(\log n)$, so the random walk is rapidly mixing.
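For contrast with expanders, the spectral bound can be compared against the true mixing behavior on a graph that is not an expander. The sketch below simulates the lazy random walk on the $n$-cycle (stay with probability $1/2$, move to each neighbor with probability $1/4$) by iterating the distribution exactly, and evaluates the bound $d(\ln n+2\ln\frac{1}{2\epsilon})/(d-\lambda_2)$ with $d=2$. The function names are hypothetical, and the value $\lambda_2=2\cos(2\pi/n)$ for the cycle is a standard fact assumed here rather than derived in these notes. By symmetry of the cycle every start vertex is equally bad, so starting at vertex 0 gives the worst case.

```python
import math


def lazy_walk_step(p, n):
    """One step of the lazy random walk on the n-cycle, applied to distribution p."""
    return [p[v] / 2 + p[(v - 1) % n] / 4 + p[(v + 1) % n] / 4 for v in range(n)]


def tv_to_uniform(p):
    """Total variation distance between p and the uniform distribution."""
    n = len(p)
    return sum(abs(a - 1 / n) for a in p) / 2


def mixing_time_cycle(n, eps):
    """Smallest t at which the walk started at vertex 0 is within eps of uniform."""
    p = [0.0] * n
    p[0] = 1.0
    t = 0
    while tv_to_uniform(p) > eps:
        p = lazy_walk_step(p, n)
        t += 1
    return t


def spectral_bound_cycle(n, eps):
    """The bound d (ln n + 2 ln(1 / (2 eps))) / (d - lambda_2) for the n-cycle."""
    d = 2
    lambda2 = 2 * math.cos(2 * math.pi / n)
    return d * (math.log(n) + 2 * math.log(1 / (2 * eps))) / (d - lambda2)


eps = 1 / (2 * math.e)
for n in (8, 16, 32):
    print(n, mixing_time_cycle(n, eps), spectral_bound_cycle(n, eps))
```

On cycles the spectral gap shrinks like $1/n^2$, so both the measured mixing time and the bound grow polynomially in $n$, in contrast to the $O(\log n)$ mixing time obtained when $d$ and the expansion ratio are constants.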