Randomized Algorithms (Spring 2010)/Random sampling
Random sampling
The Markov Chain Monte Carlo (MCMC) method
The MCMC method provide a very general approach to near uniform sampling. The basic idea of the method is as follows:
- Define a Markov chain whose state space is the sample space, and whose stationary distribution is the uniform distribution.
- Start the chain from an arbitrary state, run the Markov chain for a sufficiently long time, and return the current state.
Usually, the name "MCMC" refers to a class of methods for numerical computation and simulation by sampling via random walks. Here we use the name to refer to the methods of sampling via random walks.
Consider the following problem:
- Given an undirected graph [math]\displaystyle{ G(V,E) }[/math] on [math]\displaystyle{ n }[/math] vertices, uniformly sample an independent set of [math]\displaystyle{ G }[/math].
By the MCMC method, we consider a Markov chain whose state space [math]\displaystyle{ \mathcal{I} }[/math] is the space of all independent sets of [math]\displaystyle{ G }[/math]. Our sampling algorithm simply run the Markov chain for a sufficiently long time [math]\displaystyle{ T }[/math], and return the current independent set at time [math]\displaystyle{ T }[/math].
To guarantee that the returned independent set is nearly uniformly distributed, the Markov chain has to meet the following constraints:
- The chain converges (irreducible and aperiodic).
- The stationary distribution is uniform.
The running time of the algorithm is determined by: (1) the time complexity of a transition of a single step, and (2) the total number of steps [math]\displaystyle{ T }[/math]. Thus, to have an efficient algorithm, we have to also guarantee:
- The transition at each step of the chain is efficiently computable.
- The chain is rapid mixing.
Given as the input an undirected graph [math]\displaystyle{ G(V,E) }[/math]:
- [math]\displaystyle{ I_0=\emptyset\, }[/math].
- At step [math]\displaystyle{ t }[/math], assuming that the current independent set is [math]\displaystyle{ I_t }[/math], the [math]\displaystyle{ I_{t+1} }[/math] is computed:
- choose a vertex [math]\displaystyle{ v }[/math] uniformly at random from [math]\displaystyle{ V }[/math];
- if [math]\displaystyle{ v\in I_t }[/math] then [math]\displaystyle{ I_{t+1}=I_{t}\setminus \{v\}\, }[/math];
- if [math]\displaystyle{ v\not\in I_t }[/math] and [math]\displaystyle{ \forall u\in I_t, uv\not\in E }[/math], then [math]\displaystyle{ I_{t+1}=I_{t}\cup \{v\} }[/math];
- otherwise, [math]\displaystyle{ I_t=I_{t+1}\, }[/math].
The Metropolis algorithm
Conductance
Recap
- A Markov chain with finite space [math]\displaystyle{ \mathcal{S} }[/math], where [math]\displaystyle{ N=|\mathcal{S}| }[/math], transition matrix [math]\displaystyle{ P }[/math], whose eigenvalues are [math]\displaystyle{ \lambda_1\ge\cdots\ge\lambda_N }[/math], stationary distribution [math]\displaystyle{ \pi }[/math].
- The mixing time [math]\displaystyle{ \tau(\epsilon)\, }[/math]: time to be close to [math]\displaystyle{ \pi }[/math] within total variation distance [math]\displaystyle{ \epsilon }[/math], starting from a worst-case state.
Conditions:
- Lazy random walk: [math]\displaystyle{ P_{i,i}\ge\frac{1}{2} }[/math] for any [math]\displaystyle{ i\in\mathcal{S} }[/math], so [math]\displaystyle{ 1=\lambda_1\ge\cdots\ge\lambda_N\ge 0 }[/math] and [math]\displaystyle{ \lambda_{\max}=\max(|\lambda_2|,|\lambda_N|)=\lambda_2\, }[/math].
- The Markov chain is time-reversible: [math]\displaystyle{ \pi_{i}P_{i,j}=\pi_{j}P_{j,i} }[/math] for all [math]\displaystyle{ i,j\in\mathcal{S} }[/math].
- The stationary [math]\displaystyle{ \pi }[/math] is the uniform distribution, that is, [math]\displaystyle{ \pi_i=\frac{1}{N} }[/math] for all [math]\displaystyle{ i\in\mathcal{S} }[/math].
Then:
Theorem
|
Conductance and the mixing time
For many problems, such as card shuffling, the state space is exponentially large, so the estimation of [math]\displaystyle{ \lambda_2 }[/math] becomes very difficult. The following technique based on conductance overcomes this issue by considering the conductance of a Markov chain.
Definition (conductance)
|
The definition of conductance looks quite similar to the expansion ratio of graphs. In fact, for the random walk on a undirected [math]\displaystyle{ d }[/math]-regular graph [math]\displaystyle{ G }[/math], there is a straightforward relation between the conductance [math]\displaystyle{ \Phi }[/math] of the walk and the expansion ratio [math]\displaystyle{ \phi(G) }[/math] of the graph
- [math]\displaystyle{ \Phi=\frac{\phi(G)}{d}. }[/math]
Very informally, the conductance can be seen as the weighted normalized version of expansion ratio.
The following theorem states a Cheeger's inequality for the conductance.
Lemma (Jerrum-Sinclair 1988)
|
The inequality can be equivalent written for the spectral gap:
- [math]\displaystyle{ \frac{\Phi^2}{2}\le1-\lambda_2\le 2\Phi }[/math]
thus a large conductance implies a large spectral gap, which in turn implies the rapid mixing of the random walk.
Proposition
|
Canonical paths
Let [math]\displaystyle{ \Gamma=\{\gamma_{xy}\} }[/math] be a collection of canonical paths. The congestion caused by these paths is computed as
- [math]\displaystyle{ \rho=\max_{uv E}\frac{1}{\pi_u P_{uv}}\sum_{\gamma_{xy}\ni uv}\pi_x\pi_y }[/math].
Let [math]\displaystyle{ \Gamma=\{\gamma_{xy}\} }[/math] be a collection of canonical path. The conductance is lower bounded by
- [math]\displaystyle{ \Phi\ge\frac{1}{2\rho} }[/math].
Therefore, assuming a collection [math]\displaystyle{ \Gamma=\{\gamma_{xy}\} }[/math] of canonical paths whose congestion is [math]\displaystyle{ \rho }[/math], the mixing time of a reversible chain whose stationary distribution is uniform distribution is bounded by
- [math]\displaystyle{ \tau(\epsilon)=O(\rho^2(\ln N+\ln(1/\epsilon)))\, }[/math].