Randomized Algorithms (Fall 2011)/Coupling


Coupling of Two Distributions

Coupling is a powerful proof technique in probability theory. It allows us to compare two unrelated random variables (or random processes) by forcing them to share some randomness.

Definition (coupling)
Let $\mu_1$ and $\mu_2$ be two probability distributions over $\Omega$. A probability distribution $\mu$ over $\Omega\times\Omega$ is said to be a coupling of $\mu_1$ and $\mu_2$ if its marginal distributions are $\mu_1$ and $\mu_2$; that is
$$\mu_1(x)=\sum_{y\in\Omega}\mu(x,y)\quad\text{and}\quad\mu_2(x)=\sum_{y\in\Omega}\mu(y,x).$$

The interesting case is when $X\sim\mu_1$ and $Y\sim\mu_2$ are not independent. In particular, we want the probability that $X=Y$ to be as large as possible, while still maintaining the respective marginal distributions $\mu_1$ and $\mu_2$. When $\mu_1$ and $\mu_2$ are different, it is inevitable that $X\neq Y$ with some probability; but how small can this probability be, that is, how well can two distributions be coupled? The following coupling lemma states that this is determined by the total variation distance between the two distributions.
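To make the definition concrete, here is a small worked example (the two distributions are chosen here purely for illustration). Let $\Omega=\{0,1\}$, $\mu_1=\mathrm{Bernoulli}(1/2)$ and $\mu_2=\mathrm{Bernoulli}(1/4)$. The coupling $\mu$ with
$$\mu(0,0)=\tfrac{1}{2},\qquad\mu(1,1)=\tfrac{1}{4},\qquad\mu(1,0)=\tfrac{1}{4},\qquad\mu(0,1)=0$$
has marginals $\mu_1$ and $\mu_2$ and achieves
$$\Pr[X\neq Y]=\mu(1,0)+\mu(0,1)=\tfrac{1}{4}=\tfrac{1}{2}\left(\left|\tfrac{1}{2}-\tfrac{3}{4}\right|+\left|\tfrac{1}{2}-\tfrac{1}{4}\right|\right)=\|\mu_1-\mu_2\|_{TV},$$
whereas the independent coupling $\mu(x,y)=\mu_1(x)\mu_2(y)$ only gives $\Pr[X\neq Y]=\tfrac{1}{2}$.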

Lemma (coupling lemma)
Let $\mu_1$ and $\mu_2$ be two probability distributions over $\Omega$.
  1. For any coupling $(X,Y)$ of $\mu_1$ and $\mu_2$, it holds that $\Pr[X\neq Y]\ge\|\mu_1-\mu_2\|_{TV}$.
  2. There exists a coupling $(X,Y)$ of $\mu_1$ and $\mu_2$ such that $\Pr[X\neq Y]=\|\mu_1-\mu_2\|_{TV}$.
Proof.
1.

Suppose $(X,Y)$ follows the distribution $\mu$ over $\Omega\times\Omega$. Due to the definition of coupling,
$$\mu_1(x)=\sum_{y\in\Omega}\mu(x,y)\quad\text{and}\quad\mu_2(y)=\sum_{x\in\Omega}\mu(x,y).$$

Then for any $z\in\Omega$, $\mu(z,z)\le\min\{\mu_1(z),\mu_2(z)\}$, thus
$$\Pr[X=Y]=\sum_{z\in\Omega}\mu(z,z)\le\sum_{z\in\Omega}\min\{\mu_1(z),\mu_2(z)\}.$$

Therefore,
$$\Pr[X\neq Y]\ge 1-\sum_{z\in\Omega}\min\{\mu_1(z),\mu_2(z)\}=\sum_{z:\,\mu_1(z)>\mu_2(z)}\bigl(\mu_1(z)-\mu_2(z)\bigr)=\|\mu_1-\mu_2\|_{TV}.$$
2.

We can couple $X$ and $Y$ so that for each $z\in\Omega$, $X=Y=z$ with probability $\min\{\mu_1(z),\mu_2(z)\}$. The remaining probability mass is distributed off the diagonal in any way that respects the marginals; the calculation above then gives $\Pr[X\neq Y]=1-\sum_{z\in\Omega}\min\{\mu_1(z),\mu_2(z)\}=\|\mu_1-\mu_2\|_{TV}$.

The proof is depicted by the following figure, where the curves are the probability density functions of the two distributions $\mu_1$ and $\mu_2$, the colored area is the difference between the two distributions, and the "lower envelope" (the red line) is $\min\{\mu_1(x),\mu_2(x)\}$.

Coupling.png
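The construction in part 2 of the proof can be made concrete with a short sketch. The following Python snippet is illustrative only (the function names and the two example distributions are chosen here, not taken from the notes): it builds a maximal coupling of two finite distributions and checks that $\Pr[X\neq Y]$ equals the total variation distance.

from itertools import product

def total_variation(mu1, mu2):
    """Total variation distance between two distributions on a finite set."""
    support = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(z, 0.0) - mu2.get(z, 0.0)) for z in support)

def maximal_coupling(mu1, mu2):
    """Return a joint distribution mu on pairs (x, y) whose marginals are mu1 and mu2
    and which maximizes Pr[X = Y] (the construction from the coupling lemma proof)."""
    support = sorted(set(mu1) | set(mu2))
    mu = {}
    # Put as much mass as possible on the diagonal.
    for z in support:
        m = min(mu1.get(z, 0.0), mu2.get(z, 0.0))
        if m > 0:
            mu[(z, z)] = m
    # Distribute the leftover (off-diagonal) mass; the excesses of mu1 and mu2
    # have disjoint supports, so this preserves both marginals.
    ex1 = {z: mu1.get(z, 0.0) - min(mu1.get(z, 0.0), mu2.get(z, 0.0)) for z in support}
    ex2 = {z: mu2.get(z, 0.0) - min(mu1.get(z, 0.0), mu2.get(z, 0.0)) for z in support}
    rest = sum(ex1.values())  # equals the total variation distance
    if rest > 0:
        for x, y in product(support, repeat=2):
            if x != y and ex1[x] > 0 and ex2[y] > 0:
                mu[(x, y)] = mu.get((x, y), 0.0) + ex1[x] * ex2[y] / rest
    return mu

# Two small example distributions (chosen only for illustration).
mu1 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
mu2 = {'a': 0.2, 'b': 0.3, 'c': 0.5}
mu = maximal_coupling(mu1, mu2)
pr_not_equal = sum(p for (x, y), p in mu.items() if x != y)
print(pr_not_equal, total_variation(mu1, mu2))  # both equal 0.3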

Monotonicity of $\Delta_x(t)$

Consider a Markov chain on state space $\Omega$. Let $\pi$ be the stationary distribution, and $p_x^{(t)}$ be the distribution after $t$ steps when the initial state is $x$. Recall that $\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}$ is the distance to the stationary distribution after $t$ steps, started at state $x$.

We will show that $\Delta_x(t)$ is non-increasing in $t$. That is, for a Markov chain, the variation distance to the stationary distribution never increases as time passes. Although this is rather intuitive at first glance, the proof is nontrivial. A brute-force (algebraic) proof can be obtained by analyzing the change to the 1-norm of $p_x^{(t)}-\pi$ caused by multiplying by the transition matrix $P$; the analysis uses eigendecomposition. We introduce a cleverer proof using coupling.

Proposition
$\Delta_x(t)$ is non-increasing in $t$.
Proof.

Let $X_0=x$ and let $Y_0$ follow the stationary distribution $\pi$. Two chains $X_0,X_1,X_2,\ldots$ and $Y_0,Y_1,Y_2,\ldots$ are defined by these initial distributions and the transition rule of the Markov chain. The distributions of $X_t$ and $Y_t$ are $p_x^{(t)}$ and $\pi$, respectively.

For any fixed $t$, we can couple $X_t$ and $Y_t$ such that $\Pr[X_t\neq Y_t]=\|p_x^{(t)}-\pi\|_{TV}=\Delta_x(t)$; due to the Coupling Lemma, such a coupling exists. We then define a coupling between $X_{t+1}$ and $Y_{t+1}$ as follows.

  • If $X_t=Y_t$, then let $X_{t+1}=Y_{t+1}$, following the transition matrix of the Markov chain.
  • Otherwise, do the transitions $X_t\to X_{t+1}$ and $Y_t\to Y_{t+1}$ independently, each following the transition matrix.

It is easy to see that the marginal distributions of $X_{t+1}$ and $Y_{t+1}$ both follow the original Markov chain, and
$$\Pr[X_{t+1}\neq Y_{t+1}]\le\Pr[X_t\neq Y_t].$$

In conclusion, it holds that
$$\Delta_x(t+1)=\|p_x^{(t+1)}-\pi\|_{TV}\le\Pr[X_{t+1}\neq Y_{t+1}]\le\Pr[X_t\neq Y_t]=\Delta_x(t),$$
where the first inequality is due to the easy direction of the Coupling Lemma.
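A quick numerical sanity check of the proposition (a sketch only; the small transition matrix below is an arbitrary example chosen here, not from the notes): compute $\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}$ for increasing $t$ and verify that it never increases.

import numpy as np

# An arbitrary small ergodic transition matrix (rows sum to 1), chosen for illustration.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

x = 0                                  # initial state
p = np.zeros(len(pi)); p[x] = 1.0      # p_x^{(0)}
prev = None
for t in range(20):
    delta = 0.5 * np.abs(p - pi).sum()   # Delta_x(t) = ||p_x^{(t)} - pi||_TV
    assert prev is None or delta <= prev + 1e-12, "monotonicity violated"
    prev = delta
    p = p @ P                            # advance one step: p_x^{(t+1)} = p_x^{(t)} P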

Rapid Mixing by Coupling

Consider an ergodic (irreducible and aperiodic) Markov chain on a finite state space $\Omega$. We want to upper bound its mixing time. The coupling technique for bounding the mixing time can be summarized as follows:

Consider two random walks started at states $x$ and $y$ in $\Omega$. Each individual walk follows the transition rule of the original Markov chain, but the two random walks may be "coupled" in some way, that is, they may be correlated with each other. A key observation is that for any such coupling, it always holds that
$$\Delta(t)\le\max_{x,y\in\Omega}\Pr[\text{the two random walks started at }x,y\text{ have not met by time }t],$$
where we recall that $\Delta(t)=\max_{x\in\Omega}\|p_x^{(t)}-\pi\|_{TV}$.
Definition (coupling of Markov chain)
A coupling of a Markov chain $M$ with state space $\Omega$ and transition matrix $P$ is a Markov chain $(X_t,Y_t)$ with state space $\Omega\times\Omega$, satisfying
  1. each of $X_t$ and $Y_t$ viewed in isolation is a faithful copy of the original Markov chain $M$, i.e.
    $\Pr[X_{t+1}=v\mid X_t=u]=P(u,v)=\Pr[Y_{t+1}=v\mid Y_t=u]$;
  2. once $X_t$ and $Y_t$ reach the same state, they make identical moves ever since, i.e.
    $X_{t+1}=Y_{t+1}$ if $X_t=Y_t$.
Lemma (coupling lemma for Markov chains)
Let $(X_t,Y_t)$ be a coupling of a Markov chain $M$ with state space $\Omega$. Then $\Delta(t)$ for $M$ is bounded by
$$\Delta(t)\le\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x, Y_0=y].$$
Proof.

Due to the coupling lemma (for probability distributions),
$$\Pr[X_t\neq Y_t\mid X_0=x, Y_0=y]\ge\|p_x^{(t)}-p_y^{(t)}\|_{TV},$$
where $p_x^{(t)}, p_y^{(t)}$ are the distributions of the Markov chain at time $t$, started at states $x$ and $y$ respectively.

Therefore, since $\pi$ is stationary we have $\pi=\sum_{y\in\Omega}\pi(y)p_y^{(t)}$, and by the triangle inequality for the total variation distance,
$$\Delta(t)=\max_{x\in\Omega}\|p_x^{(t)}-\pi\|_{TV}\le\max_{x\in\Omega}\sum_{y\in\Omega}\pi(y)\|p_x^{(t)}-p_y^{(t)}\|_{TV}\le\max_{x,y\in\Omega}\|p_x^{(t)}-p_y^{(t)}\|_{TV}\le\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x, Y_0=y].$$

Corollary
Let $(X_t,Y_t)$ be a coupling of a Markov chain $M$ with state space $\Omega$. Then $\tau(\epsilon)$ is bounded as follows:
$$\tau(\epsilon)\le\min\left\{t\,\Big|\,\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x, Y_0=y]\le\epsilon\right\}.$$
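The corollary suggests a direct Monte Carlo sanity check: simulate a coupling from every pair of starting states and estimate the probability that the two copies have not met by time $t$. The sketch below is only an illustration (the function names, the small example chain, and the crude "run independently until they meet" coupling are all choices made here, not part of the notes); it does this for a lazy random walk on a small cycle.

import random

N = 6          # cycle Z_N; small example chain chosen for illustration

def lazy_cycle_step(u):
    """One step of the lazy random walk on the cycle Z_N."""
    r = random.random()
    if r < 0.5:
        return u                                  # stay with probability 1/2
    return (u + (1 if r < 0.75 else -1)) % N      # move +1 or -1 with probability 1/4 each

def coupled_step(u, v):
    """A valid coupling: identical moves after meeting, independent moves before."""
    if u == v:
        w = lazy_cycle_step(u)
        return w, w
    return lazy_cycle_step(u), lazy_cycle_step(v)

def meeting_bound(t, trials=1000):
    """Monte Carlo estimate of max_{x,y} Pr[X_t != Y_t | X_0=x, Y_0=y],
    which by the corollary upper-bounds Delta(t)."""
    worst = 0.0
    for x in range(N):
        for y in range(N):
            misses = 0
            for _ in range(trials):
                u, v = x, y
                for _ in range(t):
                    u, v = coupled_step(u, v)
                misses += (u != v)
            worst = max(worst, misses / trials)
    return worst

print(meeting_bound(5), meeting_bound(50))   # the estimated bound shrinks as t grows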

Geometric convergence

Consider a Markov chain on state space $\Omega$. Let $\pi$ be the stationary distribution, and $p_x^{(t)}$ be the distribution after $t$ steps when the initial state is $x$. Recall that

  • $\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}$;
  • $\Delta(t)=\max_{x\in\Omega}\Delta_x(t)$;
  • $\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}$;
  • $\tau(\epsilon)=\max_{x\in\Omega}\tau_x(\epsilon)$;
  • $\tau_{\mathrm{mix}}=\tau(1/2\mathrm{e})$.

We prove that the total variation distance to the stationary distribution decays geometrically:

Proposition
  1. $\Delta(kt)\le(2\Delta(t))^k$ for any integer $k\ge1$.
  2. $\tau(\epsilon)\le\tau_{\mathrm{mix}}\cdot\left\lceil\ln\frac{1}{\epsilon}\right\rceil$.
Proof.
1.

We denote $\delta=2\Delta(t)$. We construct a coupling of the Markov chain as follows. Suppose the current time is $n=it$ for some integer $i\ge0$; the coupling specifies the next $t$ steps jointly.

  • If $X_{it}=Y_{it}$, then let $X_n=Y_n$ for all $n\ge it$, following the original chain.
  • For the case that $X_{it}=u\neq v=Y_{it}$ for some $u,v\in\Omega$, we couple $X_{(i+1)t}$ and $Y_{(i+1)t}$ so that $\Pr[X_{(i+1)t}\neq Y_{(i+1)t}\mid X_{it}=u, Y_{it}=v]=\|p_u^{(t)}-p_v^{(t)}\|_{TV}$. Due to the Coupling Lemma, such a coupling does exist.

For any $u,v\in\Omega$ with $u\neq v$,
$$\|p_u^{(t)}-p_v^{(t)}\|_{TV}\le\|p_u^{(t)}-\pi\|_{TV}+\|p_v^{(t)}-\pi\|_{TV}\le2\Delta(t)=\delta.$$

Thus $\Pr[X_{(i+1)t}\neq Y_{(i+1)t}\mid X_{it}\neq Y_{it}]\le\delta$ by the law of total probability. Therefore, for any $x,y\in\Omega$,
$$\Pr[X_{(i+1)t}\neq Y_{(i+1)t}\mid X_0=x, Y_0=y]\le\delta\cdot\Pr[X_{it}\neq Y_{it}\mid X_0=x, Y_0=y].$$

Then by induction on $k$,
$$\Pr[X_{kt}\neq Y_{kt}\mid X_0=x, Y_0=y]\le\delta^k=(2\Delta(t))^k,$$

and this holds for any $x,y\in\Omega$. By the Coupling Lemma for Markov chains, this means $\Delta(kt)\le(2\Delta(t))^k$.

2.

By the definition of $\tau_{\mathrm{mix}}$, we have $2\Delta(\tau_{\mathrm{mix}})\le2\cdot\frac{1}{2\mathrm{e}}=\frac{1}{\mathrm{e}}$. Applying (1) with $t=\tau_{\mathrm{mix}}$ and $k=\left\lceil\ln\frac{1}{\epsilon}\right\rceil$ gives $\Delta(k\tau_{\mathrm{mix}})\le\mathrm{e}^{-k}\le\epsilon$, so $\tau_x(\epsilon)\le k\tau_{\mathrm{mix}}$ for every $x\in\Omega$, that is, $\tau(\epsilon)\le\tau_{\mathrm{mix}}\cdot\left\lceil\ln\frac{1}{\epsilon}\right\rceil$.
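A small numerical illustration of part (1) of the proposition (a sketch only; the transition matrix below is an arbitrary example, not from the notes): compute $\Delta(t)$ exactly from matrix powers and compare $\Delta(kt)$ with $(2\Delta(t))^k$.

import numpy as np

# Arbitrary small ergodic chain, chosen for illustration.
P = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

def Delta(t):
    """Delta(t) = max_x || p_x^{(t)} - pi ||_TV, computed from the rows of P^t."""
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[x] - pi).sum() for x in range(len(pi)))

t = 2
for k in range(1, 6):
    assert Delta(k * t) <= (2 * Delta(t)) ** k + 1e-12
    print(k, Delta(k * t), (2 * Delta(t)) ** k)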

Random Walk on the Hypercube

An $n$-dimensional hypercube is a graph on vertex set $\{0,1\}^n$. For any $x,y\in\{0,1\}^n$, there is an edge between $x$ and $y$ if $x$ and $y$ differ at exactly one coordinate (the Hamming distance between $x$ and $y$ is 1).

We use coupling to bound the mixing time of the following simple random walk on a hypercube.

Random Walk on Hypercube
At each step, for the current state $x\in\{0,1\}^n$:
  1. with probability $1/2$, do nothing;
  2. else, pick a coordinate $i\in\{1,2,\ldots,n\}$ uniformly at random and flip the bit $x_i$ (change $x_i$ to $1-x_i$).

This is a lazy uniform random walk on the $n$-dimensional hypercube. It is equivalent to the following random walk.

Random Walk on Hypercube
At each step, for the current state $x\in\{0,1\}^n$:
  1. pick a coordinate $i\in\{1,2,\ldots,n\}$ uniformly at random and a bit $b\in\{0,1\}$ uniformly at random;
  2. let $x_i=b$.

It is easy to see that the two random walks are the same. The second form suggests coupling the random choices of $i$ and $b$ in each step. Consider two copies of the random walk, $(X_t)$ and $(Y_t)$, started from two arbitrary states $X_0$ and $Y_0$; the coupling rule is:

  • At each step, $X_t$ and $Y_t$ choose the same (uniformly random) coordinate $i$ and the same (uniformly random) bit $b$.

Obviously the two individual random walks are both faithful copies of the original walk.

For arbitrary starting states $X_0=x$ and $Y_0=y$, the event $X_t=Y_t$ occurs once every coordinate on which $x$ and $y$ disagree has been picked at least once, since a picked coordinate is set to the same bit $b$ in both walks and the two walks agree on it ever after. In the worst case, $x$ and $y$ disagree on all $n$ coordinates. The time until every coordinate has been picked at least once follows the coupon collector problem with $n$ coupons, and we know from the coupon collector problem that for the number of steps $T$ needed to pick all $n$ coordinates and any $c>0$,
$$\Pr[T\ge n\ln n+cn]\le\mathrm{e}^{-c}.$$
Thus for any $x,y\in\{0,1\}^n$, if $t\ge n\ln n+cn$, then $\Pr[X_t\neq Y_t\mid X_0=x, Y_0=y]\le\mathrm{e}^{-c}$. Due to the coupling lemma for Markov chains, for $t\ge n\ln n+cn$,
$$\Delta(t)\le\max_{x,y\in\{0,1\}^n}\Pr[X_t\neq Y_t\mid X_0=x, Y_0=y]\le\mathrm{e}^{-c},$$
which implies that
$$\tau(\epsilon)\le n\ln n+n\ln\frac{1}{\epsilon}.$$

So the random walk achieves a polynomially small variation distance $\epsilon=\frac{1}{\mathrm{poly}(n)}$ to the stationary distribution in time $O(n\ln n)$.

Note that the number of states (vertices in the $n$-dimensional hypercube) is $2^n$. Therefore, the mixing time is exponentially small compared to the size of the state space.
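The coupling above is also easy to simulate. The following sketch is written here only for illustration (the function and variable names are our own): it runs the two coupled walks from antipodal starting points, in which case the coalescence time is exactly the coupon collector time for $n$ coupons, and compares its empirical mean with the coupon-collector estimate $n(\ln n+\gamma)$.

import math
import random

def coalescence_time(n, rng=random):
    """Run the coupled lazy walks on the n-dimensional hypercube from two
    antipodal states and return the first time t with X_t == Y_t."""
    X = [0] * n
    Y = [1] * n          # worst case: the walks disagree on all n coordinates
    t = 0
    while X != Y:
        i = rng.randrange(n)        # same random coordinate for both walks
        b = rng.randrange(2)        # same random bit for both walks
        X[i] = b
        Y[i] = b
        t += 1
    return t

n = 16
samples = [coalescence_time(n) for _ in range(1000)]
avg = sum(samples) / len(samples)
print(avg, n * (math.log(n) + 0.5772))   # empirical mean vs. n*(ln n + Euler's constant)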