# 随机算法 (Fall 2011)/Mixing Time

# Total Variation Distance

In probability theory, the **total variation distance** measures the difference between two probability distributions.

**Definition (total variation distance)**- Let and be two probability distributions over the same finite state space , the
**total variation distance**between and is defined as- ,

- where is the -norm of vectors.

- Let and be two probability distributions over the same finite state space , the

It can be verified (left as an exercise) that

- ,

thus the total variation distance can be equivalently defined as

- .

So the total variation distance between two distributions gives an upper bound on the difference between the probabilities of the same event according to the two distributions.

# Mixing Time

The **mixing time** of a Markov chain gives the rate at which a Markov chain converges to the stationary distribution.

**Definition (mixing time)**- Let be the stationary of the chain, and be the distribution after steps when the initial state is .
- is the distance to stationary distribution after steps, started at state .
- is the maximum distance to stationary distribution after steps.
- is the time until the total variation distance to the stationary distribution, started at the initial state , reaches .
- is the time until the total variation distance to the stationary distribution, started at the worst possible initial state, reaches .

- Let be the stationary of the chain, and be the distribution after steps when the initial state is .

We note that is monotonically non-increasing in . So the definition of makes sense, and is actually the inverse of .

**Definition (mixing time)**- The
**mixing time**of a Markov chain is .

- The

The mixing time is the time until the total variation distance to the stationary distribution, starting from the worst possible initial state , reaches . The value is chosen just for the convenience of calculation. The next proposition says that with general can be estimated from .

**Proposition**- for any integer .
- .

So the distance to stationary distribution decays exponentially in multiplications of .

Both the formal proofs of the monotonicity of and the above proposition uses the coupling technique and is postponed to next section.