Randomized Algorithms (Spring 2010)/Markov chains and random walks

From TCS Wiki
Revision as of 08:08, 28 April 2010

Markov Chains

The Markov property and transition matrices

A stochastic process [math]\displaystyle{ \{X_t\mid t\in T\} }[/math] is a collection of random variables. The index [math]\displaystyle{ t }[/math] is often called time, as the process represents the value of a random variable changing over time. Let [math]\displaystyle{ \mathcal{S} }[/math] be the set of values assumed by the random variables [math]\displaystyle{ X_t }[/math]. We call each element of [math]\displaystyle{ \mathcal{S} }[/math] a state, as [math]\displaystyle{ X_t }[/math] represents the state of the process at time [math]\displaystyle{ t }[/math].

The model of stochastic processes can be very general. In this class, we only consider the stochastic processes with the following properties:

discrete time
The index set [math]\displaystyle{ T }[/math] is countable. Specifically, we assume the process is [math]\displaystyle{ X_0,X_1,X_2,\ldots }[/math]
discrete space
The state space [math]\displaystyle{ \mathcal{S} }[/math] is countable. We are especially interested in the case that [math]\displaystyle{ \mathcal{S} }[/math] is finite, in which case the process is called a finite process.

The next property is about the dependency structure among random variables. The simplest dependency structure for [math]\displaystyle{ X_0,X_1,\ldots }[/math] is no dependency at all, that is, independence. We consider the next simplest dependency structure called the Markov property.

Definition (the Markov property)
A process [math]\displaystyle{ X_0,X_1,\ldots }[/math] satisfies the Markov property if
[math]\displaystyle{ \Pr[X_{n+1}=x_{n+1}\mid X_{0}=x_{0}, X_{1}=x_{1},\ldots,X_{n}=x_{n}]=\Pr[X_{n+1}=x_{n+1}\mid X_{n}=x_{n}] }[/math]
for all [math]\displaystyle{ n }[/math] and all [math]\displaystyle{ x_0,\ldots,x_{n+1}\in \mathcal{S} }[/math].

Informally, the Markov property means: "conditioning on the present, the future does not depend on the past." Hence, the Markov property is also called the memoryless property.

A stochastic process [math]\displaystyle{ X_0,X_1,\ldots }[/math] of discrete time and discrete space is a Markov chain if it has the Markov property.

For a Markov chain with a finite state space [math]\displaystyle{ \mathcal{S}=[N] }[/math], let [math]\displaystyle{ P^{(t+1)}_{i,j}=\Pr[X_{t+1}=j\mid X_t=i] }[/math]. This gives us a transition matrix [math]\displaystyle{ P^{(t+1)} }[/math] at time [math]\displaystyle{ t }[/math]. The transition matrix is an [math]\displaystyle{ N\times N }[/math] matrix of nonnegative entries such that the sum over each row of [math]\displaystyle{ P^{(t+1)} }[/math] is 1, since

[math]\displaystyle{ \begin{align}\sum_{j}P^{(t+1)}_{i,j}=\sum_{j}\Pr[X_{t+1}=j\mid X_t=i]=1\end{align} }[/math].

In linear algebra, matrices of this type are called stochastic matrices.

Let [math]\displaystyle{ \pi^{(t)} }[/math] be the distribution of the chain at time [math]\displaystyle{ t }[/math], that is, [math]\displaystyle{ \begin{align}\pi^{(t)}_i=\Pr[X_t=i]\end{align} }[/math]. For a finite chain, [math]\displaystyle{ \pi^{(t)} }[/math] is a vector of [math]\displaystyle{ N }[/math] nonnegative entries such that [math]\displaystyle{ \begin{align}\sum_{i}\pi^{(t)}_i=1\end{align} }[/math]. In linear algebra, vectors of this type are called stochastic vectors. Then, it holds that

[math]\displaystyle{ \begin{align}\pi^{(t+1)}=\pi^{(t)}P^{(t+1)}\end{align} }[/math].

To see this, we apply the law of total probability,

[math]\displaystyle{ \begin{align} \pi^{(t+1)}_j &= \Pr[X_{t+1}=j]\\ &= \sum_{i}\Pr[X_{t+1}=j\mid X_t=i]\Pr[X_t=i]\\ &=\sum_{i}\pi^{(t)}_iP^{(t+1)}_{i,j}\\ &=(\pi^{(t)}P^{(t+1)})_j. \end{align} }[/math]

Therefore, a finite Markov chain [math]\displaystyle{ X_0,X_1,\ldots }[/math] is specified by an initial distribution [math]\displaystyle{ \pi^{(0)} }[/math] and a sequence of transition matrices [math]\displaystyle{ P^{(1)},P^{(2)},\ldots }[/math]. The transitions of the chain can then be described by a series of matrix products:

[math]\displaystyle{ \pi^{(0)}\stackrel{P^{(1)}}{\longrightarrow}\pi^{(1)}\stackrel{P^{(2)}}{\longrightarrow}\pi^{(2)}\stackrel{P^{(3)}}{\longrightarrow}\cdots\cdots\pi^{(t)}\stackrel{P^{(t+1)}}{\longrightarrow}\pi^{(t+1)}\cdots }[/math]
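The matrix-product view above is easy to check numerically: one transition step is exactly the law of total probability written as a vector-matrix product. A minimal sketch, using a hypothetical two-state chain with made-up (and deliberately different) transition matrices at steps 1 and 2:

```python
def step(pi, P):
    """One transition of the chain: pi_next[j] = sum_i pi[i] * P[i][j],
    which is the law of total probability written as the product pi P."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state chain started deterministically in state 0.
pi0 = [1.0, 0.0]
P1 = [[0.5, 0.5],
      [0.2, 0.8]]
P2 = [[0.9, 0.1],
      [0.3, 0.7]]

pi1 = step(pi0, P1)  # [0.5, 0.5]
pi2 = step(pi1, P2)  # approximately [0.6, 0.4]
```

Each intermediate vector remains a stochastic vector, since every row of the matrices sums to 1.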
Homogeneity

A Markov chain is said to be homogeneous if the transitions depend only on the current state and not on the time, that is,

[math]\displaystyle{ P^{(t)}_{i,j}=P_{i,j} }[/math] for all [math]\displaystyle{ t }[/math].

The transitions of a homogeneous Markov chain are given by a single matrix [math]\displaystyle{ P }[/math]. Suppose that [math]\displaystyle{ \pi^{(0)} }[/math] is the initial distribution. At each time [math]\displaystyle{ t }[/math],

[math]\displaystyle{ \begin{align}\pi^{(t+1)}=\pi^{(t)}P\end{align} }[/math].

Expanding this recursion, we have

[math]\displaystyle{ \begin{align}\pi^{(n)}=\pi^{(0)}P^n\end{align} }[/math].
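Unrolling this recursion in code amounts to applying the single matrix P to the distribution vector n times. A sketch with a hypothetical two-state chain (the matrix is an illustrative example, not taken from the notes):

```python
def distribution_at(pi0, P, n):
    """Compute pi^(n) = pi^(0) P^n for a homogeneous chain by applying the
    single transition matrix P to the distribution vector n times."""
    size = len(pi0)
    pi = list(pi0)
    for _ in range(n):
        pi = [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]
    return pi

# Hypothetical two-state chain started deterministically in state 0.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi10 = distribution_at([1.0, 0.0], P, 10)
```

Already after 10 steps the distribution is close to this chain's stationary distribution (5/6, 1/6), a first glimpse of the convergence theorem below.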

From now on, we restrict ourselves to homogeneous Markov chains, and the term "Markov chain" means "homogeneous Markov chain" unless stated otherwise.

Definition (finite Markov chain)
Let [math]\displaystyle{ P }[/math] be an [math]\displaystyle{ N\times N }[/math] stochastic matrix. A process [math]\displaystyle{ X_0,X_1,\ldots }[/math] with finite space [math]\displaystyle{ \mathcal{S}=[N] }[/math] is said to be a (homogeneous) Markov chain with transition matrix [math]\displaystyle{ P }[/math], if for all [math]\displaystyle{ n\ge0, }[/math] all [math]\displaystyle{ i,j\in[N] }[/math] and all [math]\displaystyle{ i_0,\ldots,i_{n-1}\in[N] }[/math] we have
[math]\displaystyle{ \begin{align} \Pr[X_{n+1}=j\mid X_0=i_0,\ldots,X_{n-1}=i_{n-1},X_n=i] &=\Pr[X_{n+1}=j\mid X_n=i]\\ &=P_{i,j}. \end{align} }[/math]

To specify a Markov chain, we only need to give:

  • initial distribution [math]\displaystyle{ \pi^{(0)} }[/math];
  • transition matrix [math]\displaystyle{ P }[/math].

Then the transitions can be simulated by matrix products:

[math]\displaystyle{ \pi^{(0)}\stackrel{P}{\longrightarrow}\pi^{(1)}\stackrel{P}{\longrightarrow}\pi^{(2)}\stackrel{P}{\longrightarrow}\cdots\cdots\pi^{(t)}\stackrel{P}{\longrightarrow}\pi^{(t+1)}\stackrel{P}{\longrightarrow}\cdots }[/math]

The distribution of the chain at time [math]\displaystyle{ n }[/math] can be computed by [math]\displaystyle{ \pi^{(n)}=\pi^{(0)}P^n }[/math].

Another way to picture a Markov chain is by its transition graph. A weighted directed graph [math]\displaystyle{ G(V,E,w) }[/math] is said to be a transition graph of a finite Markov chain with transition matrix [math]\displaystyle{ P }[/math] if:

  • [math]\displaystyle{ V=\mathcal{S} }[/math], i.e. each node of the transition graph corresponds to a state of the Markov chain;
  • for any [math]\displaystyle{ i,j\in V }[/math], [math]\displaystyle{ (i,j)\in E }[/math] if and only if [math]\displaystyle{ P_{i,j}\gt 0 }[/math], and the weight [math]\displaystyle{ w(i,j)=P_{i,j} }[/math].

A transition graph defines a natural random walk: at each time step, the walk moves from the current node along an outgoing edge chosen with probability equal to the weight of that edge. This is a well-defined random walk, since [math]\displaystyle{ \begin{align}\sum_j P_{i,j}=1\end{align} }[/math] for every [math]\displaystyle{ i }[/math]. Therefore, a Markov chain is equivalent to a random walk, so these two terms are often used interchangeably.
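The walk described above can be sampled directly: at each step, draw the next node according to the weights of the outgoing edges. A minimal sketch, with a hypothetical two-state transition matrix:

```python
import random

def walk(P, start, steps, seed=0):
    """Simulate the random walk on the transition graph: from the current
    node i, move to node j with probability P[i][j] (the edge weight)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    path = [start]
    for _ in range(steps):
        i = path[-1]
        u, acc = rng.random(), 0.0
        for j, p in enumerate(P[i]):
            acc += p            # inverse-CDF sampling over row i
            if u < acc:
                path.append(j)
                break
        else:
            path.append(len(P[i]) - 1)  # guard against round-off in the row sum
    return path

# Hypothetical chain: state 0 always moves to 1; state 1 is a fair coin flip.
P = [[0.0, 1.0],
     [0.5, 0.5]]
trace = walk(P, start=0, steps=20)
```

Every visit to state 0 in the sampled trajectory is immediately followed by state 1, since the row for state 0 puts all its weight on that edge.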

Irreducibility and aperiodicity

Definition (irreducibility)
State [math]\displaystyle{ j }[/math] is accessible from state [math]\displaystyle{ i }[/math] if it is possible for the chain to visit state [math]\displaystyle{ j }[/math] when the chain starts in state [math]\displaystyle{ i }[/math], or, in other words,
[math]\displaystyle{ \begin{align}P^n(i,j)\gt 0\end{align} }[/math]
for some integer [math]\displaystyle{ n\ge 0 }[/math]. State [math]\displaystyle{ i }[/math] communicates with state [math]\displaystyle{ j }[/math] if [math]\displaystyle{ j }[/math] is accessible from [math]\displaystyle{ i }[/math] and [math]\displaystyle{ i }[/math] is accessible from [math]\displaystyle{ j }[/math].
We say that the Markov chain is irreducible if all pairs of states communicate.

It is clearer to interpret these concepts in terms of transition graphs:

  • [math]\displaystyle{ j }[/math] is accessible from [math]\displaystyle{ i }[/math] means that [math]\displaystyle{ j }[/math] is reachable from [math]\displaystyle{ i }[/math] in the transition graph, i.e. there is a directed path from [math]\displaystyle{ i }[/math] to [math]\displaystyle{ j }[/math].
  • [math]\displaystyle{ i }[/math] communicates with [math]\displaystyle{ j }[/math] means that [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] are strongly connected in the transition graph.
  • A finite Markov chain is irreducible if and only if its transition graph is strongly connected.
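The last equivalence gives a direct algorithmic test: a finite chain is irreducible iff every state can reach every other state in the graph whose edges are the pairs with positive transition probability. A sketch using plain graph search (the two matrices at the end are hypothetical examples):

```python
def reachable(P, s):
    """States reachable from s by a directed path in the transition graph,
    whose edges are the pairs (i, j) with P[i][j] > 0."""
    seen, stack = {s}, [s]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_irreducible(P):
    """A finite chain is irreducible iff its transition graph is strongly
    connected, i.e. every state reaches every other state."""
    n = len(P)
    return all(len(reachable(P, s)) == n for s in range(n))

# Hypothetical examples: a two-cycle communicates both ways; a chain whose
# state 0 is absorbing can never return from 0 to 1.
cyclic = [[0.0, 1.0],
          [1.0, 0.0]]
absorbing = [[1.0, 0.0],
             [0.5, 0.5]]
```

Running every-source reachability costs O(N) searches; for large chains one strongly-connected-components pass (e.g. Tarjan's algorithm) does the same job in a single traversal.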

It is easy to see that communicating is an equivalence relation: it is reflexive, symmetric, and transitive. Thus, the communicating relation partitions the state space into disjoint equivalence classes, called communicating classes. For a finite Markov chain, communicating classes correspond to the strongly connected components in the transition graph. It is possible for the chain to move from one communicating class to another, but in that case it is impossible to return to the original class.


Definition (aperiodicity)
The period of a state [math]\displaystyle{ i }[/math] is the greatest common divisor (gcd)
[math]\displaystyle{ \begin{align}d_i=\gcd\{n\mid (P^n)_{i,i}\gt 0\}\end{align} }[/math].
A state is aperiodic if its period is 1. A Markov chain is aperiodic if all its states are aperiodic.

For example, suppose that the period of state [math]\displaystyle{ i }[/math] is [math]\displaystyle{ d_i=3 }[/math]. Then, starting from state [math]\displaystyle{ i }[/math],

[math]\displaystyle{ i,\bigcirc,\bigcirc,\square,\bigcirc,\bigcirc,\square,\bigcirc,\bigcirc,\square,\bigcirc,\bigcirc,\square,\cdots\cdots }[/math]

the chain can revisit [math]\displaystyle{ i }[/math] only at the positions marked by squares, i.e. at times that are multiples of 3.

In the transition graph of a finite Markov chain, [math]\displaystyle{ (P^n)_{i,i}\gt 0 }[/math] is equivalent to [math]\displaystyle{ i }[/math] being on a cycle of length [math]\displaystyle{ n }[/math]. The period of a state [math]\displaystyle{ i }[/math] is thus the greatest common divisor of the lengths of the cycles passing through [math]\displaystyle{ i }[/math].
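For a small finite chain, the period of a state can be computed directly from the definition by accumulating the gcd of all n (up to a cutoff) with a positive n-step return probability. A sketch; the cutoff 2N² used below is a heuristic choice that is ample for small chains, not a tight theoretical bound:

```python
from math import gcd

def period(P, i, bound=None):
    """Period of state i: the gcd of all n with (P^n)_{i,i} > 0, scanning
    n = 1, ..., bound. The default cutoff 2*N^2 is a heuristic choice."""
    n = len(P)
    if bound is None:
        bound = 2 * n * n
    d = 0
    Pk = [row[:] for row in P]  # Pk holds the entries of P^k
    for k in range(1, bound + 1):
        if Pk[i][i] > 0:
            d = gcd(d, k)       # gcd(0, k) == k, so the first hit initializes d
        # next power: P^(k+1) = P^k * P
        Pk = [[sum(Pk[a][b] * P[b][c] for b in range(n)) for c in range(n)]
              for a in range(n)]
    return d

# Hypothetical examples: a directed 3-cycle has period 3; a lazy chain,
# which may stay put at each step, is aperiodic.
cycle3 = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]
lazy = [[0.5, 0.5],
        [0.5, 0.5]]
```

Note that any state with a self-loop ([math]\displaystyle{ P_{i,i}\gt 0 }[/math]) is found aperiodic at the very first step, since the gcd is then pinned to 1.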

The next theorem shows that period is in fact a class property.

Theorem
If the states [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] communicate, then [math]\displaystyle{ d_i=d_j }[/math].

Proof: For communicating [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math], there is a path [math]\displaystyle{ P_1 }[/math] from [math]\displaystyle{ i }[/math] to [math]\displaystyle{ j }[/math] of length [math]\displaystyle{ n_1 }[/math], and there is a path [math]\displaystyle{ P_2 }[/math] from [math]\displaystyle{ j }[/math] to [math]\displaystyle{ i }[/math] of length [math]\displaystyle{ n_2 }[/math]. Then [math]\displaystyle{ P_1P_2 }[/math] gives a cycle starting at [math]\displaystyle{ i }[/math] of length [math]\displaystyle{ n_1+n_2 }[/math], and for any cycle [math]\displaystyle{ C }[/math] starting at [math]\displaystyle{ j }[/math] of length [math]\displaystyle{ n }[/math], [math]\displaystyle{ P_1CP_2 }[/math] gives a cycle starting at [math]\displaystyle{ i }[/math] of length [math]\displaystyle{ n_1+n_2+n }[/math]. Since the period of [math]\displaystyle{ i }[/math] is [math]\displaystyle{ d_i }[/math], both [math]\displaystyle{ (n_1+n_2) }[/math] and [math]\displaystyle{ (n_1+n_2+n) }[/math] are divisible by [math]\displaystyle{ d_i }[/math]. Subtracting the two, [math]\displaystyle{ n }[/math] is divisible by [math]\displaystyle{ d_i }[/math]. Since this holds for an arbitrary cycle [math]\displaystyle{ C }[/math] starting at [math]\displaystyle{ j }[/math], [math]\displaystyle{ d_i }[/math] is a common divisor of all [math]\displaystyle{ n }[/math] such that [math]\displaystyle{ P^n_{j,j}\gt 0 }[/math]. Since [math]\displaystyle{ d_j }[/math] is defined to be the greatest common divisor of the same set of [math]\displaystyle{ n }[/math], it holds that [math]\displaystyle{ d_j\ge d_i }[/math]. Interchanging the roles of [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math], we can show that [math]\displaystyle{ d_i\ge d_j }[/math]. Therefore [math]\displaystyle{ d_i=d_j }[/math].

[math]\displaystyle{ \square }[/math]

Due to the above theorem, an irreducible Markov chain is aperiodic if one of the states is aperiodic.

Stationary distributions

Definition (stationary distribution)
A stationary distribution of a Markov chain is a probability distribution [math]\displaystyle{ \pi }[/math] such that
[math]\displaystyle{ \begin{align}\pi P=\pi\end{align} }[/math].


Theorem (The Markov chain convergence theorem)
Let [math]\displaystyle{ X_0,X_1,\ldots, }[/math] be an irreducible, aperiodic Markov chain with finite state space [math]\displaystyle{ \mathcal{S}=[N] }[/math], transition matrix [math]\displaystyle{ P }[/math], and arbitrary initial distribution [math]\displaystyle{ \pi^{(0)} }[/math]. Then, there exists a unique stationary distribution [math]\displaystyle{ \pi }[/math] such that [math]\displaystyle{ \pi P=\pi }[/math], and
[math]\displaystyle{ \lim_{n\rightarrow\infty}\pi^{(n)}(i)=\pi(i) }[/math]
for all states [math]\displaystyle{ i\in\mathcal{S} }[/math].

There are three pieces of information delivered by the theorem:

  • Existence: there exists a stationary distribution.
  • Uniqueness: the stationary distribution is unique.
  • Convergence: starting from any initial distribution, the chain converges to the stationary distribution.
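All three statements can be observed numerically: iterating the update pi ← pi P from any starting distribution drives it to the same fixed point. A sketch via power iteration, with a hypothetical two-state chain (this is a demonstration of the theorem, not the fastest way to find a stationary distribution):

```python
def stationary_by_iteration(P, tol=1e-12, max_steps=100000):
    """Iterate pi <- pi P from the uniform distribution until successive
    iterates agree up to tol. By the convergence theorem, for an
    irreducible, aperiodic finite chain this approaches the unique
    stationary distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_steps):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, new)) < tol:
            return new
        pi = new
    return pi

# Hypothetical two-state chain; solving pi P = pi by hand gives (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_by_iteration(P)
```

The returned vector satisfies pi P = pi up to numerical precision, regardless of the starting distribution chosen inside the function.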



Infinite chains*

The following theorem holds for both finite and infinite chains.

Theorem (Basic limit theorem)
Let [math]\displaystyle{ X_0,X_1,\ldots, }[/math] be an irreducible, aperiodic Markov chain having a stationary distribution [math]\displaystyle{ \pi }[/math]. Let [math]\displaystyle{ X_0 }[/math] have the distribution [math]\displaystyle{ \pi_0 }[/math], an arbitrary initial distribution. Then
[math]\displaystyle{ \lim_{n\rightarrow\infty}\pi_n(i)=\pi(i) }[/math]
for all states [math]\displaystyle{ i }[/math].

Note that the theorem states "...having a stationary distribution...": for an infinite chain a stationary distribution need not exist, so here its existence is an assumption of the theorem, whereas for finite irreducible, aperiodic chains existence is guaranteed by the convergence theorem above.

Recurrence and Ergodicity

Definition (recurrence)
A state [math]\displaystyle{ i }[/math] is recurrent if [math]\displaystyle{ \Pr[T_i\lt \infty\mid X_0=i]=1 }[/math], where [math]\displaystyle{ T_i }[/math] denotes the time of the chain's first return to state [math]\displaystyle{ i }[/math]. If [math]\displaystyle{ i }[/math] is not recurrent, it is called transient.
A recurrent state [math]\displaystyle{ i }[/math] is null recurrent if [math]\displaystyle{ h_{i,i}=\infty }[/math], where [math]\displaystyle{ h_{i,i} }[/math] denotes the expected time of the first return to [math]\displaystyle{ i }[/math]. Otherwise, it is positive recurrent.


Definition (ergodicity)
An aperiodic, positive recurrent state is an ergodic state. A Markov chain is ergodic if all its states are ergodic.

Reversibility

Random Walks on Graphs

Reversibility

Hitting and covering

Mixing