Randomized Algorithms (Spring 2010)/Markov chains and random walks


Markov Chains

The Markov property and transition matrices

A stochastic process [math]\displaystyle{ \{X_t\mid t\in T\} }[/math] is a collection of random variables. The index [math]\displaystyle{ t }[/math] is often called time, as the process represents the value of a random variable changing over time. Let [math]\displaystyle{ \mathcal{S} }[/math] be the set of values assumed by the random variables [math]\displaystyle{ X_t }[/math]. We call each element of [math]\displaystyle{ \mathcal{S} }[/math] a state, as [math]\displaystyle{ X_t }[/math] represents the state of the process at time [math]\displaystyle{ t }[/math].

The model of stochastic processes can be very general. In this class, we only consider stochastic processes with the following properties:

discrete time
The index set [math]\displaystyle{ T }[/math] is countable. Specifically, we assume the process is [math]\displaystyle{ X_0,X_1,X_2,\ldots }[/math].
discrete space
The state space [math]\displaystyle{ \mathcal{S} }[/math] is countable. We are especially interested in the case that [math]\displaystyle{ \mathcal{S} }[/math] is finite, in which case the process is called a finite process.

The next property is about the dependency structure among random variables. The simplest dependency structure for [math]\displaystyle{ X_0,X_1,\ldots }[/math] is no dependency at all, that is, independence. We consider the next simplest dependency structure, called the Markov property.

Definition (the Markov property)
A process [math]\displaystyle{ X_0,X_1,\ldots }[/math] satisfies the Markov property if
[math]\displaystyle{ \Pr[X_{n+1}=x_{n+1}\mid X_{0}=x_{0}, X_{1}=x_{1},\ldots,X_{n}=x_{n}]=\Pr[X_{n+1}=x_{n+1}\mid X_{n}=x_{n}] }[/math]
for all [math]\displaystyle{ n }[/math] and all [math]\displaystyle{ x_0,\ldots,x_{n+1}\in \mathcal{S} }[/math].

Informally, the Markov property means: "conditioning on the present, the future does not depend on the past." Hence, the Markov property is also called the memoryless property.

A stochastic process [math]\displaystyle{ X_0,X_1,\ldots }[/math] of discrete time and discrete space is a Markov chain if it has the Markov property.

Let [math]\displaystyle{ P^{(t+1)}_{i,j}=\Pr[X_{t+1}=j\mid X_t=i] }[/math]. For a Markov chain with a finite state space [math]\displaystyle{ \mathcal{S}=[N] }[/math], this defines the transition matrix [math]\displaystyle{ P^{(t+1)} }[/math] at time [math]\displaystyle{ t }[/math]. The transition matrix is an [math]\displaystyle{ N\times N }[/math] matrix of nonnegative entries such that the sum over each row of [math]\displaystyle{ P^{(t+1)} }[/math] is 1, since

[math]\displaystyle{ \begin{align}\sum_{j}P^{(t+1)}_{i,j}=\sum_{j}\Pr[X_{t+1}=j\mid X_t=i]=1\end{align} }[/math].

In linear algebra, matrices of this type are called stochastic matrices.

Let [math]\displaystyle{ \pi^{(t)} }[/math] be the distribution of the chain at time [math]\displaystyle{ t }[/math], that is, [math]\displaystyle{ \begin{align}\pi^{(t)}_i=\Pr[X_t=i]\end{align} }[/math]. For a finite chain, [math]\displaystyle{ \pi^{(t)} }[/math] is a vector of [math]\displaystyle{ N }[/math] nonnegative entries such that [math]\displaystyle{ \begin{align}\sum_{i}\pi^{(t)}_i=1\end{align} }[/math]. In linear algebra, vectors of this type are called stochastic vectors. Then, it holds that

[math]\displaystyle{ \begin{align}\pi^{(t+1)}=\pi^{(t)}P^{(t+1)}\end{align} }[/math].

To see this, we apply the law of total probability,

[math]\displaystyle{ \begin{align} \pi^{(t+1)}_j &= \Pr[X_{t+1}=j]\\ &= \sum_{i}\Pr[X_{t+1}=j\mid X_t=i]\Pr[X_t=i]\\ &=\sum_{i}\pi^{(t)}_iP^{(t+1)}_{i,j}\\ &=(\pi^{(t)}P^{(t+1)})_j. \end{align} }[/math]

Therefore, a finite Markov chain [math]\displaystyle{ X_0,X_1,\ldots }[/math] is specified by an initial distribution [math]\displaystyle{ \pi^{(0)} }[/math] and a sequence of transition matrices [math]\displaystyle{ P^{(1)},P^{(2)},\ldots }[/math]. The transitions of the chain can then be described by a series of matrix products:

[math]\displaystyle{ \pi^{(0)}\stackrel{P^{(1)}}{\longrightarrow}\pi^{(1)}\stackrel{P^{(2)}}{\longrightarrow}\pi^{(2)}\stackrel{P^{(3)}}{\longrightarrow}\cdots\cdots\pi^{(t)}\stackrel{P^{(t+1)}}{\longrightarrow}\pi^{(t+1)}\cdots }[/math]
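
To make the bookkeeping concrete, here is a minimal NumPy sketch of these updates; the 2-state chain and its two transition matrices are hypothetical examples chosen for illustration, not taken from the notes.

 import numpy as np
 # Hypothetical 2-state chain with time-varying transition matrices P1, P2.
 P1 = np.array([[0.9, 0.1],
                [0.5, 0.5]])
 P2 = np.array([[0.7, 0.3],
                [0.2, 0.8]])
 for P in (P1, P2):
     assert np.allclose(P.sum(axis=1), 1.0)  # rows of a stochastic matrix sum to 1
 pi0 = np.array([1.0, 0.0])   # initial distribution pi^{(0)}
 pi1 = pi0 @ P1               # pi^{(1)} = pi^{(0)} P^{(1)}
 pi2 = pi1 @ P2               # pi^{(2)} = pi^{(1)} P^{(2)}
 print(pi1, pi2, pi2.sum())   # pi^{(2)} is again a stochastic vector
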
Homogeneity

A Markov chain is said to be homogeneous if the transitions depend only on the current state but not on the time, that is,

[math]\displaystyle{ P^{(t)}_{i,j}=P_{i,j} }[/math] for all [math]\displaystyle{ t }[/math].

The transitions of a homogeneous Markov chain are given by a single matrix [math]\displaystyle{ P }[/math]. Suppose that [math]\displaystyle{ \pi^{(0)} }[/math] is the initial distribution. At each time [math]\displaystyle{ t }[/math],

[math]\displaystyle{ \begin{align}\pi^{(t+1)}=\pi^{(t)}P\end{align} }[/math].

Expanding this recursion, we have

[math]\displaystyle{ \begin{align}\pi^{(n)}=\pi^{(0)}P^n\end{align} }[/math].
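
As a quick sanity check, the time-[math]\displaystyle{ n }[/math] distribution can be computed directly with a matrix power; the 3-state matrix below is an illustrative example, not one from the notes.

 import numpy as np
 # Hypothetical 3-state homogeneous chain.
 P = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.4, 0.4, 0.2]])
 pi0 = np.array([1.0, 0.0, 0.0])             # start deterministically in state 0
 pi10 = pi0 @ np.linalg.matrix_power(P, 10)  # pi^{(10)} = pi^{(0)} P^10
 print(pi10, pi10.sum())                     # still a stochastic vector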

From now on, we restrict ourselves to homogeneous Markov chains, and the term "Markov chain" means "homogeneous Markov chain" unless stated otherwise.

Definition (finite Markov chain)
Let [math]\displaystyle{ P }[/math] be an [math]\displaystyle{ N\times N }[/math] stochastic matrix. A process [math]\displaystyle{ X_0,X_1,\ldots }[/math] with finite state space [math]\displaystyle{ \mathcal{S}=[N] }[/math] is said to be a (homogeneous) Markov chain with transition matrix [math]\displaystyle{ P }[/math] if for all [math]\displaystyle{ n\ge0, }[/math] all [math]\displaystyle{ i,j\in[N] }[/math] and all [math]\displaystyle{ i_0,\ldots,i_{n-1}\in[N] }[/math] we have
[math]\displaystyle{ \begin{align} \Pr[X_{n+1}=j\mid X_0=i_0,\ldots,X_{n-1}=i_{n-1},X_n=i] &=\Pr[X_{n+1}=j\mid X_n=i]\\ &=P_{i,j}. \end{align} }[/math]

To specify a Markov chain, we only need to give:

  • the initial distribution [math]\displaystyle{ \pi^{(0)} }[/math];
  • the transition matrix [math]\displaystyle{ P }[/math].

Then the transitions can be simulated by matrix products:

[math]\displaystyle{ \pi^{(0)}\stackrel{P}{\longrightarrow}\pi^{(1)}\stackrel{P}{\longrightarrow}\pi^{(2)}\stackrel{P}{\longrightarrow}\cdots\cdots\pi^{(t)}\stackrel{P}{\longrightarrow}\pi^{(t+1)}\stackrel{P}{\longrightarrow}\cdots }[/math]

The distribution of the chain at time [math]\displaystyle{ n }[/math] can be computed by [math]\displaystyle{ \pi^{(n)}=\pi^{(0)}P^n }[/math].
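
Besides computing the distribution exactly, one can also simulate the chain itself by repeatedly sampling the next state from the row of [math]\displaystyle{ P }[/math] indexed by the current state. The following sketch does this for a hypothetical chain; the function name simulate is ours.

 import numpy as np
 rng = np.random.default_rng(0)
 def simulate(pi0, P, T):
     x = rng.choice(len(pi0), p=pi0)         # draw X_0 from pi^{(0)}
     path = [int(x)]
     for _ in range(T):
         x = rng.choice(P.shape[1], p=P[x])  # draw X_{t+1} from row X_t of P
         path.append(int(x))
     return path
 P = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.4, 0.4, 0.2]])
 print(simulate(np.array([1.0, 0.0, 0.0]), P, 20))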

Another way to picture a Markov chain is by its transition graph. A weighted directed graph [math]\displaystyle{ G(V,E,w) }[/math] is said to be a transition graph of a finite Markov chain with transition matrix [math]\displaystyle{ P }[/math] if:

  • [math]\displaystyle{ V=\mathcal{S} }[/math], i.e. each node of the transition graph corresponds to a state of the Markov chain;
  • for any [math]\displaystyle{ i,j\in V }[/math], [math]\displaystyle{ (i,j)\in E }[/math] if and only if [math]\displaystyle{ P_{i,j}\gt 0 }[/math], and the weight [math]\displaystyle{ w(i,j)=P_{i,j} }[/math].
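
Reading the transition graph off the matrix is mechanical: there is an edge [math]\displaystyle{ (i,j) }[/math] of weight [math]\displaystyle{ P_{i,j} }[/math] exactly when [math]\displaystyle{ P_{i,j}\gt 0 }[/math]. A small sketch, with a hypothetical 3-state matrix:

 import numpy as np
 P = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 0.5],
               [1.0, 0.0, 0.0]])
 edges = {(i, j): P[i, j]
          for i in range(P.shape[0])
          for j in range(P.shape[1])
          if P[i, j] > 0}        # the weighted edge set of the transition graph
 print(edges)                    # {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 0.5, (2, 0): 1.0}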

Irreducibility and aperiodicity

Definition (irreducibility)
State [math]\displaystyle{ j }[/math] is accessible from state [math]\displaystyle{ i }[/math] if it is possible for the chain, starting in state [math]\displaystyle{ i }[/math], to ever visit state [math]\displaystyle{ j }[/math]; in other words,
[math]\displaystyle{ \begin{align}(P^n)_{i,j}\gt 0\end{align} }[/math]
for some integer [math]\displaystyle{ n\ge 0 }[/math]. State [math]\displaystyle{ i }[/math] communicates with state [math]\displaystyle{ j }[/math] if [math]\displaystyle{ j }[/math] is accessible from [math]\displaystyle{ i }[/math] and [math]\displaystyle{ i }[/math] is accessible from [math]\displaystyle{ j }[/math].
We say that the Markov chain is irreducible if all pairs of states communicate.
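
For a finite chain, state [math]\displaystyle{ j }[/math] is accessible from [math]\displaystyle{ i }[/math] exactly when [math]\displaystyle{ j }[/math] is reachable from [math]\displaystyle{ i }[/math] in the transition graph, and any reachable state is reachable within [math]\displaystyle{ N-1 }[/math] steps. This gives a simple (if not the most efficient) irreducibility test, sketched below on the support pattern of [math]\displaystyle{ P }[/math]:

 import numpy as np
 def is_irreducible(P):
     N = P.shape[0]
     A = (P > 0).astype(int)             # adjacency matrix of the transition graph
     reach = np.eye(N, dtype=int)        # accumulates supports of P^0, ..., P^{N-1}
     Ak = np.eye(N, dtype=int)
     for _ in range(N - 1):
         Ak = (Ak @ A > 0).astype(int)   # reachability in exactly k steps
         reach |= Ak
     return bool(reach.all())            # irreducible iff all pairs communicate
 P = np.array([[0.0, 1.0],
               [1.0, 0.0]])              # two states swapping forever
 print(is_irreducible(P))                # True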


Definition (aperiodicity)
The period of a state [math]\displaystyle{ i }[/math] is the greatest common divisor (gcd) of its possible return times,
[math]\displaystyle{ \begin{align}d_i=\gcd\{n\ge 1\mid (P^n)_{i,i}\gt 0\}\end{align} }[/math].
A state is aperiodic if its period is 1. A Markov chain is aperiodic if all its states are aperiodic.
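
The period can be computed from the supports of the powers of [math]\displaystyle{ P }[/math]: collect the return times [math]\displaystyle{ n }[/math] with [math]\displaystyle{ (P^n)_{i,i}\gt 0 }[/math] up to some horizon and take their gcd. The horizon below is a heuristic cutoff (for a finite chain the gcd stabilizes once the horizon is large enough); this is a sketch, not an optimized algorithm.

 import numpy as np
 from math import gcd
 from functools import reduce
 def period(P, i, horizon=None):
     N = P.shape[0]
     horizon = horizon or 2 * N * N       # heuristic cutoff, see above
     A = (P > 0).astype(int)              # adjacency matrix of the transition graph
     Ak = np.eye(N, dtype=int)
     lengths = []
     for n in range(1, horizon + 1):
         Ak = (Ak @ A > 0).astype(int)    # support of P^n
         if Ak[i, i]:
             lengths.append(n)            # a possible return time
     return reduce(gcd, lengths, 0)       # 0 means no return was seen
 P = np.array([[0.0, 1.0],
               [1.0, 0.0]])
 print(period(P, 0))                      # the 2-cycle has period 2: not aperiodic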

Stationary distributions

Definition (stationary distribution)
A stationary distribution of a Markov chain is a probability distribution [math]\displaystyle{ \pi }[/math] such that
[math]\displaystyle{ \begin{align}\pi P=\pi\end{align} }[/math].
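
For a finite chain, a stationary distribution can be found by solving the linear system [math]\displaystyle{ \pi P=\pi }[/math] together with the normalization [math]\displaystyle{ \begin{align}\sum_{i}\pi_i=1\end{align} }[/math]. A least-squares sketch, with the matrix again a hypothetical example:

 import numpy as np
 P = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.4, 0.4, 0.2]])
 N = P.shape[0]
 A = np.vstack([P.T - np.eye(N), np.ones(N)])  # rows encode pi P = pi, last row sums pi to 1
 b = np.concatenate([np.zeros(N), [1.0]])
 pi, *_ = np.linalg.lstsq(A, b, rcond=None)    # solve the overdetermined system
 print(pi, np.allclose(pi @ P, pi))            # pi is stationary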


Theorem (Basic limit theorem)
Let [math]\displaystyle{ X_0,X_1,\ldots }[/math] be an irreducible, aperiodic Markov chain having a stationary distribution [math]\displaystyle{ \pi }[/math]. Let [math]\displaystyle{ X_0 }[/math] have an arbitrary initial distribution [math]\displaystyle{ \pi^{(0)} }[/math]. Then
[math]\displaystyle{ \lim_{n\rightarrow\infty}\pi^{(n)}_i=\pi_i }[/math]
for all states [math]\displaystyle{ i }[/math].
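
The theorem can be observed numerically: for an irreducible, aperiodic chain, iterating the transition matrix washes out the initial distribution. A short sketch, reusing the hypothetical 3-state chain from above (which is irreducible and aperiodic):

 import numpy as np
 P = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.4, 0.4, 0.2]])
 P50 = np.linalg.matrix_power(P, 50)
 for pi0 in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])):
     print(pi0 @ P50)   # both initial distributions land on (nearly) the same vector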

Recurrence and Ergodicity*

Definition (recurrence)
Let [math]\displaystyle{ T_i=\min\{n\ge 1\mid X_n=i\} }[/math] denote the first time the chain returns to state [math]\displaystyle{ i }[/math], and let [math]\displaystyle{ h_{i,i}=\mathbf{E}[T_i\mid X_0=i] }[/math] denote the expected return time. A state [math]\displaystyle{ i }[/math] is recurrent if [math]\displaystyle{ \Pr[T_i\lt \infty\mid X_0=i]=1 }[/math]. If [math]\displaystyle{ i }[/math] is not recurrent, it is called transient.
A recurrent state [math]\displaystyle{ i }[/math] is null recurrent if [math]\displaystyle{ h_{i,i}=\infty }[/math]. Otherwise, it is positive recurrent.
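
For intuition, the expected return time [math]\displaystyle{ h_{i,i} }[/math] can be estimated by simulation: start at [math]\displaystyle{ i }[/math], run until the chain comes back, and average over many trials. The sketch below uses the hypothetical chain from above and a safety cap on the walk length.

 import numpy as np
 rng = np.random.default_rng(0)
 def return_time(P, i, cap=10**6):
     x, n = i, 0
     while n < cap:
         x = rng.choice(P.shape[1], p=P[x])   # one step of the chain
         n += 1
         if x == i:
             return n                         # first return time T_i
     return cap                               # walk capped; treat as "no return"
 P = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.4, 0.4, 0.2]])
 print(np.mean([return_time(P, 0) for _ in range(2000)]))  # estimates h_{0,0}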


Definition (ergodicity)
An aperiodic, positive recurrent state is an ergodic state. A Markov chain is ergodic if all its states are ergodic.

Reversibility

Random Walks on Graphs


Hitting and covering

Mixing