Randomized Algorithms (Spring 2010)/Markov chains and random walks
Markov Chains
The Markov property and transition matrices
A stochastic processes [math]\displaystyle{ \{X_t\mid t\in T\} }[/math] is a collection of random variables. The index [math]\displaystyle{ t }[/math] is often called time, as the process represents the value of a random variable changing over time. Let [math]\displaystyle{ \mathcal{S} }[/math] be the set of values assumed by the random variables [math]\displaystyle{ X_t }[/math]. We call each element of [math]\displaystyle{ \mathcal{S} }[/math] a state, as [math]\displaystyle{ X_t }[/math] represents the state of the process at time [math]\displaystyle{ t }[/math].
The model of stochastic processes can be very general. In this class, we only consider the stochastic processes with the following properties:
- discrete time
- The index set [math]\displaystyle{ T }[/math] is countable. Specifically, we assume the process is [math]\displaystyle{ X_0,X_1,X_2,\ldots }[/math]
- discrete space
- The state space [math]\displaystyle{ \mathcal{S} }[/math] is countable. We are especially interested in the case that [math]\displaystyle{ \mathcal{S} }[/math] is finite, in which case the process is called a finite process.
The next property is about the dependency structure among random variables. The simplest dependency structure for [math]\displaystyle{ X_0,X_1,\ldots }[/math] is no dependency at all, that is, independence. We consider the next simplest dependency structure called the Markov property.
Definition (the Markov property)
|
Informally, the Markov property means: "conditioning on the present, the future does not depend on the past." Hence, the Markov property is also called the memoryless property.
A stochastic process [math]\displaystyle{ X_0,X_1,\ldots }[/math] of discrete time and discrete space is a Markov chain if it has the Markov property.
Let [math]\displaystyle{ P^{(t+1)}_{i,j}=\Pr[X_{t+1}=j\mid X_t=i] }[/math]. For a Markov chain with a finite state space [math]\displaystyle{ \mathcal{S}=[N] }[/math]. This gives us a transition matrix [math]\displaystyle{ P^{(t+1)} }[/math] at time [math]\displaystyle{ t }[/math]. The transition matrix is an [math]\displaystyle{ N\times N }[/math] matrix of nonnegative entries such that the sum over each row of [math]\displaystyle{ P^{(t)} }[/math] is 1, since
- [math]\displaystyle{ \begin{align}\sum_{j}P^{(t+1)}_{i,j}=\sum_{j}\Pr[X_{t+1}=j\mid X_t=i]=1\end{align} }[/math].
In linear algebra, matrices of this type are called stochastic matrices.
Let [math]\displaystyle{ \pi^{(t)} }[/math] be the distribution of the chain at time [math]\displaystyle{ t }[/math], that is, [math]\displaystyle{ \begin{align}\pi^{(t)}_i=\Pr[X_t=i]\end{align} }[/math]. For a finite chain, [math]\displaystyle{ \pi^{(t)} }[/math] is a vector of [math]\displaystyle{ N }[/math] nonnegative entries such that [math]\displaystyle{ \begin{align}\sum_{i}\pi^{(t)}_i=1\end{align} }[/math]. In linear algebra, vectors of this type are called stochastic vectors. Then, it holds that
- [math]\displaystyle{ \begin{align}\pi^{(t+1)}=\pi^{(t)}P^{(t+1)}\end{align} }[/math].
To see this, we apply the law of total probability,
- [math]\displaystyle{ \begin{align} \pi^{(t+1)}_j &= \Pr[X_{t+1}=j]\\ &= \sum_{i}\Pr[X_{t+1}=j\mid X_t=i]\Pr[X_t=i]\\ &=\sum_{i}\pi^{(t)}_iP^{(t+1)}_{i,j}\\ &=(\pi^{(t)}P^{(t+1)})_j. \end{align} }[/math]
Therefore, a finite Markov chain [math]\displaystyle{ X_0,X_1,\ldots }[/math] is specified by an initial distribution [math]\displaystyle{ \pi^{(0)} }[/math] and a sequence of transition matrices [math]\displaystyle{ P^{(1)},P^{(2)},\ldots }[/math]. And the transitions of chain can be described by a series of matrix products:
- [math]\displaystyle{ \pi^{(0)}\stackrel{P^{(1)}}{\longrightarrow}\pi^{(1)}\stackrel{P^{(2)}}{\longrightarrow}\pi^{(2)}\stackrel{P^{(3)}}{\longrightarrow}\cdots\cdots\pi^{(t)}\stackrel{P^{(t+1)}}{\longrightarrow}\pi^{(t+1)}\cdots }[/math]
- Homogeneity
A Markov chain is said to be homogenous if the transitions depend only on the current states but not on the time, that is
- [math]\displaystyle{ P^{(t)}_{i,j}=P_{i,j} }[/math] for all [math]\displaystyle{ t }[/math].
The transitions of a homogenous Markov chain is given by a single matrix [math]\displaystyle{ P }[/math]. Suppose that [math]\displaystyle{ \pi^{(0)} }[/math] is the initial distribution. At each time [math]\displaystyle{ t }[/math],
- [math]\displaystyle{ \begin{align}\pi^{(t+1)}=\pi^{(t)}P\end{align} }[/math].
Expanding this recursion, we have
- [math]\displaystyle{ \begin{align}\pi^{(n)}=\pi^{(0)}P^n\end{align} }[/math].
From now on, we restrict ourselves to the homogenous Markov chains, and the term "Markov chain" means "homogenous Markov chian" unless stated otherwise.
Definition (finite Markov chain)
|
To specify a Markov chain, we only need to give:
- initial distribution [math]\displaystyle{ \pi^{(0)} }[/math];
- transition matrix [math]\displaystyle{ P }[/math].
Then the transitions can be simulated by matrix products:
- [math]\displaystyle{ \pi^{(0)}\stackrel{P}{\longrightarrow}\pi^{(1)}\stackrel{P}{\longrightarrow}\pi^{(2)}\stackrel{P}{\longrightarrow}\cdots\cdots\pi^{(t)}\stackrel{P}{\longrightarrow}\pi^{(t+1)}\stackrel{P}{\longrightarrow}\cdots }[/math]
The distribution of the chain at time [math]\displaystyle{ n }[/math] can be computed by [math]\displaystyle{ \pi^{(n)}=\pi^{(0)}P^n }[/math].
Another way to picture a Markov chain is by its transition graph. A weighted directed graph [math]\displaystyle{ G(V,E,w) }[/math] is said to be a transition graph of a finite Markov chain with transition matrix [math]\displaystyle{ P }[/math] if:
- [math]\displaystyle{ V=\mathcal{S} }[/math], i.e. each node of the transition graph corresponds to a state of the Markov chain;
- for any [math]\displaystyle{ i,j\in V }[/math], [math]\displaystyle{ (i,j)\in E }[/math] if and only if [math]\displaystyle{ P_{i,j}\gt 0 }[/math], and the weight [math]\displaystyle{ w(i,j)=P_{i,j} }[/math].
A transition graph defines a natural random walk: at each time step, at the current node, the walk moves through an adjacent edge with the probability of the weight of the edge. It is easy to see that this is a well-defined random walk, since [math]\displaystyle{ \sum_j\P_{i,j}=1 }[/math] for every [math]\displaystyle{ i }[/math]. Therefore, a finite Markov chain is equivalent to the random walk on its transition graph.
Irreducibility and aperiodicity
Definition (irreducibility)
|
Definition (aperiodicity)
|
Stationary distributions
Definition (stationary distribution)
|
Theorem (Basic limit theorem)
|
Recurrence and Ergodicity*
Definition (recurrence)
|
Definition (ergodicity)
|