Randomized Algorithms (Fall 2011)/Random Walks on Undirected Graphs

A walk on a graph [math]\displaystyle{ G = (V,E) }[/math] is a sequence of vertices [math]\displaystyle{ v_1, v_2, \ldots \in V }[/math] such that [math]\displaystyle{ v_{i+1} }[/math] is a neighbor of [math]\displaystyle{ v_i }[/math] for every index [math]\displaystyle{ i }[/math]. When [math]\displaystyle{ v_{i+1} }[/math] is selected uniformly at random from among [math]\displaystyle{ v_i }[/math]’s neighbors, independently for every [math]\displaystyle{ i }[/math], this is called a random walk on [math]\displaystyle{ G }[/math].

We consider the special case where [math]\displaystyle{ G(V,E) }[/math] is an undirected graph, and we write [math]\displaystyle{ n=|V| }[/math] and [math]\displaystyle{ m=|E| }[/math].

This random walk defines a Markov chain with the vertex set [math]\displaystyle{ V }[/math] as its state space and with the transition matrix [math]\displaystyle{ P }[/math] given by

[math]\displaystyle{ P(u,v)=\begin{cases} \frac{1}{d(u)}&\mbox{if }(u,v)\in E,\\ 0 & \mbox{otherwise }, \end{cases} }[/math]

where [math]\displaystyle{ d(u) }[/math] denotes the degree of vertex [math]\displaystyle{ u }[/math].

Note that unlike the PageRank example, the transition probability now depends on the degree [math]\displaystyle{ d(u) }[/math] instead of the out-degree [math]\displaystyle{ d_+(u) }[/math], because the graph is undirected.
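
As a small illustration, the following sketch (in Python; the example graph and variable names are arbitrary and chosen only for illustration) builds this transition matrix from an adjacency-list representation of an undirected graph.

    # Minimal sketch: build the transition matrix P of the random walk
    # on a small undirected graph.  The example edge set is arbitrary.
    from collections import defaultdict

    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]   # example undirected graph
    adj = defaultdict(list)                     # adjacency lists
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    n = 1 + max(max(u, v) for u, v in edges)

    # P(u, v) = 1/d(u) if (u, v) is an edge, and 0 otherwise.
    P = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            P[u][v] = 1.0 / len(adj[u])

    # Each row of P sums to 1, as it should for a transition matrix.
    for row in P:
        print(row)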

Proposition
Let [math]\displaystyle{ M_G }[/math] be the Markov chain defined as above.
  • [math]\displaystyle{ M_G }[/math] is irreducible if and only if [math]\displaystyle{ G }[/math] is connected.
  • [math]\displaystyle{ M_G }[/math] is aperiodic if and only if [math]\displaystyle{ G }[/math] is non-bipartite.

We leave the proof as an exercise.

We will simply assume that [math]\displaystyle{ G }[/math] is connected, so we do not have to worry about irreducibility.

The periodicity of a random walk on an undirected bipartite graph is usually handled by the following "lazy" random walk trick.

Lazy random walk
Given an undirected graph [math]\displaystyle{ G(V,E) }[/math], a random walk is defined by the transition matrix
[math]\displaystyle{ P'(u,v)=\begin{cases} \frac{1}{2} & \mbox{if }u=v,\\ \frac{1}{2d(u)}&\mbox{if }(u,v)\in E,\\ 0 & \mbox{otherwise }. \end{cases} }[/math]
For this random walk, at each step we first flip a fair coin to decide whether to move or to stay; if the coin tells us to move, we pick a neighbor uniformly at random and move there. Since every vertex now has a self-loop with positive probability, the resulting Markov chain is aperiodic for any [math]\displaystyle{ G }[/math].
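
A minimal sketch of the lazy walk, assuming the adjacency-list representation from the previous snippet (the function names are only illustrative):

    import random

    def lazy_walk_step(adj, u):
        # With probability 1/2 stay at u; otherwise move to a neighbor
        # of u chosen uniformly at random.  This realizes the matrix P'.
        if random.random() < 0.5:
            return u
        return random.choice(adj[u])

    def lazy_walk(adj, start, steps):
        # Run the lazy random walk for the given number of steps.
        u = start
        for _ in range(steps):
            u = lazy_walk_step(adj, u)
        return u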

We now return to the non-lazy version of the random walk and observe that it has the following stationary distribution.

Theorem
The random walk on [math]\displaystyle{ G(V,E) }[/math] with [math]\displaystyle{ |E|=m }[/math] has a stationary distribution [math]\displaystyle{ \pi }[/math], where
[math]\displaystyle{ \begin{align} \pi_v=\frac{d(v)}{2m}\end{align} }[/math] for every [math]\displaystyle{ v\in V }[/math].
Proof.
First, since [math]\displaystyle{ \sum_{v\in V}d(v)=2m }[/math], it follows that
[math]\displaystyle{ \sum_{v\in V}\pi_v=\sum_{v\in V}\frac{d(v)}{2m}=1, }[/math]

thus [math]\displaystyle{ \pi }[/math] is a well-defined distribution. Let [math]\displaystyle{ N(v) }[/math] denote the set of neighbors of [math]\displaystyle{ v }[/math]. Then for any [math]\displaystyle{ v\in V }[/math],

[math]\displaystyle{ (\pi P)_v=\sum_{u\in V}\pi_uP(u,v)=\sum_{u\in N(v)}\frac{d(u)}{2m}\frac{1}{d(u)}=\frac{1}{2m}\sum_{u\in N(v)}1=\frac{d(v)}{2m}=\pi_v. }[/math]

Thus [math]\displaystyle{ \pi P=\pi }[/math], and [math]\displaystyle{ \pi }[/math] is stationary.

[math]\displaystyle{ \square }[/math]

For connected and non-bipartite [math]\displaystyle{ G }[/math], the random walk converges to this stationary distribution. Note that the stationary distribution is proportional to the degrees of the vertices, therefore, if [math]\displaystyle{ G }[/math] is a regular graph, that is, [math]\displaystyle{ d(v) }[/math] is the same for all [math]\displaystyle{ v\in V }[/math], the stationary distribution is the uniform distribution.
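
Continuing the small example above (a sketch only; note that the example graph is connected and non-bipartite because it contains a triangle), one can verify numerically that repeatedly applying [math]\displaystyle{ P }[/math] drives any starting distribution toward [math]\displaystyle{ \pi_v=d(v)/2m }[/math]:

    m = len(edges)
    pi = [len(adj[v]) / (2.0 * m) for v in range(n)]   # pi_v = d(v) / 2m

    # Start from the distribution concentrated on vertex 0 and apply P repeatedly.
    mu = [1.0] + [0.0] * (n - 1)
    for _ in range(100):
        mu = [sum(mu[u] * P[u][v] for u in range(n)) for v in range(n)]

    print(pi)   # the stationary distribution
    print(mu)   # should be close to pi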

The following parameters of random walks are closely related to the performance of randomized algorithms based on random walks:

  • Hitting time: how long it takes a random walk to reach a specific vertex.
  • Cover time: how long it takes a random walk to visit all vertices.
  • Mixing time: how long it takes a random walk to get close enough to the stationary distribution.
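
These quantities can be estimated empirically by simulation. The following sketch (reusing the adjacency lists and the non-lazy walk from the snippets above; the trial count is arbitrary) estimates the hitting time to a fixed vertex and the cover time by averaging over repeated walks.

    import random

    def walk_until(adj, start, done):
        # Run a (non-lazy) random walk from `start` and count the steps
        # until the predicate done(visited, current) becomes true.
        u, visited, steps = start, {start}, 0
        while not done(visited, u):
            u = random.choice(adj[u])
            visited.add(u)
            steps += 1
        return steps

    def estimate_hitting_time(adj, s, t, trials=1000):
        # Average number of steps for a walk from s to first reach t.
        return sum(walk_until(adj, s, lambda vis, u: u == t)
                   for _ in range(trials)) / trials

    def estimate_cover_time(adj, s, n, trials=1000):
        # Average number of steps for a walk from s to visit all n vertices.
        return sum(walk_until(adj, s, lambda vis, u: len(vis) == n)
                   for _ in range(trials)) / trials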

USTCON

USTCON stands for undirected [math]\displaystyle{ s }[/math]-[math]\displaystyle{ t }[/math] connectivity: given an undirected graph [math]\displaystyle{ G(V,E) }[/math], decide whether there is a path from vertex [math]\displaystyle{ s }[/math] to vertex [math]\displaystyle{ t }[/math]. This problem is an abstraction of various search problems in graphs, and has theoretical significance in complexity theory.

The problem can be solved deterministically by traversing the graph [math]\displaystyle{ G(V,E) }[/math], which takes [math]\displaystyle{ \Omega(n) }[/math] extra space to keep track of which vertices have been visited, where [math]\displaystyle{ n=|V| }[/math]. The following theorem is implied by the upper bound on the cover time.

Theorem (Aleliunas-Karp-Lipton-Lovász-Rackoff 1979)
USTCON can be solved by a polynomial time Monte Carlo randomized algorithm with bounded one-sided error, which uses [math]\displaystyle{ O(\log n) }[/math] extra space.

The algorithm is a random walk starting at [math]\displaystyle{ s }[/math]. If the walk reaches [math]\displaystyle{ t }[/math] within [math]\displaystyle{ 4n^3 }[/math] steps, return "yes"; otherwise return "no".
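
A minimal sketch of this Monte Carlo algorithm (assuming the graph is given by adjacency lists as above; a genuinely log-space implementation would read the graph from the input and keep only the current vertex and a step counter in working memory):

    import random

    def ustcon_random_walk(adj, n, s, t):
        # Monte Carlo test for s-t connectivity with one-sided error:
        # walk at random from s for 4*n^3 steps; answer "yes" iff t is reached.
        u = s
        for _ in range(4 * n ** 3):
            if u == t:
                return True            # always correct: t was actually reached
            if not adj[u]:
                return False           # the walk is stuck, so t is unreachable
            u = random.choice(adj[u])
        return u == t                  # a "no" answer is wrong with probability < 1/2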

It is obvious that if [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math] are disconnected, the random walk from [math]\displaystyle{ s }[/math] can never reach [math]\displaystyle{ t }[/math], thus the algorithm always returns "no".

We know that for an undirected [math]\displaystyle{ G }[/math], the cover time is [math]\displaystyle{ \lt 4nm\lt 2n^3 }[/math] (since [math]\displaystyle{ m\lt n^2/2 }[/math]). So if [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math] are connected, the expected time to reach [math]\displaystyle{ t }[/math] from [math]\displaystyle{ s }[/math] is [math]\displaystyle{ \lt 2n^3 }[/math]. By Markov's inequality, the probability that it takes longer than [math]\displaystyle{ 4n^3 }[/math] steps to reach [math]\displaystyle{ t }[/math] from [math]\displaystyle{ s }[/math] is [math]\displaystyle{ \lt 1/2 }[/math].

The random walk uses [math]\displaystyle{ O(\log n) }[/math] bits to store the current position, and another [math]\displaystyle{ O(\log n) }[/math] bits to count the number of steps. So the total space used by the algorithm, in addition to the input, is [math]\displaystyle{ O(\log n) }[/math].

This shows that USTCON is in the complexity class RL (randomized log-space).

Story in complexity theory

If randomness is forbidden, it is known that USTCON can be solved nondeterministically in logarithmic space, thus USTCON is in NL. In fact, USTCON is complete for SL, the symmetric version of nondeterministic log-space; that is, every problem in the class SL can be reduced to USTCON via log-space reductions. Therefore, USTCON[math]\displaystyle{ \in }[/math]RL implies that SL[math]\displaystyle{ \subseteq }[/math]RL.

In 2004, Reingold showed that USTCON can be solved deterministically in log-space, which proves SL=L. The deterministic algorithm for USTCON is obtained by derandomizing the random walk.

It is conjectured that RL=L, but this is still open.