随机算法 (Fall 2011)/Random Walks on Undirected Graphs: Difference between revisions

= USTCON =
USTCON stands for '''undirected <math>s</math>-<math>t</math> connectivity'''. It is the problem which asks whether there is a path from vertex <math>s</math> to vertex <math>t</math> in a given undirected graph <math>G(V,E)</math>. This problem is an abstraction of various search problems in graphs, and has theoretical significance in complexity theory.
The problem can be solved deterministically by traversing the graph <math>G(V,E)</math>, which takes <math>\Omega(n)</math> extra space to keep track of which vertices have been visited, where <math>n=|V|</math>. And the following theorem is implied by the upper bound on the cover time.
{{Theorem
|Theorem (Aleliunas-Karp-Lipton-Lovász-Rackoff 1979)|
: USTCON can be solved by a polynomial time Monte Carlo randomized algorithm with bounded one-sided error, which uses <math>O(\log n)</math> extra space.
}}
The algorithm is a random walk starting at <math>s</math>. If the walk reaches <math>t</math> in <math>4n^3</math> steps, then return "yes", otherwise return "no".
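The walk can be sketched in Python. This is only an illustration under assumed names (`ustcon`, an adjacency-list dict `adj`): the real algorithm stores nothing but the current vertex and a step counter, which is where the <math>O(\log n)</math> space bound comes from.

```python
import random

def ustcon(adj, s, t):
    """One-sided-error Monte Carlo test for s-t connectivity.

    adj: adjacency list {vertex: [neighbors]} of an undirected graph.
    Walk from s for at most 4*n^3 steps; report "yes" iff t was reached.
    """
    n = len(adj)
    v = s
    for _ in range(4 * n ** 3):
        if v == t:
            return "yes"
        v = random.choice(adj[v])  # move to a uniformly random neighbor
    return "yes" if v == t else "no"
```

A "no" answer may be wrong with probability <math><1/2</math>, but a "yes" answer is always correct, matching the bounded one-sided error in the theorem.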
It is obvious that if <math>s</math> and <math>t</math> are disconnected, the random walk from <math>s</math> can never reach <math>t</math>, thus the algorithm always returns "no".
We know that for an undirected <math>G</math>, the cover time is <math><4nm<2n^3</math>. So if <math>s</math> and <math>t</math> are connected, the expected time <math>\mathrm{E}[T]</math> to reach <math>t</math> from <math>s</math> is <math><2n^3</math>. By Markov's inequality, the probability that it takes longer than <math>4n^3</math> steps to reach <math>t</math> from <math>s</math> is <math>\Pr[T>4n^3]\le\frac{\mathrm{E}[T]}{4n^3}<\frac{1}{2}</math>.
The random walk uses <math>O(\log n)</math> bits to store the current position, and another <math>O(\log n)</math> bits to count the number of steps. So the total space used by the algorithm in addition to the input is <math>O(\log n)</math>.
This shows that USTCON is in the complexity class [http://qwiki.stanford.edu/wiki/Complexity_Zoo:R#rl RL] (randomized log-space).
;Story in complexity theory
If randomness is forbidden, it is known that USTCON can be solved nondeterministically in logarithmic space, thus USTCON is in [http://qwiki.stanford.edu/wiki/Complexity_Zoo:N#nl NL]. In fact USTCON is complete for the symmetric version of nondeterministic log-space. That is, every problem in the class [http://qwiki.stanford.edu/wiki/Complexity_Zoo:S#sl SL] can be reduced to USTCON via log-space reductions. Therefore, USTCON<math>\in</math>RL implies that SL<math>\subseteq</math>RL.
In 2004, Reingold showed that USTCON can be solved deterministically in log-space, which proves SL=L. The deterministic algorithm for USTCON is obtained by derandomizing the random walk.
It is conjectured that RL=L, but this is still open.

Latest revision as of 15:06, 19 July 2011

A walk on a graph [math]\displaystyle{ G = (V,E) }[/math] is a sequence of vertices [math]\displaystyle{ v_1, v_2, \ldots \in V }[/math] such that [math]\displaystyle{ v_{i+1} }[/math] is a neighbor of [math]\displaystyle{ v_i }[/math] for every index [math]\displaystyle{ i }[/math]. When [math]\displaystyle{ v_{i+1} }[/math] is selected uniformly at random from among [math]\displaystyle{ v_i }[/math]’s neighbors, independently for every [math]\displaystyle{ i }[/math], this is called a random walk on [math]\displaystyle{ G }[/math].

We consider the special case that [math]\displaystyle{ G(V,E) }[/math] is an undirected graph, and write [math]\displaystyle{ n=|V| }[/math] and [math]\displaystyle{ m=|E| }[/math].

A Markov chain is defined by this random walk, with the vertex set [math]\displaystyle{ V }[/math] as the state space, and the transition matrix [math]\displaystyle{ P }[/math], which is defined as follows:

[math]\displaystyle{ P(u,v)=\begin{cases} \frac{1}{d(u)}&\mbox{if }(u,v)\in E,\\ 0 & \mbox{otherwise }, \end{cases} }[/math]

where [math]\displaystyle{ d(u) }[/math] denotes the degree of vertex [math]\displaystyle{ u }[/math].
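The transition matrix above is straightforward to build explicitly. A minimal sketch, assuming the graph is given as an adjacency-list dict over vertices <code>0..n-1</code> (both the representation and the helper name are our choices):

```python
def transition_matrix(adj):
    """Row-stochastic transition matrix P of the random walk:
    P[u][v] = 1/d(u) if (u,v) is an edge, and 0 otherwise."""
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        for v in nbrs:
            P[u][v] = 1.0 / len(nbrs)  # d(u) = len(nbrs)
    return P
```

Each row sums to 1, since row [math]\displaystyle{ u }[/math] spreads probability [math]\displaystyle{ 1/d(u) }[/math] over exactly [math]\displaystyle{ d(u) }[/math] neighbors.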

Note that unlike the PageRank example, now the probability depends on [math]\displaystyle{ d(u) }[/math] instead of [math]\displaystyle{ d_+(u) }[/math]. This is because the graph is undirected.

Proposition
Let [math]\displaystyle{ M_G }[/math] be the Markov chain defined as above.
  • [math]\displaystyle{ M_G }[/math] is irreducible if and only if [math]\displaystyle{ G }[/math] is connected.
  • [math]\displaystyle{ M_G }[/math] is aperiodic if and only if [math]\displaystyle{ G }[/math] is non-bipartite.

We leave the proof as an exercise.
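One direction of the second claim is easy to observe empirically: on a bipartite graph every step crosses between the two sides, so every return to the starting vertex happens at an even step, which is exactly a period of 2. A small simulation sketch (the helper name `walk_parities` is ours):

```python
import random

def walk_parities(adj, start, steps):
    """Run a random walk and record the parity (step number mod 2)
    of every visit to `start`. On a bipartite graph only even
    return times can occur, so the set is a subset of {0}."""
    v, parities = start, set()
    for i in range(1, steps + 1):
        v = random.choice(adj[v])
        if v == start:
            parities.add(i % 2)
    return parities
```

On the 4-cycle (bipartite) only even returns occur; on the triangle (non-bipartite) both parities show up, reflecting aperiodicity.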

We can simply assume that [math]\displaystyle{ G }[/math] is connected, so we do not have to worry about reducibility.

The periodicity of a random walk on an undirected bipartite graph is usually dealt with by the following "lazy" random walk trick.

Lazy random walk
Given an undirected graph [math]\displaystyle{ G(V,E) }[/math], a random walk is defined by the transition matrix
[math]\displaystyle{ P'(u,v)=\begin{cases} \frac{1}{2} & \mbox{if }u=v,\\ \frac{1}{2d(u)}&\mbox{if }(u,v)\in E,\\ 0 & \mbox{otherwise }. \end{cases} }[/math]
For this random walk, at each step we first flip a fair coin to decide whether to move or to stay; if the coin tells us to move, we pick a uniformly random incident edge and move to the adjacent vertex. It is easy to see that the resulting Markov chain is aperiodic for any [math]\displaystyle{ G }[/math].
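The lazy transition matrix can be built the same way as the plain one. A sketch under the same assumed adjacency-list representation (the helper name is ours):

```python
def lazy_transition_matrix(adj):
    """Lazy-walk transition matrix: P'(u,u) = 1/2,
    P'(u,v) = 1/(2 d(u)) for each neighbor v, and 0 otherwise."""
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        P[u][u] = 0.5                      # stay with probability 1/2
        for v in nbrs:
            P[u][v] = 1.0 / (2 * len(nbrs))  # move with probability 1/2
    return P
```

Since every state has a self-loop of probability 1/2, every state has period 1, so the chain is aperiodic regardless of whether the graph is bipartite.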

We now return to the non-lazy version of the random walk, and observe that it has the following stationary distribution.

Theorem
The random walk on [math]\displaystyle{ G(V,E) }[/math] with [math]\displaystyle{ |E|=m }[/math] has a stationary distribution [math]\displaystyle{ \pi }[/math], where
[math]\displaystyle{ \pi_v=\frac{d(v)}{2m} }[/math] for every [math]\displaystyle{ v\in V }[/math].
Proof.
First, since [math]\displaystyle{ \sum_{v\in V}d(v)=2m }[/math], it follows that
[math]\displaystyle{ \sum_{v\in V}\pi_v=\sum_{v\in V}\frac{d(v)}{2m}=1, }[/math]

thus [math]\displaystyle{ \pi }[/math] is a well-defined probability distribution. Let [math]\displaystyle{ N(v) }[/math] denote the set of neighbors of [math]\displaystyle{ v }[/math]. Then for any [math]\displaystyle{ v\in V }[/math],

[math]\displaystyle{ (\pi P)_v=\sum_{u\in V}\pi_uP(u,v)=\sum_{u\in N(v)}\frac{d(u)}{2m}\frac{1}{d(u)}=\frac{1}{2m}\sum_{u\in N(v)}1=\frac{d(v)}{2m}=\pi_v. }[/math]

Thus [math]\displaystyle{ \pi P=\pi }[/math], and [math]\displaystyle{ \pi }[/math] is stationary.

[math]\displaystyle{ \square }[/math]

For connected and non-bipartite [math]\displaystyle{ G }[/math], the random walk converges to this stationary distribution. Note that the stationary distribution is proportional to the degrees of the vertices, therefore, if [math]\displaystyle{ G }[/math] is a regular graph, that is, [math]\displaystyle{ d(v) }[/math] is the same for all [math]\displaystyle{ v\in V }[/math], the stationary distribution is the uniform distribution.
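Both the theorem and this remark are easy to check numerically. A sketch (helper names `stationary` and `step` are ours), computing [math]\displaystyle{ \pi_v=d(v)/2m }[/math] and applying one step of the walk to it:

```python
def stationary(adj):
    """pi_v = d(v) / (2m), where the sum of all degrees equals 2m."""
    total = sum(len(nbrs) for nbrs in adj.values())  # = 2m
    return {v: len(nbrs) / total for v, nbrs in adj.items()}

def step(pi, adj):
    """One step of the walk applied to a distribution: (pi P)_v."""
    out = {v: 0.0 for v in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            out[v] += pi[u] / len(nbrs)
    return out
```

Applying `step` to `stationary(adj)` returns the same distribution, and on a regular graph `stationary` is uniform, matching the remark above.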

The following parameters of random walks are closely related to the performance of randomized algorithms based on random walks:

  • Hitting time: how long it takes for a random walk to visit a specific vertex.
  • Cover time: how long it takes for a random walk to visit all vertices.
  • Mixing time: how long it takes for a random walk to get close enough to the stationary distribution.