随机算法 (Fall 2011)/Random Walks on Undirected Graphs: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>WikiSysop
Created page with 'A '''walk''' on a graph <math>G = (V,E)</math> is a sequence of vertices <math>v_1, v_2, \ldots \in V</math> such that <math>v_{i+1}</math> is a neighbor of <math>v_i</math> for …'
 
Line 63: Line 63:
* '''Cover time''': It takes how long for a random walk to visit all vertices.
* '''Cover time''': It takes how long for a random walk to visit all vertices.
* '''Mixing time''': It takes how long for a random walk to be close enough to the stationary distribution.
* '''Mixing time''': It takes how long for a random walk to be close enough to the stationary distribution.
= Hitting and covering =
For any <math>u,v\in V</math>, the '''hitting time''' <math>\tau_{u,v}</math> is the expected number of steps before vertex <math>v</math> is visited, starting from vertex <math>u</math>.
Recall that any irreducible aperiodic Markov chain with finite state space converges to the unique stationary distribution
<math>
\begin{align}
\pi_v=\frac{1}{\tau_{v,v}}.
\end{align}
</math>
Combining with what we know about the stationary distribution <math>\pi</math> of a random walk on an undirected graph <math>G(V,E)</math> where <math>|E|=m</math>, we have that for any vertex <math>v\in V</math>,
:<math>
\tau_{v,v}=\frac{1}{\pi_v}=\frac{2m}{d(v)}.
</math>
This fact can be used to estimate the hitting time between two adjacent vertices.
{{Theorem
|Lemma|
:For a random walk on an undirected graph <math>G(V,E)</math> where<math>|E|=m</math>, for any <math>u,v\in V</math> that <math>(u,v)\in E</math>, the mean hitting time <math>\begin{align}\tau_{u,v}<2m\end{align}</math>.
}}
{{Proof| The proof is by double counting.  We know that
:<math>
\tau_{v,v}=\frac{1}{\pi_v}=\frac{2m}{d(v)}.
</math>
Let <math>N(v)</math> be the set of neighbors of vertex <math>v</math>.  We run the random walk from <math>v</math> for one step, and by the law of total expectation,
:<math>
\tau_{v,v}=\sum_{w\in N(v)}\frac{1}{d(v)}(1+\tau_{w,v}).
</math>
Combining the above two equations, we have
:<math>
2m=\sum_{w\in N(v)}(1+\tau_{w,v}),
</math>
which implies that <math>\tau_{u,v}<2m</math>.
}}
Note that the lemma holds only for the adjacent <math>u</math> and <math>v</math>. With this lemma, we can prove an upper bound on the cover time.
*Let <math>C_u</math> be the expected number of steps taken by a random walk which starts at <math>u</math> to visit every vertex in <math>G</math> at least once. The '''cover time''' of <math>G</math>, denoted <math>C(G)</math> is defined as <math>C(G)=\max_{u\in V}C_u</math>.
{{Theorem
|Theorem (cover time)|
:For any connected undirected graph <math>G(V,E)</math> with <math>|V|=n</math> and <math>|E|=m</math>, the cover time <math>C(G)\le 4nm</math>
}}
{{Proof| Let <math>T</math> be an arbitrary spanning tree of <math>G</math>. There exists an Eulerian tour of <math>T</math> in which each edge is traversed once in each direction. Let <math>v_1\rightarrow v_2\rightarrow \cdots \rightarrow v_{2(n-1)}\rightarrow v_{2n-1}=v_1</math> be such a tour. Clearly the expected time to go through the vertices in the tour is an upper bound on the cover time. Hence,
:<math>
C(G)\le\sum_{i=1}^{2(n-1)}\tau_{v_{i},v_{i+1}}<2(n-1)\cdot 2m<4nm.
</math>
}}
A tighter bound (with a smaller constant factor) can be proved with a more careful analysis. Please read the textbook [MR].


= USTCON =
= USTCON =

Revision as of 15:05, 19 July 2011

A walk on a graph [math]\displaystyle{ G = (V,E) }[/math] is a sequence of vertices [math]\displaystyle{ v_1, v_2, \ldots \in V }[/math] such that [math]\displaystyle{ v_{i+1} }[/math] is a neighbor of [math]\displaystyle{ v_i }[/math] for every index [math]\displaystyle{ i }[/math]. When [math]\displaystyle{ v_{i+1} }[/math] is selected uniformly at random from among [math]\displaystyle{ v_i }[/math]’s neighbors, independently for every [math]\displaystyle{ i }[/math], this is called a random walk on [math]\displaystyle{ G }[/math].

We consider the special case that [math]\displaystyle{ G(V,E) }[/math] is an undirected graph, and denote that [math]\displaystyle{ n=|V| }[/math] and [math]\displaystyle{ m=|E| }[/math].

A Markov chain is defined by this random walk, with the vertex set [math]\displaystyle{ V }[/math] as the state space, and the transition matrix [math]\displaystyle{ P }[/math], which is defined as follows:

[math]\displaystyle{ P(u,v)=\begin{cases} \frac{1}{d(u)}&\mbox{if }(u,v)\in E,\\ 0 & \mbox{otherwise }, \end{cases} }[/math]

where [math]\displaystyle{ d(u) }[/math] denotes the degree of vertex [math]\displaystyle{ u }[/math].

Note that unlike the PageRank example, now the probability depends on [math]\displaystyle{ d(u) }[/math] instead of [math]\displaystyle{ d_+(u) }[/math]. This is because the graph is undirected.

Proposition
Let [math]\displaystyle{ M_G }[/math] be the Markov chain defined as above.
  • [math]\displaystyle{ M_G }[/math] is irreducible if and only if [math]\displaystyle{ G }[/math] is connected.
  • [math]\displaystyle{ M_G }[/math] is aperiodic if and only if [math]\displaystyle{ G }[/math] is non-bipartite.

We leave the proof as an exercise.

We can just assume [math]\displaystyle{ G }[/math] is connected, so we do not have to worry about the reducibility any more.

The periodicity of a random walk on a undirected bipartite graph is usually dealt with by the following trick of "lazy" random walk.

Lazy random walk
Given an undirected graph [math]\displaystyle{ G(V,E) }[/math], a random walk is defined by the transition matrix
[math]\displaystyle{ P'(u,v)=\begin{cases} \frac{1}{2} & \mbox{if }u=v,\\ \frac{1}{2d(u)}&\mbox{if }(u,v)\in E,\\ 0 & \mbox{otherwise }. \end{cases} }[/math]
For this random walk, at each step, we first flip a fair coin to decide whether to move or to stay, and if the coin told us to move, we pick a uniform edge and move to the adjacent vertex. It is easy to see that the resulting Markov chain is aperiodic for any [math]\displaystyle{ G }[/math].

We consider the non-lazy version of random walk. We observe that the random walk has the following stationary distribution.

Theorem
The random walk on [math]\displaystyle{ G(V,E) }[/math] with [math]\displaystyle{ |E|=m }[/math] has a stationary distribution [math]\displaystyle{ \pi }[/math], where [math]\displaystyle{ \forall v\in V }[/math],
[math]\displaystyle{ \begin{align} \pi_v=\frac{d(v)}{2m}\end{align} }[/math] for [math]\displaystyle{ v\in V }[/math].
Proof.
First, since [math]\displaystyle{ \sum_{v\in V}d(v)=2m }[/math], it follows that
[math]\displaystyle{ \sum_{v\in V}\pi_v=\sum_{v\in V}\frac{d(v)}{2m}=1, }[/math]

thus [math]\displaystyle{ \pi }[/math] is a well-defined distribution. And let [math]\displaystyle{ N(v) }[/math] denote the set of neighbors of [math]\displaystyle{ v }[/math]. Then for any [math]\displaystyle{ v\in V }[/math],

[math]\displaystyle{ (\pi P)_v=\sum_{u\in V}\pi_uP(u,v)=\sum_{u\in N(v)}\frac{d(u)}{2m}\frac{1}{d(u)}=\frac{1}{2m}\sum_{u\in N(v)}1=\frac{d(v)}{2m}=\pi_v. }[/math]

Thus [math]\displaystyle{ \pi P=\pi }[/math], and [math]\displaystyle{ \pi }[/math] is stationary.

[math]\displaystyle{ \square }[/math]

For connected and non-bipartite [math]\displaystyle{ G }[/math], the random walk converges to this stationary distribution. Note that the stationary distribution is proportional to the degrees of the vertices, therefore, if [math]\displaystyle{ G }[/math] is a regular graph, that is, [math]\displaystyle{ d(v) }[/math] is the same for all [math]\displaystyle{ v\in V }[/math], the stationary distribution is the uniform distribution.

The following parameters of random walks are closely related to the performances of randomized algorithms based on random walks:

  • Hitting time: It takes how long for a random walk to visit some specific vertex.
  • Cover time: It takes how long for a random walk to visit all vertices.
  • Mixing time: It takes how long for a random walk to be close enough to the stationary distribution.

USTCON

USTCON stands for undirected [math]\displaystyle{ s }[/math]-[math]\displaystyle{ t }[/math] connectivity. It is the problem which asks whether there is a path from vertex [math]\displaystyle{ s }[/math] to vertex [math]\displaystyle{ t }[/math] in a given undirected graph [math]\displaystyle{ G(V,E) }[/math]. This problem is an abstraction of various search problems in graphs, and has theoretical significance in complexity theory.

The problem can be solved deterministically by traversing the graph [math]\displaystyle{ G(V,E) }[/math], which takes [math]\displaystyle{ \Omega(n) }[/math] extra space to keep track of which vertices have been visited, where [math]\displaystyle{ n=|V| }[/math]. And the following theorem is implied by the upper bound on the cover time.

Theorem (Aleliunas-Karp-Lipton-Lovász-Rackoff 1979)
USTCON can be solved by a polynomial time Monte Carlo randomized algorithm with bounded one-sided error, which uses [math]\displaystyle{ O(\log n) }[/math] extra space.

The algorithm is a random walk starting at [math]\displaystyle{ s }[/math]. If the walk reaches [math]\displaystyle{ t }[/math] in [math]\displaystyle{ 4n^3 }[/math] steps, then return "yes", otherwise return "no".

It is obvious that if [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math] are disconnected, the random walk from [math]\displaystyle{ s }[/math] can never reach [math]\displaystyle{ t }[/math], thus the algorithm always returns "no".

We know that for an undirected [math]\displaystyle{ G }[/math], the cover time is [math]\displaystyle{ \lt 4nm\lt 2n^2 }[/math]. So if [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math] are connected, the expected time to reach [math]\displaystyle{ t }[/math] from [math]\displaystyle{ s }[/math] is [math]\displaystyle{ \lt 2n^3 }[/math]. By Markov's inequality, the probability that it takes longer than [math]\displaystyle{ 4n^3 }[/math] steps to reach [math]\displaystyle{ t }[/math] from [math]\displaystyle{ s }[/math] is [math]\displaystyle{ \lt 1/2 }[/math].

The random walk use [math]\displaystyle{ O(\log n) }[/math] bits to store the current position, and another [math]\displaystyle{ O(\log n) }[/math] bits to count the number of steps. So the total space used by the algorithm inaddition to the input is [math]\displaystyle{ O(\log n) }[/math].

This shows that USTCON is in the complexity class RL (randomized log-space).

Story in complexity theory

If the randomness if forbidden, it is known that USTCON can be solved nondeterministically in logarithmic space, thus USTCON is in NL. In fact USTCON is complete for the symmetric version of nondeterministic log-space. That is, every problem in the class of SL can be solved by USTCON via log-space reductions. Therefore, USTCON[math]\displaystyle{ \in }[/math]RL implies that SL[math]\displaystyle{ \subseteq }[/math]RL.

In 2004, Reingold shows that USTCON can be solved deterministically in log-space, which proves SL=L. The deterministic algorithm for USTCON is by the derandomization of random walk.

It is conjectured that RL=L, but this is still open.