随机算法 \ 高级算法 (Fall 2016)/Problem Set 1 and 高级算法 (Fall 2016)/Nonconstructive Proof of Lovász Local Lemma: Difference between pages

From TCS Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>Etone
 
imported>Etone
(Created page with "== Lovász Local Lemma== Given a sequence of events <math>A_1,A_2,\ldots,A_n</math>, we use the '''dependency graph''' to describe the dependencies between these events. {{Th...")
 
Line 1: Line 1:
== Problem 1==
== Lovász Local Lemma==
For any <math>\alpha\ge 1</math>, a cut <math>C</math> in an undirected (multi)graph <math>G(V,E)</math>is called an <math>\alpha</math>-min-cut if <math>|C|\le\alpha|C^*|</math> where <math>C^*</math> is a min-cut in <math>G</math>.  
Given a sequence of events <math>A_1,A_2,\ldots,A_n</math>, we use the '''dependency graph''' to describe the dependencies between these events.


# Give a lower bound to the probability that Karger's Random Contraction algorithm returns an <math>\alpha</math>-min-cut in a graph <math>G(V,E)</math> of <math>n</math> vertices.
{{Theorem
# Use the above bound to estimate the number of distinct <math>\alpha</math>-min cuts in <math>G</math>.
|Definition|
:Let <math>A_1,A_2,\ldots,A_n</math> be a sequence of events. A graph <math>D=(V,E)</math> on the set of vertices <math>V=\{1,2,\ldots,n\}</math> is called a '''dependency graph''' for the events <math>A_1,\ldots,A_n</math> if for each <math>i</math>, <math>1\le i\le n</math>, the event <math>A_i</math> is mutually independent of all the events <math>\{A_j\mid (i,j)\not\in E\}</math>.
}}


== Problem 2==
The notion of mutual independence between an event and a set of events is formally defined as follows.
Let <math>G(V,E)</math> be an undirected graph with positive edge weights <math>w:E\to\mathbb{Z}^+</math>. Given a partition of <math>V</math> into <math>k</math> disjoint subsets <math>S_1,S_2,\ldots,S_k</math>, we define
{{Theorem|Definition|
:<math>w(S_1,S_2,\ldots,S_k)=\sum_{uv\in E\atop \exists i\neq j: u\in S_i,v\in S_j}w(uv)</math>
:An event <math>A</math> is said to be '''mutually independent''' of events <math>B_1,B_2,\ldots, B_k</math>, if for any disjoint <math>I^+,I^-\subseteq\{1,2,lots,k\}</math>, it holds that
as the cost of the '''<math>k</math>-cut''' <math>\{S_1,S_2,\ldots,S_k\}</math>. Our goal is to find a <math>k</math>-cut with maximum cost.
::<math>\Pr\left[A\mid \bigwedge_{i\in I^+}B_i\wedge \bigwedge_{i\in I^-}\overline{B_i}\right]=\Pr[A]</math>.
# Give a poly-time greedy algorithm for finding the weighted max <math>k</math>-cut. Prove that the approximation ratio is <math>(1-1/k)</math>.
}}
# Consider the following local search algorithm for the weighted max cut (max 2-cut).
start with an arbitrary bipartition of <math>V</math> into disjoint <math>S_0,S_1</math>;
while (true) do
    if <math>\exists i\in\{0,1\}</math> and <math>v\in S_i</math> such that <font color=red>(______________)</font>
      then <math>v</math> leaves <math>S_i</math> and joins <math>S_{1-i}</math>;
      continue;
    end if
    break;
end
:Fill in the blank parenthesis. Give an analysis of the running time of the algorithm. And prove that the approximation ratio is 0.5.


== Problem 3==
;Example
Given <math>m</math> subsets <math>S_1,S_2,\ldots, S_m\subseteq U</math> of a universe <math>U</math> of size <math>n</math>, we want to find a <math>C\subseteq\{1,2,\ldots, n\}</math> of fixed size <math>k=|C|</math> with the maximum '''coverage''' <math>\left|\bigcup_{i\in C}S_i\right|</math>.
:Let <math>X_1,X_2,\ldots,X_m</math> be a set of ''mutually independent'' random variables. Each event <math>A_i</math> is a predicate defined on a number of variables among <math>X_1,X_2,\ldots,X_m</math>. Let <math>v(A_i)</math> be the unique smallest set of variables which determine <math>A_i</math>. The dependency graph <math>D=(V,E)</math> is defined by
:::<math>(i,j)\in E</math> iff <math>v(A_i)\cap v(A_j)\neq \emptyset</math>.


* Give a poly-time greedy algorithm for the problem. Prove that the approximation ratio is <math>1-(1-1/k)^k>1-1/e</math>.
The following lemma, known as the Lovász local lemma, first proved by Erdős and Lovász in 1975, is an extremely powerful tool, as it supplies a way for dealing with rare events.


== Problem 4==
{{Theorem
We consider minimum makespan scheduling on parallel identical machines when jobs are subject to '''precedence constraints'''.
|Lovász Local Lemma (symmetric case)|
:Let <math>A_1,A_2,\ldots,A_n</math> be a set of events, and assume that the following hold:
:#for all <math>1\le i\le n</math>, <math>\Pr[A_i]\le p</math>;
:#the maximum degree of the dependency graph for the events <math>A_1,A_2,\ldots,A_n</math> is <math>d</math>, and
:::<math>ep(d+1)\le 1</math>.
:Then
::<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]>0</math>.
}}


We still want to schedule <math>n</math> jobs <math>j=1,2,\ldots, n</math> on <math>m</math> identical machines, where job <math>j</math> has  processing time <math>p_j</math>. But now a partial order <math>\preceq</math> is defined on jobs, so that if <math>j\prec k</math> then job <math>j</math> must be completely finished before job <math>k</math> begins. The following is a variant of the ''List'' algorithm for this problem.
We will prove a general version of the local lemma, where the events <math>A_i</math> are not symmetric. This generalization is due to Spencer.
Input: a list of <math>n</math> jobs with processing times <math>p_1,p_2,\ldots, p_n</math>;
{{Theorem
|Lovász Local Lemma (general case)|
:Let <math>D=(V,E)</math> be the dependency graph of events <math>A_1,A_2,\ldots,A_n</math>. Suppose there exist real numbers <math>x_1,x_2,\ldots, x_n</math> such that <math>0\le x_i<1</math> and for all <math>1\le i\le n</math>,
::<math>\Pr[A_i]\le x_i\prod_{(i,j)\in E}(1-x_j)</math>.
:Then
::<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)</math>.
}}


whenever a machine becomes idle
To see that the general LLL implies symmetric LLL, we set <math>x_i=\frac{1}{d+1}</math> for all <math>i=1,2,\ldots,n</math>. Then we have <math>\left(1-\frac{1}{d+1}\right)^d>\frac{1}{\mathrm{e}}</math>.
    assign the next ''available'' job on the list to the machine;


Here a job <math>k</math> is available if all jobs <math>j\prec k</math> have already been completely processed.
Assume the condition in the symmetric LLL:
:#for all <math>1\le i\le n</math>, <math>\Pr[A_i]\le p</math>;
:#<math>ep(d+1)\le 1</math>;
then it is easy to verify that for all <math>1\le i\le n</math>,
:<math>\Pr[A_i]\le p\le\frac{1}{e(d+1)}<\frac{1}{d+1}\left(1-\frac{1}{d+1}\right)^d\le x_i\prod_{(i,j)\in E}(1-x_j)</math>.
Due to the general LLL, we have
:<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)=\left(1-\frac{1}{d+1}\right)^n>0</math>.
This proves the symmetric LLL.


* Prove that the approximation ratio is 2.
Now we prove the general LLL by the original induction proof.
{{Proof|
First, apply the chain rule. We have
:<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]=\prod_{i=1}^n\left(1-\Pr\left[{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)</math>.


== Problem 5 ==
Next we prove by induction on <math>m</math> that for any set of <math>m</math> events <math>i_1,\ldots,i_m</math>,
For a '''hypergraph''' <math>H(V,E)</math> with vertex set <math>V</math>, every '''hyperedge''' <math>e\in E</math> is a subset <math>e\subset V</math> of vertices, not necessarily of size 2. A hypergraph <math>H(V,E)</math> is '''<math>k</math>-uniform''' if every hyperedge <math>e\in V</math> is of size <math>k=|e|</math>.  
:<math>\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]\le x_{i_1}</math>.
The local lemma follows immediately by the above chain rule.


A hypergraph <math>H(V,E)</math> is said to have '''property B''' (named after Bernstein) if <math>H</math> is 2-coloable; that is, if there is a '''proper 2-coloring''' <math>f:V\to\{{\color{red}R},{\color{blue}B}\}</math> which assigns each vertex one of the two colors <font color=red>Red</font> or <font color=blue>Blue</font>, such that none of the hyperedge is ''monochromatic''.
For <math>m=1</math>, this is obvious because
:<math>\Pr[A_{i_1}]\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)\le x_{i_1}</math>.  


# Let <math>H(V,E)</math> be a <math>k</math>-uniform hypergraph in which every hyperedge <math>e\in E</math> shares vertices with at most <math>d</math> other hyperedges.
For general <math>m</math>, let <math>i_2,\ldots,i_k</math> be the set of vertices adjacent to  <math>i_1</math> in the dependency graph, i.e. event <math>A_{i_1}</math> is mutually independent of <math>A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}}</math>.
#*Show that if <math>2\mathrm{e}\cdot (d+1)\le 2^{k}</math>, then <math>H</math> has property B.
By conditional probability, we have
#*Describe how to use Moser's recursive Fix algorithm to find a proper 2-coloring of <math>H</math>. Give the pseudocode. Prove the condition in interns of <math>d</math> and <math>k</math> under which the algorithm can find a 2-coloring of <math>H</math> with high probability.
:<math>
#*Describe how to use Moser-Tardos random solver to find a proper 2-coloring of <math>H</math>. Give the pseudocode. Prove the condition in interns of <math>d</math> and <math>k</math> under which the algorithm can find a 2-coloring of <math>H</math> within bounded expected time.
\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]
# Let <math>H(V,E)</math> be a hypergraph (not necessarily uniform) with at least <math>n\ge 2</math> vertices satisfying that
=\frac{\Pr\left[ A_i\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]}
#:<math>\forall v\in V, \sum_{e\ni v}(1-1/k)^{-|e|}2^{-|e|+1}\le \frac{1}{n}</math>.
{\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]}
#*Show that <math>H</math> has property B.
</math>.
#*Describe how to use Moser-Tardos random solver to find a proper 2-coloring of <math>H</math>. Give an upper bound on the expected running time.
First, we bound the numerator. Due to that <math>A_{i_1}</math> is mutually independent of <math>A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}}</math>, we have
:<math>
\begin{align}
\Pr\left[ A_{i_1}\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]
&\le\Pr\left[ A_{i_1}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]\\
&=\Pr[A_{i_1}]\\
&\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j).
\end{align}
</math>
 
Next, we bound the denominator. Applying the chain rule, we have
:<math>
\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]
=\prod_{j=2}^k\Pr\left[\overline{A_{i_j}}\mid \bigwedge_{\ell=j+1}^m\overline{A_{i_\ell}}\right]
</math>
which by the induction hypothesis, is at least
:<math>
\prod_{j=2}^k(1-x_{i_j})=\prod_{\{i_1,i_j\}\in E}(1-x_j)
</math>
where <math>E</math> is the set of edges in the dependency graph.
 
Altogether, we prove the induction hypothesis
:<math>
\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]
\le\frac{x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)}{\prod_{\{i_1,i_j\}\in E}(1-x_j)}\le x_{i_1}.
</math>
 
Due to the chain rule, it holds that
:<math>
\begin{align}
\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]
&=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\\
&=\prod_{i=1}^n\left(1-\Pr\left[A_i\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)\\
&\ge\prod_{i=1}^n\left(1-x_i\right).
\end{align}
</math>
}}

Revision as of 09:30, 3 October 2016

Lovász Local Lemma

Given a sequence of events [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math], we use the dependency graph to describe the dependencies between these events.

Definition
Let [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math] be a sequence of events. A graph [math]\displaystyle{ D=(V,E) }[/math] on the set of vertices [math]\displaystyle{ V=\{1,2,\ldots,n\} }[/math] is called a dependency graph for the events [math]\displaystyle{ A_1,\ldots,A_n }[/math] if for each [math]\displaystyle{ i }[/math], [math]\displaystyle{ 1\le i\le n }[/math], the event [math]\displaystyle{ A_i }[/math] is mutually independent of all the events [math]\displaystyle{ \{A_j\mid (i,j)\not\in E\} }[/math].

The notion of mutual independence between an event and a set of events is formally defined as follows.

Definition
An event [math]\displaystyle{ A }[/math] is said to be mutually independent of events [math]\displaystyle{ B_1,B_2,\ldots, B_k }[/math], if for any disjoint [math]\displaystyle{ I^+,I^-\subseteq\{1,2,lots,k\} }[/math], it holds that
[math]\displaystyle{ \Pr\left[A\mid \bigwedge_{i\in I^+}B_i\wedge \bigwedge_{i\in I^-}\overline{B_i}\right]=\Pr[A] }[/math].
Example
Let [math]\displaystyle{ X_1,X_2,\ldots,X_m }[/math] be a set of mutually independent random variables. Each event [math]\displaystyle{ A_i }[/math] is a predicate defined on a number of variables among [math]\displaystyle{ X_1,X_2,\ldots,X_m }[/math]. Let [math]\displaystyle{ v(A_i) }[/math] be the unique smallest set of variables which determine [math]\displaystyle{ A_i }[/math]. The dependency graph [math]\displaystyle{ D=(V,E) }[/math] is defined by
[math]\displaystyle{ (i,j)\in E }[/math] iff [math]\displaystyle{ v(A_i)\cap v(A_j)\neq \emptyset }[/math].

The following lemma, known as the Lovász local lemma, first proved by Erdős and Lovász in 1975, is an extremely powerful tool, as it supplies a way for dealing with rare events.

Lovász Local Lemma (symmetric case)
Let [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math] be a set of events, and assume that the following hold:
  1. for all [math]\displaystyle{ 1\le i\le n }[/math], [math]\displaystyle{ \Pr[A_i]\le p }[/math];
  2. the maximum degree of the dependency graph for the events [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math] is [math]\displaystyle{ d }[/math], and
[math]\displaystyle{ ep(d+1)\le 1 }[/math].
Then
[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\gt 0 }[/math].

We will prove a general version of the local lemma, where the events [math]\displaystyle{ A_i }[/math] are not symmetric. This generalization is due to Spencer.

Lovász Local Lemma (general case)
Let [math]\displaystyle{ D=(V,E) }[/math] be the dependency graph of events [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math]. Suppose there exist real numbers [math]\displaystyle{ x_1,x_2,\ldots, x_n }[/math] such that [math]\displaystyle{ 0\le x_i\lt 1 }[/math] and for all [math]\displaystyle{ 1\le i\le n }[/math],
[math]\displaystyle{ \Pr[A_i]\le x_i\prod_{(i,j)\in E}(1-x_j) }[/math].
Then
[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i) }[/math].

To see that the general LLL implies symmetric LLL, we set [math]\displaystyle{ x_i=\frac{1}{d+1} }[/math] for all [math]\displaystyle{ i=1,2,\ldots,n }[/math]. Then we have [math]\displaystyle{ \left(1-\frac{1}{d+1}\right)^d\gt \frac{1}{\mathrm{e}} }[/math].

Assume the condition in the symmetric LLL:

  1. for all [math]\displaystyle{ 1\le i\le n }[/math], [math]\displaystyle{ \Pr[A_i]\le p }[/math];
  2. [math]\displaystyle{ ep(d+1)\le 1 }[/math];

then it is easy to verify that for all [math]\displaystyle{ 1\le i\le n }[/math],

[math]\displaystyle{ \Pr[A_i]\le p\le\frac{1}{e(d+1)}\lt \frac{1}{d+1}\left(1-\frac{1}{d+1}\right)^d\le x_i\prod_{(i,j)\in E}(1-x_j) }[/math].

Due to the general LLL, we have

[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)=\left(1-\frac{1}{d+1}\right)^n\gt 0 }[/math].

This proves the symmetric LLL.

Now we prove the general LLL by the original induction proof.

Proof.

First, apply the chain rule. We have

[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]=\prod_{i=1}^n\left(1-\Pr\left[{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right) }[/math].

Next we prove by induction on [math]\displaystyle{ m }[/math] that for any set of [math]\displaystyle{ m }[/math] events [math]\displaystyle{ i_1,\ldots,i_m }[/math],

[math]\displaystyle{ \Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]\le x_{i_1} }[/math].

The local lemma follows immediately by the above chain rule.

For [math]\displaystyle{ m=1 }[/math], this is obvious because

[math]\displaystyle{ \Pr[A_{i_1}]\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)\le x_{i_1} }[/math].

For general [math]\displaystyle{ m }[/math], let [math]\displaystyle{ i_2,\ldots,i_k }[/math] be the set of vertices adjacent to [math]\displaystyle{ i_1 }[/math] in the dependency graph, i.e. event [math]\displaystyle{ A_{i_1} }[/math] is mutually independent of [math]\displaystyle{ A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}} }[/math]. By conditional probability, we have

[math]\displaystyle{ \Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right] =\frac{\Pr\left[ A_i\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]} {\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]} }[/math].

First, we bound the numerator. Due to that [math]\displaystyle{ A_{i_1} }[/math] is mutually independent of [math]\displaystyle{ A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}} }[/math], we have

[math]\displaystyle{ \begin{align} \Pr\left[ A_{i_1}\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right] &\le\Pr\left[ A_{i_1}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]\\ &=\Pr[A_{i_1}]\\ &\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j). \end{align} }[/math]

Next, we bound the denominator. Applying the chain rule, we have

[math]\displaystyle{ \Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right] =\prod_{j=2}^k\Pr\left[\overline{A_{i_j}}\mid \bigwedge_{\ell=j+1}^m\overline{A_{i_\ell}}\right] }[/math]

which by the induction hypothesis, is at least

[math]\displaystyle{ \prod_{j=2}^k(1-x_{i_j})=\prod_{\{i_1,i_j\}\in E}(1-x_j) }[/math]

where [math]\displaystyle{ E }[/math] is the set of edges in the dependency graph.

Altogether, we prove the induction hypothesis

[math]\displaystyle{ \Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right] \le\frac{x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)}{\prod_{\{i_1,i_j\}\in E}(1-x_j)}\le x_{i_1}. }[/math]

Due to the chain rule, it holds that

[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right] &=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\\ &=\prod_{i=1}^n\left(1-\Pr\left[A_i\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)\\ &\ge\prod_{i=1}^n\left(1-x_i\right). \end{align} }[/math]
[math]\displaystyle{ \square }[/math]