高级算法 (Fall 2016)/Greedy and Local Search and 高级算法 (Fall 2016)/Nonconstructive Proof of Lovász Local Lemma: Difference between pages

From TCS Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>Etone
 
imported>Etone
No edit summary
 
Line 1: Line 1:
<font color=red size=4>Under construction.</font> [[File:Under_construction.png‎|50px]]
Given a sequence of events <math>A_1,A_2,\ldots,A_n</math>, we use the '''dependency graph''' to describe the dependencies between these events.


= Set cover =
{{Theorem
Given <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math> of a universe <math>U</math> of size <math>n=|U|</math>, a <math>C\subseteq\{1,2,\ldots,m\}</math> forms a '''set cover''' if <math>U=\bigcup_{i\in\mathcal{C}}S_i</math>, that is, <math>C</math> is a sub-collection of sets whose union "covers" all elements in the universe.
|Definition (dependency graph)|
 
:Let <math>A_1,A_2,\ldots,A_n</math> be a sequence of events. A graph <math>D=(V,E)</math> on the set of vertices <math>V=\{1,2,\ldots,n\}</math> is called a '''dependency graph''' for the events <math>A_1,\ldots,A_n</math> if for each <math>i</math>, <math>1\le i\le n</math>, the event <math>A_i</math> is mutually independent of all the events <math>\{A_j\mid (i,j)\not\in E\}</math>.
Without loss of generality, we always assume that the universe is <math>U==\bigcup_{i=1}^mS_i</math>.
 
This defines an important optimization problem:
{{Theorem|Set Cover Problem|
*'''Input''': <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math> of a universe <math>U</math> of size <math>n</math>;
*'''Output''': the smallest <math>C\subseteq\{1,2,\ldots,m\}</math> such that <math>U=\bigcup_{i\in C}S_i</math>.
}}
}}


We can think of each instance as a bipartite graph <math>G(U,\{S_1,S_2,\ldots,S_n\}, E)</math> with <math>n</math> vertices on the right side, each corresponding to an element <math>x\in U</math>, <math>m</math> vertices on the left side, each corresponding to one of the <math>m</math> subsets <math>S_1,S_2,\ldots,S_m</math>, and there is a bipartite edge connecting <math>x</math> with <math>S_i</math> if and only if <math>x\in S_i</math>. By this translation the set cover problem is precisely the problem of given as input a bipartite graph <math>G(U,V,E)</math>, to find the smallest subset <math>C\subseteq V</math> of vertices on the right side to "cover" all vertices on the left side, such that every vertex on the left side <math>x\in U</math> is incident to some vertex in <math>C</math>.
The notion of mutual independence between an event and a set of events is formally defined as follows.
 
{{Theorem|Definition (mutual independence)|
By alternating the roles of sets and elements in the above interpretation of set cover instances as bipartite graphs, the set cover problem can be translated to the following equivalent hitting set problem:
:An event <math>A</math> is said to be '''mutually independent''' of events <math>B_1,B_2,\ldots, B_k</math>, if for any disjoint <math>I^+,I^-\subseteq\{1,2,\ldots,k\}</math>, it holds that
{{Theorem|Hitting Set Problem|
::<math>\Pr\left[A \mid \left(\bigwedge_{i\in I^+}B_i\right) \wedge \left(\bigwedge_{i\in I^-}\overline{B_i}\right)\right]=\Pr[A]</math>.
*'''Input''': <math>n</math> subsets <math>S_1,S_2,\ldots,S_n\subseteq U</math> of a universe <math>U</math> of size <math>m</math>;
*'''Output''': the smallest subset <math>C\subseteq U</math> of elements such that <math>C</math> intersects with every set <math>S_i</math> for <math>1\le i\le n</math>.
}}
}}


== Frequency and Vertex Cover==
;Example
Given an instance of set cover problem <math>S_1,S_2,\ldots,S_m\subseteq U</math>, for every element <math>x\in U</math>, its '''frequency''', denoted as <math>frequency(x)</math>, is defined as the number of sets containing <math>X</math>. Formally,
:Let <math>X_1,X_2,\ldots,X_m</math> be a set of ''mutually independent'' random variables. Each event <math>A_i</math> is a predicate defined on a number of variables among <math>X_1,X_2,\ldots,X_m</math>. Let <math>v(A_i)</math> be the unique smallest set of variables which determine <math>A_i</math>. The dependency graph <math>D=(V,E)</math> is defined by
:<math>frequency(x)=|\{i\mid x\in S_i\}|</math>.
:::<math>(i,j)\in E</math> iff <math>v(A_i)\cap v(A_j)\neq \emptyset</math>.


In the hitting set version, the frequency should be defined for each set: for a set <math>S_i</math> its frequency <math>frequency(S_i)=|S_i|</math> is just the size of the set <math>S_i</math>.
The following lemma, known as the Lovász local lemma, first proved by Erdős and Lovász in 1975, is an extremely powerful tool, as it supplies a way for dealing with rare events.


The set cover problem restricted to the instances with frequency 2 becomes the vertex cover problem.  
{{Theorem
|Lovász Local Lemma (symmetric case)|
:Let <math>A_1,A_2,\ldots,A_n</math> be a set of events, and assume that there is a <math>p\in[0,1)</math> such that the followings are satisfied:
:#for all <math>1\le i\le n</math>, <math>\Pr[A_i]\le p</math>;
:#the maximum degree of the dependency graph for the events <math>A_1,A_2,\ldots,A_n</math> is <math>d</math>, and
:::<math>\mathrm{e}p\cdot (d+1)\le 1</math>.
:Then
::<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]>0</math>.
}}


Given an undirected graph <math>G(U,V)</math>, a '''vertex cover''' is a subset <math>C\subseteq V</math> of vertices such that every edge <math>uv\in E</math> has at least one endpoint in <math>C</math>.
We will prove a general version of the local lemma, where the events <math>A_i</math> are not symmetric. This generalization is due to Spencer.
{{Theorem|Vertex Cover Problem|
{{Theorem
*'''Input''': an undirected graph <math>G(V,E)</math>
|Lovász Local Lemma (general case)|
*'''Output''': the smallest <math>C\subseteq V</math> such that every edge <math>e\in E</math> is incident to at least one vertex in <math>C</math>.
:Let <math>D=(V,E)</math> be the dependency graph of events <math>A_1,A_2,\ldots,A_n</math>. Suppose there exist real numbers <math>x_1,x_2,\ldots, x_n</math> such that <math>0\le x_i<1</math> and for all <math>1\le i\le n</math>,
::<math>\Pr[A_i]\le x_i\prod_{(i,j)\in E}(1-x_j)</math>.
:Then
::<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)</math>.
}}
}}
It is easy to compare with the hitting set problem:
*For graph <math>G(V,E)</math>, its edges <math>e_1,e_2,\ldots,e_n\subseteq V</math> are vertex-sets of size 2.
*A subset <math>C\subseteq V</math> of vertices is a vertex cover if and only if it is a hitting sets for <math>e_1,e_2,\ldots,e_n</math>, i.e. every <math>e_i</math> intersects with <math>C</math>.
Therefore vertex cover is just set cover with frequency 2.


The vertex cover problem is '''NP-hard'''. Its decision version is among [https://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems Karp's 21 '''NP-complete''' problems]. Since vertex cover is a special case of set cover, the set cover problem is also '''NP-hard'''.
To see that the general LLL implies symmetric LLL, we set <math>x_i=\frac{1}{d+1}</math> for all <math>i=1,2,\ldots,n</math>. Then we have <math>\left(1-\frac{1}{d+1}\right)^d>\frac{1}{\mathrm{e}}</math>.


== Greedy Algorithm==
Assume the condition in the symmetric LLL:
We present our algorithms in the original set cover setting (instead of the hitting set version).
:#for all <math>1\le i\le n</math>, <math>\Pr[A_i]\le p</math>;
:#<math>\mathrm{e}p\cdot(d+1)\le 1</math>;
then it is easy to verify that for all <math>1\le i\le n</math>,
:<math>\Pr[A_i]\le p\le\frac{1}{e(d+1)}<\frac{1}{d+1}\left(1-\frac{1}{d+1}\right)^d\le x_i\prod_{(i,j)\in E}(1-x_j)</math>.
Due to the general LLL, we have
:<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)=\left(1-\frac{1}{d+1}\right)^n>0</math>.
This proves the symmetric LLL.


A natural algorithm is the greedy algorithm: sequentially add such <math>i</math> to the cover <math>C</math>, where each <math>S_i</math> covers the largest number of ''currently uncovered'' elements, until no element is left uncovered.
Now we prove the general LLL by the original induction proof.
{{Proof|
First, apply the chain rule. We have
:<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]=\prod_{i=1}^n\left(1-\Pr\left[{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)</math>.


{{Theorem|''GreedyCover''|
Next we prove by induction on <math>m</math> that for any set of <math>m</math> events <math>i_1,\ldots,i_m</math>,
:'''Input:''' sets <math>S_1,S_2,\ldots,S_m</math>;
:<math>\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]\le x_{i_1}</math>.
----
The local lemma follows immediately by the above chain rule.
:initially, <math>U=\bigcup_{i=1}^mS_i</math>, and <math>C=\emptyset</math>;
:while <math>U\neq\emptyset</math> do
::find <math>i\in\{1,2,\ldots, m\}</math> with the largest <math>|S_i\cap U|</math>;
::let <math>C=C\cup\{i\}</math> and <math>U=U\setminus S_i</math>;
:return <math>C</math>;
}}


Obviously the algorithm runs in polynomial time and always returns a set cover.  We will then show how good the set cover returned by the algorithm compared to the optimal solution by analyzing its approximation ratio.
For <math>m=1</math>, this is obvious because
:<math>\Pr[A_{i_1}]\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)\le x_{i_1}</math>.  


We define the following notations:
For general <math>m</math>, let <math>i_2,\ldots,i_k</math> be the set of vertices adjacent to  <math>i_1</math> in the dependency graph, i.e. event <math>A_{i_1}</math> is mutually independent of <math>A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}}</math>.
* We enumerate all elements of the universe <math>U</math> as <math>x_1,x_2,\ldots,x_n</math>, in the order in which they are covered in the algorithm.
* For <math>t=1,2,\ldots</math>, let <math>U_t</math> denote the set of uncovered elements in the beginning of the <math>t</math>-th iteration of the algorithm.
* For the <math>k</math>-th element <math>x_k</math> covered, supposed that it is covered by <math>S_i</math> in the <math>t</math>-th iteration, define
::<math>price(x_k)=\frac{1}{|S_i\cap U_t|}</math>
: to be the average "price" to cover element <math>x_k</math> in the algorithm.  


Observe that if <math>x_k</math> is covered by <math>S_i</math> in the <math>t</math>-th iteration, then there are precisely <math>|S_i\cap U_t|</math> elements, including <math>x_k</math>, become covered in that iteration, and all these elements have price <math>1/|S_i\cap U_t|</math>. Then it is easy to have the following lemma:
By conditional probability, we have
{{Theorem|Lemma 1|
:<math>
:For the set cover <math>C</math> returned by the algorithm, <math>|C|=\sum_{k=1}^nprice(x_k)</math>.
\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]
}}
=\frac{\Pr\left[ A_i\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]}
This lemma connect the size of the returned set cover to the prices of elements. The next lemme connects the price of each element to the optimal solution.
{\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]}
</math>.
First, we bound the numerator. Due to that <math>A_{i_1}</math> is mutually independent of <math>A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}}</math>, we have
:<math>
\begin{align}
\Pr\left[ A_{i_1}\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]
&\le\Pr\left[ A_{i_1}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]\\
&=\Pr[A_{i_1}]\\
&\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j).
\end{align}
</math>


{{Theorem|Lemma 2|
Next, we bound the denominator. Applying the chain rule, we have
:For each <math>x_k</math>, <math>price(x_k)\le \frac{OPT}{n-k+1}</math>, where <math>OPT</math> is the size of the optimal set cover.
:<math>
}}
\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]
{{Proof|
=\prod_{j=2}^k\Pr\left[\overline{A_{i_j}}\mid \bigwedge_{\ell=j+1}^m\overline{A_{i_\ell}}\right]
For an instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with a universe of size <math>n=|U|</math>, let <math>C^*\subseteq\{1,2,\ldots,m\}</math> denote an optimal set cover. Then
</math>
:<math>U=\bigcup_{i\in C^*}S_i</math>.
which by the induction hypothesis, is at least
By averaging principle, there must be an <math>S_i</math> of size
:<math>
:<math>|S_i|\ge\frac{n}{|C^*|}=\frac{n}{OPT}</math>.
\prod_{j=2}^k(1-x_{i_j})=\prod_{\{i_1,i_j\}\in E}(1-x_j)
By the greediness of the algorithm, in the first iteration the algorithm must choose a set <math>S_i</math> of at least this size to add to the set cover <math>C</math>, which means the price the element covered at first, <math>x_1</math>, along with all elements covered in the first iteration, are priced as
:<math>price(x_1)=\frac{1}{\max_{i}|S_i|}\le \frac{OPT}{n}</math>.
For the <math>k</math>-th element covered by the algorithm, supposed that it is covered by in the <math>t</math>-th iteration, and the current universe for the uncovered elements is <math>U_t</math>, it holds that
:<math>|U_t|\le n-k+1</math>,
since there are at most <math>k-1</math> elements covered before <math>x_k</math>.
 
The uncovered elements constitutes a set cover instance <math>S_1\cap U_t, S_2\cap U_t, \ldots, S_m\cap U_t</math> (some of which may be empty), with universe <math>U_t</math>. Note that this smaller instance has an optimal set cover of size at most OPT (since the optimal solution for the original instance must also be an optimal solution for this smaller instance), and <math>x_k</math> is among the elements covered in the first iteration of the algorithm running on this smaller instance. By the above argument, it holds for the <math>price(x_k)=\frac{1}{|S_i\cap U_t|}</math> (also note that this value is not changed no matter as in the <math>t</math>-th integration of the algorithm running on the original instance or as in the first iteration of the algorithm on the smaller instance) that
<math>
price(x_k)=\frac{1}{|S_i\cap U_t|}\le\frac{OPT}{|U_t|}=\frac{OPT}{n-k+1}.
</math>
</math>
The lemma is proved.
where <math>E</math> is the set of edges in the dependency graph.
}}
 


Combining Lemma 1 and Lemma 2, we have
Altogether, we prove the induction hypothesis
:<math>
:<math>
|C|=\sum_{k=1}^nprice(x_k) \le \sum_{k=1}^\frac{OPT}{n-k+1}=\sum_{k=1}^n\frac{OPT}{k}=H_n\cdot OPT,
\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]
\le\frac{x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)}{\prod_{\{i_1,i_j\}\in E}(1-x_j)}\le x_{i_1}.
</math>
</math>
where <math>H_n=\sum_{k=1}^n\frac{1}{k}\approx\ln n+O(1)</math> is the <math>n</math>-th [http://en.wikipedia.org/wiki/Harmonic_number Harmonic number].


This shows that the ''GreedyCover'' algorithm has approximation ratio <math>H_n\approx\ln n</math>.
Due to the chain rule, it holds that
{{Theorem|Theorem|
:For any set cover instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with optimal set cover of size <math>OPT</math>, the ''GreedyCover'' returns a set cover of size
::<math>C\le H_n\cdot {OPT}</math>,
:where <math>n=|U|</math> is the size of the universe and <math>H_n\approx\ln n</math> represents the <math>n</math>-th Harmonic number.
}}
 
Ignoring lower order terms, <math>\ln n</math> is also the best approximation ratio achievable by polynomial time algorithms, assuming that '''NP'''<math>neq</math>'''P'''.
* [http://www.cs.mun.ca/~yzchen/papers/papers/hardness-approx-lund-yannakakis.pdf Lund and Yannakakis (1994)] showed that there is no poly-time algorithm with approximation ratio <math><\frac{1}{2}\log_2 n</math>, unless all '''NP''' problems have quasi-polynomial time algorithms (which runs in time <math>n^{\mathrm{polylog}(n)}</math>).
* [http://www.cs.duke.edu/courses/spring07/cps296.2/papers/p634-feige.pdf Feige (1998)] showed that there is no poly-time algorithm with approximation ratio better than <math>(1-o(1))\ln n</math> with the same complexity assumption.
* [http://courses.cs.tau.ac.il/368-3168/03a/ACT2/articles/raz97subconstant.pdf Ras and Safra (1997)] showed that there is no poly-time algorithm with approximation ratio better than <math>c\ln n</math> for a constant <math>c</math> assuming that '''NP'''<math>\neq</math>'''P'''.
* [http://arxiv.org/pdf/1305.1979.pdf Dinur and Steurer (2014)] showed that there is no poly-time algorithm with approximation ratio better than <math>(1-o(1))\ln n</math> assuming that '''NP'''<math>\neq</math>'''P'''.
 
== Primal-Dual Algorithm==
 
 
Given a set cover instance <math>S_1,S_2,\ldots,S_m\subseteq U</math>, we call such <math>M\subseteq U</math> satisfying that <math>|S_i\cap M|\le 1</math> for all <math>i=1,2,\ldots, m</math> a '''matching''' for <math>S_1,S_2,\ldots,S_m</math>.
 
{{Theorem|''DualCover''|
:'''Input:''' sets <math>S_1,S_2,\ldots,S_m\subseteq U</math>;
----
:construct a ''maximal'' matching <math>M\subseteq U</math> such that <math>|S_i\cap M|\le 1</math> for all <math>i=1,2,\ldots, m</math>
::by sequentially adding elements to <math>M</math> until nothing can be added;
:return <math>C=\{i\mid S_i\cap M\neq\}</math>
}}
 
 
{{Theorem|Lemma (Weak Duality)|
:For every matching <math>M</math> and every vertex cover <math>C</math> for <math>S_1,S_2,\ldots,S_m\subseteq U</math>, it holds that <math>|M|\le |C|</math>.
}}
{{Proof|
If <math>M\subseteq U</math> is a matching for <math>S_1,S_2,\ldots,S_m\subseteq U</math>, then every <math>S_i</math> intersects with <math>M</math> on at most one element, which means no two elements in <math>M</math> can be covered by one <math>S_i</math>, and hence each element in <math>M</math> will consume at least one distinct <math>S_i</math> to cover. Therefore, for any set cover, in order to cover all elements in <math>M</math> must cost at least <math>|M|</math> sets.
 
More formally, for any matching <math>M</math> and set cover <math>C</math>, it holds that
:<math>
:<math>
|M|=\left|\bigcup_{i\in C}(S_i\cap M)\right|\le \sum_{i\in C}|S_i\cap M|\le |C|,
\begin{align}
\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]
&=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\\
&=\prod_{i=1}^n\left(1-\Pr\left[A_i\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)\\
&\ge\prod_{i=1}^n\left(1-x_i\right).
\end{align}
</math>
</math>
where the first equality is because <math>C</math> is a set cover and the last inequality is because <math>M</math> is a matching.
}}
{{Theorem|Theorem|
:For any set cover instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with optimal set cover of size <math>OPT</math>, the ''DualCover'' returns a set cover of size
::<math>C\le f\cdot {OPT}</math>,
:where <math>f=\max_{x\in U}frequency(x)=\max_{x\in U}|\{i\mid x\in S_i\}|</math> is the maximum frequency.
}}
}}
= Scheduling =

Latest revision as of 09:48, 3 October 2016

Given a sequence of events [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math], we use the dependency graph to describe the dependencies between these events.

Definition (dependency graph)
Let [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math] be a sequence of events. A graph [math]\displaystyle{ D=(V,E) }[/math] on the set of vertices [math]\displaystyle{ V=\{1,2,\ldots,n\} }[/math] is called a dependency graph for the events [math]\displaystyle{ A_1,\ldots,A_n }[/math] if for each [math]\displaystyle{ i }[/math], [math]\displaystyle{ 1\le i\le n }[/math], the event [math]\displaystyle{ A_i }[/math] is mutually independent of all the events [math]\displaystyle{ \{A_j\mid (i,j)\not\in E\} }[/math].

The notion of mutual independence between an event and a set of events is formally defined as follows.

Definition (mutual independence)
An event [math]\displaystyle{ A }[/math] is said to be mutually independent of events [math]\displaystyle{ B_1,B_2,\ldots, B_k }[/math], if for any disjoint [math]\displaystyle{ I^+,I^-\subseteq\{1,2,\ldots,k\} }[/math], it holds that
[math]\displaystyle{ \Pr\left[A \mid \left(\bigwedge_{i\in I^+}B_i\right) \wedge \left(\bigwedge_{i\in I^-}\overline{B_i}\right)\right]=\Pr[A] }[/math].
Example
Let [math]\displaystyle{ X_1,X_2,\ldots,X_m }[/math] be a set of mutually independent random variables. Each event [math]\displaystyle{ A_i }[/math] is a predicate defined on a number of variables among [math]\displaystyle{ X_1,X_2,\ldots,X_m }[/math]. Let [math]\displaystyle{ v(A_i) }[/math] be the unique smallest set of variables which determine [math]\displaystyle{ A_i }[/math]. The dependency graph [math]\displaystyle{ D=(V,E) }[/math] is defined by
[math]\displaystyle{ (i,j)\in E }[/math] iff [math]\displaystyle{ v(A_i)\cap v(A_j)\neq \emptyset }[/math].

The following lemma, known as the Lovász local lemma, first proved by Erdős and Lovász in 1975, is an extremely powerful tool, as it supplies a way for dealing with rare events.

Lovász Local Lemma (symmetric case)
Let [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math] be a set of events, and assume that there is a [math]\displaystyle{ p\in[0,1) }[/math] such that the followings are satisfied:
  1. for all [math]\displaystyle{ 1\le i\le n }[/math], [math]\displaystyle{ \Pr[A_i]\le p }[/math];
  2. the maximum degree of the dependency graph for the events [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math] is [math]\displaystyle{ d }[/math], and
[math]\displaystyle{ \mathrm{e}p\cdot (d+1)\le 1 }[/math].
Then
[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\gt 0 }[/math].

We will prove a general version of the local lemma, where the events [math]\displaystyle{ A_i }[/math] are not symmetric. This generalization is due to Spencer.

Lovász Local Lemma (general case)
Let [math]\displaystyle{ D=(V,E) }[/math] be the dependency graph of events [math]\displaystyle{ A_1,A_2,\ldots,A_n }[/math]. Suppose there exist real numbers [math]\displaystyle{ x_1,x_2,\ldots, x_n }[/math] such that [math]\displaystyle{ 0\le x_i\lt 1 }[/math] and for all [math]\displaystyle{ 1\le i\le n }[/math],
[math]\displaystyle{ \Pr[A_i]\le x_i\prod_{(i,j)\in E}(1-x_j) }[/math].
Then
[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i) }[/math].

To see that the general LLL implies symmetric LLL, we set [math]\displaystyle{ x_i=\frac{1}{d+1} }[/math] for all [math]\displaystyle{ i=1,2,\ldots,n }[/math]. Then we have [math]\displaystyle{ \left(1-\frac{1}{d+1}\right)^d\gt \frac{1}{\mathrm{e}} }[/math].

Assume the condition in the symmetric LLL:

  1. for all [math]\displaystyle{ 1\le i\le n }[/math], [math]\displaystyle{ \Pr[A_i]\le p }[/math];
  2. [math]\displaystyle{ \mathrm{e}p\cdot(d+1)\le 1 }[/math];

then it is easy to verify that for all [math]\displaystyle{ 1\le i\le n }[/math],

[math]\displaystyle{ \Pr[A_i]\le p\le\frac{1}{e(d+1)}\lt \frac{1}{d+1}\left(1-\frac{1}{d+1}\right)^d\le x_i\prod_{(i,j)\in E}(1-x_j) }[/math].

Due to the general LLL, we have

[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)=\left(1-\frac{1}{d+1}\right)^n\gt 0 }[/math].

This proves the symmetric LLL.

Now we prove the general LLL by the original induction proof.

Proof.

First, apply the chain rule. We have

[math]\displaystyle{ \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]=\prod_{i=1}^n\left(1-\Pr\left[{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right) }[/math].

Next we prove by induction on [math]\displaystyle{ m }[/math] that for any set of [math]\displaystyle{ m }[/math] events [math]\displaystyle{ i_1,\ldots,i_m }[/math],

[math]\displaystyle{ \Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]\le x_{i_1} }[/math].

The local lemma follows immediately by the above chain rule.

For [math]\displaystyle{ m=1 }[/math], this is obvious because

[math]\displaystyle{ \Pr[A_{i_1}]\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)\le x_{i_1} }[/math].

For general [math]\displaystyle{ m }[/math], let [math]\displaystyle{ i_2,\ldots,i_k }[/math] be the set of vertices adjacent to [math]\displaystyle{ i_1 }[/math] in the dependency graph, i.e. event [math]\displaystyle{ A_{i_1} }[/math] is mutually independent of [math]\displaystyle{ A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}} }[/math].

By conditional probability, we have

[math]\displaystyle{ \Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right] =\frac{\Pr\left[ A_i\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]} {\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]} }[/math].

First, we bound the numerator. Due to that [math]\displaystyle{ A_{i_1} }[/math] is mutually independent of [math]\displaystyle{ A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}} }[/math], we have

[math]\displaystyle{ \begin{align} \Pr\left[ A_{i_1}\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right] &\le\Pr\left[ A_{i_1}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]\\ &=\Pr[A_{i_1}]\\ &\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j). \end{align} }[/math]

Next, we bound the denominator. Applying the chain rule, we have

[math]\displaystyle{ \Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right] =\prod_{j=2}^k\Pr\left[\overline{A_{i_j}}\mid \bigwedge_{\ell=j+1}^m\overline{A_{i_\ell}}\right] }[/math]

which by the induction hypothesis, is at least

[math]\displaystyle{ \prod_{j=2}^k(1-x_{i_j})=\prod_{\{i_1,i_j\}\in E}(1-x_j) }[/math]

where [math]\displaystyle{ E }[/math] is the set of edges in the dependency graph.

Altogether, we prove the induction hypothesis

[math]\displaystyle{ \Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right] \le\frac{x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)}{\prod_{\{i_1,i_j\}\in E}(1-x_j)}\le x_{i_1}. }[/math]

Due to the chain rule, it holds that

[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right] &=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\\ &=\prod_{i=1}^n\left(1-\Pr\left[A_i\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)\\ &\ge\prod_{i=1}^n\left(1-x_i\right). \end{align} }[/math]
[math]\displaystyle{ \square }[/math]