Randomized Algorithms (Fall 2011)/Probability Space and Randomized Algorithms (Fall 2011)/The Probabilistic Method: Difference between pages

From TCS Wiki
=Axioms of Probability=
The axiomatic foundation of probability theory was laid by [http://en.wikipedia.org/wiki/Andrey_Kolmogorov Kolmogorov], one of the greatest mathematicians of the 20th century, who advanced many very different fields of mathematics.

{{Theorem|Definition (Probability Space)|
A '''probability space''' is a triple <math>(\Omega,\Sigma,\Pr)</math>.
*<math>\Omega</math> is a set, called the '''sample space'''.
*<math>\Sigma\subseteq 2^{\Omega}</math> is the set of all '''events''', satisfying:
*:(A1). <math>\Omega\in\Sigma</math> and <math>\emptyset\in\Sigma</math>. (The ''certain'' event and the ''impossible'' event.)
*:(A2). If <math>A,B\in\Sigma</math>, then <math>A\cap B, A\cup B, A-B\in\Sigma</math>. (The intersection, union, and difference of two events are events.)
* A '''probability measure''' <math>\Pr:\Sigma\rightarrow\mathbb{R}</math> is a function that maps each event to a nonnegative real number, satisfying
*:(A3). <math>\Pr(\Omega)=1</math>.
*:(A4). If <math>A\cap B=\emptyset</math> (such events are called ''disjoint'' events), then <math>\Pr(A\cup B)=\Pr(A)+\Pr(B)</math>.
*:(A5*). For a decreasing sequence of events <math>A_1\supset A_2\supset \cdots\supset A_n\supset\cdots</math> with <math>\bigcap_n A_n=\emptyset</math>, it holds that <math>\lim_{n\rightarrow \infty}\Pr(A_n)=0</math>.
}}
The sample space <math>\Omega</math> is the set of all possible outcomes of the random process modeled by the probability space. An event is a subset of <math>\Omega</math>. The statements (A1)-(A5*) are the axioms of probability. A probability space is well defined as long as these axioms are satisfied.
;Example
:Consider the probability space defined by rolling a die with six faces. The sample space is <math>\Omega=\{1,2,3,4,5,6\}</math>, and <math>\Sigma</math> is the power set <math>2^{\Omega}</math>. For any event <math>A\in\Sigma</math>, its probability is given by <math>\Pr(A)=\frac{|A|}{6}</math>.
;Remark
* In general, the sample space <math>\Omega</math> may be continuous, but we only consider '''discrete''' probability in this lecture; thus we assume that <math>\Omega</math> is either finite or countably infinite.
* In many cases (such as the above example), <math>\Sigma=2^{\Omega}</math>, i.e. the set of events enumerates all subsets of <math>\Omega</math>. But in general, a probability space is well-defined by any <math>\Sigma</math> satisfying (A1) and (A2). Such a <math>\Sigma</math> is called a <math>\sigma</math>-algebra defined on <math>\Omega</math>.
* The last axiom (A5*) is redundant if <math>\Sigma</math> is finite; thus it is only essential when there are infinitely many events. The role of axiom (A5*) in probability theory is like that of [http://en.wikipedia.org/wiki/Zorn's_lemma Zorn's Lemma] (or equivalently the [http://en.wikipedia.org/wiki/Axiom_of_choice Axiom of Choice]) in axiomatic set theory.

Laws for probability can be deduced from the above axiom system. Denote <math>\bar{A}=\Omega-A</math>.
{{Theorem|Proposition|
:<math>\Pr(\bar{A})=1-\Pr(A)</math>.
}}
{{Proof|
Due to Axiom (A4), <math>\Pr(\bar{A})+\Pr(A)=\Pr(\Omega)</math>, which is equal to 1 according to Axiom (A3); thus <math>\Pr(\bar{A})+\Pr(A)=1</math>. The proposition follows.
}}

Exercise: Deduce other useful laws for probability from the axioms. For example, <math>A\subseteq B\Longrightarrow\Pr(A)\le\Pr(B)</math>.
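The proposition and the exercise can be checked mechanically on the die example. A minimal sketch (the `pr` helper and the exhaustive enumeration are ours, not part of the notes):

```python
from fractions import Fraction
from itertools import combinations

# The die probability space from the example: Pr(A) = |A| / 6.
omega = frozenset({1, 2, 3, 4, 5, 6})

def pr(event):
    assert event <= omega            # every event is a subset of the sample space
    return Fraction(len(event), len(omega))

A = frozenset({1, 3, 5})
complement = omega - A

# Proposition: Pr of the complement of A equals 1 - Pr(A).
assert pr(complement) == 1 - pr(A)

# Exercise: A ⊆ B implies Pr(A) <= Pr(B), checked over all pairs of events.
events = [frozenset(s) for r in range(7) for s in combinations(omega, r)]
assert all(pr(a) <= pr(b) for a in events for b in events if a <= b)
print("axiom consequences verified")
```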
= Notation =
An event <math>A\subseteq\Omega</math> can be represented as <math>A=\{a\in\Omega\mid \mathcal{E}(a)\}</math> with a predicate <math>\mathcal{E}</math>.

The predicate notation of probability is
:<math>\Pr[\mathcal{E}]=\Pr(\{a\in\Omega\mid \mathcal{E}(a)\})</math>.
;Example
: We still consider the probability space defined by rolling a six-faced die. The sample space is <math>\Omega=\{1,2,3,4,5,6\}</math>. Consider the event that the outcome is odd:
:: <math>\Pr[\text{ the outcome is odd }]=\Pr(\{1,3,5\})</math>.

During the lecture, we mostly use the predicate notation instead of the subset notation.
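The predicate notation translates directly into filtering the sample space; a tiny illustrative sketch (the `pr` helper is ours):

```python
from fractions import Fraction

# Predicate notation on the die space: Pr[E] = Pr({a in Ω | E(a)}).
omega = [1, 2, 3, 4, 5, 6]

def pr(predicate):
    event = [a for a in omega if predicate(a)]
    return Fraction(len(event), len(omega))

print(pr(lambda a: a % 2 == 1))   # 1/2
```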


= The Union Bound =
We are familiar with the [http://en.wikipedia.org/wiki/Inclusion–exclusion_principle principle of inclusion-exclusion] for finite sets.
{{Theorem
|Principle of Inclusion-Exclusion|
:Let <math>S_1, S_2, \ldots, S_n</math> be <math>n</math> finite sets. Then
::<math>\begin{align}
\left|\bigcup_{1\le i\le n}S_i\right|
&=
\sum_{i=1}^n|S_i|
-\sum_{i<j}|S_i\cap S_j|
+\sum_{i<j<k}|S_i\cap S_j\cap S_k|\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\left|\bigcap_{r=1}^\ell S_{i_r}\right|
+\cdots
+(-1)^{n-1} \left|\bigcap_{i=1}^n S_i\right|.
\end{align}</math>
}}
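The inclusion-exclusion formula can be verified on concrete sets; a short sketch (the example sets are ours):

```python
from itertools import combinations

# Check inclusion-exclusion on a few concrete finite sets.
sets = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}]
n = len(sets)

union_size = len(set().union(*sets))

# Alternating sum over all non-empty index subsets {i_1 < ... < i_l}.
alternating = 0
for l in range(1, n + 1):
    for idx in combinations(range(n), l):
        inter = set.intersection(*(sets[i] for i in idx))
        alternating += (-1) ** (l - 1) * len(inter)

print(union_size, alternating)   # 7 7
```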
The principle can be generalized to probability events.
{{Theorem
|Principle of Inclusion-Exclusion for Probability|
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&=
\sum_{i=1}^n\Pr[\mathcal{E}_i]
-\sum_{i<j}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j]
+\sum_{i<j<k}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j\wedge \mathcal{E}_k]\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\Pr\left[\bigwedge_{r=1}^\ell \mathcal{E}_{i_r}\right]
+\cdots
+(-1)^{n-1}\Pr\left[\bigwedge_{i=1}^n \mathcal{E}_{i}\right].
\end{align}</math>
}}

We only prove the basic case for two events.
{{Theorem|Lemma|
:For any two events <math>\mathcal{E}_1</math> and <math>\mathcal{E}_2</math>,
::<math>\Pr[\mathcal{E}_1\vee\mathcal{E}_2]=\Pr[\mathcal{E}_1]+\Pr[\mathcal{E}_2]-\Pr[\mathcal{E}_1\wedge\mathcal{E}_2]</math>.
}}
{{Proof| The following identities are due to Axiom (A4):
:<math>\begin{align}
\Pr[\mathcal{E}_1]
&=\Pr[\mathcal{E}_1\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2];\\
\Pr[\mathcal{E}_2]
&=\Pr[\mathcal{E}_2\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2];\\
\Pr[\mathcal{E}_1\vee\mathcal{E}_2]
&=\Pr[\mathcal{E}_1\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_2\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2].
\end{align}</math>
The lemma follows directly.
}}


A direct consequence of the lemma is the following theorem, the '''union bound'''.
{{Theorem
|Theorem (Union Bound)|
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&\le
\sum_{i=1}^n\Pr[\mathcal{E}_i].
\end{align}</math>
}}
This inequality is known as [http://en.wikipedia.org/wiki/Boole's_inequality Boole's inequality]; it is usually referred to by its nickname, the "union bound". The bound holds for arbitrary events, even if they are dependent. Due to this generality, the union bound is extremely useful in probabilistic analysis.
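The union bound holds even for overlapping events; a quick check on the die space (the example events are ours):

```python
from fractions import Fraction

# Union bound on the die space: events may overlap, the bound still holds.
omega = {1, 2, 3, 4, 5, 6}
events = [{1, 2}, {2, 3}, {5}]

def pr(event):
    return Fraction(len(event), len(omega))

exact = pr(set().union(*events))      # Pr[E1 ∨ E2 ∨ E3]
bound = sum(pr(e) for e in events)    # Σ Pr[Ei]

assert exact <= bound
print(exact, bound)                   # 2/3 5/6
```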


= Independence =
{{Theorem
|Definition (Independent events)|
:Two events <math>\mathcal{E}_1</math> and <math>\mathcal{E}_2</math> are '''independent''' if and only if
::<math>\begin{align}
\Pr\left[\mathcal{E}_1 \wedge \mathcal{E}_2\right]
&=
\Pr[\mathcal{E}_1]\cdot\Pr[\mathcal{E}_2].
\end{align}</math>
}}
This definition can be generalized to any number of events:
{{Theorem
|Definition (Independent events)|
:Events <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> are '''mutually independent''' if and only if, for any subset <math>I\subseteq\{1,2,\ldots,n\}</math>,
::<math>\begin{align}
\Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right]
&=
\prod_{i\in I}\Pr[\mathcal{E}_i].
\end{align}</math>
}}

Note that in probability theory, "mutual independence" is <font color="red">not</font> equivalent to "pairwise independence", which we will learn about in the future.
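The gap between pairwise and mutual independence can be seen in a standard illustration (not from the notes): two fair coin flips, with the third event being that the flips differ.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two independent fair coin flips.
omega = list(product([0, 1], repeat=2))

def pr(event):
    return Fraction(sum(1 for a in omega if event(a)), len(omega))

E = [lambda a: a[0] == 1,             # first coin is heads
     lambda a: a[1] == 1,             # second coin is heads
     lambda a: a[0] ^ a[1] == 1]      # the two flips differ (XOR)

# Every pair of events is independent ...
for i, j in combinations(range(3), 2):
    assert pr(lambda a: E[i](a) and E[j](a)) == pr(E[i]) * pr(E[j])

# ... but the three events are not mutually independent.
all_three = pr(lambda a: all(e(a) for e in E))
print(all_three, pr(E[0]) * pr(E[1]) * pr(E[2]))   # 0 1/8
```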

Revision as of 03:56, 24 July 2011

= Counting =
===Circuit complexity===
A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem, due to Shannon, says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof| There are <math>2^{2^n}</math> boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Fix an integer <math>t</math>; we then count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of two types (AND or OR), and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

Uniformly choose a boolean function <math>f</math> at random. Note that each circuit computes one boolean function (the converse is not true). The probability that <math>f</math> can be computed by a circuit with <math>t</math> gates is at most
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}.
</math>
If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1.
</math>
Therefore, there exists a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.
}}

Note that by Shannon's theorem, not only does there exist a boolean function with exponentially large circuit complexity, but ''almost all'' boolean functions have exponentially large circuit complexity.
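The counting at the heart of the proof can be checked numerically: even at a moderate <math>n</math>, the number of circuits with <math>t=2^n/3n</math> gates is dwarfed by the number of boolean functions. A sketch comparing logarithms (the helper name is ours):

```python
from math import log2

# log2 of the circuit count 2^t * (t + 2n + 1)^(2t), as bounded in the proof.
def log2_num_circuits(n, t):
    return t + 2 * t * log2(t + 2 * n + 1)

n = 20
t = 2 ** n // (3 * n)
# Compare against log2 of the number of boolean functions, which is 2^n.
print(log2_num_circuits(n, t) < 2 ** n)   # True
```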

= Probabilistic Method =
===Ramsey number===
Recall the Ramsey theorem, which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph-theoretic terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of the complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound on <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of the edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> edges in <math>K_k</math>; therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound,
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By the assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>; thus there exists a two-coloring for which none of the events <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, if we take <math>n=\lfloor2^{k/2}\rfloor</math>, then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without a monochromatic <math>K_{2\log n}</math>.
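The randomized coloring above is straightforward to implement for small parameters; a sketch (the function and parameter choices are ours):

```python
import random
from itertools import combinations

# Randomly two-color the edges of K_n and count monochromatic K_k subgraphs,
# mirroring the random coloring in Erdős's argument (small n, k for speed).
def monochromatic_Kk(n, k, rng):
    color = {e: rng.randrange(2) for e in combinations(range(n), 2)}
    count = 0
    for S in combinations(range(n), k):
        edges = [color[e] for e in combinations(S, 2)]
        if all(c == edges[0] for c in edges):
            count += 1
    return count

rng = random.Random(42)
# With k = 8 and n = floor(2^(k/2)) = 16, the union bound makes a
# monochromatic K_8 extremely unlikely.
print(monochromatic_Kk(16, 8, rng))   # almost surely 0 by the union bound
```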

= Averaging Principle =
===Maximum cut===
Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute a min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute because it is '''NP-complete'''. Actually, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half of the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}
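The random bipartition from the proof can be simulated directly; a sketch on a small example graph (the 4-cycle instance is ours):

```python
import random

# The random bipartition from the proof: each vertex joins S or T by a fair
# coin flip; the cut size is the number of crossing edges.
def random_cut(edges, vertices, rng):
    side = {v: rng.randrange(2) for v in vertices}
    return sum(1 for u, v in edges if side[u] != side[v])

# A 4-cycle: m = 4, so the expected cut size is m/2 = 2.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = random.Random(1)
avg = sum(random_cut(edges, vertices, rng) for _ in range(10000)) / 10000
print(avg)   # concentrates around 2.0
```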

=Alternations=
===Independent sets===
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.

{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
with <math>p</math> to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the random variable which indicates whether both endpoints of <math>uv</math> are in <math>S</math>. Then
:<math>
\mathbf{E}[Y_{uv}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{uv\in E}Y_{uv}</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>uv</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>uv</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices of <math>S</math> are deleted to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The expectation is maximized when <math>p=\frac{n}{2m}</math>, thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
There exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof actually proposes a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}
Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.
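The two-step algorithm above (sample, then alter) can be sketched as follows; the function name and the 5-cycle example are ours:

```python
import random

# Sample-and-alter: keep each vertex with probability 1/d (d = average
# degree), then delete one endpoint of every edge that survives.
def random_independent_set(n, edges, rng):
    d = 2 * len(edges) / n                  # average degree
    p = min(1.0, 1 / d)
    S = {v for v in range(n) if rng.random() < p}
    for u, v in edges:
        if u in S and v in S:
            S.discard(u)                    # alteration step: drop one endpoint
    return S

# Example: a 5-cycle (n = 5, m = 5); the bound guarantees E|S*| >= 25/20.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
S = random_independent_set(5, cycle, random.Random(7))
# The returned set is always independent: no edge has both endpoints in S.
assert all(not (u in S and v in S) for u, v in cycle)
print(sorted(S))
```

Note that removals in the alteration step only shrink <math>S</math>, so once every edge has been scanned, no edge can have both endpoints remaining.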