随机算法 (Spring 2013)/Conditional Probability: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
imported>Etone
No edit summary
 
(24 intermediate revisions by the same user not shown)
Line 1: Line 1:
<font color=red size=3> This part of the lecture note is still under editting. This notice will be removed once the lecture note has been fully updated.</font>
= Conditional Probability =
= Conditional Probability =
In probability theory, the word "condition" is a verb. "Conditioning on the event ..." means that it is assumed that the event occurs.  
In probability theory, the word "condition" is a verb. "Conditioning on the event ..." means that it is assumed that the event occurs.  
Line 103: Line 101:
In the Theorem below, we’ll see that if <math>f\not\equiv 0</math> then the algorithm is incorrect with probability at most <math>\frac{d}{|S|}</math>, where <math>d</math> is the degree of the polynomial <math>f</math>.  
In the Theorem below, we’ll see that if <math>f\not\equiv 0</math> then the algorithm is incorrect with probability at most <math>\frac{d}{|S|}</math>, where <math>d</math> is the degree of the polynomial <math>f</math>.  


{{Theorem|Theorem (Schwartz-Zippel)|
{{Theorem|Schwartz-Zippel Theorem|
: Let <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> be a multivariate polynomial of degree <math>d</math> over a field <math>\mathbb{F}</math> such that <math>f\not\equiv 0</math>. Fix any finite set <math>S\subset\mathbb{F}</math>, and let <math>r_1,r_2\ldots,r_n</math> be chosen uniformly and independently at random from <math>S</math>. Then
: Let <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> be a multivariate polynomial of degree <math>d</math> over a field <math>\mathbb{F}</math> such that <math>f\not\equiv 0</math>. Fix any finite set <math>S\subset\mathbb{F}</math>, and let <math>r_1,r_2\ldots,r_n</math> be chosen uniformly and independently at random from <math>S</math>. Then
::<math>\Pr[f(r_1,f_2,\ldots,r_n)=0]\le\frac{d}{|S|}.</math>
::<math>\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.</math>
}}
}}
{{Proof|  
{{Proof|  
Line 159: Line 157:
:<math>
:<math>
\Pr[f(r_1,r_2,\ldots,r_n)=0]
\Pr[f(r_1,r_2,\ldots,r_n)=0]
\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}.
\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|},
</math>
which proves the theorem.
 
------
 
In above proof, for the second case that <math>f_k(r_1,\ldots,r_{n-1})\neq 0</math>, we use an "probabilistic arguement" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events in applying the law of total probability. You make your own judgement which proof is better.
 
By the law of total probability,
:<math>
\begin{align}
&\Pr[f(\vec{r})=0]\\
=
&\sum_{x_1,\ldots,x_{n-1}\in S}\Pr[f(\vec{r})=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
=
&\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[f(\vec{r})=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
&+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq0}\Pr[f(\vec{r})=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
\le
&\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[\forall i<n, r_i=x_i]\\
&+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
=
&\Pr[f_k(r_1,\ldots,r_{n-1})=0]+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n, r_i=x_i].
\end{align}
</math>
We have argued that <math>f_k\not\equiv 0</math> and the degree of <math>f_k</math> is <math>d-k</math>. By the induction hypothesis, we have
:<math>
\Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.
</math>
And for every fixed <math>x_1,\ldots,x_{n-1}\in S</math> such that <math>f_k(x_1,\ldots,x_{n-1})\neq 0</math>, we have argued that <math>f(x_1,\ldots,x_{n-1},x_n)</math> is a polynomial in <math>x_n</math> of degree <math>k</math>, thus
:<math>
\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\le\frac{k}{|S|},
</math>
which holds for all <math>x_1,\ldots,x_{n-1}\in S</math> such that <math>f_k(x_1,\ldots,x_{n-1})\neq 0</math>, therefore the weighted average
:<math>
\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n, r_i=x_i]
\le\frac{k}{|S|}.
</math>
Substituting these inequalities back to the total probability, we have
<math>
\Pr[f(\vec{r})=0]
\le\frac{d-k}{|S|}+\frac{k}{|S|}
=\frac{d}{|S|}.
</math>
</math>
}}
}}


== Bipartite Perfect Matching==
= Min-Cut in a Graph =
Let <math>G(V, E)</math> be a multi-graph, which allows parallel edges between two distinct vertices <math>u</math> and <math>v</math> but does not allow any self-loop, i.e. an edge connect a vertex to itself. Such a multi-graph can be represented as data structures like adjacency matrix <math>A</math>, where <math>A</math> is symmetric (undirected graph) with zero diagonal, and each entry <math>A(u,v)</math> is a nonnegative integer giving the number of edges between vertices <math>u</math> and <math>v</math>.


= Min-Cut in a Graph =
A '''cut''' in a multi-graph <math>G(V,E)</math> is an edge set <math>C\subseteq E</math>, which can be equivalently defined as
Let <math>G(V, E)</math> be a graph. Suppose that we want to partition the vertex set <math>V</math> into two parts <math>S</math> and <math>T</math> such that the number of ''crossing edges'', edges with one endpoint in each part, is as small as possible. This can be described as the following problem: the min-cut problem.
* there exists a nonempty <math>S\subset V</math>, such that <math>C=\{uv\in E\mid u\in S,v\not\in S\}</math>; or
* removing of <math>C</math> disconnects <math>G</math>, that is, <math>G'(V,E\setminus C)</math> disconnects.
 
The '''min-cut''' or minimum cut problem is defined as follows:
*'''Input''': a multi-graph <math>G(V,E)</math>;
*'''Output''': a cut <math>C</math> in <math>G</math> with the minimum size <math>|C|</math>.


For a connected graph <math>G(V, E)</math>, a '''cut''' is a set <math>C\subseteq E</math> of edges, removal of which causes <math>G</math> becomes disconnected. The min-cut problem is to find the cut with minimum cardinality. A canonical deterministic algorithm for this problem is through the [http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem max-flow min-cut theorem]. A global minimum cut is the minimum <math>s</math>-<math>t</math> min-cut, which is equal to the minimum <math>s</math>-<math>t</math> max-flow.
The problem itself is well-defined on simple graph (without parallel edges), and our main goal is indeed solving the min-cut in simple graphs, however, as we shall see the algorithm creates parallel edges during its running, even though we start with a simple graph without parallel edges.


Do we have to rely on the "advanced" tools like flows? The answer is "no", with a little help of randomness.
A canonical deterministic algorithm for this problem is through the [http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem max-flow min-cut theorem]. A global minimum cut is the minimum <math>s</math>-<math>t</math> min-cut, which is equal to the minimum <math>s</math>-<math>t</math> max-flow.


== Karger's Min-Cut Algorithm ==
== Karger's Min-Cut Algorithm ==
We will introduce an extremely simple algorithm discovered by [http://people.csail.mit.edu/karger/ David Karger]. The algorithm works on multigraphs, graphs allowing multiple edges between vertices.  
We will introduce a very simple and elegant algorithm discovered by [http://people.csail.mit.edu/karger/ David Karger].
 
We define an operation on multi-graphs called ''contraction'':
For a multigraph <math>G(V, E)</math>, for any edge <math>uv\in E</math>, let <math>contract(G,uv)</math> be a new multigraph obtained by:
* replacing the vertices <math>u</math> and <math>v</math> by a new vertex <math>x\not\in V</math>;
* for each <math>w\not\in\{u,v\}</math> replacing any edge <math>uw</math> or <math>vw</math> by the edge <math>xw</math>;
* removing all parallel edges between <math>u</math> and <math>v</math> in <math>E</math>;
* the rest of the graph remains unchanged.


We define an operation on multigraphs called ''contraction'':
To conclude, the <math>contract(G,uv)</math> operation merges the two vertices <math>u</math> and <math>v</math> into a new vertex which maintains the old neighborhoods of both <math>u</math> and <math>v</math> except for that all the parallel edges between <math>u</math> and <math>v</math> are removed.
For a multigraph <math>G(V, E)</math>, for any edge <math>uv\in E</math>, let <math>contract(G,uv)</math> be a new multigraph constructed as follows: <math>u</math> and <math>v</math> in <math>V</math> are replaced by a singe new vertex whose neighbors are all the old neighbors of <math>u</math> and <math>v</math>. In other words, <math>u</math> and <math>v</math> are merged into one vertex. The old edges between <math>u</math> and <math>v</math> are deleted.


Karger's min-cut algorithm is described as follows:
Perhaps a better way to look at contraction is to interpret it as union of equivalent classes of vertices. Initially every vertex is in a dinstinct equivalent class. Upon call a <math>contract(G,uv)</math>, the two equivalent classes corresponding to <math>u</math> and <math>v</math> are unioned together, and only those edges crossing between different equivalent classes are counted as valid edges in the graph.
 
{{Theorem|''RandomContract'' (Karger 1993)|
:while <math>|V|>2</math> do
:* choose an edge <math>uv\in E</math> uniformly at random;
:* <math>G=contract(G,uv)</math>;
:return <math>C=E</math> (the parallel edges between the only remaining vertices in <math>V</math>);
}}


'''MinCut(multigraph <math>G(V, E)</math>)'''
A multi-graph can be maintained by appropriate data strucrtures such that each contraction takes <math>O(n)</math> time, where <math>n</math> is the number of vertices, so the algorithm terminates in time <math>O(n^2)</math>. We leave this as an exercise.
* while <math>|V|>2</math> do
** choose an edge <math>uv\in E</math> uniformly at random;
** <math>G=contract(G,uv)</math>;
*return the edges between the only two vertices in <math>V</math>;


----
== Analysis of accuracy ==
A better way to understand Karger's min-cut algorithm is to describe it as randomly merging sets of vertices. Initially, each vertex <math>v\in V</math> corresponds to a singleton set <math>\{v\}</math>.  At each step, (1) a crossing edge (edge whose endpoints are in different sets) is chosen uniformly at random from all crossing edges; and (2) the two sets connected by the chosen crossing-edge are merged to one set. Repeat this process until there are only two sets. The crossing edges between the two sets are returned.
For convenience, we assume that each edge has a unique "identity" <math>e</math>. And when an edge <math>uv\in E</math> is contracted to new vertex <math>x</math>, and each adjacent edge <math>uw</math> of <math>u</math> (or adjacent edge <math>vw</math> of <math>v</math>) is replaced by <math>xw</math>, the identity <math>e</math> of the edge <math>uw</math> (or <math>vw</math>) is transfered to the new edge <math>xw</math> replacing it. When referring a cut <math>C</math>, we consider <math>C</math> as a set of edge identities <math>e</math>, so that a cut <math>C</math> is changed by the algorithm only if some of its edges are removed during contraction.
----


== Analysis ==
We first prove some lemma.
For a multigraph <math>G(V, E)</math>, fixed a minimum cut <math>C</math> (there might be more than one minimum cuts), we analyze the probability that <math>C</math> is returned by the MinCut algorithm. <math>C</math> is returned by MinCut if and only if no edge in <math>C</math> is contracted during the execution of MinCut. We will bound this probability
<math>\Pr[\mbox{no edge in }C\mbox{ is contracted}]</math>.


{{Theorem
{{Theorem
|Lemma 1|
|Lemma 1|
:Let <math>G(V, E)</math> be a multigraph with <math>n</math> vertices, if the size of the minimum cut of <math>G</math> is <math>k</math>, then <math>|E|\ge nk/2</math>.
:If <math>C</math> is a cut in a multi-graph <math>G</math> and <math>e\not\in C</math>, then <math>C</math> is still a cut in <math>G'=contract(G,e)</math>.
}}
}}
{{Proof|  
{{Proof|
:It holds that every vertex has at least <math>k</math> neighbors, because if there exists <math>v</math> with <math><k</math> neighbors, then the <math><k</math> edges adjacent to <math>v</math> disconnect <math>v</math> from the rest of <math>G</math>, forming a cut of size smaller than <math>k</math>. Therefore <math>|E|\ge kn/2</math>.
It is easy to verify that <math>C</math> is a cut in <math>G'=contract(G,e)</math> if none of its edges is lost during the contraction.
Since <math>C</math> is a cut in <math>G(V,E)</math>, there exists a nonempty vertex set <math>S\subset V</math> and its complement <math>\bar{S}=V\setminus S</math> such that <math>C=\{uv\mid u\in S, v\in\bar{S}\}</math>. And if <math>e\not\in C</math>, it must hold that either <math>e\in G[S]</math> or <math>e\in G[\bar{S}]</math> where <math>G[S]</math> and <math>G[\bar{S}]</math> are the subgraphs induced by <math>S</math> and <math>\bar{S}</math> respectively. In both cases none of edges in <math>C</math> is removed in <math>G'=contract(G,e)</math>.  
}}
}}


{{Theorem
{{Theorem
|Lemma 2|
|Lemma 2|
:Let <math>G(V, E)</math> be a multigraph with <math>n</math> vertices, and <math>C</math> a minimum cut of <math>G</math>. If <math>e\not\in C</math>, then <math>C</math> is still a minimum cut of <math>contract(G, e)</math>.
: The size of min-cut in <math>G'=contract(G,e)</math> is at least as large as the size of min-cut in <math>G</math>, i.e. contraction never reduces the size of min-cut.
}}
{{Proof|
: Note that every cut in the contracted graph <math>G'</math> is also a cut in the original graph <math>G</math>.
}}
 
{{Theorem
|Lemma 3|
:If <math>C</math> is a min-cut in a multi-graph <math>G(V,E)</math>, then <math>|E|\ge \frac{|V||C|}{2}</math>.
}}
}}
{{Proof|  
{{Proof|  
:We first show that no edge in <math>C</math> is lost during the contraction. Due to the definition of contraction, the only edges removed from <math>G</math> in a contraction <math>contract(G, e)</math> are the parallel-edges sharing both endpoints with <math>e</math>. Since <math>e\not\in C</math>, none of these edges can be in <math>C</math>, or otherwise <math>C</math> cannot be a minimum cut of <math>G</math>. Thus every edge in <math>C</math> remains in <math>G</math>.
:It must hold that the degree of each vertex <math>v\in V</math> is at least <math>|C|</math>, or otherwise the set of adjacent edges of <math>v</math> forms a cut which separates <math>v</math> from the rest of <math>V</math> and has size less than <math>|C|</math>, contradicting the assumption that <math>|C|</math> is a min-cut. And the bound <math>|E|\ge \frac{|V||C|}{2}</math> follows directly from the fact that every vertex in <math>G</math> has degree at least <math>|C|</math>.
 
:It is then obvious to see that <math>C</math> is a cut of <math>contract(G, e)</math>. All paths in a contracted graph can be revived in the original multigraph by inserting the contracted edges into the path, thus a connected <math>contract(G, e)-C</math> would imply a connected <math>G-C</math>, which contradicts that <math>C</math> is a cut in <math>G</math>.
 
:Notice that a cut in a contracted graph must be a cut in the original graph. This can be easily verified by seeing contraction as taking the union of two sets of vertices. Therefore a contraction can never reduce the size of minimum cuts of a multigraph. A minimum cut <math>C</math> must still be a minimum cut in the contracted graph as long as it is still a cut.
 
:Concluding the above arguments, we have that <math>C</math> is a minimum cut of <math>contract(G, e)</math> for any <math>e\not\in C</math>.
}}
}}


Let <math>G(V, E)</math> be a multigraph, and <math>C</math> a minimum cut of <math>G</math>.
For a multigraph <math>G(V, E)</math>, fixed a minimum cut <math>C</math> (there might be more than one minimum cuts), we analyze the probability that <math>C</math> is returned by the above algorithm.  


Initially <math>|V|=n</math>.
Initially <math>|V|=n</math>. We say that the min-cut <math>C</math> "survives" a random contraction if none of the edges in <math>C</math> is chosen to be contracted.
After <math>(i-1)</math> contractions,  denote the current multigraph as <math>G_i(V_i, E_i)</math>. Suppose that no edge in <math>C</math> has been chosen to be contracted yet. According to Lemma 2, <math>C</math> must be a minimum cut of the <math>G_i</math>. Then due to Lemma 1, the current edge number is <math>|E_i|\ge |V_i||C|/2</math>. Uniformly choosing an edge <math>e\in E_i</math> to contract, the probability that the <math>i</math>th contraction contracts an edge in <math>C</math> is given by:
After <math>(i-1)</math> contractions,  denote the current multigraph as <math>G_i(V_i, E_i)</math>. Supposed that <math>C</math> survives the first <math>(i-1)</math> contractions, according to Lemma 1 and 2, <math>C</math> must be a minimum cut in the current multi-graph <math>G_i</math>. Then due to Lemma 3, the current edge number is <math>|E_i|\ge |V_i||C|/2</math>. Uniformly choosing an edge <math>e\in E_i</math> to contract, the probability that the <math>i</math>th contraction contracts an edge in <math>C</math> is given by:


:<math>\begin{align}\Pr_{e\in E_i}[e\in C] &= \frac{|C|}{|E_i|}  
:<math>\begin{align}\Pr_{e\in E_i}[e\in C] &= \frac{|C|}{|E_i|}  
Line 225: Line 279:
&= \frac{2}{|V_i|}.\end{align}</math>
&= \frac{2}{|V_i|}.\end{align}</math>


Therefore, assuming that <math>C</math> is intact after <math>(i-1)</math> contractions, the probability that <math>C</math> survives the <math>i</math>th contraction is at least <math>1-2/|V_i|</math>. Note that <math>|V_i|=n-i+1</math>, because each contraction decrease the vertex number by 1.
Therefore, conditioning on that <math>C</math> survives the first <math>(i-1)</math> contractions, the probability that <math>C</math> survives the <math>i</math>th contraction is at least <math>1-2/|V_i|</math>. Note that <math>|V_i|=n-i+1</math>, because each contraction decrease the vertex number by 1.


The probability that no edge in the minimum cut <math>C</math> is ever contracted is:
The probability that no edge in the minimum cut <math>C</math> is ever contracted is:
Line 233: Line 287:
&=
&=
\prod_{i=1}^{n-2}\Pr[\,C\mbox{ survives the }i\mbox{-th contraction}\mid C\mbox{ survives the first }(i-1)\mbox{-th contractions}]\\
\prod_{i=1}^{n-2}\Pr[\,C\mbox{ survives the }i\mbox{-th contraction}\mid C\mbox{ survives the first }(i-1)\mbox{-th contractions}]\\
&=
&\ge
\prod_{i=1}^{n-2}\left(1-\frac{2}{|V_i|}\right) \\
\prod_{i=1}^{n-2}\left(1-\frac{2}{|V_i|}\right) \\
&=
&=
Line 242: Line 296:
\end{align}</math>
\end{align}</math>


Therefore, we prove the following theorem,
This gives the following theorem.
 
{{Theorem
{{Theorem
|Theorem|
|Theorem|
: For any multigraph with <math>n</math> vertices, the MinCut algorithm returns a minimum cut with probability at least <math>\frac{2}{n(n-1)}</math>.
: For any multigraph with <math>n</math> vertices, the ''RandomContract'' algorithm returns a minimum cut with probability at least <math>\frac{2}{n(n-1)}</math>.
}}
}}


Run MinCut independently for <math>n(n-1)/2</math> times and return the smallest cut returned. The probability that this the minimum cut is found is:
Run ''RandomContract'' independently for <math>n(n-1)/2</math> times and return the smallest cut returned. The probability that a minimum cut is found is at least:


:<math>\begin{align}
:<math>\begin{align}
1-\Pr[\mbox{failed every time}] &= 1-\Pr[\mbox{MinCut fails}]^{n(n-1)/2} \\
1-\Pr[\mbox{failed every time}] &= 1-\Pr[{RandomContract}\mbox{ fails}]^{n(n-1)/2} \\
&\ge 1- \left(1-\frac{2}{n(n-1)}\right)^{n(n-1)/2} \\
&\ge 1- \left(1-\frac{2}{n(n-1)}\right)^{n(n-1)/2} \\
&\ge 1-\frac{1}{e}.
&\ge 1-\frac{1}{e}.
Line 258: Line 311:


A constant probability!
A constant probability!
== A Corollary by the Probabilistic Method ==
Karger's algorithm and its analysis implies the following combinatorial theorem regarding the number of distinct minimum cuts in a graph.
{{Theorem|Corollary|
:For any graph <math>G(V,E)</math> of <math>n</math> vertices, the number of distinct minimum cuts in <math>G</math> is at most <math>\frac{n(n-2)}{2}</math>.
}}
{{Proof|
For each minimum cut <math>C</math> in <math>G</math>, we define <math>\mathcal{E}_C</math> to be the event that ''RandomContract'' returns <math>C</math>. Due to the analysis of RandomContract, <math>\Pr[\mathcal{E}_C]\ge \frac{2}{n(n-1)}</math>. The events <math>\mathcal{E}_C</math> are mutually disjoint for distinct <math>C</math> and the event that ''RandomContract'' returns a min-cut is the disjoint union of <math>\mathcal{E}_C</math> over all min-cut <math>C</math>. Therefore,
:<math>
\begin{align}
&\Pr[\mbox{ RandomContract returns a min-cut}]\\
=
&\sum_{\mbox{min-cut }C\mbox{ in }G}\Pr[\mathcal{E}_C]\\
\ge
&\sum_{\mbox{min-cut }C\mbox{ in }G}\frac{2}{n(n-1)},
\end{align}
</math>
which must be no greater than 1 for a well-defined probability space. This means the total number of min-cut in <math>G</math> must be no greater than <math>\frac{n(n-1)}{2}</math>.
}}
Note that the statement of this theorem has no randomness at all, however the proof involves a randomized algorithm. This is an example of [http://en.wikipedia.org/wiki/Probabilistic_method the probabilistic method].

Latest revision as of 10:22, 6 March 2013

Conditional Probability

In probability theory, the word "condition" is a verb. "Conditioning on the event ..." means that it is assumed that the event occurs.

Definition (conditional probability)
The conditional probability that event [math]\displaystyle{ \mathcal{E}_1 }[/math] occurs given that event [math]\displaystyle{ \mathcal{E}_2 }[/math] occurs is
[math]\displaystyle{ \Pr[\mathcal{E}_1\mid \mathcal{E}_2]=\frac{\Pr[\mathcal{E}_1\wedge \mathcal{E}_2]}{\Pr[\mathcal{E}_2]}. }[/math]

The conditional probability is well-defined only if [math]\displaystyle{ \Pr[\mathcal{E}_2]\neq0 }[/math].

For independent events [math]\displaystyle{ \mathcal{E}_1 }[/math] and [math]\displaystyle{ \mathcal{E}_2 }[/math], it holds that

[math]\displaystyle{ \Pr[\mathcal{E}_1\mid \mathcal{E}_2]=\frac{\Pr[\mathcal{E}_1\wedge \mathcal{E}_2]}{\Pr[\mathcal{E}_2]} =\frac{\Pr[\mathcal{E}_1]\cdot\Pr[\mathcal{E}_2]}{\Pr[\mathcal{E}_2]} =\Pr[\mathcal{E}_1]. }[/math]

It supports our intuition that for two independent events, whether one of them occurs will not affect the chance of the other.

Law of total probability

The following fact is known as the law of total probability. It computes the probability by averaging over all possible cases.

Theorem (law of total probability)
Let [math]\displaystyle{ \mathcal{E}_1,\mathcal{E}_2,\ldots,\mathcal{E}_n }[/math] be mutually disjoint events, and [math]\displaystyle{ \bigvee_{i=1}^n\mathcal{E}_i=\Omega }[/math] is the sample space.
Then for any event [math]\displaystyle{ \mathcal{E} }[/math],
[math]\displaystyle{ \Pr[\mathcal{E}]=\sum_{i=1}^n\Pr[\mathcal{E}\mid\mathcal{E}_i]\cdot\Pr[\mathcal{E}_i]. }[/math]
Proof.
Since [math]\displaystyle{ \mathcal{E}_1,\mathcal{E}_2,\ldots,\mathcal{E}_n }[/math] are mutually disjoint and [math]\displaystyle{ \bigvee_{i=1}^n\mathcal{E}_i=\Omega }[/math], events [math]\displaystyle{ \mathcal{E}\wedge\mathcal{E}_1,\mathcal{E}\wedge\mathcal{E}_2,\ldots,\mathcal{E}\wedge\mathcal{E}_n }[/math] are also mutually disjoint, and [math]\displaystyle{ \mathcal{E}=\bigvee_{i=1}^n\left(\mathcal{E}\wedge\mathcal{E}_i\right) }[/math]. Then
[math]\displaystyle{ \Pr[\mathcal{E}]=\sum_{i=1}^n\Pr[\mathcal{E}\wedge\mathcal{E}_i], }[/math]

which according to the definition of conditional probability, is [math]\displaystyle{ \sum_{i=1}^n\Pr[\mathcal{E}\mid\mathcal{E}_i]\cdot\Pr[\mathcal{E}_i] }[/math].

[math]\displaystyle{ \square }[/math]

The law of total probability provides us a standard tool for breaking a probability into sub-cases. Sometimes, it helps the analysis.

A Chain of Conditioning

By the definition of conditional probability, [math]\displaystyle{ \Pr[A\mid B]=\frac{\Pr[A\wedge B]}{\Pr[B]} }[/math]. Thus, [math]\displaystyle{ \Pr[A\wedge B] =\Pr[B]\cdot\Pr[A\mid B] }[/math]. This hints us that we can compute the probability of the AND of events by conditional probabilities. Formally, we have the following theorem:

Theorem
Let [math]\displaystyle{ \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n }[/math] be any [math]\displaystyle{ n }[/math] events. Then
[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i=1}^n\mathcal{E}_i\right] &= \prod_{k=1}^n\Pr\left[\mathcal{E}_k \mid \bigwedge_{i\lt k}\mathcal{E}_i\right]. \end{align} }[/math]
Proof.
It holds that [math]\displaystyle{ \Pr[A\wedge B] =\Pr[B]\cdot\Pr[A\mid B] }[/math]. Thus, let [math]\displaystyle{ A=\mathcal{E}_n }[/math] and [math]\displaystyle{ B=\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_{n-1} }[/math], then
[math]\displaystyle{ \begin{align} \Pr[\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_n] &= \Pr[\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_{n-1}]\cdot\Pr\left[\mathcal{E}_n\mid \bigwedge_{i\lt n}\mathcal{E}_i\right]. \end{align} }[/math]

Recursively applying this equation to [math]\displaystyle{ \Pr[\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_{n-1}] }[/math] until there is only [math]\displaystyle{ \mathcal{E}_1 }[/math] left, the theorem is proved.

[math]\displaystyle{ \square }[/math]

Polynomial Identity Testing (PIT)

Consider the following problem of Polynomial Identity Testing (PIT):

  • Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Output: "yes" if [math]\displaystyle{ f\equiv g }[/math], and "no" if otherwise.

The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is the of multi-variate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. The most natural way to represent an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math] is to write it as a sum of monomials:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].

The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math] and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree of monomials of nonzero coefficients.

Alternatively, we can consider the following equivalent problem:

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Output: "yes" if [math]\displaystyle{ f\equiv 0 }[/math], and "no" if otherwise.

If [math]\displaystyle{ f }[/math] is written explicitly as a sum of monomials, then the problem is trivial. Again we allow [math]\displaystyle{ f }[/math] to be represented in product form.

Example

The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined as that [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is

[math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].

Let [math]\displaystyle{ f }[/math] be the polynomial defined as

[math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]

It is pretty easy to evaluate [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] on any particular [math]\displaystyle{ x_1,x_2,\ldots,x_n }[/math], however it is prohibitively expensive to symbolically expand [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] to its sum-of-monomial form.

Schwartz-Zippel Theorem

Here is a very simple randomized algorithm, due to Schwartz and Zippel.

Randomized algorithm for multi-variate PIT
  • fix an arbitrary set [math]\displaystyle{ S\subseteq \mathbb{F} }[/math] whose size to be fixed;
  • pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
  • if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;

This algorithm requires only the evaluation of [math]\displaystyle{ f }[/math] at a single point [math]\displaystyle{ \vec{r} }[/math]. And if [math]\displaystyle{ f\equiv 0 }[/math] it is always correct.

In the Theorem below, we’ll see that if [math]\displaystyle{ f\not\equiv 0 }[/math] then the algorithm is incorrect with probability at most [math]\displaystyle{ \frac{d}{|S|} }[/math], where [math]\displaystyle{ d }[/math] is the degree of the polynomial [math]\displaystyle{ f }[/math].

Schwartz-Zippel Theorem
Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math] such that [math]\displaystyle{ f\not\equiv 0 }[/math]. Fix any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and let [math]\displaystyle{ r_1,r_2\ldots,r_n }[/math] be chosen uniformly and independently at random from [math]\displaystyle{ S }[/math]. Then
[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]
Proof.

We prove by induction on [math]\displaystyle{ n }[/math] the number of variables.

For [math]\displaystyle{ n=1 }[/math], assuming that [math]\displaystyle{ f\not\equiv 0 }[/math], due to the fundamental theorem of algebra, the degree-[math]\displaystyle{ d }[/math] polynomial [math]\displaystyle{ f(x) }[/math] has at most [math]\displaystyle{ d }[/math] roots, thus

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

Assume the induction hypothesis for a multi-variate polynomial up to [math]\displaystyle{ n-1 }[/math] variable.

An [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] can be represented as

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],

where [math]\displaystyle{ k }[/math] is the largest power of [math]\displaystyle{ x_n }[/math], which means that the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math].

In particular, we write [math]\displaystyle{ f }[/math] as a sum of two parts:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],

where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that

  • as argued above, the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math];
  • [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.

By the law of total probability, it holds that

[math]\displaystyle{ \begin{align} &\Pr[f(r_1,r_2,\ldots,r_n)=0]\\ = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]

Note that [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is a polynomial on [math]\displaystyle{ n-1 }[/math] variables of degree [math]\displaystyle{ d-k }[/math] such that [math]\displaystyle{ f_k\not\equiv 0 }[/math]. By the induction hypothesis, we have

[math]\displaystyle{ \begin{align} (*) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]

For the second case, recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] guarantees that

[math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]

is a single-variate polynomial such that the degree of [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] is [math]\displaystyle{ k }[/math] and [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math], for which we already known that the probability [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math]. Therefore,

[math]\displaystyle{ \begin{align} (**) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].

Substituting both [math]\displaystyle{ (*) }[/math] and [math]\displaystyle{ (**) }[/math] back in the total probability, we have

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}, }[/math]

which proves the theorem.


In above proof, for the second case that [math]\displaystyle{ f_k(r_1,\ldots,r_{n-1})\neq 0 }[/math], we use an "probabilistic arguement" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events in applying the law of total probability. You make your own judgement which proof is better.

By the law of total probability,

[math]\displaystyle{ \begin{align} &\Pr[f(\vec{r})=0]\\ = &\sum_{x_1,\ldots,x_{n-1}\in S}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ = &\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ &+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq0}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ \le &\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[\forall i\lt n, r_i=x_i]\\ &+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ = &\Pr[f_k(r_1,\ldots,r_{n-1})=0]+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i\lt n, r_i=x_i]. \end{align} }[/math]

We have argued that [math]\displaystyle{ f_k\not\equiv 0 }[/math] and the degree of [math]\displaystyle{ f_k }[/math] is [math]\displaystyle{ d-k }[/math]. By the induction hypothesis, we have

[math]\displaystyle{ \Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. }[/math]

And for every fixed [math]\displaystyle{ x_1,\ldots,x_{n-1}\in S }[/math] such that [math]\displaystyle{ f_k(x_1,\ldots,x_{n-1})\neq 0 }[/math], we have argued that [math]\displaystyle{ f(x_1,\ldots,x_{n-1},x_n) }[/math] is a polynomial in [math]\displaystyle{ x_n }[/math] of degree [math]\displaystyle{ k }[/math], thus

[math]\displaystyle{ \Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\le\frac{k}{|S|}, }[/math]

which holds for all [math]\displaystyle{ x_1,\ldots,x_{n-1}\in S }[/math] such that [math]\displaystyle{ f_k(x_1,\ldots,x_{n-1})\neq 0 }[/math], therefore the weighted average

[math]\displaystyle{ \sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i\lt n, r_i=x_i] \le\frac{k}{|S|}. }[/math]

Substituting these inequalities back to the total probability, we have [math]\displaystyle{ \Pr[f(\vec{r})=0] \le\frac{d-k}{|S|}+\frac{k}{|S|} =\frac{d}{|S|}. }[/math]

[math]\displaystyle{ \square }[/math]

Min-Cut in a Graph

Let [math]\displaystyle{ G(V, E) }[/math] be a multi-graph, which allows parallel edges between two distinct vertices [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] but does not allow any self-loop, i.e. an edge connect a vertex to itself. Such a multi-graph can be represented as data structures like adjacency matrix [math]\displaystyle{ A }[/math], where [math]\displaystyle{ A }[/math] is symmetric (undirected graph) with zero diagonal, and each entry [math]\displaystyle{ A(u,v) }[/math] is a nonnegative integer giving the number of edges between vertices [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math].

A cut in a multi-graph [math]\displaystyle{ G(V,E) }[/math] is an edge set [math]\displaystyle{ C\subseteq E }[/math], which can be equivalently defined as

  • there exists a nonempty [math]\displaystyle{ S\subset V }[/math], such that [math]\displaystyle{ C=\{uv\in E\mid u\in S,v\not\in S\} }[/math]; or
  • removing of [math]\displaystyle{ C }[/math] disconnects [math]\displaystyle{ G }[/math], that is, [math]\displaystyle{ G'(V,E\setminus C) }[/math] disconnects.

The min-cut or minimum cut problem is defined as follows:

  • Input: a multi-graph [math]\displaystyle{ G(V,E) }[/math];
  • Output: a cut [math]\displaystyle{ C }[/math] in [math]\displaystyle{ G }[/math] with the minimum size [math]\displaystyle{ |C| }[/math].

The problem itself is well-defined on simple graph (without parallel edges), and our main goal is indeed solving the min-cut in simple graphs, however, as we shall see the algorithm creates parallel edges during its running, even though we start with a simple graph without parallel edges.

A canonical deterministic algorithm for this problem is through the max-flow min-cut theorem. A global minimum cut is the minimum [math]\displaystyle{ s }[/math]-[math]\displaystyle{ t }[/math] min-cut, which is equal to the minimum [math]\displaystyle{ s }[/math]-[math]\displaystyle{ t }[/math] max-flow.

Karger's Min-Cut Algorithm

We will introduce a very simple and elegant algorithm discovered by David Karger.

We define an operation on multi-graphs called contraction: For a multigraph [math]\displaystyle{ G(V, E) }[/math], for any edge [math]\displaystyle{ uv\in E }[/math], let [math]\displaystyle{ contract(G,uv) }[/math] be a new multigraph obtained by:

  • replacing the vertices [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] by a new vertex [math]\displaystyle{ x\not\in V }[/math];
  • for each [math]\displaystyle{ w\not\in\{u,v\} }[/math] replacing any edge [math]\displaystyle{ uw }[/math] or [math]\displaystyle{ vw }[/math] by the edge [math]\displaystyle{ xw }[/math];
  • removing all parallel edges between [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] in [math]\displaystyle{ E }[/math];
  • the rest of the graph remains unchanged.

To conclude, the [math]\displaystyle{ contract(G,uv) }[/math] operation merges the two vertices [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] into a new vertex which maintains the old neighborhoods of both [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] except for that all the parallel edges between [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] are removed.

Perhaps a better way to look at contraction is to interpret it as union of equivalent classes of vertices. Initially every vertex is in a dinstinct equivalent class. Upon call a [math]\displaystyle{ contract(G,uv) }[/math], the two equivalent classes corresponding to [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] are unioned together, and only those edges crossing between different equivalent classes are counted as valid edges in the graph.

RandomContract (Karger 1993)
while [math]\displaystyle{ |V|\gt 2 }[/math] do
  • choose an edge [math]\displaystyle{ uv\in E }[/math] uniformly at random;
  • [math]\displaystyle{ G=contract(G,uv) }[/math];
return [math]\displaystyle{ C=E }[/math] (the parallel edges between the only remaining vertices in [math]\displaystyle{ V }[/math]);

A multi-graph can be maintained by appropriate data strucrtures such that each contraction takes [math]\displaystyle{ O(n) }[/math] time, where [math]\displaystyle{ n }[/math] is the number of vertices, so the algorithm terminates in time [math]\displaystyle{ O(n^2) }[/math]. We leave this as an exercise.

Analysis of accuracy

For convenience, we assume that each edge has a unique "identity" [math]\displaystyle{ e }[/math]. And when an edge [math]\displaystyle{ uv\in E }[/math] is contracted to new vertex [math]\displaystyle{ x }[/math], and each adjacent edge [math]\displaystyle{ uw }[/math] of [math]\displaystyle{ u }[/math] (or adjacent edge [math]\displaystyle{ vw }[/math] of [math]\displaystyle{ v }[/math]) is replaced by [math]\displaystyle{ xw }[/math], the identity [math]\displaystyle{ e }[/math] of the edge [math]\displaystyle{ uw }[/math] (or [math]\displaystyle{ vw }[/math]) is transfered to the new edge [math]\displaystyle{ xw }[/math] replacing it. When referring a cut [math]\displaystyle{ C }[/math], we consider [math]\displaystyle{ C }[/math] as a set of edge identities [math]\displaystyle{ e }[/math], so that a cut [math]\displaystyle{ C }[/math] is changed by the algorithm only if some of its edges are removed during contraction.

We first prove some lemma.

Lemma 1
If [math]\displaystyle{ C }[/math] is a cut in a multi-graph [math]\displaystyle{ G }[/math] and [math]\displaystyle{ e\not\in C }[/math], then [math]\displaystyle{ C }[/math] is still a cut in [math]\displaystyle{ G'=contract(G,e) }[/math].
Proof.

It is easy to verify that [math]\displaystyle{ C }[/math] is a cut in [math]\displaystyle{ G'=contract(G,e) }[/math] if none of its edges is lost during the contraction. Since [math]\displaystyle{ C }[/math] is a cut in [math]\displaystyle{ G(V,E) }[/math], there exists a nonempty vertex set [math]\displaystyle{ S\subset V }[/math] and its complement [math]\displaystyle{ \bar{S}=V\setminus S }[/math] such that [math]\displaystyle{ C=\{uv\mid u\in S, v\in\bar{S}\} }[/math]. And if [math]\displaystyle{ e\not\in C }[/math], it must hold that either [math]\displaystyle{ e\in G[S] }[/math] or [math]\displaystyle{ e\in G[\bar{S}] }[/math] where [math]\displaystyle{ G[S] }[/math] and [math]\displaystyle{ G[\bar{S}] }[/math] are the subgraphs induced by [math]\displaystyle{ S }[/math] and [math]\displaystyle{ \bar{S} }[/math] respectively. In both cases none of edges in [math]\displaystyle{ C }[/math] is removed in [math]\displaystyle{ G'=contract(G,e) }[/math].

[math]\displaystyle{ \square }[/math]
Lemma 2
The size of min-cut in [math]\displaystyle{ G'=contract(G,e) }[/math] is at least as large as the size of min-cut in [math]\displaystyle{ G }[/math], i.e. contraction never reduces the size of min-cut.
Proof.
Note that every cut in the contracted graph [math]\displaystyle{ G' }[/math] is also a cut in the original graph [math]\displaystyle{ G }[/math].
[math]\displaystyle{ \square }[/math]
Lemma 3
If [math]\displaystyle{ C }[/math] is a min-cut in a multi-graph [math]\displaystyle{ G(V,E) }[/math], then [math]\displaystyle{ |E|\ge \frac{|V||C|}{2} }[/math].
Proof.
It must hold that the degree of each vertex [math]\displaystyle{ v\in V }[/math] is at least [math]\displaystyle{ |C| }[/math], or otherwise the set of adjacent edges of [math]\displaystyle{ v }[/math] forms a cut which separates [math]\displaystyle{ v }[/math] from the rest of [math]\displaystyle{ V }[/math] and has size less than [math]\displaystyle{ |C| }[/math], contradicting the assumption that [math]\displaystyle{ |C| }[/math] is a min-cut. And the bound [math]\displaystyle{ |E|\ge \frac{|V||C|}{2} }[/math] follows directly from the fact that every vertex in [math]\displaystyle{ G }[/math] has degree at least [math]\displaystyle{ |C| }[/math].
[math]\displaystyle{ \square }[/math]

For a multigraph [math]\displaystyle{ G(V, E) }[/math], fixed a minimum cut [math]\displaystyle{ C }[/math] (there might be more than one minimum cuts), we analyze the probability that [math]\displaystyle{ C }[/math] is returned by the above algorithm.

Initially [math]\displaystyle{ |V|=n }[/math]. We say that the min-cut [math]\displaystyle{ C }[/math] "survives" a random contraction if none of the edges in [math]\displaystyle{ C }[/math] is chosen to be contracted. After [math]\displaystyle{ (i-1) }[/math] contractions, denote the current multigraph as [math]\displaystyle{ G_i(V_i, E_i) }[/math]. Supposed that [math]\displaystyle{ C }[/math] survives the first [math]\displaystyle{ (i-1) }[/math] contractions, according to Lemma 1 and 2, [math]\displaystyle{ C }[/math] must be a minimum cut in the current multi-graph [math]\displaystyle{ G_i }[/math]. Then due to Lemma 3, the current edge number is [math]\displaystyle{ |E_i|\ge |V_i||C|/2 }[/math]. Uniformly choosing an edge [math]\displaystyle{ e\in E_i }[/math] to contract, the probability that the [math]\displaystyle{ i }[/math]th contraction contracts an edge in [math]\displaystyle{ C }[/math] is given by:

[math]\displaystyle{ \begin{align}\Pr_{e\in E_i}[e\in C] &= \frac{|C|}{|E_i|} &\le |C|\cdot\frac{2}{|V_i||C|} &= \frac{2}{|V_i|}.\end{align} }[/math]

Therefore, conditioning on that [math]\displaystyle{ C }[/math] survives the first [math]\displaystyle{ (i-1) }[/math] contractions, the probability that [math]\displaystyle{ C }[/math] survives the [math]\displaystyle{ i }[/math]th contraction is at least [math]\displaystyle{ 1-2/|V_i| }[/math]. Note that [math]\displaystyle{ |V_i|=n-i+1 }[/math], because each contraction decrease the vertex number by 1.

The probability that no edge in the minimum cut [math]\displaystyle{ C }[/math] is ever contracted is:

[math]\displaystyle{ \begin{align} &\quad\,\prod_{i=1}^{n-2}\Pr[\,C\mbox{ survives all }(n-2)\mbox{ contractions }]\\ &= \prod_{i=1}^{n-2}\Pr[\,C\mbox{ survives the }i\mbox{-th contraction}\mid C\mbox{ survives the first }(i-1)\mbox{-th contractions}]\\ &\ge \prod_{i=1}^{n-2}\left(1-\frac{2}{|V_i|}\right) \\ &= \prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)\\ &= \prod_{k=3}^{n}\frac{k-2}{k}\\ &= \frac{2}{n(n-1)}. \end{align} }[/math]

This gives the following theorem.

Theorem
For any multigraph with [math]\displaystyle{ n }[/math] vertices, the RandomContract algorithm returns a minimum cut with probability at least [math]\displaystyle{ \frac{2}{n(n-1)} }[/math].

Run RandomContract independently for [math]\displaystyle{ n(n-1)/2 }[/math] times and return the smallest cut returned. The probability that a minimum cut is found is at least:

[math]\displaystyle{ \begin{align} 1-\Pr[\mbox{failed every time}] &= 1-\Pr[{RandomContract}\mbox{ fails}]^{n(n-1)/2} \\ &\ge 1- \left(1-\frac{2}{n(n-1)}\right)^{n(n-1)/2} \\ &\ge 1-\frac{1}{e}. \end{align} }[/math]

A constant probability!

A Corollary by the Probabilistic Method

Karger's algorithm and its analysis implies the following combinatorial theorem regarding the number of distinct minimum cuts in a graph.

Corollary
For any graph [math]\displaystyle{ G(V,E) }[/math] of [math]\displaystyle{ n }[/math] vertices, the number of distinct minimum cuts in [math]\displaystyle{ G }[/math] is at most [math]\displaystyle{ \frac{n(n-2)}{2} }[/math].
Proof.

For each minimum cut [math]\displaystyle{ C }[/math] in [math]\displaystyle{ G }[/math], we define [math]\displaystyle{ \mathcal{E}_C }[/math] to be the event that RandomContract returns [math]\displaystyle{ C }[/math]. Due to the analysis of RandomContract, [math]\displaystyle{ \Pr[\mathcal{E}_C]\ge \frac{2}{n(n-1)} }[/math]. The events [math]\displaystyle{ \mathcal{E}_C }[/math] are mutually disjoint for distinct [math]\displaystyle{ C }[/math] and the event that RandomContract returns a min-cut is the disjoint union of [math]\displaystyle{ \mathcal{E}_C }[/math] over all min-cut [math]\displaystyle{ C }[/math]. Therefore,

[math]\displaystyle{ \begin{align} &\Pr[\mbox{ RandomContract returns a min-cut}]\\ = &\sum_{\mbox{min-cut }C\mbox{ in }G}\Pr[\mathcal{E}_C]\\ \ge &\sum_{\mbox{min-cut }C\mbox{ in }G}\frac{2}{n(n-1)}, \end{align} }[/math]

which must be no greater than 1 for a well-defined probability space. This means the total number of min-cut in [math]\displaystyle{ G }[/math] must be no greater than [math]\displaystyle{ \frac{n(n-1)}{2} }[/math].

[math]\displaystyle{ \square }[/math]

Note that the statement of this theorem has no randomness at all, however the proof involves a randomized algorithm. This is an example of the probabilistic method.