Randomized Algorithms (Spring 2014) / Conditional Probability
Conditional Probability
In probability theory, the word "condition" is a verb. "Conditioning on the event ..." means that it is assumed that the event occurs.
Definition (conditional probability)
- The conditional probability that event $A$ occurs given that event $B$ occurs is
$$\Pr[A\mid B]=\frac{\Pr[A\wedge B]}{\Pr[B]}.$$
The conditional probability is well-defined only if $\Pr[B]>0$.
For independent events $A$ and $B$, it holds that
$$\Pr[A\mid B]=\frac{\Pr[A\wedge B]}{\Pr[B]}=\frac{\Pr[A]\cdot\Pr[B]}{\Pr[B]}=\Pr[A].$$
It supports our intuition that for two independent events, whether one of them occurs will not affect the chance of the other.
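As a quick sanity check, the definition can be verified by brute-force enumeration over a small sample space. The following sketch (the two-dice sample space and the events are our own illustrative choices, not part of the notes) computes a conditional probability by the definition and confirms the independence identity:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice; all 36 outcomes are equally likely.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event, by counting outcomes exactly."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] + w[1] == 7   # event A: the sum is 7
B = lambda w: w[0] == 3          # event B: the first die shows 3

# Definition of conditional probability: Pr[A|B] = Pr[A and B] / Pr[B]
pr_A_given_B = pr(lambda w: A(w) and B(w)) / pr(B)

# A and B are independent, so conditioning on B does not change Pr[A].
assert pr_A_given_B == pr(A) == Fraction(1, 6)
```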
Law of total probability
The following fact is known as the law of total probability. It computes the probability by averaging over all possible cases.
Theorem (law of total probability)
- Let $B_1,B_2,\ldots,B_n$ be mutually disjoint events, and $\bigvee_{i=1}^n B_i=\Omega$ is the sample space.
- Then for any event $A$,
$$\Pr[A]=\sum_{i=1}^n\Pr[A\wedge B_i]=\sum_{i=1}^n\Pr[A\mid B_i]\cdot\Pr[B_i].$$
Proof. Since $B_1,B_2,\ldots,B_n$ are mutually disjoint and $\bigvee_{i=1}^n B_i=\Omega$, the events $A\wedge B_1,A\wedge B_2,\ldots,A\wedge B_n$ are also mutually disjoint, and $A=\bigvee_{i=1}^n(A\wedge B_i)$. Then
$$\Pr[A]=\sum_{i=1}^n\Pr[A\wedge B_i],$$
which, according to the definition of conditional probability, is $\sum_{i=1}^n\Pr[A\mid B_i]\cdot\Pr[B_i]$.
The law of total probability provides us with a standard tool for breaking a probability into sub-cases. Sometimes it simplifies the analysis.
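To see the law in action, one can partition a small sample space and check that the weighted average of conditional probabilities recovers the total probability. A minimal sketch (the dice example is our own, not from the notes):

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice; partition it by the value of the first die.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event, by counting outcomes exactly."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] + w[1] >= 10                      # event A: the sum is >= 10
B = [lambda w, k=k: w[0] == k for k in range(1, 7)]  # disjoint events B_1..B_6

# Law of total probability: Pr[A] = sum_i Pr[A | B_i] * Pr[B_i]
total = sum((pr(lambda w, Bk=Bk: A(w) and Bk(w)) / pr(Bk)) * pr(Bk) for Bk in B)
assert total == pr(A) == Fraction(1, 6)
```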
A Chain of Conditioning
By the definition of conditional probability, $\Pr[A\mid B]=\frac{\Pr[A\wedge B]}{\Pr[B]}$. Thus, $\Pr[A\wedge B]=\Pr[B]\cdot\Pr[A\mid B]$. This hints that we can compute the probability of the AND of events by conditional probabilities. Formally, we have the following theorem:
Theorem
- Let $A_1,A_2,\ldots,A_n$ be any $n$ events. Then
$$\Pr\left[\bigwedge_{i=1}^n A_i\right]=\prod_{k=1}^n\Pr\left[A_k\,\Big|\,\bigwedge_{i<k}A_i\right].$$
Proof. It holds that $\Pr[A\wedge B]=\Pr[B]\cdot\Pr[A\mid B]$. Thus, let $A=A_n$ and $B=A_1\wedge A_2\wedge\cdots\wedge A_{n-1}$, then
$$\Pr[A_1\wedge A_2\wedge\cdots\wedge A_n]=\Pr[A_1\wedge A_2\wedge\cdots\wedge A_{n-1}]\cdot\Pr\left[A_n\,\Big|\,\bigwedge_{i<n}A_i\right].$$
Recursively applying this equation to $\Pr[A_1\wedge A_2\wedge\cdots\wedge A_{n-1}]$ until there is only $\Pr[A_1]$ left, the theorem is proved.
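The chain rule is what makes sequential experiments easy to analyze. As a small worked example (ours, not from the notes), the probability that three cards drawn without replacement from a standard deck are all hearts:

```python
from fractions import Fraction
from math import comb

# Chain rule: Pr[A1 ∧ A2 ∧ A3] = Pr[A1] * Pr[A2 | A1] * Pr[A3 | A1 ∧ A2],
# where A_k is the event "the k-th card drawn without replacement is a heart".
chain = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Cross-check by directly counting unordered 3-card hands.
direct = Fraction(comb(13, 3), comb(52, 3))
assert chain == direct == Fraction(11, 850)
```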
Polynomial Identity Testing (PIT)
Consider the following problem of Polynomial Identity Testing (PIT):
- Input: two $n$-variate polynomials $f,g\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ of degree $d$.
- Output: "yes" if $f\equiv g$, and "no" if otherwise.
The $\mathbb{F}[x_1,x_2,\ldots,x_n]$ is the ring of multivariate polynomials over the field $\mathbb{F}$. The most natural way to represent an $n$-variate polynomial of degree $d$ is to write it as a sum of monomials:
- $f(x_1,x_2,\ldots,x_n)=\sum_{i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}$.
The degree or total degree of a monomial $a_{i_1,i_2,\ldots,i_n}x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}$ is given by $i_1+i_2+\cdots+i_n$, and the degree of a polynomial $f$ is the maximum degree of its monomials with nonzero coefficients.
Alternatively, we can consider the following equivalent problem:
- Input: a polynomial $f\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ of degree $d$.
- Output: "yes" if $f\equiv 0$, and "no" if otherwise.
If $f$ is written explicitly as a sum of monomials, then the problem is trivial. Again we allow $f$ to be represented in product form.
Example The Vandermonde matrix $M=M(x_1,x_2,\ldots,x_n)$ is defined as that $M_{ij}=x_i^{j-1}$, that is
- $M=\begin{bmatrix}1&x_1&x_1^2&\cdots&x_1^{n-1}\\1&x_2&x_2^2&\cdots&x_2^{n-1}\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&x_n&x_n^2&\cdots&x_n^{n-1}\end{bmatrix}$.
Let $f$ be the polynomial defined as
$$f(x_1,x_2,\ldots,x_n)=\det(M)=\prod_{j<i}(x_i-x_j).$$
It is pretty easy to evaluate $f$ on any particular $x_1,x_2,\ldots,x_n$; however, it is prohibitively expensive to symbolically expand $f$ to its sum-of-monomials form.
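To make the contrast concrete, here is a small sketch (our own illustration) that evaluates $f=\det(M)$ at a particular point by exact Gaussian elimination in $O(n^3)$ arithmetic operations and checks it against the product formula; symbolically expanding $f$ instead would produce $n!$ monomials.

```python
from fractions import Fraction

def det(mat):
    """Determinant by exact (fraction-arithmetic) Gaussian elimination."""
    m = [row[:] for row in mat]
    n, sign, d = len(m), 1, Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)          # a zero column: singular matrix
        if piv != c:                    # swap a nonzero pivot into place
            m[c], m[piv] = m[piv], m[c]
            sign = -sign
        d *= m[c][c]
        for r in range(c + 1, n):       # eliminate entries below the pivot
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return sign * d

xs = [2, 3, 5, 7, 11]                   # evaluation point (x_1, ..., x_n)
M = [[Fraction(x) ** j for j in range(len(xs))] for x in xs]  # M_ij = x_i^(j-1)

prod_form = 1
for i in range(len(xs)):
    for j in range(i):                  # product of (x_i - x_j) over j < i
        prod_form *= xs[i] - xs[j]

assert det(M) == prod_form
```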
Schwartz-Zippel Theorem
Here is a very simple randomized algorithm, due to Schwartz and Zippel.
Randomized algorithm for multivariate PIT
- fix an arbitrary set $S\subseteq\mathbb{F}$ whose size is to be fixed later;
- pick $r_1,r_2,\ldots,r_n\in S$ uniformly and independently at random;
- if $f(r_1,r_2,\ldots,r_n)=0$ then return "yes" else return "no";
This algorithm requires only the evaluation of $f$ at a single point $(r_1,r_2,\ldots,r_n)$. And if $f\equiv 0$, it is always correct.
In the theorem below, we'll see that if $f\not\equiv 0$ then the algorithm is incorrect with probability at most $\frac{d}{|S|}$, where $d$ is the degree of the polynomial $f$.
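The algorithm above can be sketched directly in code. In this sketch (ours; integer arithmetic stands in for the field $\mathbb{F}$, and the polynomial is a black box that we can only evaluate), choosing $|S|=2d+1$ makes the one-sided error probability less than $1/2$ per trial:

```python
import random

def pit(f, n, d, trials=1):
    """Randomized PIT: evaluate the degree-d black-box polynomial f
    (in n variables) at points chosen uniformly from S^n.
    Returns True ("f == 0") iff every evaluation is zero."""
    S = range(2 * d + 1)   # any set with |S| > 2d gives error prob < 1/2
    for _ in range(trials):
        r = [random.choice(S) for _ in range(n)]
        if f(*r) != 0:
            return False   # a nonzero evaluation certifies f != 0
    return True            # never caught a nonzero value; report "f == 0"

# (x + y)^2 == x^2 + 2xy + y^2, so the difference is identically zero ...
same = lambda x, y: (x + y) ** 2 - (x * x + 2 * x * y + y * y)
# ... while this difference equals 2xy, which is not identically zero.
diff = lambda x, y: (x + y) ** 2 - (x * x + y * y)

assert pit(same, n=2, d=2, trials=20)
assert not pit(diff, n=2, d=2, trials=20)
```

Note that a "no" answer is always correct, while repeating the test drives the error probability of a "yes" answer down exponentially.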
Schwartz-Zippel Theorem
- Let $f\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ be a multivariate polynomial of degree $d$ over a field $\mathbb{F}$ such that $f\not\equiv 0$. Fix any finite set $S\subset\mathbb{F}$, and let $r_1,r_2,\ldots,r_n$ be chosen uniformly and independently at random from $S$. Then
$$\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.$$
Proof. We prove by induction on the number of variables.
For $n=1$, assuming that $f\not\equiv 0$, due to the fundamental theorem of algebra, the degree-$d$ polynomial $f(x)$ has at most $d$ roots, thus
$$\Pr[f(r)=0]\le\frac{d}{|S|}.$$
Assume the induction hypothesis for multivariate polynomials with up to $n-1$ variables.
An $n$-variate polynomial $f\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ can be represented as
- $f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^k x_n^i f_i(x_1,x_2,\ldots,x_{n-1})$,
where $k$ is the largest power of $x_n$, which means that the degree of $f_k$ is at most $d-k$ and $f_k\not\equiv 0$.
In particular, we write $f$ as a sum of two parts:
- $f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n)$,
where both $f_k$ and $\bar{f}$ are polynomials, such that
- as argued above, the degree of $f_k$ is at most $d-k$ and $f_k\not\equiv 0$;
- $\bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1})$, thus $\bar{f}$ has no $x_n^k$ factor in any term.
By the law of total probability, it holds that
$$\begin{aligned}\Pr[f(r_1,r_2,\ldots,r_n)=0]=&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\&+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\ne 0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\ne 0].\end{aligned}$$
Note that $f_k(r_1,r_2,\ldots,r_{n-1})$ is a polynomial on $n-1$ variables of degree $d-k$ such that $f_k\not\equiv 0$. By the induction hypothesis, we have
- $(*)\quad\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}$.
For the second case, recall that $\bar{f}$ has no $x_n^k$ factor in any term, thus the condition $f_k(r_1,r_2,\ldots,r_{n-1})\ne 0$ guarantees that
$$f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n)$$
is a single-variate polynomial such that the degree of $g_{r_1,\ldots,r_{n-1}}(x_n)$ is $k$ and $g_{r_1,\ldots,r_{n-1}}\not\equiv 0$, for which we already know that the probability of $g_{r_1,\ldots,r_{n-1}}(r_n)=0$ is at most $\frac{k}{|S|}$. Therefore,
- $(**)\quad\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\ne 0]\le\frac{k}{|S|}$.
Substituting both $(*)$ and $(**)$ back in the total probability, we have
$$\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|},$$
which proves the theorem.
In the above proof, for the second case that $f_k(x_1,\ldots,x_{n-1})\ne 0$, we use a "probabilistic argument" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events when applying the law of total probability. You can make your own judgement as to which proof is better.
By the law of total probability,
$$\begin{aligned}\Pr[f(\vec{r})=0]=&\sum_{x_1,\ldots,x_{n-1}\in S}\Pr[f(\vec{r})=0\mid\forall i<n,\,r_i=x_i]\cdot\Pr[\forall i<n,\,r_i=x_i]\\=&\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[f(\vec{r})=0\mid\forall i<n,\,r_i=x_i]\cdot\Pr[\forall i<n,\,r_i=x_i]\\&+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\ne 0}\Pr[f(\vec{r})=0\mid\forall i<n,\,r_i=x_i]\cdot\Pr[\forall i<n,\,r_i=x_i].\end{aligned}$$
We have argued that $f_k\not\equiv 0$ and the degree of $f_k$ is $d-k$. By the induction hypothesis, we have
$$\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[\forall i<n,\,r_i=x_i]=\Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.$$
And for every fixed $x_1,\ldots,x_{n-1}\in S$ such that $f_k(x_1,\ldots,x_{n-1})\ne 0$, we have argued that $f(x_1,\ldots,x_{n-1},x_n)$ is a nonzero single-variate polynomial in $x_n$ of degree $k$, thus
$$\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\le\frac{k}{|S|},$$
which holds for all $x_1,\ldots,x_{n-1}\in S$ such that $f_k(x_1,\ldots,x_{n-1})\ne 0$; therefore the weighted average
$$\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\ne 0}\Pr[f(\vec{r})=0\mid\forall i<n,\,r_i=x_i]\cdot\Pr[\forall i<n,\,r_i=x_i]\le\frac{k}{|S|}.$$
Substituting these inequalities back into the total probability, we have
$$\Pr[f(\vec{r})=0]\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}.$$
Min-Cut in a Graph
Let $G(V,E)$ be a multi-graph, which allows parallel edges between two distinct vertices $u$ and $v$ but does not allow any self-loops, i.e. edges connecting a vertex to itself. Such a multi-graph can be represented by data structures like an adjacency matrix $A$, where $A$ is symmetric (undirected graph) with zero diagonal, and each entry $A(u,v)$ is a nonnegative integer giving the number of edges between vertices $u$ and $v$.
A cut in a multi-graph $G(V,E)$ is an edge set $C\subseteq E$, which can be equivalently defined as
- there exists a nonempty vertex set $S\subset V$, such that $C=\{uv\in E\mid u\in S, v\notin S\}$; or
- removing $C$ from $E$ disconnects $G$, that is, $G'(V,E\setminus C)$ is disconnected.
The min-cut or minimum cut problem is defined as follows:
- Input: a multi-graph $G(V,E)$;
- Output: a cut $C$ in $G$ with the minimum size $|C|$.
The problem itself is well-defined on simple graphs (without parallel edges), and our main goal is indeed solving min-cut in simple graphs; however, as we shall see, the algorithm creates parallel edges during its run, even if we start with a simple graph without parallel edges.
A canonical deterministic algorithm for this problem is through the max-flow min-cut theorem. A global minimum cut is the minimum $s$-$t$ min-cut over all pairs of vertices $s$ and $t$, which is equal to the minimum $s$-$t$ max-flow.
Karger's Min-Cut Algorithm
We will introduce a very simple and elegant algorithm discovered by David Karger.
We define an operation on multi-graphs called contraction: For a multigraph $G(V,E)$, for any edge $uv\in E$, let $contract(G,uv)$ be a new multigraph obtained by:
- replacing the vertices $u$ and $v$ by a new vertex $x\notin V$;
- for each $w\notin\{u,v\}$, replacing any edge $uw$ or $vw$ by the edge $xw$;
- removing all parallel edges between $u$ and $v$ in $E$;
- the rest of the graph remains unchanged.
To conclude, the $contract(G,uv)$ operation merges the two vertices $u$ and $v$ into a new vertex $x$ which maintains the old neighborhoods of both $u$ and $v$ except that all the parallel edges between $u$ and $v$ are removed.
Perhaps a better way to look at contraction is to interpret it as a union of equivalence classes of vertices. Initially every vertex is in a distinct equivalence class. Upon calling $contract(G,uv)$, the two equivalence classes corresponding to $u$ and $v$ are unioned together, and only those edges crossing between different equivalence classes are counted as valid edges in the graph.
RandomContract (Karger 1993)
- while $|V|>2$ do
- choose an edge $uv\in E$ uniformly at random;
- $G=contract(G,uv)$;
- return $C=E$ (the parallel edges between the only two remaining vertices in $V$);
A multi-graph $G$ can be maintained by appropriate data structures such that each contraction takes $O(n)$ time, where $n$ is the number of vertices, so the algorithm terminates in time $O(n^2)$. We leave this as an exercise.
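As a concrete (unoptimized) sketch of this exercise, the multigraph can be kept as a nested dictionary mapping each vertex to its neighbors with edge multiplicities; a contraction then merges two adjacency lists. This implementation is our own illustration and makes no attempt at the $O(n)$-per-contraction bound:

```python
import random

def random_contract(adj):
    """Karger's RandomContract on a multigraph adj, where
    adj[u][v] = number of parallel edges between u and v.
    Returns the size of the cut left between the last two vertices."""
    adj = {u: dict(nbrs) for u, nbrs in adj.items()}   # work on a copy
    while len(adj) > 2:
        # choose an edge uniformly at random (parallel edges count separately;
        # vertex labels must be comparable, e.g. integers)
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        weights = [adj[u][v] for u, v in edges]
        u, v = random.choices(edges, weights=weights)[0]
        # contract(G, uv): merge v into u; the parallel uv edges disappear
        for w, mult in adj.pop(v).items():
            if w == u:
                continue
            del adj[w][v]
            adj[u][w] = adj[u].get(w, 0) + mult
            adj[w][u] = adj[w].get(u, 0) + mult
        del adj[u][v]
    a, b = adj
    return adj[a][b]

def build(edges):
    """Adjacency structure of a multigraph from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})
        adj.setdefault(v, {})
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    return adj

# Two triangles joined by a single bridge edge: the min-cut has size 1.
G = build([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
best = min(random_contract(G) for _ in range(300))
assert best == 1   # some run finds the min-cut, with overwhelming probability
```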
Analysis of accuracy
For convenience, we assume that each edge $e\in E$ has a unique "identity". And when an edge $uv\in E$ is contracted to a new vertex $x$, and each adjacent edge $uw$ of $u$ (or adjacent edge $vw$ of $v$) is replaced by $xw$, the identity of the edge $uw$ (or $vw$) is transferred to the new edge replacing it. When referring to a cut $C$, we consider $C$ as a set of edge identities, so that a cut $C$ is changed by the algorithm only if some of its edges are removed during contraction.
We first prove some lemmas.
Lemma 1
- If $C$ is a cut in a multi-graph $G$ and $e\notin C$, then $C$ is still a cut in $contract(G,e)$.
Proof. It is easy to verify that $C$ is a cut in $contract(G,e)$ if none of its edges is lost during the contraction. Since $C$ is a cut in $G(V,E)$, there exists a nonempty vertex set $S\subset V$ and its complement $\bar{S}=V\setminus S$ such that $C=\{uv\mid u\in S, v\in\bar{S}\}$. And if $e\notin C$, it must hold that either $e\in G[S]$ or $e\in G[\bar{S}]$, where $G[S]$ and $G[\bar{S}]$ are the subgraphs induced by $S$ and $\bar{S}$ respectively. In both cases none of the edges in $C$ is removed by $contract(G,e)$.
Lemma 2
- The size of the min-cut in $contract(G,e)$ is at least as large as the size of the min-cut in $G$, i.e. contraction never reduces the size of the min-cut.
Proof.
- Note that every cut in the contracted graph $contract(G,e)$ is also a cut in the original graph $G$.
Lemma 3
- If $C$ is a min-cut in a multi-graph $G(V,E)$, then $|E|\ge\frac{|V||C|}{2}$.
Proof.
- It must hold that the degree of each vertex $v\in V$ is at least $|C|$, or otherwise the set of adjacent edges of $v$ forms a cut which separates $v$ from the rest of $V$ and has size less than $|C|$, contradicting the assumption that $C$ is a min-cut. And the bound $|E|\ge\frac{|V||C|}{2}$ follows directly from the fact that every vertex in $G$ has degree at least $|C|$.
For a multigraph $G(V,E)$ with $n=|V|$ vertices, fix a minimum cut $C$ (there might be more than one minimum cut); we analyze the probability that $C$ is returned by the above algorithm.
Initially $|V|=n$. We say that the min-cut $C$ "survives" a random contraction if none of the edges in $C$ is chosen to be contracted. After $i$ contractions, denote the current multigraph as $G_i(V_i,E_i)$. Suppose that $C$ survives the first $i-1$ contractions. According to Lemma 1 and Lemma 2, $C$ must be a minimum cut in the current multi-graph $G_{i-1}$. Then due to Lemma 3, the current number of edges is $|E_{i-1}|\ge\frac{|V_{i-1}||C|}{2}$. Uniformly choosing an edge $uv\in E_{i-1}$ to contract, the probability that the $i$-th contraction contracts an edge in $C$ is given by:
$$\Pr[uv\in C]=\frac{|C|}{|E_{i-1}|}\le\frac{2}{|V_{i-1}|}.$$
Therefore, conditioning on that $C$ survives the first $i-1$ contractions, the probability that $C$ survives the $i$-th contraction is at least $1-\frac{2}{|V_{i-1}|}$. Note that $|V_{i-1}|=n-i+1$, because each contraction decreases the number of vertices by 1.
The probability that no edge in the minimum cut $C$ is ever contracted is:
$$\prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)=\prod_{k=3}^{n}\frac{k-2}{k}=\frac{2}{n(n-1)}.$$
This gives the following theorem.
Theorem
- For any multigraph with $n$ vertices, the RandomContract algorithm returns a minimum cut with probability at least $\frac{2}{n(n-1)}$.
Run RandomContract independently for $\frac{n(n-1)}{2}$ times and return the smallest cut ever returned. The probability that a minimum cut is found is at least:
$$1-\left(1-\frac{2}{n(n-1)}\right)^{\frac{n(n-1)}{2}}\ge 1-\frac{1}{\mathrm{e}}.$$
A constant probability!
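The constant can be checked numerically: since $(1-1/k)^k<1/\mathrm{e}$ for all $k\ge 1$, the failure probability of $\binom{n}{2}$ independent runs stays below $1/\mathrm{e}$ for every $n$. A quick check (our own):

```python
import math

n = 50
p = 2 / (n * (n - 1))     # lower bound on a single run's success probability
t = n * (n - 1) // 2      # number of independent repetitions
fail = (1 - p) ** t       # probability that every run misses the min-cut
assert fail < 1 / math.e  # so a min-cut is found with constant probability
```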
A Corollary by the Probabilistic Method
Karger's algorithm and its analysis implies the following combinatorial theorem regarding the number of distinct minimum cuts in a graph.
Corollary
- For any graph $G(V,E)$ of $n$ vertices, the number of distinct minimum cuts in $G$ is at most $\frac{n(n-1)}{2}$.
Proof. For each minimum cut $C$ in $G$, we define $\mathcal{E}_C$ to be the event that RandomContract returns $C$. Due to the analysis of RandomContract, $\Pr[\mathcal{E}_C]\ge\frac{2}{n(n-1)}$. The events $\mathcal{E}_C$ are mutually disjoint for distinct $C$, and the event that RandomContract returns a min-cut is the disjoint union of $\mathcal{E}_C$ over all min-cuts $C$. Therefore,
$$\sum_{\text{min-cut }C\text{ in }G}\Pr[\mathcal{E}_C]\ge(\text{number of min-cuts in }G)\cdot\frac{2}{n(n-1)},$$
which must be no greater than 1 for a well-defined probability space. This means the total number of min-cuts in $G$ must be no greater than $\frac{n(n-1)}{2}$.
Note that the statement of this theorem has no randomness at all, however the proof involves a randomized algorithm. This is an example of the probabilistic method.
Fast Min-Cut
In the analysis of RandomContract, we have the following observation:
- The probability of success is only getting worse when the graph becomes small.
This motivates us to consider the following modification to the algorithm: first use random contractions to reduce the number of vertices to a moderately small number, and then recursively find a min-cut in this smaller instance. This seems just a restatement of exactly what we have been doing. Inspired by the idea of boosting the accuracy via independent repetition, here we apply the recursion on two smaller instances generated independently.
The algorithm obtained in this way is called FastCut. We first define a procedure RandomContract$(G,t)$ to randomly contract edges until there are $t$ vertices left.
RandomContract$(G,t)$
- while $|V|>t$ do
- choose an edge $uv\in E$ uniformly at random;
- $G=contract(G,uv)$;
- return $G$;
The FastCut algorithm is recursively defined as follows.
FastCut$(G)$
- if $|V|\le 6$ then return a min-cut by brute force;
- else let $t=\left\lceil 1+\frac{|V|}{\sqrt{2}}\right\rceil$;
- $G_1=$ RandomContract$(G,t)$;
- $G_2=$ RandomContract$(G,t)$;
- return the smaller one of FastCut$(G_1)$ and FastCut$(G_2)$;
As before, all are multigraphs.
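The recursion above can be sketched as follows (our own illustration; the multigraph is a nested dictionary of edge multiplicities, and no attempt is made at the optimal per-contraction running time):

```python
import math
import random
from itertools import combinations

def contract_until(adj, t):
    """RandomContract(G, t): contract random edges until t vertices remain."""
    adj = {u: dict(nbrs) for u, nbrs in adj.items()}   # work on a copy
    while len(adj) > t:
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        weights = [adj[u][v] for u, v in edges]        # parallel multiplicities
        u, v = random.choices(edges, weights=weights)[0]
        for w, mult in adj.pop(v).items():             # merge v into u
            if w == u:
                continue
            del adj[w][v]
            adj[u][w] = adj[u].get(w, 0) + mult
            adj[w][u] = adj[w].get(u, 0) + mult
        del adj[u][v]                                  # drop parallel uv edges
    return adj

def brute_force_mincut(adj):
    """Exact min-cut by enumerating all bipartitions (fine for <= 6 vertices)."""
    vs = list(adj)
    best = math.inf
    for r in range(1, len(vs)):
        for rest in combinations(vs[1:], r - 1):
            side = {vs[0], *rest}                      # fix vs[0] on one side
            cut = sum(m for u in side
                        for w, m in adj[u].items() if w not in side)
            best = min(best, cut)
    return best

def fast_cut(adj):
    n = len(adj)
    if n <= 6:
        return brute_force_mincut(adj)
    t = math.ceil(1 + n / math.sqrt(2))
    # recurse on two independently contracted instances; keep the smaller cut
    return min(fast_cut(contract_until(adj, t)),
               fast_cut(contract_until(adj, t)))
```

For example, two copies of $K_4$ joined by a single edge have a min-cut of size 1, and the minimum over a few independent runs of fast_cut recovers it with overwhelming probability.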
Let $C$ be a min-cut in the original multigraph $G$. By the same analysis as in the case of RandomContract, the probability that $C$ survives all the contractions from $n$ vertices down to $t$ vertices is
$$\prod_{i=1}^{n-t}\left(1-\frac{2}{n-i+1}\right)=\prod_{k=t+1}^{n}\frac{k-2}{k}=\frac{t(t-1)}{n(n-1)}.$$
When $t=\left\lceil 1+\frac{n}{\sqrt{2}}\right\rceil$, this probability is at least $\frac{1}{2}$.
We use $p(n)$ to denote the probability that a min-cut is returned by FastCut$(G)$, where $G$ is a multigraph of $n$ vertices. We then have the following recursion for $p(n)$:
$$p(n)\ge 1-\left(1-\frac{1}{2}\,p\!\left(\left\lceil 1+\frac{n}{\sqrt{2}}\right\rceil\right)\right)^2,$$
where the inequality is due to the fact that $G_1$ and $G_2$ are generated independently, and our previous discussion in the analysis of RandomContract that if the min-cut $C$ survives all the first $n-t$ contractions then $C$ must be a min-cut in the remaining multigraph.
The base case is that $p(n)=1$ for $n\le 6$. Solving this recursion of $p(n)$ (or proving by induction) gives us that
$$p(n)=\Omega\!\left(\frac{1}{\log n}\right).$$
Recall that we can implement an edge contraction in $O(n)$ time, thus it is easy to verify the following recursion of time complexity:
$$T(n)=2T\!\left(\left\lceil 1+\frac{n}{\sqrt{2}}\right\rceil\right)+O(n^2),$$
where $T(n)$ denotes the running time of FastCut$(G)$ on a multigraph $G$ of $n$ vertices.
Solving the recursion of $T(n)$ with the base case $T(n)=O(1)$ for $n\le 6$, we have $T(n)=O(n^2\log n)$.
Therefore, for a multigraph $G$ of $n$ vertices, the algorithm FastCut$(G)$ returns a min-cut in $G$ with probability $\Omega\!\left(\frac{1}{\log n}\right)$ in time $O(n^2\log n)$. Repeating this independently for $O(\log^2 n)$ times, we obtain an algorithm which runs in time $O(n^2\log^3 n)$ and returns a min-cut with probability $1-O\!\left(\frac{1}{n}\right)$, a high probability.