Randomized Algorithms (Spring 2010)/Introduction
This is the first lecture of the Randomized Algorithms class. In this lecture, I will use two examples: randomized Quicksort and Karger's min-cut algorithm, to demonstrate what randomized algorithms look like. Two basic principles are used to analyze these algorithms: linearity of expectations, and independence of events.
These two algorithms belong respectively to two important classes of randomized algorithms: Las Vegas algorithms and Monte Carlo algorithms. I will introduce their definitions at the end of this lecture.
Randomized Quicksort
The following is the pseudocode of the famous Quicksort algorithm, whose input is a set [math]\displaystyle{ S }[/math] of numbers.
- if [math]\displaystyle{ |S|\gt 1 }[/math] do:
- pick an element [math]\displaystyle{ x }[/math] from [math]\displaystyle{ S }[/math] as the pivot;
- partition [math]\displaystyle{ S }[/math] into [math]\displaystyle{ S_1 }[/math], [math]\displaystyle{ \{x\} }[/math], and [math]\displaystyle{ S_2 }[/math], where all elements in [math]\displaystyle{ S_1 }[/math] are smaller than [math]\displaystyle{ x }[/math] and all elements in [math]\displaystyle{ S_2 }[/math] are larger than [math]\displaystyle{ x }[/math];
- recursively sort [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math];
The time complexity of this sorting algorithm is measured by the number of comparisons.
For the deterministic quicksort algorithm, the pivot element is usually the element in a fixed position (e.g. the first one) of the [math]\displaystyle{ S }[/math]. This will make the worst-case time complexity [math]\displaystyle{ \Omega(n^2) }[/math], which means there exists a bad case [math]\displaystyle{ S }[/math], sorting which will cost us [math]\displaystyle{ \Omega(n^2) }[/math] comparisons, every time!
It is just so unfair to have an unbeatable input for this brilliant algorithm. So we tweak the algorithm a little bit:
Algorithm: RandQSort
- if [math]\displaystyle{ |S|\gt 1 }[/math] do:
- uniformly pick a random element [math]\displaystyle{ x }[/math] from [math]\displaystyle{ S }[/math] as the pivot;
- partition [math]\displaystyle{ S }[/math] into [math]\displaystyle{ S_1 }[/math], [math]\displaystyle{ \{x\} }[/math], and [math]\displaystyle{ S_2 }[/math], where all elements in [math]\displaystyle{ S_1 }[/math] are smaller than [math]\displaystyle{ x }[/math] and all elements in [math]\displaystyle{ S_2 }[/math] are larger than [math]\displaystyle{ x }[/math];
- recursively sort [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math];
Analysis
Our goal is to analyze the expected number of comparisons during an execution of RandQSort with an arbitrary input [math]\displaystyle{ S }[/math]. We achieve this by measuring the chance that each pair of elements are compared, and summing all of them up due to Linearity of Expectation.
Let [math]\displaystyle{ a_i }[/math] denote the [math]\displaystyle{ i }[/math]th smallest element in [math]\displaystyle{ S }[/math]. Let [math]\displaystyle{ X_{ij}\in\{0,1\} }[/math] be the random variable which indicates whether [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ a_j }[/math] are compared during the execution of RandQSort. That is:
- [math]\displaystyle{ \begin{align} X_{ij} &= \begin{cases} 1 & a_i\mbox{ and }a_j\mbox{ are compared}\\ 0 & \mbox{otherwise} \end{cases}. \end{align} }[/math]
Elements [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ a_j }[/math] are compared only if one of them is chosen as pivot. After comparison they are separated (thus are never compared again). So we have the following observation:
Claim 1: Every pair of [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ a_j }[/math] are compared at most once.
Therefore the sum of [math]\displaystyle{ X_{ij} }[/math] for all pair [math]\displaystyle{ \{i, j\} }[/math] gives the total number of comparisons. The expected number of comparisons is [math]\displaystyle{ \mathbf{E}\left[\sum_{i=1}^n\sum_{j\gt i}X_{ij}\right] }[/math]. Due to Linearity of Expectation, [math]\displaystyle{ \mathbf{E}\left[\sum_{i=1}^n\sum_{j\gt i}X_{ij}\right] = \sum_{i=1}^n\sum_{j\gt i}\mathbf{E}\left[X_{ij}\right] }[/math]. Our next step is to analyze [math]\displaystyle{ \mathbf{E}\left[X_{ij}\right] }[/math] for each [math]\displaystyle{ \{i, j\} }[/math].
By the definition of expectation and [math]\displaystyle{ X_{ij} }[/math],
- [math]\displaystyle{ \begin{align} \mathbf{E}\left[X_{ij}\right] &= 1\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are compared}] + 0\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are not compared}]\\ &= \Pr[a_i\mbox{ and }a_j\mbox{ are compared}]. \end{align} }[/math]
We are going to bound this probability.
Claim 2: [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ a_j }[/math] are compared if and only if one of them is chosen as pivot when they are still in the same subset.
This is easy to verify: just check the algorithm. The next one is a bit complicated.
Claim 3: If [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ a_j }[/math] are still in the same subset then all [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math] are in the same subset.
We can verify this by induction. Initially, [math]\displaystyle{ S }[/math] itself has the property described above; and partitioning any [math]\displaystyle{ S }[/math] with the property into [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math] will preserve the property for both [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math]. Therefore Claim 3 holds.
Combining Claim 2 and 3, we have:
Claim 4: [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ a_j }[/math] are compared only if one of [math]\displaystyle{ \{a_i, a_j\} }[/math] is chosen from [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math].
And apparently,
Claim 5: Every one of [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math] is chosen equal-probably.
This is because our RandQSort chooses the pivot uniformly at random.
Claim 4 and Claim 5 together imply:
- [math]\displaystyle{ \begin{align} \Pr[a_i\mbox{ and }a_j\mbox{ are compared}] &\le \frac{2}{j-i+1}. \end{align} }[/math]
Remark: Perhaps you feel confused about the above argument. You may ask: "The algorithm chooses pivots for many times during the execution. Why in the above argument, it looks like the pivot is chosen only once?" Good question! Let's see what really happens by looking closely.
For any pair [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ a_j }[/math], initially [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math] are all in the same set [math]\displaystyle{ S }[/math] (obviously!). During the execution of the algorithm, the set which containing [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math] are shrinking (due to the pivoting), until one of [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math] is chosen, and the set is partitioned into different subsets. We ask for the probability that the chosen one is among [math]\displaystyle{ \{a_i, a_j\} }[/math]. So we really care about "the last" pivoting before [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math] is split. Formally, let [math]\displaystyle{ Y }[/math] be the random variable denoting the pivot element. We know that for each [math]\displaystyle{ a_k\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math], [math]\displaystyle{ Y=a_k }[/math] with the same probability, and [math]\displaystyle{ Y\not\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math] with an unknown probability (remember that there might be other elements in the same subset with [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math]). The probability we are looking for is actually [math]\displaystyle{ \Pr[Y\in \{a_i, a_j\}\mid Y\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}] }[/math], which is always [math]\displaystyle{ \frac{2}{j-i+1} }[/math], provided that [math]\displaystyle{ Y }[/math] is uniform over [math]\displaystyle{ \{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\} }[/math]. The conditional probability rules out the irrelevant events in a probabilistic argument. |
Summing all up:
- [math]\displaystyle{ \begin{align} \mathbf{E}\left[\sum_{i=1}^n\sum_{j\gt i}X_{ij}\right] &= \sum_{i=1}^n\sum_{j\gt i}\mathbf{E}\left[X_{ij}\right]\\ &\le \sum_{i=1}^n\sum_{j\gt i}\frac{2}{j-i+1}\\ &= \sum_{i=1}^n\sum_{k=2}^{n-i+1}\frac{2}{k} & & (\mbox{Let }k=j-i+1)\\ &\le \sum_{i=1}^n\sum_{k=1}^{n}\frac{2}{k}\\ &= 2n\sum_{k=1}^{n}\frac{1}{k}\\ &= 2n H(n). \end{align} }[/math]
[math]\displaystyle{ H(n) }[/math] is the [math]\displaystyle{ n }[/math]th Harmonic number. It holds that
- [math]\displaystyle{ \begin{align}H(n) = \ln n+O(1)\end{align} }[/math].
Therefore, for an arbitrary input [math]\displaystyle{ S }[/math] of [math]\displaystyle{ n }[/math] numbers, the expected number of comparisons taken by RandQSort to sort [math]\displaystyle{ S }[/math] is [math]\displaystyle{ \mathrm{O}(n\log n) }[/math].
Minimum Cut
Let [math]\displaystyle{ G(V, E) }[/math] be a graph. Suppose that we want to partition the vertex set [math]\displaystyle{ V }[/math] into two parts [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math] such that the number of crossing edges, edges with one endpoint in each part, is as small as possible. This can be described as the following problem: the min-cut problem.
For a connected graph [math]\displaystyle{ G(V, E) }[/math], a cut is a set [math]\displaystyle{ C\subseteq E }[/math] of edges, removal of which causes [math]\displaystyle{ G }[/math] becomes disconnected. The min-cut problem is to find the cut with minimum cardinality. A canonical deterministic algorithm for this problem is through the max-flow min-cut theorem. A global minimum cut is the minimum [math]\displaystyle{ s }[/math]-[math]\displaystyle{ t }[/math] min-cut, which is equal to the minimum [math]\displaystyle{ s }[/math]-[math]\displaystyle{ t }[/math] max-flow.
Do we have to rely on the "advanced" tools like flows? The answer is "no", with a little help of randomness.
Karger's Min-Cut Algorithm
We will introduce an extremely simple algorithm discovered by David Karger. The algorithm works on multigraphs, graphs allowing multiple edges between vertices.
We define an operation on multigraphs called contraction: For a multigraph [math]\displaystyle{ G(V, E) }[/math], for any edge [math]\displaystyle{ uv\in E }[/math], let [math]\displaystyle{ contract(G,uv) }[/math] be a new multigraph constructed as follows: [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] in [math]\displaystyle{ V }[/math] are replaced by a singe new vertex whose neighbors are all the old neighbors of [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math]. In other words, [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] are merged into one vertex. The old edges between [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math] are deleted.
Karger's min-cut algorithm is described as follows:
MinCut(multigraph [math]\displaystyle{ G(V, E) }[/math])
- while [math]\displaystyle{ |V|\gt 2 }[/math] do
- choose an edge [math]\displaystyle{ uv\in E }[/math] uniformly at random;
- [math]\displaystyle{ G=contract(G,uv) }[/math];
- return the edges between the only two vertices in [math]\displaystyle{ V }[/math];
A better way to understand Karger's min-cut algorithm is to describe it as randomly merging sets of vertices. Initially, each vertex [math]\displaystyle{ v\in V }[/math] corresponds to a singleton set [math]\displaystyle{ \{v\} }[/math]. At each step, (1) a crossing edge (edge whose endpoints are in different sets) is chosen uniformly at random from all crossing edges; and (2) the two sets connected by the chosen crossing-edge are merged to one set. Repeat this process until there are only two sets. The crossing edges between the two sets are returned.
Analysis
For a multigraph [math]\displaystyle{ G(V, E) }[/math], fixed a minimum cut [math]\displaystyle{ C }[/math] (there might be more than one minimum cuts), we analyze the probability that [math]\displaystyle{ C }[/math] is returned by the MinCut algorithm. [math]\displaystyle{ C }[/math] is returned by MinCut if and only if no edge in [math]\displaystyle{ C }[/math] is contracted during the execution of MinCut. We will bound this probability [math]\displaystyle{ \Pr[\mbox{no edge in }C\mbox{ is contracted}] }[/math].
Lemma 1 - Let [math]\displaystyle{ G(V, E) }[/math] be a multigraph with [math]\displaystyle{ n }[/math] vertices, if the size of the minimum cut of [math]\displaystyle{ G }[/math] is [math]\displaystyle{ k }[/math], then [math]\displaystyle{ |E|\ge nk/2 }[/math].
Proof. - It holds that every vertex has at least [math]\displaystyle{ k }[/math] neighbors, because if there exists [math]\displaystyle{ v }[/math] with [math]\displaystyle{ \lt k }[/math] neighbors, then the [math]\displaystyle{ \lt k }[/math] edges adjacent to [math]\displaystyle{ v }[/math] disconnect [math]\displaystyle{ v }[/math] from the rest of [math]\displaystyle{ G }[/math], forming a cut of size smaller than [math]\displaystyle{ k }[/math]. Therefore [math]\displaystyle{ |E|\ge kn/2 }[/math].
- [math]\displaystyle{ \square }[/math]
Lemma 2 - Let [math]\displaystyle{ G(V, E) }[/math] be a multigraph with [math]\displaystyle{ n }[/math] vertices, and [math]\displaystyle{ C }[/math] a minimum cut of [math]\displaystyle{ G }[/math]. If [math]\displaystyle{ e\not\in C }[/math], then [math]\displaystyle{ C }[/math] is still a minimum cut of [math]\displaystyle{ contract(G, e) }[/math].
Proof. - We first show that no edge in [math]\displaystyle{ C }[/math] is lost during the contraction. Due to the definition of contraction, the only edges removed from [math]\displaystyle{ G }[/math] in a contraction [math]\displaystyle{ contract(G, e) }[/math] are the parallel-edges sharing both endpoints with [math]\displaystyle{ e }[/math]. Since [math]\displaystyle{ e\not\in C }[/math], none of these edges can be in [math]\displaystyle{ C }[/math], or otherwise [math]\displaystyle{ C }[/math] cannot be a minimum cut of [math]\displaystyle{ G }[/math]. Thus every edge in [math]\displaystyle{ C }[/math] remains in [math]\displaystyle{ G }[/math].
- It is then obvious to see that [math]\displaystyle{ C }[/math] is a cut of [math]\displaystyle{ contract(G, e) }[/math]. All paths in a contracted graph can be revived in the original multigraph by inserting the contracted edges into the path, thus a connected [math]\displaystyle{ contract(G, e)-C }[/math] would imply a connected [math]\displaystyle{ G-C }[/math], which contradicts that [math]\displaystyle{ C }[/math] is a cut in [math]\displaystyle{ G }[/math].
- Notice that a cut in a contracted graph must be a cut in the original graph. This can be easily verified by seeing contraction as taking the union of two sets of vertices. Therefore a contraction can never reduce the size of minimum cuts of a multigraph. A minimum cut [math]\displaystyle{ C }[/math] must still be a minimum cut in the contracted graph as long as it is still a cut.
- Concluding the above arguments, we have that [math]\displaystyle{ C }[/math] is a minimum cut of [math]\displaystyle{ contract(G, e) }[/math] for any [math]\displaystyle{ e\not\in C }[/math].
- [math]\displaystyle{ \square }[/math]
Let [math]\displaystyle{ G(V, E) }[/math] be a multigraph, and [math]\displaystyle{ C }[/math] a minimum cut of [math]\displaystyle{ G }[/math].
Initially [math]\displaystyle{ |V|=n }[/math]. After [math]\displaystyle{ (i-1) }[/math] contractions, denote the current multigraph as [math]\displaystyle{ G_i(V_i, E_i) }[/math]. Suppose that no edge in [math]\displaystyle{ C }[/math] has been chosen to be contracted yet. According to Lemma 2, [math]\displaystyle{ C }[/math] must be a minimum cut of the [math]\displaystyle{ G_i }[/math]. Then due to Lemma 1, the current edge number is [math]\displaystyle{ |E_i|\ge |V_i||C|/2 }[/math]. Uniformly choosing an edge [math]\displaystyle{ e\in E_i }[/math] to contract, the probability that the [math]\displaystyle{ i }[/math]th contraction contracts an edge in [math]\displaystyle{ C }[/math] is given by:
- [math]\displaystyle{ \begin{align}\Pr_{e\in E_i}[e\in C] &= \frac{|C|}{|E_i|} &\le |C|\cdot\frac{2}{|V_i||C|} &= \frac{2}{|V_i|}.\end{align} }[/math]
Therefore, assuming that [math]\displaystyle{ C }[/math] is intact after [math]\displaystyle{ (i-1) }[/math] contractions, the probability that [math]\displaystyle{ C }[/math] survives the [math]\displaystyle{ i }[/math]th contraction is at least [math]\displaystyle{ 1-2/|V_i| }[/math]. Note that [math]\displaystyle{ |V_i|=n-i+1 }[/math], because each contraction decrease the vertex number by 1.
In each iteration, the contracted edge is independently chosen from the current graph. The probability that the minimum cut [math]\displaystyle{ C }[/math] survives all [math]\displaystyle{ (n-2) }[/math] contractions is at least
- [math]\displaystyle{ \begin{align} \prod_{i=1}^{n-2}\left(1-\frac{2}{|V_i|}\right) &= \prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)\\ &= \prod_{k=3}^{n}\frac{k-2}{k}\\ &= \frac{2}{n(n-1)}. \end{align} }[/math]
Therefore, we prove the following theorem,
Theorem - For any multigraph with [math]\displaystyle{ n }[/math] vertices, the MinCut algorithm returns a minimum cut with probability at least [math]\displaystyle{ \frac{2}{n(n-1)} }[/math].
Run MinCut independently for [math]\displaystyle{ n(n-1)/2 }[/math] times and return the smallest cut returned. The probability that this the minimum cut is found is:
- [math]\displaystyle{ \begin{align} 1-\Pr[\mbox{failed every time}] &= 1-\Pr[\mbox{MinCut fails}]^{n(n-1)/2} \\ &\ge 1- \left(1-\frac{2}{n(n-1)}\right)^{n(n-1)/2} \\ &\ge 1-\frac{1}{e}. \end{align} }[/math]
A constant probability!
Las Vegas and Monte Carlo
- Las Vegas(赌徒)
- A randomized algorithm is called Las Vegas if its output is always correct but its running time is a random variable. Randomized quicksort is an example of Las Vegas algorithm. Its output is always a sorted table, but the running time is random.
- Usually the analysis of a Las Vegas algorithm tries to bound the expected running time, or bound the running time with high probability.
- Monte Carlo(醉汉)
- A randomized algorithm is called Monte Carlo if its running time is fixed, but the correctness of the output is a random variable. Karger's min-cut algorithm is an example of Monte Carlo algorithm.
- The analysis of a Monte Carlo algorithm tries to bound the probability of errors, the probability that the output is incorrect.
What if both the running time and the correctness are random? Sometimes, we also call it Monte Carlo.
A Las Vegas algorithm can be transformed to a Monte Carlo algorithm with fixed time complexity by truncating: fix a time threshold and force terminating once the running time pass the threshold.
A Monte Carlo algorithm can be transformed to a Las Vegas algorithm by independent trials until the correct solution is returned. Note that this method relies on the efficiency of checking the correctness of a solution. The resulting Las Vegas algorithm is efficient only when the correctness can be efficiently verified.