imported>Etone |
imported>Etone |
Line 1: |
Line 1: |
| =Random Quicksort= | | == Problem 1 == |
| Given as input a set <math>S</math> of <math>n</math> numbers, we want to sort the numbers in <math>S</math> in increasing order. One of the most famous algorithm for this problem is the [http://en.wikipedia.org/wiki/Quicksort Quicksort] algorithm.
| | (Due to D.R. Karger and R. Motwani.) |
| * if <math>|S|>1</math> do:
| |
| ** pick an <math>x\in S</math> as the ''pivot'';
| |
| ** partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all numbers in <math>S_1</math> are smaller than <math>x</math> and all numbers in <math>S_2</math> are larger than <math>x</math>;
| |
| ** recursively sort <math>S_1</math> and <math>S_2</math>;
| |
|
| |
|
| The time complexity of this sorting algorithm is measured by the '''number of comparisons'''.
| | # Let <math>S,T</math> be two disjoint subsets of a universe <math>U</math> such that <math>|S|=|T|=n</math>. Suppose we select a random set <math>R\subseteq U</math> by independently sampling each element of <math>U</math> with probability <math>p</math>. We say that the random sample <math>R</math> is <i>good</i> if the following two conditions hold: <math>R\cap S=\emptyset</math> and <math>R\cap T\ne\emptyset</math>. Show that for <math>p=1/n</math>, the probability that <math>R</math> is good is larger than some positive constant. |
| | # Suppose now that the random set <math>R</math> is chosen by sampling the elements of <math>U</math> with only <i>pairwise</i> independence. Show that for a suitable choice of the value of <math>p</math>, the probability that <math>R</math> is good is larger than some positive constant. |
|
| |
|
| For the '''deterministic''' quicksort algorithm, the pivot is picked from a fixed position (e.g. the first number in the array). The worst-case time complexity in terms of number of comparisons is <math>\Theta(n^2)</math>.
| | == Problem 2 == |
|
| |
|
| We consider the following randomized version of the quicksort.
| | Let <math>X_1,X_2,\dots,X_n</math> be independent geometrically distributed random variables each having expectation 2 (each of the <math>X_i</math> is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let <math>X=\sum_{i=1}^nX_i</math> and <math>\delta</math> be a positive real constant. Derive the best upper bound you can on <math>\Pr[X>(1+\delta)(2n)]</math>. |
| * if <math>|S|>1</math> do:
| |
| ** ''uniformly'' pick a ''random'' <math>x\in S</math> as the pivot;
| |
| ** partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all numbers in <math>S_1</math> are smaller than <math>x</math> and all numbers in <math>S_2</math> are larger than <math>x</math>;
| |
| ** recursively sort <math>S_1</math> and <math>S_2</math>;
| |
|
| |
|
| == Analysis of Random Quicksort== | | ==Problem 3== |
| Our goal is to analyze the expected number of comparisons during an execution of RandQSort with an arbitrary input <math>S</math>. We achieve this by measuring the chance that each pair of elements are compared, and summing all of them up due to [http://en.wikipedia.org/wiki/Expected_value#Linearity Linearity of Expectation].
| | A '''boolean code''' is a mapping <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math>. Each <math>x\in\{0,1\}^k</math> is called a '''message''' and <math>y=C(x)</math> is called a '''codeword'''. The '''code rate''' <math>r</math> of a code <math>C</math> is <math>r=\frac{k}{n}</math>. A boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> is a '''linear code''' if it is a linear transformation, i.e. there is a matrix <math>A\in\{0,1\}^{n\times k}</math> such that <math>C(x)=Ax</math> for any <math>x\in\{0,1\}^k</math>, where the additions and multiplications are defined over the finite field of order two, <math>(\{0,1\},+_{\bmod 2},\times_{\bmod 2})</math>. |
|
| |
|
| Let <math>a_i</math> denote the <math>i</math>th smallest element in <math>S</math>.
| | The '''distance''' between two codeword <math>y_1</math> and <math>y_2</math>, denoted by <math>d(y_1,y_2)</math>, is defined as the Hamming distance between them. Formally, <math>d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|</math>. The distance of a code <math>C</math> is the minimum distance between any two codewords. Formally, <math>d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))</math>. |
| Let <math>X_{ij}\in\{0,1\}</math> be the random variable which indicates whether <math>a_i</math> and <math>a_j</math> are compared during the execution of RandQSort. That is:
| |
|
| |
|
| :<math>
| | Usually we want to make both the code rate <math>r</math> and the code distance <math>d</math> as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection. |
| \begin{align}
| |
| X_{ij} &=
| |
| \begin{cases}
| |
| 1 & a_i\mbox{ and }a_j\mbox{ are compared}\\
| |
| 0 & \mbox{otherwise}
| |
| \end{cases}.
| |
| \end{align}
| |
| </math> | |
|
| |
|
| Elements <math>a_i</math> and <math>a_j</math> are compared only if one of them is chosen as pivot. After comparison they are separated (thus are never compared again). So we have the following observations:
| | * Use the probabilistic method to prove that there exists a boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> of code rate <math>r</math> and distance <math>\left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n</math>. Try to optimize the constant in <math>\Theta(\cdot)</math>. |
| | | * Prove a similar result for linear boolean codes. |
| '''Observation 1: Every pair of <math>a_i</math> and <math>a_j</math> are compared at most once.'''
| |
| | |
| Therefore the sum of <math>X_{ij}</math> for all pair <math>\{i, j\}</math> gives the total number of comparisons. The expected number of comparisons is <math>\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right]</math>. Due to [http://en.wikipedia.org/wiki/Expected_value#Linearity Linearity of Expectation], <math>\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right] = \sum_{i=1}^n\sum_{j>i}\mathbf{E}\left[X_{ij}\right]</math>.
| |
| Our next step is to analyze <math>\mathbf{E}\left[X_{ij}\right]</math> for each <math>\{i, j\}</math>.
| |
| | |
| By the definition of expectation and <math>X_{ij}</math>,
| |
| | |
| :<math>\begin{align} | |
| \mathbf{E}\left[X_{ij}\right]
| |
| &= 1\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are compared}] + 0\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are not compared}]\\
| |
| &= \Pr[a_i\mbox{ and }a_j\mbox{ are compared}].
| |
| \end{align}</math>
| |
| | |
| We are going to bound this probability.
| |
| | |
| '''Observation 2: <math>a_i</math> and <math>a_j</math> are compared if and only if one of them is chosen as pivot when they are still in the same subset.'''
| |
| | |
| This is easy to verify: just check the algorithm. The next one is a bit complicated.
| |
| | |
| '''Observation 3: If <math>a_i</math> and <math>a_j</math> are still in the same subset then all <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> are in the same subset.'''
| |
| | |
| We can verify this by induction. Initially, <math>S</math> itself has the property described above; and partitioning any <math>S</math> with the property into <math>S_1</math> and <math>S_2</math> will preserve the property for both <math>S_1</math> and <math>S_2</math>. Therefore Claim 3 holds.
| |
| | |
| Combining Observation 2 and 3, we have:
| |
| | |
| '''Observation 4: <math>a_i</math> and <math>a_j</math> are compared only if one of <math>\{a_i, a_j\}</math> is chosen from <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>.'''
| |
| | |
| And,
| |
| | |
| '''Observation 5: Every one of <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is chosen equal-probably.'''
| |
| | |
| This is because the Random Quicksort chooses the pivot ''uniformly at random''.
| |
| | |
| Observation 4 and 5 together imply:
| |
| | |
| :<math>\begin{align}
| |
| \Pr[a_i\mbox{ and }a_j\mbox{ are compared}]
| |
| &\le \frac{2}{j-i+1}.
| |
| \end{align}</math>
| |
| | |
| {|border="1"
| |
| |'''Remark:''' Perhaps you feel confused about the above argument. You may ask: "''The algorithm chooses pivots for many times during the execution. Why in the above argument, it looks like the pivot is chosen only once?''" Good question! Let's see what really happens by looking closely.
| |
| | |
| For any pair <math>a_i</math> and <math>a_j</math>, initially <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> are all in the same set <math>S</math> (obviously!). During the execution of the algorithm, the set which containing <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> are shrinking (due to the pivoting), until one of <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is chosen, and the set is partitioned into different subsets. We ask for the probability that the chosen one is among <math>\{a_i, a_j\}</math>. So we really care about "the last" pivoting before <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is split.
| |
| | |
| Formally, let <math>Y</math> be the random variable denoting the pivot element. We know that for each <math>a_k\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>, <math>Y=a_k</math> with the same probability, and <math>Y\not\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> with an unknown probability (remember that there might be other elements in the same subset with <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>). The probability we are looking for is actually
| |
| <math>\Pr[Y\in \{a_i, a_j\}\mid Y\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}]</math>, which is always <math>\frac{2}{j-i+1}</math>, provided that <math>Y</math> is uniform over <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>.
| |
| | |
| The '''conditional probability''' rules out the ''irrelevant'' events in a probabilistic argument.
| |
| |}
| |
| | |
| Summing all up:
| |
| | |
| :<math>\begin{align}
| |
| \mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right]
| |
| &=
| |
| \sum_{i=1}^n\sum_{j>i}\mathbf{E}\left[X_{ij}\right]\\
| |
| &\le \sum_{i=1}^n\sum_{j>i}\frac{2}{j-i+1}\\
| |
| &= \sum_{i=1}^n\sum_{k=2}^{n-i+1}\frac{2}{k} & & (\mbox{Let }k=j-i+1)\\
| |
| &\le \sum_{i=1}^n\sum_{k=1}^{n}\frac{2}{k}\\
| |
| &= 2n\sum_{k=1}^{n}\frac{1}{k}\\
| |
| &= 2n H(n).
| |
| \end{align}</math>
| |
| | |
| <math>H(n)</math> is the <math>n</math>th [http://en.wikipedia.org/wiki/Harmonic_number Harmonic number]. It holds that
| |
| | |
| :<math>\begin{align}H(n) = \ln n+O(1)\end{align}</math>.
| |
| | |
| Therefore, for an arbitrary input <math>S</math> of <math>n</math> numbers, the expected number of comparisons taken by RandQSort to sort <math>S</math> is <math>\mathrm{O}(n\log n)</math>.
| |
| | |
| =Markov's Inequality=
| |
| | |
| One of the most natural information about a random variable is its expectation, which is the first moment of the random variable. Markov's inequality draws a tail bound for a random variable from its expectation.
| |
| {{Theorem
| |
| |Theorem (Markov's Inequality)|
| |
| :Let <math>X</math> be a random variable assuming only nonnegative values. Then, for all <math>t>0</math>,
| |
| ::<math>\begin{align}
| |
| \Pr[X\ge t]\le \frac{\mathbf{E}[X]}{t}.
| |
| \end{align}</math>
| |
| }}
| |
| {{Proof| Let <math>Y</math> be the indicator such that
| |
| :<math>\begin{align}
| |
| Y &=
| |
| \begin{cases}
| |
| 1 & \mbox{if }X\ge t,\\
| |
| 0 & \mbox{otherwise.}
| |
| \end{cases}
| |
| \end{align}</math>
| |
| | |
| It holds that <math>Y\le\frac{X}{t}</math>. Since <math>Y</math> is 0-1 valued, <math>\mathbf{E}[Y]=\Pr[Y=1]=\Pr[X\ge t]</math>. Therefore,
| |
| :<math>
| |
| \Pr[X\ge t]
| |
| =
| |
| \mathbf{E}[Y]
| |
| \le
| |
| \mathbf{E}\left[\frac{X}{t}\right]
| |
| =\frac{\mathbf{E}[X]}{t}.
| |
| </math>
| |
| }}
| |
| | |
| ==Example (from Las Vegas to Monte Carlo)==
| |
| Let <math>A</math> be a Las Vegas randomized algorithm for a decision problem <math>f</math>, whose expected running time is within <math>T(n)</math> on any input of size <math>n</math>. We transform <math>A</math> to a Monte Carlo randomized algorithm <math>B</math> with bounded one-sided error as follows:
| |
| :<math>B(x)</math>:
| |
| :*Run <math>A(x)</math> for <math>2T(n)</math> long where <math>n</math> is the size of <math>x</math>.
| |
| :*If <math>A(x)</math> returned within <math>2T(n)</math> time, then return what <math>A(x)</math> just returned, else return 1.
| |
| | |
| Since <math>A</math> is Las Vegas, its output is always correct, thus <math>B(x)</math> only errs when it returns 1, thus the error is one-sided. The error probability is bounded by the probability that <math>A(x)</math> runs longer than <math>2T(n)</math>. Since the expected running time of <math>A(x)</math> is at most <math>T(n)</math>, due to Markov's inequality,
| |
| :<math>
| |
| \Pr[\mbox{the running time of }A(x)\ge2T(n)]\le\frac{\mathbf{E}[\mbox{running time of }A(x)]}{2T(n)}\le\frac{1}{2},
| |
| </math> | |
| thus the error probability is bounded.
| |
| | |
| ==Generalization ==
| |
| For any random variable <math>X</math>, for an arbitrary non-negative real function <math>h</math>, the <math>h(X)</math> is a non-negative random variable. Applying Markov's inequality, we directly have that
| |
| :<math>
| |
| \Pr[h(X)\ge t]\le\frac{\mathbf{E}[h(X)]}{t}.
| |
| </math>
| |
| | |
| This trivial application of Markov's inequality gives us a powerful tool for proving tail inequalities. With the function <math>h</math> which extracts more information about the random variable, we can prove sharper tail inequalities.
| |
Problem 1
(Due to D.R. Karger and R. Motwani.)
- Let [math]\displaystyle{ S,T }[/math] be two disjoint subsets of a universe [math]\displaystyle{ U }[/math] such that [math]\displaystyle{ |S|=|T|=n }[/math]. Suppose we select a random set [math]\displaystyle{ R\subseteq U }[/math] by independently sampling each element of [math]\displaystyle{ U }[/math] with probability [math]\displaystyle{ p }[/math]. We say that the random sample [math]\displaystyle{ R }[/math] is good if the following two conditions hold: [math]\displaystyle{ R\cap S=\emptyset }[/math] and [math]\displaystyle{ R\cap T\ne\emptyset }[/math]. Show that for [math]\displaystyle{ p=1/n }[/math], the probability that [math]\displaystyle{ R }[/math] is good is larger than some positive constant.
- Suppose now that the random set [math]\displaystyle{ R }[/math] is chosen by sampling the elements of [math]\displaystyle{ U }[/math] with only pairwise independence. Show that for a suitable choice of the value of [math]\displaystyle{ p }[/math], the probability that [math]\displaystyle{ R }[/math] is good is larger than some positive constant.
Problem 2
Let [math]\displaystyle{ X_1,X_2,\dots,X_n }[/math] be independent geometrically distributed random variables each having expectation 2 (each of the [math]\displaystyle{ X_i }[/math] is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] and [math]\displaystyle{ \delta }[/math] be a positive real constant. Derive the best upper bound you can on [math]\displaystyle{ \Pr[X\gt (1+\delta)(2n)] }[/math].
Problem 3
A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].
The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].
Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.
- Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
- Prove a similar result for linear boolean codes.