Randomized Algorithms (Spring 2014)/Random Recurrence



=Random Recurrence=
We consider the '''selection''' problem: given as input a set <math>S</math> of <math>n</math> distinct numbers and an integer <math>1\le k\le n</math>, find the <math>k</math>-th smallest number in <math>S</math>.
We know that this problem can be solved deterministically by the '''''median-of-medians''''' algorithm or randomly by the '''''LazySelect''''' algorithm, both in linear time. In the same spirit as random QuickSort, we propose the following random QuickSelection algorithm, called '''''RandomQS'''''.


{{Theorem|''RandomQS(<math>S</math>,<math>k</math>)'' |
:if <math>|S|=1</math> return the only element in <math>S</math>;
:pick a ''uniform random'' pivot <math>x\in S</math>;
:construct <math>S_1=\{y\in S\mid y\le x\}</math> and <math>S_2=\{y\in S\mid y> x\}</math>;
:if <math>|S_1|\ge k</math>: return ''RandomQS''<math>(S_1,k)</math>;
:if <math>|S_1|<k</math>: return ''RandomQS''<math>(S_2,k-|S_1|)</math>;
}}
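For concreteness, here is a minimal Python sketch of the pseudocode above (the function name <code>random_qs</code> and the use of Python lists are our own choices; the notes only give the pseudocode):
<pre>
import random

def random_qs(S, k):
    """Return the k-th smallest element of S, a list of distinct numbers, 1 <= k <= len(S)."""
    if len(S) == 1:
        return S[0]
    x = random.choice(S)                # uniform random pivot
    S1 = [y for y in S if y <= x]       # elements no larger than the pivot
    S2 = [y for y in S if y > x]        # elements larger than the pivot
    if len(S1) >= k:
        return random_qs(S1, k)         # the k-th smallest lies in S1
    return random_qs(S2, k - len(S1))   # otherwise it is the (k-|S1|)-th smallest in S2
</pre>
Note that when the pivot happens to be the maximum, the call recurses on the same set; this corresponds to the event <math>X(n)=0</math> in the recurrence below, and the recursion still terminates with probability 1.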
It can be verified as an exercise problem that the expected number of comparisons of '''''RandomQS''''' is <math>O(n)</math>. Alternatively, we ask the following question:
* How many recursive calls to RandomQS are made in expectation during a run of the algorithm?
Let <math>T(n)</math> denote the number of recursive calls in a run of RandomQS. Obviously <math>T(n)</math> is a random variable and it satisfies the following recurrence:
:<math>
T(n)
=
\begin{cases}
1+T(n-X(n)) & n>1\\
0 & n=1
\end{cases}
</math>
where <math>X(n)</math> is a random variable representing the number of elements in the input <math>S</math> which are thrown away in the current call. Specifically, <math>X(n)=|S_2|=n-|S_1|</math> when <math>|S_1|\ge k</math>, and <math>X(n)=|S_1|</math> when <math>|S_1|<k</math>. We are looking for an upper bound on the expectation <math>\mathbf{E}[T(n)]</math>.
==The token game ==
The above recursion can be seen as the following token game:
* Initially we have <math>n</math> tokens.
* In each round, we independently draw a random number <math>0\le X(m)\le m</math> which depends on the current number of tokens <math>m</math> and throw away <math>X(m)</math> tokens.
* The game terminates when there is no token left.
The number of rounds of the game is given by the random variable <math>T(n)</math>.
This procedure is quite general and is actually pretty common in randomized algorithms. The following are some examples; a short simulation sketch is given after them:
;Survival game for coins (with an application in ''Skip List'')
:Suppose we start with <math>n</math> biased coins, each of which shows HEAD with probability <math>p</math>. In each round, we independently flip every coin and remove those coins whose outcome is TAIL. We play the game until no coin is left. The number of rounds is given by <math>T(n)</math>, satisfying the recursion
::<math>
T(n)=
\begin{cases}
1+T(n-X(n)) & n>0\\
0 & n=0
\end{cases}
</math>
:where <math>X(n)</math> gives the number of TAILs, which follows the binomial distribution <math>B(n,1-p)</math>.
:This quantity has an application in the analysis of the ''[http://en.wikipedia.org/wiki/Skip_list Skip List]'', a randomized data structure. A Skip List is a multi-level linked list, in which we start with a standard linked list of <math>n</math> nodes, and each node in the list is lifted to the upper level independently with a fixed probability <math>p</math>. This lifting process is recursively applied until there is no node left. The number of levels of a Skip List is given by <math>T(n)</math>.
:[[File:Skiplist.png|border|450px]]
;Coupon collector
:Suppose a coupon collector collects <math>n</math> coupons in total by uniform independent trials. For each <math>0\le k\le n</math>, let <math>T(k)</math> be the number of trials needed when there are <math>k</math> coupons remaining to collect. <math>T(k)</math> satisfies the recursion
::<math>
T(k)
=
\begin{cases}
1+T(k-X(k)) & k>0\\
0 & k=0
\end{cases}
</math>
:where <math>X(k)</math> indicates whether the current trial obtains a new coupon when there are exactly <math>k</math> coupons remaining to collect; it satisfies
::<math>
X(k)
=
\begin{cases}
1 & \text{with probability }\frac{k}{n},\\
0 & \text{with probability }1-\frac{k}{n}.
\end{cases}
</math>
;Geometric distribution
:Any geometric random variable can be described as a <math>T(1)</math> defined as follows
::<math>T(1) =1 +T(1-X(1))</math> and <math>T(0)=0</math>
:where <math>X(1)\in\{0,1\}</math> is a Bernoulli random variable.
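The following Python sketch (our own illustration, not part of the original notes) simulates the token game with a pluggable rule for <math>X(m)</math>, instantiated with the coin survival game and the coupon collector from the examples above:
<pre>
import random

def token_game(n, draw_X):
    """Play the token game starting with n tokens; draw_X(m) returns the number of
    tokens thrown away in a round when m tokens remain. Returns the number of rounds T(n)."""
    rounds = 0
    while n > 0:
        n -= draw_X(n)
        rounds += 1
    return rounds

p = 0.5   # probability of HEAD in the coin survival game
# each of the m remaining coins is removed (TAIL) with probability 1-p, so X(m) ~ B(m, 1-p)
coin_rounds = token_game(1000, lambda m: sum(random.random() > p for _ in range(m)))

N = 100   # total number of coupon types
# a trial finds a new coupon with probability m/N when m coupons are still missing
coupon_trials = token_game(N, lambda m: 1 if random.random() < m / N else 0)

print(coin_rounds, coupon_trials)
</pre>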


== Karp-Upfal-Wigderson bound==
The following theorem was proved by Karp, Upfal, and Wigderson, and was later improved by Karp.
Today it is known as the Karp-Upfal-Wigderson bound.
{{Theorem
|Theorem (Karp-Upfal-Wigderson 1988)|
:Consider the following recurrence
::<math>
\begin{align}
T(n)
&=
\begin{cases}
1+T(n-X(n)) & n>c\\
0 & n=c
\end{cases}
\end{align}
</math>
:where <math>c</math> is a constant and each <math>X(n)</math> is an ''independent draw'' of a nonnegative integral random variable <math>X_n</math> such that <math>0\le X_n\le n-c</math>.


:If there exists a positive-valued non-decreasing function <math>\,\mu(x)</math> satisfying that <math>\mathbf{E}[X_n]\ge\mu(n)</math> for all <math>n</math>, then
::<math>
\mathbf{E}[T(n)]
\le
\int_{c}^n\frac{1}{\mu(x)}\,\mathrm{d} x.
</math>
}}
The theorem gives an upper bound on the expected number of rounds of the token game, provided we can nicely lower bound the expectation of each <math>X_n</math>.
The theorem itself is quite intuitive: for each <math>k</math>, <math>X_k</math> gives the rate at which the number of tokens drops from <math>k</math>. At this rate it takes <math>1/X_k</math> rounds to cross from <math>k</math> to <math>k-1</math> tokens. From the point of view of an interval <math>(x-\epsilon,x]</math>, we do not know which <math>X_k</math> we are using, but for any <math>k\ge x</math> the average rate is lower bounded by <math>\mu(k)\ge \mu(x)</math>. Integrating, it is not surprising that it takes at most <math>\int_{c}^n\frac{1}{\mu(x)}\,\mathrm{d}x</math> rounds on average to consume all but <math>c</math> tokens. The only issue in this informal argument is that we do not have <math>\mathbf{E}[1/X]=1/\mathbf{E}[X]</math>, so this is not a rigorous proof. But with a bit of care we can still manage to prove the theorem rigorously.
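For instance (a toy example of our own), if <math>X</math> equals <math>1</math> or <math>3</math> each with probability <math>1/2</math>, then
:<math>\mathbf{E}[1/X]=\tfrac{1}{2}\cdot 1+\tfrac{1}{2}\cdot\tfrac{1}{3}=\tfrac{2}{3}\neq\tfrac{1}{2}=1/\mathbf{E}[X],</math>
so the informal "rate" argument cannot be taken literally.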
{{Proof|  
We prove the theorem by induction on <math>n</math>.
The bound is trivially true when <math>n=c</math>. 
For <math>n>c</math>, assume the induction hypothesis for all smaller <math>n</math>. Denote <math>X=X(n)</math>. Due to the recursion <math>T(n)=1+T(n-X(n))</math>, we have
:<math>
\begin{align}
\mathbf{E}[T(n)]
&=
1+\mathbf{E}[T(n-X)]\\
&=
1+\mathbf{E}[T(n-X)\mid X=0]\Pr[X=0]+\mathbf{E}[T(n-X)\mid X> 0]\Pr[X>0].
\end{align}
</math>
Denote <math>p=\Pr[X=0]</math>. We have <math>\Pr[X>0]=1-p</math> because <math>X=X(n)</math> is nonnegative. Thus the above equation becomes
:<math>
\begin{align}
\mathbf{E}[T(n)]
&=
1+p\mathbf{E}[T(n)]+(1-p)\mathbf{E}[T(n-X)\mid X> 0],
\end{align}
</math>
which gives us that 
:<math>\mathbf{E}[T(n)]=\frac{1}{1-p}+\mathbf{E}[T(n-X)\mid X> 0]</math>.
For the conditional expectation, we have
:<math>
\begin{align}
&\mathbf{E}[T(n-X)\mid X> 0]\\
=
&\mathbf{E}[\,\mathbf{E}[T(n-X)\mid X]\mid X> 0]\\
\le
&\mathbf{E}\left[\int_{c}^{n-X}\frac{1}{\mu(x)}\,\mathrm{d}x \,\Big|\, X>0\right] & \text{(induction hypothesis)}\\
=
&\mathbf{E}\left[\int_{c}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x - \int_{n-X}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x\,\Big|\, X>0\right] \\
=
&\int_{c}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x-\mathbf{E}\left[\int_{n-X}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x\,\Big|\, X>0\right] \\
\le
&\int_{c}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x-\mathbf{E}\left[\int_{n-X}^{n}\frac{1}{\mu(n)}\,\mathrm{d}x\,\Big|\, X>0\right] &\text{(}\mu(x)\text{ is non-decreasing)}\\
=
&\int_{c}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x-\frac{1}{\mu(n)}\mathbf{E}[X\mid X>0].
\end{align}
</math>
On the other hand, by the law of total expectation, we have
:<math>
\mathbf{E}[X_n]
=\mathbf{E}[X_n\mid X_n=0]\Pr[X_n=0]+\mathbf{E}[X_n\mid X_n>0]\Pr[X_n>0]=(1-p)\mathbf{E}[X_n\mid X_n>0],
</math>
which gives us that <math>\mathbf{E}[X\mid X>0]=\frac{\mathbf{E}[X_n]}{1-p}\ge\frac{\mu(n)}{1-p}</math>. Substituting this in the above inequality gives us that
:<math>
\begin{align}
\mathbf{E}[T(n)]
&=
\frac{1}{1-p}+\mathbf{E}[T(n-X)\mid X> 0]\\
&\le 
\frac{1}{1-p}+\int_{c}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x-\frac{1}{\mu(n)}\cdot\frac{\mu(n)}{1-p}\\
&=
\int_{c}^{n}\frac{1}{\mu(x)}\,\mathrm{d}x.
\end{align}
</math>
}}
== Applications of Karp-Upfal-Wigderson bound==
==== Random QuickSelect ====
As shown above, the number <math>T(n)</math> of recursive calls in a running of RandomQS satisfies the recurrence:
:<math>
T(n)
=
\begin{cases}
1+T(n-X(n)) & n>1\\
0 & n=1
\end{cases}
</math>
where <math>X(n)</math> is a random variable representing the number of elements in the input <math>S</math> which are thrown away in the current call.
:<math>
X(n)
=\begin{cases}
n-|S_1| & \text{if }|S_1|\ge k\\
|S_1| & \text{if }|S_1|<k\\
\end{cases}
</math>
Recall that <math>S_1=\{y\in S\mid y\le x\}</math> is the set of elements in the current input <math>S</math> that are no larger than the pivot <math>x</math>, where <math>x</math> is chosen from <math>S</math> uniformly at random. It is easy to see that no matter what <math>k</math> is, <math>X(n)\ge \min(m,n-m)</math> where <math>m</math> is uniformly random over <math>\{0,1,\ldots, n-1\}</math>. An easy calculation gives us that <math>\textbf{E}[X(n)]\ge \frac{n}{4}</math>. By the Karp-Upfal-Wigderson bound, we have
:<math>
\mathbf{E}[T(n)]\le\int_{1}^n\frac{4}{x}\,\mathrm{d}x=4\ln n.
</math>
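As a sanity check (our own code with arbitrary parameters, not part of the notes), one can estimate <math>\mathbf{E}[T(n)]</math> empirically and compare it with the <math>4\ln n</math> bound:
<pre>
import math
import random

def qs_calls(S, k):
    """Count the recursive calls made by RandomQS on the list S of distinct numbers."""
    if len(S) == 1:
        return 0
    x = random.choice(S)
    S1 = [y for y in S if y <= x]
    S2 = [y for y in S if y > x]
    if len(S1) >= k:
        return 1 + qs_calls(S1, k)
    return 1 + qs_calls(S2, k - len(S1))

n, trials = 1000, 200
avg = sum(qs_calls(list(range(1, n + 1)), random.randint(1, n)) for _ in range(trials)) / trials
print(avg, "<=", 4 * math.log(n))   # the empirical average should stay below 4 ln n
</pre>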
==== Skip List====
As discussed above, the number of levels of a Skip List for <math>n</math> elements is given by <math>T(n)</math> satisfying that
:<math>
T(n)=
\begin{cases}
1+T(n-X(n)) & n>0\\
0 & n=0
\end{cases}
</math>
where <math>X(n)</math> is the number of TAILs when <math>n</math> biased coins are flipped independently, each of which turns up HEAD with probability <math>p</math>. Clearly <math>X(n)</math> follows the binomial distribution <math>B(n,1-p)</math>, and its expectation is <math>\mathbf{E}[X(n)]=(1-p)n</math>.
By the Karp-Upfal-Wigderson bound, we have
:<math>
\mathbf{E}[T(n)]\le\int_{0}^n\frac{1}{(1-p)x}\,\mathrm{d}x=\frac{1}{1-p}(\ln n-\ln 0).
</math>
Here the <math>\ln 0</math> will cause a problem. We can fix the problem by observing that for integral <math>n</math>, the expectation <math>\mathbf{E}[X(n)]</math> is actually
:<math>
\mathbf{E}[X(n)]=(1-p)n=(1-p)\lceil n\rceil.
</math>
Thus the real Karp-Upfal-Wigderson bound for this process is
:<math>
\mathbf{E}[T(n)]\le\int_{0}^n\frac{1}{(1-p)\lceil x\rceil}\,\mathrm{d}x=\frac{1}{1-p}\sum_{k=1}^n\frac{1}{k}=\frac{H_n}{1-p},
</math>
where <math>H_n</math> is the [http://en.wikipedia.org/wiki/Harmonic_number harmonic number].
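A quick empirical check of this bound (our own code, with the arbitrary parameters <math>n=1000</math> and <math>p=1/2</math>):
<pre>
import random

def skip_list_levels(n, p):
    """Number of levels of a Skip List built on n nodes, where each node is
    lifted to the next level independently with probability p."""
    levels = 0
    while n > 0:
        n = sum(random.random() < p for _ in range(n))   # nodes surviving to the next level
        levels += 1
    return levels

n, p = 1000, 0.5
avg = sum(skip_list_levels(n, p) for _ in range(500)) / 500
H_n = sum(1.0 / k for k in range(1, n + 1))
print(avg, "<=", H_n / (1 - p))
</pre>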
==== Coupon collector ====
We have represented the number of trials in a coupon collector game with <math>n</math> coupons as <math>T(n)</math> defined by the following recursion
:<math>
T(k)
=
\begin{cases}
1+T(k-X(k)) & k>0\\
0 & k=0
\end{cases}
</math>
where <math>X(k)</math> indicates whether the current trial obtains a new coupon when there are exactly <math>k</math> coupons remaining to collect; it satisfies
:<math>
X(k)
=
\begin{cases}
1 & \text{with probability }\frac{k}{n},\\
0 & \text{with probability }1-\frac{k}{n}.
\end{cases}
</math>
Thus for integral <math>k</math>, <math>\mathbf{E}[X(k)]=\frac{k}{n}=\frac{\lceil k\rceil}{n}</math>. By the Karp-Upfal-Wigderson bound, we have
:<math>
\mathbf{E}[T(n)]\le\int_{0}^n\frac{n}{\lceil x\rceil}\,\mathrm{d}x=n\sum_{k=1}^n\frac{1}{k}=n H_n.
</math>
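For the coupon collector the bound is in fact tight: <math>nH_n</math> is exactly the expected number of trials. A small empirical check (our own code):
<pre>
import random

def coupon_trials(n):
    """Number of uniform independent trials until all n coupon types have been collected."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        trials += 1
    return trials

n = 100
avg = sum(coupon_trials(n) for _ in range(1000)) / 1000
print(avg, "~", n * sum(1.0 / k for k in range(1, n + 1)))   # compare with n * H_n
</pre>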
==== Geometric distribution ====
We know that a geometric random variable can be described as a <math>T(1)</math> defined as follows
::<math>T(1) =1 +T(1-X(1))</math> and <math>T(0)=0</math>
where <math>X(1)\in\{0,1\}</math> is a Bernoulli random variable being 1 with probability <math>p</math>. Then <math>\mathbf{E}[X(1)]=p</math>. By the Karp-Upfal-Wigderson bound, we have
:<math>
\mathbf{E}[T(1)]\le\int_{0}^1\frac{1}{p}\,\mathrm{d}x=\frac{1}{p}.
</math>


=Random Quicksort=

Given as input a set <math>S</math> of <math>n</math> numbers, we want to sort the numbers in <math>S</math> in increasing order. One of the most famous algorithms for this problem is the Quicksort algorithm.

*if <math>|S|>1</math> do:
**pick an <math>x\in S</math> as the pivot;
**partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all numbers in <math>S_1</math> are smaller than <math>x</math> and all numbers in <math>S_2</math> are larger than <math>x</math>;
**recursively sort <math>S_1</math> and <math>S_2</math>;

The time complexity of this sorting algorithm is measured by the number of comparisons.

For the deterministic quicksort algorithm, the pivot is picked from a fixed position (e.g. the first number in the array). The worst-case time complexity in terms of number of comparisons is <math>\Theta(n^2)</math>.

We consider the following randomized version of the quicksort.

*if <math>|S|>1</math> do:
**uniformly pick a random <math>x\in S</math> as the pivot;
**partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all numbers in <math>S_1</math> are smaller than <math>x</math> and all numbers in <math>S_2</math> are larger than <math>x</math>;
**recursively sort <math>S_1</math> and <math>S_2</math>;
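A direct Python sketch of RandQSort that also counts comparisons (the counting instrumentation is our own; only the algorithm itself comes from the notes):
<pre>
import random

def rand_qsort(S, counter):
    """Randomized Quicksort on a list of distinct numbers; counter[0] accumulates comparisons."""
    if len(S) <= 1:
        return list(S)
    x = random.choice(S)         # uniform random pivot
    counter[0] += len(S) - 1     # every other element is compared with the pivot once
    S1 = [y for y in S if y < x]
    S2 = [y for y in S if y > x]
    return rand_qsort(S1, counter) + [x] + rand_qsort(S2, counter)

cnt = [0]
print(rand_qsort(list(range(1000)), cnt) == list(range(1000)), cnt[0])
</pre>
Running it a few times typically gives counts close to <math>2n\ln n</math>, which matches the analysis below.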

==Analysis of Random Quicksort==

Our goal is to analyze the expected number of comparisons during an execution of RandQSort with an arbitrary input <math>S</math>. We achieve this by measuring the chance that each pair of elements is compared, and summing them all up due to Linearity of Expectation.

Let <math>a_i</math> denote the <math>i</math>th smallest element in <math>S</math>. Let <math>X_{ij}\in\{0,1\}</math> be the random variable which indicates whether <math>a_i</math> and <math>a_j</math> are compared during the execution of RandQSort. That is:

:<math>
\begin{align}
X_{ij} &=
\begin{cases}
1 & a_i\mbox{ and }a_j\mbox{ are compared}\\
0 & \mbox{otherwise}
\end{cases}.
\end{align}
</math>

Elements <math>a_i</math> and <math>a_j</math> are compared only if one of them is chosen as pivot. After comparison they are separated (thus are never compared again). So we have the following observations:

Observation 1: Every pair <math>a_i</math> and <math>a_j</math> is compared at most once.

Therefore the sum of <math>X_{ij}</math> over all pairs <math>\{i, j\}</math> gives the total number of comparisons. The expected number of comparisons is <math>\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right]</math>. Due to Linearity of Expectation, <math>\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right] = \sum_{i=1}^n\sum_{j>i}\mathbf{E}\left[X_{ij}\right]</math>. Our next step is to analyze <math>\mathbf{E}\left[X_{ij}\right]</math> for each pair <math>\{i, j\}</math>.

By the definition of expectation and <math>X_{ij}</math>,
:<math>
\begin{align}
\mathbf{E}\left[X_{ij}\right]
&= 1\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are compared}] + 0\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are not compared}]\\
&= \Pr[a_i\mbox{ and }a_j\mbox{ are compared}].
\end{align}
</math>

We are going to bound this probability.

Observation 2: <math>a_i</math> and <math>a_j</math> are compared if and only if one of them is chosen as pivot when they are still in the same subset.

This is easy to verify: just check the algorithm. The next one is a bit complicated.

Observation 3: If <math>a_i</math> and <math>a_j</math> are still in the same subset then all of <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> are in the same subset.

We can verify this by induction. Initially, <math>S</math> itself has the property described above; and partitioning any <math>S</math> with the property into <math>S_1</math> and <math>S_2</math> will preserve the property for both <math>S_1</math> and <math>S_2</math>. Therefore Observation 3 holds.

Combining Observations 2 and 3, we have:

Observation 4: <math>a_i</math> and <math>a_j</math> are compared only if one of <math>\{a_i, a_j\}</math> is chosen from <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>.

And,

Observation 5: Every one of <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is chosen as the pivot with equal probability.

This is because the Random Quicksort chooses the pivot uniformly at random.

Observations 4 and 5 together imply:

:<math>
\begin{align}
\Pr[a_i\mbox{ and }a_j\mbox{ are compared}]
&\le \frac{2}{j-i+1}.
\end{align}
</math>
Remark: Perhaps you feel confused about the above argument. You may ask: "The algorithm chooses pivots many times during the execution. Why does the argument above look as if the pivot is chosen only once?" Good question! Let's see what really happens by looking closely.

For any pair <math>a_i</math> and <math>a_j</math>, initially <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> are all in the same set <math>S</math> (obviously!). During the execution of the algorithm, the set containing <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> keeps shrinking (due to the pivoting), until one of <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is chosen and the set is partitioned into different subsets. We ask for the probability that the chosen one is among <math>\{a_i, a_j\}</math>. So we really care about "the last" pivoting before <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is split.

Formally, let <math>Y</math> be the random variable denoting the pivot element. We know that for each <math>a_k\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>, <math>Y=a_k</math> with the same probability, and <math>Y\not\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> with an unknown probability (remember that there might be other elements in the same subset with <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>). The probability we are looking for is actually <math>\Pr[Y\in \{a_i, a_j\}\mid Y\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}]</math>, which is always <math>\frac{2}{j-i+1}</math>, provided that <math>Y</math> is uniform over <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>.

The conditional probability rules out the irrelevant events in a probabilistic argument.

Summing all up:
:<math>
\begin{align}
\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right]
&= \sum_{i=1}^n\sum_{j>i}\mathbf{E}\left[X_{ij}\right]\\
&\le \sum_{i=1}^n\sum_{j>i}\frac{2}{j-i+1}\\
&= \sum_{i=1}^n\sum_{k=2}^{n-i+1}\frac{2}{k} & & (\mbox{Let }k=j-i+1)\\
&\le \sum_{i=1}^n\sum_{k=1}^{n}\frac{2}{k}\\
&= 2n\sum_{k=1}^{n}\frac{1}{k}\\
&= 2n H(n).
\end{align}
</math>
<math>H(n)</math> is the <math>n</math>th Harmonic number. It holds that
:<math>H(n) = \ln n+O(1)</math>.
Therefore, for an arbitrary input <math>S</math> of <math>n</math> numbers, the expected number of comparisons taken by RandQSort to sort <math>S</math> is <math>O(n\log n)</math>.
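To see how the <math>2nH(n)</math> bound compares with the exact pairwise sum, here is a short computation (ours, for illustration):
<pre>
n = 1000
# the pairwise bound  sum over i<j of 2/(j-i+1)  from the analysis above
pair_sum = sum(2.0 / (j - i + 1) for i in range(1, n + 1) for j in range(i + 1, n + 1))
two_n_H = 2 * n * sum(1.0 / k for k in range(1, n + 1))
print(pair_sum, "<=", two_n_H)   # the right-hand side is 2nH(n), about 2n ln n
</pre>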
