Randomized Algorithms (Spring 2010)/Introduction

Randomized Quicksort

For an input set <math>S</math> with an arbitrary order, the Quicksort algorithm sorts <math>S</math> as described in the following pseudocode:

  • if <math>|S|>1</math> do:
    • pick an element <math>x</math> from <math>S</math> as the pivot;
    • partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all elements in <math>S_1</math> are smaller than <math>x</math> and all elements in <math>S_2</math> are larger than <math>x</math>;
    • recursively sort <math>S_1</math> and <math>S_2</math>;

Let us measure the time complexity of this sorting algorithm by the number of comparisons.

For the deterministic quicksort algorithm, the pivot element is chosen deterministically (usually the first one in the sequence <math>S</math>). This makes the worst-case time complexity <math>\Omega(n^2)</math>: there exists a bad input <math>S</math> for which sorting costs <math>\Omega(n^2)</math> comparisons, every time!
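
For instance, assuming the deterministic rule "always pick the first element as the pivot" (a standard example, not spelled out in the notes), an already sorted input is such a bad case: every partition step compares the pivot with all remaining elements and puts them all into <math>S_2</math>, so the total number of comparisons is

<math>(n-1)+(n-2)+\cdots+1=\frac{n(n-1)}{2}=\Omega(n^2).</math>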

It is just so unfair to have an unbeatable case for this brilliant algorithm. So we tweak the algorithm a little bit:

Algorithm: RandQSort

  • if <math>|S|>1</math> do:
    • uniformly pick a random element <math>x</math> from <math>S</math> as the pivot;
    • partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all elements in <math>S_1</math> are smaller than <math>x</math> and all elements in <math>S_2</math> are larger than <math>x</math>;
    • recursively sort <math>S_1</math> and <math>S_2</math>;
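
Below is a minimal Python sketch of RandQSort following the pseudocode above; the function name rand_qsort and the list-comprehension partition are illustrative choices, not part of the notes, and the elements are assumed distinct, as for the set <math>S</math>.

<pre>
import random

def rand_qsort(S):
    """Sort a list S of distinct elements by the RandQSort scheme."""
    if len(S) <= 1:
        return list(S)
    x = random.choice(S)                        # uniformly random pivot
    S1 = [a for a in S if a < x]                # elements smaller than the pivot
    S2 = [a for a in S if a > x]                # elements larger than the pivot
    return rand_qsort(S1) + [x] + rand_qsort(S2)

# Example: sort a small sequence of distinct numbers
print(rand_qsort([3, 1, 4, 5, 9, 2, 6, 8]))    # [1, 2, 3, 4, 5, 6, 8, 9]
</pre>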

Analysis of RandQSort

Our goal is to analyze the expected number of comparisons during an execution of RandQSort with an arbitrary input <math>S</math>. We achieve this by measuring the chance that each pair of elements is compared, and summing these probabilities up by Linearity of Expectation.

Let <math>a_i</math> denote the <math>i</math>th smallest element in <math>S</math>. Let <math>X_{ij}\in\{0,1\}</math> be the random variable which indicates whether <math>a_i</math> and <math>a_j</math> are compared during the execution of RandQSort. That is:

<math>\begin{align}
X_{ij} &=
\begin{cases}
1 & a_i\mbox{ and }a_j\mbox{ are compared}\\
0 & \mbox{otherwise}
\end{cases}.
\end{align}</math>

Elements <math>a_i</math> and <math>a_j</math> are compared only if one of them is chosen as the pivot. After the comparison they are separated (thus are never compared again). So we have the following claim:

Claim 1: Every pair <math>a_i</math> and <math>a_j</math> is compared at most once.

Therefore the sum of <math>X_{ij}</math> over all pairs <math>\{i,j\}</math> gives the total number of comparisons. The expected number of comparisons is <math>\mathbb{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right]</math>. Due to Linearity of Expectation, <math>\mathbb{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right] = \sum_{i=1}^n\sum_{j>i}\mathbb{E}\left[X_{ij}\right]</math>. Our next step is to analyze <math>\mathbb{E}\left[X_{ij}\right]</math> for each <math>\{i,j\}</math>.

By the definition of expectation and <math>X_{ij}</math>, it holds that

<math>\begin{align}
\mathbb{E}\left[X_{ij}\right]
&= 1\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are compared}] + 0\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are not compared}]\\
&= \Pr[a_i\mbox{ and }a_j\mbox{ are compared}].
\end{align}</math>

We are going to compute this probability.

Claim 2: <math>a_i</math> and <math>a_j</math> are compared if and only if one of them is chosen as a pivot when they are still in the same subset.

This is easy to verify: just check the algorithm. The next one is a bit complicated.

Claim 3: If <math>a_i</math> and <math>a_j</math> are still in the same subset, then all of <math>\{a_i,a_{i+1},\ldots,a_{j-1},a_{j}\}</math> are in the same subset.

We can verify this by induction. Initially, <math>S</math> itself has the property described above; and partitioning any <math>S</math> with the property into <math>S_1</math> and <math>S_2</math> will preserve the property for both <math>S_1</math> and <math>S_2</math>. Therefore Claim 3 holds.

Combining Claims 2 and 3, we have:

Claim 4: <math>a_i</math> and <math>a_j</math> are compared only if one of <math>\{a_i, a_j\}</math> is chosen from <math>\{a_i,a_{i+1},\ldots,a_{j-1},a_{j}\}</math>.

And clearly,

Claim 5: Every one of <math>\{a_i,a_{i+1},\ldots,a_{j-1},a_{j}\}</math> is chosen with the same probability.

This is because our RandQSort chooses the pivot uniformly at random.

Consider the first time a pivot is chosen from <math>\{a_i,a_{i+1},\ldots,a_{j-1},a_{j}\}</math>: by Claim 3, at that moment all <math>j-i+1</math> of these elements are still in the same subset, and by Claims 2 and 4, <math>a_i</math> and <math>a_j</math> are compared exactly when this pivot is one of <math>\{a_i,a_j\}</math>. Since each of the <math>j-i+1</math> candidates is equally likely to be that pivot (Claim 5), Claim 4 and Claim 5 together give:

<math>\begin{align}
\Pr[a_i\mbox{ and }a_j\mbox{ are compared}]
&= \frac{2}{j-i+1}.
\end{align}</math>
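
As an optional sanity check (my own addition, not part of the notes), the following Python sketch estimates <math>\Pr[a_i\mbox{ and }a_j\mbox{ are compared}]</math> empirically by running RandQSort many times and recording whether the pair ever takes part in a pivot comparison; the helper names and parameters are illustrative.

<pre>
import random

def pair_compared(values, i, j):
    """One run of RandQSort on distinct `values`; report whether the
    i-th and j-th smallest elements are ever compared (every comparison
    in RandQSort is between the current pivot and a member of its subset)."""
    ranked = sorted(values)
    target = {ranked[i - 1], ranked[j - 1]}
    hit = False

    def sort(S):
        nonlocal hit
        if len(S) <= 1:
            return S
        x = random.choice(S)                      # uniformly random pivot
        rest = [a for a in S if a != x]
        # the pivot x is compared with every other element of its subset
        if any({x, a} == target for a in rest):
            hit = True
        return sort([a for a in rest if a < x]) + [x] + sort([a for a in rest if a > x])

    sort(list(values))
    return hit

# Compare the empirical frequency with the predicted 2/(j-i+1)
n, i, j, trials = 12, 3, 8, 20000
values = list(range(1, n + 1))
freq = sum(pair_compared(values, i, j) for _ in range(trials)) / trials
print(f"empirical: {freq:.3f}   predicted: {2/(j-i+1):.3f}")
</pre>

With the parameters above the predicted probability is <math>2/6\approx 0.333</math>, and the empirical frequency should come out close to it.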