Randomized Algorithms (Spring 2010)/Tail inequalities


Select the Median

The selection problem is the problem of finding the [math]\displaystyle{ k }[/math]th smallest element in a set [math]\displaystyle{ S }[/math]. A typical case of the selection problem is finding the median, i.e., the [math]\displaystyle{ (\lceil n/2\rceil) }[/math]th element in the sorted order of [math]\displaystyle{ S }[/math].

The median can be found in [math]\displaystyle{ O(n\log n) }[/math] time by sorting. There is also a linear-time deterministic algorithm, the "median of medians" algorithm, but it is rather sophisticated. Here we introduce a much simpler randomized algorithm which also runs in linear time. The idea of the algorithm is random sampling.

Randomized median algorithm
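A minimal Python sketch of this random-sampling approach, in the style of the classical LazySelect algorithm; the function name lazy_select, the sample size of about [math]\displaystyle{ n^{3/4} }[/math], the [math]\displaystyle{ \sqrt{n} }[/math] slack around the target rank, and the success check are the usual textbook parameter choices, assumed here rather than taken from these notes:

import math
import random

def lazy_select(S, k):
    # Return the k-th smallest element of S (1-indexed) by random
    # sampling, in the style of the LazySelect algorithm.
    n = len(S)
    while True:
        r = max(1, int(n ** 0.75))              # sample about n^{3/4} elements
        R = sorted(random.choices(S, k=r))      # sample with replacement, sort
        x = k * r / n                           # k's rank rescaled to the sample
        lo = max(int(x - math.sqrt(n)), 0)      # lower bracket (0-indexed)
        hi = min(int(x + math.sqrt(n)), r - 1)  # upper bracket (0-indexed)
        a, b = R[lo], R[hi]
        # Keep only the elements between the two sampled quantiles, and
        # count how many elements of S fall strictly below the lower one.
        P = [y for y in S if a <= y <= b]
        below = sum(1 for y in S if y < a)
        # With high probability the k-th smallest lies in P and P is small;
        # otherwise resample (a rare event, so the expected time stays linear).
        if below < k <= below + len(P) and len(P) <= 4 * r + 2:
            P.sort()
            return P[k - below - 1]

# Example: select the median.
data = [random.random() for _ in range(100001)]
assert lazy_select(data, 50001) == sorted(data)[50000]

The work outside the success check is dominated by sorting the sample and scanning [math]\displaystyle{ S }[/math], which takes linear time since the sample has size about [math]\displaystyle{ n^{3/4} }[/math].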

Analysis

Chernoff Bound

Suppose that we have a fair coin. If we toss it once, the outcome is completely unpredictable. But if we toss it, say, 1000 times, the outcome is very much predictable: the number of HEADs is very likely to be around 500. This striking phenomenon is called concentration. The Chernoff bound captures the concentration of independent trials.
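As an illustration (a quick simulation sketch, not part of the analysis), one can empirically estimate how often 1000 tosses stray far from 500:

import random

# Toss a fair coin n times and count HEADs.
def heads(n=1000):
    return sum(random.randint(0, 1) for _ in range(n))

trials = 10000
far = sum(1 for _ in range(trials) if abs(heads() - 500) > 50)
print(f"fraction of trials with |#HEADs - 500| > 50: {far / trials:.4f}")
# Typically prints something around 0.001--0.002: a deviation of even
# 10% from the mean is already quite rare.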

The Chernoff bound is also a tail bound for the sum of independent random variables, and it gives exponentially small bounds on the tail probabilities.

Before proving the Chernoff bound, we should first talk about moment generating functions.

Moment generating functions

The more we know about the moments of a random variable [math]\displaystyle{ X }[/math], the more information we have about [math]\displaystyle{ X }[/math]. There is a so-called moment generating function, which "packs" all of the moments of [math]\displaystyle{ X }[/math] into a single function.

Definition:
The moment generating function of a random variable [math]\displaystyle{ X }[/math] is defined as [math]\displaystyle{ \mathbf{E}\left[\mathrm{e}^{\lambda X}\right] }[/math] where [math]\displaystyle{ \lambda\gt 0 }[/math] is the parameter of the function.

By Taylor's expansion and the linearity of expectations,

[math]\displaystyle{ \begin{align} \mathbf{E}\left[\mathrm{e}^{\lambda X}\right] &= \mathbf{E}\left[\sum_{k=0}^\infty\frac{\lambda^k}{k!}X^k\right]\\ &=\sum_{k=0}^\infty\frac{\lambda^k}{k!}\mathbf{E}\left[X^k\right] \end{align} }[/math]

The moment generating function [math]\displaystyle{ \mathbf{E}\left[\mathrm{e}^{\lambda X}\right] }[/math] is a function of [math]\displaystyle{ \lambda }[/math].
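For example, for a single 0-1 random variable [math]\displaystyle{ X }[/math] with [math]\displaystyle{ \Pr[X=1]=p }[/math] and [math]\displaystyle{ \Pr[X=0]=1-p }[/math],

[math]\displaystyle{ \mathbf{E}\left[\mathrm{e}^{\lambda X}\right]=p\cdot\mathrm{e}^{\lambda}+(1-p)=1+p(\mathrm{e}^{\lambda}-1). }[/math]

This simple computation is exactly what is needed in the proof of the Chernoff bound below.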

The Chernoff bound

The Chernoff bound is obtained by applying Markov's inequality to the moment generating function of the sum of independent trials, with an appropriate choice of the parameter [math]\displaystyle{ \lambda }[/math].

Chernoff bound (the upper tail):
Let [math]\displaystyle{ X=\sum_{i=1}^n X_i }[/math], where [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are independent Poisson trials, i.e., each [math]\displaystyle{ X_i }[/math] independently takes value 1 with probability [math]\displaystyle{ p_i }[/math] and value 0 otherwise. Let [math]\displaystyle{ \mu=\mathbf{E}[X] }[/math].
Then for any [math]\displaystyle{ \delta\gt 0 }[/math],
[math]\displaystyle{ \Pr[X\ge (1+\delta)\mu]\lt \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu}. }[/math]

Proof: For any [math]\displaystyle{ \lambda\gt 0 }[/math], the event [math]\displaystyle{ X\ge t }[/math] is equivalent to the event [math]\displaystyle{ e^{\lambda X}\ge e^{\lambda t} }[/math], since [math]\displaystyle{ x\mapsto e^{\lambda x} }[/math] is strictly increasing. Thus

[math]\displaystyle{ \begin{align} \Pr[X\ge t] &= \Pr\left[e^{\lambda X}\ge e^{\lambda t}\right]\\ &\le \frac{\mathbf{E}\left[e^{\lambda X}\right]}{e^{\lambda t}}, \end{align} }[/math]

where the last step follows by Markov's inequality.

Computing the moment generating function [math]\displaystyle{ \mathbf{E}[e^{\lambda X}] }[/math]:

[math]\displaystyle{ \begin{align} \mathbf{E}\left[e^{\lambda X}\right] &= \mathbf{E}\left[e^{\lambda \sum_{i=1}^n X_i}\right]\\ &= \mathbf{E}\left[\prod_{i=1}^n e^{\lambda X_i}\right]. \end{align} }[/math]

For independent random variables, the expectation of the product equals the product of the expectations; therefore, for the last term above, we have

[math]\displaystyle{ \begin{align} \mathbf{E}\left[\prod_{i=1}^n e^{\lambda X_i}\right] &= \prod_{i=1}^n \mathbf{E}\left[e^{\lambda X_i}\right]. \end{align} }[/math]

Let [math]\displaystyle{ p_i=\Pr[X_i=1] }[/math]. By the computation of the moment generating function of a single Poisson trial above,

[math]\displaystyle{ \begin{align} \mathbf{E}\left[e^{\lambda X_i}\right] &= p_i e^{\lambda}+(1-p_i) =1+p_i(e^{\lambda}-1) \le e^{p_i(e^{\lambda}-1)}, \end{align} }[/math]

where the last step applies the inequality [math]\displaystyle{ 1+y\le e^y }[/math] (strict for [math]\displaystyle{ y\neq 0 }[/math], which gives the strict inequality in the statement). Since [math]\displaystyle{ \mu=\mathbf{E}[X]=\sum_{i=1}^n p_i }[/math], it follows that

[math]\displaystyle{ \begin{align} \mathbf{E}\left[e^{\lambda X}\right] \le \prod_{i=1}^n e^{p_i(e^{\lambda}-1)} = e^{\left(\sum_{i=1}^n p_i\right)(e^{\lambda}-1)} = e^{\mu(e^{\lambda}-1)}. \end{align} }[/math]

Plugging this into Markov's inequality with [math]\displaystyle{ t=(1+\delta)\mu }[/math] gives, for any [math]\displaystyle{ \lambda\gt 0 }[/math],

[math]\displaystyle{ \Pr[X\ge (1+\delta)\mu]\le \frac{e^{\mu(e^{\lambda}-1)}}{e^{\lambda(1+\delta)\mu}}. }[/math]

Choosing [math]\displaystyle{ \lambda=\ln(1+\delta)\gt 0 }[/math], which minimizes the right-hand side, yields

[math]\displaystyle{ \Pr[X\ge (1+\delta)\mu]\le\left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu}. }[/math]

[math]\displaystyle{ \square }[/math]
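As a concrete illustration (a quick numerical check, using the 1000 fair coin tosses from above, so [math]\displaystyle{ \mu=500 }[/math] and [math]\displaystyle{ \delta=0.1 }[/math]):

import math

mu, delta = 500, 0.1
bound = (math.e ** delta / (1 + delta) ** (1 + delta)) ** mu
print(f"Chernoff bound on Pr[X >= 550]: {bound:.4f}")  # about 0.089

# The exact tail of Bin(1000, 1/2), for comparison.
n = 1000
exact = sum(math.comb(n, i) for i in range(550, n + 1)) / 2 ** n
print(f"exact Pr[X >= 550]:             {exact:.6f}")  # about 0.0009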


Chernoff bound (the lower tail):
Let [math]\displaystyle{ X=\sum_{i=1}^n X_i }[/math], where [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are independent Poisson trials. Let [math]\displaystyle{ \mu=\mathbf{E}[X] }[/math].
Then for any [math]\displaystyle{ 0\lt \delta\lt 1 }[/math],
[math]\displaystyle{ \Pr[X\le (1-\delta)\mu]\lt \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{\mu}. }[/math]

The proof is symmetric to that of the upper tail: for [math]\displaystyle{ \lambda\gt 0 }[/math], the event [math]\displaystyle{ X\le t }[/math] is equivalent to [math]\displaystyle{ e^{-\lambda X}\ge e^{-\lambda t} }[/math], so one applies Markov's inequality to [math]\displaystyle{ e^{-\lambda X} }[/math] and chooses [math]\displaystyle{ \lambda=-\ln(1-\delta) }[/math].


Chernoff-Hoeffding bound (for bounded random variables):
Let [math]\displaystyle{ X=\sum_{i=1}^n X_i }[/math], where for each [math]\displaystyle{ 1\le i\le n }[/math], [math]\displaystyle{ X_i }[/math] is independently distributed over the range [math]\displaystyle{ [0,1] }[/math]. Let [math]\displaystyle{ \mu=\mathbf{E}[X] }[/math].
Then for any [math]\displaystyle{ \delta\gt 0 }[/math],
[math]\displaystyle{ \Pr[X\ge (1+\delta)\mu]\lt \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu}; }[/math]
and for any [math]\displaystyle{ 0\lt \delta\lt 1 }[/math],
[math]\displaystyle{ \Pr[X\le (1-\delta)\mu]\lt \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{\mu}. }[/math]


Useful forms of the Chernoff bound
Let [math]\displaystyle{ X=\sum_{i=1}^n X_i }[/math], where for each [math]\displaystyle{ 1\le i\le n }[/math], [math]\displaystyle{ X_i }[/math] is independently distributed over the range [math]\displaystyle{ [0,1] }[/math]. Let [math]\displaystyle{ \mu=\mathbf{E}[X] }[/math]. Then
1. for [math]\displaystyle{ 0\lt \delta\le 1 }[/math],
[math]\displaystyle{ \Pr[X\ge (1+\delta)\mu]\lt \exp\left(-\frac{\mu\delta^2}{3}\right); }[/math]
[math]\displaystyle{ \Pr[X\le (1-\delta)\mu]\lt \exp\left(-\frac{\mu\delta^2}{2}\right); }[/math]
2. for [math]\displaystyle{ t\gt 0 }[/math],
[math]\displaystyle{ \Pr[X\ge\mu+t]\le \exp\left(-\frac{2t^2}{n}\right); }[/math]
[math]\displaystyle{ \Pr[X\le\mu-t]\le \exp\left(-\frac{2t^2}{n}\right); }[/math]
3. for [math]\displaystyle{ t\ge 2e\mu }[/math],
[math]\displaystyle{ \Pr[X\ge t]\le 2^{-t}. }[/math]
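As a quick numerical sanity check of these forms (illustration only, again with [math]\displaystyle{ n=1000 }[/math] fair coin tosses, [math]\displaystyle{ \mu=500 }[/math], [math]\displaystyle{ \delta=0.1 }[/math], [math]\displaystyle{ t=50 }[/math]):

import math

n, mu, delta = 1000, 500, 0.1
t = delta * mu                                      # deviation of 50

general = (math.e ** delta / (1 + delta) ** (1 + delta)) ** mu
form1 = math.exp(-mu * delta ** 2 / 3)              # multiplicative form 1
form2 = math.exp(-2 * t ** 2 / n)                   # additive form 2

print(f"general upper-tail bound: {general:.4f}")   # about 0.089
print(f"form 1:                   {form1:.4f}")     # about 0.189
print(f"form 2:                   {form2:.4f}")     # about 0.0067
# Form 1 is weaker than the general bound but much simpler to apply;
# form 2 happens to be sharpest in this example.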

Permutation Routing