随机算法 (Spring 2014)/Moment and Deviation

From TCS Wiki
Revision as of 12:38, 10 March 2014 by imported>Etone (Created page with "= Tail Inequalities = When applying probabilistic analysis, we often want a bound in form of <math>\Pr[X\ge t]<\epsilon</math> for some random variable <math>X</math> (think that…")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Tail Inequalities

When applying probabilistic analysis, we often want a bound in form of [math]\displaystyle{ \Pr[X\ge t]\lt \epsilon }[/math] for some random variable [math]\displaystyle{ X }[/math] (think that [math]\displaystyle{ X }[/math] is a cost such as running time of a randomized algorithm). We call this a tail bound, or a tail inequality.

Besides directly computing the probability [math]\displaystyle{ \Pr[X\ge t] }[/math], we want to have some general way of estimating tail probabilities from some measurable information regarding the random variables.

Markov's Inequality

One of the most natural information about a random variable is its expectation, which is the first moment of the random variable. Markov's inequality draws a tail bound for a random variable from its expectation.

Theorem (Markov's Inequality)
Let [math]\displaystyle{ X }[/math] be a random variable assuming only nonnegative values. Then, for all [math]\displaystyle{ t\gt 0 }[/math],
[math]\displaystyle{ \begin{align} \Pr[X\ge t]\le \frac{\mathbf{E}[X]}{t}. \end{align} }[/math]
Proof.
Let [math]\displaystyle{ Y }[/math] be the indicator such that
[math]\displaystyle{ \begin{align} Y &= \begin{cases} 1 & \mbox{if }X\ge t,\\ 0 & \mbox{otherwise.} \end{cases} \end{align} }[/math]

It holds that [math]\displaystyle{ Y\le\frac{X}{t} }[/math]. Since [math]\displaystyle{ Y }[/math] is 0-1 valued, [math]\displaystyle{ \mathbf{E}[Y]=\Pr[Y=1]=\Pr[X\ge t] }[/math]. Therefore,

[math]\displaystyle{ \Pr[X\ge t] = \mathbf{E}[Y] \le \mathbf{E}\left[\frac{X}{t}\right] =\frac{\mathbf{E}[X]}{t}. }[/math]
[math]\displaystyle{ \square }[/math]

Example (from Las Vegas to Monte Carlo)

Let [math]\displaystyle{ A }[/math] be a Las Vegas randomized algorithm for a decision problem [math]\displaystyle{ f }[/math], whose expected running time is within [math]\displaystyle{ T(n) }[/math] on any input of size [math]\displaystyle{ n }[/math]. We transform [math]\displaystyle{ A }[/math] to a Monte Carlo randomized algorithm [math]\displaystyle{ B }[/math] with bounded one-sided error as follows:

[math]\displaystyle{ B(x) }[/math]:
  • Run [math]\displaystyle{ A(x) }[/math] for [math]\displaystyle{ 2T(n) }[/math] long where [math]\displaystyle{ n }[/math] is the size of [math]\displaystyle{ x }[/math].
  • If [math]\displaystyle{ A(x) }[/math] returned within [math]\displaystyle{ 2T(n) }[/math] time, then return what [math]\displaystyle{ A(x) }[/math] just returned, else return 1.

Since [math]\displaystyle{ A }[/math] is Las Vegas, its output is always correct, thus [math]\displaystyle{ B(x) }[/math] only errs when it returns 1, thus the error is one-sided. The error probability is bounded by the probability that [math]\displaystyle{ A(x) }[/math] runs longer than [math]\displaystyle{ 2T(n) }[/math]. Since the expected running time of [math]\displaystyle{ A(x) }[/math] is at most [math]\displaystyle{ T(n) }[/math], due to Markov's inequality,

[math]\displaystyle{ \Pr[\mbox{the running time of }A(x)\ge2T(n)]\le\frac{\mathbf{E}[\mbox{running time of }A(x)]}{2T(n)}\le\frac{1}{2}, }[/math]

thus the error probability is bounded.

Generalization

For any random variable [math]\displaystyle{ X }[/math], for an arbitrary non-negative real function [math]\displaystyle{ h }[/math], the [math]\displaystyle{ h(X) }[/math] is a non-negative random variable. Applying Markov's inequality, we directly have that

[math]\displaystyle{ \Pr[h(X)\ge t]\le\frac{\mathbf{E}[h(X)]}{t}. }[/math]

This trivial application of Markov's inequality gives us a powerful tool for proving tail inequalities. With the function [math]\displaystyle{ h }[/math] which extracts more information about the random variable, we can prove sharper tail inequalities.

Variance

Definition (variance)
The variance of a random variable [math]\displaystyle{ X }[/math] is defined as
[math]\displaystyle{ \begin{align} \mathbf{Var}[X]=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]=\mathbf{E}\left[X^2\right]-(\mathbf{E}[X])^2. \end{align} }[/math]
The standard deviation of random variable [math]\displaystyle{ X }[/math] is
[math]\displaystyle{ \delta[X]=\sqrt{\mathbf{Var}[X]}. }[/math]

We have seen that due to the linearity of expectations, the expectation of the sum of variable is the sum of the expectations of the variables. It is natural to ask whether this is true for variances. We find that the variance of sum has an extra term called covariance.

Definition (covariance)
The covariance of two random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is
[math]\displaystyle{ \begin{align} \mathbf{Cov}(X,Y)=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]. \end{align} }[/math]

We have the following theorem for the variance of sum.

Theorem
For any two random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math],
[math]\displaystyle{ \begin{align} \mathbf{Var}[X+Y]=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y). \end{align} }[/math]
Generally, for any random variables [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math],
[math]\displaystyle{ \begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{Cov}(X_i,X_j). \end{align} }[/math]
Proof.
The equation for two variables is directly due to the definition of variance and covariance. The equation for [math]\displaystyle{ n }[/math] variables can be deduced from the equation for two variables.
[math]\displaystyle{ \square }[/math]

We will see that when random variables are independent, the variance of sum is equal to the sum of variances. To prove this, we first establish a very useful result regarding the expectation of multiplicity.

Theorem
For any two independent random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math],
[math]\displaystyle{ \begin{align} \mathbf{E}[X\cdot Y]=\mathbf{E}[X]\cdot\mathbf{E}[Y]. \end{align} }[/math]
Proof.
[math]\displaystyle{ \begin{align} \mathbf{E}[X\cdot Y] &= \sum_{x,y}xy\Pr[X=x\wedge Y=y]\\ &= \sum_{x,y}xy\Pr[X=x]\Pr[Y=y]\\ &= \sum_{x}x\Pr[X=x]\sum_{y}y\Pr[Y=y]\\ &= \mathbf{E}[X]\cdot\mathbf{E}[Y]. \end{align} }[/math]
[math]\displaystyle{ \square }[/math]

With the above theorem, we can show that the covariance of two independent variables is always zero.

Theorem
For any two independent random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math],
[math]\displaystyle{ \begin{align} \mathbf{Cov}(X,Y)=0. \end{align} }[/math]
Proof.
[math]\displaystyle{ \begin{align} \mathbf{Cov}(X,Y) &=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]\\ &= \mathbf{E}\left[X-\mathbf{E}[X]\right]\mathbf{E}\left[Y-\mathbf{E}[Y]\right] &\qquad(\mbox{Independence})\\ &=0. \end{align} }[/math]
[math]\displaystyle{ \square }[/math]

We then have the following theorem for the variance of the sum of pairwise independent random variables.

Theorem
For pairwise independent random variables [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math],
[math]\displaystyle{ \begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]. \end{align} }[/math]
Remark
The theorem holds for pairwise independent random variables, a much weaker independence requirement than the mutual independence. This makes the variance-based probability tools work even for weakly random cases. We will see what it exactly means in the future lectures.

Variance of binomial distribution

For a Bernoulli trial with parameter [math]\displaystyle{ p }[/math].

[math]\displaystyle{ X=\begin{cases} 1& \mbox{with probability }p\\ 0& \mbox{with probability }1-p \end{cases} }[/math]

The variance is

[math]\displaystyle{ \mathbf{Var}[X]=\mathbf{E}[X^2]-(\mathbf{E}[X])^2=\mathbf{E}[X]-(\mathbf{E}[X])^2=p-p^2=p(1-p). }[/math]

Let [math]\displaystyle{ Y }[/math] be a binomial random variable with parameter [math]\displaystyle{ n }[/math] and [math]\displaystyle{ p }[/math], i.e. [math]\displaystyle{ Y=\sum_{i=1}^nY_i }[/math], where [math]\displaystyle{ Y_i }[/math]'s are i.i.d. Bernoulli trials with parameter [math]\displaystyle{ p }[/math]. The variance is

[math]\displaystyle{ \begin{align} \mathbf{Var}[Y] &= \mathbf{Var}\left[\sum_{i=1}^nY_i\right]\\ &= \sum_{i=1}^n\mathbf{Var}\left[Y_i\right] &\qquad (\mbox{Independence})\\ &= \sum_{i=1}^np(1-p) &\qquad (\mbox{Bernoulli})\\ &= p(1-p)n. \end{align} }[/math]

Chebyshev's inequality

With the information of the expectation and variance of a random variable, one can derive a stronger tail bound known as Chebyshev's Inequality.

Theorem (Chebyshev's Inequality)
For any [math]\displaystyle{ t\gt 0 }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[|X-\mathbf{E}[X]| \ge t\right] \le \frac{\mathbf{Var}[X]}{t^2}. \end{align} }[/math]
Proof.
Observe that
[math]\displaystyle{ \Pr[|X-\mathbf{E}[X]| \ge t] = \Pr[(X-\mathbf{E}[X])^2 \ge t^2]. }[/math]

Since [math]\displaystyle{ (X-\mathbf{E}[X])^2 }[/math] is a nonnegative random variable, we can apply Markov's inequality, such that

[math]\displaystyle{ \Pr[(X-\mathbf{E}[X])^2 \ge t^2] \le \frac{\mathbf{E}[(X-\mathbf{E}[X])^2]}{t^2} =\frac{\mathbf{Var}[X]}{t^2}. }[/math]
[math]\displaystyle{ \square }[/math]