Advanced Algorithms (Fall 2019)/Basic tail inequalities

Markov's Inequality

One of the most natural pieces of information about a random variable is its expectation, which is the first moment of the random variable. Markov's inequality derives a tail bound for a random variable from its expectation alone.

Theorem (Markov's Inequality)
Let [math]X[/math] be a random variable assuming only nonnegative values. Then, for all [math]t\gt 0[/math],
[math]\begin{align} \Pr[X\ge t]\le \frac{\mathbf{E}[X]}{t}. \end{align}[/math]
Proof.
Let [math]Y[/math] be the indicator such that
[math]\begin{align} Y &= \begin{cases} 1 & \mbox{if }X\ge t,\\ 0 & \mbox{otherwise.} \end{cases} \end{align}[/math]

It holds that [math]Y\le\frac{X}{t}[/math]. Since [math]Y[/math] is 0-1 valued, [math]\mathbf{E}[Y]=\Pr[Y=1]=\Pr[X\ge t][/math]. Therefore,

[math] \Pr[X\ge t] = \mathbf{E}[Y] \le \mathbf{E}\left[\frac{X}{t}\right] =\frac{\mathbf{E}[X]}{t}. [/math]
[math]\square[/math]
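
As a quick sanity check (this example is not part of the original notes), the following Python sketch estimates [math]\Pr[X\ge t][/math] by simulation for a nonnegative random variable and compares it with the Markov bound [math]\mathbf{E}[X]/t[/math]; the particular distribution, a sum of ten uniform [0,1] variables, is an arbitrary choice.

<syntaxhighlight lang="python">
import random

random.seed(0)
n_samples = 100_000
# X is a sum of ten independent uniform [0,1] variables, so E[X] = 5.
samples = [sum(random.random() for _ in range(10)) for _ in range(n_samples)]

mean = sum(samples) / n_samples
for t in [6, 7, 8, 9]:
    tail = sum(s >= t for s in samples) / n_samples  # empirical Pr[X >= t]
    print(f"t={t}:  Pr[X>=t] ~ {tail:.4f}   Markov bound E[X]/t = {mean / t:.4f}")
</syntaxhighlight>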

Generalization

For any random variable [math]X[/math] and any non-negative real function [math]h[/math], [math]h(X)[/math] is a non-negative random variable. Applying Markov's inequality to [math]h(X)[/math], we directly have that

[math] \Pr[h(X)\ge t]\le\frac{\mathbf{E}[h(X)]}{t}. [/math]

This simple application of Markov's inequality is a powerful tool for proving tail inequalities: by choosing a function [math]h[/math] that extracts more information about the random variable, we can prove sharper tail bounds.
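
For instance (a worked example added here for illustration, not in the original notes), for a nonnegative random variable [math]X[/math] and [math]t\gt 0[/math], taking [math]h(x)=x^2[/math] already gives a second-moment bound,

[math] \Pr[X\ge t]=\Pr[X^2\ge t^2]\le\frac{\mathbf{E}[X^2]}{t^2}, [/math]

which is the idea behind Chebyshev's inequality below.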

Chebyshev's inequality

Variance

Definition (variance)
The variance of a random variable [math]X[/math] is defined as
[math]\begin{align} \mathbf{Var}[X]=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]=\mathbf{E}\left[X^2\right]-(\mathbf{E}[X])^2. \end{align}[/math]
The standard deviation of random variable [math]X[/math] is
[math] \delta[X]=\sqrt{\mathbf{Var}[X]}. [/math]

The variance is the diagonal case of the covariance: [math]\mathbf{Var}[X]=\mathbf{Cov}(X,X)[/math].

Definition (covariance)
The covariance of two random variables [math]X[/math] and [math]Y[/math] is
[math]\begin{align} \mathbf{Cov}(X,Y)=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]. \end{align}[/math]
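
To make the definitions concrete, here is a minimal Python sketch (not part of the original notes) that computes the variance and covariance directly from the definitions for a small, explicitly given joint distribution; the joint distribution itself is made up for illustration.

<syntaxhighlight lang="python">
# (x, y) -> Pr[X = x and Y = y]; the numbers are made up for illustration.
joint = {
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

def expect(f):
    """E[f(X, Y)] under the joint distribution above."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
var_x  = expect(lambda x, y: (x - ex) ** 2)          # Var[X]
cov_xy = expect(lambda x, y: (x - ex) * (y - ey))    # Cov(X, Y)
cov_xx = expect(lambda x, y: (x - ex) * (x - ex))    # Cov(X, X)

print("Var[X]   =", var_x)   # 0.25, since X is a Bernoulli(0.5) variable
print("Cov(X,Y) =", cov_xy)
print("Cov(X,X) equals Var[X]:", abs(cov_xx - var_x) < 1e-12)
</syntaxhighlight>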

We have the following theorem for the variance of a sum.

Theorem
For any two random variables [math]X[/math] and [math]Y[/math],
[math]\begin{align} \mathbf{Var}[X+Y]=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y). \end{align}[/math]
Generally, for any random variables [math]X_1,X_2,\ldots,X_n[/math],
[math]\begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{Cov}(X_i,X_j). \end{align}[/math]
Proof.
By linearity of expectation, [math]\mathbf{E}[X+Y]=\mathbf{E}[X]+\mathbf{E}[Y][/math], hence
[math]\begin{align} \mathbf{Var}[X+Y] &=\mathbf{E}\left[\left((X-\mathbf{E}[X])+(Y-\mathbf{E}[Y])\right)^2\right]\\ &=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]+\mathbf{E}\left[(Y-\mathbf{E}[Y])^2\right]+2\,\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]\\ &=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y). \end{align}[/math]
The equation for [math]n[/math] variables follows by expanding [math]\left(\sum_{i=1}^n(X_i-\mathbf{E}[X_i])\right)^2[/math] in the same way.
[math]\square[/math]
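
The two-variable identity can also be checked numerically. The following sketch (not from the notes) treats a finite sample as a uniform distribution over its points, for which the identity holds exactly up to floating-point error; the correlated pair of samples is an arbitrary choice.

<syntaxhighlight lang="python">
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # Y is correlated with X

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return mean([(a - m) ** 2 for a in v])

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(lhs, rhs, abs(lhs - rhs) < 1e-9)  # the two sides agree
</syntaxhighlight>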

For independent random variables, the expectation of a product equals the product of expectations.

Theorem
For any two independent random variables [math]X[/math] and [math]Y[/math],
[math]\begin{align} \mathbf{E}[X\cdot Y]=\mathbf{E}[X]\cdot\mathbf{E}[Y]. \end{align}[/math]
Proof.
[math] \begin{align} \mathbf{E}[X\cdot Y] &= \sum_{x,y}xy\Pr[X=x\wedge Y=y]\\ &= \sum_{x,y}xy\Pr[X=x]\Pr[Y=y]\\ &= \sum_{x}x\Pr[X=x]\sum_{y}y\Pr[Y=y]\\ &= \mathbf{E}[X]\cdot\mathbf{E}[Y]. \end{align} [/math]
[math]\square[/math]

Consequently, the covariance of independent random variables is always zero.

Theorem
For any two independent random variables [math]X[/math] and [math]Y[/math],
[math]\begin{align} \mathbf{Cov}(X,Y)=0. \end{align}[/math]
Proof.
[math]\begin{align} \mathbf{Cov}(X,Y) &=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]\\ &= \mathbf{E}\left[X-\mathbf{E}[X]\right]\mathbf{E}\left[Y-\mathbf{E}[Y]\right] &\qquad(\mbox{Independence})\\ &=0. \end{align}[/math]
[math]\square[/math]

The variance of the sum of pairwise independent random variables is equal to the sum of their variances.

Theorem
For pairwise independent random variables [math]X_1,X_2,\ldots,X_n[/math],
[math]\begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]. \end{align}[/math]
Remark
The theorem holds for pairwise independent random variables, a much weaker requirement than mutual independence. This makes second-moment methods very useful even when the random variables are only pairwise independent.
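
A standard illustration of the remark (added here, not part of the original notes): let [math]X_1,X_2[/math] be independent fair 0-1 bits and [math]X_3=X_1\oplus X_2[/math]. These three variables are pairwise independent but not mutually independent, and the theorem still applies to their sum, as the following sketch verifies.

<syntaxhighlight lang="python">
from itertools import product

# All four equally likely outcomes (x1, x2, x1 XOR x2) of two independent fair bits.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

def expect(f):
    return sum(f(o) for o in outcomes) / len(outcomes)

def var(f):
    m = expect(f)
    return expect(lambda o: (f(o) - m) ** 2)

var_of_sum  = var(lambda o: sum(o))                            # Var[X1+X2+X3]
sum_of_vars = sum(var(lambda o, i=i: o[i]) for i in range(3))  # Var[X1]+Var[X2]+Var[X3]
print(var_of_sum, sum_of_vars)  # both are 0.75
</syntaxhighlight>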

Variance of binomial distribution

Consider a Bernoulli trial with parameter [math]p[/math]:

[math] X=\begin{cases} 1& \mbox{with probability }p\\ 0& \mbox{with probability }1-p \end{cases} [/math]

Since [math]X^2=X[/math] for a 0-1 valued random variable, the variance is

[math] \mathbf{Var}[X]=\mathbf{E}[X^2]-(\mathbf{E}[X])^2=\mathbf{E}[X]-(\mathbf{E}[X])^2=p-p^2=p(1-p). [/math]

Let [math]Y[/math] be a binomial random variable with parameters [math]n[/math] and [math]p[/math], i.e. [math]Y=\sum_{i=1}^nY_i[/math], where the [math]Y_i[/math]'s are i.i.d. Bernoulli trials with parameter [math]p[/math]. Its variance is

[math] \begin{align} \mathbf{Var}[Y] &= \mathbf{Var}\left[\sum_{i=1}^nY_i\right]\\ &= \sum_{i=1}^n\mathbf{Var}\left[Y_i\right] &\qquad (\mbox{Independence})\\ &= \sum_{i=1}^np(1-p) &\qquad (\mbox{Bernoulli})\\ &= p(1-p)n. \end{align} [/math]
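
As a check (not in the original notes), the following sketch computes [math]\mathbf{E}[Y][/math] and [math]\mathbf{E}[Y^2][/math] directly from the binomial probability mass function and confirms that the variance equals [math]np(1-p)[/math]; the parameters [math]n=20[/math], [math]p=0.3[/math] are arbitrary.

<syntaxhighlight lang="python">
from math import comb

n, p = 20, 0.3  # arbitrary parameters
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean   = sum(k * pmf[k] for k in range(n + 1))       # E[Y]
second = sum(k * k * pmf[k] for k in range(n + 1))   # E[Y^2]
variance = second - mean**2

print(variance, n * p * (1 - p))  # both 4.2, up to floating-point error
</syntaxhighlight>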

Chebyshev's inequality

Theorem (Chebyshev's Inequality)
For any random variable [math]X[/math] and any [math]t\gt 0[/math],
[math]\begin{align} \Pr\left[|X-\mathbf{E}[X]| \ge t\right] \le \frac{\mathbf{Var}[X]}{t^2}. \end{align}[/math]
Proof.
Observe that
[math]\Pr[|X-\mathbf{E}[X]| \ge t] = \Pr[(X-\mathbf{E}[X])^2 \ge t^2].[/math]

Since [math](X-\mathbf{E}[X])^2[/math] is a nonnegative random variable, we can apply Markov's inequality to it, which gives

[math] \Pr[(X-\mathbf{E}[X])^2 \ge t^2] \le \frac{\mathbf{E}[(X-\mathbf{E}[X])^2]}{t^2} =\frac{\mathbf{Var}[X]}{t^2}. [/math]
[math]\square[/math]
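
Finally, a minimal numerical check of Chebyshev's inequality (not part of the original notes): for a binomial random variable we compare the exact two-sided tail probability with the bound [math]\mathbf{Var}[X]/t^2[/math]; the parameters are arbitrary.

<syntaxhighlight lang="python">
from math import comb

n, p = 100, 0.5  # arbitrary parameters
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = n * p                # E[X] = np
variance = n * p * (1 - p)  # Var[X] = np(1-p)

for t in [5, 10, 15]:
    tail = sum(pmf[k] for k in range(n + 1) if abs(k - mean) >= t)  # exact Pr[|X-E[X]| >= t]
    print(f"t={t}:  Pr[|X-E[X]|>=t] = {tail:.4f}   Chebyshev bound = {variance / t**2:.4f}")
</syntaxhighlight>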