=Markov's Inequality=
One of the most natural pieces of information about a random variable is its expectation, the first moment of the random variable. Markov's inequality derives a tail bound for a random variable from its expectation alone.

{{Theorem
|Theorem (Markov's Inequality)|
:Let <math>X</math> be a random variable assuming only nonnegative values. Then, for all <math>t>0</math>,
::<math>\begin{align}
\Pr[X\ge t]\le \frac{\mathbf{E}[X]}{t}.
\end{align}</math>
}}
{{Proof| Let <math>Y</math> be the indicator such that
:<math>\begin{align}
Y &=
\begin{cases}
1 & \mbox{if }X\ge t,\\
0 & \mbox{otherwise.}
\end{cases}
\end{align}</math>

Since <math>X</math> is nonnegative, it holds that <math>Y\le\frac{X}{t}</math>. Since <math>Y</math> is 0-1 valued, <math>\mathbf{E}[Y]=\Pr[Y=1]=\Pr[X\ge t]</math>. Therefore,
:<math>
\Pr[X\ge t]
=\mathbf{E}[Y]
\le\mathbf{E}\left[\frac{X}{t}\right]
=\frac{\mathbf{E}[X]}{t}.
</math>
}}
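As a quick illustration of how weak the first-moment bound can be: let <math>X</math> be the number of heads in <math>n</math> independent fair coin flips, so that <math>X\ge 0</math> and <math>\mathbf{E}[X]=n/2</math>. Markov's inequality gives
:<math>
\Pr\left[X\ge \frac{3n}{4}\right]\le\frac{\mathbf{E}[X]}{3n/4}=\frac{n/2}{3n/4}=\frac{2}{3},
</math>
a bound that does not improve as <math>n</math> grows. The inequalities developed below use more than the first moment and give much sharper bounds for such deviations.
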
== Generalization ==
For any random variable <math>X</math> and any nonnegative real function <math>h</math>, <math>h(X)</math> is a nonnegative random variable. Applying Markov's inequality, we directly have that
:<math>
\Pr[h(X)\ge t]\le\frac{\mathbf{E}[h(X)]}{t}.
</math>

This trivial application of Markov's inequality gives us a powerful tool for proving tail inequalities. With a function <math>h</math> that extracts more information about the random variable, we can prove sharper tail inequalities.

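One standard instance of this idea, as a preview: apply the bound to the deviation <math>X-\mathbf{E}[X]</math> with <math>h(x)=x^2</math> and threshold <math>t^2</math>. This gives
:<math>
\Pr\left[|X-\mathbf{E}[X]|\ge t\right]=\Pr\left[(X-\mathbf{E}[X])^2\ge t^2\right]\le\frac{\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]}{t^2},
</math>
where the numerator is the second moment of the deviation. This is exactly Chebyshev's inequality, proved in the next section.
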
=Chebyshev's inequality=
== Variance ==
{{Theorem
|Definition (variance)|
:The '''variance''' of a random variable <math>X</math> is defined as
::<math>\begin{align}
\mathbf{Var}[X]=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]=\mathbf{E}\left[X^2\right]-(\mathbf{E}[X])^2.
\end{align}</math>
:The '''standard deviation''' of a random variable <math>X</math> is
::<math>
\delta[X]=\sqrt{\mathbf{Var}[X]}.
</math>
}}
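The two expressions for the variance agree by linearity of expectation, since
:<math>
\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]
=\mathbf{E}\left[X^2-2X\mathbf{E}[X]+(\mathbf{E}[X])^2\right]
=\mathbf{E}\left[X^2\right]-2(\mathbf{E}[X])^2+(\mathbf{E}[X])^2
=\mathbf{E}\left[X^2\right]-(\mathbf{E}[X])^2.
</math>
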
The variance is the diagonal case of the '''covariance'''.

{{Theorem
|Definition (covariance)|
:The '''covariance''' of two random variables <math>X</math> and <math>Y</math> is
::<math>\begin{align}
\mathbf{Cov}(X,Y)=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right].
\end{align}</math>
}}
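In particular, setting <math>Y=X</math> in this definition recovers the variance:
:<math>
\mathbf{Cov}(X,X)=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]=\mathbf{Var}[X],
</math>
which is the sense in which the variance is the diagonal case of the covariance.
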
We have the following theorem for the variance of a sum.

{{Theorem
|Theorem|
:For any two random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{Var}[X+Y]=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y).
\end{align}</math>
:Generally, for any random variables <math>X_1,X_2,\ldots,X_n</math>,
::<math>\begin{align}
\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{Cov}(X_i,X_j).
\end{align}</math>
}}
{{Proof| The equation for two variables is directly due to the definition of variance and covariance. The equation for <math>n</math> variables can be deduced from the equation for two variables.
}}
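For the two-variable case, the claimed identity can be checked by expanding the square and using linearity of expectation. Since <math>\mathbf{E}[X+Y]=\mathbf{E}[X]+\mathbf{E}[Y]</math>,
:<math>\begin{align}
\mathbf{Var}[X+Y]
&=\mathbf{E}\left[\left((X-\mathbf{E}[X])+(Y-\mathbf{E}[Y])\right)^2\right]\\
&=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]+\mathbf{E}\left[(Y-\mathbf{E}[Y])^2\right]+2\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]\\
&=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y).
\end{align}</math>
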
For independent random variables, the expectation of a product equals the product of expectations.
{{Theorem
|Theorem|
:For any two independent random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{E}[X\cdot Y]=\mathbf{E}[X]\cdot\mathbf{E}[Y].
\end{align}</math>
}}
{{Proof|
:<math>
\begin{align}
\mathbf{E}[X\cdot Y]
&=\sum_{x,y}xy\Pr[X=x\wedge Y=y]\\
&=\sum_{x,y}xy\Pr[X=x]\Pr[Y=y] &\qquad(\mbox{Independence})\\
&=\sum_{x}x\Pr[X=x]\sum_{y}y\Pr[Y=y]\\
&=\mathbf{E}[X]\cdot\mathbf{E}[Y].
\end{align}
</math>
}}

Consequently, the covariance of two independent random variables is always zero.
{{Theorem
|Theorem|
:For any two independent random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{Cov}(X,Y)=0.
\end{align}</math>
}}
{{Proof|
:<math>\begin{align}
\mathbf{Cov}(X,Y)
&=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]\\
&=\mathbf{E}\left[X-\mathbf{E}[X]\right]\mathbf{E}\left[Y-\mathbf{E}[Y]\right] &\qquad(\mbox{Independence})\\
&=0.
\end{align}</math>
}}
The variance of the sum of pairwise independent random variables is equal to the sum of variances.
{{Theorem
|Theorem|
:For '''pairwise''' independent random variables <math>X_1,X_2,\ldots,X_n</math>,
::<math>\begin{align}
\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i].
\end{align}</math>
}}
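This is a direct combination of the two theorems above: pairwise independence already guarantees that <math>\mathbf{Cov}(X_i,X_j)=0</math> for every pair <math>i\neq j</math>, so in the formula for the variance of a sum all covariance terms vanish:
:<math>
\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{Cov}(X_i,X_j)=\sum_{i=1}^n\mathbf{Var}[X_i].
</math>
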
;Remark
:The theorem holds for '''pairwise''' independent random variables, a much weaker requirement than '''mutual''' independence. This makes second-moment methods very useful for pairwise independent random variables.

=== Variance of binomial distribution ===
Consider a Bernoulli trial with parameter <math>p</math>:
:<math>
X=\begin{cases}
1& \mbox{with probability }p\\
0& \mbox{with probability }1-p
\end{cases}
</math>
Since <math>X</math> is 0-1 valued, <math>X^2=X</math>, and the variance is
:<math>
\mathbf{Var}[X]=\mathbf{E}[X^2]-(\mathbf{E}[X])^2=\mathbf{E}[X]-(\mathbf{E}[X])^2=p-p^2=p(1-p).
</math>

Let <math>Y</math> be a binomial random variable with parameters <math>n</math> and <math>p</math>, i.e. <math>Y=\sum_{i=1}^nY_i</math>, where the <math>Y_i</math>'s are i.i.d. Bernoulli trials with parameter <math>p</math>. The variance is
:<math>\begin{align}
\mathbf{Var}[Y]
&=\mathbf{Var}\left[\sum_{i=1}^nY_i\right]\\
&=\sum_{i=1}^n\mathbf{Var}\left[Y_i\right] &\qquad (\mbox{Independence})\\
&=\sum_{i=1}^np(1-p) &\qquad (\mbox{Bernoulli})\\
&=p(1-p)n.
\end{align}</math>
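For example, for a fair coin (<math>p=1/2</math>) the variance is <math>\mathbf{Var}[Y]=n/4</math> and the standard deviation is <math>\delta[Y]=\sqrt{n}/2</math>, so the typical deviation of <math>Y</math> from its mean <math>n/2</math> is only of order <math>\sqrt{n}</math>.
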
== Chebyshev's inequality ==
{{Theorem
|Theorem (Chebyshev's Inequality)|
:For any random variable <math>X</math> and any <math>t>0</math>,
::<math>\begin{align}
\Pr\left[|X-\mathbf{E}[X]| \ge t\right] \le \frac{\mathbf{Var}[X]}{t^2}.
\end{align}</math>
}}
{{Proof| Observe that
:<math>\Pr[|X-\mathbf{E}[X]| \ge t] = \Pr[(X-\mathbf{E}[X])^2 \ge t^2].</math>
Since <math>(X-\mathbf{E}[X])^2</math> is a nonnegative random variable, we can apply Markov's inequality to obtain
:<math>
\Pr[(X-\mathbf{E}[X])^2 \ge t^2] \le
\frac{\mathbf{E}[(X-\mathbf{E}[X])^2]}{t^2}
=\frac{\mathbf{Var}[X]}{t^2}.
</math>
}}
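As an illustration, apply Chebyshev's inequality to the fair-coin case (<math>p=1/2</math>) of the binomial variable <math>Y</math> computed above, for which <math>\mathbf{E}[Y]=n/2</math> and <math>\mathbf{Var}[Y]=n/4</math>. For the deviation <math>t=n/4</math>,
:<math>
\Pr\left[Y\ge \frac{3n}{4}\right]\le\Pr\left[\left|Y-\frac{n}{2}\right|\ge \frac{n}{4}\right]\le\frac{n/4}{(n/4)^2}=\frac{4}{n},
</math>
which vanishes as <math>n</math> grows, whereas Markov's inequality alone only gave the constant bound <math>2/3</math> for the same event. Using the second moment in place of the first moment yields a far sharper tail bound.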