高级算法 (Fall 2021)/Basic tail inequalities
=Markov's Inequality=
One of the most natural pieces of information about a random variable is its expectation, which is the first moment of the random variable. Markov's inequality derives a tail bound for a random variable from its expectation.
{{Theorem
|Theorem (Markov's Inequality)|
:Let <math>X</math> be a random variable assuming only nonnegative values. Then, for all <math>t>0</math>,
::<math>\begin{align}
\Pr[X\ge t]\le \frac{\mathbf{E}[X]}{t}.
\end{align}</math>
}}
{{Proof| Let <math>Y</math> be the indicator such that
:<math>\begin{align}
Y &=
\begin{cases}
1 & \mbox{if }X\ge t,\\
0 & \mbox{otherwise.}
\end{cases}
\end{align}</math>
Since <math>X</math> takes only nonnegative values, it holds that <math>Y\le\frac{X}{t}</math>. Since <math>Y</math> is 0-1 valued, <math>\mathbf{E}[Y]=\Pr[Y=1]=\Pr[X\ge t]</math>. Therefore,
:<math>
\Pr[X\ge t]
=
\mathbf{E}[Y]
\le
\mathbf{E}\left[\frac{X}{t}\right]
=\frac{\mathbf{E}[X]}{t}.
</math>
}}
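To make the bound concrete, here is a minimal simulation sketch (not part of the original notes; the choice of the exponential distribution and of the thresholds is purely illustrative). It compares the empirical tail probability <math>\Pr[X\ge t]</math> with the Markov bound <math>\mathbf{E}[X]/t</math>.
<syntaxhighlight lang="python">
# Illustrative check of Markov's inequality: Pr[X >= t] <= E[X]/t
# for a nonnegative random variable, here X ~ Exponential(1) with E[X] = 1.
import random

random.seed(0)
n = 100000
samples = [random.expovariate(1.0) for _ in range(n)]
mean = sum(samples) / n

for t in [1, 2, 4, 8]:
    tail = sum(1 for x in samples if x >= t) / n
    print(f"t={t}: empirical Pr[X>=t] = {tail:.4f}, Markov bound E[X]/t = {mean / t:.4f}")
</syntaxhighlight>
Here the exact tail is <math>e^{-t}</math>, so the bound is quite loose for large <math>t</math>; the point of the inequality is that it requires nothing but the expectation.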
== Generalization ==
For any random variable <math>X</math> and any nonnegative real function <math>h</math>, <math>h(X)</math> is a nonnegative random variable. Applying Markov's inequality, we directly have that
:<math>
\Pr[h(X)\ge t]\le\frac{\mathbf{E}[h(X)]}{t}.
</math>
This simple application of Markov's inequality gives us a powerful tool for proving tail inequalities: by choosing a function <math>h</math> that extracts more information about the random variable, we can prove sharper tail inequalities.
=Chebyshev's inequality=
== Variance ==
{{Theorem
|Definition (variance)|
:The '''variance''' of a random variable <math>X</math> is defined as
::<math>\begin{align}
\mathbf{Var}[X]=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]=\mathbf{E}\left[X^2\right]-(\mathbf{E}[X])^2.
\end{align}</math>
:The '''standard deviation''' of a random variable <math>X</math> is
::<math>
\delta[X]=\sqrt{\mathbf{Var}[X]}.
</math>
}}
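The two expressions in the definition agree: expanding the square and applying linearity of expectation,
:<math>
\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]
=\mathbf{E}\left[X^2-2X\cdot\mathbf{E}[X]+(\mathbf{E}[X])^2\right]
=\mathbf{E}\left[X^2\right]-2(\mathbf{E}[X])^2+(\mathbf{E}[X])^2
=\mathbf{E}\left[X^2\right]-(\mathbf{E}[X])^2.
</math>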
The variance is the diagonal case of covariance: <math>\mathbf{Var}[X]=\mathbf{Cov}(X,X)</math>.
{{Theorem
|Definition (covariance)|
:The '''covariance''' of two random variables <math>X</math> and <math>Y</math> is
::<math>\begin{align}
\mathbf{Cov}(X,Y)=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right].
\end{align}</math>
}}
We have the following theorem for the variance of a sum.
{{Theorem
|Theorem|
:For any two random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{Var}[X+Y]=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y).
\end{align}</math>
:Generally, for any random variables <math>X_1,X_2,\ldots,X_n</math>,
::<math>\begin{align}
\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{Cov}(X_i,X_j).
\end{align}</math>
}}
{{Proof| The equation for two variables follows directly from the definitions of variance and covariance. The equation for <math>n</math> variables follows from the same expansion of the square.
}}
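Concretely, writing <math>\mu_X=\mathbf{E}[X]</math> and <math>\mu_Y=\mathbf{E}[Y]</math> and using linearity of expectation, the two-variable case expands as
:<math>\begin{align}
\mathbf{Var}[X+Y]
&=\mathbf{E}\left[\left((X-\mu_X)+(Y-\mu_Y)\right)^2\right]\\
&=\mathbf{E}\left[(X-\mu_X)^2\right]+\mathbf{E}\left[(Y-\mu_Y)^2\right]+2\,\mathbf{E}\left[(X-\mu_X)(Y-\mu_Y)\right]\\
&=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y),
\end{align}</math>
and the <math>n</math>-variable identity follows from the same expansion of <math>\left(\sum_{i=1}^n(X_i-\mathbf{E}[X_i])\right)^2</math>: the diagonal terms give the variances and the cross terms give the covariances.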
For independent random variables, the expectation of the product equals the product of the expectations.
{{Theorem
|Theorem|
:For any two independent random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{E}[X\cdot Y]=\mathbf{E}[X]\cdot\mathbf{E}[Y].
\end{align}</math>
}}
{{Proof| (Shown here for discrete random variables; independence gives <math>\Pr[X=x\wedge Y=y]=\Pr[X=x]\Pr[Y=y]</math>.)
:<math>
\begin{align}
\mathbf{E}[X\cdot Y]
&=
\sum_{x,y}xy\Pr[X=x\wedge Y=y]\\
&=
\sum_{x,y}xy\Pr[X=x]\Pr[Y=y]\\
&=
\sum_{x}x\Pr[X=x]\sum_{y}y\Pr[Y=y]\\
&=
\mathbf{E}[X]\cdot\mathbf{E}[Y].
\end{align}
</math>
}}
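A small numerical sketch (the two-dice example is illustrative and not from the notes): for two independent fair dice the product rule holds exactly, while for the dependent pair <math>(X,X)</math> it fails.
<syntaxhighlight lang="python">
# E[X*Y] = E[X]*E[Y] for two independent fair six-sided dice X, Y,
# versus the dependent pair (X, X), where E[X*X] = E[X^2] != (E[X])^2.
from itertools import product

faces = range(1, 7)
e_x = sum(faces) / 6                                    # E[X] = 3.5

e_xy = sum(x * y for x, y in product(faces, repeat=2)) / 36
print(e_xy, e_x * e_x)                                  # 12.25 12.25  (equal)

e_xx = sum(x * x for x in faces) / 6
print(e_xx, e_x * e_x)                                  # 15.17 12.25  (not equal)
</syntaxhighlight>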
Consequently, covariance of independent random variables is always zero.
{{Theorem
|Theorem|
:For any two independent random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{Cov}(X,Y)=0.
\end{align}</math>
}}
{{Proof|
:<math>\begin{align}
\mathbf{Cov}(X,Y)
&=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]\\
&= \mathbf{E}\left[X-\mathbf{E}[X]\right]\mathbf{E}\left[Y-\mathbf{E}[Y]\right] &\qquad(\mbox{Independence})\\
&=0.
\end{align}</math>
}}
The variance of the sum of pairwise independent random variables is equal to the sum of the variances.
{{Theorem
|Theorem|
:For '''pairwise''' independent random variables <math>X_1,X_2,\ldots,X_n</math>,
::<math>\begin{align}
\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i].
\end{align}</math>
}}
This follows by combining the theorem for the variance of a sum with the previous theorem: pairwise independence guarantees that <math>\mathbf{Cov}(X_i,X_j)=0</math> for all <math>i\neq j</math>, so all cross terms vanish.
;Remark
:The theorem holds for '''pairwise''' independent random variables, a much weaker requirement than '''mutual''' independence. This makes second-moment methods very useful for pairwise independent random variables.
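A small sketch of the remark (the construction below is a standard illustrative example, not from the notes): let <math>X,Y</math> be independent uniform <math>\pm1</math> variables and <math>Z=XY</math>. Then <math>X,Y,Z</math> are pairwise independent but not mutually independent, yet the variance of their sum still equals the sum of the variances.
<syntaxhighlight lang="python">
# X, Y uniform in {-1, +1} independently, Z = X*Y: pairwise independent,
# but Z is a function of (X, Y), so the three are not mutually independent.
# The variance of X+Y+Z still equals Var[X]+Var[Y]+Var[Z].
from itertools import product

outcomes = [(x, y, x * y) for x, y in product([-1, 1], repeat=2)]  # uniform over 4 points

def var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(var([x + y + z for x, y, z in outcomes]))               # 3.0
print(sum(var([o[i] for o in outcomes]) for i in range(3)))   # 1 + 1 + 1 = 3.0
</syntaxhighlight>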
=== Variance of binomial distribution ===
Consider a Bernoulli trial with parameter <math>p</math>:
:<math>
X=\begin{cases}
1& \mbox{with probability }p\\
0& \mbox{with probability }1-p
\end{cases}
</math>
Since <math>X</math> is 0-1 valued, <math>X^2=X</math>, so the variance is
:<math>
\mathbf{Var}[X]=\mathbf{E}[X^2]-(\mathbf{E}[X])^2=\mathbf{E}[X]-(\mathbf{E}[X])^2=p-p^2=p(1-p).
</math>
Let <math>Y</math> be a binomial random variable with parameters <math>n</math> and <math>p</math>, i.e. <math>Y=\sum_{i=1}^nY_i</math>, where the <math>Y_i</math>'s are i.i.d. Bernoulli trials with parameter <math>p</math>. The variance is
:<math>
\begin{align}
\mathbf{Var}[Y]
&=
\mathbf{Var}\left[\sum_{i=1}^nY_i\right]\\
&=
\sum_{i=1}^n\mathbf{Var}\left[Y_i\right] &\qquad (\mbox{Independence})\\
&=
\sum_{i=1}^np(1-p) &\qquad (\mbox{Bernoulli})\\
&=
p(1-p)n.
\end{align}
</math>
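A minimal simulation sketch (the parameters <math>n=100</math>, <math>p=0.3</math> are chosen only for illustration): the sample variance of binomial draws is close to <math>np(1-p)</math>.
<syntaxhighlight lang="python">
# Empirical check that Var[Y] = n*p*(1-p) for Y ~ Binomial(n, p).
import random

random.seed(0)
n, p, trials = 100, 0.3, 20000
samples = [sum(1 for _ in range(n) if random.random() < p) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((y - mean) ** 2 for y in samples) / trials
print(f"sample variance ~ {var:.2f}, n*p*(1-p) = {n * p * (1 - p):.2f}")
</syntaxhighlight>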
== Chebyshev's inequality ==
{{Theorem
|Theorem (Chebyshev's Inequality)|
:For any random variable <math>X</math> and any <math>t>0</math>,
::<math>\begin{align}
\Pr\left[|X-\mathbf{E}[X]| \ge t\right] \le \frac{\mathbf{Var}[X]}{t^2}.
\end{align}</math>
}}
{{Proof| Observe that
:<math>\Pr[|X-\mathbf{E}[X]| \ge t] = \Pr[(X-\mathbf{E}[X])^2 \ge t^2].</math>
Since <math>(X-\mathbf{E}[X])^2</math> is a nonnegative random variable, we can apply Markov's inequality, so that
:<math>
\Pr[(X-\mathbf{E}[X])^2 \ge t^2] \le
\frac{\mathbf{E}[(X-\mathbf{E}[X])^2]}{t^2}
=\frac{\mathbf{Var}[X]}{t^2}.
</math>
}}
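Putting the pieces together, a minimal sketch (parameters illustrative): for <math>Y\sim\mathrm{Bin}(n,p)</math> with <math>\mathbf{E}[Y]=np</math> and <math>\mathbf{Var}[Y]=np(1-p)</math>, Chebyshev's inequality gives <math>\Pr[|Y-np|\ge t]\le np(1-p)/t^2</math>, which can be compared against simulation.
<syntaxhighlight lang="python">
# Chebyshev bound Pr[|Y - np| >= t] <= n*p*(1-p) / t^2 for Y ~ Binomial(n, p),
# compared with the empirical tail probability.
import random

random.seed(0)
n, p, trials = 100, 0.5, 20000
mean, variance = n * p, n * p * (1 - p)
samples = [sum(1 for _ in range(n) if random.random() < p) for _ in range(trials)]

for t in [5, 10, 15, 20]:
    empirical = sum(1 for y in samples if abs(y - mean) >= t) / trials
    print(f"t={t}: empirical = {empirical:.4f}, Chebyshev bound = {min(1.0, variance / t**2):.4f}")
</syntaxhighlight>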