# Markov's Inequality

One of the most natural information about a random variable is its expectation, which is the first moment of the random variable. Markov's inequality draws a tail bound for a random variable from its expectation.

 Theorem (Markov's Inequality) Let ${\displaystyle X}$ be a random variable assuming only nonnegative values. Then, for all ${\displaystyle t>0}$, {\displaystyle {\begin{aligned}\Pr[X\geq t]\leq {\frac {\mathbf {E} [X]}{t}}.\end{aligned}}}
Proof.
 Let ${\displaystyle Y}$ be the indicator such that {\displaystyle {\begin{aligned}Y&={\begin{cases}1&{\mbox{if }}X\geq t,\\0&{\mbox{otherwise.}}\end{cases}}\end{aligned}}} It holds that ${\displaystyle Y\leq {\frac {X}{t}}}$. Since ${\displaystyle Y}$ is 0-1 valued, ${\displaystyle \mathbf {E} [Y]=\Pr[Y=1]=\Pr[X\geq t]}$. Therefore, ${\displaystyle \Pr[X\geq t]=\mathbf {E} [Y]\leq \mathbf {E} \left[{\frac {X}{t}}\right]={\frac {\mathbf {E} [X]}{t}}.}$
${\displaystyle \square }$

## Generalization

For any random variable ${\displaystyle X}$, for an arbitrary non-negative real function ${\displaystyle h}$, the ${\displaystyle h(X)}$ is a non-negative random variable. Applying Markov's inequality, we directly have that

${\displaystyle \Pr[h(X)\geq t]\leq {\frac {\mathbf {E} [h(X)]}{t}}.}$

This trivial application of Markov's inequality gives us a powerful tool for proving tail inequalities. With the function ${\displaystyle h}$ which extracts more information about the random variable, we can prove sharper tail inequalities.

# Chebyshev's inequality

## Variance

 Definition (variance) The variance of a random variable ${\displaystyle X}$ is defined as {\displaystyle {\begin{aligned}\mathbf {Var} [X]=\mathbf {E} \left[(X-\mathbf {E} [X])^{2}\right]=\mathbf {E} \left[X^{2}\right]-(\mathbf {E} [X])^{2}.\end{aligned}}} The standard deviation of random variable ${\displaystyle X}$ is ${\displaystyle \delta [X]={\sqrt {\mathbf {Var} [X]}}.}$

The variance is the diagonal case for covariance.

 Definition (covariance) The covariance of two random variables ${\displaystyle X}$ and ${\displaystyle Y}$ is {\displaystyle {\begin{aligned}\mathbf {Cov} (X,Y)=\mathbf {E} \left[(X-\mathbf {E} [X])(Y-\mathbf {E} [Y])\right].\end{aligned}}}

We have the following theorem for the variance of sum.

 Theorem For any two random variables ${\displaystyle X}$ and ${\displaystyle Y}$, {\displaystyle {\begin{aligned}\mathbf {Var} [X+Y]=\mathbf {Var} [X]+\mathbf {Var} [Y]+2\mathbf {Cov} (X,Y).\end{aligned}}} Generally, for any random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$, {\displaystyle {\begin{aligned}\mathbf {Var} \left[\sum _{i=1}^{n}X_{i}\right]=\sum _{i=1}^{n}\mathbf {Var} [X_{i}]+\sum _{i\neq j}\mathbf {Cov} (X_{i},X_{j}).\end{aligned}}}
Proof.
 The equation for two variables is directly due to the definition of variance and covariance. The equation for ${\displaystyle n}$ variables can be deduced from the equation for two variables.
${\displaystyle \square }$

For independent random variables, the expectation of a product equals the product of expectations.

 Theorem For any two independent random variables ${\displaystyle X}$ and ${\displaystyle Y}$, {\displaystyle {\begin{aligned}\mathbf {E} [X\cdot Y]=\mathbf {E} [X]\cdot \mathbf {E} [Y].\end{aligned}}}
Proof.
 {\displaystyle {\begin{aligned}\mathbf {E} [X\cdot Y]&=\sum _{x,y}xy\Pr[X=x\wedge Y=y]\\&=\sum _{x,y}xy\Pr[X=x]\Pr[Y=y]\\&=\sum _{x}x\Pr[X=x]\sum _{y}y\Pr[Y=y]\\&=\mathbf {E} [X]\cdot \mathbf {E} [Y].\end{aligned}}}
${\displaystyle \square }$

Consequently, covariance of independent random variables is always zero.

 Theorem For any two independent random variables ${\displaystyle X}$ and ${\displaystyle Y}$, {\displaystyle {\begin{aligned}\mathbf {Cov} (X,Y)=0.\end{aligned}}}
Proof.
 {\displaystyle {\begin{aligned}\mathbf {Cov} (X,Y)&=\mathbf {E} \left[(X-\mathbf {E} [X])(Y-\mathbf {E} [Y])\right]\\&=\mathbf {E} \left[X-\mathbf {E} [X]\right]\mathbf {E} \left[Y-\mathbf {E} [Y]\right]&\qquad ({\mbox{Independence}})\\&=0.\end{aligned}}}
${\displaystyle \square }$

The variance of the sum of pairwise independent random variables is equal to the sum of variances.

 Theorem For pairwise independent random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$, {\displaystyle {\begin{aligned}\mathbf {Var} \left[\sum _{i=1}^{n}X_{i}\right]=\sum _{i=1}^{n}\mathbf {Var} [X_{i}].\end{aligned}}}
Remark
The theorem holds for pairwise independent random variables, a much weaker independence requirement than the mutual independence. This makes the second-moment methods very useful for pairwise independent random variables.

### Variance of binomial distribution

For a Bernoulli trial with parameter ${\displaystyle p}$.

${\displaystyle X={\begin{cases}1&{\mbox{with probability }}p\\0&{\mbox{with probability }}1-p\end{cases}}}$

The variance is

${\displaystyle \mathbf {Var} [X]=\mathbf {E} [X^{2}]-(\mathbf {E} [X])^{2}=\mathbf {E} [X]-(\mathbf {E} [X])^{2}=p-p^{2}=p(1-p).}$

Let ${\displaystyle Y}$ be a binomial random variable with parameter ${\displaystyle n}$ and ${\displaystyle p}$, i.e. ${\displaystyle Y=\sum _{i=1}^{n}Y_{i}}$, where ${\displaystyle Y_{i}}$'s are i.i.d. Bernoulli trials with parameter ${\displaystyle p}$. The variance is

{\displaystyle {\begin{aligned}\mathbf {Var} [Y]&=\mathbf {Var} \left[\sum _{i=1}^{n}Y_{i}\right]\\&=\sum _{i=1}^{n}\mathbf {Var} \left[Y_{i}\right]&\qquad ({\mbox{Independence}})\\&=\sum _{i=1}^{n}p(1-p)&\qquad ({\mbox{Bernoulli}})\\&=p(1-p)n.\end{aligned}}}

## Chebyshev's inequality

 Theorem (Chebyshev's Inequality) For any ${\displaystyle t>0}$, {\displaystyle {\begin{aligned}\Pr \left[|X-\mathbf {E} [X]|\geq t\right]\leq {\frac {\mathbf {Var} [X]}{t^{2}}}.\end{aligned}}}
Proof.
 Observe that ${\displaystyle \Pr[|X-\mathbf {E} [X]|\geq t]=\Pr[(X-\mathbf {E} [X])^{2}\geq t^{2}].}$ Since ${\displaystyle (X-\mathbf {E} [X])^{2}}$ is a nonnegative random variable, we can apply Markov's inequality, such that ${\displaystyle \Pr[(X-\mathbf {E} [X])^{2}\geq t^{2}]\leq {\frac {\mathbf {E} [(X-\mathbf {E} [X])^{2}]}{t^{2}}}={\frac {\mathbf {Var} [X]}{t^{2}}}.}$
${\displaystyle \square }$