Advanced Algorithms (Fall 2021): Finite Field Basics and Basic tail inequalities

=Field=
Let <math>S</math> be a set, '''closed''' under binary operations <math>+</math> (addition) and <math>\cdot</math> (multiplication). This gives us the following algebraic structures if the corresponding sets of axioms are satisfied.
{|class="wikitable"
!colspan="7"|Structures
!Axioms
!Operations
|-
|rowspan="9" style="background-color:#ffffcc;text-align:center;"|'''''field'''''
|rowspan="8" style="background-color:#ffffcc;text-align:center;"|'''''commutative<br>ring'''''
|rowspan="7" style="background-color:#ffffcc;text-align:center;"|'''''ring'''''
|rowspan="4" style="background-color:#ffffcc;text-align:center;"|'''''abelian<br>group'''''
|rowspan="3" style="background-color:#ffffcc;text-align:center;"|'''''group'''''
| rowspan="2" style="background-color:#ffffcc;text-align:center;"|'''''monoid'''''
|style="background-color:#ffffcc;text-align:center;"|'''''semigroup'''''
|1. '''Addition''' is '''associative''': <math>\forall x,y,z\in S, (x+y)+z= x+(y+z).</math>
|rowspan="4" style="text-align:center;"|<math>+</math>
|-
|
|2. Existence of '''additive identity 0''': <math>\forall x\in S, x+0= 0+x=x.</math>
|-
|colspan="2"|
|3. Everyone has an '''additive inverse''': <math>\forall x\in S, \exists -x\in S, \text{ s.t. } x+(-x)= (-x)+x=0.</math>
|-
|colspan="3"|
|4. '''Addition''' is '''commutative''': <math>\forall x,y\in S, x+y= y+x.</math>
|-
|colspan="4" rowspan="3"|
|5. Multiplication '''distributes''' over addition: <math>\forall x,y,z\in S, x\cdot(y+z)= x\cdot y+x\cdot z</math> and <math>(y+z)\cdot x= y\cdot x+z\cdot x.</math>
|style="text-align:center;"|<math>+,\cdot</math>
|-
|6. '''Multiplication''' is '''associative''': <math>\forall x,y,z\in S, (x\cdot y)\cdot z= x\cdot (y\cdot z).</math>
|rowspan="4" style="text-align:center;"|<math>\cdot</math>
|-
|7. Existence of '''multiplicative identity 1''':  <math>\forall x\in S, x\cdot 1= 1\cdot x=x.</math>
|-
|colspan="5"|
|8. '''Multiplication''' is '''commutative''': <math>\forall x,y\in S, x\cdot y= y\cdot x.</math>
|-
|colspan="6"|
|9. Every non-zero element has a '''multiplicative inverse''': <math>\forall x\in S\setminus\{0\}, \exists x^{-1}\in S, \text{ s.t. } x\cdot x^{-1}= x^{-1}\cdot x=1.</math>
|}
The semigroup, monoid, group and abelian group are given by <math>(S,+)</math>, and the ring, commutative ring, and field are given by <math>(S,+,\cdot)</math>.


Examples:
* '''Infinite fields''': <math>\mathbb{Q}</math>, <math>\mathbb{R}</math>, <math>\mathbb{C}</math> are fields. The integer set <math>\mathbb{Z}</math> is a commutative ring but is not a field.
* '''Finite fields''': Finite fields are called '''Galois fields'''. The number of elements of a finite field is called its '''order'''. A finite field of order <math>q</math>, is usually denoted as <math>\mathsf{GF}(q)</math> or <math>\mathbb{F}_q</math>.
** '''Prime field''' <math>{\mathbb{Z}_p}</math>: For any integer <math>n>1</math>, <math>\mathbb{Z}_n=\{0,1,\ldots,n-1\}</math> under modulo-<math>n</math> addition <math>+</math> and multiplication <math>\cdot</math> forms a commutative ring. It is called the '''quotient ring''', and is sometimes denoted as <math>\mathbb{Z}/n\mathbb{Z}</math>.  In particular, for '''prime''' <math>p</math>, <math>\mathbb{Z}_p</math> is a field. This can be verified by [http://en.wikipedia.org/wiki/Fermat%27s_little_theorem Fermat's little theorem].
** '''Boolean arithmetic''' <math>\mathsf{GF}(2)</math>: The finite field of order 2, <math>\mathsf{GF}(2)</math>, contains only two elements, 0 and 1, with bit-wise XOR as addition and bit-wise AND as multiplication. More generally, <math>\mathsf{GF}(2^n)</math> denotes the finite field of order <math>2^n</math>, which can be constructed from <math>\mathsf{GF}(2)</math> by the polynomial quotient construction described below.
** Other examples: There are other examples of finite fields, for instance <math>\{a+bi\mid a,b\in \mathbb{Z}_3\}</math> where <math>i=\sqrt{-1}</math>. This field is isomorphic to <math>\mathsf{GF}(9)</math>. In fact, the following theorem holds for finite fields of given order.
{{Theorem|Theorem|
:A finite field of order <math>q</math> exists if and only if <math>q=p^k</math> for some prime number <math>p</math> and positive integer <math>k</math>. Moreover, all fields of a given order are isomorphic.
}}
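To make the prime-field example above concrete, here is a small Python sketch (an illustration added here, not part of the original notes; the brute-force check and the choice <math>p=7</math> are arbitrary) that tests which <math>\mathbb{Z}_n</math> satisfy axiom 9, and computes inverses in <math>\mathbb{Z}_7</math> via Fermat's little theorem.

<pre>
# Sanity check (not from the notes): in Z_n, every nonzero element has a
# multiplicative inverse exactly when n is prime.  For prime p the inverse
# can be computed via Fermat's little theorem as a^(p-2) mod p.

def has_all_inverses(n):
    """Return True iff every nonzero a in Z_n has some b with a*b = 1 (mod n)."""
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

for n in range(2, 13):
    print(n, "field" if has_all_inverses(n) else "not a field")

p = 7
for a in range(1, p):
    inv = pow(a, p - 2, p)          # Fermat's little theorem: a^(p-1) = 1 (mod p)
    assert a * inv % p == 1
    print(f"{a}^(-1) = {inv} (mod {p})")
</pre>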
=Polynomial over a field=
Given a field <math>\mathbb{F}</math>, the '''polynomial ring''' <math>\mathbb{F}[x]</math> consists of all polynomials in the variable <math>x</math> with coefficients in <math>\mathbb{F}</math>. Addition and multiplication of polynomials are naturally defined by applying the distributive law and combining like terms.

{{Theorem|Proposition (polynomial ring)|
:<math>\mathbb{F}[x]</math> is a ring.
}}

The '''degree''' <math>\mathrm{deg}(f)</math> of a polynomial <math>f\in \mathbb{F}[x]</math> is the exponent on the '''leading term''', the term with a nonzero coefficient that has the largest exponent.

Because <math>\mathbb{F}[x]</math> is a ring, we cannot do division the way we do it in a field like <math>\mathbb{R}</math>, but we can do division the way we do it in a ring like <math>\mathbb{Z}</math>, leaving a '''remainder'''. The equivalent of the '''integer division''' for <math>\mathbb{Z}</math> is as follows.

{{Theorem|Proposition (division for polynomials)|
:Given a polynomial <math>f</math> and a nonzero polynomial <math>g</math> in <math>\mathbb{F}[x]</math>, there are unique polynomials <math>q</math> and <math>r</math> such that <math>f =q\cdot g+r</math> and <math>\mathrm{deg}(r)<\mathrm{deg}(g)</math>.
}}
The proof of this is by induction on <math>\mathrm{deg}(f)</math>, with the basis <math>\mathrm{deg}(f)<\mathrm{deg}(g)</math>, in which case the theorem holds trivially by letting <math>q=0</math> and <math>r=f</math>.

As we turn <math>\mathbb{Z}</math> (a ring) into a finite field <math>\mathbb{Z}_p</math> by taking quotients <math>\bmod p</math>, we can turn a polynomial ring <math>\mathbb{F}[x]</math> into a finite field by taking <math>\mathbb{F}[x]</math> modulo a "prime-like" polynomial, using the division of polynomials above.

{{Theorem|Definition (irreducible polynomial)|
:An '''irreducible polynomial''', or a '''prime polynomial''', is a non-constant polynomial <math>f</math> that ''cannot'' be factored as <math>f=g\cdot h</math> for any non-constant polynomials <math>g</math> and <math>h</math>.
}}
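To illustrate the quotient construction above, here is a small Python sketch (added here, not part of the original notes; the modulus <math>x^2+x+1</math>, which is irreducible over <math>\mathsf{GF}(2)</math>, is a fixed illustrative choice) that realizes <math>\mathsf{GF}(4)</math> as <math>\mathsf{GF}(2)[x]</math> modulo <math>x^2+x+1</math> and verifies that every nonzero element has a multiplicative inverse.

<pre>
# Sketch (not from the notes) of the quotient construction for GF(4):
# elements are polynomials a0 + a1*x over GF(2), multiplied modulo the
# irreducible polynomial x^2 + x + 1 (so x^2 is reduced to x + 1).

def add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    a0, a1 = u
    b0, b1 = v
    c0 = a0 * b0            # constant term
    c1 = a0 * b1 + a1 * b0  # x term
    c2 = a1 * b1            # x^2 term, reduced using x^2 = x + 1
    return ((c0 + c2) % 2, (c1 + c2) % 2)

elements = [(a0, a1) for a0 in (0, 1) for a1 in (0, 1)]
one, zero = (1, 0), (0, 0)

# every nonzero element has a multiplicative inverse, so GF(4) is a field
for u in elements:
    if u != zero:
        assert any(mul(u, v) == one for v in elements)
print("GF(4) constructed: every nonzero element is invertible")
</pre>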
=Markov's Inequality=
One of the most natural pieces of information about a random variable is its expectation, which is the first moment of the random variable. Markov's inequality derives a tail bound for a random variable from its expectation.
{{Theorem
|Theorem (Markov's Inequality)|
:Let <math>X</math> be a random variable assuming only nonnegative values. Then, for all <math>t>0</math>,
::<math>\begin{align}
\Pr[X\ge t]\le \frac{\mathbf{E}[X]}{t}.
\end{align}</math>
}}
{{Proof| Let <math>Y</math> be the indicator such that
:<math>\begin{align}
Y &=
\begin{cases}
1 & \mbox{if }X\ge t,\\
0 & \mbox{otherwise.}
\end{cases}
\end{align}</math>
 
It holds that <math>Y\le\frac{X}{t}</math>. Since <math>Y</math> is 0-1 valued, <math>\mathbf{E}[Y]=\Pr[Y=1]=\Pr[X\ge t]</math>. Therefore,
:<math>
\Pr[X\ge t]
=
\mathbf{E}[Y]
\le
\mathbf{E}\left[\frac{X}{t}\right]
=\frac{\mathbf{E}[X]}{t}.
</math>
}}
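The inequality can also be checked numerically. The following is a minimal Monte Carlo sketch (an added illustration, not part of the notes; the exponential distribution and the sample size are arbitrary choices) comparing the empirical tail with the bound <math>\mathbf{E}[X]/t</math>.

<pre>
# Sanity check of Markov's inequality (illustrative sketch, not from the notes):
# compare the empirical tail Pr[X >= t] with the bound E[X]/t for a nonnegative X.
import random

random.seed(0)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]   # X >= 0 with E[X] = 1
mean = sum(samples) / n

for t in (1, 2, 5, 10):
    tail = sum(x >= t for x in samples) / n
    print(f"t={t}: empirical Pr[X>=t] = {tail:.4f} <= E[X]/t = {mean / t:.4f}")
</pre>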
 
== Generalization ==
For any random variable <math>X</math> and any non-negative real function <math>h</math>, <math>h(X)</math> is a non-negative random variable. Applying Markov's inequality, for any <math>t>0</math> we directly have that
:<math>
\Pr[h(X)\ge t]\le\frac{\mathbf{E}[h(X)]}{t}.
</math>
 
This trivial application of Markov's inequality gives us a powerful tool for proving tail inequalities. By choosing a function <math>h</math> that extracts more information about the random variable, we can prove sharper tail inequalities.
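As a tiny numerical illustration of this idea (added here, not part of the notes; the exponential random variable and the choice <math>h(X)=X^2</math> are arbitrary), applying Markov's inequality to <math>h(X)=X^2</math> already beats the plain first-moment bound for large <math>t</math>.

<pre>
# Illustration (sketch, not from the notes): applying Markov's inequality to
# h(X) = X^2 can beat the plain first-moment bound.  For X ~ Exponential(1),
# E[X] = 1 and E[X^2] = 2, and the exact tail is Pr[X >= t] = exp(-t).
import math

for t in (2, 4, 8):
    first_moment = 1 / t            # E[X]/t
    second_moment = 2 / t ** 2      # E[X^2]/t^2  (Markov applied to h(X) = X^2)
    exact = math.exp(-t)
    print(f"t={t}: exact={exact:.5f}  E[X]/t={first_moment:.4f}  E[X^2]/t^2={second_moment:.4f}")
</pre>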
 
=Chebyshev's inequality=
 
== Variance ==
{{Theorem
|Definition (variance)|
:The '''variance''' of a random variable <math>X</math> is defined as
::<math>\begin{align}
\mathbf{Var}[X]=\mathbf{E}\left[(X-\mathbf{E}[X])^2\right]=\mathbf{E}\left[X^2\right]-(\mathbf{E}[X])^2.
\end{align}</math>
:The '''standard deviation''' of random variable <math>X</math> is
::<math>
\delta[X]=\sqrt{\mathbf{Var}[X]}.
</math>
}}
 
The variance is the diagonal case of covariance: <math>\mathbf{Var}[X]=\mathbf{Cov}(X,X)</math>.
{{Theorem
|Definition (covariance)|
:The '''covariance''' of two random variables <math>X</math> and <math>Y</math> is
::<math>\begin{align}
\mathbf{Cov}(X,Y)=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right].
\end{align}</math>
}}


We have the following theorem for the variance of a sum.
 
{{Theorem
|Theorem|
:For any two random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{Var}[X+Y]=\mathbf{Var}[X]+\mathbf{Var}[Y]+2\mathbf{Cov}(X,Y).
\end{align}</math>
:Generally, for any random variables <math>X_1,X_2,\ldots,X_n</math>,
::<math>\begin{align}
\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{Cov}(X_i,X_j).
\end{align}</math>
}}
{{Proof| The equation for two variables follows directly from the definitions of variance and covariance. The equation for <math>n</math> variables can be deduced from the two-variable case by induction on <math>n</math>.
}}
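The identity is easy to verify exactly on a small joint distribution. The following sketch (an added illustration, not part of the notes; the joint distribution below is an arbitrary example) computes both sides directly from the definitions.

<pre>
# Exact check of Var[X+Y] = Var[X] + Var[Y] + 2 Cov(X,Y) on a small joint
# distribution (illustrative sketch; the joint pmf below is arbitrary).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # Pr[X=x, Y=y]

def E(f):
    return sum(p * f(x, y) for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - EX) ** 2)
var_y = E(lambda x, y: (y - EY) ** 2)
cov = E(lambda x, y: (x - EX) * (y - EY))
var_sum = E(lambda x, y: (x + y - EX - EY) ** 2)

print(var_sum, var_x + var_y + 2 * cov)   # the two sides agree
</pre>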
For independent random variables, the expectation of a product equals the product of expectations.
{{Theorem
|Theorem|
:For any two independent random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{E}[X\cdot Y]=\mathbf{E}[X]\cdot\mathbf{E}[Y].
\end{align}</math>
}}
{{Proof|
:<math>
\begin{align}
\mathbf{E}[X\cdot Y]
&=
\sum_{x,y}xy\Pr[X=x\wedge Y=y]\\
&=
\sum_{x,y}xy\Pr[X=x]\Pr[Y=y]\\
&=
\sum_{x}x\Pr[X=x]\sum_{y}y\Pr[Y=y]\\
&=
\mathbf{E}[X]\cdot\mathbf{E}[Y].
\end{align}
</math>
}}


Consequently, covariance of independent random variables is always zero.
{{Theorem
|Theorem|
:For any two independent random variables <math>X</math> and <math>Y</math>,
::<math>\begin{align}
\mathbf{Cov}(X,Y)=0.
\end{align}</math>
}}
{{Proof|  
:<math>\begin{align}
\mathbf{Cov}(X,Y)
&=\mathbf{E}\left[(X-\mathbf{E}[X])(Y-\mathbf{E}[Y])\right]\\
&= \mathbf{E}\left[X-\mathbf{E}[X]\right]\mathbf{E}\left[Y-\mathbf{E}[Y]\right] &\qquad(\mbox{Independence})\\
&=0.
\end{align}</math>
}}
The variance of the sum of pairwise independent random variables is equal to the sum of variances.
{{Theorem
|Theorem|
:For '''pairwise''' independent random variables <math>X_1,X_2,\ldots,X_n</math>,
::<math>\begin{align}
\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i].
\end{align}</math>
}}
;Remark
:The theorem holds for '''pairwise''' independent random variables, a much weaker requirement than '''mutual''' independence. This makes second-moment methods very useful for pairwise independent random variables.
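A classic example is <math>X,Y</math> being independent fair bits and <math>Z=X\oplus Y</math>: the three variables are pairwise independent but not mutually independent, yet the variance of their sum still equals the sum of the variances. The following sketch (an added illustration, not part of the notes) verifies this exactly.

<pre>
# Classic example (sketch): X, Y are independent fair bits and Z = X XOR Y.
# The three are pairwise independent but not mutually independent, and the
# variance of their sum still equals the sum of their variances (= 3/4).
from fractions import Fraction
from itertools import product

outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]  # each w.p. 1/4
p = Fraction(1, 4)

def E(f):
    return sum(p * f(x, y, z) for (x, y, z) in outcomes)

mean = E(lambda x, y, z: x + y + z)
var_sum = E(lambda x, y, z: (x + y + z - mean) ** 2)
print(var_sum)                      # 3/4 = Var[X] + Var[Y] + Var[Z]

# pairwise independence: Pr[X=a, Z=c] = 1/4 for all a, c (similarly for the
# other pairs), but Pr[X=1, Y=1, Z=1] = 0, so not mutually independent.
print(E(lambda x, y, z: 1 if (x, z) == (1, 1) else 0))        # 1/4
print(E(lambda x, y, z: 1 if (x, y, z) == (1, 1, 1) else 0))  # 0
</pre>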
 
=== Variance of binomial distribution ===
Consider a Bernoulli trial with parameter <math>p</math>:
:<math>
X=\begin{cases}
1& \mbox{with probability }p\\
0& \mbox{with probability }1-p
\end{cases}
</math>
The variance is
:<math>
\mathbf{Var}[X]=\mathbf{E}[X^2]-(\mathbf{E}[X])^2=\mathbf{E}[X]-(\mathbf{E}[X])^2=p-p^2=p(1-p).
</math>
 
Let <math>Y</math> be a binomial random variable with parameters <math>n</math> and <math>p</math>, i.e. <math>Y=\sum_{i=1}^nY_i</math>, where the <math>Y_i</math>'s are i.i.d. Bernoulli trials with parameter <math>p</math>. The variance is
:<math>
\begin{align}
\mathbf{Var}[Y]
&=
\mathbf{Var}\left[\sum_{i=1}^nY_i\right]\\
&=
\sum_{i=1}^n\mathbf{Var}\left[Y_i\right] &\qquad (\mbox{Independence})\\
&=
\sum_{i=1}^np(1-p) &\qquad (\mbox{Bernoulli})\\
&=
p(1-p)n.
\end{align}
</math>
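This value can also be double-checked directly from the binomial probability mass function. A short sketch follows (an added illustration, not part of the notes; the parameters <math>n=10</math> and <math>p=0.3</math> are arbitrary).

<pre>
# Exact check (illustrative sketch) that a Binomial(n, p) variable has variance
# n*p*(1-p), computed directly from the probability mass function.
from math import comb

n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(var, n * p * (1 - p))   # both equal 2.1 up to floating-point rounding
</pre>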


== Chebyshev's inequality ==
{{Theorem
|Theorem (Chebyshev's Inequality)|
:For any random variable <math>X</math> and any <math>t>0</math>,
::<math>\begin{align}
\Pr\left[|X-\mathbf{E}[X]| \ge t\right] \le \frac{\mathbf{Var}[X]}{t^2}.
\end{align}</math>
}}
{{Proof| Observe that  
:<math>\Pr[|X-\mathbf{E}[X]| \ge t] = \Pr[(X-\mathbf{E}[X])^2 \ge t^2].</math>
Since <math>(X-\mathbf{E}[X])^2</math> is a nonnegative random variable, we can apply Markov's inequality, such that
:<math>
\Pr[(X-\mathbf{E}[X])^2 \ge t^2] \le
\frac{\mathbf{E}[(X-\mathbf{E}[X])^2]}{t^2}
=\frac{\mathbf{Var}[X]}{t^2}.
</math>
}}
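As a quick illustration of how the bound is used (added here, not part of the notes; the parameters are arbitrary), take <math>Y\sim\mathrm{Bin}(100,1/2)</math>, for which <math>\mathbf{E}[Y]=50</math> and <math>\mathbf{Var}[Y]=25</math>; Chebyshev's inequality gives <math>\Pr[|Y-50|\ge t]\le 25/t^2</math>, which the following sketch compares with the exact tail probability.

<pre>
# Applying Chebyshev's inequality (illustrative sketch): for Y ~ Binomial(100, 1/2),
# E[Y] = 50 and Var[Y] = 25, so Pr[|Y - 50| >= t] <= 25 / t^2.  Compare this
# bound with the exact tail probability computed from the pmf.
from math import comb

n = 100
pmf = [comb(n, k) / 2**n for k in range(n + 1)]
mean, var = 50, 25

for t in (5, 10, 15):
    exact = sum(pk for k, pk in enumerate(pmf) if abs(k - mean) >= t)
    print(f"t={t}: exact tail = {exact:.4f} <= Chebyshev bound = {var / t**2:.4f}")
</pre>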