Randomized Algorithms (Spring 2010)/Balls and bins

From TCS Wiki
== Probability Basics ==
We are familiar with the [http://en.wikipedia.org/wiki/Inclusion–exclusion_principle principle of inclusion-exclusion] for finite sets.
{|border="1"
|'''Principle of Inclusion-Exclusion:'''
:Let <math>S_1, S_2, \ldots, S_n</math> be <math>n</math> finite sets. Then
::<math>\begin{align}
\left|\bigcup_{1\le i\le n}S_i\right|
&=
\sum_{i=1}^n|S_i|
-\sum_{i<j}|S_i\cap S_j|
+\sum_{i<j<k}|S_i\cap S_j\cap S_k|\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\left|\bigcap_{r=1}^\ell S_{i_r}\right|
+\cdots
+(-1)^{n-1} \left|\bigcap_{i=1}^n S_i\right|.
\end{align}</math>
|}
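As a quick sanity check (not part of the notes), the identity can be verified mechanically on small concrete sets. In the sketch below, the helper `inclusion_exclusion_size` and the example sets are illustrative choices of our own:

```python
from itertools import combinations

def inclusion_exclusion_size(sets):
    """Compute |S_1 ∪ ... ∪ S_n| by summing the sizes of all
    ℓ-wise intersections with alternating signs, as in the identity."""
    total = 0
    for ell in range(1, len(sets) + 1):
        sign = (-1) ** (ell - 1)
        for combo in combinations(sets, ell):
            total += sign * len(set.intersection(*combo))
    return total

# Arbitrary example: multiples of 2, 3, and 5 below 30.
S = [set(range(0, 30, 2)), set(range(0, 30, 3)), set(range(0, 30, 5))]
assert inclusion_exclusion_size(S) == len(set.union(*S))  # both equal 22
```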
The principle can be generalized to '''probability events'''.
{|border="1"
|'''Principle of Inclusion-Exclusion:'''
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&=
\sum_{i=1}^n\Pr[\mathcal{E}_i]
-\sum_{i<j}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j]
+\sum_{i<j<k}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j\wedge \mathcal{E}_k]\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\Pr\left[\bigwedge_{r=1}^\ell \mathcal{E}_{i_r}\right]
+\cdots
+(-1)^{n-1}\Pr\left[\bigwedge_{i=1}^n \mathcal{E}_{i}\right].
\end{align}</math>
|}
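The probabilistic version can likewise be checked exactly on a finite sample space. The sketch below is an illustration with arbitrarily chosen events over one roll of a fair die; it computes the alternating sum on the right-hand side with exact fractions and compares it to the probability of the union computed directly:

```python
from fractions import Fraction
from itertools import combinations

# Uniform sample space: one roll of a fair six-sided die.
omega = range(1, 7)
# Three events, chosen arbitrarily for illustration:
events = [
    {x for x in omega if x % 2 == 0},  # E1: the roll is even
    {x for x in omega if x <= 3},      # E2: the roll is at most 3
    {x for x in omega if x == 5},      # E3: the roll is exactly 5
]

def pr(event):
    """Probability of an event under the uniform distribution."""
    return Fraction(len(event), len(omega))

# Right-hand side of the principle: alternating sum over intersections.
rhs = sum(
    (-1) ** (ell - 1) * pr(set.intersection(*combo))
    for ell in range(1, len(events) + 1)
    for combo in combinations(events, ell)
)
assert rhs == pr(set.union(*events))  # Pr of the union, computed directly
```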
The proof of the principle relies on measure theory and is omitted here. The following inequality follows (nontrivially) from the principle of inclusion-exclusion:
{|border="1"
|'''Theorem (the union bound):'''
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&\le
\sum_{i=1}^n\Pr[\mathcal{E}_i].
\end{align}</math>
|}
This inequality is known as [http://en.wikipedia.org/wiki/Boole's_inequality Boole's inequality]. It is usually referred to by its nickname, the '''union bound'''.
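To see the bound in action, one can estimate the probability of a union by simulation and compare it against the sum of the individual probabilities. The sketch below is illustrative only: the events (independent biased coin flips) and the parameters are arbitrary choices, and the union bound itself holds for arbitrary events, independent or not:

```python
import random

random.seed(0)
# Each of n = 10 biased coins comes up heads independently with
# probability p = 0.05; we estimate Pr[at least one head] by simulation.
n, p, trials = 10, 0.05, 100_000
hits = sum(
    any(random.random() < p for _ in range(n))
    for _ in range(trials)
)
estimate = hits / trials  # close to 1 - (1 - p)^n, about 0.40
union_bound = n * p       # = 0.5, the sum of the individual probabilities
assert estimate <= union_bound
```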


Let <math>X</math> be a discrete '''random variable'''.

''Revision as of 12:22, 15 January 2010''


{|border="1"
|'''Definition (Independence):'''
:Two random variables <math>X</math> and <math>Y</math> are '''independent''' if and only if
::<math>\begin{align}
\Pr[X=x\wedge Y=y]
&=
\Pr[X=x]\cdot\Pr[Y=y]
\end{align}</math>
:for all values <math>x</math> and <math>y</math>.
|}
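The definition can be checked mechanically on a finite sample space. In the sketch below (illustrative; the helper `joint_ok` and the two-dice space are our own choices), the two individual dice are independent, while one die and the sum of both dice are not:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))

def joint_ok(X, Y):
    """Check Pr[X=x ∧ Y=y] = Pr[X=x]·Pr[Y=y] for all value pairs."""
    n = Fraction(len(omega))
    for x in {X(w) for w in omega}:
        for y in {Y(w) for w in omega}:
            px = sum(1 for w in omega if X(w) == x) / n
            py = sum(1 for w in omega if Y(w) == y) / n
            pxy = sum(1 for w in omega if X(w) == x and Y(w) == y) / n
            if pxy != px * py:
                return False
    return True

first = lambda w: w[0]          # value of the first die
second = lambda w: w[1]         # value of the second die
total = lambda w: w[0] + w[1]   # sum of the two dice
assert joint_ok(first, second)      # the two dice are independent
assert not joint_ok(first, total)   # a die and the sum are not
```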

The expectation of <math>X</math> is defined as follows.

{|border="1"
|'''Definition (Expectation):'''
:The expectation of a discrete random variable <math>X</math>, denoted by <math>\mathbb{E}[X]</math>, is given by
::<math>\begin{align}
\mathbb{E}[X]
&=
\sum_{x}x\Pr[X=x],
\end{align}</math>
:where the summation is over all values <math>x</math> in the range of <math>X</math>.
|}
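For instance, the expectation of one roll of a fair six-sided die follows directly from the definition; the short sketch below computes it with exact fractions:

```python
from fractions import Fraction

# E[X] for one roll of a fair die, directly from the definition:
# sum of x · Pr[X = x] over the range of X, with Pr[X = x] = 1/6.
values = range(1, 7)
expectation = sum(Fraction(x, 6) for x in values)
assert expectation == Fraction(7, 2)  # E[X] = 3.5
```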

== Linearity of Expectation ==

== Balls-into-bins model ==

Imagine that <math>m</math> balls are thrown into <math>n</math> bins, in such a way that each ball lands in a bin chosen uniformly at random from all <math>n</math> bins, independently of the other balls. We may ask several questions regarding the distribution of balls in the bins, including:

* the probability that no bin holds more than one ball (the birthday problem);
* the expected number of balls in each bin (occupancy problem);
* the maximum load among all bins, with high probability (occupancy problem);
* the probability that no bin is empty (coupon collector problem).
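These quantities are easy to explore by simulation. The sketch below (illustrative; the parameters are chosen to match the classical birthday setting) repeatedly throws m = 23 balls into n = 365 bins and estimates the probability that some bin receives more than one ball, which is known to be just over 1/2:

```python
import random
from collections import Counter

random.seed(42)

def throw_balls(m, n):
    """Throw m balls into n bins, each bin chosen uniformly and
    independently; return the load of every bin."""
    loads = Counter(random.randrange(n) for _ in range(m))
    return [loads[b] for b in range(n)]

m, n, trials = 23, 365, 20_000
collision = 0  # count runs where some bin gets more than one ball
for _ in range(trials):
    if max(throw_balls(m, n)) > 1:
        collision += 1
# The exact birthday-collision probability for m = 23, n = 365
# is about 0.507; the empirical estimate should be close to it.
assert 0.4 < collision / trials < 0.6
```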

== The coupon collector problem ==

== Deviation bounds ==

=== Markov's inequality ===

=== Chebyshev's inequality ===

=== The coupon collector revisited ===

== The <math>k</math>-Median Problem ==