随机算法 (Randomized Algorithms, Fall 2011)/Probability Space

From TCS Wiki
In probability theory class we have learned the basic concepts of '''events''' and '''random variables'''. The following are some examples:
::{|border="1"
| '''event''' <math>\mathcal{E}</math>:
| '''random variable''' <math>X</math>:
|-
|<math>\mathcal{E}_1</math>: 75 out of 100 coin flips are HEADs
| <math>X_1</math> is the number of HEADs in the 100 coin flips
|-
|<math>\mathcal{E}_2</math>: the result  of rolling a 6-sided die is an even number (2, 4, 6)
| <math>X_2</math> is the result of rolling a die
|-
| <math>\mathcal{E}_3</math>: tomorrow is rainy
| <math>X_3=1</math> if tomorrow is rainy and <math>X_3=0</math> if otherwise
|}
 
Events and random variables can be related:
* An event can be defined from random variables <math>X, Y</math> by a predicate <math>P(X, Y)</math>. (For example, the event <math>\mathcal{E}</math> that <math>X>Y</math>.)
* A Boolean random variable <math>X</math> can be defined from an event <math>\mathcal{E}</math> as follows:
:<math>\begin{align}X=\begin{cases}1& \mathcal{E} \text{ occurs}\\
0& \text{ otherwise}\end{cases}\end{align}</math>
We say that <math>X</math> ''indicates'' the event <math>\mathcal{E}</math>. We will later see that this indicator trick is very useful for probabilistic analysis.
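For concreteness, the indicator trick can be sketched in a few lines of Python. This is our own illustration (the die example and all names here are assumptions, not part of the notes): under the uniform measure, averaging the indicator over the sample space recovers exactly the probability of the event.

```python
from fractions import Fraction

# Sample space of a six-sided die (our illustrative example).
omega = [1, 2, 3, 4, 5, 6]

def indicator(event_predicate, sample):
    """X = 1 if the event occurs on this sample, 0 otherwise."""
    return 1 if event_predicate(sample) else 0

is_even = lambda a: a % 2 == 0

# Under the uniform measure, the average of the indicator over all
# samples equals the probability of the event.
pr_even = Fraction(sum(indicator(is_even, a) for a in omega), len(omega))
print(pr_even)  # 1/2
```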
 
 
The formal definitions of events and random variables are rooted in the foundation of probability theory, which is not the topic of this class. Here we only give an informal description (you may ignore it if you feel confused):
* A '''probability space''' is a set <math>\Omega</math> of elementary events, also called '''samples'''. (For example, the six faces of a die.)
* An '''event''' is a subset of <math>\Omega</math>, i.e. a set of elementary events. (<math>\mathcal{E}_2</math>: faces 2, 4, 6 of the die.)
* A '''random variable''' is not really a variable, but a function! The function maps samples to values. (<math>X_2</math> maps the first face of the die to 1, the second face to 2, the third face to 3, ...)
 
The axiomatic foundation of probability theory was laid by [http://en.wikipedia.org/wiki/Andrey_Kolmogorov Kolmogorov], one of the greatest mathematicians of the 20th century, who advanced many very different fields of mathematics.


=Axioms of Probability=
{{Theorem|Definition (Probability space)|
A '''probability space''' is a triple <math>(\Omega,\Sigma,\Pr)</math>.
*<math>\Omega</math> is a set, called the '''sample space'''.
*<math>\Sigma\subseteq 2^{\Omega}</math> is the set of all '''events''', satisfying:
*:(A1) <math>\Omega\in\Sigma</math> and <math>\emptyset\in\Sigma</math>. (The ''certain'' event and the ''impossible'' event.)
*:(A2) If <math>A,B\in\Sigma</math>, then <math>A\cap B, A\cup B, A-B\in\Sigma</math>. (The intersection, union, and difference of two events are events.)
* A '''probability measure''' <math>\Pr:\Sigma\rightarrow\mathbb{R}</math> is a function that maps each event to a nonnegative real number, satisfying:
*:(A3) <math>\Pr(\Omega)=1</math>.
*:(A4) If <math>A\cap B=\emptyset</math> (such events are called ''disjoint'' events), then <math>\Pr(A\cup B)=\Pr(A)+\Pr(B)</math>.
*:(A5*) For a decreasing sequence of events <math>A_1\supset A_2\supset \cdots\supset A_n\supset\cdots</math> with <math>\bigcap_n A_n=\emptyset</math>, it holds that <math>\lim_{n\rightarrow \infty}\Pr(A_n)=0</math>.
}}
The sample space <math>\Omega</math> is the set of all possible outcomes of the random process modeled by the probability space. An event is a subset of <math>\Omega</math>. The statements (A1)--(A5*) are axioms of probability. A probability space is well defined as long as these axioms are satisfied.
;Example
:Consider the probability space defined by rolling a fair die with six faces. The sample space is <math>\Omega=\{1,2,3,4,5,6\}</math>, and <math>\Sigma=2^{\Omega}</math>. For any event <math>A\in\Sigma</math>, its probability is given by <math>\Pr(A)=\frac{|A|}{6}</math>.
;Remark
* In general, the set <math>\Omega</math> may be continuous, but we only consider '''discrete''' probability in this lecture, thus we assume that <math>\Omega</math> is either finite or countably infinite.
* In many cases (such as the above example), <math>\Sigma=2^{\Omega}</math>, i.e. the set of events contains all subsets of <math>\Omega</math>. But in general, a probability space is well-defined for any <math>\Sigma</math> satisfying (A1) and (A2). Such a <math>\Sigma</math> is called a <math>\sigma</math>-algebra defined on <math>\Omega</math>.
* The last axiom (A5*) is redundant if <math>\Sigma</math> is finite, thus it is only essential when there are infinitely many events. The role of axiom (A5*) in probability theory is like Zorn's Lemma (or equivalently the Axiom of Choice) in axiomatic set theory.
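Since the die example is finite, the axioms can be checked mechanically. The following Python sketch is our own illustration (the variable names are assumptions): it builds <math>\Sigma=2^{\Omega}</math> explicitly and verifies (A1)--(A4) for the uniform measure.

```python
from itertools import chain, combinations
from fractions import Fraction

# The die example: Omega = {1,...,6}, Sigma = 2^Omega, Pr(A) = |A|/6.
omega = frozenset(range(1, 7))
sigma = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

def pr(a):
    return Fraction(len(a), len(omega))

# (A1) the certain event and the impossible event are events
assert omega in sigma and frozenset() in sigma
# (A2) closure under intersection, union, and difference
a, b = frozenset({2, 4, 6}), frozenset({1, 2, 3})
assert a & b in sigma and a | b in sigma and a - b in sigma
# (A3) Pr(Omega) = 1
assert pr(omega) == 1
# (A4) additivity for disjoint events
c, d = frozenset({1, 2}), frozenset({5, 6})
assert pr(c | d) == pr(c) + pr(d)
print("axioms (A1)-(A4) hold on this finite space")
```

(A5*) needs no check here: on a finite space a strictly decreasing chain of events with empty intersection terminates at <math>\emptyset</math>.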


Laws for probability can be deduced from the above axiom system. Denote <math>\bar{A}=\Omega-A</math>.
{{Theorem|Proposition|
:<math>\Pr(\bar{A})=1-\Pr(A)</math>.
}}
{{Proof|
Since <math>A</math> and <math>\bar{A}</math> are disjoint and <math>A\cup\bar{A}=\Omega</math>, due to Axiom (A4), <math>\Pr(\bar{A})+\Pr(A)=\Pr(\Omega)</math>, which is equal to 1 according to Axiom (A3); thus <math>\Pr(\bar{A})+\Pr(A)=1</math>. The proposition follows.
}}

Exercise: Deduce other useful laws for probability from the axioms. For example, <math>A\subseteq B\Longrightarrow\Pr(A)\le\Pr(B)</math>.
 
= Notation =
An event <math>A\subseteq\Omega</math> can be represented as <math>A=\{a\in\Omega\mid \mathcal{E}(a)\}</math> with a predicate <math>\mathcal{E}</math>.
 
The predicate notation of probability is
:<math>\Pr[\mathcal{E}]=\Pr(\{a\in\Omega\mid \mathcal{E}(a)\})</math>.
;Example
: We still consider the probability space defined by rolling a six-sided die. The sample space is <math>\Omega=\{1,2,3,4,5,6\}</math>. Consider the event that the outcome is odd.
:: <math>\Pr[\text{ the outcome is odd }]=\Pr(\{1,3,5\})</math>.
 
During the lecture, we mostly use the predicate notation instead of subset notation.
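The translation between the two notations is mechanical, as the following Python sketch shows (our own illustration; the helper name is an assumption): a predicate picks out the subset of <math>\Omega</math> it defines, and the probability of the predicate is the probability of that subset.

```python
from fractions import Fraction

# Predicate notation: Pr[E] is shorthand for Pr({a in Omega | E(a)}).
omega = range(1, 7)  # the six-sided die

def pr_of_predicate(pred):
    event = {a for a in omega if pred(a)}  # the subset the predicate defines
    return Fraction(len(event), len(omega))

print(pr_of_predicate(lambda a: a % 2 == 1))  # Pr[outcome is odd] = 1/2
```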


= The Union Bound =
We are familiar with the [http://en.wikipedia.org/wiki/Inclusion–exclusion_principle principle of inclusion-exclusion] for finite sets.
{{Theorem
|Principle of Inclusion-Exclusion|
:Let <math>S_1, S_2, \ldots, S_n</math> be <math>n</math> finite sets. Then
::<math>\begin{align}
\left|\bigcup_{1\le i\le n}S_i\right|
&=
\sum_{i=1}^n|S_i|
-\sum_{i<j}|S_i\cap S_j|
+\sum_{i<j<k}|S_i\cap S_j\cap S_k|\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\left|\bigcap_{r=1}^\ell S_{i_r}\right|
+\cdots
+(-1)^{n-1} \left|\bigcap_{i=1}^n S_i\right|.
\end{align}</math>
}}

The principle can be generalized to probability events.
{{Theorem
|Principle of Inclusion-Exclusion for Probability|
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&=
\sum_{i=1}^n\Pr[\mathcal{E}_i]
-\sum_{i<j}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j]
+\sum_{i<j<k}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j\wedge \mathcal{E}_k]\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\Pr\left[\bigwedge_{r=1}^\ell \mathcal{E}_{i_r}\right]
+\cdots
+(-1)^{n-1}\Pr\left[\bigwedge_{i=1}^n \mathcal{E}_{i}\right].
\end{align}</math>
}}


We only prove the basic case for two events.
{{Theorem|Lemma|
:For any two events <math>\mathcal{E}_1</math> and <math>\mathcal{E}_2</math>,
::<math>\Pr[\mathcal{E}_1\vee\mathcal{E}_2]=\Pr[\mathcal{E}_1]+\Pr[\mathcal{E}_2]-\Pr[\mathcal{E}_1\wedge\mathcal{E}_2]</math>.
}}
{{Proof| The following equalities are due to Axiom (A4), since in each case the event on the left is the union of the two disjoint events on the right.
:<math>\begin{align}
\Pr[\mathcal{E}_1]
&=\Pr[\mathcal{E}_1\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2];\\
\Pr[\mathcal{E}_2]
&=\Pr[\mathcal{E}_2\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2];\\
\Pr[\mathcal{E}_1\vee\mathcal{E}_2]
&=\Pr[\mathcal{E}_1\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_2\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2].
\end{align}</math>
The lemma follows directly.
}}
 
A direct consequence of the lemma is the following theorem, the '''union bound'''.
{{Theorem
|Theorem (Union Bound)|
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&\le
\sum_{i=1}^n\Pr[\mathcal{E}_i].
\end{align}</math>
}}
The name of this inequality is [http://en.wikipedia.org/wiki/Boole's_inequality Boole's inequality]. It is usually referred to by its nickname, the "union bound". The bound holds for arbitrary events, even if they are dependent. Due to this generality, the union bound is extremely useful in probabilistic analysis.
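Both statements can be verified numerically on the die example. The following Python sketch (our own illustration; the two events are assumptions chosen to overlap) checks that inclusion-exclusion holds with equality while the union bound only gives an upper bound.

```python
from fractions import Fraction

omega = range(1, 7)  # the six-sided die

def pr(event):
    return Fraction(len(event), len(omega))

# Two overlapping dice events: "outcome is even" and "outcome is at most 3".
e1 = {a for a in omega if a % 2 == 0}  # {2, 4, 6}
e2 = {a for a in omega if a <= 3}      # {1, 2, 3}

# Inclusion-exclusion holds with equality...
assert pr(e1 | e2) == pr(e1) + pr(e2) - pr(e1 & e2)
# ...while the union bound only upper-bounds, since the events overlap.
assert pr(e1 | e2) <= pr(e1) + pr(e2)
print(pr(e1 | e2), "<=", pr(e1) + pr(e2))  # 5/6 <= 1
```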
 
= Independence =
{{Theorem
|Definition (Independent events)|
:Two events <math>\mathcal{E}_1</math> and <math>\mathcal{E}_2</math> are '''independent''' if and only if
::<math>\begin{align}
\Pr\left[\mathcal{E}_1 \wedge \mathcal{E}_2\right]
&=
\Pr[\mathcal{E}_1]\cdot\Pr[\mathcal{E}_2].
\end{align}</math>
}}
This definition can be generalized to any number of events:
{{Theorem
|Definition (Independent events)|
:Events <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> are '''mutually independent''' if and only if, for any subset <math>I\subseteq\{1,2,\ldots,n\}</math>,
::<math>\begin{align}
\Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right]
&=
\prod_{i\in I}\Pr[\mathcal{E}_i].
\end{align}</math>
}}
 
Note that in probability theory, "mutual independence" is <font color="red">not</font> equivalent to "pairwise independence"; we will learn more about this in the future.
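The gap between the two notions can already be seen in a classic small example (our own illustration, not from the lecture): flip two fair coins, and consider the events "first flip is HEAD", "second flip is HEAD", and "the two flips agree". The following Python sketch checks that these three events are pairwise independent but not mutually independent.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, encoded as pairs of bits.
omega = list(product([0, 1], repeat=2))

def pr(pred):
    return Fraction(sum(1 for s in omega if pred(s)), len(omega))

e1 = lambda s: s[0] == 1     # first flip is HEAD
e2 = lambda s: s[1] == 1     # second flip is HEAD
e3 = lambda s: s[0] == s[1]  # the two flips agree

# Every pair of events is independent...
assert pr(lambda s: e1(s) and e2(s)) == pr(e1) * pr(e2)
assert pr(lambda s: e1(s) and e3(s)) == pr(e1) * pr(e3)
assert pr(lambda s: e2(s) and e3(s)) == pr(e2) * pr(e3)
# ...but the three events are not mutually independent:
# Pr[e1 ∧ e2 ∧ e3] = 1/4, while the product of the three is 1/8.
assert pr(lambda s: e1(s) and e2(s) and e3(s)) != pr(e1) * pr(e2) * pr(e3)
print("pairwise independent, but not mutually independent")
```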

Revision as of 15:12, 22 July 2011
