随机算法 (Fall 2011) and 随机算法 (Fall 2011)/Probability Space: Difference between pages

From TCS Wiki
{{Infobox
|name         = Infobox
|bodystyle    =
|title        = <font size=3>随机算法
<br>Randomized Algorithms</font>
|titlestyle   =
|image        =
|imagestyle   =
|caption      =
|captionstyle =
|headerstyle  = background:#ccf;
|labelstyle   = background:#ddf;
|datastyle    =
|header1 = Instructor
|label1  =
|data1   =
|header2 =
|label2  =
|data2   = 尹一通
|header3 =
|label3  = Email
|data3   = yitong.yin@gmail.com  yinyt@nju.edu.cn
|header4 =
|label4  = Office
|data4   = 计算机系 804
|header5 = Class
|label5  =
|data5   =
|header6 =
|label6  = Class meetings
|data6   = Monday, 10am-12pm <br> 仙逸C-420
|header7 =
|label7  = Place
|data7   =
|header8 =
|label8  = Office hours
|data8   = Wednesday, 2-5pm <br>计算机系 804
|header9 = Textbooks
|label9  =
|data9   =
|header10 =
|label10  =
|data10   =
{{Infobox
|name         =
|bodystyle    =
|title        =
|titlestyle   =
|image        = [[File:MR-randomized-algorithms.png|border|100px]]
|imagestyle   =
|caption      = ''Randomized Algorithms'',<br>Motwani and Raghavan, Cambridge Univ Press, 1995.
|captionstyle =
}}
|header11 =
|label11  =
|data11   =
{{Infobox
|name         =
|bodystyle    =
|title        =
|titlestyle   =
|image        = [[File:Probability_and_Computing.png|border|100px]]
|imagestyle   =
|caption      = ''Probability and Computing: Randomized Algorithms and Probabilistic Analysis'', Mitzenmacher and Upfal, Cambridge Univ Press, 2005.
|captionstyle =
}}
|belowstyle = background:#ddf;
|below =
}}

=Axioms of Probability=
The axiomatic foundation of probability theory was laid by [http://en.wikipedia.org/wiki/Andrey_Kolmogorov Kolmogorov], one of the greatest mathematicians of the 20th century, who advanced many very different fields of mathematics.

{{Theorem|Definition (Probability Space)|
A '''probability space''' is a triple <math>(\Omega,\Sigma,\Pr)</math>.
*<math>\Omega</math> is a set, called the '''sample space'''.
*<math>\Sigma\subseteq 2^{\Omega}</math> is the set of all '''events''', satisfying:
*:(A1). <math>\Omega\in\Sigma</math> and <math>\empty\in\Sigma</math>. (The ''certain'' event and the ''impossible'' event.)
*:(A2). If <math>A,B\in\Sigma</math>, then <math>A\cap B, A\cup B, A-B\in\Sigma</math>. (The intersection, union, and difference of two events are events.)
* A '''probability measure''' <math>\Pr:\Sigma\rightarrow\mathbb{R}</math> is a function that maps each event to a nonnegative real number, satisfying
*:(A3). <math>\Pr(\Omega)=1</math>.
*:(A4). If <math>A\cap B=\emptyset</math> (such events are called ''disjoint'' events), then <math>\Pr(A\cup B)=\Pr(A)+\Pr(B)</math>.
*:(A5*). For a decreasing sequence of events <math>A_1\supset A_2\supset \cdots\supset A_n\supset\cdots</math> with <math>\bigcap_n A_n=\emptyset</math>, it holds that <math>\lim_{n\rightarrow \infty}\Pr(A_n)=0</math>.
}}
The sample space <math>\Omega</math> is the set of all possible outcomes of the random process modeled by the probability space. An event is a subset of <math>\Omega</math>. The statements (A1)--(A5*) are the axioms of probability. A probability space is well defined as long as these axioms are satisfied.
;Example
:Consider the probability space defined by rolling a six-sided die. The sample space is <math>\Omega=\{1,2,3,4,5,6\}</math>, and <math>\Sigma</math> is the power set <math>2^{\Omega}</math>. For any event <math>A\in\Sigma</math>, its probability is given by <math>\Pr(A)=\frac{|A|}{6}</math>.

;Remark
* In general, the set <math>\Omega</math> may be continuous, but we only consider '''discrete''' probability in this lecture; thus we assume that <math>\Omega</math> is either finite or countably infinite.
* In many cases (such as the above example), <math>\Sigma=2^{\Omega}</math>, i.e. the events enumerate all subsets of <math>\Omega</math>. But in general, a probability space is well defined by any <math>\Sigma</math> satisfying (A1) and (A2). Such a <math>\Sigma</math> is called a <math>\sigma</math>-algebra defined on <math>\Omega</math>.
* The last axiom (A5*) is redundant if <math>\Sigma</math> is finite; thus it is only essential when there are infinitely many events. The role of axiom (A5*) in probability theory is like that of [http://en.wikipedia.org/wiki/Zorn's_lemma Zorn's Lemma] (or equivalently the [http://en.wikipedia.org/wiki/Axiom_of_choice Axiom of Choice]) in axiomatic set theory.

Laws for probability can be deduced from the above axiom system. Denote <math>\bar{A}=\Omega-A</math>.
{{Theorem|Proposition|
:<math>\Pr(\bar{A})=1-\Pr(A)</math>.
}}
{{Proof|
Since <math>A</math> and <math>\bar{A}</math> are disjoint, Axiom (A4) gives <math>\Pr(\bar{A})+\Pr(A)=\Pr(\Omega)</math>, which equals 1 by Axiom (A3); thus <math>\Pr(\bar{A})+\Pr(A)=1</math>. The proposition follows.
}}
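The die example and the complement law can be checked mechanically on the finite sample space. A minimal Python sketch (the helper <code>pr</code> and the chosen events are illustrative, not part of the course material):

```python
from fractions import Fraction

# Sample space of a fair six-sided die; Sigma is implicitly the power set,
# and every event A has Pr(A) = |A| / 6, as in the example above.
OMEGA = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event, i.e. a subset of OMEGA."""
    assert event <= OMEGA
    return Fraction(len(event), len(OMEGA))

A = {1, 2}                 # an arbitrary event
complement = OMEGA - A     # the event "not A"

# Axiom (A3) and the proposition Pr(complement of A) = 1 - Pr(A):
assert pr(OMEGA) == 1
assert pr(complement) == 1 - pr(A)

# Monotonicity, Pr(A) <= Pr(B) whenever A is a subset of B, on one instance:
B = {1, 2, 3}
assert A <= B and pr(A) <= pr(B)
```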


This is the page for the class ''Randomized Algorithms'' for the Fall 2011 semester. Students who take this class should check this page periodically for content updates and new announcements.

= Announcement =
* (09/19/2011) <font size=3 color=red>The first problem set has been posted in the Assignments section. It is due in class next Monday, September 26.</font>
* Because I need to be away, the office hour of Wednesday afternoon, September 14 is moved to the afternoon of September 13.
* The slides for the first lecture have been posted; see the Lecture Notes section. From now on, the link to each lecture's slides will appear after that lecture's title.

= Course info =
* '''Instructor''': 尹一通
:*email: yitong.yin@gmail.com, yinyt@nju.edu.cn
:*office: 计算机系 804.
* '''Class meeting''': Monday 10am-12pm, 仙逸C-420.
* '''Office hour''': Wednesday 2-5pm, 计算机系 804.

= Syllabus =
Randomization is one of the most important methods of modern computer science, and over the past two decades it has been applied widely across the field. Behind these applications lie a number of shared principles of randomization. In this course on randomized algorithms, we describe these principles in the language of mathematics, covering:
* the design ideas and theoretical analysis of several important randomized algorithms;
* probabilistic tools and their applications in algorithm analysis, including commonly used probability inequalities and the probabilistic method of mathematical proof;
* probabilistic models of randomized algorithms, including typical models of randomized algorithms and probabilistic complexity models.
As a theory course, the content emphasizes mathematical analysis and proofs. The purpose is not rigor for its own sake: solving problems in smarter ways often requires mathematical thinking and insight of real depth.

Exercise: Deduce other useful laws for probability from the axioms. For example, <math>A\subseteq B\Longrightarrow\Pr(A)\le\Pr(B)</math>.

= Notation =
An event <math>A\subseteq\Omega</math> can be represented as <math>A=\{a\in\Omega\mid \mathcal{E}(a)\}</math> with a predicate <math>\mathcal{E}</math>.

The predicate notation of probability is
:<math>\Pr[\mathcal{E}]=\Pr(\{a\in\Omega\mid \mathcal{E}(a)\})</math>.
;Example
: We again consider the probability space of rolling a six-sided die. The sample space is <math>\Omega=\{1,2,3,4,5,6\}</math>. Consider the event that the outcome is odd:
:: <math>\Pr[\text{ the outcome is odd }]=\Pr(\{1,3,5\})</math>.

Throughout the lectures, we mostly use the predicate notation instead of the subset notation.
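The predicate notation <math>\Pr[\mathcal{E}]=\Pr(\{a\in\Omega\mid \mathcal{E}(a)\})</math> translates directly into code: a predicate carves an event out of the sample space. A small sketch on the die space (names are our own):

```python
from fractions import Fraction

OMEGA = {1, 2, 3, 4, 5, 6}  # rolling a fair six-sided die

def pr(predicate):
    """Predicate notation: Pr[E] = Pr({a in OMEGA | E(a)})."""
    event = {a for a in OMEGA if predicate(a)}
    return Fraction(len(event), len(OMEGA))

# Pr[the outcome is odd] = Pr({1, 3, 5}) = 1/2
assert pr(lambda a: a % 2 == 1) == Fraction(1, 2)
```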


=== 先修课程 Prerequisites ===
* Required: discrete mathematics, probability theory, linear algebra.
* Recommended: design and analysis of algorithms.

= The Union Bound =
We are familiar with the [http://en.wikipedia.org/wiki/Inclusion–exclusion_principle principle of inclusion-exclusion] for finite sets.
{{Theorem
|Principle of Inclusion-Exclusion|
:Let <math>S_1, S_2, \ldots, S_n</math> be <math>n</math> finite sets. Then
::<math>\begin{align}
\left|\bigcup_{1\le i\le n}S_i\right|
&=
\sum_{i=1}^n|S_i|
-\sum_{i<j}|S_i\cap S_j|
+\sum_{i<j<k}|S_i\cap S_j\cap S_k|\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\left|\bigcap_{r=1}^\ell S_{i_r}\right|
+\cdots
+(-1)^{n-1} \left|\bigcap_{i=1}^n S_i\right|.
\end{align}</math>
}}
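The alternating sum in the principle of inclusion-exclusion can be verified directly against a computed union. A Python sketch (the function name is our own; it sums over all nonempty index subsets, exactly as in the formula):

```python
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """|S1 u ... u Sn| as the alternating sum over nonempty index subsets."""
    n = len(sets)
    total = 0
    for l in range(1, n + 1):                       # subset size l
        for idx in combinations(range(n), l):       # i_1 < i_2 < ... < i_l
            inter = set.intersection(*(sets[i] for i in idx))
            total += (-1) ** (l - 1) * len(inter)
    return total

S = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
# 8 (singles) - 3 (pairs) + 0 (triple) = 5 = |{1,2,3,4,5}|
assert union_size_by_inclusion_exclusion(S) == len(set().union(*S))
```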


=== Course materials ===
* [[随机算法 (Fall 2011)/Course materials|Textbooks and references]]

The principle can be generalized to probability events.
{{Theorem
|Principle of Inclusion-Exclusion for Probability|
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&=
\sum_{i=1}^n\Pr[\mathcal{E}_i]
-\sum_{i<j}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j]
+\sum_{i<j<k}\Pr[\mathcal{E}_i\wedge \mathcal{E}_j\wedge \mathcal{E}_k]\\
& \quad -\cdots
+(-1)^{\ell-1}\sum_{i_1<i_2<\cdots<i_\ell}\Pr\left[\bigwedge_{r=1}^\ell \mathcal{E}_{i_r}\right]
+\cdots
+(-1)^{n-1}\Pr\left[\bigwedge_{i=1}^n \mathcal{E}_{i}\right].
\end{align}</math>
}}
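On a discrete space the probabilistic version can be checked exhaustively as well. A sketch on the die space with three illustrative events (conjunctions of events become set intersections):

```python
from fractions import Fraction
from itertools import combinations

OMEGA = {1, 2, 3, 4, 5, 6}

def pr(event):
    return Fraction(len(event), len(OMEGA))

# Three events: the outcome is even, is at most 3, is divisible by 3.
events = [{2, 4, 6}, {1, 2, 3}, {3, 6}]

# Right-hand side: the alternating sum over all nonempty index subsets.
rhs = Fraction(0)
for l in range(1, len(events) + 1):
    for idx in combinations(range(len(events)), l):
        rhs += (-1) ** (l - 1) * pr(set.intersection(*(events[i] for i in idx)))

# Left-hand side: the probability of the disjunction (union) of the events.
assert rhs == pr(set().union(*events))  # both equal 5/6 here
```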


=== 成绩 Grades ===
* Grading: there will be six problem sets and a final exam. The final grade is determined by a combination of the homework scores and the final-exam score.
* Late submissions: if for a legitimate reason you cannot finish a problem set on time, contact the instructor in advance and explain why; otherwise late submissions will not be accepted.

We only prove the basic case of two events.
{{Theorem|Lemma|
:For any two events <math>\mathcal{E}_1</math> and <math>\mathcal{E}_2</math>,
::<math>\Pr[\mathcal{E}_1\vee\mathcal{E}_2]=\Pr[\mathcal{E}_1]+\Pr[\mathcal{E}_2]-\Pr[\mathcal{E}_1\wedge\mathcal{E}_2]</math>.
}}
{{Proof| The following identities are due to Axiom (A4):
:<math>\begin{align}
\Pr[\mathcal{E}_1]
&=\Pr[\mathcal{E}_1\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2];\\
\Pr[\mathcal{E}_2]
&=\Pr[\mathcal{E}_2\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2];\\
\Pr[\mathcal{E}_1\vee\mathcal{E}_2]
&=\Pr[\mathcal{E}_1\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_2\wedge\neg(\mathcal{E}_1\wedge\mathcal{E}_2)]+\Pr[\mathcal{E}_1\wedge\mathcal{E}_2].
\end{align}</math>
The lemma follows directly.
}}
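The three disjoint decompositions used in the proof can be checked on a concrete pair of overlapping events; a sketch, with <code>E1</code> and <code>E2</code> chosen arbitrarily:

```python
from fractions import Fraction

OMEGA = {1, 2, 3, 4, 5, 6}

def pr(event):
    return Fraction(len(event), len(OMEGA))

E1, E2 = {1, 2, 3}, {3, 4}   # two overlapping events
both = E1 & E2               # E1 and E2

# The three disjoint decompositions from the proof (Axiom (A4)):
assert pr(E1) == pr(E1 - both) + pr(both)
assert pr(E2) == pr(E2 - both) + pr(both)
assert pr(E1 | E2) == pr(E1 - both) + pr(E2 - both) + pr(both)

# The lemma itself: Pr[E1 or E2] = Pr[E1] + Pr[E2] - Pr[E1 and E2].
assert pr(E1 | E2) == pr(E1) + pr(E2) - pr(both)
```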


A direct consequence of the lemma is the following theorem, the '''union bound'''.
{{Theorem
|Theorem (Union Bound)|
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigvee_{1\le i\le n}\mathcal{E}_i\right]
&\le
\sum_{i=1}^n\Pr[\mathcal{E}_i].
\end{align}</math>
}}
This inequality is known as [http://en.wikipedia.org/wiki/Boole's_inequality Boole's inequality]; it is usually referred to by its nickname, the "union bound". The bound holds for arbitrary events, even if they are dependent. Due to this generality, the union bound is extremely useful in probabilistic analysis.

=== <font color=red> 学术诚信 Academic Integrity </font>===
Academic integrity is the most basic professional and ethical baseline for every student and scholar engaged in academic work. This course will spare no effort to uphold the norms of academic integrity, and violations of this baseline will not be tolerated.


The principle for homework: work that bears your name must be done by you. Discussion is allowed, but each problem set must be completed independently, and everyone you discussed it with must be listed on your submission. No other form of cooperation is allowed; in particular, "discussing" with a student who has already finished the assignment is not.

This course takes a zero-tolerance attitude toward plagiarism. In completing homework, both verbatim copying of text from others' work (publications, Internet materials, other students' homework, etc.) and copying of key ideas or key elements count as plagiarism, in the sense of the [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism]. Plagiarists will have their grades cancelled. If mutual copying is discovered, <font color=red>both the copying and the copied parties will have their grades cancelled</font>; please take the initiative to keep your homework from being copied by others.

Academic integrity concerns each student's personal character as well as the proper functioning of the entire educational system. Committing academic misconduct for a few points not only makes you a cheater but also devalues the honest efforts of others. Let us work together to maintain an environment of integrity.

= Assignments =
* (2011/09/19) [[随机算法 (Fall 2011)/Problem set 1|Problem set 1]] due on Sept 26, in class.
* (2011/10/10) [[随机算法 (Fall 2011)/Problem set 2|Problem set 2]] due on Oct 24, in class.
* (2011/11/07) [[随机算法 (Fall 2011)/Problem set 3|Problem set 3]] due on Nov 21, in class.

= Lecture Notes =
# Introduction  | [ftp://tcs.nju.edu.cn/slides/random2011/random1.pdf slides]
#* [[随机算法 (Fall 2011)/Randomized Algorithms: an Introduction|Randomized Algorithms: an Introduction]]
#* [[随机算法 (Fall 2011)/Complexity Classes|Complexity Classes]]
# Probability Basics |  [ftp://tcs.nju.edu.cn/slides/random2011/random2.pdf slides]
#* [[随机算法 (Fall 2011)/Probability Space|Probability Space]]
#* [[随机算法 (Fall 2011)/Verifying Matrix Multiplication|Verifying Matrix Multiplication]]
#* [[随机算法 (Fall 2011)/Conditional Probability|Conditional Probability]]
#* [[随机算法 (Fall 2011)/Randomized Min-Cut|Randomized Min-Cut]]
#* [[随机算法 (Fall 2011)/Random Variables and Expectations|Random Variables and Expectations]]
#* [[随机算法 (Fall 2011)/Randomized Quicksort|Randomized Quicksort]]
# Balls and Bins  [ftp://tcs.nju.edu.cn/slides/random2011/random3.pdf slides]
#* [[随机算法 (Fall 2011)/Distributions of Coin Flipping|Distributions of Coin Flipping]]
#* [[随机算法 (Fall 2011)/Birthday Problem|Birthday Problem]]
#* [[随机算法 (Fall 2011)/Coupon Collector|Coupon Collector]]
#* [[随机算法 (Fall 2011)/Balls-into-balls Occupancy Problem|Balls-into-balls Occupancy Problem]]
#* [[随机算法 (Fall 2011)/Bloom Filter|Bloom Filter]]
#* [[随机算法 (Fall 2011)/Stable Marriage|Stable Marriage]]
# Moment and Deviation [ftp://tcs.nju.edu.cn/slides/random2011/random4.pdf slides]
#* [[随机算法 (Fall 2011)/Markov's Inequality|Markov's Inequality]]
#* [[随机算法 (Fall 2011)/Chebyshev's Inequality|Chebyshev's Inequality]]
#* [[随机算法 (Fall 2011)/Median Selection|Median Selection]]
#* [[随机算法 (Fall 2011)/Random Graphs|Random Graphs]]
# Chernoff Bound [ftp://tcs.nju.edu.cn/slides/random2011/random5.pdf slides]
#* [[随机算法 (Fall 2011)/Chernoff Bound|Chernoff Bound]]
#* [[随机算法 (Fall 2011)/Set Balancing|Set Balancing]]
#* [[随机算法 (Fall 2011)/Routing in a Parallel Network|Routing in a Parallel Network]]
# Concentration of Measure [ftp://tcs.nju.edu.cn/slides/random2011/random6.pdf slides1]| [ftp://tcs.nju.edu.cn/slides/random2011/random7.pdf slides2]
#* [[随机算法 (Fall 2011)/Martingales|Martingales]]
#* [[随机算法 (Fall 2011)/Azuma's Inequality|Azuma's Inequality]]
#* [[随机算法 (Fall 2011)/The Method of Bounded Differences|The Method of Bounded Differences]]
# The Probabilistic Method  [ftp://tcs.nju.edu.cn/slides/random2011/random8.pdf slides]
#* [[随机算法 (Fall 2011)/Johnson-Lindenstrauss Theorem|Johnson-Lindenstrauss Theorem]]
#* [[随机算法 (Fall 2011)/Max-SAT|Max-SAT]]
#* [[随机算法 (Fall 2011)/Linear Programming|Linear Programming]]
#* [[随机算法 (Fall 2011)/Lovász Local Lemma|Lovász Local Lemma]]
# Fingerprinting and sketching
#* [[随机算法 (Fall 2011)/Identity checking|Identity checking]]
#* [[随机算法 (Fall 2011)/Checking distinctness|Checking distinctness]]
#* [[随机算法 (Fall 2011)/Data streams|Data streams]]
# Fancy hash tables
#* [[随机算法 (Fall 2011)/Universal hashing|Universal hashing]]
#* [[随机算法 (Fall 2011)/Cuckoo hashing|Cuckoo hashing]]
#* [[随机算法 (Fall 2011)/Locality sensitive hashing|Locality sensitive hashing]]
# Random Walk Algorithms
#* [[随机算法 (Fall 2011)/Randomized 2SAT|Randomized 2SAT]]
#* [[随机算法 (Fall 2011)/Randomized 3SAT|Randomized 3SAT]]
#* [[随机算法 (Fall 2011)/Perfect Matching in Regular Bipartite Graph|Perfect Matching in Regular Bipartite Graph]]
#* [[随机算法 (Fall 2011)/The Metropolis Algorithm|The Metropolis Algorithm]]
#* [[随机算法 (Fall 2011)/Dynamics on Spins|Dynamics on Spins]]
# Markov Chain and Random Walk
#* [[随机算法 (Fall 2011)/Markov Chains|Markov Chains]]
#* [[随机算法 (Fall 2011)/Random Walks on Undirected Graphs|Random Walks on Undirected Graphs]]
#* [[随机算法 (Fall 2011)/Electrical Network|Electrical Network]]
#* [[随机算法 (Fall 2011)/Cover Time|Cover Time]]
#* [[随机算法 (Fall 2011)/Graph Connectivity|Graph Connectivity]]
# Coupling and Mixing Time
#* [[随机算法 (Fall 2011)/Mixing Time|Mixing Time]]
#* [[随机算法 (Fall 2011)/Coupling|Coupling]]
#* [[随机算法 (Fall 2011)/Card Shuffling|Card Shuffling]]
#* [[随机算法 (Fall 2011)/Path Coupling|Path Coupling]]
#* [[随机算法 (Fall 2011)/Graph Coloring|Graph Coloring]]
# Expander Graphs I
#* [[随机算法 (Fall 2011)/Expander Graphs|Expander Graphs]]
#* [[随机算法 (Fall 2011)/Graph Spectrum|Graph Spectrum]]
#* [[随机算法 (Fall 2011)/The Spectral Gap|The Spectral Gap]]
#* [[随机算法 (Fall 2011)/Random Walk on Expander Graph|Random Walk on Expander Graph]]
# Expander Graphs II
#* [[随机算法 (Fall 2011)/Expander Mixing Lemma|Expander Mixing Lemma]]
#* [[随机算法 (Fall 2011)/Chernoff Bound for Expander Walks|Chernoff Bound for Expander Walks]]
#* [[随机算法 (Fall 2011)/The Zig-Zag Product|The Zig-Zag Product]]
#* [[随机算法 (Fall 2011)/USTCON in LOGSPACE|USTCON in LOGSPACE]]
# Sampling and Counting
#* [[随机算法 (Fall 2011)/The #P Class and Approximation|The #P Class and Approximation]]
#* [[随机算法 (Fall 2011)/DNF Counting|DNF Counting]]
#* [[随机算法 (Fall 2011)/Canonical Paths|Canonical Paths]]
#* [[随机算法 (Fall 2011)/Count Matchings|Count Matchings]]
#* [[随机算法 (Fall 2011)/Sampling and Counting|Sampling and Counting]]
# Markov Chain Monte Carlo (MCMC)
#* [[随机算法 (Fall 2011)/Spin Systems|Spin Systems]]
#* [[随机算法 (Fall 2011)/Simulated Annealing|Simulated Annealing]]
#* [[随机算法 (Fall 2011)/Volume Estimation|Volume Estimation]]
# Complexity

= Independence =
{{Theorem
|Definition (Independent events)|
:Two events <math>\mathcal{E}_1</math> and <math>\mathcal{E}_2</math> are '''independent''' if and only if
::<math>\begin{align}
\Pr\left[\mathcal{E}_1 \wedge \mathcal{E}_2\right]
&=
\Pr[\mathcal{E}_1]\cdot\Pr[\mathcal{E}_2].
\end{align}</math>
}}
This definition can be generalized to any number of events:
{{Theorem
|Definition (Mutually independent events)|
:Events <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> are '''mutually independent''' if and only if, for any subset <math>I\subseteq\{1,2,\ldots,n\}</math>,
::<math>\begin{align}
\Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right]
&=
\prod_{i\in I}\Pr[\mathcal{E}_i].
\end{align}</math>
}}
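The union bound stated earlier holds even for dependent events, which a quick exhaustive check on the die space illustrates (the three overlapping events below are illustrative):

```python
from fractions import Fraction

OMEGA = {1, 2, 3, 4, 5, 6}

def pr(event):
    return Fraction(len(event), len(OMEGA))

# Heavily overlapping, hence dependent, events: the union bound still holds.
events = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
union = set().union(*events)
assert pr(union) <= sum(pr(e) for e in events)  # 5/6 <= 3/2
```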

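The product-rule definition of independence can be checked exhaustively on a finite space. The following sketch uses two fair dice and exhibits three events that satisfy the product rule for every pair but not for the triple, so they are pairwise but not mutually independent (the events are our own illustrative choice):

```python
from fractions import Fraction
from itertools import combinations

# Sample space of two fair dice: ordered pairs of outcomes.
OMEGA = {(i, j) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return Fraction(len(event), len(OMEGA))

E1 = {w for w in OMEGA if w[0] % 2 == 0}            # first die is even
E2 = {w for w in OMEGA if w[1] % 2 == 1}            # second die is odd
E3 = {w for w in OMEGA if (w[0] + w[1]) % 2 == 0}   # the sum is even

# Every pair satisfies the product rule, i.e. is independent:
for A, B in combinations([E1, E2, E3], 2):
    assert pr(A & B) == pr(A) * pr(B)

# But the triple does not: an even first die and an odd second die force
# an odd sum, so the triple intersection is empty.
assert pr(E1 & E2 & E3) == 0
assert pr(E1 & E2 & E3) != pr(E1) * pr(E2) * pr(E3)
```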

Note that in probability theory, "mutual independence" is <font color="red">not</font> equivalent to "pairwise independence"; we will learn about the distinction later in the course.

= The Probability Theory Toolkit =
* [http://en.wikipedia.org/wiki/Expected_value#Linearity Linearity of expectation]
* [http://en.wikipedia.org/wiki/Independence_(probability_theory)#Independent_events Independent events] and [http://en.wikipedia.org/wiki/Conditional_independence conditional independence]
* [http://en.wikipedia.org/wiki/Conditional_probability Conditional probability] and [http://en.wikipedia.org/wiki/Conditional_expectation conditional expectation]
* The [http://en.wikipedia.org/wiki/Law_of_total_probability law of total probability] and the [http://en.wikipedia.org/wiki/Law_of_total_expectation law of total expectation]
* The [http://en.wikipedia.org/wiki/Boole's_inequality union bound]
* [http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials]
* [http://en.wikipedia.org/wiki/Geometric_distribution Geometric distribution]
* [http://en.wikipedia.org/wiki/Binomial_distribution Binomial distribution]
* [http://en.wikipedia.org/wiki/Markov's_inequality Markov's inequality]
* [http://en.wikipedia.org/wiki/Chebyshev's_inequality Chebyshev's inequality]
* [http://en.wikipedia.org/wiki/Chernoff_bound Chernoff bound]
* [http://en.wikipedia.org/wiki/Pairwise_independence k-wise independence]
* [http://en.wikipedia.org/wiki/Martingale_(probability_theory) Martingale]
* [http://en.wikipedia.org/wiki/Azuma's_inequality Azuma's inequality] and [http://en.wikipedia.org/wiki/Hoeffding's_inequality Hoeffding's inequality]
* [http://en.wikipedia.org/wiki/Doob_martingale Doob martingale]
* The [http://en.wikipedia.org/wiki/Probabilistic_method  probabilistic method]
* The [http://en.wikipedia.org/wiki/Lov%C3%A1sz_local_lemma  Lovász local lemma]  and the [http://en.wikipedia.org/wiki/Algorithmic_Lov%C3%A1sz_local_lemma algorithmic Lovász local lemma]
* [http://en.wikipedia.org/wiki/Markov_chain Markov chain]:
::[http://en.wikipedia.org/wiki/Markov_chain#Reducibility reducibility], [http://en.wikipedia.org/wiki/Markov_chain#Periodicity periodicity], [http://en.wikipedia.org/wiki/Markov_chain#Steady-state_analysis_and_limiting_distributions stationary distribution], [http://en.wikipedia.org/wiki/Hitting_time hitting time], cover time;
::[http://en.wikipedia.org/wiki/Markov_chain_mixing_time mixing time], [http://en.wikipedia.org/wiki/Conductance_(probability) conductance]

Revision as of 15:23, 22 July 2011
