# Advanced Algorithms (Fall 2017)/Probability Basics

# Probability Space

The axiomatic foundation of probability theory was laid by Kolmogorov, one of the greatest mathematicians of the 20th century, who advanced a variety of very different fields of mathematics.

**Definition (Probability Space)**

A **probability space** is a triple $(\Omega, \Sigma, \Pr)$.

- $\Omega$ is a set, called the **sample space**.
- $\Sigma \subseteq 2^{\Omega}$ is the set of all **events**, satisfying:
    - (K1). $\Omega \in \Sigma$ and $\emptyset \in \Sigma$. (Existence of the *certain* event and the *impossible* event)
    - (K2). If $A, B \in \Sigma$, then $A \cap B, A \cup B, A \setminus B \in \Sigma$. (Intersection, union, and difference of two events are events.)
- A **probability measure** $\Pr: \Sigma \to \mathbb{R}$ is a function that maps each event to a nonnegative real number, satisfying
    - (K3). $\Pr(\Omega) = 1$.
    - (K4). For any *disjoint* events $A$ and $B$ (which means $A \cap B = \emptyset$), it holds that $\Pr(A \cup B) = \Pr(A) + \Pr(B)$.
    - (K5*). For a decreasing sequence of events $A_1 \supseteq A_2 \supseteq \cdots \supseteq A_n \supseteq \cdots$ with $\bigcap_{n} A_n = \emptyset$, it holds that $\lim_{n \to \infty} \Pr(A_n) = 0$.

- Remark
    - In general, the set $\Omega$ may be continuous, but we only consider **discrete** probability in this lecture, thus we assume that $\Omega$ is either finite or countably infinite.
    - Sometimes it is convenient to assume $\Sigma = 2^{\Omega}$, i.e. the events enumerate all subsets of $\Omega$. But in general, a probability space is well-defined by any $\Sigma$ satisfying (K1) and (K2). Such a $\Sigma$ is called a $\sigma$-algebra defined on $\Omega$.
    - The last axiom (K5*) is redundant if $\Sigma$ is finite, thus it is only essential when there are infinitely many events. The role of axiom (K5*) in probability theory is like Zorn's Lemma (or equivalently the Axiom of Choice) in axiomatic set theory.

Useful laws for probability can be deduced from the *axioms* (K1)-(K5).

**Proposition**

- Let $\bar{A} = \Omega \setminus A$. It holds that $\Pr(\bar{A}) = 1 - \Pr(A)$.
- If $A \subseteq B$, then $\Pr(A) \le \Pr(B)$.

**Proof.**

- The events $A$ and $\bar{A}$ are disjoint and $A \cup \bar{A} = \Omega$. Due to Axiom (K4) and (K3), $\Pr(A) + \Pr(\bar{A}) = \Pr(\Omega) = 1$.
- The events $A$ and $B \setminus A$ are disjoint and $A \cup (B \setminus A) = B$ since $A \subseteq B$. Due to Axiom (K4), $\Pr(A) + \Pr(B \setminus A) = \Pr(B)$, thus $\Pr(A) \le \Pr(B)$.

- Notation

An event $A \subseteq \Omega$ can be represented as $A = \{a \in \Omega \mid \mathcal{E}(a)\}$ with a predicate $\mathcal{E}$.

The predicate notation of probability is

- $\Pr[\mathcal{E}] = \Pr\left(\{a \in \Omega \mid \mathcal{E}(a)\}\right)$.

We use the two notations interchangeably.

## Union bound

A very useful inequality in probability is **Boole's inequality**, mostly known by its nickname, the **union bound**.

**Theorem (union bound)**

Let $A_1, A_2, \ldots, A_n$ be $n$ events. Then
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} \Pr(A_i).$$

**Proof.** Let $B_1 = A_1$ and for $i > 1$, let $B_i = A_i \setminus \left(\bigcup_{j < i} A_j\right)$. We have $\bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} A_i$.

On the other hand, $B_1, B_2, \ldots, B_n$ are disjoint, which implies by the axiom (K4) of probability space that
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \Pr\left(\bigcup_{i=1}^{n} B_i\right) = \sum_{i=1}^{n} \Pr(B_i).$$

Also note that $B_i \subseteq A_i$ for all $i$, thus $\Pr(B_i) \le \Pr(A_i)$ for all $i$. The theorem follows.
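As a quick sanity check, the union bound can be verified exactly on a small finite probability space. The space of three fair coin flips and the three events below are a hypothetical illustration, not from the notes:

```python
from itertools import product
from fractions import Fraction

# A small probability space: three fair coin flips, each of the 8
# outcomes having probability 1/8 (exact rational arithmetic).
omega = list(product([0, 1], repeat=3))
pr = {w: Fraction(1, 8) for w in omega}

# Example events, represented as sets of outcomes.
A1 = {w for w in omega if w[0] == 1}    # first flip is heads
A2 = {w for w in omega if sum(w) >= 2}  # at least two heads
A3 = {w for w in omega if w[1] == w[2]} # last two flips agree

def prob(event):
    return sum(pr[w] for w in event)

lhs = prob(A1 | A2 | A3)                # Pr of the union of events
rhs = prob(A1) + prob(A2) + prob(A3)    # sum of individual probabilities
assert lhs <= rhs                       # the union bound
print(lhs, rhs)                         # → 3/4 3/2
```

The bound is typically loose (here $3/4 \le 3/2$) because overlapping outcomes are counted multiple times on the right-hand side.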

The union bound is a special case of the **Boole-Bonferroni inequality**.

**Theorem (Boole-Bonferroni inequality)**

Let $A_1, A_2, \ldots, A_n$ be $n$ events. For $1 \le k \le n$, define
$$S_k = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \Pr\left(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}\right).$$

Then for **odd** $k$ in $\{1, 2, \ldots, n\}$:
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{j=1}^{k} (-1)^{j-1} S_j;$$

and for **even** $k$ in $\{1, 2, \ldots, n\}$:
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) \ge \sum_{j=1}^{k} (-1)^{j-1} S_j.$$

The inequality follows from the well-known **inclusion-exclusion principle**, stated as follows, as well as the fact that the quantity $\sum_{j=1}^{k} (-1)^{j-1} S_j$ is *unimodal* in $k$.

**Principle of Inclusion-Exclusion**

Let $A_1, A_2, \ldots, A_n$ be $n$ events. Then
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{k=1}^{n} (-1)^{k-1} S_k,$$
where $S_k = \sum_{1 \le i_1 < \cdots < i_k \le n} \Pr\left(A_{i_1} \cap \cdots \cap A_{i_k}\right)$.
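Both the inclusion-exclusion identity and the Bonferroni truncation bounds can be checked exhaustively on a small space. The two-dice space and the three events below are a hypothetical illustration:

```python
from itertools import combinations, product
from fractions import Fraction

# Probability space: two fair dice, each outcome has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
pr = Fraction(1, 36)

events = [
    {w for w in omega if w[0] == 6},         # first die shows 6
    {w for w in omega if w[1] == 6},         # second die shows 6
    {w for w in omega if w[0] + w[1] == 7},  # the dice sum to 7
]

def prob(event):
    return len(event) * pr

def S(k):
    """S_k: sum of Pr over all k-wise intersections of the events."""
    return sum(prob(set.intersection(*sub))
               for sub in combinations(events, k))

n = len(events)
union = prob(set.union(*events))
# Inclusion-exclusion: the full alternating sum equals Pr of the union.
incl_excl = sum((-1) ** (k - 1) * S(k) for k in range(1, n + 1))
assert union == incl_excl
# Bonferroni: truncating at odd k overestimates, at even k underestimates.
assert union <= S(1)         # k = 1: this is exactly the union bound
assert union >= S(1) - S(2)  # k = 2
print(union)                 # → 5/12
```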

# Conditional Probability

In probability theory, the word "condition" is a verb. "Conditioning on the event $B$" means that it is assumed that the event $B$ occurs.

**Definition (conditional probability)**

The **conditional probability** that event $A$ occurs given that event $B$ occurs is
$$\Pr[A \mid B] = \frac{\Pr[A \wedge B]}{\Pr[B]}.$$

The conditional probability is well-defined only if $\Pr[B] > 0$.

## Law of total probability

The following fact is known as the law of total probability. It computes the probability by averaging over all possible cases.

**Theorem (law of total probability)**

Let $B_1, B_2, \ldots, B_n$ be *mutually disjoint* events, and $\bigcup_{i=1}^{n} B_i = \Omega$ is the sample space.

Then for any event $A$,
$$\Pr[A] = \sum_{i=1}^{n} \Pr[A \wedge B_i] = \sum_{i=1}^{n} \Pr[A \mid B_i] \cdot \Pr[B_i].$$

**Proof.** Since $B_1, B_2, \ldots, B_n$ are mutually disjoint and $\bigcup_{i=1}^{n} B_i = \Omega$, the events $A \wedge B_1, A \wedge B_2, \ldots, A \wedge B_n$ are also mutually disjoint, and $A = \bigcup_{i=1}^{n} (A \wedge B_i)$. Then by the additivity of probability over disjoint events, we have
$$\Pr[A] = \sum_{i=1}^{n} \Pr[A \wedge B_i] = \sum_{i=1}^{n} \Pr[A \mid B_i] \cdot \Pr[B_i].$$

The law of total probability provides us with a standard tool for breaking a probability into sub-cases. Sometimes this will help the analysis.
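A minimal worked instance of the law of total probability, on a hypothetical two-stage experiment of my own choosing (the coin biases are illustrative):

```python
from fractions import Fraction

# Two-stage experiment: pick one of two biased coins uniformly at
# random, then flip it once. Event A = the flip comes up HEADs;
# B_i = coin i was picked (disjoint events covering the sample space).
pr_coin = {"coin1": Fraction(1, 2), "coin2": Fraction(1, 2)}         # Pr[B_i]
pr_heads_given = {"coin1": Fraction(1, 3), "coin2": Fraction(3, 4)}  # Pr[A | B_i]

# Law of total probability: Pr[A] = sum_i Pr[A | B_i] * Pr[B_i]
pr_heads = sum(pr_heads_given[c] * pr_coin[c] for c in pr_coin)
print(pr_heads)  # 1/2 * 1/3 + 1/2 * 3/4 → 13/24
```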

## "The Chain Rule"

By the definition of conditional probability, $\Pr[A \mid B] = \frac{\Pr[A \wedge B]}{\Pr[B]}$. Thus, $\Pr[A \wedge B] = \Pr[B] \cdot \Pr[A \mid B]$. This hints that we can compute the probability of the AND of events by conditional probabilities. Formally, we have the following theorem:

**Theorem**

Let $A_1, A_2, \ldots, A_n$ be any $n$ events. Then
$$\Pr\left[\bigwedge_{i=1}^{n} A_i\right] = \prod_{k=1}^{n} \Pr\left[A_k \,\middle|\, \bigwedge_{i < k} A_i\right].$$

**Proof.** It holds that $\Pr[A \wedge B] = \Pr[B] \cdot \Pr[A \mid B]$. Thus, let $A = A_n$ and $B = A_1 \wedge A_2 \wedge \cdots \wedge A_{n-1}$, then
$$\Pr[A_1 \wedge \cdots \wedge A_n] = \Pr[A_1 \wedge \cdots \wedge A_{n-1}] \cdot \Pr\left[A_n \,\middle|\, \bigwedge_{i < n} A_i\right].$$
Recursively applying this equation to $\Pr[A_1 \wedge \cdots \wedge A_{n-1}]$ until there is only $A_1$ left, the theorem is proved.
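The chain rule is how one usually computes probabilities of sampling without replacement. The card-drawing instance below is a hypothetical example of mine, cross-checked by exhaustive enumeration:

```python
from itertools import permutations
from fractions import Fraction

# Chain rule instance: draw 3 cards from a 52-card deck without
# replacement. Let A_k = "the k-th card drawn is an ace". Then
# Pr[A_1 ∧ A_2 ∧ A_3] = Pr[A_1] * Pr[A_2 | A_1] * Pr[A_3 | A_1 ∧ A_2].
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Sanity check by enumerating all ordered 3-card draws.
aces = {0, 1, 2, 3}  # positions of the four aces in the deck
draws = list(permutations(range(52), 3))
hits = sum(1 for d in draws if set(d) <= aces)
assert Fraction(hits, len(draws)) == p
print(p)  # → 1/5525
```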

# Random Variable

**Definition (random variable)**

A random variable $X$ on a sample space $\Omega$ is a real-valued function $X: \Omega \to \mathbb{R}$. A random variable $X$ is called a **discrete** random variable if its range is finite or countably infinite.

For a random variable $X$ and a real value $x \in \mathbb{R}$, we write "$X = x$" for the event $\{a \in \Omega \mid X(a) = x\}$, and denote the probability of the event by

- $\Pr[X = x] = \Pr\left(\{a \in \Omega \mid X(a) = x\}\right)$.

The independence can also be defined for variables:

**Definition (Independent variables)**

Two random variables $X$ and $Y$ are **independent** if and only if
$$\Pr[(X = x) \wedge (Y = y)] = \Pr[X = x] \cdot \Pr[Y = y]$$
for all values $x$ and $y$. Random variables $X_1, X_2, \ldots, X_n$ are **mutually independent** if and only if, for any subset $I \subseteq \{1, 2, \ldots, n\}$ and any values $x_i$, where $i \in I$,
$$\Pr\left[\bigwedge_{i \in I} (X_i = x_i)\right] = \prod_{i \in I} \Pr[X_i = x_i].$$

Note that in probability theory, "mutual independence" is not equivalent to "pairwise independence", which we will learn about in the future.

# Linearity of Expectation

Let $X$ be a discrete **random variable**. The expectation of $X$ is defined as follows.

**Definition (Expectation)**

The **expectation** of a discrete random variable $X$, denoted by $\mathbb{E}[X]$, is given by
$$\mathbb{E}[X] = \sum_{x} x \cdot \Pr[X = x],$$
where the summation is over all values $x$ in the range of $X$.

Perhaps the most useful property of expectation is its **linearity**.

**Theorem (Linearity of Expectations)**

For any discrete random variables $X_1, X_2, \ldots, X_n$, and any real constants $a_1, a_2, \ldots, a_n$,
$$\mathbb{E}\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i \cdot \mathbb{E}[X_i].$$

**Proof.** By the definition of the expectations, it is easy to verify that (try to prove it by yourself): for any discrete random variables $X$ and $Y$, and any real constant $c$,

- $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$;
- $\mathbb{E}[cX] = c \cdot \mathbb{E}[X]$.

The theorem follows by induction.

The linearity of expectation gives an easy way to compute the expectation of a random variable if the variable can be written as a sum.

- Example
- Suppose that we have a biased coin whose probability of HEADs is $p$. Flipping the coin $n$ times, what is the expected number of HEADs?
- It looks straightforward that it must be $np$, but how can we prove it? Surely we can apply the definition of expectation to compute the expectation with brute force. A more convenient way is by the linearity of expectations: Let $X_i$ indicate whether the $i$-th flip is HEADs. Then $\mathbb{E}[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p$, and the total number of HEADs after $n$ flips is $X = \sum_{i=1}^{n} X_i$. Applying the linearity of expectation, the expected number of HEADs is:
- $\mathbb{E}[X] = \mathbb{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathbb{E}[X_i] = np$.
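The $np$ answer can also be cross-checked by the brute-force route the example mentions, i.e. enumerating all outcome sequences. The concrete values $n = 5$, $p = 2/7$ below are arbitrary choices of mine:

```python
from itertools import product
from fractions import Fraction

# Exact check of E[number of HEADs] = n*p for a biased coin, by
# brute-force enumeration over all 2^n HEADs/TAILs sequences.
n, p = 5, Fraction(2, 7)

def seq_prob(seq):
    """probability of one particular sequence of flips (1 = HEADs)"""
    prob = Fraction(1)
    for bit in seq:
        prob *= p if bit == 1 else (1 - p)
    return prob

# definition of expectation: sum over outcomes of value * probability
expected_heads = sum(seq_prob(seq) * sum(seq)
                     for seq in product([0, 1], repeat=n))
assert expected_heads == n * p  # matches the linearity-of-expectation answer
print(expected_heads)           # → 10/7
```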

The real power of the linearity of expectations is that it does not require the random variables to be independent, and thus can be applied to any set of random variables.

However, do not exaggerate this power!

- For an arbitrary function $f$ (not necessarily linear), the equation $\mathbb{E}[f(X)] = f(\mathbb{E}[X])$ does not hold in general.
- For variances, the equation $\mathbf{Var}[X + Y] = \mathbf{Var}[X] + \mathbf{Var}[Y]$ does not hold without further assumption of the independence of $X$ and $Y$.
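Both caveats already fail on a single uniform bit. A minimal sketch, using $f(x) = x^2$ for the first caveat and the fully correlated pair $Y = X$ for the second:

```python
from fractions import Fraction

# X is uniform on {0, 1}; compute expectations exactly by enumeration.
outcomes = [Fraction(0), Fraction(1)]
pr = Fraction(1, 2)

def E(f):
    """expectation of f(X) over the two equally likely outcomes"""
    return sum(pr * f(x) for x in outcomes)

# 1) E[f(X)] != f(E[X]) for the nonlinear f(x) = x^2:
assert E(lambda x: x ** 2) != E(lambda x: x) ** 2   # 1/2 vs 1/4

# 2) Var[X + Y] != Var[X] + Var[Y] without independence: take Y = X.
def var_of(f):
    return E(lambda x: f(x) ** 2) - E(f) ** 2

vX = var_of(lambda x: x)           # Var[X] = 1/4
vXplusX = var_of(lambda x: 2 * x)  # Var[X + X] = Var[2X] = 4 * Var[X] = 1
assert vXplusX != vX + vX
print(vX, vXplusX)                 # → 1/4 1
```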

## Conditional Expectation

Conditional expectation can be accordingly defined:

**Definition (conditional expectation)**

For random variables $X$ and $Y$,
$$\mathbb{E}[X \mid Y = y] = \sum_{x} x \cdot \Pr[X = x \mid Y = y],$$
where the summation is taken over the range of $X$.

There is also a **law of total expectation**.

**Theorem (law of total expectation)**

Let $X$ and $Y$ be two random variables. Then
$$\mathbb{E}[X] = \sum_{y} \mathbb{E}[X \mid Y = y] \cdot \Pr[Y = y].$$
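The law of total expectation can be verified on any small joint distribution. The table of joint probabilities below is a hypothetical example of mine:

```python
from fractions import Fraction

# A joint distribution of (X, Y), given as a table Pr[X = x, Y = y].
joint = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8), (2, 1): Fraction(3, 8),
}

def pr_Y(y):
    """marginal Pr[Y = y]"""
    return sum(p for (_, y_), p in joint.items() if y_ == y)

def E_X_given_Y(y):
    """conditional expectation E[X | Y = y]"""
    return sum(x * p for (x, y_), p in joint.items() if y_ == y) / pr_Y(y)

# direct expectation of X from the joint table
EX = sum(x * p for (x, _), p in joint.items())
# law of total expectation: E[X] = sum_y E[X | Y = y] * Pr[Y = y]
total = sum(E_X_given_Y(y) * pr_Y(y) for y in {0, 1})
assert EX == total
print(EX)  # → 1
```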

# k-wise independence

Recall the definition of independence between events:

**Definition (Independent events)**

Events $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n$ are **mutually independent** if, for any subset $I \subseteq \{1, 2, \ldots, n\}$,
$$\Pr\left[\bigwedge_{i \in I} \mathcal{E}_i\right] = \prod_{i \in I} \Pr[\mathcal{E}_i].$$

Similarly, we can define independence between random variables:

**Definition (Independent variables)**

Random variables $X_1, X_2, \ldots, X_n$ are **mutually independent** if, for any subset $I \subseteq \{1, 2, \ldots, n\}$ and any values $x_i$, where $i \in I$,
$$\Pr\left[\bigwedge_{i \in I} (X_i = x_i)\right] = \prod_{i \in I} \Pr[X_i = x_i].$$

Mutual independence is an ideal condition of independence. A more limited notion of independence is usually defined by **k-wise independence**.

**Definition (k-wise Independence)**

1. Events $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n$ are **k-wise independent** if, for any subset $I \subseteq \{1, 2, \ldots, n\}$ with $|I| \le k$,
$$\Pr\left[\bigwedge_{i \in I} \mathcal{E}_i\right] = \prod_{i \in I} \Pr[\mathcal{E}_i].$$
2. Random variables $X_1, X_2, \ldots, X_n$ are **k-wise independent** if, for any subset $I \subseteq \{1, 2, \ldots, n\}$ with $|I| \le k$ and any values $x_i$, where $i \in I$,
$$\Pr\left[\bigwedge_{i \in I} (X_i = x_i)\right] = \prod_{i \in I} \Pr[X_i = x_i].$$

A very common case is pairwise independence, i.e. the 2-wise independence.

**Definition (pairwise Independent random variables)**

Random variables $X_1, X_2, \ldots, X_n$ are **pairwise independent** if, for any $X_i, X_j$ where $i \neq j$ and any values $a, b$,
$$\Pr[(X_i = a) \wedge (X_j = b)] = \Pr[X_i = a] \cdot \Pr[X_j = b].$$

Note that the definition of k-wise independence is hereditary:

- If $X_1, X_2, \ldots, X_n$ are $k$-wise independent, then they are also $\ell$-wise independent for any $\ell < k$.
- If $X_1, X_2, \ldots, X_n$ are NOT $k$-wise independent, then they cannot be $\ell$-wise independent for any $\ell > k$.

## Pairwise Independent Bits

Suppose we have $m$ mutually independent and uniform random bits $X_1, X_2, \ldots, X_m$. We are going to extract $n = 2^m - 1$ pairwise independent bits from these $m$ mutually independent bits.

Enumerate all the nonempty subsets of $\{1, 2, \ldots, m\}$ in some order. Let $S_j$ be the $j$-th subset. Let
$$Y_j = \bigoplus_{i \in S_j} X_i,$$
where $\oplus$ is the exclusive-or, whose truth table is as follows.

| $a$ | $b$ | $a \oplus b$ |
|:---:|:---:|:------------:|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

There are $n = 2^m - 1$ such bits $Y_j$, because there are $2^m - 1$ nonempty subsets of $\{1, 2, \ldots, m\}$. An equivalent definition of $Y_j$ is

- $Y_j \equiv \sum_{i \in S_j} X_i \pmod 2$.

Sometimes, $Y_j$ is called the **parity** of the bits in $S_j$.

We claim that $Y_1, Y_2, \ldots, Y_n$ are pairwise independent and uniform.

**Theorem**

- For any $Y_j$ and any $b \in \{0, 1\}$,
$$\Pr[Y_j = b] = \frac{1}{2}.$$
- For any $Y_j, Y_\ell$ with $j \neq \ell$ and any $a, b \in \{0, 1\}$,
$$\Pr[(Y_j = a) \wedge (Y_\ell = b)] = \frac{1}{4}.$$

The proof is left for your exercise.

Therefore, we can extract exponentially many ($n = 2^m - 1$) pairwise independent uniform random bits from a sequence of $m$ mutually independent uniform random bits.

Note that $Y_1, Y_2, \ldots, Y_n$ are not 3-wise independent. For example, consider the subsets $S_1 = \{1\}$, $S_2 = \{2\}$, $S_3 = \{1, 2\}$ and the corresponding random bits $Y_1 = X_1$, $Y_2 = X_2$, $Y_3 = X_1 \oplus X_2$. Any two of $Y_1, Y_2, Y_3$ would decide the value of the third one.
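The whole construction, the claimed uniformity and pairwise independence, and the failure of 3-wise independence can all be checked exhaustively for a small $m$. A minimal sketch (with $m = 3$, subsets indexed from 0):

```python
from itertools import product
from fractions import Fraction

m = 3  # number of mutually independent uniform bits X_i

# All nonempty subsets of {0, ..., m-1}, encoded via bit masks 1 .. 2^m - 1.
subsets = [[i for i in range(m) if mask >> i & 1]
           for mask in range(1, 2 ** m)]

def parities(x):
    """the n = 2^m - 1 extracted bits: Y_j = XOR of x[i] for i in S_j"""
    return [sum(x[i] for i in S) % 2 for S in subsets]

n = len(subsets)
pr = Fraction(1, 2 ** m)  # each assignment of the X_i is equally likely

# Uniformity: Pr[Y_j = 1] = 1/2 for every j.
for j in range(n):
    assert sum(pr for x in product([0, 1], repeat=m)
               if parities(x)[j] == 1) == Fraction(1, 2)

# Pairwise independence: Pr[Y_j = a and Y_l = b] = 1/4 for all j != l.
for j in range(n):
    for l in range(j + 1, n):
        for a, b in product([0, 1], repeat=2):
            p = sum(pr for x in product([0, 1], repeat=m)
                    if parities(x)[j] == a and parities(x)[l] == b)
            assert p == Fraction(1, 4)

# But NOT 3-wise independent: for the subsets {1}, {2}, {1,2},
# any two of the three extracted bits determine the third.
j1 = subsets.index([0])
j2 = subsets.index([1])
j3 = subsets.index([0, 1])
assert all(parities(x)[j3] == parities(x)[j1] ^ parities(x)[j2]
           for x in product([0, 1], repeat=m))
print("extracted", n, "pairwise independent bits from", m)
```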