高级算法 (Fall 2016)/Probability Basics: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
No edit summary
imported>Etone
No edit summary
Line 178: Line 178:
\mathbf{E}[X]=\sum_{y}\mathbf{E}[X\mid Y=y]\cdot\Pr[Y=y].
\mathbf{E}[X]=\sum_{y}\mathbf{E}[X\mid Y=y]\cdot\Pr[Y=y].
</math>
</math>
}}
= <math>k</math>-wise  independence =
Recall the definition of independence between events:
{{Theorem
|Definition (Independent events)|
:Events <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> are '''mutually independent''' if, for any subset <math>I\subseteq\{1,2,\ldots,n\}</math>,
::<math>\begin{align}
\Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right]
&=
\prod_{i\in I}\Pr[\mathcal{E}_i].
\end{align}</math>
}}
Similarly, we can define independence between random variables:
{{Theorem
|Definition (Independent variables)|
:Random variables <math>X_1, X_2, \ldots, X_n</math> are '''mutually independent''' if, for any subset <math>I\subseteq\{1,2,\ldots,n\}</math> and any values <math>x_i</math>, where <math>i\in I</math>,
::<math>\begin{align}
\Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right]
&=
\prod_{i\in I}\Pr[X_i=x_i].
\end{align}</math>
}}
Mutual independence is an ideal condition of independence. The limited notion of independence is usually defined by the '''k-wise independence'''.
{{Theorem
|Definition (k-wise Independenc)|
:1. Events <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> are '''k-wise independent''' if, for any subset <math>I\subseteq\{1,2,\ldots,n\}</math>  with <math>|I|\le k</math>
:::<math>\begin{align}
\Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right]
&=
\prod_{i\in I}\Pr[\mathcal{E}_i].
\end{align}</math>
:2. Random variables <math>X_1, X_2, \ldots, X_n</math> are '''k-wise independent''' if, for any subset <math>I\subseteq\{1,2,\ldots,n\}</math> with <math>|I|\le k</math> and any values <math>x_i</math>, where <math>i\in I</math>,
:::<math>\begin{align}
\Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right]
&=
\prod_{i\in I}\Pr[X_i=x_i].
\end{align}</math>
}}
A very common case is pairwise independence, i.e. the 2-wise independence.
{{Theorem
|Definition (pairwise Independent random variables)|
:Random variables <math>X_1, X_2, \ldots, X_n</math> are '''pairwise independent''' if, for any <math>X_i,X_j</math> where <math>i\neq j</math> and any values <math>a,b</math>
:::<math>\begin{align}
\Pr\left[X_i=a\wedge X_j=b\right]
&=
\Pr[X_i=a]\cdot\Pr[X_j=b].
\end{align}</math>
}}
Note that the definition of k-wise independence is hereditary:
* If <math>X_1, X_2, \ldots, X_n</math> are k-wise independent, then they are also <math>\ell</math>-wise independent for any <math>\ell<k</math>.
* If <math>X_1, X_2, \ldots, X_n</math> are NOT k-wise independent, then they cannot be <math>\ell</math>-wise independent for any <math>\ell>k</math>.
== Pairwise Independent Bits ==
Suppose we have <math>m</math> mutually independent and uniform random bits <math>X_1,\ldots, X_m</math>. We are going to extract <math>n=2^m-1</math> pairwise independent bits from these <math>m</math> mutually independent bits.
Enumerate all the nonempty subsets of <math>\{1,2,\ldots,m\}</math> in some order. Let <math>S_j</math>  be the <math>j</math>th subset. Let
:<math>
Y_j=\bigoplus_{i\in S_j} X_i,
</math>
where <math>\oplus</math> is the exclusive-or, whose truth table is as follows.
:{|cellpadding="4" border="1"
|-
|<math>a</math>
|<math>b</math>
|<math>a</math><math>\oplus</math><math>b</math>
|-
| 0 || 0 ||align="center"| 0
|-
| 0 || 1 ||align="center"| 1
|-
| 1 || 0 ||align="center"| 1
|-
| 1 || 1 ||align="center"| 0
|}
There are <math>n=2^m-1</math> such <math>Y_j</math>, because there are <math>2^m-1</math> nonempty subsets of <math>\{1,2,\ldots,m\}</math>. An equivalent definition of <math>Y_j</math> is
:<math>Y_j=\left(\sum_{i\in S_j}X_i\right)\bmod 2</math>.
Sometimes, <math>Y_j</math> is called the '''parity''' of the bits in <math>S_j</math>.
We claim that <math>Y_j</math> are pairwise independent and uniform.
{{Theorem
|Theorem|
:For any <math>Y_j</math> and any <math>b\in\{0,1\}</math>,
::<math>\begin{align}
\Pr\left[Y_j=b\right]
&=
\frac{1}{2}.
\end{align}</math>
:For any <math>Y_j,Y_\ell</math> that <math>j\neq\ell</math> and any <math>a,b\in\{0,1\}</math>,
::<math>\begin{align}
\Pr\left[Y_j=a\wedge Y_\ell=b\right]
&=
\frac{1}{4}.
\end{align}</math>
}}
The proof is left for your exercise.
Therefore, we extract exponentially many pairwise independent uniform random bits from a sequence of mutually independent uniform random bits.
Note that <math>Y_j</math> are not 3-wise independent. For example, consider the subsets <math>S_1=\{1\},S_2=\{2\},S_3=\{1,2\}</math> and the corresponding random bits <math>Y_1,Y_2,Y_3</math>. Any two of <math>Y_1,Y_2,Y_3</math> would decide the value of the third one.
==  Pairwise Independent Variables ==
We now consider constructing pairwise independent random variables ranging over <math>[p]=\{0,1,2,\ldots,p-1\}</math> for some prime <math>p</math>. Unlike the above construction, now we only need two independent random sources <math>X_0,X_1</math>, which are uniformly and independently distributed over <math>[p]</math>.
Let <math>Y_0,Y_1,\ldots, Y_{p-1}</math> be defined as:
:<math>
\begin{align}
Y_i=(X_0+i\cdot X_1)\bmod p &\quad \mbox{for }i\in[p].
\end{align}
</math>
{{Theorem
|Theorem|
: The random variables <math>Y_0,Y_1,\ldots, Y_{p-1}</math> are pairwise independent uniform random variables over <math>[p]</math>.
}}
{{Proof| We first show that <math>Y_i</math> are uniform. That is, we will show that for any <math>i,a\in[p]</math>,
:<math>\begin{align}
\Pr\left[(X_0+i\cdot X_1)\bmod p=a\right]
&=
\frac{1}{p}.
\end{align}</math>
Due to the law of total probability,
:<math>\begin{align}
\Pr\left[(X_0+i\cdot X_1)\bmod p=a\right]
&=
\sum_{j\in[p]}\Pr[X_1=j]\cdot\Pr\left[(X_0+ij)\bmod p=a\right]\\
&=\frac{1}{p}\sum_{j\in[p]}\Pr\left[X_0\equiv(a-ij)\pmod{p}\right].
\end{align}</math>
For prime <math>p</math>, for any <math>i,j,a\in[p]</math>, there is exact one value in <math>[p]</math> of <math>X_0</math> satisfying <math>X_0\equiv(a-ij)\pmod{p}</math>. Thus, <math>\Pr\left[X_0\equiv(a-ij)\pmod{p}\right]=1/p</math> and the above probability is <math>\frac{1}{p}</math>.
We then show that <math>Y_i</math> are pairwise independent, i.e. we will show that for any <math>Y_i,Y_j</math> that <math>i\neq j</math> and any <math>a,b\in[p]</math>,
:<math>\begin{align}
\Pr\left[Y_i=a\wedge Y_j=b\right]
&=
\frac{1}{p^2}.
\end{align}</math>
The event <math>Y_i=a\wedge Y_j=b</math> is equivalent to that
:<math>
\begin{cases}
(X_0+iX_1)\equiv a\pmod{p}\\
(X_0+jX_1)\equiv b\pmod{p}
\end{cases}
</math>
Due to the [http://en.wikipedia.org/wiki/Chinese_remainder_theorem Chinese remainder theorem], there exists a unique solution of <math>X_0</math> and <math>X_1</math> in <math>[p]</math> to the above linear congruential system. Thus the probability of the event is <math>\frac{1}{p^2}</math>.
}}
}}

Revision as of 09:08, 22 September 2016

Probability Space

The axiom foundation of probability theory is laid by Kolmogorov, one of the greatest mathematician of the 20th century, who advanced various very different fields of mathematics.

Definition (Probability Space)

A probability space is a triple [math]\displaystyle{ (\Omega,\Sigma,\Pr) }[/math].

  • [math]\displaystyle{ \Omega }[/math] is a set, called the sample space.
  • [math]\displaystyle{ \Sigma\subseteq 2^{\Omega} }[/math] is the set of all events, satisfying:
    (K1). [math]\displaystyle{ \Omega\in\Sigma }[/math] and [math]\displaystyle{ \empty\in\Sigma }[/math]. (The certain event and the impossible event.)
    (K2). If [math]\displaystyle{ A,B\in\Sigma }[/math], then [math]\displaystyle{ A\cap B, A\cup B, A-B\in\Sigma }[/math]. (Intersection, union, and diference of two events are events).
  • A probability measure [math]\displaystyle{ \Pr:\Sigma\rightarrow\mathbb{R} }[/math] is a function that maps each event to a nonnegative real number, satisfying
    (K3). [math]\displaystyle{ \Pr(\Omega)=1 }[/math].
    (K4). If [math]\displaystyle{ A\cap B=\emptyset }[/math] (such events are call disjoint events), then [math]\displaystyle{ \Pr(A\cup B)=\Pr(A)+\Pr(B) }[/math].
    (K5*). For a decreasing sequence of events [math]\displaystyle{ A_1\supset A_2\supset \cdots\supset A_n\supset\cdots }[/math] of events with [math]\displaystyle{ \bigcap_n A_n=\emptyset }[/math], it holds that [math]\displaystyle{ \lim_{n\rightarrow \infty}\Pr(A_n)=0 }[/math].
Remark
  • In general, the set [math]\displaystyle{ \Omega }[/math] may be continuous, but we only consider discrete probability in this lecture, thus we assume that [math]\displaystyle{ \Omega }[/math] is either finite or countably infinite.
  • Sometimes it is convenient to assume [math]\displaystyle{ \Sigma=2^{\Omega} }[/math], i.e. the events enumerates all subsets of [math]\displaystyle{ \Omega }[/math]. But in general, a probability space is well-defined by any [math]\displaystyle{ \Sigma }[/math] satisfying (K1) and (K2). Such [math]\displaystyle{ \Sigma }[/math] is called a [math]\displaystyle{ \sigma }[/math]-algebra defined on [math]\displaystyle{ \Omega }[/math].
  • The last axiom (K5*) is redundant if [math]\displaystyle{ \Sigma }[/math] is finite, thus it is only essential when there are infinitely many events. The role of axiom (K5*) in probability theory is like Zorn's Lemma (or equivalently the Axiom of Choice) in axiomatic set theory.

Useful laws for probability can be deduced from the axioms (K1)-(K5).

Proposition
  1. Let [math]\displaystyle{ \bar{A}=\Omega\setminus A }[/math]. It holds that [math]\displaystyle{ \Pr(\bar{A})=1-\Pr(A) }[/math].
  2. If [math]\displaystyle{ A\subseteq B }[/math] then [math]\displaystyle{ \Pr(A)\le\Pr(B) }[/math].
Proof.
  1. The events [math]\displaystyle{ \bar{A} }[/math] and [math]\displaystyle{ A }[/math] are disjoint and [math]\displaystyle{ \bar{A}\cup A=\Omega }[/math]. Due to Axiom (K4) and (K3), [math]\displaystyle{ \Pr(\bar{A})+\Pr(A)=\Pr(\Omega)=1 }[/math].
  2. The events [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B\setminus A }[/math] are disjoint and [math]\displaystyle{ A\cup(B\setminus A)=B }[/math] since [math]\displaystyle{ A\subseteq B }[/math]. Due to Axiom (K4), [math]\displaystyle{ \Pr(A)+\Pr(B\setminus A)=\Pr(B) }[/math], thus [math]\displaystyle{ \Pr(A)\le\Pr(B) }[/math].
[math]\displaystyle{ \square }[/math]
Notation

An event [math]\displaystyle{ A\subseteq\Omega }[/math] can be represented as [math]\displaystyle{ A=\{a\in\Omega\mid \mathcal{E}(a)\} }[/math] with a predicate [math]\displaystyle{ \mathcal{E} }[/math].

The predicate notation of probability is

[math]\displaystyle{ \Pr[\mathcal{E}]=\Pr(\{a\in\Omega\mid \mathcal{E}(a)\}) }[/math].

We will mostly use the predicate notation instead of subset notation.

Conditional Probability

In probability theory, the word "condition" is a verb. "Conditioning on the event ..." means that it is assumed that the event occurs.

Definition (conditional probability)
The conditional probability that event [math]\displaystyle{ A }[/math] occurs given that event [math]\displaystyle{ B }[/math] occurs is
[math]\displaystyle{ \Pr[A\mid B]=\frac{\Pr[A\wedge B]}{\Pr[B]}. }[/math]

The conditional probability is well-defined only if [math]\displaystyle{ \Pr[B]\neq0 }[/math].

For independent events [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], it holds that

[math]\displaystyle{ \Pr[A\mid B] =\Pr[A]. }[/math]

It supports our intuition that for two independent events, whether one of them occurs will not affect the chance of the other.

Law of total probability

The following fact is known as the law of total probability. It computes the probability by averaging over all possible cases.

Theorem (law of total probability)
Let [math]\displaystyle{ B_1,B_2,\ldots,B_n }[/math] be mutually disjoint events, and [math]\displaystyle{ \bigcup_{i=1}^n B_i=\Omega }[/math] is the sample space.
Then for any event [math]\displaystyle{ A }[/math],
[math]\displaystyle{ \Pr[A]=\sum_{i=1}^n\Pr[A\wedge B_i]=\sum_{i=1}^n\Pr[A\mid B_i]\cdot\Pr[B_i]. }[/math]
Proof.
Since [math]\displaystyle{ B_1,B_2,\ldots, B_n }[/math] are mutually disjoint and [math]\displaystyle{ \bigvee_{i=1}^n B_i=\Omega }[/math], events [math]\displaystyle{ A\wedge B_1, A\wedge B_2,\ldots, A\wedge B_n }[/math] are also mutually disjoint, and [math]\displaystyle{ A=\bigcup_{i=1}^n\left(A\cap B_i\right) }[/math]. Then the additivity of disjoint events, we have
[math]\displaystyle{ \Pr[A]=\sum_{i=1}^n\Pr[A\wedge B_i]=\sum_{i=1}^n\Pr[A\mid B_i]\cdot\Pr[B_i]. }[/math]
[math]\displaystyle{ \square }[/math]

The law of total probability provides us a standard tool for breaking a probability into sub-cases. Sometimes this will help the analysis.

The chain rule

By the definition of conditional probability, [math]\displaystyle{ \Pr[A\mid B]=\frac{\Pr[A\wedge B]}{\Pr[B]} }[/math]. Thus, [math]\displaystyle{ \Pr[A\wedge B] =\Pr[B]\cdot\Pr[A\mid B] }[/math]. This hints us that we can compute the probability of the AND of events by conditional probabilities. Formally, we have the following theorem:

Theorem
Let [math]\displaystyle{ A_1, A_2, \ldots, A_n }[/math] be any [math]\displaystyle{ n }[/math] events. Then
[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i=1}^n A_i\right] &= \prod_{k=1}^n\Pr\left[A_k \mid \bigwedge_{i\lt k} A_i\right]. \end{align} }[/math]
Proof.
It holds that [math]\displaystyle{ \Pr[A\wedge B] =\Pr[B]\cdot\Pr[A\mid B] }[/math]. Thus, let [math]\displaystyle{ A=A_n }[/math] and [math]\displaystyle{ B=A_1\wedge A_2\wedge\cdots\wedge A_{n-1} }[/math], then
[math]\displaystyle{ \begin{align} \Pr[A_1\wedge A_2\wedge\cdots\wedge A_n] &= \Pr[A_1\wedge A_2\wedge\cdots\wedge A_{n-1}]\cdot\Pr\left[A_n\mid \bigwedge_{i\lt n}A_i\right]. \end{align} }[/math]

Recursively applying this equation to [math]\displaystyle{ \Pr[A_1\wedge A_2\wedge\cdots\wedge A_{n-1}] }[/math] until there is only [math]\displaystyle{ A_1 }[/math] left, the theorem is proved.

[math]\displaystyle{ \square }[/math]

Random Variable

Definition (random variable)
A random variable [math]\displaystyle{ X }[/math] on a sample space [math]\displaystyle{ \Omega }[/math] is a real-valued function [math]\displaystyle{ X:\Omega\rightarrow\mathbb{R} }[/math]. A random variable X is called a discrete random variable if its range is finite or countably infinite.

For a random variable [math]\displaystyle{ X }[/math] and a real value [math]\displaystyle{ x\in\mathbb{R} }[/math], we write "[math]\displaystyle{ X=x }[/math]" for the event [math]\displaystyle{ \{a\in\Omega\mid X(a)=x\} }[/math], and denote the probability of the event by

[math]\displaystyle{ \Pr[X=x]=\Pr(\{a\in\Omega\mid X(a)=x\}) }[/math].

The independence can also be defined for variables:

Definition (Independent variables)
Two random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are independent if and only if
[math]\displaystyle{ \Pr[(X=x)\wedge(Y=y)]=\Pr[X=x]\cdot\Pr[Y=y] }[/math]
for all values [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]. Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are mutually independent if and only if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] and any values [math]\displaystyle{ x_i }[/math], where [math]\displaystyle{ i\in I }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right] &= \prod_{i\in I}\Pr[X_i=x_i]. \end{align} }[/math]

Note that in probability theory, the "mutual independence" is not equivalent with "pair-wise independence", which we will learn in the future.

Linearity of Expectation

Let [math]\displaystyle{ X }[/math] be a discrete random variable. The expectation of [math]\displaystyle{ X }[/math] is defined as follows.

Definition (Expectation)
The expectation of a discrete random variable [math]\displaystyle{ X }[/math], denoted by [math]\displaystyle{ \mathbf{E}[X] }[/math], is given by
[math]\displaystyle{ \begin{align} \mathbf{E}[X] &= \sum_{x}x\Pr[X=x], \end{align} }[/math]
where the summation is over all values [math]\displaystyle{ x }[/math] in the range of [math]\displaystyle{ X }[/math].

Perhaps the most useful property of expectation is its linearity.

Theorem (Linearity of Expectations)
For any discrete random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math], and any real constants [math]\displaystyle{ a_1, a_2, \ldots, a_n }[/math],
[math]\displaystyle{ \begin{align} \mathbf{E}\left[\sum_{i=1}^n a_iX_i\right] &= \sum_{i=1}^n a_i\cdot\mathbf{E}[X_i]. \end{align} }[/math]
Proof.
By the definition of the expectations, it is easy to verify that (try to prove by yourself):

for any discrete random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math], and any real constant [math]\displaystyle{ c }[/math],

  • [math]\displaystyle{ \mathbf{E}[X+Y]=\mathbf{E}[X]+\mathbf{E}[Y] }[/math];
  • [math]\displaystyle{ \mathbf{E}[cX]=c\mathbf{E}[X] }[/math].

The theorem follows by induction.

[math]\displaystyle{ \square }[/math]

The linearity of expectation gives an easy way to compute the expectation of a random variable if the variable can be written as a sum.

Example
Supposed that we have a biased coin that the probability of HEADs is [math]\displaystyle{ p }[/math]. Flipping the coin for n times, what is the expectation of number of HEADs?
It looks straightforward that it must be np, but how can we prove it? Surely we can apply the definition of expectation to compute the expectation with brute force. A more convenient way is by the linearity of expectations: Let [math]\displaystyle{ X_i }[/math] indicate whether the [math]\displaystyle{ i }[/math]-th flip is HEADs. Then [math]\displaystyle{ \mathbf{E}[X_i]=1\cdot p+0\cdot(1-p)=p }[/math], and the total number of HEADs after n flips is [math]\displaystyle{ X=\sum_{i=1}^{n}X_i }[/math]. Applying the linearity of expectation, the expected number of HEADs is:
[math]\displaystyle{ \mathbf{E}[X]=\mathbf{E}\left[\sum_{i=1}^{n}X_i\right]=\sum_{i=1}^{n}\mathbf{E}[X_i]=np }[/math].

The real power of the linearity of expectations is that it does not require the random variables to be independent, thus can be applied to any set of random variables. For example:

[math]\displaystyle{ \mathbf{E}\left[\alpha X+\beta X^2+\gamma X^3\right] = \alpha\cdot\mathbf{E}[X]+\beta\cdot\mathbf{E}\left[X^2\right]+\gamma\cdot\mathbf{E}\left[X^3\right]. }[/math]

However, do not exaggerate this power!

  • For an arbitrary function [math]\displaystyle{ f }[/math] (not necessarily linear), the equation [math]\displaystyle{ \mathbf{E}[f(X)]=f(\mathbf{E}[X]) }[/math] does not hold generally.
  • For variances, the equation [math]\displaystyle{ var(X+Y)=var(X)+var(Y) }[/math] does not hold without further assumption of the independence of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].

Conditional Expectation

Conditional expectation can be accordingly defined:

Definition (conditional expectation)
For random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math],
[math]\displaystyle{ \mathbf{E}[X\mid Y=y]=\sum_{x}x\Pr[X=x\mid Y=y], }[/math]
where the summation is taken over the range of [math]\displaystyle{ X }[/math].

There is also a law of total expectation.

Theorem (law of total expectation)
Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be two random variables. Then
[math]\displaystyle{ \mathbf{E}[X]=\sum_{y}\mathbf{E}[X\mid Y=y]\cdot\Pr[Y=y]. }[/math]

[math]\displaystyle{ k }[/math]-wise independence

Recall the definition of independence between events:

Definition (Independent events)
Events [math]\displaystyle{ \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n }[/math] are mutually independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right] &= \prod_{i\in I}\Pr[\mathcal{E}_i]. \end{align} }[/math]

Similarly, we can define independence between random variables:

Definition (Independent variables)
Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are mutually independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] and any values [math]\displaystyle{ x_i }[/math], where [math]\displaystyle{ i\in I }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right] &= \prod_{i\in I}\Pr[X_i=x_i]. \end{align} }[/math]

Mutual independence is an ideal condition of independence. The limited notion of independence is usually defined by the k-wise independence.

Definition (k-wise Independenc)
1. Events [math]\displaystyle{ \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n }[/math] are k-wise independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] with [math]\displaystyle{ |I|\le k }[/math]
[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right] &= \prod_{i\in I}\Pr[\mathcal{E}_i]. \end{align} }[/math]
2. Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are k-wise independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] with [math]\displaystyle{ |I|\le k }[/math] and any values [math]\displaystyle{ x_i }[/math], where [math]\displaystyle{ i\in I }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right] &= \prod_{i\in I}\Pr[X_i=x_i]. \end{align} }[/math]

A very common case is pairwise independence, i.e. the 2-wise independence.

Definition (pairwise Independent random variables)
Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are pairwise independent if, for any [math]\displaystyle{ X_i,X_j }[/math] where [math]\displaystyle{ i\neq j }[/math] and any values [math]\displaystyle{ a,b }[/math]
[math]\displaystyle{ \begin{align} \Pr\left[X_i=a\wedge X_j=b\right] &= \Pr[X_i=a]\cdot\Pr[X_j=b]. \end{align} }[/math]

Note that the definition of k-wise independence is hereditary:

  • If [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are k-wise independent, then they are also [math]\displaystyle{ \ell }[/math]-wise independent for any [math]\displaystyle{ \ell\lt k }[/math].
  • If [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are NOT k-wise independent, then they cannot be [math]\displaystyle{ \ell }[/math]-wise independent for any [math]\displaystyle{ \ell\gt k }[/math].

Pairwise Independent Bits

Suppose we have [math]\displaystyle{ m }[/math] mutually independent and uniform random bits [math]\displaystyle{ X_1,\ldots, X_m }[/math]. We are going to extract [math]\displaystyle{ n=2^m-1 }[/math] pairwise independent bits from these [math]\displaystyle{ m }[/math] mutually independent bits.

Enumerate all the nonempty subsets of [math]\displaystyle{ \{1,2,\ldots,m\} }[/math] in some order. Let [math]\displaystyle{ S_j }[/math] be the [math]\displaystyle{ j }[/math]th subset. Let

[math]\displaystyle{ Y_j=\bigoplus_{i\in S_j} X_i, }[/math]

where [math]\displaystyle{ \oplus }[/math] is the exclusive-or, whose truth table is as follows.

[math]\displaystyle{ a }[/math] [math]\displaystyle{ b }[/math] [math]\displaystyle{ a }[/math][math]\displaystyle{ \oplus }[/math][math]\displaystyle{ b }[/math]
0 0 0
0 1 1
1 0 1
1 1 0

There are [math]\displaystyle{ n=2^m-1 }[/math] such [math]\displaystyle{ Y_j }[/math], because there are [math]\displaystyle{ 2^m-1 }[/math] nonempty subsets of [math]\displaystyle{ \{1,2,\ldots,m\} }[/math]. An equivalent definition of [math]\displaystyle{ Y_j }[/math] is

[math]\displaystyle{ Y_j=\left(\sum_{i\in S_j}X_i\right)\bmod 2 }[/math].

Sometimes, [math]\displaystyle{ Y_j }[/math] is called the parity of the bits in [math]\displaystyle{ S_j }[/math].

We claim that [math]\displaystyle{ Y_j }[/math] are pairwise independent and uniform.

Theorem
For any [math]\displaystyle{ Y_j }[/math] and any [math]\displaystyle{ b\in\{0,1\} }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[Y_j=b\right] &= \frac{1}{2}. \end{align} }[/math]
For any [math]\displaystyle{ Y_j,Y_\ell }[/math] that [math]\displaystyle{ j\neq\ell }[/math] and any [math]\displaystyle{ a,b\in\{0,1\} }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[Y_j=a\wedge Y_\ell=b\right] &= \frac{1}{4}. \end{align} }[/math]

The proof is left for your exercise.

Therefore, we extract exponentially many pairwise independent uniform random bits from a sequence of mutually independent uniform random bits.

Note that [math]\displaystyle{ Y_j }[/math] are not 3-wise independent. For example, consider the subsets [math]\displaystyle{ S_1=\{1\},S_2=\{2\},S_3=\{1,2\} }[/math] and the corresponding random bits [math]\displaystyle{ Y_1,Y_2,Y_3 }[/math]. Any two of [math]\displaystyle{ Y_1,Y_2,Y_3 }[/math] would decide the value of the third one.

Pairwise Independent Variables

We now consider constructing pairwise independent random variables ranging over [math]\displaystyle{ [p]=\{0,1,2,\ldots,p-1\} }[/math] for some prime [math]\displaystyle{ p }[/math]. Unlike the above construction, now we only need two independent random sources [math]\displaystyle{ X_0,X_1 }[/math], which are uniformly and independently distributed over [math]\displaystyle{ [p] }[/math].

Let [math]\displaystyle{ Y_0,Y_1,\ldots, Y_{p-1} }[/math] be defined as:

[math]\displaystyle{ \begin{align} Y_i=(X_0+i\cdot X_1)\bmod p &\quad \mbox{for }i\in[p]. \end{align} }[/math]
Theorem
The random variables [math]\displaystyle{ Y_0,Y_1,\ldots, Y_{p-1} }[/math] are pairwise independent uniform random variables over [math]\displaystyle{ [p] }[/math].
Proof.
We first show that [math]\displaystyle{ Y_i }[/math] are uniform. That is, we will show that for any [math]\displaystyle{ i,a\in[p] }[/math],
[math]\displaystyle{ \begin{align} \Pr\left[(X_0+i\cdot X_1)\bmod p=a\right] &= \frac{1}{p}. \end{align} }[/math]

Due to the law of total probability,

[math]\displaystyle{ \begin{align} \Pr\left[(X_0+i\cdot X_1)\bmod p=a\right] &= \sum_{j\in[p]}\Pr[X_1=j]\cdot\Pr\left[(X_0+ij)\bmod p=a\right]\\ &=\frac{1}{p}\sum_{j\in[p]}\Pr\left[X_0\equiv(a-ij)\pmod{p}\right]. \end{align} }[/math]

For prime [math]\displaystyle{ p }[/math], for any [math]\displaystyle{ i,j,a\in[p] }[/math], there is exact one value in [math]\displaystyle{ [p] }[/math] of [math]\displaystyle{ X_0 }[/math] satisfying [math]\displaystyle{ X_0\equiv(a-ij)\pmod{p} }[/math]. Thus, [math]\displaystyle{ \Pr\left[X_0\equiv(a-ij)\pmod{p}\right]=1/p }[/math] and the above probability is [math]\displaystyle{ \frac{1}{p} }[/math].

We then show that [math]\displaystyle{ Y_i }[/math] are pairwise independent, i.e. we will show that for any [math]\displaystyle{ Y_i,Y_j }[/math] that [math]\displaystyle{ i\neq j }[/math] and any [math]\displaystyle{ a,b\in[p] }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[Y_i=a\wedge Y_j=b\right] &= \frac{1}{p^2}. \end{align} }[/math]

The event [math]\displaystyle{ Y_i=a\wedge Y_j=b }[/math] is equivalent to that

[math]\displaystyle{ \begin{cases} (X_0+iX_1)\equiv a\pmod{p}\\ (X_0+jX_1)\equiv b\pmod{p} \end{cases} }[/math]

Due to the Chinese remainder theorem, there exists a unique solution of [math]\displaystyle{ X_0 }[/math] and [math]\displaystyle{ X_1 }[/math] in [math]\displaystyle{ [p] }[/math] to the above linear congruential system. Thus the probability of the event is [math]\displaystyle{ \frac{1}{p^2} }[/math].

[math]\displaystyle{ \square }[/math]