Revision as of 06:40, 21 November 2011

k-wise independence

Recall the definition of independence between events:

Definition (Independent events)

Events [math]\displaystyle{ \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n }[/math] are mutually independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right] &= \prod_{i\in I}\Pr[\mathcal{E}_i]. \end{align} }[/math]

Similarly, we can define independence between random variables:

Definition (Independent variables)

Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are mutually independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] and any values [math]\displaystyle{ x_i }[/math], where [math]\displaystyle{ i\in I }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right] &= \prod_{i\in I}\Pr[X_i=x_i]. \end{align} }[/math]

Mutual independence is an ideal condition of independence. The limited notion of independence is usually defined by the k-wise independence.

Definition (k-wise Independenc)

1. Events [math]\displaystyle{ \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n }[/math] are k-wise independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] with [math]\displaystyle{ |I|\le k }[/math]

[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right] &= \prod_{i\in I}\Pr[\mathcal{E}_i]. \end{align} }[/math]

2. Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are k-wise independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] with [math]\displaystyle{ |I|\le k }[/math] and any values [math]\displaystyle{ x_i }[/math], where [math]\displaystyle{ i\in I }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right] &= \prod_{i\in I}\Pr[X_i=x_i]. \end{align} }[/math]

A very common case is pairwise independence, i.e. the 2-wise independence.

Definition (pairwise Independent random variables)

Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are pairwise independent if, for any [math]\displaystyle{ X_i,X_j }[/math] where [math]\displaystyle{ i\neq j }[/math] and any values [math]\displaystyle{ a,b }[/math]

[math]\displaystyle{ \begin{align} \Pr\left[X_i=a\wedge X_j=b\right] &= \Pr[X_i=a]\cdot\Pr[X_j=b]. \end{align} }[/math]

Note that the definition of k-wise independence is hereditary:

If [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are k-wise independent, then they are also [math]\displaystyle{ \ell }[/math]-wise independent for any [math]\displaystyle{ \ell\lt k }[/math].
If [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are NOT k-wise independent, then they cannot be [math]\displaystyle{ \ell }[/math]-wise independent for any [math]\displaystyle{ \ell\gt k }[/math].

Construction via XOR

Suppose we have [math]\displaystyle{ m }[/math] mutually independent and uniform random bits [math]\displaystyle{ X_1,\ldots, X_m }[/math]. We are going to extract [math]\displaystyle{ n=2^m-1 }[/math] pairwise independent bits from these [math]\displaystyle{ m }[/math] mutually independent bits.

Enumerate all the nonempty subsets of [math]\displaystyle{ \{1,2,\ldots,m\} }[/math] in some order. Let [math]\displaystyle{ S_j }[/math] be the [math]\displaystyle{ j }[/math]th subset. Let

[math]\displaystyle{ Y_j=\bigoplus_{i\in S_j} X_i, }[/math]

where [math]\displaystyle{ \oplus }[/math] is the exclusive-or, whose truth table is as follows.

[math]\displaystyle{ a }[/math]	[math]\displaystyle{ b }[/math]	[math]\displaystyle{ a }[/math][math]\displaystyle{ \oplus }[/math][math]\displaystyle{ b }[/math]
0	0	0
0	1	1
1	0	1
1	1	0

There are [math]\displaystyle{ n=2^m-1 }[/math] such [math]\displaystyle{ Y_j }[/math], because there are [math]\displaystyle{ 2^m-1 }[/math] nonempty subsets of [math]\displaystyle{ \{1,2,\ldots,m\} }[/math]. An equivalent definition of [math]\displaystyle{ Y_j }[/math] is

[math]\displaystyle{ Y_j=\left(\sum_{i\in S_j}X_i\right)\bmod 2 }[/math].

Sometimes, [math]\displaystyle{ Y_j }[/math] is called the parity of the bits in [math]\displaystyle{ S_j }[/math].

We claim that [math]\displaystyle{ Y_j }[/math] are pairwise independent and uniform.

Theorem

For any [math]\displaystyle{ Y_j }[/math] and any [math]\displaystyle{ b\in\{0,1\} }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[Y_j=b\right] &= \frac{1}{2}. \end{align} }[/math]

For any [math]\displaystyle{ Y_j,Y_\ell }[/math] that [math]\displaystyle{ j\neq\ell }[/math] and any [math]\displaystyle{ a,b\in\{0,1\} }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[Y_j=a\wedge Y_\ell=b\right] &= \frac{1}{4}. \end{align} }[/math]

The proof is left for your exercise.

Therefore, we extract exponentially many pairwise independent uniform random bits from a sequence of mutually independent uniform random bits.

Note that [math]\displaystyle{ Y_j }[/math] are not 3-wise independent. For example, consider the subsets [math]\displaystyle{ S_1=\{1\},S_2=\{2\},S_3=\{1,2\} }[/math] and the corresponding random bits [math]\displaystyle{ Y_1,Y_2,Y_3 }[/math]. Any two of [math]\displaystyle{ Y_1,Y_2,Y_3 }[/math] would decide the value of the third one.

Construction via modulo a prime

We now consider constructing pairwise independent random variables ranging over [math]\displaystyle{ [p]=\{0,1,2,\ldots,p-1\} }[/math] for some prime [math]\displaystyle{ p }[/math]. Unlike the above construction, now we only need two independent random sources [math]\displaystyle{ X_0,X_1 }[/math], which are uniformly and independently distributed over [math]\displaystyle{ [p] }[/math].

Let [math]\displaystyle{ Y_0,Y_1,\ldots, Y_{p-1} }[/math] be defined as:

[math]\displaystyle{ \begin{align} Y_i=(X_0+i\cdot X_1)\bmod p &\quad \mbox{for }i\in[p]. \end{align} }[/math]

Theorem

The random variables [math]\displaystyle{ Y_0,Y_1,\ldots, Y_{p-1} }[/math] are pairwise independent uniform random variables over [math]\displaystyle{ [p] }[/math].

Proof.

We first show that [math]\displaystyle{ Y_i }[/math] are uniform. That is, we will show that for any [math]\displaystyle{ i,a\in[p] }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[(X_0+i\cdot X_1)\bmod p=a\right] &= \frac{1}{p}. \end{align} }[/math]

Due to the law of total probability,

[math]\displaystyle{ \begin{align} \Pr\left[(X_0+i\cdot X_1)\bmod p=a\right] &= \sum_{j\in[p]}\Pr[X_1=j]\cdot\Pr\left[(X_0+ij)\bmod p=a\right]\\ &=\frac{1}{p}\sum_{j\in[p]}\Pr\left[X_0\equiv(a-ij)\pmod{p}\right]. \end{align} }[/math]

For prime [math]\displaystyle{ p }[/math], for any [math]\displaystyle{ i,j,a\in[p] }[/math], there is exact one value in [math]\displaystyle{ [p] }[/math] of [math]\displaystyle{ X_0 }[/math] satisfying [math]\displaystyle{ X_0\equiv(a-ij)\pmod{p} }[/math]. Thus, [math]\displaystyle{ \Pr\left[X_0\equiv(a-ij)\pmod{p}\right]=1/p }[/math] and the above probability is [math]\displaystyle{ \frac{1}{p} }[/math].

We then show that [math]\displaystyle{ Y_i }[/math] are pairwise independent, i.e. we will show that for any [math]\displaystyle{ Y_i,Y_j }[/math] that [math]\displaystyle{ i\neq j }[/math] and any [math]\displaystyle{ a,b\in[p] }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[Y_i=a\wedge Y_j=b\right] &= \frac{1}{p^2}. \end{align} }[/math]

The event [math]\displaystyle{ Y_i=a\wedge Y_j=b }[/math] is equivalent to that

[math]\displaystyle{ \begin{cases} (X_0+iX_1)\equiv a\pmod{p}\\ (X_0+jX_1)\equiv b\pmod{p} \end{cases} }[/math]

Due to the Chinese remainder theorem, there exists a unique solution of [math]\displaystyle{ X_0 }[/math] and [math]\displaystyle{ X_1 }[/math] in [math]\displaystyle{ [p] }[/math] to the above linear congruential system. Thus the probability of the event is [math]\displaystyle{ \frac{1}{p^2} }[/math].

[math]\displaystyle{ \square }[/math]

Tools for limited independence

Let [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] be random variables. The variance of their sum is

[math]\displaystyle{ \begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{cov}(X_i,X_j). \end{align} }[/math]

If [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] are pairwise independent, then [math]\displaystyle{ \mathbf{cov}(X_i,X_j)=0 }[/math] for any [math]\displaystyle{ i\neq j }[/math]. This gives us the following theorem of linearity of variance for pairwise independent random variables.

Theorem

For pairwise independent random variables [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math],

[math]\displaystyle{ \begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]. \end{align} }[/math]

The theorem relies on that the covariances of pairwise independent random variables are 0, which in turn is actually a consequence of a more general theorem.

Theorem ([math]\displaystyle{ k }[/math]-wise independence fools [math]\displaystyle{ k }[/math]-degree polynomials)

Let [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] be mutually independent random variables and [math]\displaystyle{ Y_1,Y_2,\ldots,Y_n }[/math] be [math]\displaystyle{ k }[/math]-wise independent random variables, with that the marginal distribution of [math]\displaystyle{ Y_i }[/math] is identical to the marginal distribution of [math]\displaystyle{ X_i }[/math], [math]\displaystyle{ 1\le i\le n }[/math], that is, [math]\displaystyle{ \Pr[X_i=z]=\Pr[Y_i=z] }[/math] for any [math]\displaystyle{ z }[/math], [math]\displaystyle{ 1\le i\le n }[/math].

Let [math]\displaystyle{ f:\mathbb{R}^n\rightarrow\mathbb{R} }[/math] be a multivariate polynomial of degree at most [math]\displaystyle{ k }[/math]. Then

[math]\displaystyle{ \begin{align} \mathbf{E}\left[f(X_1,X_2,\ldots,X_n)\right]=\mathbf{E}[f(Y_1,Y_2,\ldots,Y_n)]. \end{align} }[/math]

This phenomenon is sometimes called that the [math]\displaystyle{ k }[/math]-degree polynomials are fooled by [math]\displaystyle{ k }[/math]-wise independence. In other words, a [math]\displaystyle{ k }[/math]-degree polynomial behaves the same on the [math]\displaystyle{ k }[/math]-wise independent random variables as on the mutual independent random variables.

This theorem is implied by the following lemma.

Lemma

Let [math]\displaystyle{ X_1,X_2,\ldots,X_k }[/math] be [math]\displaystyle{ k }[/math] mutually independent random variables. Then

[math]\displaystyle{ \begin{align} \mathbf{E}\left[\prod_{i=1}^k X_i\right]=\prod_{i=1}^k\mathbf{E}[X_i]. \end{align} }[/math]

The lemma can be proved by directly compute the expectation. We omit the detailed proof.

By the linearity of expectation, the expectation of a polynomial is reduced to the sum of the expectations of terms. For a k-degree polynomial, each term has at most [math]\displaystyle{ k }[/math] variables. Due to the above lemma, with k-wise independence, the expectation of each term behaves exactly the same as mutual independence.

Since the [math]\displaystyle{ k }[/math]th moment is the expectation of a k-degree polynomial of random variables, the tools based on the [math]\displaystyle{ k }[/math]th moment can be safely used for the k-wise independence. In particular, Chebyshev's inequality for pairwise independent random variables:

Chebyshev's inequality

Let [math]\displaystyle{ X=\sum_{i=1}^n X_i }[/math], where [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are pairwise independent Poisson trials. Let [math]\displaystyle{ \mu=\mathbf{E}[X] }[/math].

Then

[math]\displaystyle{ \Pr[|X-\mu|\ge t]\le\frac{\mathbf{Var}[X]}{t^2}=\frac{\sum_{i=1}^n\mathbf{Var}[X_i]}{t^2}. }[/math]

Application: Derandomizing MAX-CUT

Application: Two-point sampling

Consider a Monte Carlo randomized algorithm with one-sided error for a decision problem [math]\displaystyle{ f }[/math]. We formulate the algorithm as a deterministic algorithm [math]\displaystyle{ A }[/math] that takes as input [math]\displaystyle{ x }[/math] and a uniform random number [math]\displaystyle{ r\in[p] }[/math] where [math]\displaystyle{ p }[/math] is a prime, such that for any input [math]\displaystyle{ x }[/math]:

If [math]\displaystyle{ f(x)=1 }[/math], then [math]\displaystyle{ \Pr[A(x,r)=1]\ge\frac{1}{2} }[/math], where the probability is taken over the random choice of [math]\displaystyle{ r }[/math].
If [math]\displaystyle{ f(x)=0 }[/math], then [math]\displaystyle{ A(x,r)=0 }[/math] for any [math]\displaystyle{ r }[/math].

We call [math]\displaystyle{ r }[/math] the random source for the algorithm.

For the [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math], we call the [math]\displaystyle{ r }[/math] that makes [math]\displaystyle{ A(x,r)=1 }[/math] a witness for [math]\displaystyle{ x }[/math]. For a positive [math]\displaystyle{ x }[/math], at least half of [math]\displaystyle{ [p] }[/math] are witnesses. The random source [math]\displaystyle{ r }[/math] has polynomial number of bits, which means that [math]\displaystyle{ p }[/math] is exponentially large, thus it is infeasible to find the witness for an input [math]\displaystyle{ x }[/math] by exhaustive search. Deterministic overcomes this by having sophisticated deterministic rules for efficiently searching for a witness. Randomization, on the other hard, reduce this to a bit of luck, by randomly choosing an [math]\displaystyle{ r }[/math] and winning with a probability of 1/2.

We can boost the accuracy (equivalently, reduce the error) of any Monte Carlo randomized algorithm with one-sided error by running the algorithm for a number of times.

Suppose that we sample [math]\displaystyle{ t }[/math] values [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math] uniformly and independently from [math]\displaystyle{ [p] }[/math], and run the following scheme:

[math]\displaystyle{ B(x,r_1,r_2,\ldots,r_t): }[/math]

return [math]\displaystyle{ \bigvee_{i=1}^t A(x,r_i) }[/math];

That is, return 1 if any instance of [math]\displaystyle{ A(x,r_i)=1 }[/math]. For any [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math], due to the independence of [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math], the probability that [math]\displaystyle{ B(x,r_1,r_2,\ldots,r_t) }[/math] returns an incorrect result is at most [math]\displaystyle{ 2^{-t} }[/math]. On the other hand, [math]\displaystyle{ B }[/math] never makes mistakes for the [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=0 }[/math] since [math]\displaystyle{ A }[/math] has no false positives. Thus, the error of the Monte Carlo algorithm is reduced to [math]\displaystyle{ 2^{-t} }[/math].

Sampling [math]\displaystyle{ t }[/math] mutually independent random numbers from [math]\displaystyle{ [p] }[/math] can be quite expensive since it requires [math]\displaystyle{ \Omega(t\log p) }[/math] random bits. Suppose that we can only afford [math]\displaystyle{ O(\log p) }[/math] random bits. In particular, we sample two independent uniform random number [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] from [math]\displaystyle{ [p] }[/math]. If we use [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] directly bu running two independent instances [math]\displaystyle{ A(x,a) }[/math] and [math]\displaystyle{ A(x,b) }[/math], we only get an error upper bound of 1/4.

The following scheme reduces the error significantly with the same number of random bits:

Algorithm

Choose two independent uniform random number [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] from [math]\displaystyle{ [p] }[/math]. Construct [math]\displaystyle{ t }[/math] random number [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math] by:

[math]\displaystyle{ \begin{align} \forall 1\le i\le t, &\quad \mbox{let }r_i = (a\cdot i+b)\bmod p. \end{align} }[/math]

Run [math]\displaystyle{ B(x,r_1,r_2,\ldots,r_t): }[/math].

Due to the discussion in the last section, we know that for [math]\displaystyle{ t\le p }[/math], [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math] are pairwise independent and uniform over [math]\displaystyle{ [p] }[/math]. Let [math]\displaystyle{ X_i=A(x,r_i) }[/math] and [math]\displaystyle{ X=\sum_{i=1}^tX_i }[/math]. Due to the uniformity of [math]\displaystyle{ r_i }[/math] and our definition of [math]\displaystyle{ A }[/math], for any [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math], it holds that

[math]\displaystyle{ \Pr[X_i=1]=\Pr[A(x,r_i)=1]\ge\frac{1}{2}. }[/math]

By the linearity of expectations,

[math]\displaystyle{ \mathbf{E}[X]=\sum_{i=1}^t\mathbf{E}[X_i]=\sum_{i=1}^t\Pr[X_i=1]\ge\frac{t}{2}. }[/math]

Since [math]\displaystyle{ X_i }[/math] is Bernoulli trial with a probability of success at least [math]\displaystyle{ p=1/2 }[/math]. We can estimate the variance of each [math]\displaystyle{ X_i }[/math] as follows.

[math]\displaystyle{ \mathbf{Var}[X_i]=p(1-p)\le\frac{1}{4}. }[/math]

Applying Chebyshev's inequality, we have that for any [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math],

[math]\displaystyle{ \begin{align} \Pr\left[\bigvee_{i=1}^t A(x,r_i)=0\right] &= \Pr[X=0]\\ &\le \Pr[|X-\mathbf{E}[X]|\ge \mathbf{E}[X]]\\ &\le \Pr\left[|X-\mathbf{E}[X]|\ge \frac{t}{2}\right]\\ &\le \frac{4}{t^2}\sum_{i=1}^t\mathbf{Var}[X_i]\\ &\le \frac{1}{t}. \end{align} }[/math]

The error is reduced to [math]\displaystyle{ 1/t }[/math] with only two random numbers. This scheme works as long as [math]\displaystyle{ t\le p }[/math].

@@ Line 151: / Line 151: @@
 = Tools for limited independence =
-We have the following theorem of linearity of variance for pairwise independent random variables.
+Let <math>X_1,X_2,\ldots,X_n</math> be random variables. The variance of their sum is
+:<math>\begin{align}
+\mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{cov}(X_i,X_j).
+\end{align}</math>
+If <math>X_1,X_2,\ldots,X_n</math> are pairwise independent, then <math>\mathbf{cov}(X_i,X_j)=0</math> for any <math>i\neq j</math>. This gives us the following theorem of linearity of variance for pairwise independent random variables.
 {{Theorem
 |Theorem|
@@ Line 159: / Line 163: @@
 \end{align}</math>
 }}
-We proved the theorem by showing that the covariances of pairwise independent random variables are 0. The theorem is actually a consequence of a more general statement.
+The theorem relies on that the covariances of pairwise independent random variables are 0, which in turn is actually a consequence of a more general theorem.
 {{Theorem
 |Theorem (<math>k</math>-wise independence fools <math>k</math>-degree polynomials)|

随机算法 (Fall 2011)/Limited independence: Difference between revisions

Revision as of 06:40, 21 November 2011

Contents

k-wise independence

Construction via XOR

Construction via modulo a prime

Tools for limited independence

Application: Derandomizing MAX-CUT

Application: Two-point sampling

Navigation menu

随机算法 (Fall 2011)/Limited independence: Difference between revisions

Revision as of 06:40, 21 November 2011

k-wise independence

Construction via XOR

Construction via modulo a prime

Tools for limited independence

Application: Derandomizing MAX-CUT

Application: Two-point sampling

Navigation menu

Search