# k-wise independence

Recall the definition of independence between events:

 Definition (Independent events) Events ${\displaystyle {\mathcal {E}}_{1},{\mathcal {E}}_{2},\ldots ,{\mathcal {E}}_{n}}$ are mutually independent if, for any subset ${\displaystyle I\subseteq \{1,2,\ldots ,n\}}$, {\displaystyle {\begin{aligned}\Pr \left[\bigwedge _{i\in I}{\mathcal {E}}_{i}\right]&=\prod _{i\in I}\Pr[{\mathcal {E}}_{i}].\end{aligned}}}

Similarly, we can define independence between random variables:

 Definition (Independent variables) Random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are mutually independent if, for any subset ${\displaystyle I\subseteq \{1,2,\ldots ,n\}}$ and any values ${\displaystyle x_{i}}$, where ${\displaystyle i\in I}$, {\displaystyle {\begin{aligned}\Pr \left[\bigwedge _{i\in I}(X_{i}=x_{i})\right]&=\prod _{i\in I}\Pr[X_{i}=x_{i}].\end{aligned}}}

Mutual independence is the ideal form of independence. A limited notion of independence is usually defined by k-wise independence.

 Definition (k-wise Independence) 1. Events ${\displaystyle {\mathcal {E}}_{1},{\mathcal {E}}_{2},\ldots ,{\mathcal {E}}_{n}}$ are k-wise independent if, for any subset ${\displaystyle I\subseteq \{1,2,\ldots ,n\}}$ with ${\displaystyle |I|\leq k}$, {\displaystyle {\begin{aligned}\Pr \left[\bigwedge _{i\in I}{\mathcal {E}}_{i}\right]&=\prod _{i\in I}\Pr[{\mathcal {E}}_{i}].\end{aligned}}} 2. Random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are k-wise independent if, for any subset ${\displaystyle I\subseteq \{1,2,\ldots ,n\}}$ with ${\displaystyle |I|\leq k}$ and any values ${\displaystyle x_{i}}$, where ${\displaystyle i\in I}$, {\displaystyle {\begin{aligned}\Pr \left[\bigwedge _{i\in I}(X_{i}=x_{i})\right]&=\prod _{i\in I}\Pr[X_{i}=x_{i}].\end{aligned}}}

A very common case is pairwise independence, i.e. 2-wise independence.

 Definition (pairwise Independent random variables) Random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are pairwise independent if, for any ${\displaystyle X_{i},X_{j}}$ where ${\displaystyle i\neq j}$ and any values ${\displaystyle a,b}$ {\displaystyle {\begin{aligned}\Pr \left[X_{i}=a\wedge X_{j}=b\right]&=\Pr[X_{i}=a]\cdot \Pr[X_{j}=b].\end{aligned}}}

Note that the definition of k-wise independence is hereditary:

• If ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are k-wise independent, then they are also ${\displaystyle \ell }$-wise independent for any ${\displaystyle \ell \leq k}$.
• If ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are NOT k-wise independent, then they cannot be ${\displaystyle \ell }$-wise independent for any ${\displaystyle \ell >k}$.

## Construction via XOR

Suppose we have ${\displaystyle m}$ mutually independent and uniform random bits ${\displaystyle X_{1},\ldots ,X_{m}}$. We are going to extract ${\displaystyle n=2^{m}-1}$ pairwise independent bits from these ${\displaystyle m}$ mutually independent bits.

Enumerate all the nonempty subsets of ${\displaystyle \{1,2,\ldots ,m\}}$ in some order. Let ${\displaystyle S_{j}}$ be the ${\displaystyle j}$th subset. Let

${\displaystyle Y_{j}=\bigoplus _{i\in S_{j}}X_{i},}$

where ${\displaystyle \oplus }$ is the exclusive-or, whose truth table is as follows.

| ${\displaystyle a}$ | ${\displaystyle b}$ | ${\displaystyle a\oplus b}$ |
|:-:|:-:|:-:|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

There are ${\displaystyle n=2^{m}-1}$ such ${\displaystyle Y_{j}}$, because there are ${\displaystyle 2^{m}-1}$ nonempty subsets of ${\displaystyle \{1,2,\ldots ,m\}}$. An equivalent definition of ${\displaystyle Y_{j}}$ is

${\displaystyle Y_{j}=\left(\sum _{i\in S_{j}}X_{i}\right){\bmod {2}}}$.

Sometimes, ${\displaystyle Y_{j}}$ is called the parity of the bits in ${\displaystyle S_{j}}$.

We claim that ${\displaystyle Y_{j}}$ are pairwise independent and uniform.

 Theorem For any ${\displaystyle Y_{j}}$ and any ${\displaystyle b\in \{0,1\}}$, {\displaystyle {\begin{aligned}\Pr \left[Y_{j}=b\right]&={\frac {1}{2}}.\end{aligned}}} For any ${\displaystyle Y_{j},Y_{\ell }}$ that ${\displaystyle j\neq \ell }$ and any ${\displaystyle a,b\in \{0,1\}}$, {\displaystyle {\begin{aligned}\Pr \left[Y_{j}=a\wedge Y_{\ell }=b\right]&={\frac {1}{4}}.\end{aligned}}}

The proof is left as an exercise.

Therefore, we can extract exponentially many (${\displaystyle n=2^{m}-1}$) pairwise independent uniform random bits from only ${\displaystyle m}$ mutually independent uniform random bits.

Note that ${\displaystyle Y_{j}}$ are not 3-wise independent. For example, consider the subsets ${\displaystyle S_{1}=\{1\},S_{2}=\{2\},S_{3}=\{1,2\}}$ and the corresponding random bits ${\displaystyle Y_{1},Y_{2},Y_{3}}$. Any two of ${\displaystyle Y_{1},Y_{2},Y_{3}}$ would decide the value of the third one.
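The construction and its limits can be checked exhaustively for a small ${\displaystyle m}$. The following Python sketch (the variable names are ours, not part of the notes) enumerates all ${\displaystyle 2^{m}}$ source strings, which are equally likely, and verifies uniformity, pairwise independence, and the failure of 3-wise independence:

```python
from itertools import product, combinations

m = 3
# all nonempty subsets of {0, ..., m-1}, encoded as bitmasks 1 .. 2^m - 1
subsets = list(range(1, 2 ** m))
n = len(subsets)  # n = 2^m - 1 extracted bits

def parities(x):
    """Given source bits x (a tuple of length m), return the parity bit
    Y_j = XOR of x[i] for i in S_j, for every nonempty subset S_j."""
    return tuple(
        sum(x[i] for i in range(m) if s >> i & 1) % 2 for s in subsets
    )

# enumerate all 2^m equally likely source strings
samples = [parities(x) for x in product([0, 1], repeat=m)]

# uniformity: each Y_j equals 1 in exactly half of the samples
for j in range(n):
    assert sum(y[j] for y in samples) * 2 == len(samples)

# pairwise independence: each pair (Y_j, Y_l) takes each of the four
# value combinations in exactly a quarter of the samples
for j, l in combinations(range(n), 2):
    for a, b in product([0, 1], repeat=2):
        count = sum(1 for y in samples if y[j] == a and y[l] == b)
        assert count * 4 == len(samples)

# not 3-wise independent: the bits for S={1}, S={2}, S={1,2} always
# satisfy Y_{1} XOR Y_{2} = Y_{1,2}, so e.g. the triple (0,0,1) never occurs
i1, i2, i12 = subsets.index(1), subsets.index(2), subsets.index(3)
assert all(y[i1] ^ y[i2] == y[i12] for y in samples)
print("verified: uniform, pairwise independent, not 3-wise independent")
```

The checks count outcomes instead of estimating probabilities, so they establish the claims exactly for this ${\displaystyle m}$.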

## Construction via modulo a prime

We now consider constructing pairwise independent random variables ranging over ${\displaystyle [p]=\{0,1,2,\ldots ,p-1\}}$ for some prime ${\displaystyle p}$. Unlike the above construction, now we only need two independent random sources ${\displaystyle X_{0},X_{1}}$, which are uniformly and independently distributed over ${\displaystyle [p]}$.

Let ${\displaystyle Y_{0},Y_{1},\ldots ,Y_{p-1}}$ be defined as:

{\displaystyle {\begin{aligned}Y_{i}=(X_{0}+i\cdot X_{1}){\bmod {p}}&\quad {\mbox{for }}i\in [p].\end{aligned}}}
 Theorem The random variables ${\displaystyle Y_{0},Y_{1},\ldots ,Y_{p-1}}$ are pairwise independent uniform random variables over ${\displaystyle [p]}$.
Proof.
 We first show that each ${\displaystyle Y_{i}}$ is uniform. That is, we will show that for any ${\displaystyle i,a\in [p]}$, {\displaystyle {\begin{aligned}\Pr \left[(X_{0}+i\cdot X_{1}){\bmod {p}}=a\right]&={\frac {1}{p}}.\end{aligned}}} Due to the law of total probability, {\displaystyle {\begin{aligned}\Pr \left[(X_{0}+i\cdot X_{1}){\bmod {p}}=a\right]&=\sum _{j\in [p]}\Pr[X_{1}=j]\cdot \Pr \left[(X_{0}+ij){\bmod {p}}=a\right]\\&={\frac {1}{p}}\sum _{j\in [p]}\Pr \left[X_{0}\equiv (a-ij){\pmod {p}}\right].\end{aligned}}} For any ${\displaystyle i,j,a\in [p]}$, there is exactly one value of ${\displaystyle X_{0}}$ in ${\displaystyle [p]}$ satisfying ${\displaystyle X_{0}\equiv (a-ij){\pmod {p}}}$. Thus, ${\displaystyle \Pr \left[X_{0}\equiv (a-ij){\pmod {p}}\right]=1/p}$ and the above probability is ${\displaystyle {\frac {1}{p}}}$.

 We then show that the ${\displaystyle Y_{i}}$ are pairwise independent, i.e. we will show that for any ${\displaystyle Y_{i},Y_{j}}$ with ${\displaystyle i\neq j}$ and any ${\displaystyle a,b\in [p]}$, {\displaystyle {\begin{aligned}\Pr \left[Y_{i}=a\wedge Y_{j}=b\right]&={\frac {1}{p^{2}}}.\end{aligned}}} The event ${\displaystyle Y_{i}=a\wedge Y_{j}=b}$ is equivalent to ${\displaystyle {\begin{cases}(X_{0}+iX_{1})\equiv a{\pmod {p}}\\(X_{0}+jX_{1})\equiv b{\pmod {p}}\end{cases}}}$ Subtracting the two congruences gives ${\displaystyle (i-j)X_{1}\equiv (a-b){\pmod {p}}}$. Since ${\displaystyle p}$ is prime and ${\displaystyle i\neq j}$, the coefficient ${\displaystyle (i-j)}$ is invertible modulo ${\displaystyle p}$, so ${\displaystyle X_{1}}$ is uniquely determined, and then so is ${\displaystyle X_{0}}$. Hence there exists a unique solution of ${\displaystyle X_{0}}$ and ${\displaystyle X_{1}}$ in ${\displaystyle [p]}$ to the above linear congruential system, and the probability of the event is ${\displaystyle {\frac {1}{p^{2}}}}$.
${\displaystyle \square }$
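For a small prime, the theorem can also be verified by brute force over all ${\displaystyle p^{2}}$ equally likely seeds ${\displaystyle (X_{0},X_{1})}$; the following Python sketch does so:

```python
from itertools import product

p = 5  # any prime works

# enumerate the full vector (Y_0, ..., Y_{p-1}) for every seed (X0, X1)
samples = [
    tuple((x0 + i * x1) % p for i in range(p))
    for x0, x1 in product(range(p), repeat=2)
]

# uniformity: each Y_i hits every value of [p] in exactly p of the
# p^2 seeds
for i in range(p):
    for a in range(p):
        assert sum(1 for y in samples if y[i] == a) == p

# pairwise independence: for i != j, each value pair (a, b) occurs
# for exactly one of the p^2 seeds, matching Pr = 1/p^2
for i in range(p):
    for j in range(i + 1, p):
        for a, b in product(range(p), repeat=2):
            assert sum(1 for y in samples if y[i] == a and y[j] == b) == 1
print("verified for p =", p)
```

The inner assertion is exactly the unique-solution claim from the proof: fixing ${\displaystyle (Y_{i},Y_{j})=(a,b)}$ pins down one seed.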

# Tools for limited independence

Let ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ be random variables. The variance of their sum is

{\displaystyle {\begin{aligned}\mathbf {Var} \left[\sum _{i=1}^{n}X_{i}\right]=\sum _{i=1}^{n}\mathbf {Var} [X_{i}]+\sum _{i\neq j}\mathbf {cov} (X_{i},X_{j}).\end{aligned}}}

If ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are pairwise independent, then ${\displaystyle \mathbf {cov} (X_{i},X_{j})=0}$ for any ${\displaystyle i\neq j}$ since the covariance of a pair of independent random variables is 0. This gives us the following theorem of linearity of variance for pairwise independent random variables.

 Theorem For pairwise independent random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$, {\displaystyle {\begin{aligned}\mathbf {Var} \left[\sum _{i=1}^{n}X_{i}\right]=\sum _{i=1}^{n}\mathbf {Var} [X_{i}].\end{aligned}}}

The theorem relies on the fact that the covariances of pairwise independent random variables are 0, which is itself a consequence of a more general theorem.

 Theorem (${\displaystyle k}$-wise independence fools ${\displaystyle k}$-degree polynomials) Let ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ be mutually independent random variables and ${\displaystyle Y_{1},Y_{2},\ldots ,Y_{n}}$ be ${\displaystyle k}$-wise independent random variables, with that the marginal distribution of ${\displaystyle Y_{i}}$ is identical to the marginal distribution of ${\displaystyle X_{i}}$, ${\displaystyle 1\leq i\leq n}$, that is, ${\displaystyle \Pr[X_{i}=z]=\Pr[Y_{i}=z]}$ for any ${\displaystyle z}$, ${\displaystyle 1\leq i\leq n}$. Let ${\displaystyle f:\mathbb {R} ^{n}\rightarrow \mathbb {R} }$ be a multivariate polynomial of degree at most ${\displaystyle k}$. Then {\displaystyle {\begin{aligned}\mathbf {E} \left[f(X_{1},X_{2},\ldots ,X_{n})\right]=\mathbf {E} [f(Y_{1},Y_{2},\ldots ,Y_{n})].\end{aligned}}}

This phenomenon is sometimes described by saying that degree-${\displaystyle k}$ polynomials are fooled by ${\displaystyle k}$-wise independence. In other words, a degree-${\displaystyle k}$ polynomial behaves the same on ${\displaystyle k}$-wise independent random variables as on mutually independent random variables.

This theorem is implied by the following lemma.

 Lemma Let ${\displaystyle X_{1},X_{2},\ldots ,X_{k}}$ be ${\displaystyle k}$ mutually independent random variables. Then {\displaystyle {\begin{aligned}\mathbf {E} \left[\prod _{i=1}^{k}X_{i}\right]=\prod _{i=1}^{k}\mathbf {E} [X_{i}].\end{aligned}}}

The lemma can be proved by directly computing the expectation. We omit the detailed proof.

By the linearity of expectation, the expectation of a polynomial reduces to the sum of the expectations of its terms. For a degree-${\displaystyle k}$ polynomial, each term involves at most ${\displaystyle k}$ variables. Due to the above lemma, with k-wise independence, the expectation of each term behaves exactly as under mutual independence.
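The following Python sketch illustrates the fooling phenomenon on the smallest XOR construction: a degree-2 polynomial cannot distinguish the pairwise independent triple ${\displaystyle (X_{1},X_{2},X_{1}\oplus X_{2})}$ from three mutually independent bits, while a degree-3 polynomial can. The two polynomials are arbitrary examples of ours:

```python
from itertools import product
from fractions import Fraction

# pairwise independent bits from the XOR construction: (X1, X2, X1^X2),
# driven by two fair seed bits
pairwise_space = [(x0, x1, x0 ^ x1) for x0, x1 in product([0, 1], repeat=2)]
# three mutually independent bits with the same uniform marginals
full_space = list(product([0, 1], repeat=3))

def expect(space, f):
    # exact expectation over an equally likely sample space
    return Fraction(sum(f(*y) for y in space), len(space))

# a degree-2 polynomial: its expectation is fooled by 2-wise independence
f2 = lambda a, b, c: 3 * a * b - 2 * b * c + c - 1
assert expect(pairwise_space, f2) == expect(full_space, f2)

# a degree-3 polynomial can tell the two distributions apart:
# a*b*c is always 0 on the pairwise space, but not on the full space
f3 = lambda a, b, c: a * b * c
print(expect(pairwise_space, f3), expect(full_space, f3))
```

The degree-3 term ${\displaystyle X_{1}X_{2}(X_{1}\oplus X_{2})}$ vanishes identically on the pairwise independent triple, so the two expectations differ, matching the bound ${\displaystyle k=2}$ in the theorem.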

Since the ${\displaystyle k}$th moment is the expectation of a degree-${\displaystyle k}$ polynomial of the random variables, tools based on the ${\displaystyle k}$th moment can be safely used under k-wise independence. In particular, Chebyshev's inequality holds for pairwise independent random variables:

 Chebyshev's inequality Let ${\displaystyle X=\sum _{i=1}^{n}X_{i}}$, where ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are pairwise independent Poisson trials. Let ${\displaystyle \mu =\mathbf {E} [X]}$. Then ${\displaystyle \Pr[|X-\mu |\geq t]\leq {\frac {\mathbf {Var} [X]}{t^{2}}}={\frac {\sum _{i=1}^{n}\mathbf {Var} [X_{i}]}{t^{2}}}.}$

# Application: Derandomizing MAX-CUT

Let ${\displaystyle G(V,E)}$ be an undirected graph, and ${\displaystyle S\subset V}$ be a vertex set. The cut defined by ${\displaystyle S}$ is ${\displaystyle C(S,{\bar {S}})=|\{uv\in E\mid u\in S,v\not \in S\}|}$.

Given as input an undirected graph ${\displaystyle G(V,E)}$, find the ${\displaystyle S\subset V}$ whose cut value ${\displaystyle C(S,{\bar {S}})}$ is maximized. This problem is called the maximum cut (MAX-CUT) problem, which is NP-hard. The decision version of a weighted version of the problem is one of Karp's 21 NP-complete problems. The problem has a ${\displaystyle 0.878}$-approximation algorithm obtained by rounding a semidefinite program. Assuming the unique games conjecture (UGC), no polynomial-time algorithm achieves a better approximation ratio unless ${\displaystyle P=NP}$.

Here we give a very simple ${\displaystyle 0.5}$-approximation algorithm. The "algorithm" has a one-line description:

• Put each ${\displaystyle v\in V}$ into ${\displaystyle S}$ independently with probability 1/2.

We then analyze the approximation ratio of this algorithm.

For each ${\displaystyle v\in V}$, let ${\displaystyle Y_{v}}$ indicate whether ${\displaystyle v\in S}$, that is

${\displaystyle Y_{v}={\begin{cases}1&v\in S,\\0&v\not \in S.\end{cases}}}$

For each edge ${\displaystyle uv\in E}$, let ${\displaystyle Y_{uv}}$ indicate whether ${\displaystyle uv}$ contribute to the cut ${\displaystyle C(S,{\bar {S}})}$, i.e. whether ${\displaystyle u\in S,v\not \in S}$ or ${\displaystyle u\not \in S,v\in S}$, that is

${\displaystyle Y_{uv}={\begin{cases}1&Y_{u}\neq Y_{v},\\0&{\text{otherwise}}.\end{cases}}}$

Then ${\displaystyle C(S,{\bar {S}})=\sum _{uv\in E}Y_{uv}}$. Due to the linearity of expectation,

${\displaystyle \mathbf {E} \left[C(S,{\bar {S}})\right]=\sum _{uv\in E}\mathbf {E} [Y_{uv}]=\sum _{uv\in E}\Pr[Y_{u}\neq Y_{v}]={\frac {|E|}{2}}}$.

The maximum cut of a graph is at most ${\displaystyle |E|}$. Thus, the algorithm returns in expectation a cut with size at least half of the maximum cut.

We then show how to derandomize this algorithm by pairwise independent bits.

Suppose that ${\displaystyle |V|=n}$ and enumerate the ${\displaystyle n}$ vertices by ${\displaystyle v_{1},v_{2},\ldots ,v_{n}}$ in an arbitrary order. Let ${\displaystyle m=\lceil \log _{2}(n+1)\rceil }$. Sample ${\displaystyle m}$ bits ${\displaystyle X_{1},\ldots ,X_{m}\in \{0,1\}}$ uniformly and independently at random. Enumerate all nonempty subsets of ${\displaystyle \{1,2,\ldots ,m\}}$ by ${\displaystyle S_{1},S_{2},\ldots ,S_{2^{m}-1}}$. For each vertex ${\displaystyle v_{j}}$, let ${\displaystyle Y_{v_{j}}=\bigoplus _{i\in S_{j}}X_{i}}$. The MAX-CUT algorithm uses these bits to construct the solution ${\displaystyle S}$:

• For ${\displaystyle j=1,2,\ldots ,n}$, put ${\displaystyle v_{j}}$ into ${\displaystyle S}$ if ${\displaystyle Y_{v_{j}}=1}$.

We have shown that ${\displaystyle Y_{v_{j}}}$, ${\displaystyle j=1,2,\ldots ,n}$, are uniform and pairwise independent. Thus we still have that ${\displaystyle \Pr[Y_{u}\neq Y_{v}]={\frac {1}{2}}}$. The above analysis still holds, so that the algorithm returns in expectation a cut with size at least ${\displaystyle {\frac {|E|}{2}}}$.

Finally, we notice that there are only ${\displaystyle m=\lceil \log _{2}(n+1)\rceil }$ total random bits in the new algorithm. We can enumerate all ${\displaystyle 2^{m}\leq 2(n+1)}$ possible strings of ${\displaystyle m}$ bits, run the above algorithm with the bit strings as the "random sources", and output the maximum cut returned. There must exist a bit string ${\displaystyle X_{1},\ldots ,X_{m}\in \{0,1\}}$ on which the algorithm returns a cut of size ${\displaystyle \geq {\frac {|E|}{2}}}$ (why?). This gives us a deterministic polynomial time (actually ${\displaystyle O(n^{2})}$ time) ${\displaystyle 1/2}$-approximation algorithm.
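A minimal Python sketch of the derandomized algorithm follows (the function and variable names are ours). On a 4-cycle it happens to find the optimal cut of size 4, well above the guaranteed ${\displaystyle |E|/2=2}$:

```python
from itertools import product
from math import ceil, log2

def derandomized_max_cut(n, edges):
    """Deterministic 1/2-approximation for MAX-CUT: try every seed of
    the pairwise independent generator and keep the best cut."""
    m = ceil(log2(n + 1))
    subsets = list(range(1, 2 ** m))  # nonempty subsets as bitmasks
    best_size, best_side = -1, None
    for x in product([0, 1], repeat=m):  # all 2^m <= 2(n+1) seeds
        # Y_j = parity of the seed bits indexed by the j-th subset
        y = [
            sum(x[i] for i in range(m) if subsets[j] >> i & 1) % 2
            for j in range(n)
        ]
        # vertices with Y_j = 1 go into S; count the crossing edges
        size = sum(1 for u, v in edges if y[u] != y[v])
        if size > best_size:
            best_size, best_side = size, y
    return best_size, best_side

# 4-cycle on vertices 0..3: the maximum cut has size 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
size, side = derandomized_max_cut(4, edges)
print(size, side)
```

Since the expected cut size over a uniformly random seed is at least ${\displaystyle |E|/2}$, the maximum over all seeds is also at least ${\displaystyle |E|/2}$, which is exactly why the enumeration works.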

# Application: Two-point sampling

Consider a Monte Carlo randomized algorithm with one-sided error for a decision problem ${\displaystyle f}$. We formulate the algorithm as a deterministic algorithm ${\displaystyle A}$ that takes as input ${\displaystyle x}$ and a uniform random number ${\displaystyle r\in [p]}$ where ${\displaystyle p}$ is a prime, such that for any input ${\displaystyle x}$:

• If ${\displaystyle f(x)=1}$, then ${\displaystyle \Pr[A(x,r)=1]\geq {\frac {1}{2}}}$, where the probability is taken over the random choice of ${\displaystyle r}$.
• If ${\displaystyle f(x)=0}$, then ${\displaystyle A(x,r)=0}$ for any ${\displaystyle r}$.

We call ${\displaystyle r}$ the random source for the algorithm.

For the ${\displaystyle x}$ that ${\displaystyle f(x)=1}$, we call an ${\displaystyle r}$ that makes ${\displaystyle A(x,r)=1}$ a witness for ${\displaystyle x}$. For a positive ${\displaystyle x}$, at least half of ${\displaystyle [p]}$ are witnesses. The random source ${\displaystyle r}$ has a polynomial number of bits, which means that ${\displaystyle p}$ is exponentially large, so it is infeasible to find a witness for an input ${\displaystyle x}$ by exhaustive search. Deterministic algorithms overcome this by having sophisticated deterministic rules for efficiently searching for a witness. Randomization, on the other hand, reduces this to a bit of luck: randomly choose an ${\displaystyle r}$ and win with a probability of 1/2.

We can boost the accuracy (equivalently, reduce the error) of any Monte Carlo randomized algorithm with one-sided error by running the algorithm for a number of times.

Suppose that we sample ${\displaystyle t}$ values ${\displaystyle r_{1},r_{2},\ldots ,r_{t}}$ uniformly and independently from ${\displaystyle [p]}$, and run the following scheme:

 ${\displaystyle B(x,r_{1},r_{2},\ldots ,r_{t}):}$ return ${\displaystyle \bigvee _{i=1}^{t}A(x,r_{i})}$;

That is, return 1 if any instance of ${\displaystyle A(x,r_{i})=1}$. For any ${\displaystyle x}$ that ${\displaystyle f(x)=1}$, due to the independence of ${\displaystyle r_{1},r_{2},\ldots ,r_{t}}$, the probability that ${\displaystyle B(x,r_{1},r_{2},\ldots ,r_{t})}$ returns an incorrect result is at most ${\displaystyle 2^{-t}}$. On the other hand, ${\displaystyle B}$ never makes mistakes for the ${\displaystyle x}$ that ${\displaystyle f(x)=0}$ since ${\displaystyle A}$ has no false positives. Thus, the error of the Monte Carlo algorithm is reduced to ${\displaystyle 2^{-t}}$.

Sampling ${\displaystyle t}$ mutually independent random numbers from ${\displaystyle [p]}$ can be quite expensive, since it requires ${\displaystyle \Omega (t\log p)}$ random bits. Suppose that we can only afford ${\displaystyle O(\log p)}$ random bits. In particular, we sample two independent uniform random numbers ${\displaystyle a}$ and ${\displaystyle b}$ from ${\displaystyle [p]}$. If we use ${\displaystyle a}$ and ${\displaystyle b}$ directly by running two independent instances ${\displaystyle A(x,a)}$ and ${\displaystyle A(x,b)}$, we only get an error upper bound of 1/4.

The following scheme reduces the error significantly with the same number of random bits:

 Algorithm Choose two independent uniform random numbers ${\displaystyle a}$ and ${\displaystyle b}$ from ${\displaystyle [p]}$. Construct ${\displaystyle t}$ random numbers ${\displaystyle r_{1},r_{2},\ldots ,r_{t}}$ by: {\displaystyle {\begin{aligned}\forall 1\leq i\leq t,&\quad {\mbox{let }}r_{i}=(a\cdot i+b){\bmod {p}}.\end{aligned}}} Run ${\displaystyle B(x,r_{1},r_{2},\ldots ,r_{t})}$.

Due to the discussion in the last section, we know that for ${\displaystyle t\leq p}$, ${\displaystyle r_{1},r_{2},\ldots ,r_{t}}$ are pairwise independent and uniform over ${\displaystyle [p]}$. Let ${\displaystyle X_{i}=A(x,r_{i})}$ and ${\displaystyle X=\sum _{i=1}^{t}X_{i}}$. Due to the uniformity of ${\displaystyle r_{i}}$ and our definition of ${\displaystyle A}$, for any ${\displaystyle x}$ that ${\displaystyle f(x)=1}$, it holds that

${\displaystyle \Pr[X_{i}=1]=\Pr[A(x,r_{i})=1]\geq {\frac {1}{2}}.}$

By the linearity of expectation,

${\displaystyle \mathbf {E} [X]=\sum _{i=1}^{t}\mathbf {E} [X_{i}]=\sum _{i=1}^{t}\Pr[X_{i}=1]\geq {\frac {t}{2}}.}$

Each ${\displaystyle X_{i}}$ is a Bernoulli trial with success probability ${\displaystyle q\geq 1/2}$ (we write ${\displaystyle q}$ to avoid confusion with the prime ${\displaystyle p}$). We can bound the variance of each ${\displaystyle X_{i}}$ as follows.

${\displaystyle \mathbf {Var} [X_{i}]=q(1-q)\leq {\frac {1}{4}}.}$

Applying Chebyshev's inequality, we have that for any ${\displaystyle x}$ that ${\displaystyle f(x)=1}$,

{\displaystyle {\begin{aligned}\Pr \left[\bigvee _{i=1}^{t}A(x,r_{i})=0\right]&=\Pr[X=0]\\&\leq \Pr[|X-\mathbf {E} [X]|\geq \mathbf {E} [X]]\\&\leq \Pr \left[|X-\mathbf {E} [X]|\geq {\frac {t}{2}}\right]\\&\leq {\frac {4}{t^{2}}}\sum _{i=1}^{t}\mathbf {Var} [X_{i}]\\&\leq {\frac {1}{t}}.\end{aligned}}}

The error is reduced to ${\displaystyle 1/t}$ with only two random numbers. This scheme works as long as ${\displaystyle t\leq p}$.
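The scheme can be simulated. In the Python sketch below, the one-sided-error algorithm ${\displaystyle A}$ is a hypothetical stand-in whose witness set is simply the upper half of ${\displaystyle [p]}$; any set containing at least half of ${\displaystyle [p]}$ would do, and all names are ours:

```python
import random

p = 10007  # a prime
# hypothetical witness set for a positive input: at least half of [p]
witnesses = set(range(p // 2, p))

def A(r):
    # stand-in one-sided-error algorithm: answers 1 iff r is a witness
    return 1 if r in witnesses else 0

def two_point_B(t, rng):
    """One run of the two-point scheme: two random seeds a, b generate
    t pairwise independent queries r_i = (a*i + b) mod p."""
    a, b = rng.randrange(p), rng.randrange(p)
    return max(A((a * i + b) % p) for i in range(1, t + 1))

rng = random.Random(42)
t, runs = 50, 20000
# an error occurs when B answers 0 on a positive input
errors = sum(1 for _ in range(runs) if two_point_B(t, rng) == 0)
print("empirical error:", errors / runs, " Chebyshev bound: 1/t =", 1 / t)
```

The empirical error stays below the ${\displaystyle 1/t}$ bound; Chebyshev's inequality is typically loose here, so the observed error is usually much smaller than the guarantee.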