Randomized Algorithms (Fall 2011) / Limited independence
k-wise independence
Recall the definition of independence between events:
Definition (Independent events) - Events [math]\displaystyle{ \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n }[/math] are mutually independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right] &= \prod_{i\in I}\Pr[\mathcal{E}_i]. \end{align} }[/math]
Similarly, we can define independence between random variables:
Definition (Independent variables) - Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are mutually independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] and any values [math]\displaystyle{ x_i }[/math], where [math]\displaystyle{ i\in I }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right] &= \prod_{i\in I}\Pr[X_i=x_i]. \end{align} }[/math]
Mutual independence is an ideal condition. A limited notion of independence is usually captured by k-wise independence.
Definition (k-wise Independence) - 1. Events [math]\displaystyle{ \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n }[/math] are k-wise independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] with [math]\displaystyle{ |I|\le k }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}\mathcal{E}_i\right] &= \prod_{i\in I}\Pr[\mathcal{E}_i]. \end{align} }[/math]
- 2. Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are k-wise independent if, for any subset [math]\displaystyle{ I\subseteq\{1,2,\ldots,n\} }[/math] with [math]\displaystyle{ |I|\le k }[/math] and any values [math]\displaystyle{ x_i }[/math], where [math]\displaystyle{ i\in I }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[\bigwedge_{i\in I}(X_i=x_i)\right] &= \prod_{i\in I}\Pr[X_i=x_i]. \end{align} }[/math]
A very common case is pairwise independence, i.e. the 2-wise independence.
Definition (pairwise independent random variables) - Random variables [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are pairwise independent if, for any [math]\displaystyle{ X_i,X_j }[/math] where [math]\displaystyle{ i\neq j }[/math] and any values [math]\displaystyle{ a,b }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[X_i=a\wedge X_j=b\right] &= \Pr[X_i=a]\cdot\Pr[X_j=b]. \end{align} }[/math]
Note that the definition of k-wise independence is hereditary:
- If [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are k-wise independent, then they are also [math]\displaystyle{ \ell }[/math]-wise independent for any [math]\displaystyle{ \ell\lt k }[/math].
- If [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are NOT k-wise independent, then they cannot be [math]\displaystyle{ \ell }[/math]-wise independent for any [math]\displaystyle{ \ell\gt k }[/math].
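To make the definition concrete, here is a small Python sketch (hypothetical, not part of the notes; the function name `is_k_wise_independent` is made up) that brute-force checks whether binary random variables, given by an explicit joint distribution, are k-wise independent.

```python
from itertools import combinations, product

def is_k_wise_independent(joint, n, k):
    """Brute-force check of k-wise independence for n binary random variables.

    `joint` maps each outcome (an n-tuple of bits) to its probability.
    """
    # Marginal distribution of each variable: Pr[X_i = b].
    marginal = [{0: 0.0, 1: 0.0} for _ in range(n)]
    for outcome, p in joint.items():
        for i, b in enumerate(outcome):
            marginal[i][b] += p

    # For every index set I with 2 <= |I| <= k and every assignment of values,
    # compare Pr[AND_{i in I} X_i = x_i] with the product of the marginals.
    for size in range(2, k + 1):
        for I in combinations(range(n), size):
            for values in product((0, 1), repeat=size):
                lhs = sum(p for outcome, p in joint.items()
                          if all(outcome[i] == v for i, v in zip(I, values)))
                rhs = 1.0
                for i, v in zip(I, values):
                    rhs *= marginal[i][v]
                if abs(lhs - rhs) > 1e-9:
                    return False
    return True

# Example: X1, X2 are independent fair bits and X3 = X1 XOR X2.
# These three bits are pairwise independent but not 3-wise independent.
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(is_k_wise_independent(joint, n=3, k=2))  # True
print(is_k_wise_independent(joint, n=3, k=3))  # False
```

The example at the bottom is exactly the phenomenon used in the XOR construction below: the parity bit is pairwise independent of the two source bits, but the three bits together are not mutually independent.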
Construction via XOR
Suppose we have [math]\displaystyle{ m }[/math] mutually independent and uniform random bits [math]\displaystyle{ X_1,\ldots, X_m }[/math]. We are going to extract [math]\displaystyle{ n=2^m-1 }[/math] pairwise independent bits from these [math]\displaystyle{ m }[/math] mutually independent bits.
Enumerate all the nonempty subsets of [math]\displaystyle{ \{1,2,\ldots,m\} }[/math] in some order. Let [math]\displaystyle{ S_j }[/math] be the [math]\displaystyle{ j }[/math]th subset. Let
- [math]\displaystyle{ Y_j=\bigoplus_{i\in S_j} X_i, }[/math]
where [math]\displaystyle{ \oplus }[/math] is the exclusive-or, whose truth table is as follows.
| [math]\displaystyle{ a }[/math] | [math]\displaystyle{ b }[/math] | [math]\displaystyle{ a\oplus b }[/math] |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
There are [math]\displaystyle{ n=2^m-1 }[/math] such [math]\displaystyle{ Y_j }[/math], because there are [math]\displaystyle{ 2^m-1 }[/math] nonempty subsets of [math]\displaystyle{ \{1,2,\ldots,m\} }[/math]. An equivalent definition of [math]\displaystyle{ Y_j }[/math] is
- [math]\displaystyle{ Y_j=\left(\sum_{i\in S_j}X_i\right)\bmod 2 }[/math].
Sometimes, [math]\displaystyle{ Y_j }[/math] is called the parity of the bits in [math]\displaystyle{ S_j }[/math].
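As an illustration, here is a minimal Python sketch of this construction (hypothetical; the function name `xor_construction` is made up). It draws [math]\displaystyle{ m }[/math] independent fair bits and outputs the [math]\displaystyle{ 2^m-1 }[/math] parity bits [math]\displaystyle{ Y_j }[/math].

```python
import random
from itertools import combinations

def xor_construction(m):
    """From m independent fair bits, build n = 2^m - 1 pairwise independent bits.

    Y_j is the parity (XOR) of the bits X_i indexed by the j-th nonempty
    subset S_j of {0, 1, ..., m-1}.
    """
    x = [random.randint(0, 1) for _ in range(m)]        # m independent fair bits
    subsets = [s for size in range(1, m + 1)
               for s in combinations(range(m), size)]   # all nonempty subsets S_j
    return [sum(x[i] for i in s) % 2 for s in subsets]  # the parities Y_1, ..., Y_{2^m-1}

print(xor_construction(3))  # prints 2^3 - 1 = 7 bits
```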
We claim that [math]\displaystyle{ Y_j }[/math] are pairwise independent and uniform.
Theorem - For any [math]\displaystyle{ Y_j }[/math] and any [math]\displaystyle{ b\in\{0,1\} }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[Y_j=b\right] &= \frac{1}{2}. \end{align} }[/math]
- For any [math]\displaystyle{ Y_j,Y_\ell }[/math] with [math]\displaystyle{ j\neq\ell }[/math] and any [math]\displaystyle{ a,b\in\{0,1\} }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[Y_j=a\wedge Y_\ell=b\right] &= \frac{1}{4}. \end{align} }[/math]
The proof is left as an exercise.
Therefore, we extract exponentially many pairwise independent uniform random bits from a sequence of mutually independent uniform random bits.
Note that [math]\displaystyle{ Y_j }[/math] are not 3-wise independent. For example, consider the subsets [math]\displaystyle{ S_1=\{1\},S_2=\{2\},S_3=\{1,2\} }[/math] and the corresponding random bits [math]\displaystyle{ Y_1,Y_2,Y_3 }[/math]. Any two of [math]\displaystyle{ Y_1,Y_2,Y_3 }[/math] would decide the value of the third one.
Construction via modulo a prime
We now consider constructing pairwise independent random variables ranging over [math]\displaystyle{ [p]=\{0,1,2,\ldots,p-1\} }[/math] for some prime [math]\displaystyle{ p }[/math]. Unlike the above construction, now we only need two independent random sources [math]\displaystyle{ X_0,X_1 }[/math], which are uniformly and independently distributed over [math]\displaystyle{ [p] }[/math].
Let [math]\displaystyle{ Y_0,Y_1,\ldots, Y_{p-1} }[/math] be defined as:
- [math]\displaystyle{ \begin{align} Y_i=(X_0+i\cdot X_1)\bmod p &\quad \mbox{for }i\in[p]. \end{align} }[/math]
Theorem - The random variables [math]\displaystyle{ Y_0,Y_1,\ldots, Y_{p-1} }[/math] are pairwise independent uniform random variables over [math]\displaystyle{ [p] }[/math].
Proof. We first show that [math]\displaystyle{ Y_i }[/math] are uniform. That is, we will show that for any [math]\displaystyle{ i,a\in[p] }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[(X_0+i\cdot X_1)\bmod p=a\right] &= \frac{1}{p}. \end{align} }[/math]
Due to the law of total probability,
- [math]\displaystyle{ \begin{align} \Pr\left[(X_0+i\cdot X_1)\bmod p=a\right] &= \sum_{j\in[p]}\Pr[X_1=j]\cdot\Pr\left[(X_0+ij)\bmod p=a\right]\\ &=\frac{1}{p}\sum_{j\in[p]}\Pr\left[X_0\equiv(a-ij)\pmod{p}\right]. \end{align} }[/math]
For prime [math]\displaystyle{ p }[/math] and any [math]\displaystyle{ i,j,a\in[p] }[/math], there is exactly one value of [math]\displaystyle{ X_0 }[/math] in [math]\displaystyle{ [p] }[/math] satisfying [math]\displaystyle{ X_0\equiv(a-ij)\pmod{p} }[/math]. Thus, [math]\displaystyle{ \Pr\left[X_0\equiv(a-ij)\pmod{p}\right]=1/p }[/math] and the above probability is [math]\displaystyle{ \frac{1}{p} }[/math].
We then show that [math]\displaystyle{ Y_i }[/math] are pairwise independent, i.e. we will show that for any [math]\displaystyle{ Y_i,Y_j }[/math] with [math]\displaystyle{ i\neq j }[/math] and any [math]\displaystyle{ a,b\in[p] }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[Y_i=a\wedge Y_j=b\right] &= \frac{1}{p^2}. \end{align} }[/math]
The event [math]\displaystyle{ Y_i=a\wedge Y_j=b }[/math] is equivalent to the system of congruences
- [math]\displaystyle{ \begin{cases} (X_0+iX_1)\equiv a\pmod{p}\\ (X_0+jX_1)\equiv b\pmod{p} \end{cases} }[/math]
Subtracting the two congruences gives [math]\displaystyle{ (i-j)X_1\equiv a-b\pmod{p} }[/math]. Since [math]\displaystyle{ p }[/math] is prime and [math]\displaystyle{ i\neq j }[/math], the coefficient [math]\displaystyle{ i-j }[/math] is invertible modulo [math]\displaystyle{ p }[/math], so [math]\displaystyle{ X_1 }[/math] is uniquely determined, and then so is [math]\displaystyle{ X_0 }[/math]. Hence there exists a unique solution of [math]\displaystyle{ X_0 }[/math] and [math]\displaystyle{ X_1 }[/math] in [math]\displaystyle{ [p] }[/math] to the above linear congruential system, and the probability of the event is [math]\displaystyle{ \frac{1}{p^2} }[/math].
- [math]\displaystyle{ \square }[/math]
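This construction is equally short in code. The following Python sketch is hypothetical (the function name `mod_p_construction` is made up); it draws the two seeds [math]\displaystyle{ X_0,X_1 }[/math] and produces [math]\displaystyle{ Y_0,Y_1,\ldots,Y_{p-1} }[/math] exactly as defined above.

```python
import random

def mod_p_construction(p):
    """From two independent uniform seeds X0, X1 in [p] (p prime), build
    p pairwise independent uniform values Y_0, ..., Y_{p-1} over [p]."""
    x0 = random.randrange(p)
    x1 = random.randrange(p)
    return [(x0 + i * x1) % p for i in range(p)]

print(mod_p_construction(7))  # 7 pairwise independent values over {0, ..., 6}
```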
Tools for limited independence
Let [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] be random variables. The variance of their sum is
- [math]\displaystyle{ \begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]+\sum_{i\neq j}\mathbf{cov}(X_i,X_j). \end{align} }[/math]
If [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] are pairwise independent, then [math]\displaystyle{ \mathbf{cov}(X_i,X_j)=0 }[/math] for any [math]\displaystyle{ i\neq j }[/math] since the covariance of a pair of independent random variables is 0. This gives us the following theorem of linearity of variance for pairwise independent random variables.
Theorem - For pairwise independent random variables [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math],
- [math]\displaystyle{ \begin{align} \mathbf{Var}\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n\mathbf{Var}[X_i]. \end{align} }[/math]
The theorem relies on the fact that the covariances of pairwise independent random variables are 0, which is in turn a consequence of a more general theorem.
Theorem ([math]\displaystyle{ k }[/math]-wise independence fools [math]\displaystyle{ k }[/math]-degree polynomials) - Let [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] be mutually independent random variables and [math]\displaystyle{ Y_1,Y_2,\ldots,Y_n }[/math] be [math]\displaystyle{ k }[/math]-wise independent random variables, such that the marginal distribution of [math]\displaystyle{ Y_i }[/math] is identical to the marginal distribution of [math]\displaystyle{ X_i }[/math] for every [math]\displaystyle{ 1\le i\le n }[/math], that is, [math]\displaystyle{ \Pr[X_i=z]=\Pr[Y_i=z] }[/math] for any [math]\displaystyle{ z }[/math] and any [math]\displaystyle{ 1\le i\le n }[/math].
- Let [math]\displaystyle{ f:\mathbb{R}^n\rightarrow\mathbb{R} }[/math] be a multivariate polynomial of degree at most [math]\displaystyle{ k }[/math]. Then
- [math]\displaystyle{ \begin{align} \mathbf{E}\left[f(X_1,X_2,\ldots,X_n)\right]=\mathbf{E}[f(Y_1,Y_2,\ldots,Y_n)]. \end{align} }[/math]
This phenomenon is sometimes described by saying that degree-[math]\displaystyle{ k }[/math] polynomials are fooled by [math]\displaystyle{ k }[/math]-wise independence. In other words, a degree-[math]\displaystyle{ k }[/math] polynomial behaves the same on [math]\displaystyle{ k }[/math]-wise independent random variables as on mutually independent random variables.
This theorem is implied by the following lemma.
Lemma - Let [math]\displaystyle{ X_1,X_2,\ldots,X_k }[/math] be [math]\displaystyle{ k }[/math] mutually independent random variables. Then
- [math]\displaystyle{ \begin{align} \mathbf{E}\left[\prod_{i=1}^k X_i\right]=\prod_{i=1}^k\mathbf{E}[X_i]. \end{align} }[/math]
The lemma can be proved by directly computing the expectation. We omit the detailed proof.
By the linearity of expectation, the expectation of a polynomial reduces to the sum of the expectations of its terms. For a degree-[math]\displaystyle{ k }[/math] polynomial, each term involves at most [math]\displaystyle{ k }[/math] variables. Due to the above lemma, under [math]\displaystyle{ k }[/math]-wise independence the expectation of each term is exactly the same as under mutual independence.
Since the [math]\displaystyle{ k }[/math]th moment is the expectation of a degree-[math]\displaystyle{ k }[/math] polynomial of the random variables, tools based on the [math]\displaystyle{ k }[/math]th moment can safely be used under [math]\displaystyle{ k }[/math]-wise independence. In particular, Chebyshev's inequality holds for pairwise independent random variables:
Chebyshev's inequality - Let [math]\displaystyle{ X=\sum_{i=1}^n X_i }[/math], where [math]\displaystyle{ X_1, X_2, \ldots, X_n }[/math] are pairwise independent Poisson trials. Let [math]\displaystyle{ \mu=\mathbf{E}[X] }[/math].
- Then
- [math]\displaystyle{ \Pr[|X-\mu|\ge t]\le\frac{\mathbf{Var}[X]}{t^2}=\frac{\sum_{i=1}^n\mathbf{Var}[X_i]}{t^2}. }[/math]
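As a quick numerical sanity check (a hypothetical Python sketch, not part of the notes), one can verify the linearity of variance on the pairwise independent bits from the XOR construction: computing [math]\displaystyle{ \mathbf{Var}\left[\sum_j Y_j\right] }[/math] exactly over all [math]\displaystyle{ 2^m }[/math] equally likely seeds gives exactly [math]\displaystyle{ \sum_j\mathbf{Var}[Y_j]=(2^m-1)/4 }[/math], even though the bits [math]\displaystyle{ Y_j }[/math] are far from mutually independent.

```python
from itertools import combinations, product

def variance_check(m):
    """Exactly compute Var[sum_j Y_j] for the XOR construction by enumerating
    all 2^m seeds, and compare with the pairwise-independence prediction."""
    subsets = [s for size in range(1, m + 1)
               for s in combinations(range(m), size)]    # n = 2^m - 1 subsets
    n = len(subsets)
    sums = []
    for x in product((0, 1), repeat=m):                  # all equally likely seeds
        y = [sum(x[i] for i in s) % 2 for s in subsets]  # the bits Y_1, ..., Y_n
        sums.append(sum(y))
    mean = sum(sums) / len(sums)
    var = sum((s - mean) ** 2 for s in sums) / len(sums)
    return var, n * 0.25                                 # each Y_j is a fair bit: Var = 1/4

print(variance_check(4))  # both values equal (2^4 - 1)/4 = 3.75
```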
Application: Derandomizing MAX-CUT
Let [math]\displaystyle{ G(V,E) }[/math] be an undirected graph, and [math]\displaystyle{ S\subset V }[/math] be a vertex set. The cut defined by [math]\displaystyle{ S }[/math] is [math]\displaystyle{ C(S,\bar{S})=|\{uv\in E\mid u\in S, v\not\in S\}| }[/math].
Given as input an undirected graph [math]\displaystyle{ G(V,E) }[/math], find the [math]\displaystyle{ S\subset V }[/math] whose cut value [math]\displaystyle{ C(S,\bar{S}) }[/math] is maximized. This problem is called the maximum cut (MAX-CUT) problem, which is NP-hard. The decision version of the weighted version of the problem is one of Karp's 21 NP-complete problems. The problem has a [math]\displaystyle{ 0.878 }[/math]-approximation algorithm obtained by rounding a semidefinite program. Assuming the unique games conjecture (UGC), no polynomial-time algorithm can achieve a better approximation ratio.
Here we give a very simple [math]\displaystyle{ 0.5 }[/math]-approximation algorithm. The "algorithm" has a one-line description:
- Put each [math]\displaystyle{ v\in V }[/math] into [math]\displaystyle{ S }[/math] independently with probability 1/2.
We then analyze the approximation ratio of this algorithm.
For each [math]\displaystyle{ v\in V }[/math], let [math]\displaystyle{ Y_v }[/math] indicate whether [math]\displaystyle{ v\in S }[/math], that is
- [math]\displaystyle{ Y_v=\begin{cases}1& v\in S,\\ 0& v\not\in S.\end{cases} }[/math]
For each edge [math]\displaystyle{ uv\in E }[/math], let [math]\displaystyle{ Y_{uv} }[/math] indicate whether [math]\displaystyle{ uv }[/math] contributes to the cut [math]\displaystyle{ C(S,\bar{S}) }[/math], i.e. whether [math]\displaystyle{ u\in S, v\not\in S }[/math] or [math]\displaystyle{ u\not\in S, v\in S }[/math], that is
- [math]\displaystyle{ Y_{uv}=\begin{cases}1&Y_u\neq Y_v,\\0&\text{otherwise}.\end{cases} }[/math]
Then [math]\displaystyle{ C(S,\bar{S})=\sum_{uv\in E}Y_{uv} }[/math]. Due to the linearity of expectation,
- [math]\displaystyle{ \mathbf{E}\left[C(S,\bar{S})\right]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_u\neq Y_v]=\frac{|E|}{2} }[/math].
The maximum cut of a graph is at most [math]\displaystyle{ |E| }[/math]. Thus, the algorithm returns in expectation a cut with size at least half of the maximum cut.
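A direct Python sketch of this randomized algorithm (hypothetical; the graph is assumed to be given as a list of edges over vertices [math]\displaystyle{ 0,\ldots,n-1 }[/math]):

```python
import random

def random_cut(n, edges):
    """Randomized 0.5-approximation for MAX-CUT: put each vertex into S
    independently with probability 1/2 and return the resulting cut."""
    in_S = [random.randint(0, 1) for _ in range(n)]       # the bit Y_v for each vertex v
    cut = sum(1 for u, v in edges if in_S[u] != in_S[v])  # edges crossing the cut
    return [v for v in range(n) if in_S[v] == 1], cut

# Example: a 4-cycle has |E| = 4 edges, so the expected cut value is 2.
print(random_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```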
We then show how to derandomize this algorithm using pairwise independent bits.
Suppose that [math]\displaystyle{ |V|=n }[/math] and enumerate the [math]\displaystyle{ n }[/math] vertices by [math]\displaystyle{ v_1,v_2,\ldots, v_n }[/math] in an arbitrary order. Let [math]\displaystyle{ m=\lceil\log_2 (n+1)\rceil }[/math]. Sample [math]\displaystyle{ m }[/math] bits [math]\displaystyle{ X_1,\ldots, X_m\in\{0,1\} }[/math] uniformly and independently at random. Enumerate all nonempty subsets of [math]\displaystyle{ \{1,2,\ldots,m\} }[/math] by [math]\displaystyle{ S_1,S_2,\ldots,S_{2^m-1} }[/math]. For each vertex [math]\displaystyle{ v_j }[/math], let [math]\displaystyle{ Y_{v_j}=\bigoplus_{i\in S_j}X_i }[/math]. The MAX-CUT algorithm uses these bits to construct the solution [math]\displaystyle{ S }[/math]:
- For [math]\displaystyle{ j=1,2,\ldots,n }[/math], put [math]\displaystyle{ v_j }[/math] into [math]\displaystyle{ S }[/math] if [math]\displaystyle{ Y_{v_j}=1 }[/math].
We have shown that [math]\displaystyle{ Y_{v_j} }[/math], [math]\displaystyle{ j=1,2,\ldots,n }[/math], are uniform and pairwise independent. Thus we still have that [math]\displaystyle{ \Pr[Y_{u}\neq Y_{v}]=\frac{1}{2} }[/math]. The above analysis still holds, so that the algorithm returns in expectation a cut with size at least [math]\displaystyle{ \frac{|E|}{2} }[/math].
Finally, we notice that there are only [math]\displaystyle{ m=\lceil\log_2 (n+1)\rceil }[/math] total random bits in the new algorithm. We can enumerate all [math]\displaystyle{ 2^m\le 2(n+1) }[/math] possible strings of [math]\displaystyle{ m }[/math] bits, run the above algorithm with the bit strings as the "random sources", and output the maximum cut returned. There must exist a bit string [math]\displaystyle{ X_1,\ldots, X_m\in\{0,1\} }[/math] on which the algorithm returns a cut of size [math]\displaystyle{ \ge \frac{|E|}{2} }[/math] (why?). This gives us a deterministic polynomial time (actually [math]\displaystyle{ O(n^2) }[/math] time) [math]\displaystyle{ 1/2 }[/math]-approximation algorithm.
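Putting the pieces together, the derandomized algorithm can be sketched in Python as follows (hypothetical; the function name `derandomized_max_cut` is made up). It combines the XOR construction with exhaustive enumeration of the [math]\displaystyle{ 2^m\le 2(n+1) }[/math] seeds and keeps the best cut found.

```python
from itertools import combinations, product
from math import ceil, log2

def derandomized_max_cut(n, edges):
    """Deterministic 1/2-approximation for MAX-CUT via pairwise independent bits."""
    m = ceil(log2(n + 1))
    subsets = [s for size in range(1, m + 1)
               for s in combinations(range(m), size)]   # S_1, ..., S_{2^m - 1}
    best_cut, best_S = -1, None
    for x in product((0, 1), repeat=m):                 # enumerate all 2^m seeds
        y = [sum(x[i] for i in subsets[j]) % 2 for j in range(n)]  # bits Y_{v_j}
        cut = sum(1 for u, v in edges if y[u] != y[v])
        if cut > best_cut:
            best_cut, best_S = cut, [v for v in range(n) if y[v] == 1]
    return best_S, best_cut

# Example: on the 4-cycle this happens to find the maximum cut (value 4);
# in general a cut of size at least |E|/2 is guaranteed.
print(derandomized_max_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```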
Application: Two-point sampling
Consider a Monte Carlo randomized algorithm with one-sided error for a decision problem [math]\displaystyle{ f }[/math]. We formulate the algorithm as a deterministic algorithm [math]\displaystyle{ A }[/math] that takes as input [math]\displaystyle{ x }[/math] and a uniform random number [math]\displaystyle{ r\in[p] }[/math] where [math]\displaystyle{ p }[/math] is a prime, such that for any input [math]\displaystyle{ x }[/math]:
- If [math]\displaystyle{ f(x)=1 }[/math], then [math]\displaystyle{ \Pr[A(x,r)=1]\ge\frac{1}{2} }[/math], where the probability is taken over the random choice of [math]\displaystyle{ r }[/math].
- If [math]\displaystyle{ f(x)=0 }[/math], then [math]\displaystyle{ A(x,r)=0 }[/math] for any [math]\displaystyle{ r }[/math].
We call [math]\displaystyle{ r }[/math] the random source for the algorithm.
For the [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math], we call the [math]\displaystyle{ r }[/math] that makes [math]\displaystyle{ A(x,r)=1 }[/math] a witness for [math]\displaystyle{ x }[/math]. For a positive [math]\displaystyle{ x }[/math], at least half of [math]\displaystyle{ [p] }[/math] are witnesses. The random source [math]\displaystyle{ r }[/math] has a polynomial number of bits, which means that [math]\displaystyle{ p }[/math] is exponentially large, so it is infeasible to find a witness for an input [math]\displaystyle{ x }[/math] by exhaustive search. Deterministic algorithms overcome this by using sophisticated rules to search efficiently for a witness. Randomization, on the other hand, reduces this to a bit of luck: randomly choose an [math]\displaystyle{ r }[/math] and win with probability at least 1/2.
We can boost the accuracy (equivalently, reduce the error) of any Monte Carlo randomized algorithm with one-sided error by running the algorithm for a number of times.
Suppose that we sample [math]\displaystyle{ t }[/math] values [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math] uniformly and independently from [math]\displaystyle{ [p] }[/math], and run the following scheme:
[math]\displaystyle{ B(x,r_1,r_2,\ldots,r_t): }[/math] - return [math]\displaystyle{ \bigvee_{i=1}^t A(x,r_i) }[/math];
That is, return 1 if any instance of [math]\displaystyle{ A(x,r_i)=1 }[/math]. For any [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math], due to the independence of [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math], the probability that [math]\displaystyle{ B(x,r_1,r_2,\ldots,r_t) }[/math] returns an incorrect result is at most [math]\displaystyle{ 2^{-t} }[/math]. On the other hand, [math]\displaystyle{ B }[/math] never makes mistakes for the [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=0 }[/math] since [math]\displaystyle{ A }[/math] has no false positives. Thus, the error of the Monte Carlo algorithm is reduced to [math]\displaystyle{ 2^{-t} }[/math].
Sampling [math]\displaystyle{ t }[/math] mutually independent random numbers from [math]\displaystyle{ [p] }[/math] can be quite expensive since it requires [math]\displaystyle{ \Omega(t\log p) }[/math] random bits. Suppose that we can only afford [math]\displaystyle{ O(\log p) }[/math] random bits. In particular, we sample two independent uniform random numbers [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] from [math]\displaystyle{ [p] }[/math]. If we use [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] directly by running two independent instances [math]\displaystyle{ A(x,a) }[/math] and [math]\displaystyle{ A(x,b) }[/math], we only get an error upper bound of 1/4.
The following scheme reduces the error significantly with the same number of random bits:
Algorithm Choose two independent uniform random numbers [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] from [math]\displaystyle{ [p] }[/math]. Construct [math]\displaystyle{ t }[/math] random numbers [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math] by:
- [math]\displaystyle{ \begin{align} \forall 1\le i\le t, &\quad \mbox{let }r_i = (a\cdot i+b)\bmod p. \end{align} }[/math]
Run [math]\displaystyle{ B(x,r_1,r_2,\ldots,r_t) }[/math].
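A hypothetical Python sketch of this scheme (the function name `two_point_sampling` and the stand-in algorithm `A` are made up for illustration; `A(x, r)` represents any one-sided-error Monte Carlo algorithm whose witness density in [math]\displaystyle{ [p] }[/math] is at least 1/2):

```python
import random

def two_point_sampling(A, x, p, t):
    """Run the one-sided-error algorithm A on t pseudo-random sources
    r_i = (a*i + b) mod p generated from only two random seeds a, b."""
    a = random.randrange(p)
    b = random.randrange(p)
    rs = [(a * i + b) % p for i in range(1, t + 1)]   # r_1, ..., r_t
    return any(A(x, r) for r in rs)                   # B(x, r_1, ..., r_t)

# Stand-in A (purely illustrative): accepts r exactly when r < 51, so 51 of the
# 101 values in [p] are witnesses, i.e. at least half, as required.
A = lambda x, r: r < 51
# By the analysis that follows, this returns a wrong answer (False) with
# probability at most 1/t = 1/20.
print(two_point_sampling(A, x=None, p=101, t=20))
```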
By the analysis of the construction via modulo a prime above, we know that for [math]\displaystyle{ t\le p }[/math], [math]\displaystyle{ r_1,r_2,\ldots,r_t }[/math] are pairwise independent and uniform over [math]\displaystyle{ [p] }[/math]. Let [math]\displaystyle{ X_i=A(x,r_i) }[/math] and [math]\displaystyle{ X=\sum_{i=1}^tX_i }[/math]. Due to the uniformity of [math]\displaystyle{ r_i }[/math] and our definition of [math]\displaystyle{ A }[/math], for any [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math], it holds that
- [math]\displaystyle{ \Pr[X_i=1]=\Pr[A(x,r_i)=1]\ge\frac{1}{2}. }[/math]
By the linearity of expectations,
- [math]\displaystyle{ \mathbf{E}[X]=\sum_{i=1}^t\mathbf{E}[X_i]=\sum_{i=1}^t\Pr[X_i=1]\ge\frac{t}{2}. }[/math]
Each [math]\displaystyle{ X_i }[/math] is a Bernoulli trial whose success probability [math]\displaystyle{ q=\Pr[X_i=1] }[/math] is at least [math]\displaystyle{ 1/2 }[/math], so we can bound the variance of each [math]\displaystyle{ X_i }[/math] as follows.
- [math]\displaystyle{ \mathbf{Var}[X_i]=q(1-q)\le\frac{1}{4}. }[/math]
Applying Chebyshev's inequality, we have that for any [math]\displaystyle{ x }[/math] that [math]\displaystyle{ f(x)=1 }[/math],
- [math]\displaystyle{ \begin{align} \Pr\left[\bigvee_{i=1}^t A(x,r_i)=0\right] &= \Pr[X=0]\\ &\le \Pr[|X-\mathbf{E}[X]|\ge \mathbf{E}[X]]\\ &\le \Pr\left[|X-\mathbf{E}[X]|\ge \frac{t}{2}\right]\\ &\le \frac{4}{t^2}\sum_{i=1}^t\mathbf{Var}[X_i]\\ &\le \frac{1}{t}. \end{align} }[/math]
The error is reduced to [math]\displaystyle{ 1/t }[/math] with only two random numbers. This scheme works as long as [math]\displaystyle{ t\le p }[/math].