Randomized Algorithms (Spring 2014) / Second Moment
= Erdős–Rényi Random Graphs =
Consider a graph <math>G(V,E)</math> which is randomly generated as follows:
* <math>|V|=n</math>;
* <math>\forall \{u,v\}\in{V\choose 2}</math>, <math>uv\in E</math> independently with probability <math>p</math>.
Such a graph is denoted as '''<math>G(n,p)</math>'''. This is called the '''Erdős–Rényi model''' or '''<math>G(n,p)</math> model''' for random graphs.

Informally, the presence of every edge of <math>G(n,p)</math> is determined by an independent coin flip (with probability <math>p</math> of HEADS).
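To make the model concrete, here is a minimal Python sketch (an illustration added here, with our own function name, not part of the original notes) that samples a graph from <math>G(n,p)</math> by flipping an independent coin for each of the <math>{n\choose 2}</math> potential edges.
<pre>
import random
from itertools import combinations

def sample_gnp(n, p):
    """Sample an Erdős–Rényi random graph G(n,p).

    Vertices are 0,...,n-1; each of the C(n,2) potential edges is
    included independently with probability p.
    """
    return {frozenset(e) for e in combinations(range(n), 2)
            if random.random() < p}

# Example: a sparse random graph on 10 vertices.
edges = sample_gnp(10, 0.3)
print(len(edges), "edges sampled")
</pre>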
= Threshold phenomenon =
One of the most fascinating phenomena of random graphs is that for many natural graph properties, the random graph <math>G(n,p)</math> suddenly changes from almost always not having the property to almost always having the property as <math>p</math> grows within a very small range.

A monotone graph property <math>P</math> is said to have the '''threshold''' <math>p(n)</math> if
* when <math>p\ll p(n)</math>, <math>\Pr[P(G(n,p))]\rightarrow 0</math> as <math>n\rightarrow\infty</math> (also phrased as: <math>G(n,p)</math> almost always does not have <math>P</math>); and
* when <math>p\gg p(n)</math>, <math>\Pr[P(G(n,p))]\rightarrow 1</math> as <math>n\rightarrow\infty</math> (also phrased as: <math>G(n,p)</math> almost always has <math>P</math>).

The classic method for proving the threshold is the so-called second moment method (Chebyshev's inequality).
== Threshold for 4-clique ==
{{Theorem|Theorem|
:The threshold for a random graph <math>G(n,p)</math> to contain a 4-clique is <math>p=n^{-2/3}</math>.
}}
We formulate the problem as follows.
For any <math>4</math>-subset of vertices <math>S\in{V\choose 4}</math>, let <math>X_S</math> be the indicator random variable such that
:<math>
X_S=
\begin{cases}
1 & S\mbox{ is a clique},\\
0 & \mbox{otherwise}.
\end{cases}
</math>
Let <math>X=\sum_{S\in{V\choose 4}}X_S</math> be the total number of 4-cliques in <math>G</math>.
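The following Python sketch (for illustration; the helper names are ours) computes <math>X</math> exactly as this sum of indicators, by enumerating all <math>4</math>-subsets and checking whether all 6 internal edges are present.
<pre>
from itertools import combinations

def count_4cliques(n, edges):
    """Return X = sum of the indicators X_S over all 4-subsets S.

    `edges` is a set of frozenset pairs, as produced e.g. by the
    sample_gnp sketch above.
    """
    count = 0
    for S in combinations(range(n), 4):
        # X_S = 1 iff all 6 edges inside S are present.
        if all(frozenset(e) in edges for e in combinations(S, 2)):
            count += 1
    return count

# Sanity check: the complete graph on 5 vertices contains C(5,4) = 5 four-cliques.
K5 = {frozenset(e) for e in combinations(range(5), 2)}
assert count_4cliques(5, K5) == 5
</pre>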
It is sufficient to prove the following lemma.
{{Theorem|Lemma|
*If <math>p=o(n^{-2/3})</math>, then <math>\Pr[X\ge 1]\rightarrow 0</math> as <math>n\rightarrow\infty</math>.
*If <math>p=\omega(n^{-2/3})</math>, then <math>\Pr[X\ge 1]\rightarrow 1</math> as <math>n\rightarrow\infty</math>.
}}
{{Proof|
The first claim is proved by the first moment method (expectation and Markov's inequality) and the second claim is proved by the second moment method (Chebyshev's inequality).

Every 4-clique has 6 edges, thus for any <math>S\in{V\choose 4}</math>,
:<math>\mathbf{E}[X_S]=\Pr[X_S=1]=p^6</math>.
By the linearity of expectation,
:<math>\mathbf{E}[X]=\sum_{S\in{V\choose 4}}\mathbf{E}[X_S]={n\choose 4}p^6</math>.
Applying Markov's inequality,
:<math>\Pr[X\ge 1]\le \mathbf{E}[X]=O(n^4p^6)=o(1)</math>, if <math>p=o(n^{-2/3})</math>.
The first claim is proved.
To prove the second claim, it is equivalent to show that <math>\Pr[X=0]=o(1)</math> if <math>p=\omega(n^{-2/3})</math>. By Chebyshev's inequality,
:<math>\Pr[X=0]\le\Pr[|X-\mathbf{E}[X]|\ge\mathbf{E}[X]]\le\frac{\mathbf{Var}[X]}{(\mathbf{E}[X])^2}</math>,
where the variance is computed as
:<math>\mathbf{Var}[X]=\mathbf{Var}\left[\sum_{S\in{V\choose 4}}X_S\right]=\sum_{S\in{V\choose 4}}\mathbf{Var}[X_S]+\sum_{S,T\in{V\choose 4}, S\neq T}\mathbf{Cov}(X_S,X_T)</math>.
For any <math>S\in{V\choose 4}</math>,
:<math>\mathbf{Var}[X_S]=\mathbf{E}[X_S^2]-\mathbf{E}[X_S]^2\le \mathbf{E}[X_S^2]=\mathbf{E}[X_S]=p^6</math>.
Thus the first term of the above formula is <math>\sum_{S\in{V\choose 4}}\mathbf{Var}[X_S]=O(n^4p^6)</math>.

We now compute the covariances. For any <math>S,T\in{V\choose 4}</math> with <math>S\neq T</math>:
* Case 1: <math>|S\cap T|\le 1</math>, so <math>S</math> and <math>T</math> do not share any edges. <math>X_S</math> and <math>X_T</math> are independent, thus <math>\mathbf{Cov}(X_S,X_T)=0</math>.
* Case 2: <math>|S\cap T|= 2</math>, so <math>S</math> and <math>T</math> share exactly one edge. Since <math>|S\cup T|=6</math>, there are <math>O(n^6)</math> such pairs of <math>S</math> and <math>T</math>.
::<math>\mathbf{Cov}(X_S,X_T)=\mathbf{E}[X_SX_T]-\mathbf{E}[X_S]\mathbf{E}[X_T]\le\mathbf{E}[X_SX_T]=\Pr[X_S=1\wedge X_T=1]=p^{11}</math>
:since there are 11 edges in the union of two 4-cliques that share a common edge. The contribution of these pairs is <math>O(n^6p^{11})</math>.
* Case 3: <math>|S\cap T|= 3</math>, so <math>S</math> and <math>T</math> share a triangle. Since <math>|S\cup T|=5</math>, there are <math>O(n^5)</math> such pairs of <math>S</math> and <math>T</math>. By the same argument,
::<math>\mathbf{Cov}(X_S,X_T)\le\Pr[X_S=1\wedge X_T=1]=p^{9}</math>
:since there are 9 edges in the union of two 4-cliques that share a triangle. The contribution of these pairs is <math>O(n^5p^{9})</math>.
Putting all these together,
:<math>\mathbf{Var}[X]=O(n^4p^6+n^6p^{11}+n^5p^{9}).</math>
And
:<math>\Pr[X=0]\le\frac{\mathbf{Var}[X]}{(\mathbf{E}[X])^2}=O(n^{-4}p^{-6}+n^{-2}p^{-1}+n^{-3}p^{-3})</math>,
which is <math>o(1)</math> if <math>p=\omega(n^{-2/3})</math>. The second claim is also proved.
}}
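As a quick empirical illustration of this threshold (a sketch under our own choice of parameters, not part of the original notes), one can estimate <math>\Pr[X\ge 1]</math> for edge probabilities a constant factor below and above <math>n^{-2/3}</math>:
<pre>
import random
from itertools import combinations

def has_4clique(n, p):
    """Sample G(n,p) and report whether it contains a 4-clique."""
    edges = {frozenset(e) for e in combinations(range(n), 2)
             if random.random() < p}
    return any(all(frozenset(e) in edges for e in combinations(S, 2))
               for S in combinations(range(n), 4))

n, trials = 30, 100
for c in (0.3, 1.0, 3.0):          # p = c * n^(-2/3): below, at, and above the threshold
    p = c * n ** (-2 / 3)
    hits = sum(has_4clique(n, p) for _ in range(trials))
    print(f"c = {c}: estimated Pr[X >= 1] ~= {hits / trials:.2f}")
</pre>
Even for such a small <math>n</math>, the estimated probability jumps from near 0 to near 1 as the constant <math>c</math> grows.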
== Threshold for balanced subgraphs ==
The above theorem can be generalized to all "balanced" subgraphs.
{{Theorem|Definition|
* The '''density''' of a graph <math>G(V,E)</math>, denoted <math>\rho(G)\,</math>, is defined as <math>\rho(G)=\frac{|E|}{|V|}</math>.
* A graph <math>G(V,E)</math> is '''balanced''' if <math>\rho(H)\le \rho(G)</math> for all subgraphs <math>H</math> of <math>G</math>.
}}
Cliques are balanced, because <math>\frac{{k\choose 2}}{k}\le \frac{{n\choose 2}}{n}</math> for any <math>k\le n</math>. The threshold for 4-clique is a direct corollary of the following general theorem.
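The following Python sketch (an illustration with our own helper names) computes <math>\rho(G)</math> and checks balance by brute force. It suffices to check induced subgraphs on each vertex subset, since a densest subgraph can always be taken to contain all edges available on its vertex set.
<pre>
from itertools import combinations

def density(vertices, edges):
    """rho(G) = |E| / |V|."""
    return len(edges) / len(vertices)

def is_balanced(vertices, edges):
    """Check rho(H) <= rho(G) for all subgraphs H (via induced subgraphs)."""
    rho = density(vertices, edges)
    for k in range(1, len(vertices) + 1):
        for S in combinations(vertices, k):
            induced = {e for e in edges if e <= set(S)}
            if len(induced) / k > rho:
                return False
    return True

# K4 is balanced; K4 with one pendant vertex attached is not
# (the K4 inside has density 6/4 > 7/5).
K4_edges = {frozenset(e) for e in combinations(range(4), 2)}
print(is_balanced({0, 1, 2, 3}, K4_edges))                          # True
print(is_balanced({0, 1, 2, 3, 4}, K4_edges | {frozenset((3, 4))}))  # False
</pre>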
{{Theorem|Theorem (Erdős–Rényi 1960)|
:Let <math>H</math> be a balanced graph with <math>k</math> vertices and <math>\ell</math> edges. The threshold for the property that a random graph <math>G(n,p)</math> contains a (not necessarily induced) subgraph isomorphic to <math>H</math> is <math>p=n^{-k/\ell}\,</math>.
}}
{{Prooftitle|Sketch of proof.|
For any <math>S\in{V\choose k}</math>, let <math>X_S</math> indicate whether <math>G_S</math> (the subgraph of <math>G</math> induced by <math>S</math>) contains a copy of <math>H</math>. Then
:<math>p^{\ell}\le\mathbf{E}[X_S]\le k!p^{\ell}</math>, since there are at most <math>k!</math> ways to match the substructure.
Note that <math>k</math> does not depend on <math>n</math>. Thus, <math>\mathbf{E}[X_S]=\Theta(p^{\ell})</math>. Let <math>X=\sum_{S\in{V\choose k}}X_S</math>. Then
:<math>\mathbf{E}[X]=\Theta(n^kp^{\ell})</math>.
By Markov's inequality, <math>\Pr[X\ge 1]\le \mathbf{E}[X]=\Theta(n^kp^{\ell})</math>, which is <math>o(1)</math> when <math>p\ll n^{-k/\ell}</math>.

By Chebyshev's inequality, <math>\Pr[X=0]\le \frac{\mathbf{Var}[X]}{\mathbf{E}[X]^2}</math> where
:<math>\mathbf{Var}[X]=\sum_{S\in{V\choose k}}\mathbf{Var}[X_S]+\sum_{S\neq T}\mathbf{Cov}(X_S,X_T)</math>.
The first term satisfies <math>\sum_{S\in{V\choose k}}\mathbf{Var}[X_S]\le \sum_{S\in{V\choose k}}\mathbf{E}[X_S^2]= \sum_{S\in{V\choose k}}\mathbf{E}[X_S]=\mathbf{E}[X]=\Theta(n^kp^{\ell})</math>.

For the covariances, <math>\mathbf{Cov}(X_S,X_T)\neq 0</math> only if <math>|S\cap T|=i</math> for some <math>2\le i\le k-1</math>. Note that <math>|S\cap T|=i</math> implies that <math>|S\cup T|=2k-i</math>. And for balanced <math>H</math>, the number of edges of interest in <math>S</math> and <math>T</math> is <math>2\ell-i\rho(H_{S\cap T})\ge 2\ell-i\rho(H)=2\ell-i\ell/k</math>. Thus, <math>\mathbf{Cov}(X_S,X_T)\le\mathbf{E}[X_SX_T]\le p^{2\ell-i\ell/k}</math>. And
:<math>\sum_{S\neq T}\mathbf{Cov}(X_S,X_T)=\sum_{i=2}^{k-1}O(n^{2k-i}p^{2\ell-i\ell/k}).</math>
Therefore, when <math>p\gg n^{-k/\ell}</math>,
:<math>
\Pr[X=0]\le \frac{\mathbf{Var}[X]}{\mathbf{E}[X]^2}\le \frac{\Theta(n^kp^{\ell})+\sum_{i=2}^{k-1}O(n^{2k-i}p^{2\ell-i\ell/k})}{\Theta(n^{2k}p^{2\ell})}=\Theta(n^{-k}p^{-\ell})+\sum_{i=2}^{k-1}O(n^{-i}p^{-i\ell/k})=o(1)</math>.
}}
= Two-point sampling =
== Pairwise Independent Variables ==
We now consider constructing pairwise independent random variables ranging over <math>[p]=\{0,1,2,\ldots,p-1\}</math> for some prime <math>p</math>. Let <math>X_0,X_1</math> be two independent random variables, each uniformly distributed over <math>[p]</math>.

Let <math>Y_0,Y_1,\ldots, Y_{p-1}</math> be defined as:
:<math>
\begin{align}
Y_i=(X_0+i\cdot X_1)\bmod p &\quad \mbox{for }i\in[p].
\end{align}
</math>
{{Theorem
|Theorem|
: The random variables <math>Y_0,Y_1,\ldots, Y_{p-1}</math> are pairwise independent uniform random variables over <math>[p]</math>.
}}
{{Proof| We first show that each <math>Y_i</math> is uniform. That is, we will show that for any <math>i,a\in[p]</math>,
:<math>\begin{align}
\Pr\left[(X_0+i\cdot X_1)\bmod p=a\right]
&=
\frac{1}{p}.
\end{align}</math>
Due to the law of total probability,
:<math>\begin{align}
\Pr\left[(X_0+i\cdot X_1)\bmod p=a\right]
&=
\sum_{j\in[p]}\Pr[X_1=j]\cdot\Pr\left[(X_0+ij)\bmod p=a\right]\\
&=\frac{1}{p}\sum_{j\in[p]}\Pr\left[X_0\equiv(a-ij)\pmod{p}\right].
\end{align}</math>
For prime <math>p</math> and any <math>i,j,a\in[p]</math>, there is exactly one value of <math>X_0</math> in <math>[p]</math> satisfying <math>X_0\equiv(a-ij)\pmod{p}</math>. Thus, <math>\Pr\left[X_0\equiv(a-ij)\pmod{p}\right]=1/p</math> and the above probability is <math>\frac{1}{p}</math>.
We then show that the <math>Y_i</math>'s are pairwise independent, i.e. we will show that for any <math>Y_i,Y_j</math> with <math>i\neq j</math> and any <math>a,b\in[p]</math>,
:<math>\begin{align}
\Pr\left[Y_i=a\wedge Y_j=b\right]
&=
\frac{1}{p^2}.
\end{align}</math>
The event <math>Y_i=a\wedge Y_j=b</math> is equivalent to
:<math>
\begin{cases}
(X_0+iX_1)\equiv a\pmod{p}\\
(X_0+jX_1)\equiv b\pmod{p}
\end{cases}
</math>
Since <math>i\neq j</math> and <math>p</math> is prime, subtracting the two congruences gives <math>(i-j)X_1\equiv a-b\pmod{p}</math>, which has a unique solution for <math>X_1</math>, and this in turn determines <math>X_0</math> uniquely. Hence there exists a unique solution of <math>X_0</math> and <math>X_1</math> in <math>[p]</math> to the above linear congruential system. Thus the probability of the event is <math>\frac{1}{p^2}</math>.
}}
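A minimal Python sketch of this construction (with hypothetical helper names), which also verifies the pairwise independence empirically for a small prime by enumerating all <math>p^2</math> equally likely seeds <math>(X_0,X_1)</math>:
<pre>
from itertools import product

def pairwise_vars(x0, x1, p):
    """Y_i = (X_0 + i*X_1) mod p for i in [p], built from the two seeds."""
    return [(x0 + i * x1) % p for i in range(p)]

p = 5
# Enumerate all p^2 equally likely seeds and tally the joint outcomes of (Y_1, Y_3).
counts = {}
for x0, x1 in product(range(p), repeat=2):
    Y = pairwise_vars(x0, x1, p)
    counts[(Y[1], Y[3])] = counts.get((Y[1], Y[3]), 0) + 1

# Every pair (a, b) in [p]^2 occurs exactly once, i.e. with probability 1/p^2.
assert all(counts.get((a, b), 0) == 1 for a in range(p) for b in range(p))
print("(Y_1, Y_3) is uniform over [5]^2 across all 25 seeds")
</pre>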
== Two-point sampling ==
Consider a Monte Carlo randomized algorithm with one-sided error for a decision problem <math>f</math>. We formulate the algorithm as a deterministic algorithm <math>A</math> that takes as input <math>x</math> and a uniform random number <math>r\in[p]</math>, where <math>p</math> is a prime, such that for any input <math>x</math>:
* If <math>f(x)=1</math>, then <math>\Pr[A(x,r)=1]\ge\frac{1}{2}</math>, where the probability is taken over the random choice of <math>r</math>.
* If <math>f(x)=0</math>, then <math>A(x,r)=0</math> for any <math>r</math>.

We call <math>r</math> the '''random source''' for the algorithm.

For an <math>x</math> with <math>f(x)=1</math>, we call an <math>r</math> that makes <math>A(x,r)=1</math> a '''witness''' for <math>x</math>. For a positive <math>x</math>, at least half of <math>[p]</math> are witnesses. The random source <math>r</math> has a polynomial number of bits, which means that <math>p</math> is exponentially large, so it is infeasible to find a witness for an input <math>x</math> by exhaustive search. Deterministic algorithms overcome this by using sophisticated rules for efficiently searching for a witness. Randomization, on the other hand, reduces this to a bit of luck: randomly choose an <math>r</math> and succeed with probability at least 1/2.
We can boost the accuracy (equivalently, reduce the error) of any Monte Carlo randomized algorithm with one-sided error by running the algorithm multiple times.

Suppose that we sample <math>t</math> values <math>r_1,r_2,\ldots,r_t</math> uniformly and independently from <math>[p]</math>, and run the following scheme:
:{|border="1"
|<math>B(x,r_1,r_2,\ldots,r_t):</math>
:return <math>\bigvee_{i=1}^t A(x,r_i)</math>;
|}
That is, <math>B</math> returns 1 if any instance <math>A(x,r_i)=1</math>. For any <math>x</math> with <math>f(x)=1</math>, due to the independence of <math>r_1,r_2,\ldots,r_t</math>, the probability that <math>B(x,r_1,r_2,\ldots,r_t)</math> returns an incorrect result is at most <math>2^{-t}</math>. On the other hand, <math>B</math> never makes mistakes for <math>x</math> with <math>f(x)=0</math> since <math>A</math> has no false positives. Thus, the error of the Monte Carlo algorithm is reduced to <math>2^{-t}</math>.
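A minimal Python sketch of this independent-repetition scheme. The checker <code>A</code> below is a toy stand-in (an assumption for illustration, not a real algorithm): for positive inputs it accepts exactly the lower half of <math>[p]</math> as witnesses.
<pre>
import random

P = 101  # a small prime standing in for the (exponentially large) random source space

def A(x, r, f):
    """Toy one-sided-error checker (a stand-in for a real algorithm):
    if f(x) = 1, exactly the r in the lower half of [P] are witnesses;
    if f(x) = 0, it never accepts."""
    return 1 if f(x) == 1 and r < P // 2 + 1 else 0

def B_independent(x, f, t):
    """Amplification with t mutually independent random sources."""
    rs = [random.randrange(P) for _ in range(t)]
    return max(A(x, r, f) for r in rs)   # logical OR of the t runs

# For a positive instance, the error probability drops to about 2^(-t).
f = lambda x: 1                      # pretend every input is a YES-instance
t, trials = 5, 10000
errors = sum(B_independent(0, f, t) == 0 for _ in range(trials))
print(f"empirical error ~= {errors / trials:.4f}  (bound 2^-{t} = {2**-t:.4f})")
</pre>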
Sampling <math>t</math> mutually independent random numbers from <math>[p]</math> can be quite expensive since it requires <math>\Omega(t\log p)</math> random bits. Suppose that we can only afford <math>O(\log p)</math> random bits. In particular, we sample two independent uniform random numbers <math>a</math> and <math>b</math> from <math>[p]</math>. If we use <math>a</math> and <math>b</math> directly by running two independent instances <math>A(x,a)</math> and <math>A(x,b)</math>, we only get an error upper bound of 1/4.

The following scheme reduces the error significantly with the same number of random bits:
:{{Theorem
|Algorithm|
Choose two independent uniform random numbers <math>a</math> and <math>b</math> from <math>[p]</math>.
Construct <math>t</math> random numbers <math>r_1,r_2,\ldots,r_t</math> by:
:<math>
\begin{align}
\forall 1\le i\le t, &\quad \mbox{let }r_i = (a\cdot i+b)\bmod p.
\end{align}
</math>
Run <math>B(x,r_1,r_2,\ldots,r_t)</math>.
}}
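A minimal Python sketch of this two-point scheme, reusing the same toy checker <code>A</code> as in the previous sketch (again an assumption for illustration): it spends only two random numbers, yet drives the one-sided error down to roughly <math>1/t</math>.
<pre>
import random

P = 101  # small prime for illustration

def A(x, r, f):
    """Same toy one-sided-error checker as in the previous sketch."""
    return 1 if f(x) == 1 and r < P // 2 + 1 else 0

def B_two_point(x, f, t):
    """Amplification from only two random numbers a, b via r_i = (a*i + b) mod P."""
    a, b = random.randrange(P), random.randrange(P)
    rs = [(a * i + b) % P for i in range(1, t + 1)]   # pairwise independent
    return max(A(x, r, f) for r in rs)

f = lambda x: 1                      # pretend every input is a YES-instance
t, trials = 20, 10000                # requires t <= P
errors = sum(B_two_point(0, f, t) == 0 for _ in range(trials))
print(f"empirical error ~= {errors / trials:.4f}  (Chebyshev bound 1/t = {1/t:.4f})")
</pre>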
By the discussion in the previous subsection, we know that for <math>t\le p</math>, the numbers <math>r_1,r_2,\ldots,r_t</math> are pairwise independent and uniform over <math>[p]</math>. Let <math>X_i=A(x,r_i)</math> and <math>X=\sum_{i=1}^tX_i</math>. Due to the uniformity of <math>r_i</math> and our definition of <math>A</math>, for any <math>x</math> with <math>f(x)=1</math>, it holds that
:<math>
\Pr[X_i=1]=\Pr[A(x,r_i)=1]\ge\frac{1}{2}.
</math>
By the linearity of expectation,
:<math>
\mathbf{E}[X]=\sum_{i=1}^t\mathbf{E}[X_i]=\sum_{i=1}^t\Pr[X_i=1]\ge\frac{t}{2}.
</math>
Each <math>X_i</math> is a Bernoulli trial with success probability <math>q=\Pr[X_i=1]\ge 1/2</math>, so the variance of each <math>X_i</math> is bounded as
:<math>
\mathbf{Var}[X_i]=q(1-q)\le\frac{1}{4}.
</math>
Since the <math>r_i</math>'s are pairwise independent, so are the <math>X_i</math>'s, and hence <math>\mathbf{Var}[X]=\sum_{i=1}^t\mathbf{Var}[X_i]</math>. Applying Chebyshev's inequality, we have that for any <math>x</math> with <math>f(x)=1</math>,
:<math>\begin{align}
\Pr\left[\bigvee_{i=1}^t A(x,r_i)=0\right]
&=
\Pr[X=0]\\
&\le
\Pr[|X-\mathbf{E}[X]|\ge \mathbf{E}[X]]\\
&\le
\Pr\left[|X-\mathbf{E}[X]|\ge \frac{t}{2}\right]\\
&\le
\frac{4}{t^2}\sum_{i=1}^t\mathbf{Var}[X_i]\\
&\le \frac{1}{t}.
\end{align}
</math>
The error is reduced to <math>1/t</math> with only two random numbers. This scheme works as long as <math>t\le p</math>.