随机算法 (Fall 2011)/Random Graphs: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>WikiSysop
(Created page with '== Erdős–Rényi Random Graphs == Consider a graph <math>G(V,E)</math> which is randomly generated as: * <math>|V|=n</math>; * <math>\forall \{u,v\}\in{V\choose 2}</math>, <ma…')
 
imported>WikiSysop
Line 7: Line 7:


Informally, the presence of every edge of <math>G(n,p)</math> is determined by an independent coin flipping (with probability of HEADs <math>p</math>).  
Informally, the presence of every edge of <math>G(n,p)</math> is determined by an independent coin flipping (with probability of HEADs <math>p</math>).  
=== Coloring large-girth graphs ===
The girth of a graph is the length of the shortest cycle of the graph.
{{Theorem|Definition|
Let <math>G(V,E)</math> be an undirected graph.
* A '''cycle''' of length <math>k</math> in <math>G</math> is a sequence of distinct vertices <math>v_1,v_2,\ldots,v_{k}</math> such that <math>v_iv_{i+1}\in E</math> for all <math>i=1,2,\ldots,k-1</math> and <math>v_kv_1\in E</math>.
* The '''girth''' of <math>G</math>, dented <math>g(G)</math>, is the length of the shortest cycle in <math>G</math>.
}}
The chromatic number of a graph is the minimum number of colors with which the graph can be ''properly'' colored.
{{Theorem|Definition (chromatic number)|
* The '''chromatic number''' of <math>G</math>, denoted <math>\chi(G)</math>, is the minimal number of colors which we need to color the vertices of <math>G</math> so that no two adjacent vertices have the same color. Formally,
::<math>\chi(G)=\min\{C\in\mathbb{N}\mid \exists f:V\rightarrow[C]\mbox{ such that }\forall uv\in E, f(u)\neq f(v)\}</math>.
}}
In 1959, Erdős proved the following theorem: for any fixed <math>k</math> and <math>\ell</math>, there exists a finite graph with girth at least <math>k</math> and chromatic number at least <math>\ell</math>. This is considered a striking example of the probabilistic method. The statement of the theorem itself calls for nothing about probability or randomness. And the result is highly contra-intuitive: if the girth is large there is no simple reason why the graph could not be colored with a few colors. We can always "locally" color a cycle with 2 or 3 colors. Erdős' result shows that there are "global" restrictions for coloring, and although such configurations are very difficult to explicitly construct, with the probabilistic method, we know that they commonly exist.
{{Theorem| Theorem (Erdős 1959)|
: For all <math>k,\ell</math> there exists a graph <math>G</math> with <math>g(G)>\ell</math> and <math>\chi(G)>k\,</math>.
}}
It is very hard to directly analyze the chromatic number of a graph. We find that the chromatic number can be related to the size of the maximum independent set.
{{Theorem|Definition (independence number)|
* The '''independence number''' of <math>G</math>, denoted <math>\alpha(G)</math>, is the size of the largest independent set in <math>G</math>. Formally,
::<math>\alpha(G)=\max\{|S|\mid S\subseteq V\mbox{ and }\forall u,v\in S, uv\not\in E\}</math>.
}}
We observe the following relationship between the chromatic number and the independence number.
{{Theorem|Lemma|
:For any <math>n</math>-vertex graph,
::<math>\chi(G)\ge\frac{n}{\alpha(G)}</math>.
}}
{{Proof|
*In the optimal coloring, <math>n</math> vertices are partitioned into <math>\chi(G)</math> color classes according to the vertex color.
*Every color class is an independent set, or otherwise there exist two adjacent vertice with the same color.
*By the pigeonhole principle, there is a color class (hence an independent set) of size <math>\frac{n}{\chi(G)}</math>. Therefore, <math>\alpha(G)\ge\frac{n}{\chi(G)}</math>.
The lemma follows.
}}
Therefore, it is sufficient to prove that <math>\alpha(G)\le\frac{n}{k}</math> and <math>g(G)>\ell</math>.
{{Prooftitle|Proof of Erdős theorem|
Fix <math>\theta<\frac{1}{\ell}</math>. Let <math>G</math> be <math>G(n,p)</math> with <math>p=n^{\theta-1}</math>.
For any length-<math>i</math> simple cycle <math>\sigma</math>, let <math>X_\sigma</math> be the indicator random variable such that
:<math>
X_\sigma=
\begin{cases}
1 & \sigma\mbox{ is a cycle in }G,\\
0 & \mbox{otherwise}.
\end{cases}
</math>
The number of cycles of length at most <math>\ell</math> in graph <math>G</math> is
:<math>X=\sum_{i=3}^\ell\sum_{\sigma:i\text{-cycle}}X_\sigma</math>.
For any particular length-<math>i</math> simple cycle <math>\sigma</math>,
:<math>\mathbf{E}[X_\sigma]=\Pr[X_\sigma=1]=\Pr[\sigma\mbox{ is a cycle in }G]=p^i=n^{\theta i-i}</math>.
For any <math>3\le i\le n</math>, the number of length-<math>i</math> simple cycle is <math>\frac{n(n-1)\cdots (n-i+1)}{2i!}</math>. By the linearity of expectation,
:<math>\mathbf{E}[X]=\sum_{i=3}^\ell\sum_{\sigma:i\text{-cycle}}\mathbf{E}[X_\sigma]=\sum_{i=3}^\ell\frac{n(n-1)\cdots (n-i+1)}{2i!}n^{\theta i-i}\le \sum_{i=3}^\ell\frac{n^{\theta i}}{2i!}=o(n)</math>.
Applying Markov's inequality,
:<math>
\Pr\left[X\ge \frac{n}{2}\right]\le\frac{\mathbf{E}[X]}{n/2}=o(1).
</math>
Therefore, with high probability the random graph has less than <math>n/2</math> short cycles.
Now we proceed to analyze the independence number. Let <math>m=\left\lceil\frac{3\ln n}{p}\right\rceil</math>, so that
:<math>
\begin{align}
\Pr[\alpha(G)\ge m]
&\le\Pr\left[\exists S\in{V\choose m}\forall \{u,v\}\in{S\choose 2}, uv\not\in G\right]\\
&\le{n\choose m}(1-p)^{m\choose 2}\\
&<n^m\mathrm{e}^{-p{m\choose 2}}\\
&=\left(n\mathrm{e}^{-p(m-1)/2}\right)^m=o(1)
\end{align}
</math>
The probability that either of the above events occurs is
:<math>
\begin{align}
\Pr\left[X<\frac{n}{2}\vee \alpha(G)<m\right]
\le \Pr\left[X<\frac{n}{2}\right]+\Pr\left[\alpha(G)<m\right]
=o(1).
\end{align}
</math>
Therefore, there exists a graph <math>G</math> with less than <math>n/2</math> "short" cycles, i.e., cycles of length at most <math>\ell</math>, and with <math>\alpha(G)<m\le 3n^{1-\theta}\ln n</math>.
Take each "short" cycle in <math>G</math> and remove a vertex from the cycle (and also remove all adjacent edges to the removed vertex). This gives a graph <math>G'</math> which has no short cycles, hence the girth <math>g(G')\ge\ell</math>. And <math>G'</math> has at least <math>n/2</math> vertices, because at most <math>n/2</math> vertices are removed.
Notice that removing vertices cannot makes <math>\alpha(G)</math> grow. It holds that <math>\alpha(G')\le\alpha(G)</math>. Thus
:<math>\chi(G')\ge\frac{n/2}{\alpha(G')}\ge\frac{n}{2m}\ge\frac{n^\theta}{6\ln n}</math>.
The theorem is proved by taking <math>n</math> sufficiently large so that this value is greater than <math>k</math>.
}}
The proof contains a very simple procedure which for any <math>k</math> and <math>\ell</math> ''generates'' such a graph <math>G</math> with <math>g(G)>\ell</math> and <math>\chi(G)>k</math>. The procedure is as such:
* Fix some <math>\theta<\frac{1}{\ell}</math>. Choose sufficiently large <math>n</math> with <math>\frac{n^\theta}{6\ln n}>k</math>, and let <math>p=n^{\theta-1}</math>.
* Generate a random graph <math>G</math> as <math>G(n,p)</math>.
* For each cycle of length at most <math>\ell</math> in <math>G</math>, remove a vertex from the cycle.
The resulting graph <math>G'</math> satisfying that <math>g(G)>\ell</math> and <math>\chi(G)>k</math> with high probability.


===Monotone properties ===
===Monotone properties ===

Revision as of 01:33, 24 July 2011

Erdős–Rényi Random Graphs

Consider a graph [math]\displaystyle{ G(V,E) }[/math] which is randomly generated as:

  • [math]\displaystyle{ |V|=n }[/math];
  • [math]\displaystyle{ \forall \{u,v\}\in{V\choose 2} }[/math], [math]\displaystyle{ uv\in E }[/math] independently with probability [math]\displaystyle{ p }[/math].

Such graph is denoted as [math]\displaystyle{ G(n,p) }[/math]. This is called the Erdős–Rényi model or [math]\displaystyle{ G(n,p) }[/math] model for random graphs.

Informally, the presence of every edge of [math]\displaystyle{ G(n,p) }[/math] is determined by an independent coin flipping (with probability of HEADs [math]\displaystyle{ p }[/math]).

Monotone properties

A graph property is a predicate of graph which depends only on the structure of the graph.

Definition
Let [math]\displaystyle{ \mathcal{G}_n=2^{V\choose 2} }[/math], where [math]\displaystyle{ |V|=n }[/math], be the set of all possible graphs on [math]\displaystyle{ n }[/math] vertices. A graph property is a boolean function [math]\displaystyle{ P:\mathcal{G}_n\rightarrow\{0,1\} }[/math] which is invariant under permutation of vertices, i.e. [math]\displaystyle{ P(G)=P(H) }[/math] whenever [math]\displaystyle{ G }[/math] is isomorphic to [math]\displaystyle{ H }[/math].

We are interested in the monotone properties, i.e., those properties that adding edges will not change a graph from having the property to not having the property.

Definition
A graph property [math]\displaystyle{ P }[/math] is monotone if for any [math]\displaystyle{ G\subseteq H }[/math], both on [math]\displaystyle{ n }[/math] vertices, [math]\displaystyle{ G }[/math] having property [math]\displaystyle{ P }[/math] implies [math]\displaystyle{ H }[/math] having property [math]\displaystyle{ P }[/math].

By seeing the property as a function mapping a set of edges to a numerical value in [math]\displaystyle{ \{0,1\} }[/math], a monotone property is just a monotonically increasing set function.

Some examples of monotone graph properties:

  • Hamiltonian;
  • [math]\displaystyle{ k }[/math]-clique;
  • contains a subgraph isomorphic to some [math]\displaystyle{ H }[/math];
  • non-planar;
  • chromatic number [math]\displaystyle{ \gt k }[/math] (i.e., not [math]\displaystyle{ k }[/math]-colorable);
  • girth [math]\displaystyle{ \lt \ell }[/math].

From the last two properties, you can see another reason that the Erdős theorem is unintuitive.

Some examples of non-monotone graph properties:

  • Eulerian;
  • contains an induced subgraph isomorphic to some [math]\displaystyle{ H }[/math];

For all monotone graph properties, we have the following theorem.

Theorem
Let [math]\displaystyle{ P }[/math] be a monotone graph property. Suppose [math]\displaystyle{ G_1=G(n,p_1) }[/math], [math]\displaystyle{ G_2=G(n,p_2) }[/math], and [math]\displaystyle{ 0\le p_1\le p_2\le 1 }[/math]. Then
[math]\displaystyle{ \Pr[P(G_1)]\le \Pr[P(G_2)] }[/math].

Although the statement in the theorem looks very natural, it is difficult to evaluate the probability that a random graph has some property. However, the theorem can be very easily proved by using the idea of coupling, a proof technique in probability theory which compare two unrelated random variables by forcing them to be related.

Proof.

For any [math]\displaystyle{ \{u,v\}\in{[n]\choose 2} }[/math], let [math]\displaystyle{ X_{\{u,v\}} }[/math] be independently and uniformly distributed over the continuous interval [math]\displaystyle{ [0,1] }[/math]. Let [math]\displaystyle{ uv\in G_1 }[/math] if and only if [math]\displaystyle{ X_{\{u,v\}}\in[0,p_1] }[/math] and let [math]\displaystyle{ uv\in G_2 }[/math] if and only if [math]\displaystyle{ X_{\{u,v\}}\in[0,p_2] }[/math].

It is obvious that [math]\displaystyle{ G_1\sim G(n,p_1)\, }[/math] and [math]\displaystyle{ G_2\sim G(n,p_2)\, }[/math]. For any [math]\displaystyle{ \{u,v\} }[/math], [math]\displaystyle{ uv\in G_1 }[/math] means that [math]\displaystyle{ X_{\{u,v\}}\in[0,p_1]\subseteq [0,p_2] }[/math], which implies that [math]\displaystyle{ uv\in G_2 }[/math]. Thus, [math]\displaystyle{ G_1\subseteq G_2 }[/math].

Since [math]\displaystyle{ P }[/math] is monotone, [math]\displaystyle{ P(G_1)=1 }[/math] implies [math]\displaystyle{ P(G_2) }[/math]. Thus,

[math]\displaystyle{ \Pr[P(G_1)=1]\le \Pr[P(G_2)=1] }[/math].
[math]\displaystyle{ \square }[/math]

Threshold phenomenon

One of the most fascinating phenomenon of random graphs is that for so many natural graph properties, the random graph [math]\displaystyle{ G(n,p) }[/math] suddenly changes from almost always not having the property to almost always having the property as [math]\displaystyle{ p }[/math] grows in a very small range.

A monotone graph property [math]\displaystyle{ P }[/math] is said to have the threshold [math]\displaystyle{ p(n) }[/math] if

  • when [math]\displaystyle{ p\ll p(n) }[/math], [math]\displaystyle{ \Pr[P(G(n,p))]=0 }[/math] as [math]\displaystyle{ n\rightarrow\infty }[/math] (also called [math]\displaystyle{ G(n,p) }[/math] almost always does not have [math]\displaystyle{ P }[/math]); and
  • when [math]\displaystyle{ p\gg p(n) }[/math], [math]\displaystyle{ \Pr[P(G(n,p))]=1 }[/math] as [math]\displaystyle{ n\rightarrow\infty }[/math] (also called [math]\displaystyle{ G(n,p) }[/math] almost always has [math]\displaystyle{ P }[/math]).

The classic method for proving the threshold is the so-called second moment method (Chebyshev's inequality).

Theorem
The threshold for a random graph [math]\displaystyle{ G(n,p) }[/math] to contain a 4-clique is [math]\displaystyle{ p=n^{2/3} }[/math].

We formulate the problem as such. For any [math]\displaystyle{ 4 }[/math]-subset of vertices [math]\displaystyle{ S\in{V\choose 4} }[/math], let [math]\displaystyle{ X_S }[/math] be the indicator random variable such that

[math]\displaystyle{ X_S= \begin{cases} 1 & S\mbox{ is a clique},\\ 0 & \mbox{otherwise}. \end{cases} }[/math]

Let [math]\displaystyle{ X=\sum_{S\in{V\choose 4}}X_S }[/math] be the total number of 4-cliques in [math]\displaystyle{ G }[/math].

It is sufficient to prove the following lemma.

Lemma
  • If [math]\displaystyle{ p=o(n^{-2/3}) }[/math], then [math]\displaystyle{ \Pr[X\ge 1]\rightarrow 0 }[/math] as [math]\displaystyle{ n\rightarrow\infty }[/math].
  • If [math]\displaystyle{ p=\omega(n^{-2/3}) }[/math], then [math]\displaystyle{ \Pr[X\ge 1]\rightarrow 1 }[/math] as [math]\displaystyle{ n\rightarrow\infty }[/math].
Proof.

The first claim is proved by the first moment (expectation and Markov's inequality) and the second claim is proved by the second moment method (Chebyshev's inequality).

Every 4-clique has 6 edges, thus for any [math]\displaystyle{ S\in{V\choose 4} }[/math],

[math]\displaystyle{ \mathbf{E}[X_S]=\Pr[X_S=1]=p^6 }[/math].

By the linearity of expectation,

[math]\displaystyle{ \mathbf{E}[X]=\sum_{S\in{V\choose 4}}\mathbf{E}[X_S]={n\choose 4}p^6 }[/math].

Applying Markov's inequality

[math]\displaystyle{ \Pr[X\ge 1]\le \mathbf{E}[X]=O(n^4p^6)=o(1) }[/math], if [math]\displaystyle{ p=o(n^{-2/3}) }[/math].

The first claim is proved.

To prove the second claim, it is equivalent to show that [math]\displaystyle{ \Pr[X=0]=o(1) }[/math] if [math]\displaystyle{ p=\omega(n^{-2/3}) }[/math]. By the Chebyshev's inequality,

[math]\displaystyle{ \Pr[X=0]\le\Pr[|X-\mathbf{E}[X]|\ge\mathbf{E}[X]]\le\frac{\mathbf{Var}[X]}{(\mathbf{E}[X])^2} }[/math],

where the variance is computed as

[math]\displaystyle{ \mathbf{Var}[X]=\mathbf{Var}\left[\sum_{S\in{V\choose 4}}X_S\right]=\sum_{S\in{V\choose 4}}\mathbf{Var}[X_S]+\sum_{S,T\in{V\choose 4}, S\neq T}\mathbf{Cov}(X_S,X_T) }[/math].

For any [math]\displaystyle{ S\in{V\choose 4} }[/math],

[math]\displaystyle{ \mathbf{Var}[X_S]=\mathbf{E}[X_S^2]-\mathbf{E}[X_S]^2\le \mathbf{E}[X_S^2]=\mathbf{E}[X_S]=p^6 }[/math]. Thus the first term of above formula is [math]\displaystyle{ \sum_{S\in{V\choose 4}}\mathbf{Var}[X_S]=O(n^4p^6) }[/math].

We now compute the covariances. For any [math]\displaystyle{ S,T\in{V\choose 4} }[/math] that [math]\displaystyle{ S\neq T }[/math]:

  • Case.1: [math]\displaystyle{ |S\cap T|\le 1 }[/math], so [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math] do not share any edges. [math]\displaystyle{ X_S }[/math] and [math]\displaystyle{ X_T }[/math] are independent, thus [math]\displaystyle{ \mathbf{Cov}(X_S,X_T)=0 }[/math].
  • Case.2: [math]\displaystyle{ |S\cap T|= 2 }[/math], so [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math] share an edge. Since [math]\displaystyle{ |S\cup T|=6 }[/math], there are [math]\displaystyle{ {n\choose 6}=O(n^6) }[/math] pairs of such [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math].
[math]\displaystyle{ \mathbf{Cov}(X_S,X_T)=\mathbf{E}[X_SX_T]-\mathbf{E}[X_S]\mathbf{E}[X_T]\le\mathbf{E}[X_SX_T]=\Pr[X_S=1\wedge X_T=1]=p^{11} }[/math]
since there are 11 edges in the union of two 4-cliques that share a common edge. The contribution of these pairs is [math]\displaystyle{ O(n^6p^{11}) }[/math].
  • Case.2: [math]\displaystyle{ |S\cap T|= 3 }[/math], so [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math] share a triangle. Since [math]\displaystyle{ |S\cup T|=5 }[/math], there are [math]\displaystyle{ {n\choose 5}=O(n^5) }[/math] pairs of such [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math]. By the same argument,
[math]\displaystyle{ \mathbf{Cov}(X_S,X_T)\le\Pr[X_S=1\wedge X_T=1]=p^{9} }[/math]
since there are 9 edges in the union of two 4-cliques that share a triangle. The contribution of these pairs is [math]\displaystyle{ O(n^5p^{9}) }[/math].

Putting all these together,

[math]\displaystyle{ \mathbf{Var}[X]=O(n^4p^6+n^6p^{11}+n^5p^{9}). }[/math]

And

[math]\displaystyle{ \Pr[X=0]\le\frac{\mathbf{Var}[X]}{(\mathbf{E}[X])^2}=O(n^{-4}p^{-6}+n^{-2}p^{-1}+n^{-3}p^{-3}) }[/math],

which is [math]\displaystyle{ o(1) }[/math] if [math]\displaystyle{ p=\omega(n^{-2/3}) }[/math]. The second claim is also proved.

[math]\displaystyle{ \square }[/math]

The above theorem can be generalized to any "balanced" subgraphs.

Definition
  • The density of a graph [math]\displaystyle{ G(V,E) }[/math], denoted [math]\displaystyle{ \rho(G)\, }[/math], is defined as [math]\displaystyle{ \rho(G)=\frac{|E|}{|V|} }[/math].
  • A graph [math]\displaystyle{ G(V,E) }[/math] is balanced if [math]\displaystyle{ \rho(H)\le \rho(G) }[/math] for all subgraphs [math]\displaystyle{ H }[/math] of [math]\displaystyle{ G }[/math].

Cliques are balanced, because [math]\displaystyle{ \frac{{k\choose 2}}{k}\le \frac{{n\choose 2}}{n} }[/math] for any [math]\displaystyle{ k\le n }[/math]. The threshold for 4-clique is a direct corollary of the following general theorem.

Theorem (Erdős–Rényi 1960)
Let [math]\displaystyle{ H }[/math] be a balanced graph with [math]\displaystyle{ k }[/math] vertices and [math]\displaystyle{ \ell }[/math] edges. The threshold for the property that a random graph [math]\displaystyle{ G(n,p) }[/math] contains a (not necessarily induced) subgraph isomorphic to [math]\displaystyle{ H }[/math] is [math]\displaystyle{ p=n^{-k/\ell}\, }[/math].
Sketch of proof.

For any [math]\displaystyle{ S\in{V\choose k} }[/math], let [math]\displaystyle{ X_S }[/math] indicate whether [math]\displaystyle{ G_S }[/math] (the subgraph of [math]\displaystyle{ G }[/math] induced by [math]\displaystyle{ S }[/math]) contain a subgraph [math]\displaystyle{ H }[/math]. Then

[math]\displaystyle{ p^{\ell}\le\mathbf{E}[X_S]\le k!p^{\ell} }[/math], since there are at most [math]\displaystyle{ k! }[/math] ways to match the substructure.

Note that [math]\displaystyle{ k }[/math] does not depend on [math]\displaystyle{ n }[/math]. Thus, [math]\displaystyle{ \mathbf{E}[X_S]=\Theta(p^{\ell}) }[/math]. Let [math]\displaystyle{ X=\sum_{S\in{V\choose k}}X_S }[/math] be the number of [math]\displaystyle{ H }[/math]-subgraphs.

[math]\displaystyle{ \mathbf{E}[X]=\Theta(n^kp^{\ell}) }[/math].

By Markov's inequality, [math]\displaystyle{ \Pr[X\ge 1]\le \mathbf{E}[X]=\Theta(n^kp^{\ell}) }[/math] which is [math]\displaystyle{ o(1) }[/math] when [math]\displaystyle{ p\ll n^{-\ell/k} }[/math].

By Chebyshev's inequality, [math]\displaystyle{ \Pr[X=0]\le \frac{\mathbf{Var}[X]}{\mathbf{E}[X]^2} }[/math] where

[math]\displaystyle{ \mathbf{Var}[X]=\sum_{S\in{V\choose k}}\mathbf{Var}[X_S]+\sum_{S\neq T}\mathbf{Cov}(X_S,X_T) }[/math].

The first term [math]\displaystyle{ \sum_{S\in{V\choose k}}\mathbf{Var}[X_S]\le \sum_{S\in{V\choose k}}\mathbf{E}[X_S^2]= \sum_{S\in{V\choose k}}\mathbf{E}[X_S]=\mathbf{E}[X]=\Theta(n^kp^{\ell}) }[/math].

For the covariances, [math]\displaystyle{ \mathbf{Cov}(X_S,X_T)\neq 0 }[/math] only if [math]\displaystyle{ |S\cap T|=i }[/math] for [math]\displaystyle{ 2\le i\le k-1 }[/math]. Note that [math]\displaystyle{ |S\cap T|=i }[/math] implies that [math]\displaystyle{ |S\cup T|=2k-i }[/math]. And for balanced [math]\displaystyle{ H }[/math], the number of edges of interest in [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math] is [math]\displaystyle{ 2\ell-i\rho(H_{S\cap T})\ge 2\ell-i\rho(H)=2\ell-i\ell/k }[/math]. Thus, [math]\displaystyle{ \mathbf{Cov}(X_S,X_T)\le\mathbf{E}[X_SX_T]\le p^{2\ell-i\ell/k} }[/math]. And,

[math]\displaystyle{ \sum_{S\neq T}\mathbf{Cov}(X_S,X_T)=\sum_{i=2}^{k-1}O(n^{2k-i}p^{2\ell-i\ell/k}) }[/math]

Therefore, when [math]\displaystyle{ p\gg n^{-\ell/k} }[/math],

[math]\displaystyle{ \Pr[X=0]\le \frac{\mathbf{Var}[X]}{\mathbf{E}[X]^2}\le \frac{\Theta(n^kp^{\ell})+\sum_{i=2}^{k-1}O(n^{2k-i}p^{2\ell-i\ell/k})}{\Theta(n^{2k}p^{2\ell})}=\Theta(n^{-k}p^{-\ell})+\sum_{i=2}^{k-1}O(n^{-i}p^{-i\ell/k})=o(1) }[/math].
[math]\displaystyle{ \square }[/math]