Randomized Algorithms (Spring 2014)/Second Moment
Erdős–Rényi Random Graphs
Consider a graph $G(V,E)$ which is randomly generated as follows:
- $V=[n]$;
- $\forall \{u,v\}\in{V\choose 2}$, $\{u,v\}\in E$ independently with probability $p$.
Such a graph is denoted as $G(n,p)$. This is called the Erdős–Rényi model or $G(n,p)$ model for random graphs.
Informally, the presence of every edge of $G(n,p)$ is determined by an independent coin flip (with probability $p$ of HEADs).
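As a concrete illustration, sampling from this model is a single pass over all vertex pairs. The sketch below is our own (the function name `gnp` and the edge-set representation are choices made here, not part of the notes):

```python
import itertools
import random

def gnp(n, p, rng=random):
    """Sample an Erdos-Renyi random graph G(n, p): each of the
    C(n, 2) possible edges {u, v} is included independently with
    probability p. The graph is returned as a set of pairs (u, v)
    with u < v."""
    return {e for e in itertools.combinations(range(n), 2)
            if rng.random() < p}
```

For instance, `gnp(10, 0.5)` flips a fair coin for each of the 45 possible edges.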
One of the most fascinating phenomena of random graphs is that for many natural graph properties, the random graph $G(n,p)$ suddenly changes from almost always not having the property to almost always having the property as $p$ grows within a very small range.
A monotone graph property $P$ is said to have the threshold $p(n)$ if
- when $p=o(p(n))$, $\Pr[P(G(n,p))]\to 0$ as $n\to\infty$ (also called: $G(n,p)$ almost always does not have $P$); and
- when $p=\omega(p(n))$, $\Pr[P(G(n,p))]\to 1$ as $n\to\infty$ (also called: $G(n,p)$ almost always has $P$).
The classic method for proving the threshold is the so-called second moment method (Chebyshev's inequality).
Threshold for 4-clique
- The threshold for a random graph $G(n,p)$ to contain a 4-clique is $p(n)=n^{-2/3}$.
We formulate the problem as such. For any $4$-subset of vertices $S\in{V\choose 4}$, let $X_S$ be the indicator random variable such that
\[ X_S=\begin{cases}1 & \text{if }S\text{ is a clique},\\ 0 & \text{otherwise}.\end{cases} \]
Let $X=\sum_{S\in{V\choose 4}}X_S$ be the total number of 4-cliques in $G$.
It is sufficient to prove the following lemma.
- If $p=o(n^{-2/3})$, then $\Pr[X\ge 1]\to 0$ as $n\to\infty$.
- If $p=\omega(n^{-2/3})$, then $\Pr[X\ge 1]\to 1$ as $n\to\infty$.
The first claim is proved by the first moment method (expectation and Markov's inequality) and the second claim is proved by the second moment method (Chebyshev's inequality).
Every 4-clique has 6 edges, thus for any $S\in{V\choose 4}$,
\[ \mathbf{E}[X_S]=\Pr[X_S=1]=p^6. \]
By the linearity of expectation,
\[ \mathbf{E}[X]=\sum_{S\in{V\choose 4}}\mathbf{E}[X_S]={n\choose 4}p^6. \]
Applying Markov's inequality,
- $\Pr[X\ge 1]\le \mathbf{E}[X]=O(n^4p^6)=o(1)$, if $p=o(n^{-2/3})$.
The first claim is proved.
To prove the second claim, it is equivalent to show that $\Pr[X=0]=o(1)$ if $p=\omega(n^{-2/3})$. By Chebyshev's inequality,
\[ \Pr[X=0]\le\Pr\bigl[|X-\mathbf{E}[X]|\ge\mathbf{E}[X]\bigr]\le\frac{\mathbf{Var}(X)}{(\mathbf{E}[X])^2}, \]
where the variance is computed as
\[ \mathbf{Var}(X)=\mathbf{Var}\Bigl(\sum_{S\in{V\choose 4}}X_S\Bigr)=\sum_{S\in{V\choose 4}}\mathbf{Var}(X_S)+\sum_{S\neq T}\mathbf{Cov}(X_S,X_T). \]
For any $S\in{V\choose 4}$,
- $\mathbf{Var}(X_S)=\mathbf{E}[X_S^2]-\mathbf{E}[X_S]^2\le\mathbf{E}[X_S^2]=\mathbf{E}[X_S]=p^6$, since $X_S$ is Boolean. Thus the first term of the above formula is $\sum_{S\in{V\choose 4}}\mathbf{Var}(X_S)=O(n^4p^6)$.
We now compute the covariances. For any $S,T\in{V\choose 4}$ with $S\neq T$:
- Case 1: $|S\cap T|\le 1$, so $S$ and $T$ do not share any edges. $X_S$ and $X_T$ are independent, thus $\mathbf{Cov}(X_S,X_T)=0$.
- Case 2: $|S\cap T|=2$, so $S$ and $T$ share an edge. Since $|S\cup T|=6$, there are $O(n^6)$ pairs of such $S$ and $T$.
\[ \mathbf{Cov}(X_S,X_T)=\mathbf{E}[X_SX_T]-\mathbf{E}[X_S]\mathbf{E}[X_T]\le\mathbf{E}[X_SX_T]=p^{11}, \]
since there are 11 edges in the union of two 4-cliques that share a common edge. The contribution of these pairs is $O(n^6p^{11})$.
- Case 3: $|S\cap T|=3$, so $S$ and $T$ share a triangle. Since $|S\cup T|=5$, there are $O(n^5)$ pairs of such $S$ and $T$. By the same argument,
\[ \mathbf{Cov}(X_S,X_T)\le\mathbf{E}[X_SX_T]=p^{9}, \]
since there are 9 edges in the union of two 4-cliques that share a triangle. The contribution of these pairs is $O(n^5p^{9})$.
Putting all these together,
\[ \mathbf{Var}(X)=O(n^4p^6)+O(n^6p^{11})+O(n^5p^{9}), \]
and
\[ \Pr[X=0]\le\frac{\mathbf{Var}(X)}{(\mathbf{E}[X])^2}=O(n^{-4}p^{-6})+O(n^{-2}p^{-1})+O(n^{-3}p^{-3}), \]
which is $o(1)$ if $p=\omega(n^{-2/3})$. The second claim is also proved.
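The lemma can also be observed empirically for small $n$: detect a 4-clique by brute force and compare edge probabilities well below and well above $n^{-2/3}$. This is an illustration only (the helper names are ours), not part of the proof:

```python
import itertools
import random

def has_4_clique(n, edges):
    """Brute force: does the graph on vertices 0..n-1 with edge set
    `edges` (pairs (u, v), u < v) contain a clique on 4 vertices?"""
    es = set(edges)
    return any(all(e in es for e in itertools.combinations(s, 2))
               for s in itertools.combinations(range(n), 4))

def sample_edges(n, p, rng):
    """Sample the edge set of G(n, p)."""
    return {e for e in itertools.combinations(range(n), 2)
            if rng.random() < p}

rng = random.Random(1)
n = 30
threshold = n ** (-2 / 3)
# Frequency of containing a 4-clique over 20 samples each, at
# p far below and far above the threshold.
below = sum(has_4_clique(n, sample_edges(n, 0.2 * threshold, rng))
            for _ in range(20))
above = sum(has_4_clique(n, sample_edges(n, 5 * threshold, rng))
            for _ in range(20))
# Typically below == 0 and above == 20 here.
```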
Threshold for balanced subgraphs
The above theorem can be generalized to arbitrary "balanced" subgraphs.
- The density of a graph $G(V,E)$, denoted $\rho(G)$, is defined as $\rho(G)=\frac{|E|}{|V|}$.
- A graph $G(V,E)$ is balanced if $\rho(H)\le\rho(G)$ for all subgraphs $H$ of $G$.
Cliques are balanced, because $\frac{{i\choose 2}}{i}\le\frac{{k\choose 2}}{k}$ for any $i\le k$, so no subgraph of a clique $K_k$ has density exceeding $\rho(K_k)$. The threshold for 4-clique is a direct corollary of the following general theorem.
Theorem (Erdős–Rényi 1960)
- Let $H$ be a balanced graph with $k$ vertices and $\ell$ edges. The threshold for the property that a random graph $G(n,p)$ contains a (not necessarily induced) subgraph isomorphic to $H$ is $p(n)=n^{-k/\ell}$.
Sketch of proof.
For any $S\in{V\choose k}$, let $X_S$ indicate whether $G(S)$ (the subgraph of $G(n,p)$ induced by $S$) contains a subgraph isomorphic to $H$. Then
- $p^{\ell}\le\mathbf{E}[X_S]\le k!\,p^{\ell}$, since there are at most $k!$ ways to match the substructure.
Note that $\mathbf{E}[X_S]=\Theta(p^{\ell})$ does not depend on $S$. Let $X=\sum_{S\in{V\choose k}}X_S$ be the number of $H$-subgraphs. Thus, $\mathbf{E}[X]={n\choose k}\mathbf{E}[X_S]=\Theta(n^k p^{\ell})$.
By Markov's inequality, $\Pr[X\ge 1]\le\mathbf{E}[X]=\Theta(n^kp^{\ell})$, which is $o(1)$ when $p=o(n^{-k/\ell})$.
By Chebyshev's inequality, $\Pr[X=0]\le\frac{\mathbf{Var}(X)}{(\mathbf{E}[X])^2}$, where
\[ \mathbf{Var}(X)=\sum_{S\in{V\choose k}}\mathbf{Var}(X_S)+\sum_{S\neq T}\mathbf{Cov}(X_S,X_T). \]
The first term $\sum_{S\in{V\choose k}}\mathbf{Var}(X_S)\le\sum_{S\in{V\choose k}}\mathbf{E}[X_S^2]=\sum_{S\in{V\choose k}}\mathbf{E}[X_S]=O(n^kp^{\ell})$.
For the covariances, $\mathbf{Cov}(X_S,X_T)\neq 0$ only if $|S\cap T|=i$ for some $2\le i\le k-1$. Note that $|S\cap T|=i$ implies that $|S\cup T|=2k-i$. And for balanced $H$, any $i$ common vertices span at most $\frac{i\ell}{k}$ edges of a copy of $H$, so the union of the two copies of $H$ contains at least $2\ell-\frac{i\ell}{k}$ edges. Thus, $\mathbf{E}[X_SX_T]=O\bigl(p^{2\ell-\frac{i\ell}{k}}\bigr)$. And,
\[ \sum_{S\neq T}\mathbf{Cov}(X_S,X_T)=\sum_{i=2}^{k-1}O\Bigl(n^{2k-i}\,p^{2\ell-\frac{i\ell}{k}}\Bigr). \]
Therefore, when $p=\omega(n^{-k/\ell})$,
\[ \Pr[X=0]\le\frac{\mathbf{Var}(X)}{(\mathbf{E}[X])^2}\le O\bigl(n^{-k}p^{-\ell}\bigr)+\sum_{i=2}^{k-1}O\Bigl(n^{-i}\,p^{-\frac{i\ell}{k}}\Bigr)=o(1). \]
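The notions of density and balancedness used above are easy to verify mechanically for small graphs. The sketch below (helper names are ours) checks every induced subgraph, which suffices because deleting edges can only lower the density:

```python
import itertools
from fractions import Fraction

def density(num_vertices, edges):
    """rho(G) = |E| / |V|, as an exact fraction."""
    return Fraction(len(edges), num_vertices)

def is_balanced(num_vertices, edges):
    """G is balanced iff rho(H) <= rho(G) for all subgraphs H.
    Checking induced subgraphs suffices, since removing edges
    from H can only decrease its density."""
    rho = density(num_vertices, edges)
    for k in range(1, num_vertices + 1):
        for s in itertools.combinations(range(num_vertices), k):
            sub = [e for e in edges if e[0] in s and e[1] in s]
            if Fraction(len(sub), k) > rho:
                return False
    return True

k4 = list(itertools.combinations(range(4), 2))
assert density(4, k4) == Fraction(3, 2)      # rho(K_4) = 6/4
assert is_balanced(4, k4)                    # cliques are balanced
assert not is_balanced(5, k4 + [(0, 4)])     # K_4 plus a pendant edge is not
```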
Pairwise Independent Variables
We now consider constructing pairwise independent random variables ranging over $[p]=\{0,1,2,\ldots,p-1\}$ for some prime $p$. We assume $a$ and $b$ to be two random variables which are uniformly and independently distributed over $[p]$.
Let $X_0,X_1,\ldots,X_{p-1}$ be defined as:
\[ X_i=(a\cdot i+b)\bmod p,\qquad \text{for }i\in[p]. \]
- The random variables $X_0,X_1,\ldots,X_{p-1}$ are pairwise independent uniform random variables over $[p]$.
Proof. We first show that the $X_i$ are uniform. That is, we will show that for any $i,c\in[p]$,
\[ \Pr[X_i=c]=\frac{1}{p}. \]
Due to the law of total probability,
\[ \Pr[X_i=c]=\sum_{j\in[p]}\Pr[a=j]\cdot\Pr\bigl[(j\cdot i+b)\bmod p=c\bigr]=\frac{1}{p}\sum_{j\in[p]}\Pr\bigl[b\equiv (c-i\cdot j)\pmod p\bigr]. \]
For any $i,j,c\in[p]$, there is exactly one value of $b$ in $[p]$ satisfying $b\equiv(c-i\cdot j)\pmod p$. Thus, $\Pr[b\equiv(c-i\cdot j)\pmod p]=\frac{1}{p}$, and the above probability is $\frac{1}{p}$.
We then show that the $X_i$ are pairwise independent, i.e. we will show that for any $X_i,X_j$ with $i\neq j$ and any $c_1,c_2\in[p]$,
\[ \Pr[X_i=c_1\wedge X_j=c_2]=\frac{1}{p^2}. \]
The event $X_i=c_1\wedge X_j=c_2$ is equivalent to that $(a,b)$ satisfies the linear congruential system
\[ \begin{cases} a\cdot i+b\equiv c_1\pmod p,\\ a\cdot j+b\equiv c_2\pmod p. \end{cases} \]
Since $p$ is prime and $i\neq j$, the difference $i-j$ is invertible modulo $p$, so this system has a unique solution for $(a,b)$ in $[p]\times[p]$: subtracting the two congruences gives $a\equiv(c_1-c_2)(i-j)^{-1}\pmod p$, which then determines $b$. Thus the probability of the event is $\frac{1}{p^2}$.
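The proof can be confirmed by exhaustive enumeration for a small prime: for every pair $i\neq j$ and every target pair $(c_1,c_2)$, exactly one of the $p^2$ choices of $(a,b)$ realizes it. A quick self-check ($p=5$ chosen arbitrarily):

```python
from itertools import product

p = 5  # a small prime

def X(i, a, b):
    """X_i = (a * i + b) mod p."""
    return (a * i + b) % p

for i in range(p):
    for j in range(i + 1, p):
        for c1, c2 in product(range(p), repeat=2):
            hits = sum(1 for a, b in product(range(p), repeat=2)
                       if X(i, a, b) == c1 and X(j, a, b) == c2)
            assert hits == 1  # exactly one (a, b): probability 1 / p^2
```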
Consider a Monte Carlo randomized algorithm $\mathcal{A}$ with one-sided error for a decision problem $f$. We formulate the algorithm as a deterministic algorithm $\mathcal{A}(x,r)$ that takes as input $x$ and a uniform random number $r\in[p]$ where $p$ is a prime, such that for any input $x$:
- If $f(x)=1$, then $\Pr[\mathcal{A}(x,r)=1]\ge\frac{1}{2}$, where the probability is taken over the random choice of $r$.
- If $f(x)=0$, then $\mathcal{A}(x,r)=0$ for any $r$.
We call $r$ the random source for the algorithm.
For an $x$ with $f(x)=1$, we call an $r$ that makes $\mathcal{A}(x,r)=1$ a witness for $x$. For a positive $x$, at least half of $[p]$ are witnesses. The random source $r$ has a polynomial number of bits, which means that $p$ is exponentially large, so it is infeasible to find a witness for an input $x$ by exhaustive search. Deterministic algorithms overcome this by having sophisticated deterministic rules for efficiently searching for a witness. Randomization, on the other hand, reduces this to a bit of luck: randomly choose an $r$ and win with probability at least $1/2$.
We can boost the accuracy (equivalently, reduce the error) of any Monte Carlo randomized algorithm with one-sided error by running the algorithm multiple times.
Suppose that we sample $t$ values $r_1,r_2,\ldots,r_t$ uniformly and independently from $[p]$, and run the following scheme:
- $B(x)$: return $\bigvee_{i=1}^{t}\mathcal{A}(x,r_i)$;
That is, $B(x)$ returns 1 if some instance $\mathcal{A}(x,r_i)=1$. For any $x$ with $f(x)=1$, due to the independence of $r_1,r_2,\ldots,r_t$, the probability that $B(x)$ returns an incorrect result is at most $2^{-t}$. On the other hand, $B(x)$ never makes mistakes for an $x$ with $f(x)=0$ since $\mathcal{A}$ has no false positives. Thus, the error of the Monte Carlo algorithm is reduced to $2^{-t}$.
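The scheme $B$ can be written as a generic driver that takes the algorithm as a function; `amplify_independent` is our own name for this sketch:

```python
import random

def amplify_independent(A, x, p, t, rng=random):
    """Run t fully independent trials of a one-sided-error Monte
    Carlo algorithm A(x, r) with random source r in [p]; accept
    iff some trial accepts. Uses about t * log2(p) random bits,
    and the one-sided error drops to 2^(-t)."""
    return int(any(A(x, rng.randrange(p)) for _ in range(t)))
```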
Sampling $t$ mutually independent random numbers from $[p]$ can be quite expensive since it requires $\Omega(t\log p)$ random bits. Suppose that we can only afford $O(\log p)$ random bits. In particular, we sample two independent uniform random numbers $a$ and $b$ from $[p]$. If we use $a$ and $b$ directly by running two independent instances $\mathcal{A}(x,a)$ and $\mathcal{A}(x,b)$, we only get an error upper bound of $1/4$.
The following scheme reduces the error significantly with the same number of random bits:
- Choose two independent uniform random numbers $a$ and $b$ from $[p]$.
- Construct $t\le p$ random numbers by
\[ r_i=(a\cdot i+b)\bmod p,\qquad 1\le i\le t. \]
Due to the discussion in the last section, we know that for $t\le p$, $r_1,r_2,\ldots,r_t$ are pairwise independent and uniform over $[p]$. Let $Y_i=\mathcal{A}(x,r_i)$ and $Y=\sum_{i=1}^{t}Y_i$. Due to the uniformity of $r_i$ and our definition of $\mathcal{A}$, for any $x$ with $f(x)=1$, it holds that
\[ \Pr[Y_i=1]=\Pr[\mathcal{A}(x,r_i)=1]\ge\frac{1}{2}. \]
By the linearity of expectations,
\[ \mathbf{E}[Y]=\sum_{i=1}^{t}\mathbf{E}[Y_i]=\sum_{i=1}^{t}\Pr[Y_i=1]\ge\frac{t}{2}. \]
Since each $Y_i$ is a Bernoulli trial with a probability of success at least $\frac{1}{2}$, we can estimate the variance of each $Y_i$ as follows:
\[ \mathbf{Var}(Y_i)=\Pr[Y_i=1]\bigl(1-\Pr[Y_i=1]\bigr)\le\frac{1}{4}. \]
Since $r_1,r_2,\ldots,r_t$ are pairwise independent, so are $Y_1,Y_2,\ldots,Y_t$; hence
\[ \mathbf{Var}(Y)=\sum_{i=1}^{t}\mathbf{Var}(Y_i)\le\frac{t}{4}. \]
Applying Chebyshev's inequality, we have that for any $x$ with $f(x)=1$,
\[ \Pr\Bigl[\bigvee_{i=1}^{t}\mathcal{A}(x,r_i)=0\Bigr]=\Pr[Y=0]\le\Pr\Bigl[|Y-\mathbf{E}[Y]|\ge\frac{t}{2}\Bigr]\le\frac{4\,\mathbf{Var}(Y)}{t^2}\le\frac{1}{t}. \]
The error is reduced to $\frac{1}{t}$ with only two random numbers. This scheme works as long as $t\le p$.
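Putting the pieces together, the two-point sampling scheme needs only the two random draws $a$ and $b$; everything else is deterministic. A sketch with our own function name:

```python
import random

def amplify_two_point(A, x, p, t, rng=random):
    """One-sided-error amplification from two random numbers: draw
    a, b uniformly from [p], derive the t <= p pairwise independent
    sources r_i = (a * i + b) mod p, and accept iff some run of A
    accepts. By Chebyshev's inequality, for a positive input the
    probability that all t runs miss a witness is at most 1 / t."""
    assert t <= p
    a = rng.randrange(p)
    b = rng.randrange(p)
    return int(any(A(x, (a * i + b) % p) for i in range(1, t + 1)))
```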