随机算法 (Fall 2011)/Randomized Algorithms: an Introduction

From EtoneWiki
Jump to: navigation, search

Randomized Quicksort

Given as input a permutation of numbers, we want to reorder the numbers in in increasing order. This is the problem of sorting, one of the most classic problem in Computer Science. The famous Quicksort algorithm has very good performance in practice.

Quicksort

if the length of is greater than 1 do:

  • pick an and choose as the pivot;
  • partition into three parts: , , and , where all numbers in are smaller than and all numbers in are larger than ;
  • recursively sort and ;

The time complexity of this sorting algorithm is measured by the number of comparisons.

For the deterministic quicksort algorithm, the pivot is picked from a fixed position (e.g. the first element ). The worst-case time complexity in terms of number of comparisons is , though the average-case complexity is , matching the lower bound for comparison-based sorting.

We consider the following randomized version of the quicksort.

Randomized Quicksort

if the length of is greater than 1 do:

  • pick an uniformly at random and choose as the pivot;
  • partition into three parts: , , and , where all numbers in are smaller than and all numbers in are larger than ;
  • recursively sort and ;

We assume that the complexity is measured in terms of number of comparisons, and the deterministic quicksort chooses the first element in the permutation as the pivot.

Average-case vs. Randomization

We show that the expected running time of randomized quicksort on any input is equal to the average-case complexity of deterministic quicksort. Formally, we have the following theorem.

Theorem
For any permutation of numbers, let be the complexity of the deterministic quicksort algorithm on input , and let be the random variable representing the complexity of the randomized quicksort on input .
Let be a uniformly random permutation of numbers.
where the first expectation is taken over the randomness of the algorithm, and the second expectation is taken over the random permutation .

We know that the average-case complexity of deterministic quicksort is , i.e. . Then the above theorem implies that the expected running time of the randomized quicksort is on any input .

For the deterministic quicksort, there exists some bad input such that the running time of the algorithm is as bad as , each time we run the algorithm on that input. However, for the randomized quicksort, for any input , the running time of the algorithm is random and exhibits a well-behaved distribution.


We now prove the theorem.

Each running instance of the quicksort can be represented by a binary recursion tree. Each node corresponds to a recursive call of the quciksort algorithm, and the node is labeled with the size of the current input permutation . The running time of the algorithm is the sum of the labels of the nodes.

Quicksort-tree.png

We can show that the distribution of the labels of the tree defined by the deterministic quicksort on a random permutation is equivalent to the distribution of the labels of the tree defined by the randomized quicksort on any fixed permutation . The argument uses an important principle for the probabilistic analysis of algorithms: the principle of deferred decisions.

Principle of deferred decisions

For the average-case analysis, the input is a uniformly random permutation which is generated before the running of the algorithm. We can defer the decision of the random choices to the time when they are actually used by the algorithm.

For the deterministic quicksort, each time we choose the first element in the current permutation as the pivot. For a uniformly random inout permutation . Once the random choice of is determined, all the pivots and the labels of the recursion tree are accordingly fixed. We defer the random choice of the uniform input permutation to the time when the pivots are determined. At each step, the pivot is uniformly sampled from the current permutation. This deferring of random choices does not change the distribution of the pivots since the element in a fixed position of a random permutation is equally distributed as the element in a random position of a fixed permutation.

Observe that the resulting process (each time the pivot is uniformly sampled from the current permutation) has exactly the same way of sampling pivots as the randomized quicksort. Thus, the expected complexity of randomized quicksort on any fixed input is equal to the expected complexity of deterministic quicksort on a random input. The theorem is proved.

Primality Test

A primality test is an algorithm that given as input a number determines whether is prime.

Fermat Test

Recall the Fermat's little theorem.

Fermat's little theorem
If is prime, then for every .

There are several proofs for this famous theorem. We will not prove the theorem but will only use it here.

If we can find an such that , it will prove that is composite. This inspires the following "primality testing" algorithm.

Fermat test
  1. Choose a uniformly random .
  2. If , then return "composite".
  3. Else return "probably prime".

Complexity of Fermat test

The time complexity of this algorithm depends on the computational cost , whose straightforward computing takes multiplications, which is too expensive. We describe an efficient way of computing the modular exponent where .

We first make the following observations regarding the modular exponentiations:

  • If the values of and are both known, then can be computed by multiplying (modulo ) them.
  • can be computed by letting and for , which takes only modular multiplications.

Let . A number can be represented in its binary form: , where each , so that .

Combining the above two observations, all can be computed in many multiplications, and can be computed by multiplying (modulo ) them together.

The time complexity of Fermat test thus can be made in polynomial of .

Accuracy of Fermat test

If the output is "composite", then for some . By the Fermat's little theorem, must be composite. Therefore, for any prime , the output is always "probably prime".

For composite , it is possible that the algorithm picks an such that and outputs "probably prime". But if the fraction of such bad in is small enough, then the testing algorithm may still correctly output "composite" with a good chance. However, there exist (though very rare) such composites, called Carmichael numbers, that may fool the Fermat test.

Definition (Carmichael number)
A composite number is a Carmichael number if for all .

Here is the multiplicative group modulo , defined as .

For non-Carmichael composites, the Fermat test may detect the compositeness with a fairly good chance. Let . Note that is closed under multiplication (modulo ), thus is a subgroup of . Therefore, is divisible by .

If is neither prime nor Carmichael, then is nonempty, i.e. is a proper subgroup of , thus is at least 2 and there are at least half satisfying .

In conclusion,

  • if is prime, then the Fermat test returns "probably prime" with probability 1;
  • if is non-Carmichael composite, then the Fermat test returns "composite" with probability at least ;
  • if is a Carmichael number, the Fermat test breaks down.

As long as the input is not a Carmichael number, we can repeat the Fermat test independently for times and reduce the error probability to .

The Carmichael numbers are very rare. Let be the "Carmichael density" that

.

In 1956, Erdős proved that

.

If one only needs to generates a prime number instead of testing the primality of a given number, then we can generates a random number, and apply the Fermat test. Due to the prime number theorem, the number of prime numbers less than or equal to is . This scheme will generates a prime number in a reasonable number of independent trials with a good chance.

Miller-Rabin Test

The Fermat test is based on the following way to prove that a number is composite:

  • there exists a number such that .

The Miller-Rabin primality test is based on an additional way to prove that a number is composite:

  • 1 has a nontrivial square root, that is, a number satisfying that but .

The following theorem states that the existence of nontrivial square root of 1 is a valid proof of compositeness of .

Theorem
If is prime, then does not have a nontrivial square root.
Proof.

Suppose is a square root of 1, that is, . Therefore,

,

which means that .

If , then divides neither nor , which contradicts that is prime and divides .

The idea of Miller-Rabin test is to find either a Fermat proof of compositeness, or a nontrivial square root of 1.

Miller-Rabin Primality Test
  1. Choose a uniformly random .
  2. Let and be such that , is odd, and .
  3. Let . For to , let .
  4. If , then return "composite".
  5. If there is an , , such that but , then return "composite".
  6. Else return "probably prime".

An easy inductive proof shows that for all , . In particular, .

The original algorithm due to Miller is deterministic, which test all small up to an order. The correctness of this deterministic algorithm relies on the unproven conjecture of Generalized Riemann hypothesis. It was observed by Rabin that the deterministic searching can be replaced by random sampling.

Line 4 of the algorithm is equivalent to that , thus line 4 is just the Fermat test. If passes the Fermat test, line 5 tries to find a nontrivial square root of 1 in form of .

If is prime, then due to the Fermat little theorem and the fact that prime numbers do not have nontrivial square roots of 1, the conditions in line 4 and line 5 will never hold, thus the algorithm will return "probably prime". If is a non-Carmichael composite, as in the Fermat test, line 4 returns "composite" with probability at least . The only remaining case is when is a Carmichael number.

We pick the largest such that there is a satisfying , and define

.
Theorem
If is a Carmichael number, then the defined as above is a proper subgroup of .

Since is fixed, it is easy to verify that is closed under multiplication, thus is a subgroup of . It is a bit complicated to show that is nonempty and we will not give the full proof here.

The accuracy of Miller-Rabin test on Carmichael numbers is implied by this theorem. Suppose is a Carmichael number. We call an a liar if it fools the test in line 5, i.e. there is no such that but .

We claim that all liars belong to . Due to the maximality of , for all . Since is a Carmichael number, , if is a liar then it mus hold that for all or otherwise cannot be a liar. In particular, . Again, since is a liar, , therefore .

We show that when is a Carmichael number, all numbers that fools the Miller-Rabin test belongs to a proper subgroup of , therefore the Miller-Rabin test returns a "composite" with probability .

In conclusion,

  • if is prime, the algorithm returns "probably prime";
  • if is a non-Carmichael composite, the algorithm returns "composite" in line 4 with probability at least ;
  • if is a Carmichael number, the algorithm returns "composite" in line 5 with probability at least .

Graph Coloring

A coloring of a graph is a mapping for some integer , satisfying that for all .

The problem of deciding whether an input graph is colorable by some fixed number of colors is a classic problem in Computer Science. Denote by the maximum degree of .

  • If , there always exists a coloring. Moreover, the coloring can be found by a simple greedy algorithm.
  • If , has a coloring unless it contains a -clique or it is an odd cycle. (Brooks Theorem)
  • If , the problem of deciding whether is -colorable is NP-hard.

Sampling a graph coloring

We consider the problem of sampling a uniformly random coloring of a given graph.

Sampling a random coloring is at least as hard as deciding its existence, so we don't expect to solve the sampling problem when . The decision problem for the case is also nontrivial. Thus people are interested only in the case when .

Unlike sampling a number from uniformly at random, which can be done by flipping a fair coin for times, sampling a coloring cannot be easily done, because of the constraint that any two adjacent vertices cannot have the same color. Such constraints (also called "structures") tremendously increase the difficulty of random sampling a combinatorial object as well as counting the number of combinatorial objects.

The following is a simple randomized algorithm for sampling colorings.

Sampling Graph Coloring
Start with an arbitrary coloring of . At each step:
  1. Pick a vertex and a color uniformly at random.
  2. Change the color of to if it results a valid coloring; do nothing if otherwise.
Repeat this process for sufficiently many steps.

This very simple algorithm uses an important idea called random walks. There is a huge class of randomized algorithms for random sampling based on this principle.

The algorithm start with a fixed coloring, and "walks" to more and more random colorings as it runs. In finite steps, it will get "close" enough to a uniform random coloring. There are two issues:

  • How do we measure the randomness of the coloring, or how doe we measure how close the current random coloring to the uniform random coloring?
  • How many steps it takes to get close enough to a random coloring?

We will introduce the formal concepts addressing these issues in future lectures. We will introduce a beautiful theory of random walks, which is fundamental to the analysis of randomized algorithms.

The followings are the two most important conjectures regarding the graph coloring problem.

Conjecture
  1. The above simple algorithm returns a nearly uniform random coloring in steps whenever .
  2. Random sampling a nearly uniform graph colorings can be done in polynomial time whenever .

These two conjectures are still open. We will not solve them in this class, but we will approach them by some important techniques for analyzing random walks.

Counting by sampling

Given a graph of vertices, we want to compute the number of different colorings of graph . This is a well-defined computational problem.

Enumerating all colorings will take exponential time in the worst case. Since we only need a number instead of a list of all colorings, we expect cleverer ways of computing this number than the brute force enumeration.

The problem of counting graph colorings has the following formulation. Let be a matrix defined as

, where .

The number of colorings for a given graph is described by the following function:

.

The sum enumerates all possible colorings, valid or invalid, and the product determines whether the present coloring is valid.

We can replace with any matrix and talk about the computation of . This gives us the Potts model in statistical physics. When , it becomes the famous Ising model.

The function is called the partition function in statistical physics. Virtually all interesting information about a statistical physics system can be deduced from its partition function value.

The partition function can also be used to model many natural counting problems. For example, when and

The partition function is the total number of independent sets of graph .

There is a complexity class for counting problems, named #P. The NP class is for decision problems. The #P class is the analog of the NP class for counting problems. Many counting problems are known to be #P-hard, including counting graph colorings or independent sets.

It is believed that #P-hard problems do not have polynomial time algorithms. Actually it is believed that #P-hard problems are harder than NP-hard problems, partly because the existence of polynomial time algorithm for any #P-hard problem implies that P=NP, but not vice versa.

Later in this class, we will show that although the exact counting is computationally hard, for #P-hard problems such as counting graph colorings, there exist randomized algorithms which can approximately compute the problems. These algorithms are based on random samplings obtained from simulating a random walk. This powerful technique is called the Monte Carlo Markov Chain (MCMC) method.

Identity Checking

Consider the following scenario: Two datacenters located in two far apart cities (Beijing and Nanjing), each holds a copy of a very large database. We want to check whether the two copies are identical with minimum communication between two datacenters. The cost of local computation is ignored because it is incomparable to the cost caused by remote communication.

Communication Complexity

In 1979, Andrew Yao introduced the communication complexity to model the problems of this kind. The communication complexity is much more general, which does not just consider the problem of checking identity but any problems that can be computed by communication.

Alice and Bob are two entities. Alice has a private input and Bob has a private input . Together they want to compute a function by communicating with each other. The communication follows a predefined communication protocol (the "algorithm" in this model) which depends only on the problem but not on the inputs. The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.

The problem of checking identity is formally defined by the function EQ as follows: and for any ,

A trivial way to solve EQ is to let Bob send to Alice. Supposed that , this costs bits of communications.

It is known that for deterministic communication protocols, this is the best we can get for computing EQ.

Theorem (Yao 1979)
Any deterministic communication protocol computing EQ on two -bit strings costs bits of communication in the worst-case.

This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem in Yao's 1979 paper initiates the field of communication complexity.

Randomized communication protocols

If the randomness is allowed, we can solve this problem up to a tolerable probabilistic error with significantly less communications. We treat the inputs as two binary numbers . The randomized communication protocol is as follows:

A randomized protocol for EQ

Alice does:

for some parameter (to be specified),
  • choose uniformly at random a prime ;
  • send and to Bob;

Upon receiving and , Bob does:

  • If return "probably identical"; else return "distinct".

The number of bits to be communicated is .

This technique is called fingerprinting. The random prime induces a random fingerprinting function , so that the and are like the "fingerprints" of and . Like the fingerprints of people, the same input must have the same fingerprints and the chance that two distinct inputs have the same fingerprint is small.

If , then it is trivial to see that the output is "probably identical". For the case that , it is probable that Alice pick a wrong prime so that and the protocol incorrectly outputs "probably identical", however, it can be proved that this probability of error is polynomially small for some polynomially large .

Therefore, this randomized protocol solves the problem up to a one-sided error with communication cost .


In the above algorithm, the random coin flips made by Alice are private to herself but not available to Bob.

We may assume that both Alice and Bob observe the same random coin flips with no additional costs. This model is called public coins (so the previous algorithm is actually communication with private coins). With this stronger assumption, we can have new randomized protocol.

We treat the inputs as vectors of Boolean entries. We define the inner product for Boolean vectors as , which has an equivalent definition , where is the Boolean operator XOR.

A randomized (public coin) protocol for EQ
Suppose that a uniformly random Boolean vector is known to both Alice and Bob.

Alice does:

  • send to Bob;

Upon receiving , Bob does:

  • If return "probably identical"; else return "distinct".

Same as before, if , the output is always correct. If , it is easy to check that the protocol gives a wrong anser with probability at most . By repeatedly running the protocol for a number of times (with independent public coins ), the error probability can be reduced significantly.