Randomized Algorithms (Spring 2013)/Introduction and Probability Space

= Randomized Quicksort =
Given as input a permutation <math>\pi</math> of <math>n</math> numbers, we want to reorder the numbers in <math>\pi</math> in increasing order. This is the problem of sorting, one of the most classic problems in Computer Science. The famous [http://en.wikipedia.org/wiki/Quicksort Quicksort] algorithm has very good performance in practice.
{{Theorem|Quicksort|
if the length of <math>\pi</math> is greater than 1 do:
* pick an <math>i\in[n]</math> and choose <math>\pi_i</math> as the ''pivot'';
* partition <math>\pi</math> into three parts: <math>\pi'</math>, <math>\pi_i</math>, and <math>\pi''</math>, where all numbers in <math>\pi'</math> are smaller than <math>\pi_i</math> and all numbers in <math>\pi''</math> are larger than <math>\pi_i</math>;
* recursively sort <math>\pi'</math> and <math>\pi''</math>;
}}
The time complexity of this sorting algorithm is measured by the '''number of comparisons'''.

For the '''deterministic''' quicksort algorithm, the pivot is picked from a fixed position <math>i</math> (e.g. the first element <math>\pi_0</math>). The worst-case time complexity in terms of number of comparisons is <math>\Theta(n^2)</math>, though the average-case complexity is <math>O(n\log n)</math>, matching the lower bound for comparison-based sorting.

We consider the following randomized version of the quicksort.
{{Theorem|Randomized Quicksort|
if the length of <math>\pi</math> is greater than 1 do:
* pick an <math>i\in[n]</math> ''uniformly at random'' and choose <math>\pi_i</math> as the pivot;
* partition <math>\pi</math> into three parts: <math>\pi'</math>, <math>\pi_i</math>, and <math>\pi''</math>, where all numbers in <math>\pi'</math> are smaller than <math>\pi_i</math> and all numbers in <math>\pi''</math> are larger than <math>\pi_i</math>;
* recursively sort <math>\pi'</math> and <math>\pi''</math>;
}}
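To make the procedure concrete, the following is a minimal Python sketch of randomized quicksort that also counts the number of comparisons. It is an illustrative addition (the function name is hypothetical), and the input is assumed to be a list of distinct numbers.
<pre>
import random

def randomized_quicksort(pi):
    """Sort a list pi of distinct numbers; return (sorted list, #comparisons)."""
    if len(pi) <= 1:
        return list(pi), 0
    pivot = random.choice(pi)                  # pick the pivot uniformly at random
    smaller = [x for x in pi if x < pivot]     # partition by comparing with the pivot
    larger  = [x for x in pi if x > pivot]
    left,  c1 = randomized_quicksort(smaller)  # recursively sort both parts
    right, c2 = randomized_quicksort(larger)
    # len(pi) - 1 elements are compared against the pivot at this recursive call
    return left + [pivot] + right, (len(pi) - 1) + c1 + c2

print(randomized_quicksort([3, 1, 4, 1.5, 5, 9, 2, 6]))
</pre>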


== Average-case vs. Randomization ==
We assume that the complexity is measured in terms of the number of comparisons, and that the deterministic quicksort chooses the first element in the permutation as the pivot.

We show that the expected running time of randomized quicksort on any input is equal to the average-case complexity of deterministic quicksort. Formally, we have the following theorem.
{{Theorem|Theorem|
:For any permutation <math>\pi</math> of <math>n</math> numbers, let <math>t(\pi)</math> be the complexity of the deterministic quicksort algorithm on input <math>\pi</math>, and let <math>T(\pi)</math> be the random variable representing the complexity of the randomized quicksort on input <math>\pi</math>.
:Let <math>\Pi</math> be a uniformly random permutation of <math>n</math> numbers. Then
::<math>\mathrm{E}[T(\pi)]=\mathrm{E}[t(\Pi)],</math>
:where the first expectation is taken over the randomness of the algorithm, and the second expectation is taken over the random permutation <math>\Pi</math>.
}}

We know that the average-case complexity of deterministic quicksort is <math>O(n\log n)</math>, i.e. <math>\mathrm{E}[t(\Pi)]=O(n\log n)</math>. The above theorem then implies that the expected running time of the randomized quicksort is <math>\mathrm{E}[T(\pi)]=O(n\log n)</math> on any input <math>\pi</math>.

For the deterministic quicksort, there exists some bad input <math>\pi</math> such that the running time of the algorithm is as bad as <math>\Omega(n^2)</math> every time we run the algorithm on that input. For the randomized quicksort, however, on any input <math>\pi</math> the running time is random and exhibits a well-behaved distribution.

------
We now prove the theorem.

Each running instance of the quicksort can be represented by a binary recursion tree. Each node corresponds to a recursive call of the quicksort algorithm, and the node is labeled with the size of the current input permutation <math>\pi</math>. The running time of the algorithm is the sum of the labels of the nodes.
:[[image:Quicksort-tree.png|300px]]
We can show that the distribution of the labels of the tree defined by the deterministic quicksort on a random permutation <math>\Pi</math> is identical to the distribution of the labels of the tree defined by the randomized quicksort on any fixed permutation <math>\pi</math>. The argument uses an important principle for the probabilistic analysis of algorithms: the principle of deferred decisions.

== Principle of deferred decisions ==
For the average-case analysis, the input is a uniformly random permutation which is generated ''before'' the running of the algorithm. We can ''defer'' the decision of the random choices to the time when they are actually used by the algorithm.

For the deterministic quicksort, each time we choose the first element of the current permutation as the pivot. Consider a uniformly random input permutation <math>\Pi</math>. Once the random choice of <math>\Pi</math> is determined, all the pivots and the labels of the recursion tree are accordingly fixed. We defer the random choice of the uniform input permutation <math>\Pi</math> to the time when the pivots are determined: at each step, the pivot is uniformly sampled from the current permutation. This deferring of random choices does not change the distribution of the pivots, since the element in a fixed position of a random permutation is distributed identically to the element in a random position of a fixed permutation.

Observe that the resulting process (each time the pivot is uniformly sampled from the current permutation) samples the pivots in exactly the same way as the randomized quicksort. Thus, the expected complexity of randomized quicksort on any fixed input is equal to the expected complexity of deterministic quicksort on a random input. The theorem is proved.
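As a quick empirical check of the theorem (an illustrative addition, not part of the original notes), one can compare the average number of comparisons made by deterministic quicksort on uniformly random inputs with that of randomized quicksort on one fixed input; the function names below are hypothetical.
<pre>
import random

def det_cost(pi):
    """Comparisons of deterministic quicksort (first element as pivot)."""
    if len(pi) <= 1:
        return 0
    p = pi[0]
    return (len(pi) - 1) + det_cost([x for x in pi[1:] if x < p]) \
                         + det_cost([x for x in pi[1:] if x > p])

def rand_cost(pi):
    """Comparisons of randomized quicksort (uniform random pivot)."""
    if len(pi) <= 1:
        return 0
    p = random.choice(pi)
    return (len(pi) - 1) + rand_cost([x for x in pi if x < p]) \
                         + rand_cost([x for x in pi if x > p])

n, trials = 64, 5000
fixed_input = list(range(n))   # a worst-case input for the deterministic version
avg_det  = sum(det_cost(random.sample(range(n), n)) for _ in range(trials)) / trials  # E[t(Pi)]
avg_rand = sum(rand_cost(fixed_input) for _ in range(trials)) / trials                # E[T(pi)]
print(avg_det, avg_rand)       # the two empirical averages should be close
</pre>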
 
= Checking Matrix Multiplication =
Consider the following problem:
* '''Input''': Three <math>n\times n</math> matrices <math>A,B</math> and <math>C</math>.
* '''Output''': return "yes" if <math>C=AB</math> and "no" otherwise.

A naive way of checking the equality is first computing <math>AB</math> and then comparing the result with <math>C</math>. The (asymptotically) fastest matrix multiplication algorithm known today runs in time <math>O(n^{2.376})</math>. The naive algorithm will take asymptotically the same amount of time.

== Freivalds Algorithm ==
The following is a very simple randomized algorithm, due to Freivalds, running in only <math>O(n^2)</math> time:
{{Theorem|Algorithm (Freivalds)|
*pick a vector <math>r \in\{0, 1\}^n</math> uniformly at random;
*if <math>A(Br) = Cr</math> then return "yes" else return "no";
}}
The product <math>A(Br)</math> is computed by first computing <math>Br</math> and then <math>A(Br)</math>. The running time is <math>O(n^2)</math> because the algorithm does 3 matrix-vector multiplications in total.

If <math>AB=C</math> then <math>A(Br) = Cr</math> for any <math>r \in\{0, 1\}^n</math>, thus the algorithm will return a "yes" for any positive instance (<math>AB=C</math>). But if <math>AB \neq C</math> then the algorithm will make a mistake if it chooses such an <math>r</math> that <math>ABr = Cr</math>. However, the following lemma states that the probability of this event is bounded.

{{Theorem|Lemma|
:If <math>AB\neq C</math> then for a uniformly random <math>r \in\{0, 1\}^n</math>,
::<math>\Pr[ABr = Cr]\le \frac{1}{2}</math>.
}}
{{Proof| Let <math>D=AB-C</math>. The event <math>ABr=Cr</math> is equivalent to <math>Dr=\boldsymbol{0}</math>. It is then sufficient to show that for a <math>D\neq \boldsymbol{0}</math>, it holds that <math>\Pr[Dr = \boldsymbol{0}]\le \frac{1}{2}</math>.

Since <math>D\neq \boldsymbol{0}</math>, it must have at least one non-zero entry. Suppose that <math>D(i,j)\neq 0</math>.

We assume the event that <math>Dr=\boldsymbol{0}</math>. In particular, the <math>i</math>-th entry of <math>Dr</math> is
:<math>(Dr)_{i}=\sum_{k=1}^nD(i,k)r_k=0.</math>
The entry <math>r_j</math> can then be calculated as
:<math>r_j=-\frac{1}{D(i,j)}\sum_{k\neq j}D(i,k)r_k.</math>
Once all <math>r_k</math> with <math>k\neq j</math> are fixed, there is a unique solution of <math>r_j</math>. That is to say, conditioning on any <math>r_k</math>, <math>k\neq j</math>, there is at most '''one''' value of <math>r_j\in\{0,1\}</math> satisfying <math>Dr=\boldsymbol{0}</math>. On the other hand, observe that <math>r_j</math> is chosen from '''two''' values <math>\{0,1\}</math> uniformly and independently at random. Therefore, with probability at least <math>\frac{1}{2}</math>, the choice of <math>r</math> fails to give us a zero <math>Dr</math>. That is, <math>\Pr[ABr=Cr]=\Pr[Dr=\boldsymbol{0}]\le\frac{1}{2}</math>.
}}

When <math>AB=C</math>, Freivalds algorithm always returns "yes"; and when <math>AB\neq C</math>, Freivalds algorithm returns "no" with probability at least 1/2.

To improve its accuracy, we can run Freivalds algorithm <math>k</math> times, each time with an ''independent'' uniformly random <math>r\in\{0,1\}^n</math>, and return "yes" iff all running instances pass the test.
{{Theorem|Freivalds' Algorithm (multi-round)|
*pick <math>k</math> vectors <math>r_1,r_2,\ldots,r_k \in\{0, 1\}^n</math> uniformly and independently at random;
*if <math>A(Br_i) = Cr_i</math> for all <math>i=1,\ldots,k</math> then return "yes" else return "no";
}}
If <math>AB=C</math>, then the algorithm returns a "yes" with probability 1. If <math>AB\neq C</math>, then due to the independence, the probability that all <math>r_i</math> have <math>ABr_i=Cr_i</math> is at most <math>2^{-k}</math>, so the algorithm returns "no" with probability at least <math>1-2^{-k}</math>. Choosing <math>k=O(\log n)</math>, the algorithm runs in time <math>O(n^2\log n)</math> and has a one-sided error (false positive) bounded by <math>\frac{1}{\mathrm{poly}(n)}</math>.
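As a concrete illustration, here is a minimal sketch of the <math>k</math>-round test in Python, assuming integer matrices given as lists of rows; the names are illustrative.
<pre>
import random

def freivalds(A, B, C, k=1):
    """Return True if C = AB passes k random tests; False certifies C != AB."""
    n = len(A)
    def mat_vec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    for _ in range(k):
        r = [random.randint(0, 1) for _ in range(n)]   # uniform r in {0,1}^n
        # compute A(Br) with two O(n^2) matrix-vector products, and Cr with one
        if mat_vec(A, mat_vec(B, r)) != mat_vec(C, r):
            return False   # a witness r with ABr != Cr: definitely C != AB
    return True            # "probably" C = AB; false positive prob. at most 2^{-k}
</pre>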
=Primality Test=
A [http://en.wikipedia.org/wiki/Primality_test '''primality test'''] is an algorithm that, given as input a number <math>n</math>, determines whether <math>n</math> is prime.

== Fermat Test ==
Recall [http://en.wikipedia.org/wiki/Fermat's_little_theorem '''Fermat's little theorem'''].
{{Theorem|Fermat's little theorem|
:If <math>n>2</math> is prime, then <math>a^{n-1}\equiv 1\pmod n</math> for every <math>a\in\{1,2,\ldots,n-1\}</math>.
}}
There are several [http://en.wikipedia.org/wiki/Proofs_of_Fermat's_little_theorem proofs] for this famous theorem. We will not prove the theorem but will only use it here.

If we can find an <math>a\in\{1,2,\ldots,n-1\}</math> such that <math>a^{n-1}\not\equiv 1\pmod n</math>, it will prove that <math>n</math> is composite. This inspires the following "primality testing" algorithm.

{{Theorem|Fermat test|
# Choose a uniformly random <math>a\in\{1,2,\ldots,n-1\}</math>.
# If <math>a^{n-1}\not\equiv 1\pmod n</math>, then return "'''composite'''".
# Else return "'''probably prime'''".
}}
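A minimal Python sketch of the test follows (the function name is illustrative); the built-in <code>pow(a, e, n)</code> performs exactly the fast modular exponentiation discussed in the next subsection.
<pre>
import random

def fermat_test(n):
    """One round of the Fermat test for an integer n > 2."""
    a = random.randint(1, n - 1)      # uniform a in {1, 2, ..., n-1}
    if pow(a, n - 1, n) != 1:         # a^{n-1} mod n, by fast exponentiation
        return "composite"            # a certificate of compositeness found
    return "probably prime"
</pre>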
=== Complexity of Fermat test ===
The time complexity of this algorithm depends on the cost of computing <math>a^{n-1} \bmod n</math>: computing it naively takes <math>n-2</math> multiplications, which is too expensive. We describe an efficient way of computing the [http://en.wikipedia.org/wiki/Modular_exponentiation modular exponent] <math>a^x\bmod n\,</math> where <math>x\in[n]</math>.
 
We first make the following observations regarding the modular exponentiations:
* If the values of <math>a^x\bmod n</math> and <math>a^y\bmod n</math> are both known, then <math>a^{x+y}\bmod n</math> can be computed by multiplying them together (modulo <math>n</math>).
* <math>a^{2^i}</math> can be computed by letting <math>a_0=a</math> and <math>a_{j}=a_{j-1}^2 \pmod n</math> for <math>j=1,2,\ldots, i</math>, which takes only <math>i</math> modular multiplications.
 
Let <math>\ell=\lceil\log_2 n\rceil</math>. A number <math>x\in[n]</math> can be represented in its binary form: <math>x_\ell x_{\ell-1}\cdots x_1x_0</math>, where each <math>x_i\in\{0,1\}</math>, so that <math>x=\sum_{i=0}^{\ell}x_i\cdot2^i</math>.
 
Combining the above two observations, all <math>a^{x_i2^i}\bmod n</math> can be computed in <math>O(\log n)</math> many multiplications, and <math>a^x\bmod n</math> can be computed by multiplying (modulo <math>n</math>) them together.
 
The time complexity of the Fermat test can thus be made polynomial in <math>\log n</math>.
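The following illustrative sketch implements the repeated-squaring scheme just described (Python's three-argument <code>pow</code> already does this natively; <code>mod_exp</code> is a hypothetical name).
<pre>
def mod_exp(a, x, n):
    """Compute a^x mod n with O(log x) modular multiplications."""
    result = 1
    square = a % n                         # a^{2^0} mod n
    while x > 0:
        if x & 1:                          # the current binary digit x_i is 1
            result = (result * square) % n
        square = (square * square) % n     # advance to a^{2^{i+1}} mod n
        x >>= 1
    return result
</pre>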
 
=== Accuracy of Fermat test ===
If the output is "'''composite'''", then <math>a^{n-1}\not\equiv 1\pmod n</math> for some <math>a\in\{1,2,\ldots,n-1\}</math>. By Fermat's little theorem, <math>n</math> must be composite. Therefore, for any prime <math>n</math>, the output is always "'''probably prime'''".
 
For composite <math>n</math>, it is possible that the algorithm picks an <math>a</math> such that <math>a^{n-1}\equiv 1\pmod n</math> and outputs "'''probably prime'''".
But if the fraction of such bad <math>a</math> in <math>\{1,2,\ldots,n-1\}</math> is small enough, then the testing algorithm may still correctly output "'''composite'''" with a good chance. However, there exist (though very rare) such composites, called [http://en.wikipedia.org/wiki/Carmichael_number '''Carmichael numbers'''], that may fool the Fermat test.
 
{{Theorem|Definition (Carmichael number)|
:A composite number <math>n</math> is a Carmichael number if <math>a^{n-1}\equiv 1\pmod n</math> for all <math>a\in\mathbb{Z}_n^*</math>.
}}
Here <math>\mathbb{Z}_n^*</math> is the [http://en.wikipedia.org/wiki/Multiplicative_group_of_integers_modulo_n multiplicative group modulo <math>n</math>], defined as <math>\mathbb{Z}_n^*=\{a\mid 1\le a\le n-1\wedge \mathrm{gcd}(a,n)=1\}</math>.
 
For non-Carmichael composites, the Fermat test may detect the compositeness with a fairly good chance. Let <math>B=\{a\in\mathbb{Z}_n^*\mid a^{n-1}\equiv 1\pmod n\}</math>. Note that <math>B</math> is closed under multiplication (modulo <math>n</math>), thus <math>B</math> is a subgroup of <math>\mathbb{Z}_n^*</math>. Therefore, <math>|\mathbb{Z}_n^*|</math> is divisible by <math>|B|</math>.
 
If <math>n</math> is neither prime nor Carmichael, then <math>\mathbb{Z}_n^*\setminus B</math> is nonempty, i.e. <math>B</math> is a proper subgroup of <math>\mathbb{Z}_n^*</math>. Thus <math>|\mathbb{Z}_n^*|/|B|</math> is at least 2, and at least half of the <math>a\in\{1,2,\ldots,n-1\}</math> satisfy <math>a^{n-1}\not\equiv 1\pmod n</math>.
 
In conclusion,
* if <math>n</math> is prime, then the Fermat test returns "'''probably prime'''" with probability 1;
* if <math>n</math> is non-Carmichael composite, then the Fermat test returns "'''composite'''" with probability at least <math>1/2</math>;
* if <math>n</math> is a Carmichael number, the Fermat test breaks down.
 
As long as the input is not a Carmichael number, we can repeat the Fermat test independently for <math>k</math> times and reduce the error probability to <math>2^{-k}</math>.
 
The Carmichael numbers are very rare. Let <math>c(n)</math> be the "Carmichael density", defined as
:<math>c(n)=\frac{\text{number of Carmichael numbers }\le n}{n}</math>.
In 1956, [http://en.wikipedia.org/wiki/Paul_Erd%C5%91s Erdős] proved that
:<math>c(n)\le\exp\left(-\Omega\left(\frac{\log n\log\log\log n}{\log \log n}\right)\right)=n^{-\Omega\left(\frac{\log\log\log n}{\log\log n}\right)}</math>.
If one only needs to ''generate'' a prime number instead of ''testing'' the primality of a given number, we can generate a random number and apply the Fermat test. Due to the [http://en.wikipedia.org/wiki/Prime_number_theorem prime number theorem], the number of primes less than or equal to <math>n</math> is <math>\pi(n)\sim\frac{n}{\ln n}</math>. This scheme will generate a prime number within a reasonable number of independent trials with good probability.
 
== Miller-Rabin Test ==
The Fermat test is based on the following way to ''prove'' that a number <math>n</math> is composite:
* there exists a number <math>a</math> such that <math>a^{n-1}\not\equiv 1\pmod n</math>.
The Miller-Rabin primality test is based on an additional way to prove that a number <math>n</math> is composite:
* 1 has a '''nontrivial square root''', that is, a number <math>a</math> satisfying that <math>a^2\equiv 1\pmod n</math> but <math>a\not\equiv \pm 1\pmod n</math>.
 
The following theorem states that the existence of nontrivial square root of 1 is a valid proof of compositeness of <math>n</math>.
{{Theorem|Theorem|
:If <math>n>2</math> is prime, then <math>1</math> does not have a nontrivial square root.
}}
{{Proof|
Suppose <math>a</math> is a square root of 1, that is, <math>a^2\equiv1\pmod n</math>. Therefore,
:<math>(a-1)(a+1)=a^2-1\equiv 0\pmod n</math>,
which means that <math>n\mid(a-1)(a+1)</math>.
 
If <math>a\not\equiv \pm1\pmod n</math>, then <math>n</math> divides neither <math>(a-1)</math> nor <math>(a+1)</math>, which contradicts that <math>n</math> is prime and divides <math>(a-1)(a+1)</math>.
}}
 
The idea of Miller-Rabin test is to find either a Fermat proof of compositeness, or a nontrivial square root of 1.
{{Theorem|Miller-Rabin Primality Test|
# Choose a uniformly random <math>a\in\{1,2,\ldots,n-1\}</math>.
# Let <math>t</math> and <math>m</math> be such that <math>t\ge 1</math>, <math>m</math> is odd, and <math>n-1=2^tm</math>.
# Let <math>a_0=a^m\bmod\, n\,</math>. For <math>i=1</math> to <math>t</math>, let <math>a_i=a_{i-1}^2 \bmod\, n</math>.
# If <math>a_t\not\equiv 1\pmod n</math>, then return "'''composite'''".
# If there is an <math>i</math>, <math>1\le i\le t</math>, such that <math>a_i\equiv 1\pmod n</math> but <math>a_{i-1}\not\equiv \pm 1\pmod n</math>, then return "'''composite'''".
# Else return "'''probably prime'''".
}}
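The following is a minimal Python sketch of the test as stated above, for odd <math>n>2</math>; the function name is illustrative.
<pre>
import random

def miller_rabin(n):
    """One round of the Miller-Rabin test for an odd integer n > 2."""
    t, m = 0, n - 1
    while m % 2 == 0:              # write n - 1 = 2^t * m with m odd
        t += 1
        m //= 2
    a = random.randint(1, n - 1)   # uniform a in {1, ..., n-1}
    prev = pow(a, m, n)            # a_0 = a^m mod n
    for _ in range(t):
        curr = (prev * prev) % n   # a_i = a_{i-1}^2 mod n
        if curr == 1 and prev != 1 and prev != n - 1:
            return "composite"     # nontrivial square root of 1 found (line 5)
        prev = curr
    if prev != 1:                  # a_t = a^{n-1} mod n fails the Fermat test (line 4)
        return "composite"
    return "probably prime"
</pre>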
 
An easy inductive proof shows that <math>a_i=a^{2^im}\bmod\, n</math> for all <math>i</math>, <math>0\le i\le t</math>. In particular, <math>a_t\equiv a^{2^tm}=a^{n-1}\pmod n</math>.
 
The original algorithm due to Miller is deterministic: it tests all small <math>a</math> up to an order of <math>O(\log n)</math>. The correctness of this deterministic algorithm relies on the unproven [http://en.wikipedia.org/wiki/Generalized_Riemann_hypothesis Generalized Riemann Hypothesis]. It was observed by Rabin that the deterministic search can be replaced by random sampling.
 
Line 4 of the algorithm is equivalent to checking whether <math>a^{n-1}\not\equiv 1\pmod n</math>; thus line 4 is just the Fermat test. If <math>n</math> passes the Fermat test, line 5 tries to find a nontrivial square root of 1 in the form <math>a^{2^im}</math>.
 
If <math>n</math> is prime, then due to Fermat's little theorem and the fact that prime numbers do not have nontrivial square roots of 1, the conditions in line 4 and line 5 will never hold, thus the algorithm will return "'''probably prime'''". If <math>n</math> is a non-Carmichael composite, as in the Fermat test, line 4 returns "'''composite'''" with probability at least <math>1/2</math>. The only remaining case is when <math>n</math> is a Carmichael number.
 
We pick the largest <math>j</math> such that there is a <math>b\in\mathbb{Z}_{n}^*</math> satisfying <math>b^{2^jm}\equiv -1\pmod n</math> (such a <math>j</math> exists because <math>b=n-1</math> satisfies <math>(n-1)^{2^0m}\equiv(-1)^m\equiv-1\pmod n</math> for odd <math>m</math>), and define
:<math>B=\{a\in\mathbb{Z}_n^*\mid a^{2^jm}\equiv \pm 1\pmod n\}</math>.
 
{{Theorem|Theorem|
:If <math>n</math> is a Carmichael number, then the <math>B</math> defined as above is a proper subgroup of <math>\mathbb{Z}_n^*</math>.
}}
Since <math>j</math> is fixed, it is easy to verify that <math>B</math> is closed under multiplication, thus <math>B</math> is a subgroup of <math>\mathbb{Z}_n^*</math>. It is a bit complicated to show that <math>\mathbb{Z}_n^*\setminus B</math> is nonempty and we will not give the full proof here.
 
The accuracy of Miller-Rabin test on Carmichael numbers is implied by this theorem. Suppose <math>n</math> is a Carmichael number. We call an <math>a\in\{1,2,\ldots, n-1\}</math> a ''liar'' if it fools the test in line 5, i.e.  there is no such <math>i</math> that <math>a^{2^im}\equiv 1\pmod n</math> but <math>a^{2^{i-1}m}\not\equiv \pm 1\pmod n</math>.
 
We claim that all liars belong to <math>B</math>. Due to the maximality of <math>j</math>, <math>a^{2^im}\not\equiv -1\pmod n</math> for all <math>i>j</math>. Since <math>n</math> is a Carmichael number, <math>a^{n-1}\equiv 1\pmod n</math>; if <math>a</math> is a liar, it must hold that <math>a^{2^im}\equiv 1\pmod n</math> for all <math>i>j</math>, since otherwise <math>a</math> could not be a liar. In particular, <math>a^{2^{j+1}m}\equiv 1\pmod n</math>. Again, since <math>a</math> is a liar, <math>a^{2^jm}\equiv \pm1\pmod n</math>, therefore <math>a\in B</math>.
 
In summary, when <math>n</math> is a Carmichael number, all numbers <math>a</math> that fool the Miller-Rabin test belong to a proper subgroup of <math>\mathbb{Z}_n^*</math>; therefore the Miller-Rabin test returns "'''composite'''" with probability at least <math>1/2</math>.
 
In conclusion,
* if <math>n</math> is prime, the algorithm returns "'''probably prime'''";
* if <math>n</math> is a non-Carmichael composite, the algorithm returns "'''composite'''" in line 4 with probability at least <math>1/2</math>;
* if <math>n</math> is a Carmichael number, the algorithm returns "'''composite'''" in line 5 with probability at least <math>1/2</math>.
 
= Graph Coloring =
A '''coloring''' of a graph <math>G(V,E)</math> is a mapping <math>\sigma:V\rightarrow[q]</math> for some integer <math>q</math>, satisfying that <math>\sigma(u)\neq \sigma(v)</math> for all <math>uv\in E</math>.
 
The problem of deciding whether an input graph is colorable by some fixed number of colors is a classic problem in Computer Science. Denote by <math>\Delta</math> the maximum degree of <math>G</math>.
* If <math>q\ge \Delta+1</math>, there always exists a coloring. Moreover, the coloring can be found by a simple greedy algorithm.
* If <math>q=\Delta</math>, <math>G</math> has a coloring unless it contains a <math>(\Delta+1)</math>-clique or it is an odd cycle. ([http://en.wikipedia.org/wiki/Brooks'_theorem Brooks Theorem])
* If <math>q<\Delta</math>, the problem of deciding whether <math>G</math> is <math>q</math>-colorable is NP-hard.
 
== Sampling a graph coloring ==
We consider the problem of '''sampling''' a uniformly random coloring of a given graph.
 
Sampling a random coloring is at least as hard as deciding its existence, so we don't expect to solve the sampling problem when <math>q<\Delta</math>. The decision problem for the case <math>q=\Delta</math> is also nontrivial. Thus people are interested only in the case when <math>q\ge \Delta+1</math>.
 
Unlike sampling a number from <math>[n]</math> uniformly at random, which can be done by flipping a fair coin for <math>O(\log n)</math> times, sampling a coloring cannot be easily done, because of the constraint that any two adjacent vertices cannot have the same color. Such constraints (also called "structures") tremendously increase the difficulty of random sampling a combinatorial object as well as counting the number of combinatorial objects.
 
The following is a simple randomized algorithm for sampling colorings.
 
{{Theorem|Sampling Graph Coloring|
:Start with an arbitrary coloring of <math>G(V,E)</math>. At each step:
# Pick a vertex <math>v\in V</math> and a color <math>c\in[q]</math> uniformly at random.
# Change the color of <math>v</math> to <math>c</math> if it results in a valid coloring; do nothing otherwise.
: Repeat this process for sufficiently many steps.
}}
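As an illustration, here is a minimal Python sketch of this random walk, assuming the graph is given as an adjacency list (a dict mapping each vertex to the set of its neighbors) and the initial coloring is valid; the names are hypothetical.
<pre>
import random

def sample_coloring(adj, q, coloring, steps):
    """Random walk on colorings of the graph described by adj, using q colors."""
    vertices = list(adj)
    for _ in range(steps):
        v = random.choice(vertices)                  # uniform random vertex
        c = random.randrange(q)                      # uniform random color
        if all(coloring[u] != c for u in adj[v]):    # recolor only if still valid
            coloring[v] = c
    return coloring

# Example: a path on 3 vertices with q = 3 colors, starting from a valid coloring.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(sample_coloring(adj, 3, {0: 0, 1: 1, 2: 0}, 1000))
</pre>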
 
This very simple algorithm uses an important idea called '''random walks'''. There is a huge class of randomized algorithms for random sampling based on this principle.
 
The algorithm starts with a fixed coloring and "walks" to more and more random colorings as it runs. In finitely many steps, it will get "close" enough to a uniform random coloring. There are two issues:
* How do we measure the randomness of the coloring, i.e. how close the current random coloring is to the uniform random coloring?
* How many steps does it take to get close enough to a random coloring?
We will introduce the formal concepts addressing these issues in future lectures. We will introduce a beautiful theory of random walks, which is fundamental to the analysis of randomized algorithms.
 
The following are two of the most important conjectures regarding the graph coloring problem.
{{Theorem|Conjecture|
# The above simple algorithm returns a nearly uniform random coloring in <math>O(n\ln n)</math> steps whenever <math>q\ge\Delta+2</math>.
# Sampling a nearly uniform random graph coloring can be done in polynomial time whenever <math>q\ge\Delta+1</math>.
}}
 
These two conjectures are still open. We will not solve them in this class, but we will approach them by some important techniques for analyzing random walks.
 
== Counting by sampling ==
Given a graph <math>G(V,E)</math> of <math>n</math> vertices, we want to compute the number of different colorings of graph <math>G</math>. This is a well-defined computational problem.
 
Enumerating all colorings will take exponential time in the worst case. Since we only need a number instead of a list of all colorings, we expect cleverer ways of computing this number than the brute force enumeration.
 
The problem of counting graph colorings has the following formulation. Let <math>A</math> be a <math>q\times q</math> matrix defined as
:<math>A=\begin{bmatrix}
A_{00} & A_{01} & \cdots & A_{0,q-1}\\
A_{10} & A_{11} & \cdots & A_{1,q-1}\\
\vdots & \vdots & \ddots & \vdots\\
A_{q-1, 0} & A_{q-1, 1} & \cdots & A_{q-1,q-1}\\
\end{bmatrix}</math>, where <math>A_{ij}=\begin{cases}0 & i=j \\ 1 & i\neq j \end{cases}</math>.
The number of colorings for a given graph <math>G</math> is described by the following function:
:<math>Z_A(G)=\sum_{\sigma:V\rightarrow[q]}\prod_{(u,v)\in E}A_{\sigma(u),\sigma(v)}</math>.
The sum enumerates all possible colorings, valid or invalid, and the product determines whether the present coloring is valid.
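For intuition, the following illustrative sketch evaluates <math>Z_A(G)</math> by brute force, which is feasible only for tiny graphs; it assumes the edges are given as a list of vertex pairs over vertices <math>0,\ldots,n-1</math>, and the function name is hypothetical.
<pre>
from itertools import product

def partition_function(n, edges, A):
    """Sum over all q^n colorings of the product of matrix entries over edges."""
    q = len(A)
    total = 0
    for sigma in product(range(q), repeat=n):   # enumerate all colorings sigma
        term = 1
        for u, v in edges:
            term *= A[sigma[u]][sigma[v]]
        total += term
    return total

# With A the q x q matrix that is 0 on the diagonal and 1 elsewhere, this counts
# proper q-colorings; a triangle with q = 3 colors has 3! = 6 of them.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(partition_function(3, [(0, 1), (1, 2), (0, 2)], A))  # prints 6
</pre>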
 
We can replace <math>A</math> with any <math>q\times q</math> matrix and talk about the computation of <math>Z_A(G)</math>. This gives us the [http://en.wikipedia.org/wiki/Potts_model Potts model] in statistical physics. When <math>q=2</math>, it becomes the famous [http://en.wikipedia.org/wiki/Ising_model Ising model].
 
The function <math>Z_A(G)</math> is called the [http://en.wikipedia.org/wiki/Partition_function_(statistical_mechanics) partition function] in statistical physics. Virtually all interesting information about a statistical physics system can be deduced from its partition function value.
 
The partition function can also be used to model many natural counting problems. For example, when <math>q=2</math> and
:<math>A=\begin{bmatrix}
A_{00} & A_{01} \\
A_{10} & A_{11} \\
\end{bmatrix}=
\begin{bmatrix}
1 & 1\\
1 & 0 \\
\end{bmatrix}</math>
the partition function <math>Z_A(G)</math> is the total number of independent sets of the graph <math>G</math>.
 
There is a [http://en.wikipedia.org/wiki/Complexity_class complexity class] for counting problems, named [http://en.wikipedia.org/wiki/Sharp-P '''<nowiki>#P</nowiki>''']. The '''NP''' class is for decision problems. The '''<nowiki>#P</nowiki>''' class is the analog of the '''NP''' class for counting problems. Many counting problems are known to be '''<nowiki>#P</nowiki>-hard''', including counting graph colorings or independent sets.
 
It is believed that '''<nowiki>#P</nowiki>-hard''' problems do not have polynomial time algorithms. Actually it is believed that '''<nowiki>#P</nowiki>-hard''' problems are harder than '''NP-hard''' problems, partly because the existence of polynomial time algorithm for any '''<nowiki>#P</nowiki>-hard''' problem implies that '''P'''='''NP''', but not vice versa.
 
Later in this class, we will show that although the ''exact'' counting is computationally hard, for '''<nowiki>#P</nowiki>-hard''' problems such as counting graph colorings, there exist randomized algorithms which can approximately compute the answers. These algorithms are based on random samples obtained from simulating a random walk. This powerful technique is called the [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov chain Monte Carlo (MCMC) method].
 
= Identity Checking =
Consider the following scenario: two ''datacenters'' located in two far-apart cities (Beijing and Nanjing) each hold a copy of a very large database. We want to check whether the two copies are identical with minimum communication between the two datacenters. The cost of local computation is ignored because it is negligible compared to the cost of remote communication.
 
== Communication Complexity ==
In 1979, [http://en.wikipedia.org/wiki/Andrew_Yao Andrew Yao] introduced '''communication complexity''' to model problems of this kind. The model is much more general: it considers not just the problem of checking identity but any problem that can be computed by communication.
 
Alice and Bob are two entities. Alice has a private input <math>x</math> and Bob has a private input <math>y</math>. Together they want to compute a function <math>f(x,y)</math> by communicating with each other. The communication follows a predefined '''communication protocol''' (the "algorithm" in this model) which depends only on the problem <math>f</math> but not on the inputs. The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.
 
The problem of checking identity is formally defined by the function EQ as follows: <math>\mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\}</math> and for any <math>x,y\in\{0,1\}^n</math>,
:<math>
\mathrm{EQ}(x,y)=
\begin{cases}
1& \mbox{if } x=y,\\
0& \mbox{otherwise.}
\end{cases}</math>
 
A trivial way to solve EQ is to let Bob send <math>y</math> to Alice. Since <math>x,y\in\{0,1\}^n</math>, this costs <math>n</math> bits of communication.
 
It is known that for deterministic communication protocols, this is the best we can get for computing EQ.
 
{{Theorem|Theorem (Yao 1979)|
:Any deterministic communication protocol computing EQ on two <math>n</math>-bit strings costs <math>n</math> bits of communication in the worst-case.
}}
 
This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem in Yao's 1979 paper initiated the field of communication complexity.
 
== Randomized communication protocols ==
If randomness is allowed, we can solve this problem up to a tolerable probabilistic error with significantly less communication. We treat the inputs <math>x,y\in\{0,1\}^n</math> as two binary numbers <math>x,y\in[2^n]</math>. The randomized communication protocol is as follows:
{{Theorem|A randomized protocol for EQ|
'''Alice does''':
:for some parameter <math>k</math> (to be specified),
:* choose uniformly at random a prime <math>p\in[k]</math>;
:* send <math>p</math> and <math>x\bmod p</math> to Bob;
'''Upon receiving''' <math>p</math> and <math>x\bmod p</math>, '''Bob does''':
:* If <math>x\bmod p=y\bmod p</math> return "'''probably identical'''"; else return "'''distinct'''".
}}
The number of bits to be communicated is <math>O(\log k)</math>.
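A minimal Python sketch of the protocol follows; the helpers <code>is_prime</code> and <code>random_prime</code> are illustrative (any method of sampling a uniformly random prime in <math>[k]</math> would do).
<pre>
import random

def is_prime(p):
    """Trial division; adequate for moderately sized k in this sketch."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def random_prime(k):
    """Rejection-sample a uniformly random prime in [k]."""
    while True:
        p = random.randint(2, k)
        if is_prime(p):
            return p

def eq_protocol(x, y, k):
    p = random_prime(k)            # Alice's random prime; she sends p and x mod p
    if x % p == y % p:             # Bob compares the two fingerprints
        return "probably identical"
    return "distinct"
</pre>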
 
This technique is called '''fingerprinting'''. The random prime <math>p</math> induces a random fingerprinting function <math>\bmod p</math>, so that the <math>x\bmod p</math> and <math>y\bmod p</math> are like the "fingerprints" of <math>x</math> and <math>y</math>. Like the fingerprints of people, the same input must have the same fingerprints and the chance that two distinct inputs have the same fingerprint is small.
 
If <math>x=y</math>, then it is trivial to see that the output is "'''probably identical'''". For the case that <math>x\neq y</math>, it is possible that Alice picks a bad prime <math>p</math> with <math>x\equiv y\pmod p</math>, so that the protocol incorrectly outputs "'''probably identical'''". However, since <math>0<|x-y|<2^n</math>, the difference <math>x-y</math> has fewer than <math>n</math> distinct prime factors, while by the prime number theorem <math>[k]</math> contains about <math>k/\ln k</math> primes; hence for some polynomially large <math>k=\mathrm{poly}(n)</math>, the probability of error is polynomially small, <math>O(\frac{1}{\mathrm{poly}(n)})</math>.
 
Therefore, this randomized protocol solves the problem up to a one-sided error <math>O(\frac{1}{\mathrm{poly}(n)})</math> with communication cost <math>O(\log k)=O(\log n)</math>.
 
-------
In the above protocol, the random coin flips made by Alice are private to her and not available to Bob.

We may assume that both Alice and Bob observe the same random coin flips at no additional cost. This model is called '''public coins''' (so the previous protocol is actually one with private coins). With this stronger assumption, we can have a new randomized protocol.

We treat the inputs <math>x,y\in\{0,1\}^n</math> as vectors of <math>n</math> Boolean entries. We define the inner product <math>\langle x,y\rangle</math> for Boolean vectors <math>x,y\in\{0,1\}^n</math> as <math>\langle x,y\rangle=\left(\sum_{i=1}^nx_iy_i\right)\bmod 2</math>, which has an equivalent definition <math>\langle x,y\rangle=\bigoplus_{i=1}^n(x_i\wedge y_i)</math>, where <math>\oplus</math> is the Boolean operator XOR.
{{Theorem|A randomized (public coin) protocol for EQ|
: Suppose that a uniformly random Boolean vector <math>r\in\{0,1\}^n</math> is known to both Alice and Bob.
'''Alice does''':
:* send <math>\langle x, r\rangle</math> to Bob;
'''Upon receiving''' <math>\langle x, r\rangle</math>, '''Bob does''':
:* If <math>\langle x, r\rangle=\langle y, r\rangle</math> return "'''probably identical'''"; else return "'''distinct'''".
}}


Same as before, if <math>x=y</math>, the output is always correct. If <math>x\neq y</math>, it is easy to check that the protocol gives a wrong answer with probability at most <math>1/2</math>. By repeatedly running the protocol a number of times (with independent public coins <math>r</math>), the error probability can be reduced significantly.
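A minimal Python sketch of the repeated public-coin protocol, assuming <math>x,y</math> are given as lists of bits and the vectors <math>r</math> are the shared public coins; the names are illustrative.
<pre>
import random

def inner_product(x, r):
    """<x, r> over GF(2): the XOR of the bitwise ANDs."""
    return sum(xi & ri for xi, ri in zip(x, r)) % 2

def eq_public_coin(x, y, rounds=10):
    n = len(x)
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]    # shared public random coins
        if inner_product(x, r) != inner_product(y, r):  # Alice sends one bit per round
            return "distinct"
    return "probably identical"    # error prob. at most 2^{-rounds} when x != y
</pre>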
