Randomized Algorithms (Fall 2015)/Identity Testing

Checking Matrix Multiplication

The evolution of time complexity for matrix multiplication.

Let $\mathbb{F}$ be a field (you may think of it as the field $\mathbb{Q}$ of rational numbers, or the finite field $\mathbb{Z}_p$ of integers modulo a prime $p$). We suppose that each field operation (addition, subtraction, multiplication, division) has unit cost. This model is called the unit-cost RAM model, which is an idealized abstraction of a computer.

Consider the following problem:

  • Input: Three $n\times n$ matrices $A$, $B$, and $C$ over the field $\mathbb{F}$.
  • Output: "yes" if $C=AB$ and "no" otherwise.

A naive way to solve this is to multiply $A$ and $B$ and compare the result with $C$. The straightforward algorithm for matrix multiplication takes $O(n^3)$ time, assuming that each arithmetic operation takes unit time. Strassen's algorithm, discovered in 1969 and now implemented by many numerical libraries, runs in time $O(n^{\log_2 7})\approx O(n^{2.81})$, and it started the search for fast matrix multiplication algorithms. The Coppersmith–Winograd algorithm, discovered in 1987, runs in time $O(n^{2.376})$, but is faster than Strassen's algorithm only on extremely large matrices because of its very large constant factor. This remained the best known bound for decades, until Stothers obtained an $O(n^{2.3737})$ algorithm in his PhD thesis in 2010 and, independently, Vassilevska Williams obtained an $O(n^{2.3727})$ algorithm in 2012. Both improvements are based on generalizations of the Coppersmith–Winograd algorithm. It is unknown whether matrix multiplication can be done in time $O(n^{2+o(1)})$.

Freivalds Algorithm

The following is a very simple randomized algorithm due to Freivalds, running in $O(n^2)$ time:

Algorithm (Freivalds, 1979)
  • pick a vector $r\in\{0,1\}^n$ uniformly at random;
  • if $A(Br)=Cr$ then return "yes" else return "no";

The product $A(Br)$ is computed by first computing $Br$ and then $A(Br)$. The running time of Freivalds' algorithm is $O(n^2)$ because the algorithm computes 3 matrix-vector multiplications.
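
A single run of the algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: matrices are plain lists of rows with integer entries, and the helper names `mat_vec`, `freivalds_check`, and `freivalds_once` are hypothetical names chosen here, not part of the original notes.

```python
import random

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector using O(n^2) operations."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def freivalds_check(A, B, C, r):
    """Deterministic core of the test: does A(Br) equal Cr for this r?"""
    return mat_vec(A, mat_vec(B, r)) == mat_vec(C, r)

def freivalds_once(A, B, C, rng=random):
    """One run of the algorithm with r drawn uniformly from {0,1}^n."""
    r = [rng.randint(0, 1) for _ in range(len(C))]
    return freivalds_check(A, B, C, r)

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C_good = [[2, 1], [4, 3]]   # equals AB
C_bad = [[2, 1], [4, 5]]    # differs from AB in the bottom-right entry

print(freivalds_once(A, B, C_good))            # always True
print(freivalds_check(A, B, C_bad, [0, 1]))    # False: this r exposes the difference
print(freivalds_check(A, B, C_bad, [1, 0]))    # True: a false positive
```

Only three matrix-vector products are formed, and the product $AB$ itself is never computed.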

If $AB=C$ then $A(Br)=Cr$ for any $r\in\{0,1\}^n$, thus the algorithm returns "yes" on every positive instance ($AB=C$). But if $AB\neq C$ then the algorithm makes a mistake if it chooses an $r$ such that $ABr=Cr$. However, the following lemma states that the probability of this event is bounded.

Lemma
If $AB\neq C$ then for a uniformly random $r\in\{0,1\}^n$,
$$\Pr[ABr=Cr]\le\frac{1}{2}.$$
Proof.
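Let $D=AB-C$. The event $ABr=Cr$ is equivalent to $Dr=\boldsymbol{0}$. It is then sufficient to show that for any $D\neq\boldsymbol{0}$, it holds that $\Pr[Dr=\boldsymbol{0}]\le\frac{1}{2}$.

Since $D\neq\boldsymbol{0}$, it must have at least one non-zero entry. Suppose that $D_{ij}\neq 0$.

Assume that the event $Dr=\boldsymbol{0}$ occurs. In particular, the $i$-th entry of $Dr$ is

$$(Dr)_i=\sum_{k=1}^n D_{ik}r_k=0.$$

Then $r_j$ can be calculated by

$$r_j=-\frac{1}{D_{ij}}\sum_{k\neq j}D_{ik}r_k.$$

Once all other entries $r_k$ with $k\neq j$ are fixed, there is a unique solution for $r_j$. Therefore, the number of $r\in\{0,1\}^n$ satisfying $Dr=\boldsymbol{0}$ is at most $2^{n-1}$. The probability that $ABr=Cr$ is bounded as

$$\Pr[ABr=Cr]=\Pr[Dr=\boldsymbol{0}]\le\frac{2^{n-1}}{2^n}=\frac{1}{2}.$$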

When $AB=C$, Freivalds' algorithm always returns "yes"; and when $AB\neq C$, Freivalds' algorithm returns "no" with probability at least 1/2.

To improve its accuracy, we can run Freivalds' algorithm $k$ times, each time with an independent $r\in\{0,1\}^n$, and return "yes" if and only if all runs return "yes".

Freivalds' Algorithm (multi-round)
  • pick $k$ vectors $r_1,r_2,\ldots,r_k\in\{0,1\}^n$ uniformly and independently at random;
  • if $A(Br_i)=Cr_i$ for all $i=1,\ldots,k$ then return "yes" else return "no";

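If $AB=C$, then the algorithm returns "yes" with probability 1. If $AB\neq C$, then by independence, the probability that all $r_i$ satisfy $ABr_i=Cr_i$ is at most $2^{-k}$, so the algorithm returns "no" with probability at least $1-2^{-k}$. For any $0<\epsilon<1$, choose $k=\left\lceil\log_2\frac{1}{\epsilon}\right\rceil$. The algorithm runs in time $O\left(n^2\log_2\frac{1}{\epsilon}\right)$ and has a one-sided error (false positive) bounded by $\epsilon$.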
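
The multi-round version, with the number of rounds derived from a target error bound, might look like the sketch below. The function names `rounds_needed` and `freivalds`, the default `eps`, and the list-of-rows matrix representation are assumptions made for this illustration.

```python
import math
import random

def mat_vec(M, v):
    """Matrix-vector product in O(n^2) operations."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def rounds_needed(eps):
    """Smallest positive integer k with 2^(-k) <= eps."""
    return max(1, math.ceil(math.log2(1 / eps)))

def freivalds(A, B, C, eps=1e-9, rng=None):
    """Return False as soon as some r witnesses AB != C, else True.

    A "no" answer is always correct; a "yes" answer is wrong with
    probability at most eps when AB != C.
    """
    rng = rng or random.Random()
    n = len(C)
    for _ in range(rounds_needed(eps)):
        r = [rng.randint(0, 1) for _ in range(n)]
        if mat_vec(A, mat_vec(B, r)) != mat_vec(C, r):
            return False
    return True

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(rounds_needed(1e-9))                 # number of independent rounds used
print(freivalds(A, B, [[2, 1], [4, 3]]))   # C = AB, so the answer is always True
```

Stopping at the first disagreement is safe because the error is one-sided: a single vector $r$ with $A(Br)\neq Cr$ proves that $AB\neq C$.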

Polynomial Identity Testing (PIT)

Polynomial Identity Testing (PIT) is the following problem: given two polynomials as input, determine whether they are identical. This problem plays a fundamental role in Computer Science.

First let's consider the following simplified version of Polynomial Identity Testing (PIT), which deals only with single-variate polynomials:

  • Input: two polynomials $f,g\in\mathbb{F}[x]$ of degree $d$.
  • Output: "yes" if the two polynomials are identical, i.e. $f\equiv g$, and "no" otherwise.

Here $\mathbb{F}[x]$ denotes the ring of polynomials over the field $\mathbb{F}$.

Alternatively, we can consider the following equivalent problem:

  • Input: a polynomial $f\in\mathbb{F}[x]$ of degree $d$.
  • Output: "yes" if $f\equiv 0$, and "no" otherwise.

The problem is trivial if $f$ is presented in its explicit form $f(x)=\sum_{i=0}^d a_ix^i$. But we assume that $f$ is given in product form or as a black box.

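A straightforward deterministic algorithm that solves PIT is to query $d+1$ distinct points, say $f(1),f(2),\ldots,f(d+1)$, and check whether they are all zero. This determines whether $f\equiv 0$ by interpolation, since a non-zero polynomial of degree $d$ has at most $d$ roots.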

We now introduce a simple randomized algorithm for the problem.

Algorithm for PIT
  • pick $x\in\{1,2,\ldots,2d\}$ uniformly at random;
  • if $f(x)=0$ then return "yes" else return "no";

This algorithm requires only the evaluation of $f$ at a single point. If $f\equiv 0$, it is always correct. If $f\not\equiv 0$, the probability that the algorithm wrongly returns "yes" is bounded as follows.

Theorem
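Let $f\in\mathbb{F}[x]$ be a polynomial of degree $d$ over the field $\mathbb{F}$. Let $S\subseteq\mathbb{F}$ be an arbitrary finite set and let $x$ be chosen uniformly at random from $S$. If $f\not\equiv 0$ then
$$\Pr[f(x)=0]\le\frac{d}{|S|}.$$
Proof.

A non-zero polynomial of degree $d$ has at most $d$ distinct roots, thus at most $d$ members $x$ of $S$ satisfy $f(x)=0$. Therefore, $\Pr[f(x)=0]\le\frac{d}{|S|}$.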

By the theorem, the algorithm can distinguish a non-zero polynomial from 0 with probability at least $1/2$. This is achieved by evaluating the polynomial at only one point, using about $1+\log_2 d$ random bits.
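
As a small illustration, the sketch below treats the polynomial as a black-box Python callable over the integers (a subset of $\mathbb{Q}$) and samples from $\{1,\ldots,2d\}$. The names `pit_once` and `pit`, the two example polynomials, and the repetition count are assumptions made for this example.

```python
import random

def pit_once(f, d, rng=random):
    """Evaluate the black box f (degree at most d) at a uniform point of
    {1, ..., 2d}; answer True ("yes") iff the value is zero."""
    x = rng.randint(1, 2 * d)
    return f(x) == 0

def pit(f, d, rounds=20, rng=random):
    """Repeat the test; answer "yes" only if every round sees a zero."""
    return all(pit_once(f, d, rng) for _ in range(rounds))

# Two polynomials given in non-expanded form.
zero_poly = lambda x: (x + 1) ** 2 - (x * x + 2 * x + 1)   # identically 0
cubic = lambda x: (x - 1) * (x - 2) * (x - 3)              # roots 1, 2, 3

print(pit(zero_poly, 2))   # always True
print(pit(cubic, 3))       # False except with probability 2**-20
```

The cubic is a worst case for the bound: exactly $d=3$ of the $2d=6$ sample points are roots, so a single round errs with probability exactly $1/2$.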

Communication Complexity of Equality

Communication complexity was introduced by Andrew Chi-Chih Yao as a model of computation involving multiple participants, each with partial information about the input.

Assume that there are two entities, say Alice and Bob. Alice has a private input $a$ and Bob has a private input $b$. Together they want to compute a function $f(a,b)$ by communicating with each other. The communication follows a predefined communication protocol (the "algorithm" in this model) whose logic depends only on the problem but not on the inputs. The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.

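The problem of checking identity is formally defined by the function EQ as follows: $\mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ and for any $a,b\in\{0,1\}^n$,

$$\mathrm{EQ}(a,b)=\begin{cases}1 & \text{if } a=b,\\ 0 & \text{otherwise.}\end{cases}$$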

A trivial way to solve EQ is to let Bob send his entire input string $b$ to Alice and let Alice check whether $a=b$. This costs $n$ bits of communication.

It is known that for deterministic communication protocols, this is the best we can get for computing EQ.

Theorem (Yao 1979)
Any deterministic communication protocol computing EQ on two $n$-bit strings costs $n$ bits of communication in the worst case.

This theorem is much harder to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem in Yao's 1979 paper initiated the field of communication complexity.

If randomness is allowed, we can solve this problem, up to a tolerable probability of error, with significantly less communication. The inputs $a,b\in\{0,1\}^n$ are two strings of $n$ bits. Let $m=\lceil\log_2(2n)\rceil$ and let $p\in[2^m,2^{m+1}]$ be an arbitrary prime number. (Such a prime $p$ always exists.) The input strings $a,b$ can be respectively represented as two polynomials $f,g\in\mathbb{Z}_p[x]$ such that $f(x)=\sum_{i=0}^{n-1}a_ix^i$ and $g(x)=\sum_{i=0}^{n-1}b_ix^i$, of degree at most $n-1$, where all additions and multiplications are modulo $p$. The randomized communication protocol is given as follows:

A randomized protocol for EQ

Alice does:

  • pick $x\in\{0,1,\ldots,p-1\}$ uniformly at random;
  • send $x$ and $f(x)$ to Bob;

Upon receiving $x$ and $f(x)$, Bob does:

  • If $f(x)=g(x)$ return "yes"; else return "no".

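Repeat this protocol 100 times. The total number of bits to communicate is bounded by $200\lceil\log_2 p\rceil=O(\log n)$. By the analysis of the randomized algorithm for PIT, if $a=b$ the protocol is always correct, and if $a\neq b$ each round fails to detect the difference with probability at most $\frac{n-1}{p}<\frac{1}{2}$, so the protocol fails to report a difference with probability less than $2^{-100}$.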
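
The protocol can be simulated in one process as in the sketch below. It assumes inputs are given as lists of bits, takes the smallest prime in $[2^m,2^{m+1}]$ with $2^m\ge 2n$, and uses trial division for primality; the names `is_prime`, `eval_poly`, and `eq_protocol` are hypothetical.

```python
import random

def is_prime(q):
    """Trial division; adequate for the small primes used here."""
    if q < 2:
        return False
    i = 2
    while i * i <= q:
        if q % i == 0:
            return False
        i += 1
    return True

def eval_poly(bits, x, p):
    """Horner evaluation of sum_i bits[i] * x^i modulo p."""
    acc = 0
    for b in reversed(bits):
        acc = (acc * x + b) % p
    return acc

def eq_protocol(a, b, rounds=100, rng=random):
    """Simulate Alice and Bob; True means "yes" (inputs look equal)."""
    n = len(a)
    m = (2 * n - 1).bit_length()          # smallest m with 2^m >= 2n
    p = next(q for q in range(2 ** m, 2 ** (m + 1) + 1) if is_prime(q))
    for _ in range(rounds):
        x = rng.randrange(p)              # Alice's random point
        sent = (x, eval_poly(a, x, p))    # the only data that is communicated
        if eval_poly(b, sent[0], p) != sent[1]:   # Bob's check
            return False
    return True

print(eq_protocol([1, 0, 1, 1], [1, 0, 1, 1]))   # always True
print(eq_protocol([1, 0, 1, 1], [1, 0, 1, 0]))   # False with overwhelming probability
```

Each round transmits two elements of $\mathbb{Z}_p$, that is $O(\log n)$ bits, regardless of the input length $n$.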

Schwartz-Zippel Theorem

Now let's move on to the true form of Polynomial Identity Testing (PIT), which works on multi-variate polynomials:

  • Input: two $n$-variate polynomials $f,g\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ of degree $d$.
  • Output: "yes" if $f\equiv g$, and "no" otherwise.

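Here $\mathbb{F}[x_1,x_2,\ldots,x_n]$ is the ring of multi-variate polynomials over the field $\mathbb{F}$. The most natural way to represent an $n$-variate polynomial of degree $d$ is to write it as a sum of monomials:

$$f(x_1,x_2,\ldots,x_n)=\sum_{\substack{i_1,i_2,\ldots,i_n\ge 0\\ i_1+i_2+\cdots+i_n\le d}}a_{i_1,i_2,\ldots,i_n}x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}.$$

The degree or total degree of a monomial $a_{i_1,i_2,\ldots,i_n}x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}$ is given by $i_1+i_2+\cdots+i_n$, and the degree of a polynomial $f$ is the maximum degree of its monomials with nonzero coefficients.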

Alternatively, we can consider the following equivalent problem:

  • Input: a polynomial $f\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ of degree $d$.
  • Output: "yes" if $f\equiv 0$, and "no" otherwise.

If $f$ is written explicitly as a sum of monomials, then the problem is trivial. Again we allow $f$ to be represented in product form.

Example

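The Vandermonde matrix $M=M(x_1,x_2,\ldots,x_n)$ is defined by $M_{ij}=x_i^{j-1}$, that is

$$M=\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1}\\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1}\\
1 & x_3 & x_3^2 & \cdots & x_3^{n-1}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{pmatrix}.$$

Let $f$ be the polynomial defined as

$$f(x_1,\ldots,x_n)=\det(M)=\prod_{j<i}(x_i-x_j).$$

It is easy to evaluate $f(x_1,x_2,\ldots,x_n)$ at any particular $x_1,x_2,\ldots,x_n$; however, it is prohibitively expensive to symbolically expand $f(x_1,\ldots,x_n)$ into its sum-of-monomials form.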

Here is a very simple randomized algorithm, due to Schwartz and Zippel.

Randomized algorithm for multi-variate PIT
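  • fix an arbitrary set $S\subseteq\mathbb{F}$ whose size is to be determined later;
  • pick $r_1,r_2,\ldots,r_n\in S$ uniformly and independently at random;
  • if $f(\vec{r})=f(r_1,r_2,\ldots,r_n)=0$ then return "yes" else return "no";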

This algorithm requires only the evaluation of $f$ at a single point $\vec{r}$. And if $f\equiv 0$ it is always correct.

In the Theorem below, we'll see that if $f\not\equiv 0$ then the algorithm is incorrect with probability at most $\frac{d}{|S|}$, where $d$ is the degree of the polynomial $f$.
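
The following sketch applies this test to the Vandermonde example: it checks whether $\det(M)$ and $\prod_{j<i}(x_i-x_j)$ agree at random points, without expanding either side. It assumes arithmetic in the prime field $\mathbb{Z}_P$ with $P=2^{31}-1$ and $S=\mathbb{Z}_P$, and it needs Python 3.8 or later for modular inverses via `pow`; all function names are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; all arithmetic is in the field Z_P

def det_mod(M, p=P):
    """Determinant modulo p by Gaussian elimination, O(n^3) field operations."""
    M = [[v % p for v in row] for row in M]
    n, det = len(M), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c]), None)
        if pivot is None:
            return 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det                      # a row swap flips the sign
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)           # modular inverse of the pivot
        for r in range(c + 1, n):
            factor = M[r][c] * inv % p
            for k in range(c, n):
                M[r][k] = (M[r][k] - factor * M[c][k]) % p
    return det % p

def vandermonde_det(xs, p=P):
    """det(M) for M_ij = x_i^(j-1), evaluated at the point xs."""
    return det_mod([[pow(x, j, p) for j in range(len(xs))] for x in xs], p)

def product_form(xs, p=P):
    """prod_{j<i} (x_i - x_j), evaluated at the point xs."""
    out = 1
    for i in range(len(xs)):
        for j in range(i):
            out = out * (xs[i] - xs[j]) % p
    return out

def identical(f, g, n, rounds=5, rng=random):
    """Schwartz-Zippel style test of f == g with points drawn from S = Z_P."""
    for _ in range(rounds):
        r = [rng.randrange(P) for _ in range(n)]
        if (f(r) - g(r)) % P != 0:
            return False
    return True

print(identical(vandermonde_det, product_form, 6))   # True: the identity holds
```

For $n=6$ the degree is $d=15$, so a wrong identity would survive one round with probability at most $15/P$.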

Schwartz-Zippel Theorem
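Let $f\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ be a multivariate polynomial of degree $d$ over a field $\mathbb{F}$ such that $f\not\equiv 0$. Fix any finite set $S\subseteq\mathbb{F}$, and let $r_1,r_2,\ldots,r_n$ be chosen uniformly and independently at random from $S$. Then
$$\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.$$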
Proof.

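We prove the theorem by induction on the number of variables $n$.

For $n=1$, assuming that $f\not\equiv 0$, the degree-$d$ polynomial $f(x)$ has at most $d$ roots (a non-zero polynomial of degree $d$ over a field has at most $d$ roots), thus

$$\Pr[f(r)=0]\le\frac{d}{|S|}.$$

Assume the induction hypothesis for multi-variate polynomials with up to $n-1$ variables.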

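An $n$-variate polynomial $f(x_1,x_2,\ldots,x_n)$ can be represented as

$$f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^k x_n^i f_i(x_1,x_2,\ldots,x_{n-1}),$$

where $k$ is the largest power of $x_n$ appearing in $f$, which means that the degree of $f_k$ is at most $d-k$ and $f_k\not\equiv 0$.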

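In particular, we write $f$ as a sum of two parts:

$$f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n),$$

where both $f_k$ and $\bar{f}$ are polynomials, such that

  • as argued above, the degree of $f_k$ is at most $d-k$ and $f_k\not\equiv 0$;
  • $\bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1})$, thus $\bar{f}(x_1,x_2,\ldots,x_n)$ has no $x_n^k$ factor in any term.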

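By the law of total probability, it holds that

$$\begin{aligned}
\Pr[f(r_1,\ldots,r_n)=0]
&=\Pr[f(\vec{r})=0\mid f_k(r_1,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,\ldots,r_{n-1})=0]\\
&\quad+\Pr[f(\vec{r})=0\mid f_k(r_1,\ldots,r_{n-1})\neq 0]\cdot\Pr[f_k(r_1,\ldots,r_{n-1})\neq 0].
\end{aligned}$$

Note that $f_k(r_1,\ldots,r_{n-1})$ is a polynomial on $n-1$ variables of degree at most $d-k$ such that $f_k\not\equiv 0$. By the induction hypothesis, we have

$$\Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.\qquad(*)$$

For the second case, recall that $\bar{f}(x_1,\ldots,x_n)$ has no $x_n^k$ factor in any term, thus the condition $f_k(r_1,\ldots,r_{n-1})\neq 0$ guarantees that

$$f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,\ldots,r_{n-1})+\bar{f}(r_1,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n)$$

is a single-variate polynomial such that the degree of $g_{r_1,\ldots,r_{n-1}}(x_n)$ is $k$ and $g_{r_1,\ldots,r_{n-1}}\not\equiv 0$, for which we already know that the probability of $g_{r_1,\ldots,r_{n-1}}(r_n)=0$ is at most $\frac{k}{|S|}$. Therefore,

$$\Pr[f(\vec{r})=0\mid f_k(r_1,\ldots,r_{n-1})\neq 0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,\ldots,r_{n-1})\neq 0]\le\frac{k}{|S|}.\qquad(**)$$

Substituting both $(*)$ and $(**)$ back into the total probability, and bounding the remaining two factors by 1, we have

$$\Pr[f(r_1,\ldots,r_n)=0]\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|},$$

which proves the theorem.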
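
For a concrete feel of the bound, the sketch below computes $\Pr[f(r_1,r_2)=0]$ exactly by enumeration for one small polynomial. The polynomial $f(x,y)=xy^2-x$, the set $S=\{-3,\ldots,3\}$, and the helper name `zero_fraction` are choices made for this illustration only.

```python
from fractions import Fraction
from itertools import product

def zero_fraction(f, n, S):
    """Exact Pr[f(r_1, ..., r_n) = 0] for independent uniform r_i in S."""
    points = list(product(S, repeat=n))
    zeros = sum(1 for r in points if f(*r) == 0)
    return Fraction(zeros, len(points))

# f(x, y) = x*y^2 - x = x (y - 1)(y + 1) has total degree d = 3.
f = lambda x, y: x * y * y - x
S = range(-3, 4)                      # |S| = 7

print(zero_fraction(f, 2, S))         # 19/49
print(Fraction(3, len(S)))            # the bound d/|S| = 3/7 = 21/49
```

In the notation of the proof, $k=2$ and $f_k=x$: the first case contributes $\Pr[x=0]=\frac{1}{7}$ and the second contributes $\frac{6}{7}\cdot\frac{2}{7}$, which together give $\frac{19}{49}\le\frac{1}{7}+\frac{2}{7}$.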


In the above proof, for the second case that $f_k(r_1,\ldots,r_{n-1})\neq 0$, we use a "probabilistic argument" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events when applying the law of total probability. You may judge for yourself which proof is better.

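By the law of total probability, and since every probability is at most 1,

$$\begin{aligned}
\Pr[f(\vec{r})=0]
&=\sum_{x_1,\ldots,x_{n-1}\in S}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n,\ r_i=x_i]\\
&\le\sum_{\substack{x_1,\ldots,x_{n-1}\in S\\ f_k(x_1,\ldots,x_{n-1})=0}}\Pr[\forall i<n,\ r_i=x_i]
+\sum_{\substack{x_1,\ldots,x_{n-1}\in S\\ f_k(x_1,\ldots,x_{n-1})\neq 0}}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n,\ r_i=x_i]\\
&=\Pr[f_k(r_1,\ldots,r_{n-1})=0]
+\sum_{\substack{x_1,\ldots,x_{n-1}\in S\\ f_k(x_1,\ldots,x_{n-1})\neq 0}}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n,\ r_i=x_i].
\end{aligned}$$

We have argued that $f_k\not\equiv 0$ and the degree of $f_k$ is at most $d-k$. By the induction hypothesis, we have

$$\Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.$$

And for every fixed $x_1,\ldots,x_{n-1}\in S$ such that $f_k(x_1,\ldots,x_{n-1})\neq 0$, we have argued that $f(x_1,\ldots,x_{n-1},x_n)$ is a polynomial in $x_n$ of degree $k$, thus

$$\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\le\frac{k}{|S|},$$

which holds for all $x_1,\ldots,x_{n-1}$ such that $f_k(x_1,\ldots,x_{n-1})\neq 0$, therefore the weighted average

$$\sum_{\substack{x_1,\ldots,x_{n-1}\in S\\ f_k(x_1,\ldots,x_{n-1})\neq 0}}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n,\ r_i=x_i]\le\frac{k}{|S|}.$$

Substituting these inequalities back into the total probability, we have

$$\Pr[f(\vec{r})=0]\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}.$$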
Fingerprinting

Freivalds' algorithm and the Schwartz-Zippel theorem can be abstracted as the following procedure: Suppose we want to compare two items $Z_1$ and $Z_2$. Instead of comparing them directly, we compute random fingerprints $\mathrm{FING}(Z_1)$ and $\mathrm{FING}(Z_2)$ of them and compare the fingerprints. The fingerprints have the following properties:

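  • $\mathrm{FING}(\cdot)$ is a function, so if $Z_1=Z_2$ then $\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)$.
  • If $Z_1\neq Z_2$ then $\Pr[\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)]$ is small.
  • It is much easier to compute and compare the fingerprints than to compare $Z_1$ and $Z_2$ directly.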

In Freivalds' algorithm, the items to compare are two $n\times n$ matrices $AB$ and $C$, and given an $n\times n$ matrix $M$, its random fingerprint is computed as $\mathrm{FING}(M)=Mr$ for a uniformly random $r\in\{0,1\}^n$.

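In the Schwartz-Zippel theorem, the items to compare are two polynomials $P_1(x_1,\ldots,x_n)$ and $P_2(x_1,\ldots,x_n)$, and given a polynomial $Q(x_1,\ldots,x_n)$, its random fingerprint is computed as $\mathrm{FING}(Q)=Q(r_1,\ldots,r_n)$ for $r_i$ chosen independently and uniformly at random from some fixed set $S\subseteq\mathbb{F}$.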

For different problems, we may have different definitions of $\mathrm{FING}(\cdot)$.

Communication complexity revisited

Now consider again the communication model where the two players, Alice with a private input $x\in\{0,1\}^n$ and Bob with a private input $y\in\{0,1\}^n$, together compute a function $f(x,y)$ by running a communication protocol.

We still consider communication protocols for the equality function EQ:

$$\mathrm{EQ}(x,y)=\begin{cases}1 & \text{if } x=y,\\ 0 & \text{otherwise.}\end{cases}$$

With the language of fingerprinting, this communication problem can be solved by the following generic scheme:

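  • Alice chooses a random fingerprint function $\mathrm{FING}(\cdot)$ and computes the fingerprint $\mathrm{FING}(x)$ of her input;
  • Alice sends both the description of $\mathrm{FING}(\cdot)$ and the value of $\mathrm{FING}(x)$ to Bob;
  • Bob computes $\mathrm{FING}(y)$ and checks whether $\mathrm{FING}(x)=\mathrm{FING}(y)$.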

In this way we have a randomized communication protocol for the equality function EQ with one-sided error (false positives only). The communication cost as well as the error probability are reduced to the question of how to design this random fingerprint function $\mathrm{FING}(\cdot)$ to guarantee:

  1. A random $\mathrm{FING}(\cdot)$ can be described succinctly.
  2. The range of $\mathrm{FING}(\cdot)$ is small, so the fingerprints are succinct.
  3. If $x\neq y$, the probability $\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]$ is small.

In the above application of single-variate PIT, we know that $\mathrm{FING}(x)=\sum_{i=0}^{n-1}x_ir^i$, where $r$ is a random element from a finite field and the additions and multiplications are defined over the finite field, is a good fingerprint function. Now we introduce another fingerprint and hence a new communication protocol.

The new fingerprint function we design is as follows: by treating the input string $x\in\{0,1\}^n$ as the binary representation of a number, let $\mathrm{FING}(x)=x\bmod p$ for some random prime $p$. The prime $p$ uniquely specifies a random fingerprint function $\mathrm{FING}(\cdot)$, thus it can be used as a description of the function. Also, the range of the fingerprints is $[p]=\{0,1,\ldots,p-1\}$, thus we want the prime $p$ to be reasonably small, while still having a good chance of distinguishing different $x$ and $y$ after reduction modulo $p$.

A randomized protocol for EQ

Alice does:

for some parameter $k$ (to be specified),
  • choose uniformly at random a prime $p\in[k]$;
  • send $p$ and $x\bmod p$ to Bob;

Upon receiving $p$ and $x\bmod p$, Bob does:

  • check whether $x\bmod p=y\bmod p$.

The number of bits to be communicated is $O(\log k)$. We then bound the probability of error $\Pr[x\bmod p=y\bmod p]$ for $x\neq y$, in terms of $k$.

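Suppose without loss of generality that $x>y$. Let $z=x-y$. Then $z<2^n$ since $x,y\in[2^n]$, and $z\neq 0$ for $x\neq y$. It holds that $x\equiv y\pmod p$ if and only if $z$ is divisible by $p$. We only need to bound the probability

$$\Pr[z\bmod p=0]$$

for $0<z<2^n$, where $p$ is a random prime chosen from $[k]$.

The probability $\Pr[z\bmod p=0]$ is computed directly as

$$\Pr[z\bmod p=0]\le\frac{\text{the number of prime divisors of }z}{\text{the number of primes in }[k]}.$$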

For the numerator, we have the following lemma.

Lemma
The number of distinct prime divisors of any natural number less than $2^n$ is at most $n$.
Proof.
Each prime number is $\ge 2$. If an $N>0$ has more than $n$ distinct prime divisors, then $N\ge 2^{n+1}>2^n$.

Due to this lemma, $z$ has at most $n$ prime divisors.

We then lower bound the number of primes in $[k]$. This is given by the celebrated Prime Number Theorem (PNT).

Prime Number Theorem
Let $\pi(k)$ denote the number of primes less than $k$. Then $\pi(k)\sim\frac{k}{\ln k}$ as $k\to\infty$.

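Therefore, by choosing $k=tn\ln(tn)$ for some $t$, we have that for any $0<z<2^n$, and a random prime $p\in[k]$,

$$\Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{t}.$$

By taking $t$ polynomial in $n$, we can make this error probability polynomially small while the number of bits to be communicated is still $O(\log k)=O(\log n)$.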
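
A minimal simulation of this protocol is sketched below. It assumes the inputs are given as bit strings such as `"1011"`, enumerates the primes below $k$ with a simple sieve, and uses $t=1000$ as an arbitrary default; the names `primes_below` and `eq_by_random_prime` are hypothetical.

```python
import math
import random

def primes_below(k):
    """Sieve of Eratosthenes: all primes less than k (for k >= 2)."""
    is_prime = [True] * k
    is_prime[0:2] = [False, False]
    for i in range(2, k):
        if is_prime[i] and i * i < k:
            for j in range(i * i, k, i):
                is_prime[j] = False
    return [i for i in range(k) if is_prime[i]]

def eq_by_random_prime(x_bits, y_bits, t=1000, rng=random):
    """True means "yes": the fingerprints x mod p and y mod p agree."""
    n = len(x_bits)
    k = max(3, math.ceil(t * n * math.log(t * n)))   # k = t n ln(t n)
    p = rng.choice(primes_below(k))                  # Alice's random prime
    x, y = int(x_bits, 2), int(y_bits, 2)            # strings read as binary numbers
    # Alice sends (p, x mod p); Bob compares with y mod p.
    return x % p == y % p

print(eq_by_random_prime("1000", "1000"))   # always True
print(eq_by_random_prime("1000", "0001"))   # False unless p = 7 is drawn
```

In the second call $z=8-1=7$, whose only prime divisor is 7, so the protocol errs exactly when the random prime happens to be 7.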

Randomized pattern matching

Consider the following problem of pattern matching, which has nothing to do with communication complexity.

  • Input: a string $x\in\{0,1\}^n$ and a "pattern" $y\in\{0,1\}^m$.
  • Determine whether the pattern $y$ is a contiguous substring of $x$. Usually, we are also asked to find the location of the substring.

A naive algorithm trying every possible match runs in $O(nm)$ time. The more sophisticated KMP algorithm, inspired by automata theory, runs in $O(n+m)$ time.

A simple randomized algorithm, due to Karp and Rabin, uses the idea of fingerprinting and also runs in $O(n+m)$ time.

Let $X(j)=x_jx_{j+1}\cdots x_{j+m-1}$ denote the substring of $x$ of length $m$ starting at position $j$.

Algorithm (Karp-Rabin)
pick a random prime $p\in[k]$;
for $j=1$ to $n-m+1$ do
  if $X(j)\bmod p=y\bmod p$ then report a match;
return "no match";

So the algorithm just compares $\mathrm{FING}(X(j))$ and $\mathrm{FING}(y)$ for every $j$, with the same definition of the fingerprint function $\mathrm{FING}(\cdot)$ as in the communication protocol for EQ.

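By the same analysis, by choosing $k=n^2m\ln(n^2m)$, the probability of a single false match is

$$\Pr[X(j)\bmod p=y\bmod p\mid X(j)\neq y]=O\left(\frac{1}{n^2}\right).$$

By the union bound, the probability that a false match occurs is $O\left(\frac{1}{n}\right)$.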

The algorithm runs in linear time if we assume that we can compute $X(j)\bmod p$ for each $j$ in constant time. This outrageous assumption can be made realistic by the following observation.

Lemma
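Let $\mathrm{FING}(a)=a\bmod p$. Then
$$\mathrm{FING}(X(j+1))\equiv 2\left(\mathrm{FING}(X(j))-2^{m-1}x_j\right)+x_{j+m}\pmod p.$$
Proof.
It holds that
$$X(j+1)=2\left(X(j)-2^{m-1}x_j\right)+x_{j+m}.$$

Taking both sides modulo $p$ gives the claimed congruence.

Due to this lemma, each fingerprint $\mathrm{FING}(X(j))$ can be computed incrementally from the previous one in constant time. The running time of the algorithm is $O(n+m)$.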
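
Putting the pieces together, a sketch of the algorithm with the incremental fingerprint update could look as follows. It assumes bit-string inputs and 0-based positions, draws the prime by rejection sampling with trial division, and reports all positions whose fingerprints agree; the names `is_prime`, `random_prime`, and `karp_rabin` are hypothetical.

```python
import math
import random

def is_prime(q):
    """Trial division; adequate for primes of size poly(n)."""
    return q >= 2 and all(q % i for i in range(2, math.isqrt(q) + 1))

def random_prime(k, rng=random):
    """Uniform random prime in {2, ..., k-1}, by rejection sampling."""
    while True:
        q = rng.randrange(2, k)
        if is_prime(q):
            return q

def karp_rabin(x, y, p):
    """0-based positions j where x[j:j+m] and y have equal fingerprints mod p."""
    n, m = len(x), len(y)
    if m > n:
        return []
    top = pow(2, m - 1, p)            # 2^(m-1) mod p
    fy = int(y, 2) % p                # FING(y)
    fx = int(x[:m], 2) % p            # FING of the first window
    hits = []
    for j in range(n - m + 1):
        if fx == fy:
            hits.append(j)
        if j + m < n:
            # FING(X(j+1)) = 2 (FING(X(j)) - 2^(m-1) x_j) + x_(j+m)  (mod p)
            fx = (2 * (fx - top * int(x[j])) + int(x[j + m])) % p
    return hits

x, y = "0110100110", "101"
n, m = len(x), len(y)
k = math.ceil(n * n * m * math.log(n * n * m))   # k = n^2 m ln(n^2 m)
print(karp_rabin(x, y, random_prime(k)))         # [2] unless a false match occurs
```

A reported position may be a false match; comparing the window against the pattern directly at each reported position would remove the error at the cost of $O(m)$ extra time per report.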