Advanced Algorithms (Fall 2023)/Fingerprinting

Checking Matrix Multiplication

[Figure: the evolution of the time complexity $O(n^\omega)$ for matrix multiplication.]

Let $\mathbb{F}$ be a field (you may think of it as the field $\mathbb{Q}$ of rational numbers, or the finite field $\mathbb{Z}_p$ of integers modulo a prime $p$). We suppose that each field operation (addition, subtraction, multiplication, division) has unit cost. This model is called the unit-cost RAM model, which is an ideal abstraction of a computer.

Consider the following problem:

  • Input: three $n\times n$ matrices $A$, $B$, and $C$ over the field $\mathbb{F}$.
  • Output: "yes" if $C=AB$ and "no" otherwise.

A naive way to solve this is to multiply $A$ and $B$ and compare the result with $C$. The straightforward algorithm for matrix multiplication takes $O(n^3)$ time, assuming that each arithmetic operation takes unit time. Strassen's algorithm, discovered in 1969 and now implemented by many numerical libraries, runs in time $O(n^{\log_2 7})\approx O(n^{2.81})$. Strassen's algorithm started the search for fast matrix multiplication algorithms. The Coppersmith–Winograd algorithm discovered in 1987 runs in time $O(n^{2.376})$, but is faster than Strassen's algorithm only on extremely large matrices due to its very large constant coefficient. It remained the best known for decades, until Stothers gave an $O(n^{2.374})$ algorithm in his PhD thesis in 2010, and independently Vassilevska Williams gave an $O(n^{2.373})$ algorithm in 2012. Both improvements are based on generalizations of the Coppersmith–Winograd algorithm. It is unknown whether matrix multiplication can be done in time $O(n^{2+o(1)})$.

Freivalds' Algorithm

The following is a very simple randomized algorithm due to Freivalds, running in $O(n^2)$ time:

Algorithm (Freivalds, 1979)
  • pick a vector $r\in\{0,1\}^n$ uniformly at random;
  • if $A(Br)=Cr$ then return "yes" else return "no";

The product $A(Br)$ is computed by first computing $Br$ and then $A(Br)$. The running time of Freivalds' algorithm is $O(n^2)$ because the algorithm computes 3 matrix-vector multiplications.
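To make the procedure concrete, here is a minimal Python sketch of one round of Freivalds' test over the integers (a stand-in for the field $\mathbb{Q}$); the function name freivalds_round and the list-of-rows matrix representation are illustrative choices, not part of the original algorithm statement.

```python
import random

def freivalds_round(A, B, C):
    """One round of Freivalds' test in O(n^2) time.
    A, B, C are n-by-n matrices given as lists of rows."""
    n = len(A)
    r = [random.randint(0, 1) for _ in range(n)]                      # r in {0,1}^n
    Br = [sum(B[i][k] * r[k] for k in range(n)) for i in range(n)]    # compute Br
    ABr = [sum(A[i][k] * Br[k] for k in range(n)) for i in range(n)]  # compute A(Br)
    Cr = [sum(C[i][k] * r[k] for k in range(n)) for i in range(n)]    # compute Cr
    return ABr == Cr   # "yes" iff A(Br) = Cr
```

Each of the three matrix-vector products costs $O(n^2)$ field operations, so the round never forms the full product $AB$.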

If $AB=C$ then $A(Br)=Cr$ for any $r\in\{0,1\}^n$, thus the algorithm will return a "yes" for any positive instance ($AB=C$). But if $AB\neq C$ then the algorithm will make a mistake if it chooses an $r$ such that $ABr=Cr$. However, the following lemma states that the probability of this event is bounded.

Lemma
If $AB\neq C$ then for a uniformly random $r\in\{0,1\}^n$,
$\Pr[ABr=Cr]\le\frac{1}{2}.$
Proof.
Let $D=AB-C$. The event $ABr=Cr$ is equivalent to $Dr=0$. It is then sufficient to show that for any $D\neq 0$, it holds that $\Pr[Dr=0]\le\frac{1}{2}$.

Since $D\neq 0$, it must have at least one non-zero entry. Suppose that $D_{ij}\neq 0$.

Suppose that the event $Dr=0$ occurs. In particular, the $i$-th entry of $Dr$ is

$(Dr)_i=\sum_{k=1}^n D_{ik}r_k=0.$

The entry $r_j$ can then be calculated as

$r_j=-\frac{1}{D_{ij}}\sum_{k\neq j} D_{ik}r_k.$

Once all other entries $r_k$ with $k\neq j$ are fixed, there is a unique solution for $r_j$. Therefore, the number of $r\in\{0,1\}^n$ satisfying $Dr=0$ is at most $2^{n-1}$. The probability that $ABr=Cr$ is bounded as

$\Pr[ABr=Cr]=\Pr[Dr=0]\le\frac{2^{n-1}}{2^n}=\frac{1}{2}.$

When $AB=C$, Freivalds' algorithm always returns "yes"; and when $AB\neq C$, Freivalds' algorithm returns "no" with probability at least $1/2$.

To improve its accuracy, we can run Freivalds' algorithm $k$ times, each time with an independent uniform $r\in\{0,1\}^n$, and return "yes" if and only if all runs return "yes".

Freivalds' Algorithm (multi-round)
  • pick $k$ vectors $r_1,r_2,\ldots,r_k\in\{0,1\}^n$ uniformly and independently at random;
  • if $A(Br_i)=Cr_i$ for all $i=1,\ldots,k$ then return "yes" else return "no";

If $AB=C$, then the algorithm returns a "yes" with probability 1. If $AB\neq C$, then due to the independence, the probability that all $r_i$ have $ABr_i=Cr_i$ is at most $2^{-k}$, so the algorithm returns "no" with probability at least $1-2^{-k}$. For any $0<\epsilon<1$, choose $k=\log_2\frac{1}{\epsilon}$. The algorithm runs in time $O\left(n^2\log_2\frac{1}{\epsilon}\right)$ and has a one-sided error (false positive) bounded by $\epsilon$.
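Amplification is a one-line wrapper around the single-round sketch above; the parameter name eps and the ceiling on $k$ are illustrative.

```python
import math

def freivalds(A, B, C, eps):
    """Multi-round Freivalds' test: one-sided error at most eps.
    Runs k = ceil(log2(1/eps)) independent rounds of freivalds_round."""
    k = math.ceil(math.log2(1 / eps))
    return all(freivalds_round(A, B, C) for _ in range(k))
```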


Polynomial Identity Testing (PIT)

Polynomial identity testing (PIT) is the following problem: given as input two polynomials, determine whether they are identical. It plays a fundamental role in identity testing problems.

First, let's consider the univariate ("one variable") case:

  • Input: two polynomials $f,g\in\mathbb{F}[x]$ of degree $d$.
  • Determine whether $f\equiv g$ ($f$ and $g$ are identical).

Here $\mathbb{F}[x]$ denotes the ring of univariate polynomials over a field $\mathbb{F}$. More precisely, a polynomial $f\in\mathbb{F}[x]$ is

$f(x)=\sum_{i=0}^{d}a_ix^i,$

where the coefficients $a_i$ are taken from the field $\mathbb{F}$, and the addition and multiplication are also defined over the field $\mathbb{F}$. And:

  • the degree of $f$ is the highest $i$ with non-zero $a_i$;
  • a polynomial $f$ is a zero-polynomial, denoted as $f\equiv 0$, if all coefficients $a_i=0$.

Alternatively, we can consider the following equivalent problem by comparing the polynomial $f-g$ (whose degree is at most $d$) with the zero-polynomial:

  • Input: a polynomial $f\in\mathbb{F}[x]$ of degree $d$.
  • Determine whether $f\equiv 0$ ($f$ is the zero polynomial).

The problem is trivial if the input polynomial $f$ is given explicitly: one can trivially solve the problem by checking whether all $d+1$ coefficients are $0$. To make the problem nontrivial, we assume that the input polynomial is given implicitly as a black box (also called an oracle): the only way the algorithm can access $f$ is to evaluate $f(x)$ at points $x$ from the field $\mathbb{F}$ chosen by the algorithm.

A straightforward deterministic algorithm is to evaluate $f(x_1),f(x_2),\ldots,f(x_{d+1})$ at $d+1$ distinct elements $x_1,x_2,\ldots,x_{d+1}$ from the field $\mathbb{F}$ and check whether they are all zero. By the fundamental theorem of algebra (equivalently, by polynomial interpolation), this is guaranteed to verify whether a degree-$d$ univariate polynomial $f\equiv 0$.

Fundamental Theorem of Algebra
Any non-zero univariate polynomial of degree $d$ has at most $d$ roots.

The reason this fundamental theorem holds over any field $\mathbb{F}$ is that any univariate polynomial of degree $d$ factors uniquely into at most $d$ irreducible polynomials, each of which has at most one root.

The following simple randomized algorithm is natural:

Algorithm for PIT
  • suppose we have a finite subset $S\subseteq\mathbb{F}$ (to be specified later);
  • pick $r\in S$ uniformly at random;
  • if $f(r)=0$ then return "yes" else return "no";

This algorithm evaluates $f$ at one point chosen uniformly at random from a finite subset $S\subseteq\mathbb{F}$. It is easy to see the following:

  • If $f\equiv 0$, the algorithm always returns "yes", so it is always correct.
  • If $f\not\equiv 0$, the algorithm may wrongly return "yes" (a false positive). But this happens only when the random $r$ is a root of $f$. By the fundamental theorem of algebra, $f$ has at most $d$ roots, so the probability that the algorithm is wrong is bounded as
$\Pr[f(r)=0]\le\frac{d}{|S|}.$

By fixing $S\subseteq\mathbb{F}$ to be an arbitrary subset of size $|S|=2d$, this probability of a false positive is at most $1/2$. We can reduce it to an arbitrarily small constant $\delta$ by repeating the above test independently $\log_2\frac{1}{\delta}$ times, since the error probability decays geometrically as we repeat the algorithm independently.
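A minimal sketch of this tester, assuming the black-box polynomial is given as a Python callable; taking $S=\{0,1,\ldots,2d-1\}$ makes the false-positive probability at most $1/2$ per trial.

```python
import random

def pit_univariate(f, d, trials=1):
    """Test whether the degree-d black-box polynomial f is identically zero.
    Each trial errs (false positive) with probability at most d/|S| = 1/2."""
    S = range(2 * d)                      # any subset of F of size 2d works
    return all(f(random.choice(S)) == 0 for _ in range(trials))

# Example: (x-1)(x-2) - (x^2 - 3x + 2) is identically zero.
f = lambda x: (x - 1) * (x - 2) - (x**2 - 3 * x + 2)
print(pit_univariate(f, d=2, trials=10))   # always True
```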

Communication Complexity of Equality

Communication complexity was introduced by Andrew Chi-Chih Yao as a model of computation with more than one entity, each with partial information about the input.

Assume that there are two entities, say Alice and Bob. Alice has a private input a and Bob has a private input b. Together they want to compute a function f(a,b) by communicating with each other. The communication follows a predefined communication protocol (the "algorithm" in this model). The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.

The problem of checking identity is formally defined by the function EQ as follows: $\mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$, and for any $a,b\in\{0,1\}^n$,

$\mathrm{EQ}(a,b)=\begin{cases}1 & \text{if } a=b,\\ 0 & \text{otherwise.}\end{cases}$

A trivial way to solve EQ is to let Bob send his entire input string $b$ to Alice and let Alice check whether $a=b$. This costs $n$ bits of communication.

It is known that for deterministic communication protocols, this is the best we can get for computing EQ.

Theorem (Yao 1979)
Any deterministic communication protocol computing EQ on two $n$-bit strings costs $n$ bits of communication in the worst case.

This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem is in Yao's celebrated paper in 1979 with a humble title. It pioneered the field of communication complexity.

If we allow randomness in protocols, and also tolerate a small probabilistic error, the problem can be solved with significantly less communication. To present this randomized protocol, we need a few preparations:

  • We represent the inputs $a,b\in\{0,1\}^n$ of Alice and Bob as two univariate polynomials of degree at most $n-1$, respectively
$f(x)=\sum_{i=0}^{n-1}a_ix^i$ and $g(x)=\sum_{i=0}^{n-1}b_ix^i.$
  • The two polynomials $f$ and $g$ are defined over the finite field $\mathbb{Z}_p=\{0,1,\ldots,p-1\}$ for some suitable prime $p$ (to be specified later), which means the additions and multiplications are modulo $p$.

The randomized communication protocol is then as follows:

A randomized protocol for EQ

Bob does:

  • pick $r\in\mathbb{Z}_p$ uniformly at random;
  • send $r$ and $g(r)$ to Alice;

Upon receiving $r$ and $g(r)$, Alice does:

  • compute $f(r)$;
  • if $f(r)=g(r)$ then return "yes" else return "no".

The communication complexity of the protocol is given by the number of bits used to represent the values of $r$ and $g(r)$. Since the polynomials are defined over the finite field $\mathbb{Z}_p$ and the random number $r$ is also chosen from $\mathbb{Z}_p$, this is bounded by $O(\log p)$.

On the other hand, the protocol makes a mistake only when $a\neq b$ but it wrongly answers "yes". This happens only when $f\not\equiv g$ but $f(r)=g(r)$. Since the degrees of $f,g$ are at most $n-1$, and $r$ is chosen among $p$ distinct values, we have

$\Pr[f(r)=g(r)]\le\frac{n-1}{p}.$

By choosing $p$ to be a prime in the interval $[n^2,2n^2]$ (by Chebyshev's theorem, such a prime $p$ always exists), the above randomized communication protocol solves the equality function EQ with a false-positive error probability of at most $O(1/n)$, with communication complexity $O(\log n)$, an EXPONENTIAL improvement over ANY deterministic communication protocol!
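The following Python sketch simulates the protocol end to end, assuming $n\ge 2$ so the interval $[n^2,2n^2]$ contains a prime; the prime is found by brute-force trial division purely for illustration, and the helper names are ours.

```python
import random

def find_prime(lo, hi):
    """Return some prime in [lo, hi) by trial division (illustration only)."""
    for m in range(lo, hi):
        if m > 1 and all(m % i for i in range(2, int(m**0.5) + 1)):
            return m

def eval_poly(bits, r, p):
    """Evaluate f(r) = sum_i bits[i] * r^i over Z_p, via Horner's rule."""
    v = 0
    for b in reversed(bits):
        v = (v * r + b) % p
    return v

def eq_protocol(a, b):
    """Randomized EQ protocol: Bob sends (r, g(r)); Alice compares with f(r)."""
    n = len(a)
    p = find_prime(n * n, 2 * n * n)   # a prime in [n^2, 2n^2]
    r = random.randrange(p)            # Bob's uniform random point in Z_p
    return eval_poly(a, r, p) == eval_poly(b, r, p)
```

For $a\neq b$ the two degree-$(n-1)$ polynomials agree on at most $n-1$ of the $p\ge n^2$ points, matching the $\frac{n-1}{p}\le\frac{1}{n}$ error bound above.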

Schwartz-Zippel Theorem

Now let's see the true form of polynomial identity testing (PIT), for multivariate polynomials:

  • Input: two $n$-variate polynomials $f,g\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ of degree $d$.
  • Determine whether $f\equiv g$.

$\mathbb{F}[x_1,x_2,\ldots,x_n]$ is the ring of multivariate polynomials over the field $\mathbb{F}$. An $n$-variate polynomial of degree $d$, written as a sum of monomials, is:

$f(x_1,x_2,\ldots,x_n)=\sum_{\substack{i_1,i_2,\ldots,i_n\ge 0\\ i_1+i_2+\cdots+i_n\le d}}a_{i_1,i_2,\ldots,i_n}x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}.$

The degree or total degree of a monomial $a_{i_1,i_2,\ldots,i_n}x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}$ is $i_1+i_2+\cdots+i_n$, and the degree of a polynomial $f$ is the maximum degree among its monomials with nonzero coefficients.

As before, we also consider the following equivalent problem:

  • Input: a polynomial $f\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ of degree $d$.
  • Determine whether $f\equiv 0$.

If $f$ is written explicitly as a sum of monomials, then the problem can be solved by checking whether all coefficients are $0$, and there are at most $\binom{n+d}{d}\le(n+d)^d$ coefficients in an $n$-variate polynomial of degree at most $d$.

A multivariate polynomial f can also be presented in its product form, for example:

Example

The Vandermonde matrix $M=M(x_1,x_2,\ldots,x_n)$ is defined by $M_{ij}=x_i^{j-1}$, that is

$M=\begin{bmatrix}1&x_1&x_1^2&\cdots&x_1^{n-1}\\1&x_2&x_2^2&\cdots&x_2^{n-1}\\1&x_3&x_3^2&\cdots&x_3^{n-1}\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&x_n&x_n^2&\cdots&x_n^{n-1}\end{bmatrix}.$

Let $f$ be the polynomial defined as

$f(x_1,\ldots,x_n)=\det(M)=\prod_{j<i}(x_i-x_j).$

For polynomials in product form, it is quite efficient to evaluate the polynomial at any specific point from the field over which the polynomial is defined; however, expanding the polynomial into a sum of monomials can be very expensive.
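For instance, the Vandermonde determinant above has $n!$ monomials when expanded, yet its product form can be evaluated at a concrete point in $O(n^2)$ field operations, as this small sketch (over the integers, for illustration) shows:

```python
def vandermonde_det(xs):
    """Evaluate det(M) = prod_{j<i} (x_i - x_j) at a point, in O(n^2) steps."""
    det = 1
    for i in range(len(xs)):
        for j in range(i):
            det *= xs[i] - xs[j]
    return det

print(vandermonde_det([1, 2, 4]))   # (2-1)(4-1)(4-2) = 6
```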

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Randomized algorithm for multivariate PIT
  • suppose we have a finite subset $S\subseteq\mathbb{F}$ (to be specified later);
  • pick $r_1,r_2,\ldots,r_n\in S$ uniformly and independently at random;
  • if $f(\vec{r})=f(r_1,r_2,\ldots,r_n)=0$ then return "yes" else return "no";

This algorithm evaluates $f$ at one point chosen uniformly from an $n$-dimensional cube $S^n$, where $S\subseteq\mathbb{F}$ is a finite subset. And:

  • If $f\equiv 0$, the algorithm always returns "yes", so it is always correct.
  • If $f\not\equiv 0$, the algorithm may wrongly return "yes" (a false positive). But this happens only when the random $\vec{r}=(r_1,r_2,\ldots,r_n)$ is a root of $f$. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
Schwartz-Zippel Theorem
Let $f\in\mathbb{F}[x_1,x_2,\ldots,x_n]$ be a multivariate polynomial of degree $d$ over a field $\mathbb{F}$. If $f\not\equiv 0$, then for any finite set $S\subseteq\mathbb{F}$, and $r_1,r_2,\ldots,r_n\in S$ chosen uniformly and independently at random,
$\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.$

The Schwartz-Zippel theorem states that for any nonzero $n$-variate polynomial of degree at most $d$, the number of roots in any cube $S^n$ is at most $d|S|^{n-1}$.
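The tester itself is a direct generalization of the univariate sketch; again we assume the polynomial is given as a black-box Python callable, and the identity used in the example is ours.

```python
import random

def pit_multivariate(f, n, d, S):
    """Schwartz-Zippel test: evaluate the degree-d, n-variate black box f
    at a uniform random point of S^n; errs with probability <= d/|S|."""
    point = [random.choice(S) for _ in range(n)]
    return f(*point) == 0

# Example: (x+y)^2 - x^2 - 2xy - y^2 is identically zero.
f = lambda x, y: (x + y)**2 - x**2 - 2*x*y - y**2
print(pit_multivariate(f, n=2, d=2, S=range(8)))   # always True
```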

Dana Moshkovitz gave a surprisingly simple and elegant proof of the Schwartz-Zippel theorem, using some advanced ideas. Here we present the standard proof by induction.

Proof.

The theorem is proved by induction on n.

Induction basis:

For $n=1$, this is the univariate case. Assume that $f\not\equiv 0$. Due to the fundamental theorem of algebra, any non-zero polynomial $f(x)$ of degree at most $d$ has at most $d$ roots, thus

$\Pr[f(r)=0]\le\frac{d}{|S|}.$
Induction hypothesis:

Assume the theorem holds for $m$-variate polynomials for all $m<n$.

Induction step:

For any $n$-variate polynomial $f(x_1,x_2,\ldots,x_n)$ of degree at most $d$, we write $f$ as

$f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}),$

where $k$ is the highest degree of $x_n$ in $f$, which means the degree of $f_k$ is at most $d-k$ and $f_k\not\equiv 0$.

In particular, we write $f$ as a sum of two parts:

$f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n),$

where both $f_k$ and $\bar{f}$ are polynomials, such that

  • $f_k\not\equiv 0$ is as above, whose degree is at most $d-k$;
  • $\bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1})$, thus $\bar{f}(x_1,x_2,\ldots,x_n)$ has no $x_n^k$ factor in any term.

By the law of total probability, we have

$\begin{aligned}\Pr[f(r_1,r_2,\ldots,r_n)=0]=&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\&+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq 0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq 0].\end{aligned}$

Note that $f_k(r_1,r_2,\ldots,r_{n-1})$ is a polynomial in $n-1$ variables of degree $d-k$ such that $f_k\not\equiv 0$. By the induction hypothesis, we have

$(*)\qquad \Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.$

Now we look at the case conditioning on $f_k(r_1,r_2,\ldots,r_{n-1})\neq 0$. Recall that $\bar{f}(x_1,\ldots,x_n)$ has no $x_n^k$ factor in any term, thus the condition $f_k(r_1,r_2,\ldots,r_{n-1})\neq 0$ guarantees that

$f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n)$

is a non-zero univariate polynomial in $x_n$ of degree $k$, for which we already know that the probability of $g_{r_1,\ldots,r_{n-1}}(r_n)=0$ is at most $\frac{k}{|S|}$. Therefore,

$(**)\qquad \Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq 0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq 0]\le\frac{k}{|S|}.$

Substituting both $(*)$ and $(**)$ back into the law of total probability, we have

$\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|},$

which proves the theorem.

Fingerprinting

The polynomial identity testing algorithm based on the Schwartz-Zippel theorem can be abstracted as the following framework: suppose we want to compare two objects X and Y. Instead of comparing them directly, we compute random fingerprints FING(X) and FING(Y) and compare the fingerprints.

The fingerprints have the following properties:

  • FING() is a function, meaning that if X=Y then FING(X)=FING(Y).
  • It is much easier to compute and compare the fingerprints.
  • Ideally, the domain of fingerprints is much smaller than the domain of original objects, so storing and comparing fingerprints is easy. This means the fingerprint function FING() cannot be an injection (one-to-one mapping), so it is possible that different X and Y are mapped to the same fingerprint. We resolve this by making the fingerprint function randomized, and for $X\neq Y$, we want the probability $\Pr[\mathrm{FING}(X)=\mathrm{FING}(Y)]$ to be small.

In the Schwartz-Zippel theorem, the objects to compare are polynomials from $\mathbb{F}[x_1,\ldots,x_n]$. Given a polynomial $f\in\mathbb{F}[x_1,\ldots,x_n]$, its fingerprint is computed as $\mathrm{FING}(f)=f(r_1,\ldots,r_n)$ for $r_i$ chosen independently and uniformly at random from some fixed set $S\subseteq\mathbb{F}$.

With this generic framework, for various identity testing problems, we may design different fingerprints FING().

Communication protocols for Equality by fingerprinting

Now consider again the communication model where the two players, Alice with a private input $x\in\{0,1\}^n$ and Bob with a private input $y\in\{0,1\}^n$, together compute a function $f(x,y)$ by running a communication protocol.

We still consider the communication protocols for the equality function EQ:

$\mathrm{EQ}(x,y)=\begin{cases}1 & \text{if } x=y,\\ 0 & \text{otherwise.}\end{cases}$

With the language of fingerprinting, this communication problem can be solved by the following generic scheme:

Communication protocol for EQ by fingerprinting

Bob does:

  • choose a random fingerprint function FING() and compute the fingerprint FING(y) of his input;
  • send both the description of FING() and the value of FING(y) to Alice;

Upon receiving the description of FING() and the value of FING(y), Alice does:

  • compute FING(x) and check whether FING(x)=FING(y).

In this way we have a randomized communication protocol for the equality function EQ with false positives. The communication cost as well as the error probability are reduced to the question of how to design the random fingerprint function FING() to guarantee:

  1. A random fingerprint function FING() can be described succinctly.
  2. The range of FING() is small, so the fingerprints are succinct.
  3. If $x\neq y$, the probability $\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]$ is small.

Fingerprinting by PIT

As before, we can define the fingerprint function as follows: for any bit-string $x\in\{0,1\}^n$, its random fingerprint is $\mathrm{FING}(x)=\sum_{i=1}^n x_ir^{i-1}$, where the additions and multiplications are defined over a finite field $\mathbb{Z}_p$, and $r$ is chosen uniformly at random from $\mathbb{Z}_p$, where $p$ is some suitable prime which can be represented in $\Theta(\log n)$ bits. More specifically, we can choose $p$ to be any prime from the interval $[n^2,2n^2]$. Due to Chebyshev's theorem, such a prime must exist.

As we have shown before, it takes $O(\log p)=O(\log n)$ bits to represent $\mathrm{FING}(y)$ and to describe the random function FING() (since a random function FING() from this family is uniquely identified by a random $r\in\mathbb{Z}_p$, which can be represented within $\log p=O(\log n)$ bits). And it follows easily from the fundamental theorem of algebra that for any distinct $x,y\in\{0,1\}^n$,

$\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]\le\frac{n-1}{p}\le\frac{1}{n}.$

Fingerprinting by randomized checksum

Now we consider a new fingerprint function: we treat each input string $x\in\{0,1\}^n$ as the binary representation of a number, and let $\mathrm{FING}(x)=x\bmod p$ for some random prime $p$ chosen from $[k]=\{0,1,\ldots,k-1\}$, for some $k$ to be specified later.

Now a random fingerprint function FING() can be uniquely identified by this random prime p. The new communication protocol for EQ with this fingerprint is as follows:

Communication protocol for EQ by random checksum

Bob does:

for some parameter k (to be specified),
  • choose a prime $p\in[k]$ uniformly at random;
  • send $p$ and $x\bmod p$ to Alice;

Upon receiving $p$ and $x\bmod p$, Alice does:

  • check whether $x\bmod p=y\bmod p$.

The number of bits to be communicated is obviously $O(\log k)$. When $x\neq y$, we want to upper bound the error probability $\Pr[x\bmod p=y\bmod p]$.

Suppose without loss of generality that $x>y$. Let $z=x-y$. Then $z<2^n$ since $x,y\in[2^n]$, and $z\neq 0$ since $x\neq y$. It holds that $x\equiv y\pmod p$ if and only if $p\mid z$. Therefore, we only need to upper bound the probability

$\Pr[z\bmod p=0]$

for an arbitrarily fixed $0<z<2^n$, and a uniform random prime $p\in[k]$.

The probability $\Pr[z\bmod p=0]$ is bounded directly as

$\Pr[z\bmod p=0]\le\frac{\text{the number of prime divisors of }z}{\text{the number of primes in }[k]}.$

For the numerator, any positive $z<2^n$ has at most $n$ prime factors. To see this, assume by contradiction that $z$ has more than $n$ prime factors. Note that any prime number is at least $2$. Then $z$ must be at least $2^{n+1}>2^n$, contradicting the fact that $z<2^n$.

For the denominator, we need to lower bound the number of primes in [k]. This is given by the celebrated Prime Number Theorem (PNT).

Prime Number Theorem
Let $\pi(k)$ denote the number of primes less than $k$. Then $\pi(k)\sim\frac{k}{\ln k}$ as $k\to\infty$.

Therefore, by choosing $k=2n^2\ln n$, we have that for a $0<z<2^n$ and a random prime $p\in[k]$,

$\Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{n},$

which means that for any inputs $x,y\in\{0,1\}^n$, if $x\neq y$, then the probability of a false positive is bounded as

$\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]\le\Pr[|x-y|\bmod p=0]\le\frac{1}{n}.$

Moreover, by this choice of parameter $k=2n^2\ln n$, the communication complexity of the protocol is bounded by $O(\log k)=O(\log n)$.
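Here is a self-contained Python sketch of the checksum protocol, assuming $x$ and $y$ are given as Python integers with at most $n$ bits and $n\ge 2$; rejection sampling a uniform number from $[k]$ until it is prime yields a uniform random prime in $[k]$.

```python
import math
import random

def is_prime(m):
    """Trial-division primality test (sufficient for a sketch)."""
    if m < 2:
        return False
    return all(m % i for i in range(2, int(m**0.5) + 1))

def eq_by_checksum(x, y, n):
    """Randomized EQ: Bob sends (p, x mod p); Alice compares with y mod p.
    Communication is O(log k) = O(log n) bits; false positives <= 1/n."""
    k = int(2 * n * n * math.log(n))   # k = 2 n^2 ln n
    while True:                        # uniform random prime p in [k]
        p = random.randrange(2, k)
        if is_prime(p):
            break
    return x % p == y % p
```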

Checking distinctness

Consider the following problem of checking distinctness:

  • Given a sequence $x_1,x_2,\ldots,x_n\in\{1,2,\ldots,n\}$, check whether every element of $\{1,2,\ldots,n\}$ appears exactly once.

Obviously this problem can be solved in linear time and linear space (in addition to the space for storing the input) by maintaining an $n$-bit vector that indicates which numbers among $\{1,2,\ldots,n\}$ have appeared.

When $n$ is enormously large, the $\Omega(n)$ space cost is too expensive. We wonder whether we can solve this problem with a space cost (in addition to the space for storing the input) much less than $O(n)$. This can be done by fingerprinting if we tolerate a certain degree of inaccuracy.

Fingerprinting multisets

We consider the following more general problem of checking the identity of multisets:

  • Input: two multisets $A=\{a_1,a_2,\ldots,a_n\}$ and $B=\{b_1,b_2,\ldots,b_n\}$ where $a_1,a_2,\ldots,a_n,b_1,b_2,\ldots,b_n\in\{1,2,\ldots,n\}$.
  • Determine whether $A=B$ (multiset equivalence).

Here for a multiset $A=\{a_1,a_2,\ldots,a_n\}$, its elements $a_i$ are not necessarily distinct. The multiplicity of an element $a_i$ in a multiset $A$ is the number of times $a_i$ appears in $A$. Two multisets $A$ and $B$ are equivalent if they contain the same set of elements and the multiplicities of every element in $A$ and $B$ are equal.

Obviously the above problem of checking distinctness can be treated as a special case of checking identity of multisets: namely, checking the identity of the multiset $A$ and the set $\{1,2,\ldots,n\}$.

The following fingerprinting function for multisets was introduced by Lipton for solving multiset identity testing.

Fingerprint for multiset
Let $p$ be a uniform random prime chosen from the interval $[(n\log n)^2,2(n\log n)^2]$. By Chebyshev's theorem, such a prime must exist. Consider the finite field $\mathbb{Z}_p=[p]$.
Given a multiset $A=\{a_1,a_2,\ldots,a_n\}$, we define a univariate polynomial $f_A\in\mathbb{Z}_p[x]$ over the finite field $\mathbb{Z}_p$ as follows:
$f_A(x)=\prod_{i=1}^n(x-a_i),$
where $+$ and $\cdot$ are defined over the finite field $\mathbb{Z}_p$.
We then define the random fingerprint function as:
$\mathrm{FING}(A)=f_A(r)=\prod_{i=1}^n(r-a_i),$
where $r$ is chosen uniformly at random from $\mathbb{Z}_p$.

Since all computations of $\mathrm{FING}(A)=\prod_{i=1}^n(r-a_i)$ are over the finite field $\mathbb{Z}_p$, the space cost for computing the fingerprint $\mathrm{FING}(A)$ is only $O(\log p)=O(\log n)$.

Moreover, the fingerprint function $\mathrm{FING}(A)$ is invariant under permutations of the elements of the multiset $A=\{a_1,a_2,\ldots,a_n\}$, thus it is indeed a function of multisets (meaning every multiset has exactly one fingerprint). Therefore, if $A=B$ then $\mathrm{FING}(A)=\mathrm{FING}(B)$.
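A minimal sketch of Lipton's fingerprint in Python; the pair $(p,r)$ must be shared between the two fingerprints being compared, and we assume a suitable random prime $p$ from $[(n\log n)^2,2(n\log n)^2]$ has already been picked (the concrete value below is only a stand-in).

```python
import random

def multiset_fingerprint(A, p, r):
    """Lipton's fingerprint FING(A) = prod_i (r - a_i) mod p,
    computed with O(log p) bits of working memory."""
    fing = 1
    for a in A:
        fing = fing * (r - a) % p
    return fing

# Compare two multisets using the same random (p, r):
p = 101                      # stands in for a random prime from the interval
r = random.randrange(p)
print(multiset_fingerprint([1, 2, 2, 3], p, r) ==
      multiset_fingerprint([2, 1, 3, 2], p, r))   # True: same multiset
```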

For two distinct multisets $A\neq B$, it is possible that $\mathrm{FING}(A)=\mathrm{FING}(B)$, but the following theorem due to Lipton bounds this error probability of a false positive.

Theorem (Lipton 1989)
Let $A=\{a_1,a_2,\ldots,a_n\}$ and $B=\{b_1,b_2,\ldots,b_n\}$ be two multisets whose elements are from $\{1,2,\ldots,n\}$. If $A\neq B$, then
$\Pr[\mathrm{FING}(A)=\mathrm{FING}(B)]=O\left(\frac{1}{n}\right).$
Proof.

Let $\tilde{f}_A(x)=\prod_{i=1}^n(x-a_i)$ and $\tilde{f}_B(x)=\prod_{i=1}^n(x-b_i)$ be two univariate polynomials defined over the reals $\mathbb{R}$. Note that in contrast to $f_A(x)$ and $f_B(x)$, the $+$ and $\cdot$ in $\tilde{f}_A(x),\tilde{f}_B(x)$ are not modulo $p$. It is easy to verify that the polynomials $\tilde{f}_A(x),\tilde{f}_B(x)$ have the following properties:

  • $\tilde{f}_A\equiv\tilde{f}_B$ if and only if $A=B$. Here $A=B$ means multiset equivalence.
  • By the properties of finite fields, for any value $r\in\mathbb{Z}_p$, it holds that $f_A(r)=\tilde{f}_A(r)\bmod p$ and $f_B(r)=\tilde{f}_B(r)\bmod p$.

Therefore, assuming that $A\neq B$, we must have $\tilde{f}_A\not\equiv\tilde{f}_B$. Then by the law of total probability:

$\begin{aligned}\Pr[\mathrm{FING}(A)=\mathrm{FING}(B)]=&\Pr[f_A(r)=f_B(r)\mid f_A\not\equiv f_B]\cdot\Pr[f_A\not\equiv f_B]\\&+\Pr[f_A(r)=f_B(r)\mid f_A\equiv f_B]\cdot\Pr[f_A\equiv f_B]\\ \le&\Pr[f_A(r)=f_B(r)\mid f_A\not\equiv f_B]+\Pr[f_A\equiv f_B].\end{aligned}$

Note that the degrees of $f_A,f_B$ are at most $n$ and $r$ is chosen uniformly from $[p]$. By the Schwartz-Zippel theorem for univariate polynomials, the first probability satisfies

$\Pr[f_A(r)=f_B(r)\mid f_A\not\equiv f_B]\le\frac{n}{p}=o\left(\frac{1}{n}\right),$

since $p$ is chosen from the interval $[(n\log n)^2,2(n\log n)^2]$.

For the second probability $\Pr[f_A\equiv f_B]$, recall that $\tilde{f}_A\not\equiv\tilde{f}_B$, therefore there is at least one non-zero coefficient $c\le n^n$ in $\tilde{f}_A-\tilde{f}_B$. The event $f_A\equiv f_B$ occurs only if $c\bmod p=0$, which means

$\Pr[f_A\equiv f_B]\le\Pr[c\bmod p=0]\le\frac{\text{number of prime factors of }c}{\text{number of primes in }[(n\log n)^2,2(n\log n)^2]}\le\frac{n\log_2 n}{\pi(2(n\log n)^2)-\pi((n\log n)^2)}.$

By the prime number theorem, $\pi(N)\sim\frac{N}{\ln N}$ as $N\to\infty$. Therefore,

$\Pr[f_A\equiv f_B]=O\left(\frac{n\log n}{n^2\log n}\right)=O\left(\frac{1}{n}\right).$

Combining everything together, we have $\Pr[\mathrm{FING}(A)=\mathrm{FING}(B)]=O\left(\frac{1}{n}\right)$.