高级算法 (Fall 2023)/Fingerprinting: Difference between revisions
Line 245: | Line 245: | ||
which proves the theorem. | which proves the theorem. | ||
}} | }} | ||
== Detecting perfect matching == | |||
TBA | |||
;Edmonds matrix | |||
= Fingerprinting = | = Fingerprinting = |
Latest revision as of 12:21, 19 September 2023
Checking Matrix Multiplication

Let
Consider the following problem:
- Input: Three
matrices , , and over the field . - Output: "yes" if
and "no" if otherwise.
A naive way to solve this is to multiply
Freivalds Algorithm
The following is a very simple randomized algorithm due to Freivalds, running in
Algorithm (Freivalds, 1979) - pick a vector
uniformly at random; - if
then return "yes" else return "no";
- pick a vector
The product
If
Lemma - If
then for a uniformly random , .
- If
Proof. Let . The event is equivalent to that . It is then sufficient to show that for a , it holds that .Since
, it must have at least one non-zero entry. Suppose that .We assume the event that
. In particular, the -th entry of isThe
can be calculated byOnce all other entries
with are fixed, there is a unique solution of . Therefore, the number of satisfying is at most . The probability that is bounded as .
When
To improve its accuracy, we can run Freivalds algorithm for
Freivalds' Algorithm (multi-round) - pick
vectors uniformly and independently at random; - if
for all then return "yes" else return "no";
- pick
If
Polynomial Identity Testing (PIT)
The Polynomial Identity Testing (PIT) is such a problem: given as input two polynomials, determine whether they are identical. It plays a fundamental role in Identity Testing problems.
First, let's consider the univariate ("one variable") case:
- Input: two polynomials
of degree . - Determine whether
( and are identical).
Here the
,
where the coefficients
- the degree of
is the highest with non-zero ; - a polynomial
is a zero-polynomial, denoted as , if all coefficients .
Alternatively, we can consider the following equivalent problem by comparing the polynomial
- Input: a polynomial
of degree . - Determine whether
( is the 0 polynomial).
The problem is trivial if the input polynomial
A straightforward deterministic algorithm is to evaluate
Fundamental Theorem of Algebra - Any non-zero univariate polynomial of degree
has at most roots.
- Any non-zero univariate polynomial of degree
The reason for this fundamental theorem holding generally over any field
The following simple randomized algorithm is natural:
Algorithm for PIT - suppose we have a finite subset
(to be specified later); - pick
uniformly at random; - if
then return “yes” else return “no”;
- suppose we have a finite subset
This algorithm evaluates
- If
, the algorithm always returns "yes", so it is always correct. - If
, the algorithm may wrongly return "yes" (a false positive). But this happens only when the random is a root of . By the fundamental theorem of algebra, has at most roots, so the probability that the algorithm is wrong is bounded as
By fixing
Communication Complexity of Equality
The communication complexity is introduced by Andrew Chi-Chih Yao as a model of computation with more than one entities, each with partial information about the input.
Assume that there are two entities, say Alice and Bob. Alice has a private input
The problem of checking identity is formally defined by the function EQ as follows:
A trivial way to solve EQ is to let Bob send his entire input string
It is known that for deterministic communication protocols, this is the best we can get for computing EQ.
Theorem (Yao 1979) - Any deterministic communication protocol computing EQ on two
-bit strings costs bits of communication in the worst-case.
- Any deterministic communication protocol computing EQ on two
This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem is in Yao's celebrated paper in 1979 with a humble title. It pioneered the field of communication complexity.
If we allow randomness in protocols, and also tolerate a small probabilistic error, the problem can be solved with significantly less communications. To present this randomized protocol, we need a few preparations:
- We represent the inputs
of Alice and Bob as two univariate polynomials of degree at most , respectively
and .
- The two polynomials
and are defined over finite field for some suitable prime (to be specified later), which means the additions and multiplications are modulo .
The randomized communication protocol is then as follows:
A randomized protocol for EQ Bob does:
- pick
uniformly at random; - send
and to Alice;
- pick
Upon receiving
and Alice does:- compute
; - If
return "yes"; else return "no".
- compute
The communication complexity of the protocol is given by the number of bits used to represent the values of
On the other hand the protocol makes mistakes only when
.
By choosing
Schwartz-Zippel Theorem
Now let's see the the true form of Polynomial Identity Testing (PIT), for multivariate polynomials:
- Input: two
-variate polynomials of degree . - Determine whether
.
The
.
The degree or total degree of a monomial
As before, we also consider the following equivalent problem:
- Input: a polynomial
of degree . - Determine whether
.
If
A multivariate polynomial
Example The Vandermonde matrix
is defined as that , that is .
Let
be the polynomial defined as
For polynomials in product form, it is quite efficient to evaluate the polynomial at any specific point from the field over which the polynomial is defined, however, expanding the polynomial to a sum of monomials can be very expensive.
The following is a simple randomized algorithm for testing identity of multivariate polynomials:
Randomized algorithm for multivariate PIT - suppose we have a finite subset
(to be specified later); - pick
uniformly and independently at random; - if
then return “yes” else return “no”;
- suppose we have a finite subset
This algorithm evaluates
- If
, the algorithm always returns "yes", so it is always correct. - If
, the algorithm may wrongly return "yes" (a false positive). But this happens only when the random is a root of . The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
Schwartz-Zippel Theorem - Let
be a multivariate polynomial of degree over a field . If , then for any finite set , and chosen uniformly and independently at random,
- Let
The Schwartz-Zippel Theorem states that for any nonzero
Dana Moshkovitz gave a surprisingly simply and elegant proof of Schwartz-Zippel Theorem, using some advanced ideas. Now we introduce the standard proof by induction.
Proof. The theorem is proved by induction on
.- Induction basis:
For
, this is the univariate case. Assume that . Due to the fundamental theorem of algebra, any polynomial of degree at most must have at most roots, thus- Induction hypothesis:
Assume the theorem holds for any
-variate polynomials for all .- Induction step:
For any
-variate polynomial of degree at most , we write as ,
where
is the highest degree of , which means the degree of is at most and .In particular, we write
as a sum of two parts: ,
where both
and are polynomials, such that is as above, whose degree is at most ; , thus has no factor in any term.
By the law of total probability, we have
Note that
is a polynomial on variables of degree such that . By the induction hypothesis, we haveNow we look at the case conditioning on
. Recall that has no factor in any term, thus the condition guarantees thatis a nonzero univariate polynomial of
such that the degree of is and , for which we already known that the probability is at most . Therefore, .
Substituting both
and back in the total probability, we havewhich proves the theorem.
Detecting perfect matching
TBA
- Edmonds matrix
Fingerprinting
The polynomial identity testing algorithm in the Schwartz-Zippel theorem can be abstracted as the following framework:
Suppose we want to compare two objects
The fingerprints has the following properties:
is a function, meaning that if then .- It is much easier to compute and compare the fingerprints.
- Ideally, the domain of fingerprints is much smaller than the domain of original objects, so storing and comparing fingerprints are easy. This means the fingerprint function
cannot be an injection (one-to-one mapping), so it's possible that different and are mapped to the same fingerprint. We resolve this by making fingerprint function randomized, and for , we want the probability to be small.
In Schwartz-Zippel theorem, the objects to compare are polynomials from
With this generic framework, for various identity testing problems, we may design different fingerprints
Communication protocols for Equality
Now consider again the communication model where the two players Alice with a private input
We still consider the communication protocols for the equality function EQ
With the language of fingerprinting, this communication problem can be solved by the following generic scheme:
Communication protocol for EQ by fingerprinting Bob does:
- choose a random fingerprint function
and compute the fingerprint of her input ; - sends both the description of
and the value of to Alice;
- choose a random fingerprint function
Upon receiving the description of
and the value of , Alice does:- computes
and check whether .
- computes
In this way we have a randomized communication protocol for the equality function EQ with false positive. The communication cost as well as the error probability are reduced to the question of how to design this random fingerprint function
- A random fingerprint function
can be described succinctly. - The range of
is small, so the fingerprints are succinct. - If
, the probability is small.
Fingerprinting by PIT
As before, we can define the fingerprint function as: for any bit-string
As we have shown before, it takes
Fingerprinting by randomized checksum
Now we consider a new fingerprint function: We treat each input string
Now a random fingerprint function
Communication protocol for EQ by random checksum Bob does:
- for some parameter
(to be specified),- choose a prime
uniformly at random; - send
and to Alice;
- choose a prime
Upon receiving
and , Alice does:- check whether
.
- check whether
- for some parameter
The number of bits to be communicated is obviously
Suppose without loss of generality
for an arbitrarily fixed
The probability
.
For the numerator, any positive
For the denominator, we need to lower bound the number of primes in
Prime Number Theorem - Let
denote the number of primes less than . Then as .
- Let
Therefore, by choosing
,
which means the for any inputs
.
Moreover, by this choice of parameter
Pattern matching
Consider the following problem of pattern matching, which has nothing to do with communication complexity.
- Input: a string
and a "pattern" . - Determine whether the pattern
is a contiguous substring of . Usually, we are also asked to find the location of the substring.
A naive algorithm trying every possible match runs in
A simple randomized algorithm, due to Karp and Rabin, uses the idea of fingerprinting and also runs in
Let
Algorithm (Karp-Rabin) - pick a random prime
; - for
to do- if
then report a match;
- if
- return "no match";
- pick a random prime
So the algorithm just compares the
By the same analysis, by choosing
.
By the union bound, the probability that a false match occurs is
The algorithm runs in linear time if we assume that we can compute
Lemma - Let
. .
- Let
Proof. It holds that .
So the equation holds on the finite field modulo
.
Due to this lemma, each fingerprint
Checking distinctness
Consider the following problem of checking distinctness:
- Given a sequence
, check whether every element of appears exactly once.
Obviously this problem can be solved in linear time and linear space (in addition to the space for storing the input) by maintaining a
When this
We consider the following more generalized problem, checking identity of multisets:
- Input: two multisets
and where . - Determine whether
(multiset equivalence).
Here for a multiset
Obviously the above problem of checking distinctness can be treated as a special case of checking identity of multisets: by checking the identity of the multiset
The following fingerprinting function for multisets was introduced by Lipton for solving multiset identity testing.
Fingerprint for multiset - Let
be a uniform random prime chosen from the interval . By Chebyshev's theorem, such prime must exist. And consider the the finite field .
- Given a multiset
, we define a univariate polynomial over the finite field as follows: ,
- where
and are defined over the finite field .
- We then define the random fingerprinting function as:
,
- where
is chosen uniformly at random from .
- Let
Since all computations of
Moreover, the fingerprinting function
For two distinct multisets
Theorem (Lipton 1989) - Let
and be two multisets whose elements are from . If , then .
- Let
Proof. Let
and be two univariate polynomials defined over reals . Note that in contrast to and , the and in do not modulo . It is easy to verify that the polynomials have the following properties: if and only if . Here means the multiset equivalence.- By the properties of finite field, for any value
, it holds that and .
Therefore, assuming that
, we must have . Then by the law of total probability:Note that the degrees of
are at most and is chosen uniformly from . By the Schwartz-Zippel theorem for univariate polynomials, the first probabilitysince
is chosen from the interval .For the second probability
, recall that , therefore there is at least a non-zero coefficient in . The event occurs only if , which meansBy the prime number theorem,
as . Therefore,Combining everything together, we have
.