随机算法 (Fall 2016)/Fingerprinting

From TCS Wiki
Revision as of 06:50, 10 November 2016 by imported>Etone (→‎Schwartz-Zippel Theorem)
Jump to navigation Jump to search

Polynomial Identity Testing (PIT)

The Polynomial Identity Testing (PIT) is such a problem: given as input two polynomials, determine whether they are identical. It plays a fundamental role in Identity Testing problems.

First, let's consider the univariate ("one variable") case:

  • Input: two polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math] ([math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] are identical).

Here the [math]\displaystyle{ \mathbb{F}[x] }[/math] denotes the ring of univariate polynomials on a field [math]\displaystyle{ \mathbb{F} }[/math]. More precisely, a polynomial [math]\displaystyle{ f\in\mathbb{F}[x] }[/math] is

[math]\displaystyle{ f(x)=\sum_{i=0}^\infty a_ix^i }[/math],

where the coefficients [math]\displaystyle{ a_i }[/math] are taken from the field [math]\displaystyle{ \mathbb{F} }[/math], and the addition and multiplication are also defined over the field [math]\displaystyle{ \mathbb{F} }[/math]. And:

  • the degree of [math]\displaystyle{ f }[/math] is the highest [math]\displaystyle{ i }[/math] with non-zero [math]\displaystyle{ a_i }[/math];
  • a polynomial [math]\displaystyle{ f }[/math] is a zero-polynomial, denoted as [math]\displaystyle{ f\equiv 0 }[/math], if all coefficients [math]\displaystyle{ a_i=0 }[/math].

Alternatively, we can consider the following equivalent problem by comparing the polynomial [math]\displaystyle{ f-g }[/math] (whose degree is at most [math]\displaystyle{ d }[/math]) with the zero-polynomial:

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math] ([math]\displaystyle{ f }[/math] is the 0 polynomial).

The probalem is trivial if the input polynomial [math]\displaystyle{ f }[/math] is given explicitly: one can trivially solve the problem by checking whether all [math]\displaystyle{ d+1 }[/math] coefficients are [math]\displaystyle{ 0 }[/math]. To make the problem nontrivial, we assume that the input polynomial is given implicitly as a black box (also called an oracle): the only way the algorithm can access to [math]\displaystyle{ f }[/math] is to evaluate [math]\displaystyle{ f(x) }[/math] over some [math]\displaystyle{ x }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math], [math]\displaystyle{ x }[/math] chosen by the algorithm.

A straightforward deterministic algorithm is to evaluate [math]\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }[/math] over [math]\displaystyle{ d+1 }[/math] distinct elements [math]\displaystyle{ x_1,x_2,\ldots,x_{d+1} }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math] and check whether they are all zero. By the fundamental theorem of algebra, also known as polynomial interpolations, this guarantees to verify whether a degree-[math]\displaystyle{ d }[/math] univariate polynomial [math]\displaystyle{ f\equiv 0 }[/math].

Fundamental Theorem of Algebra
Any non-zero univariate polynomial of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots.

The reason for this fundamental theorem holding generally over any field [math]\displaystyle{ \mathbb{F} }[/math] is that any univariate polynomial of degree [math]\displaystyle{ d }[/math] factors uniquely into at most [math]\displaystyle{ d }[/math] irreducible polynomials, each of which has at most one root.

The following simple randomized algorithm is natural:

Algorithm for PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r\in S }[/math] uniformly at random;
  • if [math]\displaystyle{ f(r) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly at random from a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math]. It is easy to see the followings:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ r }[/math] is a root of [math]\displaystyle{ f }[/math]. By the fundamental theorem of algebra, [math]\displaystyle{ f }[/math] has at most [math]\displaystyle{ d }[/math] roots, so the probability that the algorithm is wrong is bounded as
[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

By fixing [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] to be an arbitrary subset of size [math]\displaystyle{ |S|=2d }[/math], this probability of false positive is at most [math]\displaystyle{ 1/2 }[/math]. We can reduce it to an arbitrarily small constant [math]\displaystyle{ \delta }[/math] by repeat the above testing independently for [math]\displaystyle{ \log_2 \frac{1}{\delta} }[/math] times, since the error probability decays geometrically as we repeat the algorithm independently.

Communication Complexity of Equality

The communication complexity is introduced by Andrew Chi-Chih Yao as a model of computation with more than one entities, each with partial information about the input.

Assume that there are two entities, say Alice and Bob. Alice has a private input [math]\displaystyle{ a }[/math] and Bob has a private input [math]\displaystyle{ b }[/math]. Together they want to compute a function [math]\displaystyle{ f(a,b) }[/math] by communicating with each other. The communication follows a predefined communication protocol (the "algorithm" in this model). The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.

The problem of checking identity is formally defined by the function EQ as follows: [math]\displaystyle{ \mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\} }[/math] and for any [math]\displaystyle{ a,b\in\{0,1\}^n }[/math],

[math]\displaystyle{ \mathrm{EQ}(a,b)= \begin{cases} 1& \mbox{if } a=b,\\ 0& \mbox{otherwise.} \end{cases} }[/math]

A trivial way to solve EQ is to let Bob send his entire input string [math]\displaystyle{ b }[/math] to Alice and let Alice check whether [math]\displaystyle{ a=b }[/math]. This costs [math]\displaystyle{ n }[/math] bits of communications.

It is known that for deterministic communication protocols, this is the best we can get for computing EQ.

Theorem (Yao 1979)
Any deterministic communication protocol computing EQ on two [math]\displaystyle{ n }[/math]-bit strings costs [math]\displaystyle{ n }[/math] bits of communication in the worst-case.

This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem is in Yao'scelebrated paper in 1979 with a humble title, which initiates the field of communication complexity.

If we allow randomness in protocols, and also tolerate a small probabilistic error, the problem can be solved with significantly less communications. To present this randomized protocol, we need a few preparations:

  • We represent the inputs [math]\displaystyle{ a\in\{0,1\}^{n} }[/math] of Alice and Bob as two univariate polynomials of degree at most [math]\displaystyle{ n-1 }[/math], respectively
[math]\displaystyle{ f(x)=\sum_{i=0}^{n-1}a_ix^{i} }[/math] and [math]\displaystyle{ g(x)=\sum_{i=0}^{n-1}b_ix^{i} }[/math].
  • The two polynomials [math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] are defined over finite field [math]\displaystyle{ \mathbb{Z}_p=\{0,1,\ldots,p-1\} }[/math] for some suitable prime [math]\displaystyle{ p }[/math] (to be specified later), which means the additions and multiplications are modulo [math]\displaystyle{ p }[/math].

The randomized communication protocol is then as follows:

A randomized protocol for EQ

Bob does:

  • pick [math]\displaystyle{ r\in\mathbb{Z}_p }[/math] uniformly at random;
  • send [math]\displaystyle{ r }[/math] and [math]\displaystyle{ g(r) }[/math] to Alice;

Upon receiving [math]\displaystyle{ r }[/math] and [math]\displaystyle{ g(r) }[/math] Alice does:

  • compute [math]\displaystyle{ f(r) }[/math];
  • If [math]\displaystyle{ f(r)= g(r) }[/math] return "yes"; else return "no".

The communication complexity of the protocol is given by the number of bits used to represent the values of [math]\displaystyle{ r }[/math] and [math]\displaystyle{ g(r) }[/math]. Since the polynomials are defined over finite field [math]\displaystyle{ \mathbb{Z}_p }[/math] and the random number [math]\displaystyle{ f }[/math] is also chosen from [math]\displaystyle{ \mathbb{Z}_p }[/math], this is bounded by [math]\displaystyle{ O(\log p) }[/math].

On the other hand the protocol makes mistakes only when [math]\displaystyle{ a\neq b }[/math] but wrongly answers "no". This happens only when [math]\displaystyle{ f\not\equiv g }[/math] but [math]\displaystyle{ f(r)=g(r) }[/math]. The degrees of [math]\displaystyle{ f, g }[/math] are at most [math]\displaystyle{ n-1 }[/math], and [math]\displaystyle{ r }[/math] is chosen among [math]\displaystyle{ p }[/math] distinct values, we have

[math]\displaystyle{ \Pr[f(r)=g(r)]\le \frac{n-1}{p} }[/math].

By choosing [math]\displaystyle{ p }[/math] to be a prime in the interval [math]\displaystyle{ [n^2, 2n^2] }[/math] (by Chebyshev's theorem, such prime [math]\displaystyle{ p }[/math] always exists), the above randomized communication protocol solves the Equality function EQ with an error probability of false positive at most [math]\displaystyle{ O(1/n) }[/math], with communication complexity [math]\displaystyle{ O(\log n) }[/math], an EXPONENTIAL improvement to ANY deterministic communication protocol!

Schwartz-Zippel Theorem

Now let's see the the true form of Polynomial Identity Testing (PIT), for multivariate polynomials:

  • Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math].

The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is the ring of multivariate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. An [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math], written as a sum of monomials, is:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].

The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math] and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree of monomials of nonzero coefficients.

As before, we also consider the following equivalent problem:

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math].

If [math]\displaystyle{ f }[/math] is written explicitly as a sum of monomials, then the problem can be solved by checking whether all coefficients, and there at most [math]\displaystyle{ n^{d+1} }[/math] coefficients in an [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math].

A multivariate polynomial [math]\displaystyle{ f }[/math] can also be presented in its product form, for example:

Example

The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined as that [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is

[math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].

Let [math]\displaystyle{ f }[/math] be the polynomial defined as

[math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]

For polynomials in product form, it is quite efficient to evaluate the polynomial at any specific point from the field over which the polynomial is defined, however, expanding the polynomial to a sum of monomials can be very expansive.

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Randomized algorithm for multivariate PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
  • if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;

This algorithm requires only the evaluation of [math]\displaystyle{ f }[/math] at a single point [math]\displaystyle{ \vec{r} }[/math]. And if [math]\displaystyle{ f\equiv 0 }[/math] it is always correct.

In the Theorem below, we’ll see that if [math]\displaystyle{ f\not\equiv 0 }[/math] then the algorithm is incorrect with probability at most [math]\displaystyle{ \frac{d}{|S|} }[/math], where [math]\displaystyle{ d }[/math] is the degree of the polynomial [math]\displaystyle{ f }[/math].

Schwartz-Zippel Theorem
Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math] such that [math]\displaystyle{ f\not\equiv 0 }[/math]. Fix any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and let [math]\displaystyle{ r_1,r_2\ldots,r_n }[/math] be chosen uniformly and independently at random from [math]\displaystyle{ S }[/math]. Then
[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]
Proof.

We prove by induction on [math]\displaystyle{ n }[/math] the number of variables.

For [math]\displaystyle{ n=1 }[/math], assuming that [math]\displaystyle{ f\not\equiv 0 }[/math], due to the fundamental theorem of algebra, the degree-[math]\displaystyle{ d }[/math] polynomial [math]\displaystyle{ f(x) }[/math] has at most [math]\displaystyle{ d }[/math] roots, thus

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

Assume the induction hypothesis for a multi-variate polynomial up to [math]\displaystyle{ n-1 }[/math] variable.

An [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] can be represented as

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],

where [math]\displaystyle{ k }[/math] is the largest power of [math]\displaystyle{ x_n }[/math], which means that the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math].

In particular, we write [math]\displaystyle{ f }[/math] as a sum of two parts:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],

where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that

  • as argued above, the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math];
  • [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.

By the law of total probability, it holds that

[math]\displaystyle{ \begin{align} &\Pr[f(r_1,r_2,\ldots,r_n)=0]\\ = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]

Note that [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is a polynomial on [math]\displaystyle{ n-1 }[/math] variables of degree [math]\displaystyle{ d-k }[/math] such that [math]\displaystyle{ f_k\not\equiv 0 }[/math]. By the induction hypothesis, we have

[math]\displaystyle{ \begin{align} (*) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]

For the second case, recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] guarantees that

[math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]

is a single-variate polynomial such that the degree of [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] is [math]\displaystyle{ k }[/math] and [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math], for which we already known that the probability [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math]. Therefore,

[math]\displaystyle{ \begin{align} (**) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].

Substituting both [math]\displaystyle{ (*) }[/math] and [math]\displaystyle{ (**) }[/math] back in the total probability, we have

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}, }[/math]

which proves the theorem.


In above proof, for the second case that [math]\displaystyle{ f_k(r_1,\ldots,r_{n-1})\neq 0 }[/math], we use an "probabilistic arguement" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events in applying the law of total probability. You make your own judgement which proof is better.

By the law of total probability,

[math]\displaystyle{ \begin{align} &\Pr[f(\vec{r})=0]\\ = &\sum_{x_1,\ldots,x_{n-1}\in S}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ = &\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ &+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq0}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ \le &\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[\forall i\lt n, r_i=x_i]\\ &+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ = &\Pr[f_k(r_1,\ldots,r_{n-1})=0]+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i\lt n, r_i=x_i]. \end{align} }[/math]

We have argued that [math]\displaystyle{ f_k\not\equiv 0 }[/math] and the degree of [math]\displaystyle{ f_k }[/math] is [math]\displaystyle{ d-k }[/math]. By the induction hypothesis, we have

[math]\displaystyle{ \Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. }[/math]

And for every fixed [math]\displaystyle{ x_1,\ldots,x_{n-1}\in S }[/math] such that [math]\displaystyle{ f_k(x_1,\ldots,x_{n-1})\neq 0 }[/math], we have argued that [math]\displaystyle{ f(x_1,\ldots,x_{n-1},x_n) }[/math] is a polynomial in [math]\displaystyle{ x_n }[/math] of degree [math]\displaystyle{ k }[/math], thus

[math]\displaystyle{ \Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\le\frac{k}{|S|}, }[/math]

which holds for all [math]\displaystyle{ x_1,\ldots,x_{n-1}\in S }[/math] such that [math]\displaystyle{ f_k(x_1,\ldots,x_{n-1})\neq 0 }[/math], therefore the weighted average

[math]\displaystyle{ \sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i\lt n, r_i=x_i] \le\frac{k}{|S|}. }[/math]

Substituting these inequalities back to the total probability, we have [math]\displaystyle{ \Pr[f(\vec{r})=0] \le\frac{d-k}{|S|}+\frac{k}{|S|} =\frac{d}{|S|}. }[/math]

[math]\displaystyle{ \square }[/math]

Fingerprinting

The Freivald's algorithm and Schwartz-Zippel theorem can be abstracted as the following procedure: Suppose we want to compare two items [math]\displaystyle{ Z_1 }[/math] and [math]\displaystyle{ Z_2 }[/math]. Instead of comparing them directly, we compute random fingerprints [math]\displaystyle{ \mathrm{FING}(Z_1) }[/math] and [math]\displaystyle{ \mathrm{FING}(Z_2) }[/math] of them and compare the fingerprints. The fingerprints has the following properties:

  • [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] is a function, so if [math]\displaystyle{ Z_1= Z_2 }[/math] then [math]\displaystyle{ \mathrm{FING}(Z_1)=\mathrm{FING}(Z_2) }[/math].
  • If [math]\displaystyle{ Z_1\neq Z_2 }[/math] then [math]\displaystyle{ \Pr[\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)] }[/math] is small.
  • It is much easier to compute and compare the fingerprints than to compare [math]\displaystyle{ Z_1 }[/math] and [math]\displaystyle{ Z_2 }[/math] directly.

In Freivald's algorithm, the items to compare are two [math]\displaystyle{ n\times n }[/math] matrices [math]\displaystyle{ AB }[/math] and [math]\displaystyle{ C }[/math], and given an [math]\displaystyle{ n\times n }[/math] matrix [math]\displaystyle{ M }[/math], its random fingerprint is computed as [math]\displaystyle{ \mathrm{FING}(M)=Mr }[/math] for a uniformly random [math]\displaystyle{ r\in\{0,1\}^n }[/math].

In Schwartz-Zippel theorem, the items to compare are two polynomials [math]\displaystyle{ P_1(x_1,\ldots,x_n) }[/math] and [math]\displaystyle{ P_2(x_1,\ldots,x_n) }[/math], and given a polynomial [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math], its random fingerprint is computed as [math]\displaystyle{ \mathrm{FING}(Q)=Q(r_1,\ldots,r_n) }[/math] for [math]\displaystyle{ r_i }[/math] chosen independently and uniformly at random from some fixed set [math]\displaystyle{ S }[/math].

For different problems, we may have different definitions of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math].

Communication complexity revisited

Now consider again the communication model where the two players Alice with a private input [math]\displaystyle{ x\in\{0,1\}^n }[/math] and Bob with a private input [math]\displaystyle{ y\in\{0,1\}^n }[/math] together compute a function [math]\displaystyle{ f(x,y) }[/math] by running a communication protocol.

We still consider the communication protocols for the equality function EQ

[math]\displaystyle{ \mathrm{EQ}(x,y)= \begin{cases} 1& \mbox{if } x=y,\\ 0& \mbox{otherwise.} \end{cases} }[/math]

With the language of fingerprinting, this communication problem can be solved by the following generic scheme:

  • Alice choose a random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] and compute the fingerprint of her input [math]\displaystyle{ \mathrm{FING}(x) }[/math];
  • Alice sends both the description of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] and the value of [math]\displaystyle{ \mathrm{FING}(x) }[/math] to Bob;
  • Bob computes [math]\displaystyle{ \mathrm{FING}(y) }[/math] and check whether [math]\displaystyle{ \mathrm{FING}(x)=\mathrm{FING}(y) }[/math].

In this way we have a randomized communication protocol for the equality function EQ with a false positive. The communication cost as well as the error probability are reduced to the question of how to design this random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] to guarantee:

  1. A random [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] can be described succinctly.
  2. The range of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] is small, so the fingerprints are succinct.
  3. If [math]\displaystyle{ x\neq y }[/math], the probability [math]\displaystyle{ \Pr[\mathrm{FING}(x)=\mathrm{FING}(y)] }[/math] is small.

In above application of single-variate PIT, we know that [math]\displaystyle{ \mathrm{FING}(x)=\sum_{i=1}^n x_i r^{i} }[/math], where [math]\displaystyle{ r }[/math] is a random element from a finite field and the additions and multiplications are defined over the finite field, is a good fingerprint function. Now we introduce another fingerprint and hence a new communication protocol.

The new fingerprint function we design is as follows: by treating the input string [math]\displaystyle{ x\in\{0,1\}^n }[/math] as the binary representation of a number, let [math]\displaystyle{ \mathrm{FING}(x)=x\bmod p }[/math] for some random prime [math]\displaystyle{ p }[/math]. The prime [math]\displaystyle{ p }[/math] can uniquely specify a random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math], thus can be used as a description of the function, and alos the range of the fingerprints is [math]\displaystyle{ [p] }[/math], thus we want the prime [math]\displaystyle{ p }[/math] to be reasonably small, but still has a good chance to distinguish different [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] after modulo [math]\displaystyle{ p }[/math].

A randomized protocol for EQ

Alice does:

for some parameter [math]\displaystyle{ k }[/math] (to be specified),
  • choose uniformly at random a prime [math]\displaystyle{ p\in[k] }[/math];
  • send [math]\displaystyle{ p }[/math] and [math]\displaystyle{ x\bmod p }[/math] to Bob;

Upon receiving [math]\displaystyle{ p }[/math] and [math]\displaystyle{ x\bmod p }[/math], Bob does:

  • check whether [math]\displaystyle{ x\bmod p=y\bmod p }[/math].

The number of bits to be communicated is [math]\displaystyle{ O(\log k) }[/math]. We then bound the probability of error [math]\displaystyle{ \Pr[x\bmod p=y\bmod p] }[/math] for [math]\displaystyle{ x\neq y }[/math], in terms of [math]\displaystyle{ k }[/math].

Suppose without loss of generality [math]\displaystyle{ x\gt y }[/math]. Let [math]\displaystyle{ z=x-y }[/math]. Then [math]\displaystyle{ z\lt 2^n }[/math] since [math]\displaystyle{ x,y\in[2^n] }[/math], and [math]\displaystyle{ z\neq 0 }[/math] for [math]\displaystyle{ x\neq y }[/math]. It holds that [math]\displaystyle{ x\bmod p=y\bmod p }[/math] if and only if [math]\displaystyle{ z }[/math] is dividable by [math]\displaystyle{ p }[/math]. Note that [math]\displaystyle{ z\lt 2^n }[/math] since [math]\displaystyle{ x,y\in[2^n] }[/math]. We only need to bound the probability

[math]\displaystyle{ \Pr[z\bmod p=0] }[/math] for [math]\displaystyle{ 0\lt z\lt 2^n }[/math], where [math]\displaystyle{ p }[/math] is a random prime chosen from [math]\displaystyle{ [k] }[/math].

The probability [math]\displaystyle{ \Pr[z\bmod p=0] }[/math] is computed directly as

[math]\displaystyle{ \Pr[z\bmod p=0]\le\frac{\mbox{the number of prime divisors of }z}{\mbox{the number of primes in }[k]} }[/math].

For the numerator, we have the following lemma.

Lemma
The number of distinct prime divisors of any natural number less than [math]\displaystyle{ 2^n }[/math] is at most [math]\displaystyle{ n }[/math].
Proof.
Each prime number is [math]\displaystyle{ \ge2 }[/math]. If an [math]\displaystyle{ N\gt 0 }[/math] has more than [math]\displaystyle{ n }[/math] distinct prime divisors, then [math]\displaystyle{ N\ge 2^n }[/math].
[math]\displaystyle{ \square }[/math]

Due to this lemma, [math]\displaystyle{ z }[/math] has at most [math]\displaystyle{ n }[/math] prime divisors.

We then lower bound the number of primes in [math]\displaystyle{ [k] }[/math]. This is given by the celebrated Prime Number Theorem (PNT).

Prime Number Theorem
Let [math]\displaystyle{ \pi(k) }[/math] denote the number of primes less than [math]\displaystyle{ k }[/math]. Then [math]\displaystyle{ \pi(k)\sim\frac{k}{\ln k} }[/math] as [math]\displaystyle{ k\rightarrow\infty }[/math].

Therefore, by choosing [math]\displaystyle{ k=tn\ln tn }[/math] for some [math]\displaystyle{ t }[/math], we have that for a [math]\displaystyle{ 0\lt z\lt 2^n }[/math], and a random prime [math]\displaystyle{ p\in[k] }[/math],

[math]\displaystyle{ \Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{t} }[/math].

We can make this error probability polynomially small and the number of bits to be communicated is still [math]\displaystyle{ O(\log k)=O(\log n) }[/math].

Checking distinctness

Consider the following problem:

  • Given a sequence [math]\displaystyle{ x_1,x_2,\ldots,x_n\in\{1,2,\ldots,n\} }[/math], check whether every member of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] appears exactly once.

This problem is called detecting duplicate or checking distinctness. It can be solved in linear time and linear space by a straightforward algorithm.

For many real applications, the [math]\displaystyle{ n }[/math] is enormously large, and we would like to have an algorithm using very limited extra space.

Fingerprinting multisets

A randomized algorithm due to Lipton, checks distinctness by solving a more general problem fingerprinting multisets.

Given a multiset (each member may appear more than once) [math]\displaystyle{ M=\{x_1,x_2,\ldots,x_n\} }[/math], its fingerprint is defined as

[math]\displaystyle{ \mathrm{FING}(M)=\prod_{i=1}^n(r-x_i)\bmod p, }[/math]

where [math]\displaystyle{ p }[/math] is a random prime chosen from the interval [math]\displaystyle{ [(n\log n)^2,2(n\log n)^2] }[/math], and [math]\displaystyle{ r }[/math] is chosen uniformly at random from [math]\displaystyle{ [p] }[/math].

We first see that the space reuqired to compute [math]\displaystyle{ \mathrm{FING}(M) }[/math] is only [math]\displaystyle{ O(\log n) }[/math] bits. We do not need to compute the value of the product [math]\displaystyle{ \prod_{i=1}^n(r-x_i) }[/math]. Instead, all the computation can be done in the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math] (you need some knowledge in algebra to understand this). So the space requirement is only [math]\displaystyle{ O(\log p)=O(\log n) }[/math].

It is easy to see that the above fingerprint function is invariant under permutations, thus one multiset has only one fingerprint. The next theorem due to Lipton, states that the probability that two distinct multisets have the same fingerprint is small.

Theorem (Lipton 1989)
Let [math]\displaystyle{ M_1=\{x_1,x_2,\ldots,x_n\} }[/math] and [math]\displaystyle{ M_2=\{y_1,y_2,\ldots,y_n\} }[/math] be two multisets whose members are from [math]\displaystyle{ \{1,2,\ldots,n\} }[/math]. If [math]\displaystyle{ M_1\neq M_2 }[/math], then
[math]\displaystyle{ \Pr[\mathrm{FING}(M_1)= \mathrm{FING}(M_2)]=O\left(\frac{1}{n}\right) }[/math].
Proof.
Let [math]\displaystyle{ P_1(u)=\prod_{i=1}^n(u-x_i) }[/math] and [math]\displaystyle{ P_2(u)=\prod_{i=1}^n(u-y_i) }[/math], and let [math]\displaystyle{ Q(u)=P_1(u)-P_2(u) }[/math].

If [math]\displaystyle{ M_1\neq M_2 }[/math], then the polynomials [math]\displaystyle{ P_1 }[/math] and [math]\displaystyle{ P_2 }[/math] are not identical, and the polynomial [math]\displaystyle{ Q\not\equiv 0 }[/math]. We only need to show that [math]\displaystyle{ \Pr[Q(r)\bmod =0]\, }[/math] is small for our choice of random [math]\displaystyle{ p }[/math] and [math]\displaystyle{ r }[/math].

We first show that the probability that [math]\displaystyle{ Q\equiv 0 \bmod p }[/math] is small. Then apply the Schwartz-Zippel technique to show that conditioning on that [math]\displaystyle{ Q\not\equiv 0 \bmod p }[/math], the probability [math]\displaystyle{ Q(r) \bmod p=0 }[/math] is small.

Expand [math]\displaystyle{ Q(u) }[/math] in form of [math]\displaystyle{ Q(u)=\sum_{k=0}^n a_i u^k }[/math]. It can be verified by induction on [math]\displaystyle{ n }[/math] that for any coefficient [math]\displaystyle{ a_k }[/math], it holds that [math]\displaystyle{ |a_k|\le 2n\cdot 2^{n} }[/math].

Since [math]\displaystyle{ Q\not\equiv 0 }[/math], it has at least one nonzero coefficient [math]\displaystyle{ c\neq 0 }[/math], and it holds that [math]\displaystyle{ |c|\le 2n\cdot 2^{n} }[/math]. Then with the same analysis as in the problem of EQ, by our choice of random [math]\displaystyle{ p }[/math],

[math]\displaystyle{ \Pr[c\bmod p=0]=O\left(\frac{1}{n}\right) }[/math].

Conditioning on that [math]\displaystyle{ Q\not\equiv 0 }[/math] after modulo [math]\displaystyle{ p }[/math], since the degree of [math]\displaystyle{ Q }[/math] is at most [math]\displaystyle{ n }[/math], [math]\displaystyle{ Q }[/math] has at most [math]\displaystyle{ n }[/math] roots. Since [math]\displaystyle{ p\in[(n\log n)^2,2(n\log n)^2] }[/math], therefore [math]\displaystyle{ r }[/math] is uniformly distributed over a set of size at least [math]\displaystyle{ (n\log n)^2 }[/math]. Therefore, the probability that [math]\displaystyle{ Q(r)\bmod p =0 }[/math] conditioning on that [math]\displaystyle{ Q\not\equiv 0\bmod p }[/math] is at most [math]\displaystyle{ \frac{1}{n(\log n)^2} }[/math].

By the union bound, [math]\displaystyle{ \Pr[\mathrm{FING}(M_1)= \mathrm{FING}(M_2)]=O\left(\frac{1}{n}\right) }[/math].

[math]\displaystyle{ \square }[/math]

Checking distinctness by fingerprinting multisets

We now have a fingerprint function for multisets, which is useful for checking the identities of multisets. To solve the problem of checking distinctness, we just check whether the input multiset [math]\displaystyle{ M }[/math] is identical to the set [math]\displaystyle{ \{1,2,\ldots,n\} }[/math].