Randomized Algorithms (随机算法, Fall 2016) / Fingerprinting

=Polynomial Identity Testing (PIT) =
The '''Polynomial Identity Testing (PIT)''' is the following problem: given as input two polynomials, determine whether they are identical. It plays a fundamental role in ''Identity Testing'' problems.


First, let's consider the univariate ("one variable") case:
* '''Input:''' two polynomials <math>f, g\in\mathbb{F}[x]</math> of degree <math>d</math>.
* Determine whether <math>f\equiv g</math> (<math>f</math> and <math>g</math> are identical).
Here <math>\mathbb{F}[x]</math> denotes the [http://en.wikipedia.org/wiki/Polynomial_ring ring of univariate polynomials] over a field <math>\mathbb{F}</math>. More precisely, a polynomial <math>f\in\mathbb{F}[x]</math> is
:<math>f(x)=\sum_{i=0}^\infty a_ix^i</math>,
where the coefficients <math>a_i</math> are taken from the field <math>\mathbb{F}</math> (with all but finitely many of them being zero), and the addition and multiplication are also defined over the field <math>\mathbb{F}</math>.
And:
* the '''degree''' of <math>f</math> is the highest <math>i</math> with non-zero <math>a_i</math>;
* a polynomial <math>f</math> is a '''zero-polynomial''', denoted as <math>f\equiv 0</math>, if all coefficients <math>a_i=0</math>.


Alternatively, we can consider the following equivalent problem by comparing the polynomial <math>f-g</math> (whose degree is at most <math>d</math>) with the zero-polynomial:
* '''Input:''' a polynomial <math>f\in\mathbb{F}[x]</math> of degree <math>d</math>.
* Determine whether <math>f\equiv 0</math> (<math>f</math> is the 0 polynomial).


The problem is trivial if the input polynomial <math>f</math> is given explicitly: one can simply check whether all <math>d+1</math> coefficients are <math>0</math>. To make the problem nontrivial, we assume that the input polynomial is given implicitly as a ''black box'' (also called an ''oracle''): the only way the algorithm can access <math>f</math> is to evaluate <math>f(x)</math> at points <math>x</math> of the field <math>\mathbb{F}</math> chosen by the algorithm.


A straightforward deterministic algorithm is to evaluate <math>f(x_1),f(x_2),\ldots,f(x_{d+1})</math> at <math>d+1</math> ''distinct'' elements <math>x_1,x_2,\ldots,x_{d+1}</math> of the field <math>\mathbb{F}</math> and check whether they are all zero. By the [https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra fundamental theorem of algebra] stated below (which underlies polynomial interpolation), this is guaranteed to determine whether a degree-<math>d</math> univariate polynomial satisfies <math>f\equiv 0</math>.
 
{{Theorem|Fundamental Theorem of Algebra|
:Any non-zero univariate polynomial of degree <math>d</math> has at most <math>d</math> roots.
}}
This theorem holds generally over any field <math>\mathbb{F}</math> because any univariate polynomial of degree <math>d</math> factors uniquely into at most <math>d</math> irreducible polynomials, each of which has at most one root.


The following simple randomized algorithm is natural:
{{Theorem|Algorithm for PIT|
*suppose we have a finite subset <math>S\subseteq\mathbb{F}</math> (to be specified later);
*pick <math>r\in S</math> ''uniformly'' at random;
*if <math>f(r) = 0</math> then return “yes” else return “no”;
}}


This algorithm evaluates <math>f</math> at one point chosen uniformly at random from a finite subset <math>S\subseteq\mathbb{F}</math>. It is easy to see the following:
* If <math>f\equiv 0</math>, the algorithm always returns "yes", so it is always correct.
* If <math>f\not\equiv 0</math>, the algorithm may wrongly return "yes" (a '''false positive'''). But this happens only when the random <math>r</math> is a root of <math>f</math>. By the fundamental theorem of algebra, <math>f</math> has at most <math>d</math> roots, so the probability that the algorithm is wrong is bounded as
::<math>\Pr[f(r)=0]\le\frac{d}{|S|}.</math>


By fixing <math>S\subseteq\mathbb{F}</math> to be an arbitrary subset of size <math>|S|=2d</math>, the probability of a false positive is at most <math>1/2</math>. We can reduce it to an arbitrarily small constant <math>\delta</math> by repeating the above test ''independently'' for <math>\log_2 \frac{1}{\delta}</math> times, since the error probability decays geometrically with independent repetitions.


== Communication Complexity of Equality ==
[http://en.wikipedia.org/wiki/Communication_complexity Communication complexity] was introduced by Andrew Chi-Chih Yao as a model of computation involving more than one entity, each with partial information about the input.


Assume that there are two entities, say Alice and Bob. Alice has a private input <math>a</math> and Bob has a private input <math>b</math>. Together they want to compute a function <math>f(a,b)</math> by communicating with each other. The communication follows a predefined '''communication protocol''' (the "algorithm" in this model). The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.


The problem of checking identity is formally defined by the function EQ as follows: <math>\mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\}</math> and for any <math>a,b\in\{0,1\}^n</math>,
}}


This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem is in Yao's [http://math.ucla.edu/~znorwood/290d.2.14s/papers/yao.pdf  celebrated paper in 1979 with a humble title]. It pioneered the field of communication complexity.


If we allow randomness in protocols, and also tolerate a small probability of error, the problem can be solved with significantly less communication. To present this randomized protocol, we need a few preparations:
* We represent the inputs <math>a,b\in\{0,1\}^{n}</math> of Alice and Bob as two univariate polynomials of degree at most <math>n-1</math>, respectively
::<math>f(x)=\sum_{i=0}^{n-1}a_ix^{i}</math> and <math>g(x)=\sum_{i=0}^{n-1}b_ix^{i}</math>.
* The two polynomials <math>f</math> and <math>g</math> are defined over the finite field <math>\mathbb{Z}_p=\{0,1,\ldots,p-1\}</math> for some suitable prime <math>p</math> (to be specified later), which means that additions and multiplications are performed modulo <math>p</math>.
The randomized communication protocol is then as follows:
{{Theorem|A randomized protocol for EQ|
'''Bob does''':
:* pick <math>r\in\mathbb{Z}_p</math> uniformly at random;
:* send <math>r</math> and <math>g(r)</math> to Alice;
'''Upon receiving''' <math>r</math> and <math>g(r)</math> '''Alice does''':
:* compute <math>f(r)</math>;
:* If <math>f(r)= g(r)</math> return "'''yes'''"; else return "'''no'''".
}}
The communication complexity of the protocol is given by the number of bits used to represent the values of <math>r</math> and <math>g(r)</math>. Since the polynomials are defined over the finite field <math>\mathbb{Z}_p</math> and the random number <math>r</math> is also chosen from <math>\mathbb{Z}_p</math>, this is bounded by <math>O(\log p)</math>.
On the other hand, the protocol makes a mistake only when <math>a\neq b</math> but it wrongly answers "yes". This happens only when <math>f\not\equiv g</math> but <math>f(r)=g(r)</math>. Since the degrees of <math>f, g</math> are at most <math>n-1</math>, and <math>r</math> is chosen among <math>p</math> distinct values, we have
:<math>\Pr[f(r)=g(r)]\le \frac{n-1}{p}</math>.

By choosing <math>p</math> to be a prime in the interval <math>[n^2, 2n^2]</math> (by [https://en.wikipedia.org/wiki/Bertrand%27s_postulate Chebyshev's theorem], such a prime <math>p</math> always exists), the above randomized communication protocol solves the equality function EQ with a false-positive probability of at most <math>O(1/n)</math>, using communication <math>O(\log n)</math>: an exponential improvement over any deterministic communication protocol!
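One round of this protocol can be simulated in Python (a sketch under the assumptions above; the prime in <math>[n^2,2n^2]</math> is found by naive search, and both players' computations are shown inside one function for readability):

```python
import random

def find_prime_at_least(m):
    """Naive search for the smallest prime >= m; by Chebyshev's theorem
    a prime exists in [m, 2m], so this terminates quickly for our purposes."""
    def is_prime(q):
        return q >= 2 and all(q % t for t in range(2, int(q ** 0.5) + 1))
    while not is_prime(m):
        m += 1
    return m

def eq_protocol(a, b):
    """One round of the randomized protocol for EQ.
    a, b are 0/1 lists of length n held by Alice and Bob respectively.
    Bob sends (r, g(r)), i.e. O(log p) = O(log n) bits; Alice answers.
    Always "yes" if a == b; false positive with probability <= (n-1)/p."""
    n = len(a)
    p = find_prime_at_least(n * n)                                # prime in [n^2, 2n^2]
    r = random.randrange(p)                                       # Bob picks r in Z_p
    g_r = sum(bi * pow(r, i, p) for i, bi in enumerate(b)) % p    # Bob computes g(r)
    f_r = sum(ai * pow(r, i, p) for i, ai in enumerate(a)) % p    # Alice computes f(r)
    return f_r == g_r                                             # "yes" iff equal
```

For example, <code>eq_protocol([1,0,1,1], [1,0,1,1])</code> always returns <code>True</code>, while on distinct inputs each round returns <code>True</code> with probability at most <math>(n-1)/p</math>.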


== Schwartz-Zippel Theorem ==
Now let's see the true form of '''Polynomial Identity Testing (PIT)''', for multivariate polynomials:
* '''Input:''' two <math>n</math>-variate polynomials <math>f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> of degree <math>d</math>.
* Determine whether <math>f\equiv g</math>.
The <math>\mathbb{F}[x_1,x_2,\ldots,x_n]</math> is the [http://en.wikipedia.org/wiki/Polynomial_ring#The_polynomial_ring_in_several_variables ring of multivariate polynomials] over field <math>\mathbb{F}</math>. An <math>n</math>-variate polynomial of degree <math>d</math>, written as a sum of monomials, is:
:<math>f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n}</math>.
The '''degree''' or '''total degree''' of a monomial <math>a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n}</math> is given by <math>i_1+i_2+\cdots+i_n</math> and the degree of a polynomial <math>f</math> is the maximum degree of monomials of nonzero coefficients.


As before, we also consider the following equivalent problem:
* '''Input:''' a polynomial <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> of degree <math>d</math>.
* Determine whether <math>f\equiv 0</math>.


If <math>f</math> is written explicitly as a sum of monomials, then the problem can be solved by checking whether all coefficients are <math>0</math>; there are at most <math>\binom{n+d}{d}</math> coefficients in an <math>n</math>-variate polynomial of degree at most <math>d</math>.


A multivariate polynomial <math>f</math> can also be presented in its '''product form''', for example:
{{Theorem|Example|
The [http://en.wikipedia.org/wiki/Vandermonde_matrix Vandermonde matrix] <math>M=M(x_1,x_2,\ldots,x_n)</math> is defined by <math>M_{ij}=x_i^{j-1}</math>, that is
:<math>M=\begin{bmatrix}
1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\
1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \dots & x_n^{n-1}
\end{bmatrix}</math>.
Let <math>f</math> be the polynomial defined as
:<math>
f(x_1,\ldots,x_n)=\det(M)=\prod_{j<i}(x_i-x_j).
</math>
It is pretty easy to evaluate <math>f(x_1,x_2,\ldots,x_n)</math> on any particular <math>x_1,x_2,\ldots,x_n</math>, however it is prohibitively expensive to symbolically expand <math>f(x_1,\ldots,x_n)</math> to its sum-of-monomial form.
}}
For polynomials in product form, it is quite efficient to '''evaluate''' the polynomial at any specific point from the field over which the polynomial is defined; however, '''expanding''' the polynomial into a sum of monomials can be very expensive.
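For instance, the Vandermonde determinant above can be evaluated at a point with <math>O(n^2)</math> field operations using its product form, while its expanded form has <math>n!</math> monomials. The following sketch (not part of the original notes; the modulus <code>97</code> is an arbitrary illustrative prime) cross-checks the product form against a brute-force determinant on a tiny instance:

```python
from itertools import combinations, permutations

def vandermonde_product(xs, p):
    """Evaluate det(M) = prod_{j<i} (x_i - x_j) mod p: O(n^2) multiplications,
    even though the expanded sum-of-monomials form has n! terms."""
    det = 1
    for j, i in combinations(range(len(xs)), 2):   # all pairs j < i
        det = det * (xs[i] - xs[j]) % p
    return det

def vandermonde_bruteforce(xs, p):
    """Brute-force determinant of M_{ij} = x_i^(j-1) by the Leibniz formula;
    only feasible for tiny n, used here as a sanity check."""
    n = len(xs)
    M = [[pow(x, j, p) for j in range(n)] for x in xs]    # M[i][j] = x_i^j
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[k]
                         for i in range(n) for k in range(i + 1, n))
        term = (-1) ** inversions                          # sign of the permutation
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total % p
```

On <code>xs = [2, 3, 5]</code> modulo 97, both functions give <math>(3-2)(5-2)(5-3)=6</math>.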


The following is a simple randomized algorithm for testing identity of multivariate polynomials:
{{Theorem|Randomized algorithm for multivariate PIT|
*suppose we have a finite subset <math>S\subseteq\mathbb{F}</math> (to be specified later);
* pick <math>r_1,r_2,\ldots,r_n\in S</math> ''uniformly'' and ''independently'' at random;
* if <math>f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0</math> then return “yes” else return “no”;
}}


This algorithm evaluates <math>f</math> at one point chosen uniformly from an <math>n</math>-dimensional cube <math>S^n</math>, where <math>S\subseteq\mathbb{F}</math> is a finite subset. And:
* If <math>f\equiv 0</math>, the algorithm always returns "yes", so it is always correct.
* If <math>f\not\equiv 0</math>, the algorithm may wrongly return "yes" (a '''false positive'''). But this happens only when the random <math>\vec{r}=(r_1,r_2,\ldots,r_n)</math> is a root of <math>f</math>. The probability of this bad event is upper bounded by the following famous result due to [https://pdfs.semanticscholar.org/b913/cf330852035f49b4ec5fe2db86c47d8a98fd.pdf Schwartz (1980)] and [http://www.cecm.sfu.ca/~monaganm/teaching/TopicsinCA15/zippel79.pdf Zippel (1979)].
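The multivariate test, with the same independent repetition as in the univariate case, can be sketched as follows (a minimal sketch, not part of the original notes; <math>S=\{0,\ldots,2d-1\}</math> and the prime <code>p = 10007</code> are illustrative choices):

```python
import random

def multivariate_pit(f, n, d, trials):
    """Schwartz-Zippel style identity test for a black-box n-variate
    polynomial f of total degree at most d. With |S| = 2d, each trial
    passes a nonzero f with probability at most d/|S| = 1/2, so the
    false-positive probability after `trials` rounds is <= 2**(-trials)."""
    S = range(2 * d)
    for _ in range(trials):
        r = [random.choice(S) for _ in range(n)]   # uniform point in the cube S^n
        if f(*r) != 0:
            return False                           # witness: f is nonzero
    return True                                    # report "f == 0"

# Examples over Z_p for the illustrative prime p = 10007:
p = 10007
zero_f = lambda x, y, z: ((x + y) * (x - y) - (x * x - y * y)) % p  # identically 0
nonzero_f = lambda x, y, z: (x * y - z) % p                         # degree 2, not 0
```

On <code>zero_f</code> the test always answers "yes"; on <code>nonzero_f</code> it answers "no" except with exponentially small probability.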


{{Theorem|Schwartz-Zippel Theorem|
: Let <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> be a multivariate polynomial of degree <math>d</math> over a field <math>\mathbb{F}</math>. If <math>f\not\equiv 0</math>, then for any finite set <math>S\subset\mathbb{F}</math>, and <math>r_1,r_2,\ldots,r_n\in S</math> chosen uniformly and independently at random,
::<math>\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.</math>
}}
The Schwartz-Zippel Theorem states that for any nonzero <math>n</math>-variate polynomial of degree at most <math>d</math>, the number of roots in any cube <math>S^n</math> is at most <math>d\cdot |S|^{n-1}</math>.
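This bound on the number of roots can be checked by brute force on a tiny example (a sketch, not part of the original notes; the polynomial <math>xy-1</math> over <math>\mathbb{Z}_7</math> is an arbitrary illustration):

```python
from itertools import product

def count_roots_in_cube(f, n, S):
    """Exhaustively count the roots of f inside the cube S^n.
    Only feasible for tiny n and |S|; the Schwartz-Zippel theorem promises
    at most d * |S|**(n-1) roots for a nonzero polynomial of degree d."""
    return sum(1 for point in product(S, repeat=n) if f(*point) == 0)

# f(x, y) = x*y - 1 over Z_7 has total degree d = 2
S = range(7)
f = lambda x, y: (x * y - 1) % 7
```

Here <code>count_roots_in_cube(f, 2, S)</code> returns 6 (one root <math>y=x^{-1}</math> for each nonzero <math>x</math>), well within the bound <math>d\cdot|S|^{n-1}=2\cdot 7=14</math>.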
Dana Moshkovitz gave a surprisingly simple and elegant [http://eccc.hpi-web.de/report/2010/096/ proof] of the Schwartz-Zippel Theorem, based on more advanced ideas. Here we present the standard proof, by induction.
{{Proof|  
The theorem is proved by induction on <math>n</math>.


;Induction basis''':'''
For <math>n=1</math>, this is the univariate case. Assume that <math>f\not\equiv 0</math>. Due to the fundamental theorem of algebra, any polynomial <math>f(x)</math> of degree at most <math>d</math> must have at most <math>d</math> roots, thus  
:<math>\Pr[f(r)=0]\le\frac{d}{|S|}.
</math>


;Induction hypothesis''':'''
Assume the theorem holds for any <math>m</math>-variate polynomials for all <math>m<n</math>.


;Induction step''':'''
For any <math>n</math>-variate polynomial <math>f(x_1,x_2,\ldots,x_n)</math> of degree at most <math>d</math>, we write <math>f</math> as
:<math>f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1})</math>,
where <math>k</math> is the highest degree of <math>x_n</math> in <math>f</math>, which means that <math>f_k\not\equiv 0</math> and the degree of <math>f_k</math> is at most <math>d-k</math>.


In particular, we write <math>f</math> as a sum of two parts:
:<math>f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n)</math>,
where both <math>f_k</math> and <math>\bar{f}</math> are polynomials, such that
* <math>f_k\not\equiv 0</math> is as above, whose degree is at most <math>d-k</math>;
* <math>\bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1})</math>, thus <math>\bar{f}(x_1,x_2,\ldots,x_n)</math> has no <math>x_n^{k}</math> factor in any term.


By the law of total probability, we have
:<math>
\begin{align}
&\Pr[f(r_1,r_2,\ldots,r_n)=0]\\
=&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\
&+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\\
\le&\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0].
\end{align}
</math>
Note that <math>f_k(x_1,x_2,\ldots,x_{n-1})</math> is a nonzero <math>(n-1)</math>-variate polynomial of degree at most <math>d-k</math>, so by the induction hypothesis,
:<math>\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.</math>


Now we look at the case conditioning on <math>f_k(r_1,r_2,\ldots,r_{n-1})\neq0</math>. Recall that <math>\bar{f}(x_1,\ldots,x_n)</math> has no <math>x_n^k</math> factor in any term, thus the condition <math>f_k(r_1,r_2,\ldots,r_{n-1})\neq0</math> guarantees that  
:<math>f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n)</math>
is a nonzero univariate polynomial in <math>x_n</math>: its degree is exactly <math>k</math> and <math>g_{r_1,\ldots,r_{n-1}}\not\equiv 0</math>, for which we already know that the probability of <math>g_{r_1,\ldots,r_{n-1}}(r_n)=0</math> is at most <math>\frac{k}{|S|}</math>.
Therefore,
:<math>
\begin{align}
\Pr[f(\vec{r})=0]
&\le\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\\
&\le\frac{d-k}{|S|}+\frac{k}{|S|}\\
&=\frac{d}{|S|},
\end{align}
</math>
which proves the theorem.
------
In the above proof, for the second case that <math>f_k(r_1,\ldots,r_{n-1})\neq 0</math>, we used a "probabilistic argument" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events when applying the law of total probability. You may make your own judgement as to which proof is better.
By the law of total probability,
:<math>
\begin{align}
&\Pr[f(\vec{r})=0]\\
=
&\sum_{x_1,\ldots,x_{n-1}\in S}\Pr[f(\vec{r})=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
=
&\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[f(\vec{r})=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
&+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq0}\Pr[f(\vec{r})=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
\le
&\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[\forall i<n, r_i=x_i]\\
&+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0\mid \forall i<n, r_i=x_i]\cdot\Pr[\forall i<n, r_i=x_i]\\
=
&\Pr[f_k(r_1,\ldots,r_{n-1})=0]+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n, r_i=x_i].
\end{align}
</math>
We have argued that <math>f_k\not\equiv 0</math> and the degree of <math>f_k</math> is at most <math>d-k</math>. By the induction hypothesis, we have
:<math>
\Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.
</math>
And for every fixed <math>x_1,\ldots,x_{n-1}\in S</math> such that <math>f_k(x_1,\ldots,x_{n-1})\neq 0</math>, we have argued that <math>f(x_1,\ldots,x_{n-1},x_n)</math> is a polynomial in <math>x_n</math> of degree <math>k</math>, thus
:<math>
\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\le\frac{k}{|S|},
</math>
which holds for all <math>x_1,\ldots,x_{n-1}\in S</math> such that <math>f_k(x_1,\ldots,x_{n-1})\neq 0</math>, therefore the weighted average
:<math>
\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i<n, r_i=x_i]
\le\frac{k}{|S|}.
</math>
Substituting these inequalities back into the law of total probability, we have
<math>
\Pr[f(\vec{r})=0]
\le\frac{d-k}{|S|}+\frac{k}{|S|}
=\frac{d}{|S|}.
</math>
}}


= Fingerprinting =
The polynomial identity testing algorithm in the Schwartz-Zippel theorem can be abstracted as the following framework:
Suppose we want to compare two objects <math>X</math> and <math>Y</math>. Instead of comparing them directly, we compute random '''fingerprints''' <math>\mathrm{FING}(X)</math> and <math>\mathrm{FING}(Y)</math> of them and compare the fingerprints.

The fingerprints have the following properties:
* <math>\mathrm{FING}(\cdot)</math> is a function, meaning that if <math>X= Y</math> then <math>\mathrm{FING}(X)=\mathrm{FING}(Y)</math>.
* It is much easier to compute and compare the fingerprints.
* Ideally, the domain of fingerprints is much smaller than the domain of original objects, so storing and comparing fingerprints are easy. This means the fingerprint function <math>\mathrm{FING}(\cdot)</math> cannot be an injection (one-to-one mapping), so it's possible that different <math>X</math> and <math>Y</math> are mapped to the same fingerprint. We resolve this by making the fingerprint function ''randomized'', and for <math>X\neq Y</math>, we want the probability <math>\Pr[\mathrm{FING}(X)=\mathrm{FING}(Y)]</math> to be small.


In the Schwartz-Zippel theorem, the objects to compare are polynomials from <math>\mathbb{F}[x_1,\ldots,x_n]</math>. Given a polynomial <math>f\in \mathbb{F}[x_1,\ldots,x_n]</math>, its fingerprint is computed as <math>\mathrm{FING}(f)=f(r_1,\ldots,r_n)</math> for <math>r_i</math> chosen independently and uniformly at random from some fixed set <math>S\subseteq\mathbb{F}</math>.


With this generic framework, for various identity testing problems, we may design different fingerprints <math>\mathrm{FING}(\cdot)</math>.


== Communication protocols for Equality by fingerprinting==
Now consider again the communication model where the two players Alice with a private input <math>x\in\{0,1\}^n</math> and Bob with a private input <math>y\in\{0,1\}^n</math> together compute a function <math>f(x,y)</math> by running a communication protocol.  




With the language of fingerprinting, this communication problem can be solved by the following generic scheme:
{{Theorem|Communication protocol for EQ by fingerprinting|
'''Bob does''':
:* choose a random fingerprint function <math>\mathrm{FING}(\cdot)</math> and compute the fingerprint of his input <math>\mathrm{FING}(y)</math>;
:* send both the description of <math>\mathrm{FING}(\cdot)</math> and the value of <math>\mathrm{FING}(y)</math> to Alice;
'''Upon receiving''' the description of <math>\mathrm{FING}(\cdot)</math> and the value of <math>\mathrm{FING}(y)</math>, '''Alice does''':
:* compute <math>\mathrm{FING}(x)</math> and check whether <math>\mathrm{FING}(x)=\mathrm{FING}(y)</math>.
}}
In this way we have a randomized communication protocol for the equality function EQ with false positives. The communication cost as well as the error probability are reduced to the question of how to design the random fingerprint function <math>\mathrm{FING}(\cdot)</math> to guarantee:
# A random fingerprint function <math>\mathrm{FING}(\cdot)</math> can be described succinctly.
# The range of <math>\mathrm{FING}(\cdot)</math> is small, so the fingerprints are succinct.
# The range of <math>\mathrm{FING}(\cdot)</math> is small, so the fingerprints are succinct.
# If <math>x\neq y</math>, the probability <math>\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]</math> is small.
# If <math>x\neq y</math>, the probability <math>\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]</math> is small.


=== Fingerprinting by PIT===
As before, we can define the fingerprint function as follows: for any bit-string <math>x\in\{0,1\}^n</math>, its random fingerprint is <math>\mathrm{FING}(x)=\sum_{i=1}^n x_i r^{i}</math>, where the additions and multiplications are defined over a finite field <math>\mathbb{Z}_p</math>, and <math>r</math> is chosen uniformly at random from <math>\mathbb{Z}_p</math>, where <math>p</math> is some suitable prime which can be represented in <math>\Theta(\log n)</math> bits. More specifically, we can choose <math>p</math> to be any prime from the interval <math>[n^2, 2n^2]</math>. By Chebyshev's theorem, such a prime must exist.

As we have shown before, it takes <math>O(\log p)=O(\log n)</math> bits to represent <math>\mathrm{FING}(y)</math> and to describe the random function <math>\mathrm{FING}(\cdot)</math> (since a random function <math>\mathrm{FING}(\cdot)</math> from this family is uniquely identified by a random <math>r\in\mathbb{Z}_p</math>, which can be represented within <math>\log p=O(\log n)</math> bits). And it follows easily from the fundamental theorem of algebra that for any distinct <math>x, y\in\{0,1\}^n</math>,
:<math>
\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)] \le \frac{n-1}{p}\le \frac{1}{n}.
</math>
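To make this concrete, the following Python sketch simulates one round of the protocol (this code and its names, such as <code>first_prime_in</code>, are our own illustration, not part of the notes):

```python
import random

def first_prime_in(lo, hi):
    """Return the smallest prime in [lo, hi) by trial division."""
    for m in range(max(lo, 2), hi):
        if all(m % d for d in range(2, int(m ** 0.5) + 1)):
            return m
    raise ValueError("no prime in range")

def fingerprint(bits, r, p):
    """FING(x) = sum_{i=1}^n x_i * r^i over the finite field Z_p."""
    return sum(b * pow(r, i, p) for i, b in enumerate(bits, start=1)) % p

def eq_protocol(x, y):
    """One round of the randomized EQ protocol on two n-bit strings.
    Always answers True on equal inputs; errs with prob <= (n-1)/p otherwise."""
    n = len(x)
    p = first_prime_in(n * n, 2 * n * n)  # a prime in [n^2, 2n^2]
    r = random.randrange(p)               # Bob's random evaluation point
    return fingerprint(x, r, p) == fingerprint(y, r, p)
```

Sending <math>r</math> and the single field element <math>\mathrm{FING}(y)</math> costs <math>O(\log p)=O(\log n)</math> bits, matching the analysis above.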


=== Fingerprinting by randomized checksum===
Now we consider a new fingerprint function: we treat each input string <math>x\in\{0,1\}^n</math> as the binary representation of a number, and let <math>\mathrm{FING}(x)=x\bmod p</math> for a random prime <math>p</math> chosen from <math>[k]=\{0,1,\ldots,k-1\}</math>, for some <math>k</math> to be specified later.


Now a random fingerprint function <math>\mathrm{FING}(\cdot)</math> can be uniquely identified by this random prime <math>p</math>. The new communication protocol for EQ with this fingerprint is as follows:
{{Theorem|Communication protocol for EQ by random checksum|
'''Bob does''':
:for some parameter <math>k</math> (to be specified), 
:* choose a prime <math>p\in[k]</math> uniformly at random;
:* send <math>p</math> and <math>y\bmod p</math> to Alice;
'''Upon receiving''' <math>p</math> and <math>y\bmod p</math>, '''Alice does''':
:* check whether <math>x\bmod p=y\bmod p</math>.
}}
The number of bits to be communicated is obviously <math>O(\log k)</math>. When <math>x\neq y</math>, we want to upper bound the error probability <math>\Pr[x\bmod p=y\bmod p]</math>.


Suppose without loss of generality that <math>x>y</math>. Let <math>z=x-y</math>. Then <math>z<2^n</math> since <math>x,y\in[2^n]</math>, and <math>z\neq 0</math> since <math>x\neq y</math>. It holds that <math>x\equiv y\pmod p</math> if and only if <math>p\mid z</math>. Therefore, we only need to upper bound the probability
:<math>\Pr[z\bmod p=0]</math>
for an arbitrarily fixed <math>0<z<2^n</math> and a uniform random prime <math>p\in[k]</math>.


The probability <math>\Pr[z\bmod p=0]</math> can be bounded as
:<math>\Pr[z\bmod p=0]\le\frac{\mbox{the number of prime divisors of }z}{\mbox{the number of primes in }[k]}</math>.


For the numerator, any positive <math>z<2^n</math> has at most <math>n</math> prime factors. To see this, assume by contradiction that <math>z</math> has more than <math>n</math> prime factors. Since every prime is at least 2, <math>z</math> would then be at least <math>2^{n+1}>2^n</math>, contradicting the fact that <math>z<2^n</math>.

For the denominator, we need to lower bound the number of primes in <math>[k]</math>. This is given by the celebrated [http://en.wikipedia.org/wiki/Prime_number_theorem '''Prime Number Theorem (PNT)'''].

{{Theorem|Prime Number Theorem|
:Let <math>\pi(k)</math> denote the number of primes less than <math>k</math>. Then <math>\pi(k)\sim\frac{k}{\ln k}</math> as <math>k\rightarrow\infty</math>.
}}

Therefore, by choosing <math>k=2n^2\ln n</math>, we have that for any <math>0<z<2^n</math> and a random prime <math>p\in[k]</math>,
:<math>\Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{n}</math>,
which means that for any inputs <math>x,y\in\{0,1\}^n</math> with <math>x\neq y</math>, the probability of a false positive is bounded as
:<math>\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]\le\Pr[|x-y|\bmod p=0]\le \frac{1}{n}</math>.
Moreover, with this choice of parameter <math>k=2n^2\ln n</math>, the communication complexity of the protocol is bounded by <math>O(\log k)=O(\log n)</math>.
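The checksum protocol is equally short in code. The Python sketch below (our own illustration, with the inputs given as Python integers) uses a sieve to draw a uniform random prime from <math>[k]</math> with <math>k=2n^2\ln n</math>:

```python
import math
import random

def primes_up_to(k):
    """Sieve of Eratosthenes: all primes below k (k >= 2)."""
    sieve = [True] * k
    sieve[:2] = [False, False]
    for m in range(2, int(k ** 0.5) + 1):
        if sieve[m]:
            sieve[m * m :: m] = [False] * len(range(m * m, k, m))
    return [m for m in range(k) if sieve[m]]

def checksum_protocol(x, y):
    """Compare two n-bit numbers x, y via a random prime checksum.
    Always answers True when x == y; errs with prob O(1/n) otherwise."""
    n = max(x.bit_length(), y.bit_length())
    k = int(2 * n * n * math.log(n)) + 3   # k = 2 n^2 ln n
    p = random.choice(primes_up_to(k))     # uniform random prime in [k]
    return x % p == y % p
```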
 
= Checking distinctness =
Consider the following problem of '''checking distinctness''':
*Given a sequence <math>x_1,x_2,\ldots,x_n\in\{1,2,\ldots,n\}</math>, check whether every element of <math>\{1,2,\ldots,n\}</math> appears '''exactly''' once.

Obviously this problem can be solved in linear time and linear space (in addition to the space for storing the input) by maintaining an <math>n</math>-bit vector that indicates which numbers among <math>\{1,2,\ldots,n\}</math> have appeared.

When <math>n</math> is enormously large, an <math>\Omega(n)</math> space cost is too expensive. We wonder whether we could solve this problem with a space cost (in addition to the space for storing the input) much less than <math>O(n)</math>. This can be done by fingerprinting if we tolerate a certain degree of inaccuracy.

== Fingerprinting multisets ==
We consider the following more general problem, '''checking identity of multisets''':
* '''Input:''' two multisets <math>A=\{a_1,a_2,\ldots, a_n\}</math> and <math>B=\{b_1,b_2,\ldots, b_n\}</math> where <math>a_1,a_2,\ldots,a_n,b_1,b_2,\ldots,b_n\in \{1,2,\ldots,n\}</math>.
* Determine whether <math>A=B</math> (multiset equivalence).


Here for a '''multiset''' <math>A=\{a_1,a_2,\ldots, a_n\}</math>, the elements <math>a_i</math> are not necessarily distinct. The '''multiplicity''' of an element <math>a_i</math> in a multiset <math>A</math> is the number of times <math>a_i</math> appears in <math>A</math>. Two multisets <math>A</math> and <math>B</math> are equivalent if they contain the same set of elements and the multiplicities of every element in <math>A</math> and <math>B</math> are equal.

Obviously the above problem of checking distinctness is a special case of checking identity of multisets: just check the identity of the multiset <math>A</math> and the set <math>\{1,2,\ldots, n\}</math>.

The following fingerprinting function for multisets was introduced by Lipton for solving multiset identity testing.

{{Theorem|Fingerprint for multiset|
:Let <math>p</math> be a uniform random prime chosen from the interval <math>[(n\log n)^2,2(n\log n)^2]</math>. By Chebyshev's theorem, such a prime must exist. Consider the finite field <math>\mathbb{Z}_p=[p]</math>.

:Given a multiset <math>A=\{a_1,a_2,\ldots,a_n\}</math>, we define a univariate polynomial <math>f_A\in\mathbb{Z}_p[x]</math> over the finite field <math>\mathbb{Z}_p</math> as follows:
::<math>f_A(x)=\prod_{i=1}^n(x-a_i)</math>,
:where <math>+</math> and <math>\cdot</math> are defined over the finite field <math>\mathbb{Z}_p</math>.


:We then define the random fingerprinting function as:
::<math>\mathrm{FING}(A)=f_A(r)=\prod_{i=1}^n(r-a_i)</math>,
:where <math>r</math> is chosen uniformly at random from <math>\mathbb{Z}_p</math>.
}}

Since all computations of <math>\mathrm{FING}(A)=\prod_{i=1}^n(r-a_i)</math> are over the finite field <math>\mathbb{Z}_p</math>, the space cost for computing the fingerprint <math>\mathrm{FING}(A)</math> is only <math>O(\log p)=O(\log n)</math>.

Moreover, the fingerprinting function <math>\mathrm{FING}(A)</math> is invariant under permutations of the elements of the multiset <math>A=\{a_1,a_2,\ldots,a_n\}</math>, thus it is indeed a function of multisets (meaning every multiset has exactly one fingerprint). Therefore, if <math>A=B</math> then <math>\mathrm{FING}(A)=\mathrm{FING}(B)</math>.
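The following Python sketch (our own illustration; parameter choices as above) shows the fingerprint in action:

```python
import math
import random

def primes_in(lo, hi):
    """All primes in [lo, hi), by trial division (fine for small ranges)."""
    return [m for m in range(max(lo, 2), hi)
            if all(m % d for d in range(2, int(m ** 0.5) + 1))]

def multiset_fingerprint(A, r, p):
    """FING(A) = prod_{i=1}^n (r - a_i) over the finite field Z_p."""
    acc = 1
    for a in A:
        acc = acc * (r - a) % p
    return acc

def multisets_probably_equal(A, B):
    """Lipton's test: equal multisets always agree; unequal ones agree w.p. O(1/n)."""
    n = len(A)
    m = int((n * math.log2(n)) ** 2)        # left end of [(n log n)^2, 2(n log n)^2]
    p = random.choice(primes_in(m, 2 * m))  # uniform random prime in the interval
    r = random.randrange(p)
    return multiset_fingerprint(A, r, p) == multiset_fingerprint(B, r, p)
```

Note that only <math>O(\log n)</math> bits of state are needed: the fingerprint is a single element of <math>\mathbb{Z}_p</math> and can be updated as the elements stream by.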
For two distinct multisets <math>A\neq B</math>, it is possible that <math>\mathrm{FING}(A)=\mathrm{FING}(B)</math>, but the following theorem due to Lipton bounds this probability of a false positive.

{{Theorem|Theorem (Lipton 1989)|
:Let <math>A=\{a_1,a_2,\ldots,a_n\}</math> and <math>B=\{b_1,b_2,\ldots,b_n\}</math> be two multisets whose elements are from <math>\{1,2,\ldots,n\}</math>. If <math>A\neq B</math>, then
::<math>\Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]=O\left(\frac{1}{n}\right)</math>.
}}
{{Proof| 
Let <math>\tilde{f}_A(x)=\prod_{i=1}^n(x-a_i)</math> and <math>\tilde{f}_B(x)=\prod_{i=1}^n(x-b_i)</math> be two univariate polynomials defined over the reals <math>\mathbb{R}</math>. Note that in contrast to <math>f_A(x)</math> and <math>f_B(x)</math>, the <math>+</math> and <math>\cdot</math> in <math>\tilde{f}_A(x), \tilde{f}_B(x)</math> are not reduced modulo <math>p</math>. It is easy to verify that the polynomials <math>\tilde{f}_A(x), \tilde{f}_B(x)</math> have the following properties:
*<math>\tilde{f}_A\equiv \tilde{f}_B</math> if and only if <math>A=B</math>. Here <math>A=B</math> means the multiset equivalence.
*By the properties of the finite field, for any value <math>r\in\mathbb{Z}_p</math>, it holds that <math>f_A(r)=\tilde{f}_A(r)\bmod p</math> and <math>f_B(r)=\tilde{f}_B(r)\bmod p</math>.

Therefore, assuming that <math>A\neq B</math>, we must have <math>\tilde{f}_A(x)\not\equiv \tilde{f}_B(x)</math>. Then by the law of total probability:
:<math>
\begin{align}
\Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]
&=
\Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]\Pr[f_A\not\equiv f_B]\\
&\quad\,\,+\Pr\left[f_A(r)=f_B(r)\mid f_A\equiv f_B\right]\Pr[f_A\equiv f_B]\\
&\le \Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]+\Pr[f_A\equiv f_B].
\end{align}
</math>
Note that the degrees of <math>f_A,f_B</math> are at most <math>n</math> and <math>r</math> is chosen uniformly from <math>[p]</math>. By the Schwartz-Zippel theorem for univariate polynomials, the first probability
:<math>
\Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]\le \frac{n}{p}=o\left(\frac{1}{n}\right),
</math>
since <math>p</math> is chosen from the interval <math>[(n\log n)^2,2(n\log n)^2]</math>.

For the second probability <math>\Pr[f_A\equiv f_B]</math>, recall that <math>\tilde{f}_A\not\equiv \tilde{f}_B</math>, so there is at least one non-zero coefficient <math>c\le n^n</math> in <math>\tilde{f}_A-\tilde{f}_B</math>. The event <math>f_A\equiv f_B</math> occurs only if <math>c\bmod p=0</math>, which means
:<math>
\begin{align}
\Pr[f_A\equiv f_B]
&\le \Pr[c\bmod p=0]\\
&\le\frac{\text{number of prime factors of }c}{\text{number of primes in }[(n\log n)^2,2(n\log n)^2]}\\
&\le \frac{n\log_2n}{\pi(2(n\log n)^2)-\pi((n\log n)^2)}.
\end{align}
</math>
By the prime number theorem, <math>\pi(N)\sim \frac{N}{\ln N}</math> as <math>N\to\infty</math>. Therefore,
:<math>
\Pr[f_A\equiv f_B]=O\left(\frac{n\log n}{n^2\log n}\right)=O\left(\frac{1}{n}\right).
</math>
Combining everything together, we have
<math>\Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]=O\left(\frac{1}{n}\right)</math>.
}}

Latest revision as of 05:55, 26 November 2016

Polynomial Identity Testing (PIT)

The Polynomial Identity Testing (PIT) is such a problem: given as input two polynomials, determine whether they are identical. It plays a fundamental role in Identity Testing problems.

First, let's consider the univariate ("one variable") case:

  • Input: two polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math] ([math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] are identical).

Here the [math]\displaystyle{ \mathbb{F}[x] }[/math] denotes the ring of univariate polynomials on a field [math]\displaystyle{ \mathbb{F} }[/math]. More precisely, a polynomial [math]\displaystyle{ f\in\mathbb{F}[x] }[/math] is

[math]\displaystyle{ f(x)=\sum_{i=0}^\infty a_ix^i }[/math],

where the coefficients [math]\displaystyle{ a_i }[/math] are taken from the field [math]\displaystyle{ \mathbb{F} }[/math], and the addition and multiplication are also defined over the field [math]\displaystyle{ \mathbb{F} }[/math]. And:

  • the degree of [math]\displaystyle{ f }[/math] is the highest [math]\displaystyle{ i }[/math] with non-zero [math]\displaystyle{ a_i }[/math];
  • a polynomial [math]\displaystyle{ f }[/math] is a zero-polynomial, denoted as [math]\displaystyle{ f\equiv 0 }[/math], if all coefficients [math]\displaystyle{ a_i=0 }[/math].

Alternatively, we can consider the following equivalent problem by comparing the polynomial [math]\displaystyle{ f-g }[/math] (whose degree is at most [math]\displaystyle{ d }[/math]) with the zero-polynomial:

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math] ([math]\displaystyle{ f }[/math] is the 0 polynomial).

The problem is trivial if the input polynomial [math]\displaystyle{ f }[/math] is given explicitly: one can simply check whether all [math]\displaystyle{ d+1 }[/math] coefficients are [math]\displaystyle{ 0 }[/math]. To make the problem nontrivial, we assume that the input polynomial is given implicitly as a black box (also called an oracle): the only way the algorithm can access [math]\displaystyle{ f }[/math] is to evaluate [math]\displaystyle{ f(x) }[/math] at some [math]\displaystyle{ x }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math], with [math]\displaystyle{ x }[/math] chosen by the algorithm.

A straightforward deterministic algorithm is to evaluate [math]\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }[/math] at [math]\displaystyle{ d+1 }[/math] distinct elements [math]\displaystyle{ x_1,x_2,\ldots,x_{d+1} }[/math] of the field [math]\displaystyle{ \mathbb{F} }[/math] and check whether they are all zero. By the fundamental theorem of algebra (equivalently, by polynomial interpolation), this is guaranteed to decide whether a degree-[math]\displaystyle{ d }[/math] univariate polynomial satisfies [math]\displaystyle{ f\equiv 0 }[/math].

Fundamental Theorem of Algebra
Any non-zero univariate polynomial of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots.

The reason for this fundamental theorem holding generally over any field [math]\displaystyle{ \mathbb{F} }[/math] is that any univariate polynomial of degree [math]\displaystyle{ d }[/math] factors uniquely into at most [math]\displaystyle{ d }[/math] irreducible polynomials, each of which has at most one root.

The following simple randomized algorithm is natural:

Algorithm for PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r\in S }[/math] uniformly at random;
  • if [math]\displaystyle{ f(r) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly at random from a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math]. It is easy to see the following:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ r }[/math] is a root of [math]\displaystyle{ f }[/math]. By the fundamental theorem of algebra, [math]\displaystyle{ f }[/math] has at most [math]\displaystyle{ d }[/math] roots, so the probability that the algorithm is wrong is bounded as
[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

By fixing [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] to be an arbitrary subset of size [math]\displaystyle{ |S|=2d }[/math], this probability of a false positive is at most [math]\displaystyle{ 1/2 }[/math]. We can reduce it to an arbitrarily small constant [math]\displaystyle{ \delta }[/math] by repeating the above test independently [math]\displaystyle{ \log_2 \frac{1}{\delta} }[/math] times, since the error probability decays geometrically under independent repetitions.
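The algorithm is only a few lines of code. The Python sketch below is our own illustration: the black-box polynomial f, the prime p = 101, and the set S are arbitrary choices for the demo, and trials implements the independent repetition just described.

```python
import random

def pit_univariate(f, d, S, trials=1):
    """Randomized zero-test for a black-box polynomial f of degree <= d.
    Never errs when f == 0; otherwise errs with prob <= (d/|S|)^trials."""
    return all(f(random.choice(S)) == 0 for _ in range(trials))

# Demo over the field Z_p: f = (x-2)(x-5) mod p is a nonzero polynomial.
p = 101
f = lambda x: ((x - 2) * (x - 5)) % p
zero = lambda x: 0
S = list(range(p))  # |S| = 101 >= 2d, so one trial errs with prob <= d/|S|
```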

Communication Complexity of Equality

Communication complexity was introduced by Andrew Chi-Chih Yao as a model of computation with more than one entity, each having partial information about the input.

Assume that there are two entities, say Alice and Bob. Alice has a private input [math]\displaystyle{ a }[/math] and Bob has a private input [math]\displaystyle{ b }[/math]. Together they want to compute a function [math]\displaystyle{ f(a,b) }[/math] by communicating with each other. The communication follows a predefined communication protocol (the "algorithm" in this model). The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.

The problem of checking identity is formally defined by the function EQ as follows: [math]\displaystyle{ \mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\} }[/math] and for any [math]\displaystyle{ a,b\in\{0,1\}^n }[/math],

[math]\displaystyle{ \mathrm{EQ}(a,b)= \begin{cases} 1& \mbox{if } a=b,\\ 0& \mbox{otherwise.} \end{cases} }[/math]

A trivial way to solve EQ is to let Bob send his entire input string [math]\displaystyle{ b }[/math] to Alice and let Alice check whether [math]\displaystyle{ a=b }[/math]. This costs [math]\displaystyle{ n }[/math] bits of communication.

It is known that for deterministic communication protocols, this is the best we can get for computing EQ.

Theorem (Yao 1979)
Any deterministic communication protocol computing EQ on two [math]\displaystyle{ n }[/math]-bit strings costs [math]\displaystyle{ n }[/math] bits of communication in the worst-case.

This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem is in Yao's celebrated paper in 1979 with a humble title. It pioneered the field of communication complexity.

If we allow randomness in protocols, and also tolerate a small probabilistic error, the problem can be solved with significantly less communications. To present this randomized protocol, we need a few preparations:

  • We represent the inputs [math]\displaystyle{ a\in\{0,1\}^{n} }[/math] of Alice and Bob as two univariate polynomials of degree at most [math]\displaystyle{ n-1 }[/math], respectively
[math]\displaystyle{ f(x)=\sum_{i=0}^{n-1}a_ix^{i} }[/math] and [math]\displaystyle{ g(x)=\sum_{i=0}^{n-1}b_ix^{i} }[/math].
  • The two polynomials [math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] are defined over finite field [math]\displaystyle{ \mathbb{Z}_p=\{0,1,\ldots,p-1\} }[/math] for some suitable prime [math]\displaystyle{ p }[/math] (to be specified later), which means the additions and multiplications are modulo [math]\displaystyle{ p }[/math].

The randomized communication protocol is then as follows:

A randomized protocol for EQ

Bob does:

  • pick [math]\displaystyle{ r\in\mathbb{Z}_p }[/math] uniformly at random;
  • send [math]\displaystyle{ r }[/math] and [math]\displaystyle{ g(r) }[/math] to Alice;

Upon receiving [math]\displaystyle{ r }[/math] and [math]\displaystyle{ g(r) }[/math] Alice does:

  • compute [math]\displaystyle{ f(r) }[/math];
  • If [math]\displaystyle{ f(r)= g(r) }[/math] return "yes"; else return "no".

The communication complexity of the protocol is given by the number of bits used to represent the values of [math]\displaystyle{ r }[/math] and [math]\displaystyle{ g(r) }[/math]. Since the polynomials are defined over the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math] and the random number [math]\displaystyle{ r }[/math] is also chosen from [math]\displaystyle{ \mathbb{Z}_p }[/math], this is bounded by [math]\displaystyle{ O(\log p) }[/math].

On the other hand, the protocol makes a mistake only when [math]\displaystyle{ a\neq b }[/math] but it wrongly answers "yes". This happens only when [math]\displaystyle{ f\not\equiv g }[/math] but [math]\displaystyle{ f(r)=g(r) }[/math]. Since the degrees of [math]\displaystyle{ f, g }[/math] are at most [math]\displaystyle{ n-1 }[/math] and [math]\displaystyle{ r }[/math] is chosen among [math]\displaystyle{ p }[/math] distinct values, we have

[math]\displaystyle{ \Pr[f(r)=g(r)]\le \frac{n-1}{p} }[/math].

By choosing [math]\displaystyle{ p }[/math] to be a prime in the interval [math]\displaystyle{ [n^2, 2n^2] }[/math] (by Chebyshev's theorem, such a prime [math]\displaystyle{ p }[/math] always exists), the above randomized communication protocol solves the Equality function EQ with a false-positive error probability of at most [math]\displaystyle{ O(1/n) }[/math] and communication complexity [math]\displaystyle{ O(\log n) }[/math], an EXPONENTIAL improvement over ANY deterministic communication protocol!

Schwartz-Zippel Theorem

Now let's see the true form of Polynomial Identity Testing (PIT), for multivariate polynomials:

  • Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math].

The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is the ring of multivariate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. An [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math], written as a sum of monomials, is:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].

The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math], and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree among its monomials with nonzero coefficients.

As before, we also consider the following equivalent problem:

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math].

If [math]\displaystyle{ f }[/math] is written explicitly as a sum of monomials, then the problem can be solved by checking whether all coefficients are zero, and there are at most [math]\displaystyle{ n^{d+1} }[/math] coefficients in an [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math].

A multivariate polynomial [math]\displaystyle{ f }[/math] can also be presented in its product form, for example:

Example

The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined by [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is

[math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].

Let [math]\displaystyle{ f }[/math] be the polynomial defined as

[math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]

For polynomials in product form, it is quite efficient to evaluate the polynomial at any specific point from the field over which it is defined; however, expanding the polynomial into a sum of monomials can be very expensive.

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Randomized algorithm for multivariate PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
  • if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly from an [math]\displaystyle{ n }[/math]-dimensional cube [math]\displaystyle{ S^n }[/math], where [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] is a finite subset. And:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ \vec{r}=(r_1,r_2,\ldots,r_n) }[/math] is a root of [math]\displaystyle{ f }[/math]. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
Schwartz-Zippel Theorem
Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math]. If [math]\displaystyle{ f\not\equiv 0 }[/math], then for any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and [math]\displaystyle{ r_1,r_2\ldots,r_n\in S }[/math] chosen uniformly and independently at random,
[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]

The Schwartz-Zippel Theorem states that for any nonzero [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math], the number of roots in any cube [math]\displaystyle{ S^n }[/math] is at most [math]\displaystyle{ d\cdot |S|^{n-1} }[/math].
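The randomized PIT algorithm above can be sketched in Python as follows; the polynomial is accessed only as a black-box evaluation function, and the names are illustrative:

```python
import random

def pit_zero_test(evaluate, n, S):
    """One-sided test of whether an n-variate polynomial f is identically zero.

    `evaluate` is a black box computing f at a given point.  If f == 0 the
    test always answers True; if f != 0 it wrongly answers True with
    probability at most deg(f)/|S|, by the Schwartz-Zippel theorem."""
    r = [random.choice(S) for _ in range(n)]
    return evaluate(r) == 0

# Example: the Vandermonde polynomial f = prod_{j<i}(x_i - x_j),
# evaluated in product form without ever expanding it.
def vandermonde(point):
    det = 1
    for i in range(len(point)):
        for j in range(i):
            det *= point[i] - point[j]
    return det
```

For instance, `pit_zero_test(vandermonde, 3, range(1000))` errs with probability at most [math]\displaystyle{ 3/1000 }[/math], since the [math]\displaystyle{ 3 }[/math]-variate Vandermonde polynomial has degree [math]\displaystyle{ \binom{3}{2}=3 }[/math].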

Dana Moshkovitz gave a surprisingly simple and elegant proof of the Schwartz-Zippel Theorem. Here we introduce the standard proof by induction.

Proof.

The theorem is proved by induction on [math]\displaystyle{ n }[/math].

Induction basis:

For [math]\displaystyle{ n=1 }[/math], this is the univariate case. Assume that [math]\displaystyle{ f\not\equiv 0 }[/math]. A nonzero univariate polynomial [math]\displaystyle{ f(x) }[/math] of degree at most [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots over any field (a consequence of the factor theorem), thus

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]
Induction hypothesis:

Assume the theorem holds for any [math]\displaystyle{ m }[/math]-variate polynomials for all [math]\displaystyle{ m\lt n }[/math].

Induction step:

For any [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] of degree at most [math]\displaystyle{ d }[/math], we write [math]\displaystyle{ f }[/math] as

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],

where [math]\displaystyle{ k }[/math] is the highest degree of [math]\displaystyle{ x_n }[/math], which means the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math].

In particular, we write [math]\displaystyle{ f }[/math] as a sum of two parts:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],

where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that

  • [math]\displaystyle{ f_k\not\equiv 0 }[/math] is as above, whose degree is at most [math]\displaystyle{ d-k }[/math];
  • [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.

By the law of total probability, we have

[math]\displaystyle{ \begin{align} &\Pr[f(r_1,r_2,\ldots,r_n)=0]\\ = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]

Note that [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is a polynomial in [math]\displaystyle{ n-1 }[/math] variables of degree at most [math]\displaystyle{ d-k }[/math] such that [math]\displaystyle{ f_k\not\equiv 0 }[/math]. By the induction hypothesis, we have

[math]\displaystyle{ \begin{align} (*) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]

Now we look at the case conditioning on [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math]. Recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] guarantees that

[math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]

is a nonzero univariate polynomial in [math]\displaystyle{ x_n }[/math] of degree [math]\displaystyle{ k }[/math]. By the univariate case, the probability of [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math]. Therefore,

[math]\displaystyle{ \begin{align} (**) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].

Substituting both [math]\displaystyle{ (*) }[/math] and [math]\displaystyle{ (**) }[/math] back in the total probability, we have

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}, }[/math]

which proves the theorem.

[math]\displaystyle{ \square }[/math]

Fingerprinting

The polynomial identity testing algorithm in the Schwartz-Zippel theorem can be abstracted as the following framework: Suppose we want to compare two objects [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math]. Instead of comparing them directly, we compute random fingerprints [math]\displaystyle{ \mathrm{FING}(X) }[/math] and [math]\displaystyle{ \mathrm{FING}(Y) }[/math] of them and compare the fingerprints.

The fingerprint function has the following properties:

  • [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] is a function, meaning that if [math]\displaystyle{ X= Y }[/math] then [math]\displaystyle{ \mathrm{FING}(X)=\mathrm{FING}(Y) }[/math].
  • It is much easier to compute and compare the fingerprints.
  • Ideally, the domain of fingerprints is much smaller than the domain of original objects, so storing and comparing fingerprints are easy. This means the fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] cannot be an injection (one-to-one mapping), so it's possible that different [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are mapped to the same fingerprint. We resolve this by making the fingerprint function randomized, and for [math]\displaystyle{ X\neq Y }[/math], we want the probability [math]\displaystyle{ \Pr[\mathrm{FING}(X)=\mathrm{FING}(Y)] }[/math] to be small.

In Schwartz-Zippel theorem, the objects to compare are polynomials from [math]\displaystyle{ \mathbb{F}[x_1,\ldots,x_n] }[/math]. Given a polynomial [math]\displaystyle{ f\in \mathbb{F}[x_1,\ldots,x_n] }[/math], its fingerprint is computed as [math]\displaystyle{ \mathrm{FING}(f)=f(r_1,\ldots,r_n) }[/math] for [math]\displaystyle{ r_i }[/math] chosen independently and uniformly at random from some fixed set [math]\displaystyle{ S\subseteq\mathbb{F} }[/math].

With this generic framework, for various identity testing problems, we may design different fingerprints [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math].

Communication protocols for Equality by fingerprinting

Now consider again the communication model where the two players Alice with a private input [math]\displaystyle{ x\in\{0,1\}^n }[/math] and Bob with a private input [math]\displaystyle{ y\in\{0,1\}^n }[/math] together compute a function [math]\displaystyle{ f(x,y) }[/math] by running a communication protocol.

We still consider the communication protocols for the equality function EQ

[math]\displaystyle{ \mathrm{EQ}(x,y)= \begin{cases} 1& \mbox{if } x=y,\\ 0& \mbox{otherwise.} \end{cases} }[/math]

With the language of fingerprinting, this communication problem can be solved by the following generic scheme:

Communication protocol for EQ by fingerprinting

Bob does:

  • choose a random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] and compute the fingerprint of his input [math]\displaystyle{ \mathrm{FING}(y) }[/math];
  • send both the description of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] and the value of [math]\displaystyle{ \mathrm{FING}(y) }[/math] to Alice;

Upon receiving the description of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] and the value of [math]\displaystyle{ \mathrm{FING}(y) }[/math], Alice does:

  • compute [math]\displaystyle{ \mathrm{FING}(x) }[/math] and check whether [math]\displaystyle{ \mathrm{FING}(x)=\mathrm{FING}(y) }[/math].

In this way we obtain a randomized communication protocol for the equality function EQ with one-sided error (false positives). Both the communication cost and the error probability reduce to the question of how to design the random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] so as to guarantee:

  1. A random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] can be described succinctly.
  2. The range of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] is small, so the fingerprints are succinct.
  3. If [math]\displaystyle{ x\neq y }[/math], the probability [math]\displaystyle{ \Pr[\mathrm{FING}(x)=\mathrm{FING}(y)] }[/math] is small.

Fingerprinting by PIT

As before, we can define the fingerprint function as: for any bit-string [math]\displaystyle{ x\in\{0,1\}^n }[/math], its random fingerprint is [math]\displaystyle{ \mathrm{FING}(x)=\sum_{i=1}^n x_i r^{i} }[/math], where the additions and multiplications are defined over a finite field [math]\displaystyle{ \mathbb{Z}_p }[/math], and [math]\displaystyle{ r }[/math] is chosen uniformly at random from [math]\displaystyle{ \mathbb{Z}_p }[/math], where [math]\displaystyle{ p }[/math] is some suitable prime which can be represented in [math]\displaystyle{ \Theta(\log n) }[/math] bits. More specifically, we can choose [math]\displaystyle{ p }[/math] to be any prime from the interval [math]\displaystyle{ [n^2, 2n^2] }[/math]. By Chebyshev's theorem (Bertrand's postulate), such a prime must exist.

As we have shown before, it takes [math]\displaystyle{ O(\log p)=O(\log n) }[/math] bits to represent [math]\displaystyle{ \mathrm{FING}(y) }[/math] and to describe the random function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] (since a random function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] from this family is uniquely identified by a random [math]\displaystyle{ r\in\mathbb{Z}_p }[/math], which can be represented within [math]\displaystyle{ \log p=O(\log n) }[/math] bits). Moreover, for any distinct [math]\displaystyle{ x\neq y }[/math], the difference [math]\displaystyle{ \mathrm{FING}(x)-\mathrm{FING}(y) }[/math] is, as a polynomial in [math]\displaystyle{ r }[/math], a nonzero polynomial of degree at most [math]\displaystyle{ n }[/math], so it has at most [math]\displaystyle{ n }[/math] roots in [math]\displaystyle{ \mathbb{Z}_p }[/math]. Therefore, for any distinct [math]\displaystyle{ x, y\in\{0,1\}^n }[/math],

[math]\displaystyle{ \Pr[\mathrm{FING}(x)=\mathrm{FING}(y)] \le \frac{n}{p}\le \frac{1}{n}. }[/math]
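A minimal Python sketch of this protocol (the prime is found by trial division purely for illustration, and the helper names are illustrative):

```python
import random

def is_prime(m):
    """Trial-division primality test; adequate here since p = O(n^2)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def random_prime(lo, hi):
    """A uniform random prime from [lo, hi], by rejection sampling."""
    while True:
        p = random.randint(lo, hi)
        if is_prime(p):
            return p

def fing(bits, r, p):
    """FING(x) = sum_{i=1}^n x_i r^i over Z_p, computed by Horner's rule."""
    acc = 0
    for bit in reversed(bits):
        acc = (acc * r + bit) % p
    return (acc * r) % p  # one extra factor of r so powers start at r^1

def eq_protocol(x, y):
    """One-round EQ protocol: Bob sends (p, r, FING(y)); Alice compares.

    Communication is O(log n) bits; for x != y the false-positive
    probability is at most 1/n."""
    n = len(x)
    p = random_prime(n * n, 2 * n * n)
    r = random.randrange(p)
    return fing(x, r, p) == fing(y, r, p)
```

Identical strings are always accepted; distinct strings are rejected with high probability.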

Fingerprinting by randomized checksum

Now we consider a new fingerprint function: We treat each input string [math]\displaystyle{ x\in\{0,1\}^n }[/math] as the binary representation of a number, and let [math]\displaystyle{ \mathrm{FING}(x)=x\bmod p }[/math] for some random prime [math]\displaystyle{ p }[/math] chosen from [math]\displaystyle{ [k]=\{0,1,\ldots,k-1\} }[/math], for some [math]\displaystyle{ k }[/math] to be specified later.

Now a random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] can be uniquely identified by this random prime [math]\displaystyle{ p }[/math]. The new communication protocol for EQ with this fingerprint is as follows:

Communication protocol for EQ by random checksum

Bob does:

for some parameter [math]\displaystyle{ k }[/math] (to be specified),
  • choose a prime [math]\displaystyle{ p\in[k] }[/math] uniformly at random;
  • send [math]\displaystyle{ p }[/math] and [math]\displaystyle{ y\bmod p }[/math] to Alice;

Upon receiving [math]\displaystyle{ p }[/math] and [math]\displaystyle{ y\bmod p }[/math], Alice does:

  • check whether [math]\displaystyle{ x\bmod p=y\bmod p }[/math].

The number of bits to be communicated is obviously [math]\displaystyle{ O(\log k) }[/math]. When [math]\displaystyle{ x\neq y }[/math], we want to upper bound the error probability [math]\displaystyle{ \Pr[x\bmod p=y\bmod p] }[/math].

Suppose without loss of generality [math]\displaystyle{ x\gt y }[/math]. Let [math]\displaystyle{ z=x-y }[/math]. Then [math]\displaystyle{ z\lt 2^n }[/math] since [math]\displaystyle{ x,y\in[2^n] }[/math], and [math]\displaystyle{ z\neq 0 }[/math] for [math]\displaystyle{ x\neq y }[/math]. It holds that [math]\displaystyle{ x\equiv y\pmod p }[/math] if and only if [math]\displaystyle{ p\mid z }[/math]. Therefore, we only need to upper bound the probability

[math]\displaystyle{ \Pr[z\bmod p=0] }[/math]

for an arbitrarily fixed [math]\displaystyle{ 0\lt z\lt 2^n }[/math], and a uniform random prime [math]\displaystyle{ p\in[k] }[/math].

The probability [math]\displaystyle{ \Pr[z\bmod p=0] }[/math] is bounded as

[math]\displaystyle{ \Pr[z\bmod p=0]\le\frac{\mbox{the number of prime divisors of }z}{\mbox{the number of primes in }[k]} }[/math].

For the numerator, any positive [math]\displaystyle{ z\lt 2^n }[/math] has at most [math]\displaystyle{ n }[/math] prime factors. To see this, assume by contradiction that [math]\displaystyle{ z }[/math] has more than [math]\displaystyle{ n }[/math] prime factors. Since every prime is at least 2, [math]\displaystyle{ z }[/math] would be at least [math]\displaystyle{ 2^{n+1}\gt 2^n }[/math], contradicting the fact that [math]\displaystyle{ z\lt 2^n }[/math].

For the denominator, we need to lower bound the number of primes in [math]\displaystyle{ [k] }[/math]. This is given by the celebrated Prime Number Theorem (PNT).

Prime Number Theorem
Let [math]\displaystyle{ \pi(k) }[/math] denote the number of primes less than [math]\displaystyle{ k }[/math]. Then [math]\displaystyle{ \pi(k)\sim\frac{k}{\ln k} }[/math] as [math]\displaystyle{ k\rightarrow\infty }[/math].
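As a quick sanity check of the theorem (not part of the protocol), the following Python snippet computes [math]\displaystyle{ \pi(k) }[/math] exactly with a sieve of Eratosthenes, so it can be compared against [math]\displaystyle{ k/\ln k }[/math]:

```python
import math

def prime_pi(k):
    """pi(k): the number of primes less than k, via the sieve of Eratosthenes."""
    if k < 3:
        return 0
    sieve = [True] * k
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(k - 1) + 1):
        if sieve[i]:
            for j in range(i * i, k, i):
                sieve[j] = False  # mark multiples of i as composite
    return sum(sieve)
```

For example, `prime_pi(100000)` returns 9592, while [math]\displaystyle{ 10^5/\ln 10^5 \approx 8686 }[/math]; the ratio tends to [math]\displaystyle{ 1 }[/math] as [math]\displaystyle{ k }[/math] grows.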

Therefore, by choosing [math]\displaystyle{ k=2n^2\ln n }[/math], we have that for any [math]\displaystyle{ 0\lt z\lt 2^n }[/math] and a uniform random prime [math]\displaystyle{ p\in[k] }[/math],

[math]\displaystyle{ \Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{n} }[/math],

which means that for any inputs [math]\displaystyle{ x,y\in\{0,1\}^n }[/math] with [math]\displaystyle{ x\neq y }[/math], the false-positive probability is bounded as

[math]\displaystyle{ \Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]\le\Pr[|x-y|\bmod p=0]\le \frac{1}{n} }[/math].

Moreover, by this choice of parameter [math]\displaystyle{ k=2n^2\ln n }[/math], the communication complexity of the protocol is bounded by [math]\displaystyle{ O(\log k)=O(\log n) }[/math].
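A sketch of the checksum protocol in Python, treating each input as an integer (for illustration we materialize all primes below [math]\displaystyle{ k }[/math]; a real implementation would sample candidates and test them instead):

```python
import math
import random

def primes_below(k):
    """All primes less than k, by the sieve of Eratosthenes."""
    if k < 3:
        return []
    sieve = [True] * k
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(k - 1) + 1):
        if sieve[i]:
            for j in range(i * i, k, i):
                sieve[j] = False
    return [i for i, b in enumerate(sieve) if b]

def eq_by_checksum(x, y, n):
    """EQ on n-bit inputs x, y (given as integers).

    Bob picks a uniform random prime p in [k] with k = 2 n^2 ln n and
    sends (p, y mod p): O(log n) bits.  Alice checks x mod p == y mod p.
    The false-positive probability for x != y is roughly 1/n."""
    k = max(3, int(2 * n * n * math.log(n)))
    p = random.choice(primes_below(k))
    return x % p == y % p
```

Again the protocol has one-sided error: equal inputs always pass.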

Checking distinctness

Consider the following problem of checking distinctness:

  • Given a sequence [math]\displaystyle{ x_1,x_2,\ldots,x_n\in\{1,2,\ldots,n\} }[/math], check whether every element of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] appears exactly once.

Obviously this problem can be solved in linear time and linear space (in addition to the space for storing the input) by maintaining an [math]\displaystyle{ n }[/math]-bit vector that indicates which numbers among [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] have appeared.

When [math]\displaystyle{ n }[/math] is enormously large, the [math]\displaystyle{ \Omega(n) }[/math] space cost is too expensive. We wonder whether this problem can be solved with a space cost (in addition to the space for storing the input) much less than [math]\displaystyle{ O(n) }[/math]. This can be done by fingerprinting if we tolerate a certain degree of inaccuracy.

Fingerprinting multisets

We consider the following more generalized problem, checking identity of multisets:

  • Input: two multisets [math]\displaystyle{ A=\{a_1,a_2,\ldots, a_n\} }[/math] and [math]\displaystyle{ B=\{b_1,b_2,\ldots, b_n\} }[/math] where [math]\displaystyle{ a_1,a_2,\ldots,a_n,b_1,b_2,\ldots,b_n\in \{1,2,\ldots,n\} }[/math].
  • Determine whether [math]\displaystyle{ A=B }[/math] (multiset equivalence).

Here for a multiset [math]\displaystyle{ A=\{a_1,a_2,\ldots, a_n\} }[/math], its elements [math]\displaystyle{ a_i }[/math] are not necessarily distinct. The multiplicity of an element [math]\displaystyle{ a_i }[/math] in a multiset [math]\displaystyle{ A }[/math] is the number of times [math]\displaystyle{ a_i }[/math] appears in [math]\displaystyle{ A }[/math]. Two multisets [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] are equivalent if they contain the same set of elements and the multiplicities of every element in [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] are equal.

Obviously the above problem of checking distinctness is a special case of checking identity of multisets: check the identity of the input multiset [math]\displaystyle{ \{x_1,x_2,\ldots,x_n\} }[/math] and the set [math]\displaystyle{ \{1,2,\ldots, n\} }[/math].

The following fingerprinting function for multisets was introduced by Lipton for solving multiset identity testing.

Fingerprint for multiset
Let [math]\displaystyle{ p }[/math] be a uniform random prime chosen from the interval [math]\displaystyle{ [(n\log n)^2,2(n\log n)^2] }[/math]. By Chebyshev's theorem, such a prime must exist. And consider the finite field [math]\displaystyle{ \mathbb{Z}_p=[p] }[/math].
Given a multiset [math]\displaystyle{ A=\{a_1,a_2,\ldots,a_n\} }[/math], we define a univariate polynomial [math]\displaystyle{ f_A\in\mathbb{Z}_p[x] }[/math] over the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math] as follows:
[math]\displaystyle{ f_A(x)=\prod_{i=1}^n(x-a_i) }[/math],
where [math]\displaystyle{ + }[/math] and [math]\displaystyle{ \cdot }[/math] are defined over the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math].
We then define the random fingerprinting function as:
[math]\displaystyle{ \mathrm{FING}(A)=f_A(r)=\prod_{i=1}^n(r-a_i) }[/math],
where [math]\displaystyle{ r }[/math] is chosen uniformly at random from [math]\displaystyle{ \mathbb{Z}_p }[/math].

Since all computations of [math]\displaystyle{ \mathrm{FING}(A)=\prod_{i=1}^n(r-a_i) }[/math] are over the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math], the space cost for computing the fingerprint [math]\displaystyle{ \mathrm{FING}(A) }[/math] is only [math]\displaystyle{ O(\log p)=O(\log n) }[/math].
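A streaming sketch of Lipton's fingerprint in Python: the multiset is consumed element by element, and only [math]\displaystyle{ p }[/math], [math]\displaystyle{ r }[/math] and the accumulator (each of [math]\displaystyle{ O(\log n) }[/math] bits) are stored. The helper names and the exact way the prime is picked are illustrative.

```python
import math
import random

def is_prime(m):
    """Trial-division primality test; fine for p = O((n log n)^2)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def random_prime(lo, hi):
    """A uniform random prime from [lo, hi], by rejection sampling."""
    while True:
        p = random.randint(lo, hi)
        if is_prime(p):
            return p

def multiset_fing(elements, r, p):
    """FING(A) = prod_i (r - a_i) mod p, computed in a single pass."""
    acc = 1
    for a in elements:
        acc = acc * (r - a) % p
    return acc

def check_distinctness(xs):
    """Check whether x_1,...,x_n is a permutation of {1,...,n} by comparing
    fingerprints: identical multisets always pass, while distinct multisets
    are wrongly accepted with probability O(1/n)."""
    n = len(xs)
    m = max(2, n * int(math.log2(max(n, 2))))   # m is roughly n log n
    p = random_prime(m * m, 2 * m * m)
    r = random.randrange(p)
    return multiset_fing(xs, r, p) == multiset_fing(range(1, n + 1), r, p)
```

Note the input is read in one pass and in any order, which matches the permutation invariance discussed next.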

Moreover, the fingerprinting function [math]\displaystyle{ \mathrm{FING}(A) }[/math] is invariant under permutation of elements of the multiset [math]\displaystyle{ A=\{a_1,a_2,\ldots,a_n\} }[/math], thus it is indeed a function of multisets (meaning every multiset has only one fingerprint). Therefore, if [math]\displaystyle{ A=B }[/math] then [math]\displaystyle{ \mathrm{FING}(A)=\mathrm{FING}(B) }[/math].

For two distinct multisets [math]\displaystyle{ A\neq B }[/math], it is possible that [math]\displaystyle{ \mathrm{FING}(A)=\mathrm{FING}(B) }[/math], but the following theorem due to Lipton bounds the probability of this false positive.

Theorem (Lipton 1989)
Let [math]\displaystyle{ A=\{a_1,a_2,\ldots,a_n\} }[/math] and [math]\displaystyle{ B=\{b_1,b_2,\ldots,b_n\} }[/math] be two multisets whose elements are from [math]\displaystyle{ \{1,2,\ldots,n\} }[/math]. If [math]\displaystyle{ A\neq B }[/math], then
[math]\displaystyle{ \Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]=O\left(\frac{1}{n}\right) }[/math].
Proof.

Let [math]\displaystyle{ \tilde{f}_A(x)=\prod_{i=1}^n(x-a_i) }[/math] and [math]\displaystyle{ \tilde{f}_B(x)=\prod_{i=1}^n(x-b_i) }[/math] be two univariate polynomials defined over the reals [math]\displaystyle{ \mathbb{R} }[/math]. Note that in contrast to [math]\displaystyle{ f_A(x) }[/math] and [math]\displaystyle{ f_B(x) }[/math], the [math]\displaystyle{ + }[/math] and [math]\displaystyle{ \cdot }[/math] in [math]\displaystyle{ \tilde{f}_A(x), \tilde{f}_B(x) }[/math] are not reduced modulo [math]\displaystyle{ p }[/math]. It is easy to verify that the polynomials [math]\displaystyle{ \tilde{f}_A(x), \tilde{f}_B(x) }[/math] have the following properties:

  • [math]\displaystyle{ \tilde{f}_A\equiv \tilde{f}_B }[/math] if and only if [math]\displaystyle{ A=B }[/math]. Here [math]\displaystyle{ A=B }[/math] means the multiset equivalence.
  • By the properties of modular arithmetic, for any value [math]\displaystyle{ r\in\mathbb{Z}_p }[/math], it holds that [math]\displaystyle{ f_A(r)=\tilde{f}_A(r)\bmod p }[/math] and [math]\displaystyle{ f_B(r)=\tilde{f}_B(r)\bmod p }[/math].

Therefore, assuming that [math]\displaystyle{ A\neq B }[/math], we must have [math]\displaystyle{ \tilde{f}_A(x)\not\equiv \tilde{f}_B(x) }[/math]. Then by the law of total probability:

[math]\displaystyle{ \begin{align} \Pr[\mathrm{FING}(A)= \mathrm{FING}(B)] &= \Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]\Pr[f_A\not\equiv f_B]\\ &\quad\,\,+\Pr\left[f_A(r)=f_B(r)\mid f_A\equiv f_B\right]\Pr[f_A\equiv f_B]\\ &\le \Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]+\Pr[f_A\equiv f_B]. \end{align} }[/math]

Note that the degrees of [math]\displaystyle{ f_A,f_B }[/math] are at most [math]\displaystyle{ n }[/math] and [math]\displaystyle{ r }[/math] is chosen uniformly from [math]\displaystyle{ [p] }[/math]. By the Schwartz-Zippel theorem for univariate polynomials, the first probability

[math]\displaystyle{ \Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]\le \frac{n}{p}=o\left(\frac{1}{n}\right), }[/math]

since [math]\displaystyle{ p }[/math] is chosen from the interval [math]\displaystyle{ [(n\log n)^2,2(n\log n)^2] }[/math].

For the second probability [math]\displaystyle{ \Pr[f_A\equiv f_B] }[/math], recall that [math]\displaystyle{ \tilde{f}_A\not\equiv \tilde{f}_B }[/math], so the difference [math]\displaystyle{ \tilde{f}_A-\tilde{f}_B }[/math] has at least one non-zero coefficient [math]\displaystyle{ c }[/math], with [math]\displaystyle{ |c|\le n^n }[/math]. The event [math]\displaystyle{ f_A\equiv f_B }[/math] occurs only if [math]\displaystyle{ c\bmod p=0 }[/math], which means

[math]\displaystyle{ \begin{align} \Pr[f_A\equiv f_B] &\le \Pr[c\bmod p=0]\\ &=\frac{\text{number of prime factors of }c}{\text{number of primes in }[(n\log n)^2,2(n\log n)^2]}\\ &\le \frac{n\log_2n}{\pi(2(n\log n)^2)-\pi((n\log n)^2)}. \end{align} }[/math]

By the prime number theorem, [math]\displaystyle{ \pi(N)\sim \frac{N}{\ln N} }[/math] as [math]\displaystyle{ N\to\infty }[/math]. Therefore,

[math]\displaystyle{ \Pr[f_A\equiv f_B]=O\left(\frac{n\log n}{n^2\log n}\right)=O\left(\frac{1}{n}\right). }[/math]

Combining everything together, we have [math]\displaystyle{ \Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]=O\left(\frac{1}{n}\right) }[/math].

[math]\displaystyle{ \square }[/math]