随机算法 (Fall 2016)/Fingerprinting
Polynomial Identity Testing (PIT)
The Polynomial Identity Testing (PIT) is such a problem: given as input two polynomials, determine whether two polynomials are identical. This problem plays a fundamental role in Computer Science.
First let's consider the following simplified version of Polynomial Identity Testing (PIT) which takes only the single-variate polynomials:
- Input: two polynomials [math]\displaystyle{ P_1, P_2\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
- Output: "yes" if two polynomials are identical, i.e. [math]\displaystyle{ P_1\equiv P_2 }[/math], and "no" if otherwise.
The [math]\displaystyle{ \mathbb{F}[x] }[/math] denote the ring of polynomials over field [math]\displaystyle{ \mathbb{F} }[/math].
Alternatively, we can consider the following equivalent problem:
- Input: a polynomial [math]\displaystyle{ P\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
- Output: "yes" if [math]\displaystyle{ P\equiv 0 }[/math], and "no" if otherwise.
The probalem is trivial if [math]\displaystyle{ P }[/math] is presented in its explicit form [math]\displaystyle{ P(x)=\sum_{i=0}^d a_ix^i }[/math]. But we assume that [math]\displaystyle{ P }[/math] is given in product form or as black box.
A straightforward deterministic algorithm that solves PIT is to query [math]\displaystyle{ d+1 }[/math] points [math]\displaystyle{ P(1),P(2),\ldots,P(d+1) }[/math] and check whether thay are all zero. This can determine whether [math]\displaystyle{ P\equiv 0 }[/math] by interpolation.
We now introduce a simple randomized algorithm for the problem.
Algorithm for PIT - pick [math]\displaystyle{ x\in\{1,2,\ldots,2d\} }[/math] uniformly at random;
- if [math]\displaystyle{ P(x) = 0 }[/math] then return “yes” else return “no”;
This algorithm requires only the evaluation of [math]\displaystyle{ P }[/math] at a single point. And if [math]\displaystyle{ P\equiv 0 }[/math] it is always correct. And if [math]\displaystyle{ P\not\equiv 0 }[/math] then the probability that the algorithm wrongly returns "yes" is bounded as follows.
Theorem - Let [math]\displaystyle{ P\in\mathbb{F}[x] }[/math] be a polynomial of degree [math]\displaystyle{ d }[/math] over the field [math]\displaystyle{ \mathbb{F} }[/math]. Let [math]\displaystyle{ S\subset\mathbb{F} }[/math] be an arbitrary set and [math]\displaystyle{ x\in S }[/math] is chosen uniformly at random from [math]\displaystyle{ S }[/math]. If [math]\displaystyle{ P\not\equiv 0 }[/math] then
- [math]\displaystyle{ \Pr[P(x)=0]\le\frac{d}{|S|}. }[/math]
- Let [math]\displaystyle{ P\in\mathbb{F}[x] }[/math] be a polynomial of degree [math]\displaystyle{ d }[/math] over the field [math]\displaystyle{ \mathbb{F} }[/math]. Let [math]\displaystyle{ S\subset\mathbb{F} }[/math] be an arbitrary set and [math]\displaystyle{ x\in S }[/math] is chosen uniformly at random from [math]\displaystyle{ S }[/math]. If [math]\displaystyle{ P\not\equiv 0 }[/math] then
Proof. A non-zero [math]\displaystyle{ d }[/math]-degree polynomial [math]\displaystyle{ P }[/math] has at most [math]\displaystyle{ d }[/math] distinct roots, thus at most [math]\displaystyle{ d }[/math] members [math]\displaystyle{ x }[/math] of [math]\displaystyle{ S }[/math] satisfy that [math]\displaystyle{ P(x)=0 }[/math]. Therefore, [math]\displaystyle{ \Pr[P(x)=0]\le\frac{d}{|S|} }[/math].
- [math]\displaystyle{ \square }[/math]
By the theorem, the algorithm can distinguish a non-zero polynomial from 0 with probability at least [math]\displaystyle{ 1/2 }[/math]. This is achieved by evaluation of the polynomial at only one point and [math]\displaystyle{ 1+\log_2 d }[/math] many random bits.
Communication Complexity of Equality
The communication complexity is introduced by Andrew Chi-Chih Yao as a model of computation which involves multiple participants, each with partial information of the input.
Assume that there are two entities, say Alice and Bob. Alice has a private input [math]\displaystyle{ a }[/math] and Bob has a private input [math]\displaystyle{ b }[/math]. Together they want to compute a function [math]\displaystyle{ f(a,b) }[/math] by communicating with each other. The communication follows a predefined communication protocol (the "algorithm" in this model) whose logics depends only on the problem [math]\displaystyle{ f }[/math] but not on the inputs. The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.
The problem of checking identity is formally defined by the function EQ as follows: [math]\displaystyle{ \mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\} }[/math] and for any [math]\displaystyle{ a,b\in\{0,1\}^n }[/math],
- [math]\displaystyle{ \mathrm{EQ}(a,b)= \begin{cases} 1& \mbox{if } a=b,\\ 0& \mbox{otherwise.} \end{cases} }[/math]
A trivial way to solve EQ is to let Bob send his entire input string [math]\displaystyle{ b }[/math] to Alice and let Alice check whether [math]\displaystyle{ a=b }[/math]. This costs [math]\displaystyle{ n }[/math] bits of communications.
It is known that for deterministic communication protocols, this is the best we can get for computing EQ.
Theorem (Yao 1979) - Any deterministic communication protocol computing EQ on two [math]\displaystyle{ n }[/math]-bit strings costs [math]\displaystyle{ n }[/math] bits of communication in the worst-case.
This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem in Yao's 1979 paper initiates the field of communication complexity.
If the randomness is allowed, we can solve this problem up to a tolerable probabilistic error with significantly less communications. The inputs [math]\displaystyle{ a,b\in\{0,1\}^{n} }[/math] are two strings [math]\displaystyle{ a=a_0a_1\cdots a_{n-1}, b=b_0b_1\cdots b_{n-1} }[/math] of [math]\displaystyle{ n }[/math] bits. Let [math]\displaystyle{ k=\lceil\log_2 (2n)\rceil }[/math] and [math]\displaystyle{ p\in[2^k,2^{k+1}] }[/math] be an arbitrary prime number. (Such a prime [math]\displaystyle{ p }[/math] always exists.) The input strings [math]\displaystyle{ a,b }[/math] can be respectively represented as two polynomials [math]\displaystyle{ f,g\in\mathbb{Z}_p[x] }[/math] such that [math]\displaystyle{ f(x)=\sum_{i=0}^{n-1}a_ix^{i} }[/math] and [math]\displaystyle{ g(x)=\sum_{i=0}^{n-1}b_ix^{i} }[/math] of degree [math]\displaystyle{ n-1 }[/math], where all additions and multiplications are modulo [math]\displaystyle{ p }[/math]. The randomized communication protocol is given as follows:
A randomized protocol for EQ Alice does:
- pick [math]\displaystyle{ r\in[p] }[/math] uniformly at random;
- send [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f(r) }[/math] to Bob;
Upon receiving [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f(r) }[/math] Bob does:
- If [math]\displaystyle{ f(r)= g(r) }[/math] return "yes"; else return "no".
Repeat this protocol for 100 times. The total number of bits to communicate is bounded by [math]\displaystyle{ 200\log_2p=O(\log n) }[/math]. Due to the analysis of the randomized algorithm for PIT, if [math]\displaystyle{ a=b }[/math] the protocol is always correct and if [math]\displaystyle{ a\neq b }[/math] the protocol fails to report a difference with probability less than [math]\displaystyle{ 2^{-100} }[/math].
Schwartz-Zippel Theorem
Now let's move on to the the true form of Polynomial Identity Testing (PIT) which works on multi-variate polynomials:
- Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
- Output: "yes" if [math]\displaystyle{ f\equiv g }[/math], and "no" if otherwise.
The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is the ring of multi-variate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. The most natural way to represent an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math] is to write it as a sum of monomials:
- [math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].
The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math] and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree of monomials of nonzero coefficients.
Alternatively, we can consider the following equivalent problem:
- Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
- Output: "yes" if [math]\displaystyle{ f\equiv 0 }[/math], and "no" if otherwise.
If [math]\displaystyle{ f }[/math] is written explicitly as a sum of monomials, then the problem is trivial. Again we allow [math]\displaystyle{ f }[/math] to be represented in product form.
Example The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined as that [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is
- [math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].
Let [math]\displaystyle{ f }[/math] be the polynomial defined as
- [math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]
It is pretty easy to evaluate [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] on any particular [math]\displaystyle{ x_1,x_2,\ldots,x_n }[/math], however it is prohibitively expensive to symbolically expand [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] to its sum-of-monomial form.
Here is a very simple randomized algorithm, due to Schwartz and Zippel.
Randomized algorithm for multi-variate PIT - fix an arbitrary set [math]\displaystyle{ S\subseteq \mathbb{F} }[/math] whose size to be fixed;
- pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
- if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;
This algorithm requires only the evaluation of [math]\displaystyle{ f }[/math] at a single point [math]\displaystyle{ \vec{r} }[/math]. And if [math]\displaystyle{ f\equiv 0 }[/math] it is always correct.
In the Theorem below, we’ll see that if [math]\displaystyle{ f\not\equiv 0 }[/math] then the algorithm is incorrect with probability at most [math]\displaystyle{ \frac{d}{|S|} }[/math], where [math]\displaystyle{ d }[/math] is the degree of the polynomial [math]\displaystyle{ f }[/math].
Schwartz-Zippel Theorem - Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math] such that [math]\displaystyle{ f\not\equiv 0 }[/math]. Fix any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and let [math]\displaystyle{ r_1,r_2\ldots,r_n }[/math] be chosen uniformly and independently at random from [math]\displaystyle{ S }[/math]. Then
- [math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]
- Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math] such that [math]\displaystyle{ f\not\equiv 0 }[/math]. Fix any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and let [math]\displaystyle{ r_1,r_2\ldots,r_n }[/math] be chosen uniformly and independently at random from [math]\displaystyle{ S }[/math]. Then
Proof. We prove by induction on [math]\displaystyle{ n }[/math] the number of variables.
For [math]\displaystyle{ n=1 }[/math], assuming that [math]\displaystyle{ f\not\equiv 0 }[/math], due to the fundamental theorem of algebra, the degree-[math]\displaystyle{ d }[/math] polynomial [math]\displaystyle{ f(x) }[/math] has at most [math]\displaystyle{ d }[/math] roots, thus
- [math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]
Assume the induction hypothesis for a multi-variate polynomial up to [math]\displaystyle{ n-1 }[/math] variable.
An [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] can be represented as
- [math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],
where [math]\displaystyle{ k }[/math] is the largest power of [math]\displaystyle{ x_n }[/math], which means that the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math].
In particular, we write [math]\displaystyle{ f }[/math] as a sum of two parts:
- [math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],
where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that
- as argued above, the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math];
- [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.
By the law of total probability, it holds that
- [math]\displaystyle{ \begin{align} &\Pr[f(r_1,r_2,\ldots,r_n)=0]\\ = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]
Note that [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is a polynomial on [math]\displaystyle{ n-1 }[/math] variables of degree [math]\displaystyle{ d-k }[/math] such that [math]\displaystyle{ f_k\not\equiv 0 }[/math]. By the induction hypothesis, we have
- [math]\displaystyle{ \begin{align} (*) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]
For the second case, recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] guarantees that
- [math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]
is a single-variate polynomial such that the degree of [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] is [math]\displaystyle{ k }[/math] and [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math], for which we already known that the probability [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math]. Therefore,
- [math]\displaystyle{ \begin{align} (**) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].
Substituting both [math]\displaystyle{ (*) }[/math] and [math]\displaystyle{ (**) }[/math] back in the total probability, we have
- [math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}, }[/math]
which proves the theorem.
In above proof, for the second case that [math]\displaystyle{ f_k(r_1,\ldots,r_{n-1})\neq 0 }[/math], we use an "probabilistic arguement" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events in applying the law of total probability. You make your own judgement which proof is better.
By the law of total probability,
- [math]\displaystyle{ \begin{align} &\Pr[f(\vec{r})=0]\\ = &\sum_{x_1,\ldots,x_{n-1}\in S}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ = &\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ &+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq0}\Pr[f(\vec{r})=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ \le &\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})=0}\Pr[\forall i\lt n, r_i=x_i]\\ &+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0\mid \forall i\lt n, r_i=x_i]\cdot\Pr[\forall i\lt n, r_i=x_i]\\ = &\Pr[f_k(r_1,\ldots,r_{n-1})=0]+\sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i\lt n, r_i=x_i]. \end{align} }[/math]
We have argued that [math]\displaystyle{ f_k\not\equiv 0 }[/math] and the degree of [math]\displaystyle{ f_k }[/math] is [math]\displaystyle{ d-k }[/math]. By the induction hypothesis, we have
- [math]\displaystyle{ \Pr[f_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. }[/math]
And for every fixed [math]\displaystyle{ x_1,\ldots,x_{n-1}\in S }[/math] such that [math]\displaystyle{ f_k(x_1,\ldots,x_{n-1})\neq 0 }[/math], we have argued that [math]\displaystyle{ f(x_1,\ldots,x_{n-1},x_n) }[/math] is a polynomial in [math]\displaystyle{ x_n }[/math] of degree [math]\displaystyle{ k }[/math], thus
- [math]\displaystyle{ \Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\le\frac{k}{|S|}, }[/math]
which holds for all [math]\displaystyle{ x_1,\ldots,x_{n-1}\in S }[/math] such that [math]\displaystyle{ f_k(x_1,\ldots,x_{n-1})\neq 0 }[/math], therefore the weighted average
- [math]\displaystyle{ \sum_{x_1,\ldots,x_{n-1}\in S\atop f_k(x_1,\ldots,x_{n-1})\neq 0}\Pr[f(x_1,\ldots,x_{n-1},r_n)=0]\cdot\Pr[\forall i\lt n, r_i=x_i] \le\frac{k}{|S|}. }[/math]
Substituting these inequalities back to the total probability, we have [math]\displaystyle{ \Pr[f(\vec{r})=0] \le\frac{d-k}{|S|}+\frac{k}{|S|} =\frac{d}{|S|}. }[/math]
- [math]\displaystyle{ \square }[/math]