=Polynomial Identity Testing (PIT) =
In the '''Polynomial Identity Testing (PIT)''' problem, we are given two polynomials, and we want to determine whether they are identical.
{{Theorem|Polynomial Identity Testing|
* '''Input:''' two <math>n</math>-variate polynomials <math>f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> of degree <math>d</math>.
* Determine whether <math>f\equiv g</math>.
}}
The <math>\mathbb{F}[x_1,x_2,\ldots,x_n]</math> denotes the [http://en.wikipedia.org/wiki/Polynomial_ring#The_polynomial_ring_in_several_variables ring of multivariate polynomials] over field <math>\mathbb{F}</math>. An <math>n</math>-variate polynomial of degree <math>d</math>, written as a sum of monomials, is:
:<math>f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n}</math>.
The '''degree''' or '''total degree''' of a monomial <math>a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n}</math> is given by <math>i_1+i_2+\cdots+i_n</math>, and the degree of a polynomial <math>f</math> is the maximum degree of its monomials with nonzero coefficients.

This problem is equivalent to the following problem of checking whether <math>f-g</math> (which is a polynomial of degree at most <math>d</math>) is the zero-polynomial.
{{Theorem|Polynomial Identity Testing (equivalent version)|
* '''Input:''' a polynomial <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> of degree <math>d</math>.
* Determine whether <math>f\equiv 0</math>.
}}

The problem seems to be solvable by checking whether all coefficients of monomials in <math>f</math> are zero, and there are at most <math>{n+d\choose d}\le (n+d)^{d}</math> coefficients in an <math>n</math>-variate polynomial of degree at most <math>d</math>. However, this relies on all coefficients of <math>f</math> being explicitly given, which may not always be the case.

For example, a multivariate polynomial <math>f</math> may be expressed in a '''product form'''.
{{Theorem|Example|
The [http://en.wikipedia.org/wiki/Vandermonde_matrix Vandermonde matrix] <math>M=M(x_1,x_2,\ldots,x_n)</math> is defined by <math>M_{ij}=x_i^{j-1}</math>, that is
:<math>M=\begin{bmatrix}
1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\
1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\
1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\
\vdots & \vdots & \vdots & \ddots &\vdots \\
1 & x_n & x_n^2 & \dots & x_n^{n-1}
\end{bmatrix}</math>.
Let <math>f</math> be the polynomial defined as
:<math>f(x_1,\ldots,x_n)=\det(M)=\prod_{j<i}(x_i-x_j).</math>
}}
For polynomials <math>f</math> in product form, it is quite possible that '''evaluating''' <math>f(x_1,\ldots,x_n)</math> at any given <math>(x_1,\ldots,x_n)\in\mathbb{F}^n</math> is efficient, while symbolically '''expanding''' the polynomial to calculate its coefficients is very expensive.

We assume that the input polynomial <math>f</math> is given as a '''black box''' or in '''product form''', so that evaluating <math>f(x_1,\ldots,x_n)</math> at any specific <math>(x_1,\ldots,x_n)\in\mathbb{F}^n</math> is efficient.
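To make this contrast concrete, the following minimal sketch (not from the original notes; it assumes arithmetic over a prime field <math>\mathbb{Z}_p</math>, so that evaluation is exact) evaluates the Vandermonde polynomial in its product form with <math>O(n^2)</math> field operations, even though its expansion as a sum of monomials has <math>n!</math> terms:
<syntaxhighlight lang="python">
# A sketch under the stated assumptions: evaluate the product form of the
# Vandermonde polynomial f = det(M) = prod_{j<i} (x_i - x_j) over Z_p.
# One evaluation costs O(n^2) multiplications, while the monomial expansion
# of det(M) has n! distinct terms.

def vandermonde_poly(xs, p):
    """Evaluate f(x_1,...,x_n) = prod_{j<i} (x_i - x_j) mod p."""
    result = 1
    for i in range(len(xs)):
        for j in range(i):
            result = result * (xs[i] - xs[j]) % p
    return result

print(vandermonde_poly([3, 7, 2, 10], 101))  # one evaluation over Z_101
</syntaxhighlight>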
 
== Polynomial Interpolation ==
First, let's consider the '''univariate''' case where <math>n=1</math>. In this case, <math>f(x)</math> is a univariate polynomial of degree at most <math>d</math>.
 
A straightforward deterministic algorithm is to evaluate <math>f(x_1),f(x_2),\ldots,f(x_{d+1})</math> over <math>d+1</math> ''distinct'' elements <math>x_1,x_2,\ldots,x_{d+1}</math> from the field <math>\mathbb{F}</math> and check whether they are all zero. By the [https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra fundamental theorem of algebra] (equivalently, by '''polynomial interpolation''': a nonzero polynomial of degree at most <math>d</math> cannot vanish at <math>d+1</math> distinct points), this correctly decides whether a degree-<math>d</math> univariate polynomial satisfies <math>f\equiv 0</math>.
{{Theorem|Fundamental Theorem of Algebra|
:Any non-zero univariate polynomial of degree <math>d</math> has at most <math>d</math> roots.
}}
This fundamental theorem holds over any field <math>\mathbb{F}</math> because any univariate polynomial of degree <math>d</math> factors uniquely into at most <math>d</math> irreducible polynomials, each of which has at most one root in <math>\mathbb{F}</math>.
 
The following simple randomized algorithm is natural:
{{Theorem|Randomized algorithm for univariate PIT|
*suppose we have a finite subset <math>S\subseteq\mathbb{F}</math> (to be specified later);
*pick <math>r\in S</math> ''uniformly'' at random;
*if <math>f(r) = 0</math> then return “yes” else return “no”;
}}
 
This algorithm evaluates <math>f</math> at one point chosen uniformly at random from a finite subset <math>S\subseteq\mathbb{F}</math>. It is easy to see the following:
* If <math>f\equiv 0</math>, the algorithm always returns "yes", so it is always correct.
* If <math>f\not\equiv 0</math>, the algorithm may wrongly return "yes" (a '''false positive'''). But this happens only when the random <math>r</math> is a root of <math>f</math>. By the fundamental theorem of algebra, <math>f</math> has at most <math>d</math> roots, so the probability that the algorithm is wrong is bounded as
::<math>\Pr[f(r)=0]\le\frac{d}{|S|}.</math>
 
By fixing <math>S\subseteq\mathbb{F}</math> to be an arbitrary subset of size <math>|S|=2d</math>, this probability of a false positive is at most <math>1/2</math>. We can reduce it to an arbitrarily small constant <math>\delta</math> by repeating the above test ''independently'' <math>\log_2 \frac{1}{\delta}</math> times, since the error probability decays geometrically as we repeat the algorithm independently.
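As an illustration, here is a minimal sketch of the amplified test (an assumption-laden sketch, not part of the original notes: it takes the field to be a prime field <math>\mathbb{Z}_p</math> with <math>S=\mathbb{Z}_p</math>, and the black-box polynomial to be a Python function):
<syntaxhighlight lang="python">
import random

def univariate_pit(f, d, p, delta=2**-30):
    """Test whether a black-box polynomial f over Z_p, of degree at most d,
    is identically zero, with false-positive probability at most delta.

    Here S = Z_p, so each trial errs with probability at most d/p; trials are
    repeated independently until (d/p)^trials <= delta, since the error
    probability decays geometrically.
    """
    assert p > 2 * d, "need |S| >= 2d so a single trial errs w.p. at most 1/2"
    trials = 1
    while (d / p) ** trials > delta:
        trials += 1
    for _ in range(trials):
        r = random.randrange(p)   # pick r uniformly at random from S = Z_p
        if f(r) % p != 0:
            return False          # a nonzero value certifies f is not zero
    return True                   # all evaluations were zero: report "yes"

p = 10**9 + 7
print(univariate_pit(lambda x: (x - 1) * (x - 2) % p, d=2, p=p))  # False (with overwhelming probability)
print(univariate_pit(lambda x: 0, d=2, p=p))                      # True
</syntaxhighlight>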
 
== Schwartz-Zippel Lemma ==
Now let's consider the general multivariate case, where <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> is an <math>n</math>-variate polynomial of degree <math>d</math>.

The following is a simple randomized algorithm for testing identity of multivariate polynomials:
{{Theorem|Randomized algorithm for multivariate PIT|
*suppose we have a finite subset <math>S\subseteq\mathbb{F}</math> (to be specified later);
*pick <math>r_1,r_2,\ldots,r_n\in S</math> ''uniformly'' and ''independently'' at random;
*if <math>f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0</math> then return “yes” else return “no”;
}}

This algorithm evaluates <math>f</math> at one point chosen uniformly from an <math>n</math>-dimensional cube <math>S^n</math>, where <math>S\subseteq\mathbb{F}</math> is a finite subset. And:
* If <math>f\equiv 0</math>, the algorithm always returns "yes", so it is always correct.
* If <math>f\not\equiv 0</math>, the algorithm may wrongly return "yes" (a '''false positive'''). But this happens only when the random <math>\vec{r}=(r_1,r_2,\ldots,r_n)</math> is a root of <math>f</math>. The probability of this bad event is upper bounded by the following famous result due to [https://pdfs.semanticscholar.org/b913/cf330852035f49b4ec5fe2db86c47d8a98fd.pdf Schwartz (1980)] and [http://www.cecm.sfu.ca/~monaganm/teaching/TopicsinCA15/zippel79.pdf Zippel (1979)].

{{Theorem|Schwartz-Zippel Lemma|
: Let <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> be an <math>n</math>-variate polynomial of degree <math>d</math> over a field <math>\mathbb{F}</math>. If <math>f\not\equiv 0</math>, then for any finite set <math>S\subset\mathbb{F}</math>, and <math>r_1,r_2,\ldots,r_n\in S</math> chosen uniformly and independently at random,
::<math>\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.</math>
}}
The Schwartz-Zippel lemma states that for any nonzero <math>n</math>-variate polynomial of degree at most <math>d</math>, the number of roots in any cube <math>S^n</math> is at most <math>d\cdot |S|^{n-1}</math>. This bound is tight: for distinct <math>s_1,\ldots,s_d\in S</math>, the polynomial <math>f=(x_1-s_1)(x_1-s_2)\cdots(x_1-s_d)</math> has exactly <math>d\cdot|S|^{n-1}</math> roots in <math>S^n</math>. And this algebraic proposition enjoys a probabilistic proof which uses no more than the '''law of total probability'''.
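To see the lemma in action, here is a minimal sketch (again not from the original notes; it assumes the prime field <math>\mathbb{Z}_p</math> with <math>S=\mathbb{Z}_p</math>) that tests the Vandermonde identity <math>\det(M)=\prod_{j<i}(x_i-x_j)</math> from the example above by random evaluation, never expanding either side into monomials:
<syntaxhighlight lang="python">
import random

def det_mod_p(a, p):
    """Determinant over Z_p via Gaussian elimination, O(n^3) field operations."""
    a = [row[:] for row in a]
    n, det = len(a), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:                       # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)       # inverse by Fermat's little theorem
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p

def test_vandermonde_identity(n, p, trials=20):
    """Both sides have degree d = n(n-1)/2, so by the Schwartz-Zippel lemma
    each trial errs with probability at most d/|S| = n(n-1)/(2p)."""
    for _ in range(trials):
        r = [random.randrange(p) for _ in range(n)]        # uniform point of S^n, S = Z_p
        m = [[pow(x, j, p) for j in range(n)] for x in r]  # M_ij = x_i^{j-1}
        lhs = det_mod_p(m, p)
        rhs = 1
        for i in range(n):
            for j in range(i):
                rhs = rhs * (r[i] - r[j]) % p
        if lhs != rhs:
            return False
    return True

print(test_vandermonde_identity(n=6, p=10**9 + 7))  # True: the identity holds
</syntaxhighlight>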
 
A naive thought for proving the Schwartz-Zippel lemma is to express the multivariate polynomial <math>f(r_1,r_2,\ldots,r_n)=g_{r_1,r_2,\ldots,r_{n-1}}(r_n)</math> as a univariate polynomial <math>g_{r_1,r_2,\ldots,r_{n-1}}</math> on the <math>{n}</math>-th variable <math>r_{n}</math>, where <math>g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot)</math> is a univariate polynomial "parameterized" by the random values <math>r_1,r_2,\ldots,r_{n-1}</math>.
 
Then the argument for the univariate case would apply, and we would easily have
:<math>\Pr[f(r_1,r_2,\ldots,r_n)=0]=\Pr[g_{r_1,r_2,\ldots,r_{n-1}}(r_n)=0]\le\frac{d}{|S|}.</math>
 
A fatal flaw of this "proof" is that when the first <math>{n-1}</math> variables are assigned their random values <math>r_1,r_2,\ldots,r_{n-1}</math>, the univariate polynomial <math>g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot)</math> may no longer satisfy <math>g_{r_1,\ldots,r_{n-1}}\not\equiv 0</math>, which is required by polynomial interpolation. For example, for <math>f(x_1,x_2)=x_1x_2\not\equiv 0</math>, the assignment <math>r_1=0</math> gives <math>g_{r_1}\equiv 0</math>.


And this is precisely where the law of total probability kicks in, to separate the case where <math>g_{r_1,\ldots,r_{n-1}}\not\equiv 0</math> from the case where <math>g_{r_1,\ldots,r_{n-1}}\equiv 0</math>.
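Before the formal proof, a tiny exhaustive sanity check (not from the original notes) of this case analysis for <math>f(x_1,x_2)=x_1x_2</math>, where <math>d=2</math>, <math>k=1</math>, and <math>f_1(x_1)=x_1</math>, over <math>S=\{0,1,\ldots,9\}</math>:
<syntaxhighlight lang="python">
# Exhaustive check for f(x1, x2) = x1 * x2 with d = 2 over S = {0, ..., 9}:
# the exact probability of hitting a root stays below the d/|S| bound that
# the two cases of the law of total probability add up to.

S = range(10)
exact = sum(1 for r1 in S for r2 in S if r1 * r2 == 0) / len(S) ** 2

case_I = sum(1 for r1 in S if r1 == 0) / len(S)  # Pr[f_1(r1) = 0] <= (d-k)/|S|
case_II = 1 / len(S)                             # Pr[r2 = 0 | f_1(r1) != 0] <= k/|S|

print(exact)             # 0.19 -- exact Pr[f(r1, r2) = 0]
print(case_I + case_II)  # 0.2  -- the bound (d-k)/|S| + k/|S| = d/|S|
</syntaxhighlight>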


{{Proof|
The lemma is proved by induction on <math>n</math>.

;Induction basis''':'''
For <math>n=1</math>, this is the univariate case. Assume that <math>f\not\equiv 0</math>. Due to the fundamental theorem of algebra, any polynomial <math>f(x)</math> of degree at most <math>d</math> must have at most <math>d</math> roots, thus
:<math>\Pr[f(r)=0]\le\frac{d}{|S|}.</math>

;Induction hypothesis''':'''
Assume the lemma holds for any <math>m</math>-variate polynomial for all <math>m<n</math>.

;Induction step''':'''
For any <math>n</math>-variate polynomial <math>f(x_1,x_2,\ldots,x_n)</math> of degree at most <math>d</math>, write <math>f</math> as
:<math>f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1})</math>,
where <math>k</math> is the highest degree of <math>x_n</math>. Note that the degree of <math>f_k</math> is at most <math>d-k</math> and <math>f_k\not\equiv 0</math> (otherwise <math>k</math> wouldn't be the highest degree of <math>x_n</math>).

For short, express <math>f</math> as:
:<math>f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n)</math>,
where both <math>f_k</math> and <math>\bar{f}</math> are polynomials, such that
* <math>f_k\not\equiv 0</math> and has degree at most <math>d-k</math>;
* <math>\bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1})</math>, thus <math>\bar{f}(x_1,x_2,\ldots,x_n)</math> has no <math>x_n^{k}</math> factor in any term.

By the '''law of total probability''',
:<math>
\begin{align}
\Pr[f(r_1,r_2,\ldots,r_n)=0]
=
&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\
&+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0].
\end{align}
</math>
Since <math>f_k(r_1,r_2,\ldots,r_{n-1})</math> is an <math>(n-1)</math>-variate polynomial of degree at most <math>d-k</math> satisfying that <math>f_k\not\equiv 0</math>, by the induction hypothesis,
:<math>
\begin{align}
(\text{case-I})
&\qquad
&\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.
\end{align}
</math>

Now look at the case with condition <math>f_k(r_1,r_2,\ldots,r_{n-1})\neq0</math>.
Recall that <math>\bar{f}(x_1,\ldots,x_n)</math> has no <math>x_n^k</math> factor in any term, thus the condition <math>f_k(r_1,r_2,\ldots,r_{n-1})\neq0</math> implies that
:<math>f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n)</math>,
viewed as a univariate polynomial <math>g_{r_1,\ldots,r_{n-1}}(x_n)</math> in the variable <math>x_n</math>, satisfies <math>g_{r_1,\ldots,r_{n-1}}\not\equiv 0</math> and has degree at most <math>k</math>. Thus, by the argument for the univariate case, the probability that <math>g_{r_1,\ldots,r_{n-1}}(r_n)=0</math> is at most <math>\frac{k}{|S|}</math>.

Therefore,
:<math>
\begin{align}
(\text{case-II})
&\qquad
&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|}.
\end{align}
</math>
Applying the inequalities of (case-I) and (case-II) to the total probability above gives
:<math>
\Pr[f(r_1,r_2,\ldots,r_n)=0]
\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|},
</math>
which proves the lemma.
}}
