Probability Theory and Mathematical Statistics (Spring 2023) / Polynomial identity testing
In the Polynomial Identity Testing (PIT) problem, we are given two polynomials, and we want to determine whether they are identical.
Polynomial Identity Testing
- Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
- Determine whether [math]\displaystyle{ f\equiv g }[/math].
Here [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] denotes the ring of multivariate polynomials over the field [math]\displaystyle{ \mathbb{F} }[/math]. An [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math], written as a sum of monomials, is:
- [math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].
The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math], and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree over its monomials with nonzero coefficients.
This problem is equivalent to the following problem of checking whether [math]\displaystyle{ f-g }[/math] (which is a polynomial of degree at most [math]\displaystyle{ d }[/math]) is the zero-polynomial.
Polynomial Identity Testing (equivalent version)
- Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
- Determine whether [math]\displaystyle{ f\equiv 0 }[/math].
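This reduction is easy to realize when the polynomials are given as evaluation procedures: a minimal sketch in Python (our own illustration, not part of the notes; the helper name difference_black_box and the choice of the prime field [math]\displaystyle{ \mathbb{Z}_p }[/math] are assumptions):

```python
# A tiny sketch of the reduction above: given black-box evaluators for f and g
# over Z_p, their difference h = f - g is again a black box of degree at most d,
# so testing f == g reduces to testing whether h is the zero polynomial.
def difference_black_box(f, g, p):
    """Black box for h = f - g over Z_p, built from black boxes for f and g."""
    return lambda *xs: (f(*xs) - g(*xs)) % p
```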
The problem seems solvable by checking whether all coefficients of the monomials in [math]\displaystyle{ f }[/math] are zero. And there are at most [math]\displaystyle{ (n+d)^{d} }[/math] coefficients since [math]\displaystyle{ f }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math]. However, this relies on all coefficients of [math]\displaystyle{ f }[/math] being explicitly given, which may not always be the case.
For example, a multivariate polynomial [math]\displaystyle{ f }[/math] may be expressed in a product form.
Example
The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined by [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is
- [math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].
Let [math]\displaystyle{ f }[/math] be the polynomial defined as
- [math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]
For polynomials [math]\displaystyle{ f }[/math] in product form, evaluating the polynomial [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] at any [math]\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }[/math] may well be efficient, while symbolically expanding the polynomial to calculate its coefficients can be prohibitively expensive.
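To make the contrast concrete, here is a small sketch (our own Python, with illustrative names) that evaluates the Vandermonde determinant at a point through the product form, using [math]\displaystyle{ O(n^2) }[/math] field operations, whereas the expanded sum-of-monomials form of [math]\displaystyle{ \det(M) }[/math] has [math]\displaystyle{ n! }[/math] terms:

```python
# Evaluate det(M) = prod_{j < i} (xs[i] - xs[j]) at a point of Z_p^n via the
# product form: O(n^2) multiplications, no symbolic expansion needed.
def vandermonde_det(xs, p):
    result = 1
    for i in range(len(xs)):
        for j in range(i):
            result = result * (xs[i] - xs[j]) % p
    return result

print(vandermonde_det([2, 3, 5, 7], 101))  # a single evaluation over Z_101
```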
We assume that the input polynomial [math]\displaystyle{ f }[/math] is given as a black box or in product form, so that the evaluation of [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] at any specific [math]\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }[/math] is efficient.
Polynomial Interpolation
First, let's consider the univariate case where [math]\displaystyle{ n=1 }[/math]. In such case [math]\displaystyle{ f(x) }[/math] is a univariate polynomial of degree at most [math]\displaystyle{ d }[/math].
A straightforward deterministic algorithm is to evaluate [math]\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }[/math] at [math]\displaystyle{ d+1 }[/math] distinct elements [math]\displaystyle{ x_1,x_2,\ldots,x_{d+1} }[/math] of the field [math]\displaystyle{ \mathbb{F} }[/math] and check whether they are all zero. By the fundamental theorem of algebra stated below (the fact underlying polynomial interpolation), this correctly decides whether a degree-[math]\displaystyle{ d }[/math] univariate polynomial satisfies [math]\displaystyle{ f\equiv 0 }[/math]: a non-zero [math]\displaystyle{ f }[/math] can vanish on at most [math]\displaystyle{ d }[/math] of the [math]\displaystyle{ d+1 }[/math] points.
Fundamental Theorem of Algebra
- Any non-zero univariate polynomial of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots.
The reason this fundamental theorem holds over any field [math]\displaystyle{ \mathbb{F} }[/math] is that any univariate polynomial of degree [math]\displaystyle{ d }[/math] factors uniquely into at most [math]\displaystyle{ d }[/math] irreducible polynomials, each of which has at most one root in [math]\displaystyle{ \mathbb{F} }[/math].
The following simple randomized algorithm is natural:
Randomized algorithm for univariate PIT
- suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
- pick [math]\displaystyle{ r\in S }[/math] uniformly at random;
- if [math]\displaystyle{ f(r) = 0 }[/math] then return “yes” else return “no”;
This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly at random from a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math]. It is easy to see the following:
- If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
- If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ r }[/math] is a root of [math]\displaystyle{ f }[/math]. By the fundamental theorem of algebra, [math]\displaystyle{ f }[/math] has at most [math]\displaystyle{ d }[/math] roots, so the probability that the algorithm is wrong is bounded as
- [math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]
By fixing [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] to be an arbitrary subset of size [math]\displaystyle{ |S|=2d }[/math], this probability of a false positive is at most [math]\displaystyle{ 1/2 }[/math]. We can reduce it to an arbitrarily small constant [math]\displaystyle{ \delta }[/math] by repeating the above test independently [math]\displaystyle{ \log_2 \frac{1}{\delta} }[/math] times, since the false-positive probability decays geometrically: after [math]\displaystyle{ t }[/math] independent runs it is at most [math]\displaystyle{ (1/2)^t }[/math].
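Putting the pieces together, here is a minimal sketch of the repeated test (our own Python, not the notes' own code; working over the prime field [math]\displaystyle{ \mathbb{Z}_p }[/math] with [math]\displaystyle{ S=\{0,1,\ldots,2d-1\} }[/math] assumes [math]\displaystyle{ p\gt 2d }[/math], and the names univariate_pit, f, delta are illustrative):

```python
import math
import random

# Randomized univariate PIT: f is a black box for a polynomial over Z_p of
# degree at most d. Each run samples r uniformly from S = {0, ..., 2d-1},
# so one run gives a false positive with probability at most d/|S| = 1/2;
# repeating ceil(log2(1/delta)) independent runs pushes this below delta.
def univariate_pit(f, d, p, delta=2**-20):
    trials = max(1, math.ceil(math.log2(1 / delta)))
    for _ in range(trials):
        r = random.randrange(2 * d)      # uniform sample from S, |S| = 2d
        if f(r) % p != 0:
            return False                 # nonzero evaluation: certainly f != 0
    return True                          # all evaluations zero: f == 0 w.h.p.

p = 2**31 - 1                            # a prime, so Z_p is a field
zero = lambda x: ((x + 1)**2 - (x*x + 2*x + 1)) % p   # identically zero
nonzero = lambda x: (x*x - 1) % p                     # degree 2, two roots
print(univariate_pit(zero, 2, p), univariate_pit(nonzero, 2, p))  # True False
```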
Schwartz-Zippel Lemma
Now let's consider the general multivariate case, where [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math].
The following is a simple randomized algorithm for testing identity of multivariate polynomials:
Randomized algorithm for multivariate PIT
- suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
- pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
- if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;
This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly from an [math]\displaystyle{ n }[/math]-dimensional cube [math]\displaystyle{ S^n }[/math], where [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] is a finite subset. And:
- If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
- If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ \vec{r}=(r_1,r_2,\ldots,r_n) }[/math] is a root of [math]\displaystyle{ f }[/math]. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
Schwartz-Zippel Lemma
- Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math]. If [math]\displaystyle{ f\not\equiv 0 }[/math], then for any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math] and [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] chosen uniformly and independently at random,
- [math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]
The Schwartz-Zippel lemma states that for any nonzero [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math], the number of roots in any cube [math]\displaystyle{ S^n }[/math] is at most [math]\displaystyle{ d\cdot |S|^{n-1} }[/math]. This algebraic proposition enjoys a probabilistic proof that uses nothing more than the law of total probability.
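As an illustration of the lemma in action, here is a sketch (our own Python; det_mod_p and the parameter choices are ours) that tests the Vandermonde identity [math]\displaystyle{ \det(M(x_1,\ldots,x_n))\equiv\prod_{j\lt i}(x_i-x_j) }[/math] over [math]\displaystyle{ \mathbb{Z}_p }[/math] by random evaluation, never expanding either side. Both sides have degree [math]\displaystyle{ d=n(n-1)/2 }[/math], so one trial gives a false positive with probability at most [math]\displaystyle{ d/p }[/math]:

```python
import random

def det_mod_p(m, p):
    """Determinant of a square matrix over Z_p, by Gaussian elimination."""
    m = [row[:] for row in m]
    n, det = len(m), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] % p), None)
        if pivot is None:
            return 0
        if pivot != col:                      # a row swap flips the sign
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)      # modular inverse (p is prime)
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p

def vandermonde_identity_holds(n, p, trials=20):
    for _ in range(trials):
        xs = [random.randrange(p) for _ in range(n)]   # uniform point of S^n
        lhs = det_mod_p([[pow(x, j, p) for j in range(n)] for x in xs], p)
        rhs = 1
        for i in range(n):
            for j in range(i):
                rhs = rhs * (xs[i] - xs[j]) % p
        if lhs != rhs:
            return False       # a disagreement certifies non-identity
    return True                # agreement at all random points: identical w.h.p.

print(vandermonde_identity_holds(6, 2**31 - 1))   # True
```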
A naive thought for proving the Schwartz-Zippel lemma is to regard [math]\displaystyle{ f(r_1,r_2,\ldots,r_n)=g_{r_1,r_2,\ldots,r_{n-1}}(r_n) }[/math] as the evaluation of a univariate polynomial in the [math]\displaystyle{ {n} }[/math]-th variable, where [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot) }[/math] is a univariate polynomial "parameterized" by the random values [math]\displaystyle{ r_1,r_2,\ldots,r_{n-1} }[/math].

Then the argument for the univariate case would apply, immediately giving
- [math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]=\Pr[g_{r_1,r_2,\ldots,r_{n-1}}(r_n)=0]\le\frac{d}{|S|}. }[/math]
A fatal flaw of this "proof" is that when the first [math]\displaystyle{ {n-1} }[/math] variables are assigned the random values [math]\displaystyle{ r_1,r_2,\ldots,r_{n-1} }[/math], the univariate polynomial [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot) }[/math] may no longer satisfy [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math], which polynomial interpolation requires. For example, for [math]\displaystyle{ f(x_1,x_2)=x_1x_2 }[/math], the assignment [math]\displaystyle{ r_1=0 }[/math] makes [math]\displaystyle{ g_{r_1}(x_2)=r_1x_2 }[/math] identically zero.
And this is precisely where the law of total probability kicks in, to separate the case where [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math] from the case where [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\equiv 0 }[/math].
Proof. The theorem is proved by induction on [math]\displaystyle{ n }[/math].
- Induction basis:
For [math]\displaystyle{ n=1 }[/math], this is the univariate case. Assume that [math]\displaystyle{ f\not\equiv 0 }[/math]. Due to the fundamental theorem of algebra, any non-zero polynomial [math]\displaystyle{ f(x) }[/math] of degree at most [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots, thus
- [math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]
- Induction hypothesis:
Assume the theorem holds for [math]\displaystyle{ m }[/math]-variate polynomials for all [math]\displaystyle{ m\lt n }[/math].
- Induction step:
For any [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] of degree at most [math]\displaystyle{ d }[/math], write [math]\displaystyle{ f }[/math] as
- [math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],
where [math]\displaystyle{ k }[/math] is the highest power of [math]\displaystyle{ x_n }[/math] in [math]\displaystyle{ f }[/math]. Note that the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math] (otherwise [math]\displaystyle{ k }[/math] would not be the highest power of [math]\displaystyle{ x_n }[/math]).
For short, express [math]\displaystyle{ f }[/math] as:
- [math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],
where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that
- [math]\displaystyle{ f_k\not\equiv 0 }[/math] and has degree at most [math]\displaystyle{ d-k }[/math];
- [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.
By the law of total probability,
- [math]\displaystyle{ \begin{align} \Pr[f(r_1,r_2,\ldots,r_n)=0] = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]
Since [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is an [math]\displaystyle{ (n-1) }[/math]-variate polynomial of degree at most [math]\displaystyle{ d-k }[/math] satisfying that [math]\displaystyle{ f_k\not\equiv 0 }[/math], by the induction hypothesis,
- [math]\displaystyle{ \begin{align} (\text{case-I}) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]
Now look at the case with condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math]. Recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] implies that
- [math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]
when interpreted as a univariate polynomial [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] in the variable [math]\displaystyle{ x_n }[/math], satisfies [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math] and has degree at most [math]\displaystyle{ k }[/math]. Thus, by the argument for the univariate case, the probability that [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math].
Therefore,
- [math]\displaystyle{ \begin{align} (\text{case-II}) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].
Applying the inequalities (case-I) and (case-II) in the law of total probability above, and bounding the two remaining factors [math]\displaystyle{ \Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0] }[/math] and [math]\displaystyle{ \Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0] }[/math] by [math]\displaystyle{ 1 }[/math], gives
- [math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}. }[/math]
- [math]\displaystyle{ \square }[/math]
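To see the decomposition [math]\displaystyle{ f=x_n^kf_k+\bar{f} }[/math] from the induction step on a concrete instance, here is a small sanity check (our own sketch; it assumes the sympy library and an arbitrary example polynomial):

```python
# Verify f = x2^k * f_k(x1) + f_bar for f = x1*x2^2 + x2^2 + x1^3*x2 + 1,
# where k is the highest power of x2 and f_bar has no x2^k term.
from sympy import symbols, degree, expand

x1, x2 = symbols('x1 x2')
f = x1 * x2**2 + x2**2 + x1**3 * x2 + 1

k = degree(f, x2)                    # highest power of x2: k = 2
f_k = f.coeff(x2, k)                 # f_k(x1) = x1 + 1, nonzero by choice of k
f_bar = expand(f - x2**k * f_k)      # f_bar = x1**3*x2 + 1, no x2**2 term

assert f_k != 0 and expand(x2**k * f_k + f_bar - f) == 0
print(k, f_k, f_bar)                 # 2  x1 + 1  x1**3*x2 + 1
```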