概率论与数理统计 (Spring 2023)/Polynomial identity testing

In the Polynomial Identity Testing (PIT) problem, we are given two polynomials, and we want to determine whether they are identical.

Polynomial Identity Testing

Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
Determine whether [math]\displaystyle{ f\equiv g }[/math].

The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] denotes the ring of multivariate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. An [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math], written as a sum of monomials, is:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].

The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math] and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree of monomials of nonzero coefficients.

This problem is equivalent to the following problem of checking whether [math]\displaystyle{ f-g }[/math] (which is a polynomial of degree at most [math]\displaystyle{ d }[/math]) is the zero-polynomial.

Polynomial Identity Testing (equivalent version)

Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
Determine whether [math]\displaystyle{ f\equiv 0 }[/math].

The problem seems to be solvable by checking whether all coefficients of monomials in [math]\displaystyle{ f }[/math] are zeros. And there are at most [math]\displaystyle{ (n+d)^{d} }[/math] coefficients since [math]\displaystyle{ f }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math]. However, this relies on that all coefficients of [math]\displaystyle{ f }[/math] are explicitly given, which may not always be the case.

For example, a multivariate polynomial [math]\displaystyle{ f }[/math] may be expressed in a product form.

Example

The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined as that [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is

[math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].

Let [math]\displaystyle{ f }[/math] be the polynomial defined as

[math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]

For polynomials [math]\displaystyle{ f }[/math] in product form, it is quite possible that evaluating the polynomial [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] at any [math]\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }[/math] is always efficient but symbolically expanding the polynomial to calculate the coefficients may be very expensive.

We assume that the input polynomial [math]\displaystyle{ f }[/math] is given as a block-box or in product form, so that the evaluation of [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] at any specific [math]\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }[/math] is efficient.

Polynomial Interpolation

First, let's consider the univariate case where [math]\displaystyle{ n=1 }[/math]. In such case [math]\displaystyle{ f(x) }[/math] is a univariate polynomial of degree at most [math]\displaystyle{ d }[/math].

A straightforward deterministic algorithm is to evaluate [math]\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }[/math] over [math]\displaystyle{ d+1 }[/math] distinct elements [math]\displaystyle{ x_1,x_2,\ldots,x_{d+1} }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math] and check whether they are all zero. By the fundamental theorem of algebra, also known as polynomial interpolation, this guarantees to answer correctly whether a degree-[math]\displaystyle{ d }[/math] univariate polynomial [math]\displaystyle{ f\equiv 0 }[/math].

Fundamental Theorem of Algebra

Any non-zero univariate polynomial of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots.

The reason for this fundamental theorem holding generally over any field [math]\displaystyle{ \mathbb{F} }[/math] is that any univariate polynomial of degree [math]\displaystyle{ d }[/math] factors uniquely into at most [math]\displaystyle{ d }[/math] irreducible polynomials, each of which has at most one root.

The following simple randomized algorithm is natural:

Randomized algorithm for univariate PIT

suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
pick [math]\displaystyle{ r\in S }[/math] uniformly at random;
if [math]\displaystyle{ f(r) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly at random from a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math]. It is easy to see the followings:

If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ r }[/math] is a root of [math]\displaystyle{ f }[/math]. By the fundamental theorem of algebra, [math]\displaystyle{ f }[/math] has at most [math]\displaystyle{ d }[/math] roots, so the probability that the algorithm is wrong is bounded as

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

By fixing [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] to be an arbitrary subset of size [math]\displaystyle{ |S|=2d }[/math], this probability of false positive is at most [math]\displaystyle{ 1/2 }[/math]. We can reduce it to an arbitrarily small constant [math]\displaystyle{ \delta }[/math] by repeat the above testing independently for [math]\displaystyle{ \log_2 \frac{1}{\delta} }[/math] times, since the error probability decays geometrically as we repeat the algorithm independently.

Schwartz-Zippel Lemma

Now let's consider the general multivariate case, where [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math].

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Randomized algorithm for multivariate PIT

suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly from an [math]\displaystyle{ n }[/math]-dimensional cube [math]\displaystyle{ S^n }[/math], where [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] is a finite subset. And:

If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ \vec{r}=(r_1,r_2,\ldots,r_n) }[/math] is a root of [math]\displaystyle{ f }[/math]. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).

Schwartz-Zippel Lemma

Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math]. If [math]\displaystyle{ f\not\equiv 0 }[/math], then for any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and [math]\displaystyle{ r_1,r_2\ldots,r_n\in S }[/math] chosen uniformly and independently at random,

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]

The Schwartz-Zippel lemma states that for any nonzero [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math], the number of roots in any cube [math]\displaystyle{ S^n }[/math] is at most [math]\displaystyle{ d\cdot |S|^{n-1} }[/math]. And such algebraic proposition enjoins a probabilistic proof which uses no more than the law of total probability.

A naive thought for proving the Schwartz-Zippel lemma is to express the multivariate polynomial [math]\displaystyle{ f(r_1,r_2,\ldots,r_n)=g_{r_1,r_2,\ldots,r_{n-1}}(r_n) }[/math] as a univariate polynomial [math]\displaystyle{ g_{r_1,r_2,\ldots,r_{n-1}} }[/math] on the [math]\displaystyle{ {n} }[/math]-th variable [math]\displaystyle{ r_{n} }[/math], where [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot) }[/math] is a univariate polynomial "parameterized" by the random values [math]\displaystyle{ r_1,r_2,\ldots,r_{n-1} }[/math].

Then, the argument for the univariate case can apply, and easily we have

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]=\Pr[g_{r_1,r_2,\ldots,r_{n-1}}(r_n)=0]\le\frac{d}{|S|}. }[/math]

A fatal flaw of this "proof" is that when the first [math]\displaystyle{ {n-1} }[/math] variables are assigned with random values [math]\displaystyle{ r_1,r_2,\ldots,r_{n-1} }[/math], the univariate polynomial [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot) }[/math] may no longer always satisfy that [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math], which is required by polynomial interpolation.

And this is precisely where the law of total probability kicks in, to separate the case where [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math] from the case where [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\equiv 0 }[/math].

Proof.

The theorem is proved by induction on [math]\displaystyle{ n }[/math].

Induction basis:

For [math]\displaystyle{ n=1 }[/math], this is the univariate case. Assume that [math]\displaystyle{ f\not\equiv 0 }[/math]. Due to the fundamental theorem of algebra, any polynomial [math]\displaystyle{ f(x) }[/math] of degree at most [math]\displaystyle{ d }[/math] must have at most [math]\displaystyle{ d }[/math] roots, thus

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

Induction hypothesis:

Assume the theorem holds for any [math]\displaystyle{ m }[/math]-variate polynomials for all [math]\displaystyle{ m\lt n }[/math].

Induction step:

For any [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] of degree at most [math]\displaystyle{ d }[/math], write [math]\displaystyle{ f }[/math] as

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],

where [math]\displaystyle{ k }[/math] is the highest degree of [math]\displaystyle{ x_n }[/math]. Note that the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math] (otherwise [math]\displaystyle{ k }[/math] wouldn't be the highest degree of [math]\displaystyle{ x_n }[/math]).

For short, express [math]\displaystyle{ f }[/math] as:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],

where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that

[math]\displaystyle{ f_k\not\equiv 0 }[/math] and has degree at most [math]\displaystyle{ d-k }[/math];
[math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.

By the law of total probability,

[math]\displaystyle{ \begin{align} \Pr[f(r_1,r_2,\ldots,r_n)=0] = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]

Since [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is an [math]\displaystyle{ (n-1) }[/math]-variate polynomial of degree at most [math]\displaystyle{ d-k }[/math] satisfying that [math]\displaystyle{ f_k\not\equiv 0 }[/math], by the induction hypothesis,

[math]\displaystyle{ \begin{align} (\text{case-I}) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]

Now look at the case with condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math]. Recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] implies that

[math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]

when being interpreted as a univariate polynomial [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] on variable [math]\displaystyle{ x_n }[/math], always has that [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math] and the degree of [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] is at most [math]\displaystyle{ k }[/math], and thus due to the argument for the univariate case, the probability [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math].

Therefore,

[math]\displaystyle{ \begin{align} (\text{case-II}) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].

Applying the inequalities of (case-I) and (case-II) in the total probability above, gives

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}. }[/math]

[math]\displaystyle{ \square }[/math]

概率论与数理统计 (Spring 2023)/Polynomial identity testing

Polynomial Interpolation

Schwartz-Zippel Lemma

Navigation menu

Search