概率论与数理统计 (Spring 2023)/Polynomial identity testing: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
Line 55: Line 55:


== Schwartz-Zippel Lemma ==
== Schwartz-Zippel Lemma ==
Now let's consider the general multivariate case, where .  
Now let's consider the general multivariate case, where <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> is an <math>n</math>-variate polynomial of degree <math>d</math>.


The following is a simple randomized algorithm for testing identity of multivariate polynomials:
The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Revision as of 17:13, 11 March 2023

In the Polynomial Identity Testing (PIT) problem, we are given two polynomials, and we want to determine whether they are identical.

  • Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math].

The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] denotes the ring of multivariate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. An [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math], written as a sum of monomials, is:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].

The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math] and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree of monomials of nonzero coefficients.

This problem is equivalent to the following problem of checking whether [math]\displaystyle{ f-g }[/math] (which is a polynomial of degree at most [math]\displaystyle{ d }[/math]) is the zero-polynomial.

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math].

The problem seems to be solvable by checking whether all coefficients of monomials in [math]\displaystyle{ f }[/math] are zeros. And there are at most [math]\displaystyle{ (n+d)^{d} }[/math] coefficients since [math]\displaystyle{ f }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math]. However, this relies on that all coefficients of [math]\displaystyle{ f }[/math] are explicitly given, which may not always be the case.

A multivariate polynomial [math]\displaystyle{ f }[/math] can also be expressed in a product form, for example:

Example

The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined as that [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is

[math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].

Let [math]\displaystyle{ f }[/math] be the polynomial defined as

[math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]

For polynomials in product form, it is quite possible that evaluating the polynomial is always efficient but symbolically expanding the polynomial may be very expensive.

We assume that the input polynomial [math]\displaystyle{ f }[/math] is given as a block-box or in product form, so that the evaluation of [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] at any specific [math]\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }[/math] is efficient.

Polynomial Interpolation

First, let's consider the univariate case where [math]\displaystyle{ n=1 }[/math].

A straightforward deterministic algorithm is to evaluate [math]\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }[/math] over [math]\displaystyle{ d+1 }[/math] distinct elements [math]\displaystyle{ x_1,x_2,\ldots,x_{d+1} }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math] and check whether they are all zero. By the fundamental theorem of algebra, also known as polynomial interpolations, this guarantees to answer correctly whether a degree-[math]\displaystyle{ d }[/math] univariate polynomial [math]\displaystyle{ f\equiv 0 }[/math].

Fundamental Theorem of Algebra
Any non-zero univariate polynomial of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots.

The reason for this fundamental theorem holding generally over any field [math]\displaystyle{ \mathbb{F} }[/math] is that any univariate polynomial of degree [math]\displaystyle{ d }[/math] factors uniquely into at most [math]\displaystyle{ d }[/math] irreducible polynomials, each of which has at most one root.

The following simple randomized algorithm is natural:

Randomized algorithm for univariate PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r\in S }[/math] uniformly at random;
  • if [math]\displaystyle{ f(r) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly at random from a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math]. It is easy to see the followings:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ r }[/math] is a root of [math]\displaystyle{ f }[/math]. By the fundamental theorem of algebra, [math]\displaystyle{ f }[/math] has at most [math]\displaystyle{ d }[/math] roots, so the probability that the algorithm is wrong is bounded as
[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

By fixing [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] to be an arbitrary subset of size [math]\displaystyle{ |S|=2d }[/math], this probability of false positive is at most [math]\displaystyle{ 1/2 }[/math]. We can reduce it to an arbitrarily small constant [math]\displaystyle{ \delta }[/math] by repeat the above testing independently for [math]\displaystyle{ \log_2 \frac{1}{\delta} }[/math] times, since the error probability decays geometrically as we repeat the algorithm independently.

Schwartz-Zippel Lemma

Now let's consider the general multivariate case, where [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math].

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Randomized algorithm for multivariate PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
  • if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly from an [math]\displaystyle{ n }[/math]-dimensional cube [math]\displaystyle{ S^n }[/math], where [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] is a finite subset. And:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ \vec{r}=(r_1,r_2,\ldots,r_n) }[/math] is a root of [math]\displaystyle{ f }[/math]. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
Schwartz-Zippel Theorem
Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math]. If [math]\displaystyle{ f\not\equiv 0 }[/math], then for any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and [math]\displaystyle{ r_1,r_2\ldots,r_n\in S }[/math] chosen uniformly and independently at random,
[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]

The Schwartz-Zippel Theorem states that for any nonzero [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math], the number of roots in any cube [math]\displaystyle{ S^n }[/math] is at most [math]\displaystyle{ d\cdot |S|^{n-1} }[/math].

Dana Moshkovitz gave a surprisingly simply and elegant proof of Schwartz-Zippel Theorem, using some advanced ideas. Now we introduce the standard proof by induction.

Proof.

The theorem is proved by induction on [math]\displaystyle{ n }[/math].

Induction basis:

For [math]\displaystyle{ n=1 }[/math], this is the univariate case. Assume that [math]\displaystyle{ f\not\equiv 0 }[/math]. Due to the fundamental theorem of algebra, any polynomial [math]\displaystyle{ f(x) }[/math] of degree at most [math]\displaystyle{ d }[/math] must have at most [math]\displaystyle{ d }[/math] roots, thus

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]
Induction hypothesis:

Assume the theorem holds for any [math]\displaystyle{ m }[/math]-variate polynomials for all [math]\displaystyle{ m\lt n }[/math].

Induction step:

For any [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] of degree at most [math]\displaystyle{ d }[/math], we write [math]\displaystyle{ f }[/math] as

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],

where [math]\displaystyle{ k }[/math] is the highest degree of [math]\displaystyle{ x_n }[/math], which means the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math].

In particular, we write [math]\displaystyle{ f }[/math] as a sum of two parts:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],

where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that

  • [math]\displaystyle{ f_k\not\equiv 0 }[/math] is as above, whose degree is at most [math]\displaystyle{ d-k }[/math];
  • [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.

By the law of total probability, we have

[math]\displaystyle{ \begin{align} &\Pr[f(r_1,r_2,\ldots,r_n)=0]\\ = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]

Note that [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is a polynomial on [math]\displaystyle{ n-1 }[/math] variables of degree [math]\displaystyle{ d-k }[/math] such that [math]\displaystyle{ f_k\not\equiv 0 }[/math]. By the induction hypothesis, we have

[math]\displaystyle{ \begin{align} (*) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]

Now we look at the case conditioning on [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math]. Recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] guarantees that

[math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]

is a nonzero univariate polynomial of [math]\displaystyle{ x_n }[/math] such that the degree of [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] is [math]\displaystyle{ k }[/math] and [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math], for which we already known that the probability [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math]. Therefore,

[math]\displaystyle{ \begin{align} (**) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].

Substituting both [math]\displaystyle{ (*) }[/math] and [math]\displaystyle{ (**) }[/math] back in the total probability, we have

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}, }[/math]

which proves the theorem.

[math]\displaystyle{ \square }[/math]