概率论与数理统计 (Spring 2023)/Polynomial identity testing: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
Line 38: Line 38:
First, let's consider the '''univariate''' case where <math>n=1</math>. In such case <math>f(x)</math> is a univariate polynomial of degree at most <math>d</math>.
First, let's consider the '''univariate''' case where <math>n=1</math>. In such case <math>f(x)</math> is a univariate polynomial of degree at most <math>d</math>.


A straightforward deterministic algorithm is to evaluate <math>f(x_1),f(x_2),\ldots,f(x_{d+1})</math> over <math>d+1</math> ''distinct'' elements <math>x_1,x_2,\ldots,x_{d+1}</math> from the field <math>\mathbb{F}</math> and check whether they are all zero. By the [https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra fundamental theorem of algebra], also known as polynomial interpolations, this guarantees to answer correctly whether a degree-<math>d</math> univariate polynomial <math>f\equiv 0</math>.
A straightforward deterministic algorithm is to evaluate <math>f(x_1),f(x_2),\ldots,f(x_{d+1})</math> over <math>d+1</math> ''distinct'' elements <math>x_1,x_2,\ldots,x_{d+1}</math> from the field <math>\mathbb{F}</math> and check whether they are all zero. By the [https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra fundamental theorem of algebra], also known as '''polynomial interpolation''', this guarantees to answer correctly whether a degree-<math>d</math> univariate polynomial <math>f\equiv 0</math>.
{{Theorem|Fundamental Theorem of Algebra|
{{Theorem|Fundamental Theorem of Algebra|
:Any non-zero univariate polynomial of degree <math>d</math> has at most <math>d</math> roots.
:Any non-zero univariate polynomial of degree <math>d</math> has at most <math>d</math> roots.

Revision as of 02:44, 12 March 2023

In the Polynomial Identity Testing (PIT) problem, we are given two polynomials, and we want to determine whether they are identical.

Polynomial Identity Testing
  • Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math].

The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] denotes the ring of multivariate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. An [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math], written as a sum of monomials, is:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].

The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math] and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree of monomials of nonzero coefficients.

This problem is equivalent to the following problem of checking whether [math]\displaystyle{ f-g }[/math] (which is a polynomial of degree at most [math]\displaystyle{ d }[/math]) is the zero-polynomial.

Polynomial Identity Testing (equivalent version)
  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math].

The problem seems to be solvable by checking whether all coefficients of monomials in [math]\displaystyle{ f }[/math] are zeros. And there are at most [math]\displaystyle{ (n+d)^{d} }[/math] coefficients since [math]\displaystyle{ f }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math]. However, this relies on that all coefficients of [math]\displaystyle{ f }[/math] are explicitly given, which may not always be the case.

For example, a multivariate polynomial [math]\displaystyle{ f }[/math] may be expressed in a product form.

Example

The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined as that [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is

[math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].

Let [math]\displaystyle{ f }[/math] be the polynomial defined as

[math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]

For polynomials [math]\displaystyle{ f }[/math] in product form, it is quite possible that evaluating the polynomial [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] at any [math]\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }[/math] is always efficient but symbolically expanding the polynomial to calculate the coefficients may be very expensive.

We assume that the input polynomial [math]\displaystyle{ f }[/math] is given as a block-box or in product form, so that the evaluation of [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] at any specific [math]\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }[/math] is efficient.

Polynomial Interpolation

First, let's consider the univariate case where [math]\displaystyle{ n=1 }[/math]. In such case [math]\displaystyle{ f(x) }[/math] is a univariate polynomial of degree at most [math]\displaystyle{ d }[/math].

A straightforward deterministic algorithm is to evaluate [math]\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }[/math] over [math]\displaystyle{ d+1 }[/math] distinct elements [math]\displaystyle{ x_1,x_2,\ldots,x_{d+1} }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math] and check whether they are all zero. By the fundamental theorem of algebra, also known as polynomial interpolation, this guarantees to answer correctly whether a degree-[math]\displaystyle{ d }[/math] univariate polynomial [math]\displaystyle{ f\equiv 0 }[/math].

Fundamental Theorem of Algebra
Any non-zero univariate polynomial of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots.

The reason for this fundamental theorem holding generally over any field [math]\displaystyle{ \mathbb{F} }[/math] is that any univariate polynomial of degree [math]\displaystyle{ d }[/math] factors uniquely into at most [math]\displaystyle{ d }[/math] irreducible polynomials, each of which has at most one root.

The following simple randomized algorithm is natural:

Randomized algorithm for univariate PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r\in S }[/math] uniformly at random;
  • if [math]\displaystyle{ f(r) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly at random from a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math]. It is easy to see the followings:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ r }[/math] is a root of [math]\displaystyle{ f }[/math]. By the fundamental theorem of algebra, [math]\displaystyle{ f }[/math] has at most [math]\displaystyle{ d }[/math] roots, so the probability that the algorithm is wrong is bounded as
[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

By fixing [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] to be an arbitrary subset of size [math]\displaystyle{ |S|=2d }[/math], this probability of false positive is at most [math]\displaystyle{ 1/2 }[/math]. We can reduce it to an arbitrarily small constant [math]\displaystyle{ \delta }[/math] by repeat the above testing independently for [math]\displaystyle{ \log_2 \frac{1}{\delta} }[/math] times, since the error probability decays geometrically as we repeat the algorithm independently.

Schwartz-Zippel Lemma

Now let's consider the general multivariate case, where [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math].

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Randomized algorithm for multivariate PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
  • if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly from an [math]\displaystyle{ n }[/math]-dimensional cube [math]\displaystyle{ S^n }[/math], where [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] is a finite subset. And:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ \vec{r}=(r_1,r_2,\ldots,r_n) }[/math] is a root of [math]\displaystyle{ f }[/math]. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
Schwartz-Zippel Lemma
Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be an [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math]. If [math]\displaystyle{ f\not\equiv 0 }[/math], then for any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and [math]\displaystyle{ r_1,r_2\ldots,r_n\in S }[/math] chosen uniformly and independently at random,
[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]

The Schwartz-Zippel lemma states that for any nonzero [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math], the number of roots in any cube [math]\displaystyle{ S^n }[/math] is at most [math]\displaystyle{ d\cdot |S|^{n-1} }[/math]. And such algebraic proposition enjoins a probabilistic proof which uses no more than the law of total probability.

Proof.

The theorem is proved by induction on [math]\displaystyle{ n }[/math].

Induction basis:

For [math]\displaystyle{ n=1 }[/math], this is the univariate case. Assume that [math]\displaystyle{ f\not\equiv 0 }[/math]. Due to the fundamental theorem of algebra, any polynomial [math]\displaystyle{ f(x) }[/math] of degree at most [math]\displaystyle{ d }[/math] must have at most [math]\displaystyle{ d }[/math] roots, thus

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]
Induction hypothesis:

Assume the theorem holds for any [math]\displaystyle{ m }[/math]-variate polynomials for all [math]\displaystyle{ m\lt n }[/math].

Induction step:

For any [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] of degree at most [math]\displaystyle{ d }[/math], write [math]\displaystyle{ f }[/math] as

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],

where [math]\displaystyle{ k }[/math] is the highest degree of [math]\displaystyle{ x_n }[/math]. Note that the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math] (otherwise [math]\displaystyle{ k }[/math] wouldn't be the highest degree of [math]\displaystyle{ x_n }[/math]).

For short, express [math]\displaystyle{ f }[/math] as:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],

where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that

  • [math]\displaystyle{ f_k\not\equiv 0 }[/math] and has degree at most [math]\displaystyle{ d-k }[/math];
  • [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.

By the law of total probability,

[math]\displaystyle{ \begin{align} \Pr[f(r_1,r_2,\ldots,r_n)=0] = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]

Since [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is an [math]\displaystyle{ (n-1) }[/math]-variate polynomial of degree at most [math]\displaystyle{ d-k }[/math] satisfying that [math]\displaystyle{ f_k\not\equiv 0 }[/math], by the induction hypothesis,

[math]\displaystyle{ \begin{align} (\text{case-I}) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]

Now look at the case with condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math]. Recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] implies that

[math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]

when being interpreted as a univariate polynomial [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] on variable [math]\displaystyle{ x_n }[/math], always has that [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math] and the degree of [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n) }[/math] is at most [math]\displaystyle{ k }[/math], and thus due to the argument for the univariate case, the probability [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math].

Therefore,

[math]\displaystyle{ \begin{align} (\text{case-II}) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].

Applying the inequalities of (case-I) and (case-II) in the total probability above, gives

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}. }[/math]
[math]\displaystyle{ \square }[/math]