# Probability Theory and Mathematical Statistics (Spring 2023)/Polynomial identity testing

In the Polynomial Identity Testing (PIT) problem, we are given two polynomials, and we want to determine whether they are identical.

 Polynomial Identity Testing Input: two $\displaystyle{ n }$-variate polynomials $\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }$ of degree $\displaystyle{ d }$. Determine whether $\displaystyle{ f\equiv g }$.

The $\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }$ denotes the ring of multivariate polynomials over field $\displaystyle{ \mathbb{F} }$. An $\displaystyle{ n }$-variate polynomial of degree $\displaystyle{ d }$, written as a sum of monomials, is:

$\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }$.

The degree or total degree of a monomial $\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }$ is given by $\displaystyle{ i_1+i_2+\cdots+i_n }$ and the degree of a polynomial $\displaystyle{ f }$ is the maximum degree of monomials of nonzero coefficients.
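As a concrete reading of this definition, a polynomial in sum-of-monomials form can be stored as a map from exponent tuples $\displaystyle{ (i_1,\ldots,i_n) }$ to coefficients, with the degree computed exactly as described. The representation and names below are an illustrative sketch, not part of the notes:

```python
# Sketch of the sum-of-monomials representation: a polynomial is a map
# from exponent tuples (i_1, ..., i_n) to coefficients a_{i_1,...,i_n}.

def total_degree(poly):
    """Degree = max total degree over monomials with nonzero coefficient;
    the zero polynomial gets degree -1 by convention."""
    degrees = [sum(exps) for exps, coeff in poly.items() if coeff != 0]
    return max(degrees, default=-1)

# f(x1, x2) = 3*x1^2*x2 + 5*x2^3 - 1, a 2-variate polynomial of degree 3
f = {(2, 1): 3, (0, 3): 5, (0, 0): -1}
print(total_degree(f))  # 3
```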

This problem is equivalent to the following problem of checking whether $\displaystyle{ f-g }$ (which is a polynomial of degree at most $\displaystyle{ d }$) is the zero-polynomial.

 Polynomial Identity Testing (equivalent version) Input: a polynomial $\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }$ of degree $\displaystyle{ d }$. Determine whether $\displaystyle{ f\equiv 0 }$.

The problem seems to be solvable by checking whether all coefficients of monomials in $\displaystyle{ f }$ are zero. And there are at most $\displaystyle{ \binom{n+d}{n}\le(n+d)^{d} }$ coefficients, since $\displaystyle{ f }$ is an $\displaystyle{ n }$-variate polynomial of degree at most $\displaystyle{ d }$. However, this relies on all coefficients of $\displaystyle{ f }$ being explicitly given, which may not always be the case.

For example, a multivariate polynomial $\displaystyle{ f }$ may be expressed in a product form.

 Example The Vandermonde matrix $\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }$ is defined by $\displaystyle{ M_{ij}=x_i^{j-1} }$, that is $\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }$. Let $\displaystyle{ f }$ be the polynomial defined as $\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }$
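As a quick sanity check of this identity at one concrete point, the sketch below (with illustrative function names) compares a brute-force determinant, standing in for expensive symbolic expansion, against the cheap product-form evaluation:

```python
from itertools import permutations

def det(M):
    """Determinant via the Leibniz permutation expansion: n! terms,
    feasible only for tiny n (mimicking the cost of full expansion)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        # sign of the permutation = (-1)^(number of inversions)
        sign = (-1) ** sum(1 for i in range(n) for j in range(i + 1, n)
                           if perm[i] > perm[j])
        term = sign
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def vandermonde_det(xs):
    """Evaluate det(M) via the product formula prod_{j<i} (x_i - x_j):
    only O(n^2) field operations."""
    result = 1
    for i in range(len(xs)):
        for j in range(i):
            result *= xs[i] - xs[j]
    return result

xs = [2, 5, 11]
M = [[x ** j for j in range(len(xs))] for x in xs]
print(det(M), vandermonde_det(xs))  # 162 162
```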

For polynomials $\displaystyle{ f }$ in product form, it is quite possible that evaluating the polynomial $\displaystyle{ f(x_1,\ldots,x_n) }$ at any $\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }$ is always efficient but symbolically expanding the polynomial to calculate the coefficients may be very expensive.

We assume that the input polynomial $\displaystyle{ f }$ is given as a black box or in product form, so that evaluating $\displaystyle{ f(x_1,\ldots,x_n) }$ at any specific $\displaystyle{ (x_1,\ldots,x_n)\in\mathbb{F}^n }$ is efficient.

## Polynomial Interpolation

First, let's consider the univariate case where $\displaystyle{ n=1 }$. In such case $\displaystyle{ f(x) }$ is a univariate polynomial of degree at most $\displaystyle{ d }$.

A straightforward deterministic algorithm is to evaluate $\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }$ at $\displaystyle{ d+1 }$ distinct elements $\displaystyle{ x_1,x_2,\ldots,x_{d+1} }$ of the field $\displaystyle{ \mathbb{F} }$ and check whether they are all zero. By the fundamental theorem of algebra (equivalently, by polynomial interpolation), this is guaranteed to decide correctly whether a degree-$\displaystyle{ d }$ univariate polynomial $\displaystyle{ f\equiv 0 }$.
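This deterministic interpolation-based test can be sketched as follows; the function name and black-box lambdas are illustrative:

```python
def is_zero_poly(f, d, points):
    """Deterministic univariate PIT (a sketch): a polynomial of degree at
    most d vanishes at d+1 distinct points iff it is the zero polynomial,
    since a nonzero degree-d polynomial has at most d roots."""
    assert len(points) == d + 1 and len(set(points)) == d + 1
    return all(f(x) == 0 for x in points)

f = lambda x: (x - 1) * (x - 2) * (x - 3)  # nonzero, degree 3, roots 1, 2, 3
print(is_zero_poly(f, 3, [0, 1, 2, 3]))            # False: f(0) = -6
print(is_zero_poly(lambda x: 0, 3, [0, 1, 2, 3]))  # True
```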

 Fundamental Theorem of Algebra Any non-zero univariate polynomial of degree $\displaystyle{ d }$ has at most $\displaystyle{ d }$ roots.

The reason this fundamental theorem holds over any field $\displaystyle{ \mathbb{F} }$ is that any univariate polynomial of degree $\displaystyle{ d }$ factors uniquely into at most $\displaystyle{ d }$ irreducible polynomials, each of which has at most one root.

The following simple randomized algorithm is natural:

 Randomized algorithm for univariate PIT suppose we have a finite subset $\displaystyle{ S\subseteq\mathbb{F} }$ (to be specified later); pick $\displaystyle{ r\in S }$ uniformly at random; if $\displaystyle{ f(r) = 0 }$ then return “yes” else return “no”;

This algorithm evaluates $\displaystyle{ f }$ at one point chosen uniformly at random from a finite subset $\displaystyle{ S\subseteq\mathbb{F} }$. It is easy to see the following:

• If $\displaystyle{ f\equiv 0 }$, the algorithm always returns "yes", so it is always correct.
• If $\displaystyle{ f\not\equiv 0 }$, the algorithm may wrongly return "yes" (a false positive). But this happens only when the random $\displaystyle{ r }$ is a root of $\displaystyle{ f }$. By the fundamental theorem of algebra, $\displaystyle{ f }$ has at most $\displaystyle{ d }$ roots, so the probability that the algorithm is wrong is bounded as
$\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }$

By fixing $\displaystyle{ S\subseteq\mathbb{F} }$ to be an arbitrary subset of size $\displaystyle{ |S|=2d }$, the probability of a false positive is at most $\displaystyle{ 1/2 }$. We can reduce it to an arbitrarily small constant $\displaystyle{ \delta }$ by repeating the above test independently $\displaystyle{ \log_2 \frac{1}{\delta} }$ times, since the error probability decays geometrically under independent repetition.
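The randomized test with this amplification can be sketched as follows; the names are illustrative, and $\displaystyle{ S }$ is taken to be $\displaystyle{ \{0,1,\ldots,2d-1\} }$ for concreteness:

```python
import random

def randomized_pit(f, d, trials):
    """Univariate PIT with amplification (a sketch): each round gives a
    false positive with probability <= d/|S| = 1/2 for |S| = 2d, so
    `trials` independent rounds err with probability <= 2^(-trials)."""
    S = range(2 * d)  # any 2d distinct field elements work
    return all(f(random.choice(S)) == 0 for _ in range(trials))

# To push the error below delta = 2^-40, repeat log2(1/delta) = 40 times.
f = lambda x: (x - 1) * (x - 2) * (x - 3)        # nonzero, degree 3
print(randomized_pit(f, 3, trials=40))            # False, up to error 2^-40
print(randomized_pit(lambda x: 0, 3, trials=40))  # True: the zero polynomial
```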

## Schwartz-Zippel Lemma

Now let's consider the general multivariate case, where $\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }$ is an $\displaystyle{ n }$-variate polynomial of degree $\displaystyle{ d }$.

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

 Randomized algorithm for multivariate PIT suppose we have a finite subset $\displaystyle{ S\subseteq\mathbb{F} }$ (to be specified later); pick $\displaystyle{ r_1,r_2,\ldots,r_n\in S }$ uniformly and independently at random; if $\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }$ then return “yes” else return “no”;

This algorithm evaluates $\displaystyle{ f }$ at one point chosen uniformly from an $\displaystyle{ n }$-dimensional cube $\displaystyle{ S^n }$, where $\displaystyle{ S\subseteq\mathbb{F} }$ is a finite subset. And:

• If $\displaystyle{ f\equiv 0 }$, the algorithm always returns "yes", so it is always correct.
• If $\displaystyle{ f\not\equiv 0 }$, the algorithm may wrongly return "yes" (a false positive). But this happens only when the random $\displaystyle{ \vec{r}=(r_1,r_2,\ldots,r_n) }$ is a root of $\displaystyle{ f }$. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
 Schwartz-Zippel Lemma Let $\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }$ be an $\displaystyle{ n }$-variate polynomial of degree $\displaystyle{ d }$ over a field $\displaystyle{ \mathbb{F} }$. If $\displaystyle{ f\not\equiv 0 }$, then for any finite set $\displaystyle{ S\subset\mathbb{F} }$, and $\displaystyle{ r_1,r_2,\ldots,r_n\in S }$ chosen uniformly and independently at random, $\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }$

The Schwartz-Zippel lemma states that for any nonzero $\displaystyle{ n }$-variate polynomial of degree at most $\displaystyle{ d }$, the number of roots in any cube $\displaystyle{ S^n }$ is at most $\displaystyle{ d\cdot |S|^{n-1} }$. And this algebraic proposition admits a probabilistic proof that uses nothing more than the law of total probability.
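A sketch of the multivariate algorithm, run on the product-form Vandermonde polynomial from the earlier example (function and variable names are illustrative):

```python
import random

def multivariate_pit(f, n, S, trials=1):
    """Randomized multivariate PIT (a sketch): sample each coordinate
    uniformly and independently from S. By the Schwartz-Zippel lemma, a
    nonzero degree-d polynomial passes one round with probability <= d/|S|."""
    for _ in range(trials):
        r = [random.choice(S) for _ in range(n)]
        if f(*r) != 0:
            return False  # a nonzero evaluation certifies f is not identically 0
    return True  # f vanished on every sample: declare f identically 0

# Black-box access to the n = 3 Vandermonde determinant, a nonzero
# polynomial of degree d = n(n-1)/2 = 3, given in product form:
f = lambda x1, x2, x3: (x2 - x1) * (x3 - x1) * (x3 - x2)
S = list(range(60))  # |S| = 2d already gives error 1/2; larger S does better
print(multivariate_pit(f, 3, S, trials=20))  # False, with overwhelming probability
```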

A naive thought for proving the Schwartz-Zippel lemma is to express the multivariate polynomial $\displaystyle{ f(r_1,r_2,\ldots,r_n)=g_{r_1,r_2,\ldots,r_{n-1}}(r_n) }$ as a univariate polynomial $\displaystyle{ g_{r_1,r_2,\ldots,r_{n-1}} }$ on the $\displaystyle{ {n} }$-th variable $\displaystyle{ r_{n} }$, where $\displaystyle{ g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot) }$ is a univariate polynomial "parameterized" by the random values $\displaystyle{ r_1,r_2,\ldots,r_{n-1} }$.

Then the argument for the univariate case would seem to apply, immediately giving

$\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]=\Pr[g_{r_1,r_2,\ldots,r_{n-1}}(r_n)=0]\le\frac{d}{|S|}. }$

A fatal flaw of this "proof" is that when the first $\displaystyle{ {n-1} }$ variables are assigned with random values $\displaystyle{ r_1,r_2,\ldots,r_{n-1} }$, the univariate polynomial $\displaystyle{ g_{r_1,\ldots,r_{n-1}}(\cdot)=f(r_1,\ldots,r_{n-1},\cdot) }$ may no longer always satisfy that $\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }$, which is required by polynomial interpolation.

And this is precisely where the law of total probability kicks in, to separate the case where $\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }$ from the case where $\displaystyle{ g_{r_1,\ldots,r_{n-1}}\equiv 0 }$.

Proof.
 The theorem is proved by induction on $\displaystyle{ n }$.

 Induction basis: For $\displaystyle{ n=1 }$, this is the univariate case. Assume that $\displaystyle{ f\not\equiv 0 }$. Due to the fundamental theorem of algebra, any polynomial $\displaystyle{ f(x) }$ of degree at most $\displaystyle{ d }$ has at most $\displaystyle{ d }$ roots, thus $\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }$

 Induction hypothesis: Assume the theorem holds for any $\displaystyle{ m }$-variate polynomial for all $\displaystyle{ m\lt n }$.

 Induction step: For any $\displaystyle{ n }$-variate polynomial $\displaystyle{ f(x_1,x_2,\ldots,x_n) }$ of degree at most $\displaystyle{ d }$, write $\displaystyle{ f }$ as

 $\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }$,

 where $\displaystyle{ k }$ is the highest degree of $\displaystyle{ x_n }$ in $\displaystyle{ f }$. Note that the degree of $\displaystyle{ f_k }$ is at most $\displaystyle{ d-k }$ and $\displaystyle{ f_k\not\equiv 0 }$ (otherwise $\displaystyle{ k }$ would not be the highest degree of $\displaystyle{ x_n }$).

 For short, express $\displaystyle{ f }$ as

 $\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }$,

 where both $\displaystyle{ f_k }$ and $\displaystyle{ \bar{f} }$ are polynomials, such that $\displaystyle{ f_k\not\equiv 0 }$ and has degree at most $\displaystyle{ d-k }$, while $\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }$ has no $\displaystyle{ x_n^{k} }$ factor in any term.

 By the law of total probability,

 $\displaystyle{ \begin{align} \Pr[f(r_1,r_2,\ldots,r_n)=0] = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }$

 Since $\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }$ is an $\displaystyle{ (n-1) }$-variate polynomial of degree at most $\displaystyle{ d-k }$ satisfying $\displaystyle{ f_k\not\equiv 0 }$, by the induction hypothesis,

 $\displaystyle{ (\text{case-I})\qquad \Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. }$

 Now look at the case with condition $\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }$. Recall that $\displaystyle{ \bar{f}(x_1,\ldots,x_n) }$ has no $\displaystyle{ x_n^k }$ factor in any term. Thus, under the condition $\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }$, the univariate polynomial $\displaystyle{ g_{r_1,\ldots,r_{n-1}}(x_n)=f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n) }$ in the variable $\displaystyle{ x_n }$ always satisfies $\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }$ and has degree at most $\displaystyle{ k }$. Due to the argument for the univariate case, the probability that $\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }$ is at most $\displaystyle{ \frac{k}{|S|} }$. Therefore,

 $\displaystyle{ (\text{case-II})\qquad \Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|}. }$

 Applying the inequalities of (case-I) and (case-II) to the total probability above gives

 $\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}. }$
$\displaystyle{ \square }$
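The root-counting form of the lemma can also be checked exhaustively on a toy instance; the polynomial and the set $\displaystyle{ S }$ below are illustrative choices:

```python
from itertools import product

# Exhaustively count roots of a nonzero polynomial in the cube S^n and
# compare against the Schwartz-Zippel bound d * |S|^(n-1).
f = lambda x1, x2: x1 * x2 - 1   # n = 2 variables, degree d = 2, nonzero
S = list(range(-4, 5))           # |S| = 9
roots = sum(1 for point in product(S, repeat=2) if f(*point) == 0)
bound = 2 * len(S) ** (2 - 1)    # d * |S|^(n-1) = 2 * 9 = 18
print(roots, bound)              # the root count never exceeds the bound
assert roots <= bound
```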