Probability Theory and Mathematical Statistics (Spring 2023)/Polynomial identity testing

= Checking distinctness =
Consider the following problem of '''checking distinctness''':
*Given a sequence <math>x_1,x_2,\ldots,x_n\in\{1,2,\ldots,n\}</math>, check whether every element of <math>\{1,2,\ldots,n\}</math> appears '''exactly''' once.
Obviously this problem can be solved in linear time and linear space (in addition to the space for storing the input) by maintaining an <math>n</math>-bit vector that indicates which numbers among <math>\{1,2,\ldots,n\}</math> have appeared.
When <math>n</math> is enormously large, an <math>\Omega(n)</math> space cost is too expensive. We wonder whether we could solve this problem with a space cost (in addition to the space for storing the input) much smaller than <math>O(n)</math>. This can be done by fingerprinting if we tolerate a certain degree of inaccuracy.
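For reference, the straightforward linear-space solution looks like the following minimal Python sketch (the function name <code>check_distinct</code> is ours); the fingerprinting approach in the next subsection replaces the <math>n</math>-bit vector by <math>O(\log n)</math> bits of state.

  def check_distinct(xs):
      # xs is a sequence of n integers, each expected to lie in {1, 2, ..., n}
      n = len(xs)
      seen = [False] * (n + 1)   # the n-bit indicator vector: Theta(n) extra space
      for x in xs:
          if x < 1 or x > n or seen[x]:
              return False       # out of range, or some value repeats
          seen[x] = True
      return True                # n in-range values with no repeats: each appears exactly once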
== Fingerprinting multisets ==
We consider the following more general problem, '''checking identity of multisets''':
* '''Input:''' two multisets <math>A=\{a_1,a_2,\ldots, a_n\}</math> and <math>B=\{b_1,b_2,\ldots, b_n\}</math> where <math>a_1,a_2,\ldots,a_n,b_1,b_2,\ldots,b_n\in \{1,2,\ldots,n\}</math>.
* Determine whether <math>A=B</math> (multiset equivalence).
Here for a '''multiset''' <math>A=\{a_1,a_2,\ldots, a_n\}</math>, its elements <math>a_i</math> are not necessarily distinct. The '''multiplicity''' of an element <math>a_i</math> in a multiset <math>A</math> is the number of times <math>a_i</math> appears in <math>A</math>. Two multisets <math>A</math> and <math>B</math> are equivalent if they contain the same set of elements and the multiplicities of every element in <math>A</math> and <math>B</math> are equal.
Obviously the above problem of checking distinctness can be treated as a special case of checking identity of multisets: check the identity between the multiset <math>A=\{x_1,x_2,\ldots,x_n\}</math> and the set <math>\{1,2,\ldots, n\}</math> (viewed as a multiset).
The following fingerprinting function for multisets was introduced by Lipton for solving multiset identity testing.
{{Theorem|Fingerprint for multiset|
:Let <math>p</math> be a uniform random prime chosen from the interval <math>[(n\log n)^2,2(n\log n)^2]</math>. By Chebyshev's theorem, such a prime must exist. Consider the finite field <math>\mathbb{Z}_p=[p]</math>.
:Given a multiset <math>A=\{a_1,a_2,\ldots,a_n\}</math>, we define a univariate polynomial <math>f_A\in\mathbb{Z}_p[x]</math> over the finite field <math>\mathbb{Z}_p</math> as follows:
::<math>f_A(x)=\prod_{i=1}^n(x-a_i)</math>,
:where <math>+</math> and <math>\cdot</math> are defined over the finite field <math>\mathbb{Z}_p</math>.
:We then define the random fingerprinting function as:
::<math>\mathrm{FING}(A)=f_A(r)=\prod_{i=1}^n(r-a_i)</math>,
:where <math>r</math> is chosen uniformly at random from <math>\mathbb{Z}_p</math>.
}}
Since all computations of <math>\mathrm{FING}(A)=\prod_{i=1}^n(r-a_i)</math> are over the finite field <math>\mathbb{Z}_p</math>, the space cost for computing the fingerprint <math>\mathrm{FING}(A)</math> is only <math>O(\log p)=O(\log n)</math>.
Moreover, the fingerprinting function <math>\mathrm{FING}(A)</math> is invariant under permutation of elements of the multiset <math>A=\{a_1,a_2,\ldots,a_n\}</math>, thus it is indeed a function of multisets (the fingerprint depends only on the multiset itself, not on the order in which its elements are listed). Therefore, if <math>A=B</math> then <math>\mathrm{FING}(A)=\mathrm{FING}(B)</math>.
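To make the definition concrete, here is a minimal Python sketch of computing <math>\mathrm{FING}(A)</math> (the helper names are ours, and the prime is found by naive rejection sampling, which suffices for moderate <math>n</math>); the last lines illustrate the intended use: draw one pair <math>(p,r)</math> and compare the fingerprints of two multisets.

  import math, random

  def is_prime(m):
      # naive trial division; adequate since p <= 2*(n*log n)^2 is small
      if m < 2:
          return False
      i = 2
      while i * i <= m:
          if m % i == 0:
              return False
          i += 1
      return True

  def random_prime_in(lo, hi):
      # rejection sampling; by Chebyshev's theorem the interval contains a prime
      while True:
          p = random.randint(lo, hi)
          if is_prime(p):
              return p

  def fingerprint(A, p, r):
      # FING(A) = prod_i (r - a_i) over Z_p, keeping only O(log p) bits of state
      f = 1
      for a in A:
          f = f * (r - a) % p
      return f

  # checking distinctness of x_1, ..., x_n as a multiset-identity test against {1, ..., n}
  xs = [3, 1, 4, 1, 5, 6]                    # 1 repeats and 2 is missing
  n = len(xs)
  lo = int((n * math.log2(n)) ** 2)          # the base of the logarithm only shifts constants
  p = random_prime_in(lo, 2 * lo)
  r = random.randrange(p)
  print(fingerprint(xs, p, r) == fingerprint(range(1, n + 1), p, r))   # False w.h.p.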
For two distinct multisets <math>A\neq B</math>, it is possible that <math>\mathrm{FING}(A)=\mathrm{FING}(B)</math>, but the following theorem due to Lipton bounds the probability of this false-positive error.
{{Theorem|Theorem (Lipton 1989)|
:Let <math>A=\{a_1,a_2,\ldots,a_n\}</math> and <math>B=\{b_1,b_2,\ldots,b_n\}</math> be two multisets whose elements are from <math>\{1,2,\ldots,n\}</math>. If <math>A\neq B</math>, then
::<math>\Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]=O\left(\frac{1}{n}\right)</math>.
}}
{{Proof|
Let <math>\tilde{f}_A(x)=\prod_{i=1}^n(x-a_i)</math> and <math>\tilde{f}_B(x)=\prod_{i=1}^n(x-b_i)</math> be two univariate polynomials defined over the reals <math>\mathbb{R}</math>. Note that in contrast to <math>f_A(x)</math> and <math>f_B(x)</math>, the <math>+</math> and <math>\cdot</math> in <math>\tilde{f}_A(x), \tilde{f}_B(x)</math> are not taken modulo <math>p</math>. It is easy to verify that the polynomials <math>\tilde{f}_A(x), \tilde{f}_B(x)</math> have the following properties:
*<math>\tilde{f}_A\equiv \tilde{f}_B</math> if and only if <math>A=B</math>. Here <math>A=B</math> means the multiset equivalence.
*By the properties of the finite field <math>\mathbb{Z}_p</math>, for any value <math>r\in\mathbb{Z}_p</math>, it holds that <math>f_A(r)=\tilde{f}_A(r)\bmod p</math> and <math>f_B(r)=\tilde{f}_B(r)\bmod p</math>.
Therefore, assuming that <math>A\neq B</math>, we must have <math>\tilde{f}_A(x)\not\equiv \tilde{f}_B(x)</math>. Then by the law of total probability:
:<math>
\begin{align}
\Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]
&=
\Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]\Pr[f_A\not\equiv f_B]\\
&\quad\,\,+\Pr\left[f_A(r)=f_B(r)\mid f_A\equiv f_B\right]\Pr[f_A\equiv f_B]\\
&\le \Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]+\Pr[f_A\equiv f_B].
\end{align}
</math>
Note that the degrees of <math>f_A,f_B</math> are at most <math>n</math> and <math>r</math> is chosen uniformly from <math>[p]</math>. By the Schwartz-Zippel theorem for univariate polynomials, the first probability
:<math>
\Pr\left[f_A(r)=f_B(r)\mid f_A\not\equiv f_B\right]\le \frac{n}{p}=o\left(\frac{1}{n}\right),
</math>
since <math>p</math> is chosen from the interval <math>[(n\log n)^2,2(n\log n)^2]</math>.
For the second probability <math>\Pr[f_A\equiv f_B]</math>, recall that <math>\tilde{f}_A\not\equiv \tilde{f}_B</math>, therefore <math>\tilde{f}_A-\tilde{f}_B</math> has at least one non-zero coefficient <math>c</math>, and <math>|c|\le n^n</math> since all elements of <math>A</math> and <math>B</math> lie in <math>\{1,2,\ldots,n\}</math>; in particular, <math>c</math> has at most <math>\log_2 n^n=n\log_2 n</math> prime factors. The event <math>f_A\equiv f_B</math> occurs only if <math>c\bmod p=0</math>, which means
:<math>
\begin{align}
\Pr[f_A\equiv f_B]
&\le \Pr[c\bmod p=0]\\
&\le\frac{\text{number of prime factors of }c}{\text{number of primes in }[(n\log n)^2,2(n\log n)^2]}\\
&\le \frac{n\log_2n}{\pi(2(n\log n)^2)-\pi((n\log n)^2)}.
\end{align}
</math>
By the prime number theorem, <math>\pi(N)\sim \frac{N}{\ln N}</math> as <math>N\to\infty</math>, so the number of primes in the interval <math>[(n\log n)^2,2(n\log n)^2]</math> is <math>\pi(2(n\log n)^2)-\pi((n\log n)^2)=\Omega\left(\frac{(n\log n)^2}{\log n}\right)=\Omega(n^2\log n)</math>. Therefore,
:<math>
\Pr[f_A\equiv f_B]=O\left(\frac{n\log n}{n^2\log n}\right)=O\left(\frac{1}{n}\right).
</math>
Combining everything together, we have
<math>\Pr[\mathrm{FING}(A)= \mathrm{FING}(B)]=O\left(\frac{1}{n}\right)</math>.
}}


Polynomial Identity Testing (PIT)

Polynomial Identity Testing (PIT) is the following problem: given two polynomials as input, determine whether they are identical. It plays a fundamental role in identity testing problems.

First, let's consider the univariate ("one variable") case:

  • Input: two polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math] ([math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] are identical).

Here [math]\displaystyle{ \mathbb{F}[x] }[/math] denotes the ring of univariate polynomials over a field [math]\displaystyle{ \mathbb{F} }[/math]. More precisely, a polynomial [math]\displaystyle{ f\in\mathbb{F}[x] }[/math] is

[math]\displaystyle{ f(x)=\sum_{i=0}^\infty a_ix^i }[/math],

where the coefficients [math]\displaystyle{ a_i }[/math] are taken from the field [math]\displaystyle{ \mathbb{F} }[/math] (with only finitely many of them nonzero), and the addition and multiplication are also defined over the field [math]\displaystyle{ \mathbb{F} }[/math]. And:

  • the degree of [math]\displaystyle{ f }[/math] is the highest [math]\displaystyle{ i }[/math] with non-zero [math]\displaystyle{ a_i }[/math];
  • a polynomial [math]\displaystyle{ f }[/math] is a zero-polynomial, denoted as [math]\displaystyle{ f\equiv 0 }[/math], if all coefficients [math]\displaystyle{ a_i=0 }[/math].

Alternatively, we can consider the following equivalent problem by comparing the polynomial [math]\displaystyle{ f-g }[/math] (whose degree is at most [math]\displaystyle{ d }[/math]) with the zero-polynomial:

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math] ([math]\displaystyle{ f }[/math] is the 0 polynomial).

The problem is trivial if the input polynomial [math]\displaystyle{ f }[/math] is given explicitly: one can trivially solve the problem by checking whether all [math]\displaystyle{ d+1 }[/math] coefficients are [math]\displaystyle{ 0 }[/math]. To make the problem nontrivial, we assume that the input polynomial is given implicitly as a black box (also called an oracle): the only way the algorithm can access [math]\displaystyle{ f }[/math] is to evaluate [math]\displaystyle{ f(x) }[/math] at some [math]\displaystyle{ x }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math], where [math]\displaystyle{ x }[/math] is chosen by the algorithm.

A straightforward deterministic algorithm is to evaluate [math]\displaystyle{ f(x_1),f(x_2),\ldots,f(x_{d+1}) }[/math] over [math]\displaystyle{ d+1 }[/math] distinct elements [math]\displaystyle{ x_1,x_2,\ldots,x_{d+1} }[/math] from the field [math]\displaystyle{ \mathbb{F} }[/math] and check whether they are all zero. By the fundamental theorem of algebra stated below (equivalently, by polynomial interpolation), this is guaranteed to correctly determine whether a degree-[math]\displaystyle{ d }[/math] univariate polynomial [math]\displaystyle{ f\equiv 0 }[/math].
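As a sketch (treating the black box as a Python callable and leaving the choice of the [math]\displaystyle{ d+1 }[/math] distinct field elements to the caller; all names are illustrative):

  def deterministic_zero_test(f, points):
      # points: d+1 distinct field elements for a degree-d black-box polynomial f;
      # a nonzero polynomial of degree at most d has at most d roots, so f vanishes
      # on all d+1 points if and only if f is the zero polynomial
      return all(f(x) == 0 for x in points)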

Fundamental Theorem of Algebra
Any non-zero univariate polynomial of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots.

The reason for this fundamental theorem holding generally over any field [math]\displaystyle{ \mathbb{F} }[/math] is that any univariate polynomial of degree [math]\displaystyle{ d }[/math] factors uniquely into at most [math]\displaystyle{ d }[/math] irreducible polynomials, each of which has at most one root.
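For example, over the reals [math]\displaystyle{ \mathbb{R} }[/math], the degree-[math]\displaystyle{ 3 }[/math] polynomial [math]\displaystyle{ x^3-x^2+x-1=(x-1)(x^2+1) }[/math] factors into two irreducible factors, of which only [math]\displaystyle{ x-1 }[/math] contributes a root, so the polynomial has a single root [math]\displaystyle{ x=1 }[/math], consistent with the bound of at most [math]\displaystyle{ 3 }[/math] roots.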

The following simple randomized algorithm is natural:

Algorithm for PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r\in S }[/math] uniformly at random;
  • if [math]\displaystyle{ f(r) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly at random from a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math]. It is easy to see the following:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ r }[/math] is a root of [math]\displaystyle{ f }[/math]. By the fundamental theorem of algebra, [math]\displaystyle{ f }[/math] has at most [math]\displaystyle{ d }[/math] roots, so the probability that the algorithm is wrong is bounded as
[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]

By fixing [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] to be an arbitrary subset of size [math]\displaystyle{ |S|=2d }[/math], this probability of a false positive is at most [math]\displaystyle{ 1/2 }[/math]. We can reduce it to an arbitrarily small constant [math]\displaystyle{ \delta }[/math] by repeating the above test independently [math]\displaystyle{ \log_2 \frac{1}{\delta} }[/math] times, since the error probability decays geometrically as we repeat the algorithm independently.
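The following is a minimal sketch of the whole procedure, under the assumption that we work over a prime field [math]\displaystyle{ \mathbb{Z}_p }[/math] with [math]\displaystyle{ p\gt 2d }[/math] and take [math]\displaystyle{ S=\{0,1,\ldots,2d-1\} }[/math]; the black box is modeled as a Python callable, and all names are ours.

  import math, random

  def pit_univariate(f, d, delta):
      # one-sided error: "False" (f is not identically zero) is always correct;
      # "True" (f is identically zero) is wrong with probability at most delta
      S = range(2 * d)                           # |S| = 2d, so each round errs w.p. at most d/|S| = 1/2
      rounds = math.ceil(math.log2(1 / delta))   # independent repetitions drive the error down to delta
      for _ in range(rounds):
          r = random.choice(S)
          if f(r) != 0:
              return False
      return True

  # example over Z_101: (x + 1)^2 - x^2 - 2x - 1 is identically zero
  p = 101
  f = lambda x: ((x + 1) ** 2 - x ** 2 - 2 * x - 1) % p
  print(pit_univariate(f, d=2, delta=0.01))      # True, after ceil(log2(100)) = 7 rounds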

Schwartz-Zippel Theorem

Now let's see the true form of Polynomial Identity Testing (PIT), for multivariate polynomials:

  • Input: two [math]\displaystyle{ n }[/math]-variate polynomials [math]\displaystyle{ f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv g }[/math].

The [math]\displaystyle{ \mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] is the ring of multivariate polynomials over field [math]\displaystyle{ \mathbb{F} }[/math]. An [math]\displaystyle{ n }[/math]-variate polynomial of degree [math]\displaystyle{ d }[/math], written as a sum of monomials, is:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math].

The degree or total degree of a monomial [math]\displaystyle{ a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n} }[/math] is given by [math]\displaystyle{ i_1+i_2+\cdots+i_n }[/math], and the degree of a polynomial [math]\displaystyle{ f }[/math] is the maximum degree over all monomials with nonzero coefficients.

As before, we also consider the following equivalent problem:

  • Input: a polynomial [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] of degree [math]\displaystyle{ d }[/math].
  • Determine whether [math]\displaystyle{ f\equiv 0 }[/math].

If [math]\displaystyle{ f }[/math] is written explicitly as a sum of monomials, then the problem can be solved by checking whether all coefficients are [math]\displaystyle{ 0 }[/math], and there are at most [math]\displaystyle{ {n+d\choose d}\le (n+d)^{d} }[/math] coefficients in an [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math].

A multivariate polynomial [math]\displaystyle{ f }[/math] can also be presented in its product form, for example:

Example

The Vandermonde matrix [math]\displaystyle{ M=M(x_1,x_2,\ldots,x_n) }[/math] is defined by [math]\displaystyle{ M_{ij}=x_i^{j-1} }[/math], that is

[math]\displaystyle{ M=\begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\ 1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\ \vdots & \vdots & \vdots & \ddots &\vdots \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{bmatrix} }[/math].

Let [math]\displaystyle{ f }[/math] be the polynomial defined as

[math]\displaystyle{ f(x_1,\ldots,x_n)=\det(M)=\prod_{j\lt i}(x_i-x_j). }[/math]

For polynomials in product form, it is quite efficient to evaluate the polynomial at any specific point from the field over which the polynomial is defined; however, expanding the polynomial into a sum of monomials can be very expensive.
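For instance, under the (illustrative) choice of working over a prime field [math]\displaystyle{ \mathbb{Z}_p }[/math], evaluating [math]\displaystyle{ \det(M) }[/math] at a point via the product form costs [math]\displaystyle{ O(n^2) }[/math] field operations, whereas the expanded sum of monomials has [math]\displaystyle{ n! }[/math] terms; the Python sketch below evaluates both forms at a random point and checks that they agree.

  import random
  from itertools import permutations

  p = 10007                                   # a prime of our choosing; all arithmetic is in Z_p
  n = 5
  r = [random.randrange(p) for _ in range(n)]

  # product form of det(M): O(n^2) field operations
  prod_form = 1
  for i in range(n):
      for j in range(i):
          prod_form = prod_form * (r[i] - r[j]) % p

  # expanded (Leibniz) form of det(M): n! monomials, feasible only for tiny n
  def parity(sigma):
      inv = sum(1 for a in range(len(sigma)) for b in range(a + 1, len(sigma)) if sigma[a] > sigma[b])
      return -1 if inv % 2 == 1 else 1

  det = 0
  for sigma in permutations(range(n)):
      term = parity(sigma)
      for i in range(n):
          term = term * pow(r[i], sigma[i], p)  # M[i][j] = x_i^j with 0-indexed columns
      det = (det + term) % p

  print(det == prod_form)                     # True: both are evaluations of the same polynomial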

The following is a simple randomized algorithm for testing identity of multivariate polynomials:

Randomized algorithm for multivariate PIT
  • suppose we have a finite subset [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] (to be specified later);
  • pick [math]\displaystyle{ r_1,r_2,\ldots,r_n\in S }[/math] uniformly and independently at random;
  • if [math]\displaystyle{ f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0 }[/math] then return “yes” else return “no”;

This algorithm evaluates [math]\displaystyle{ f }[/math] at one point chosen uniformly from an [math]\displaystyle{ n }[/math]-dimensional cube [math]\displaystyle{ S^n }[/math], where [math]\displaystyle{ S\subseteq\mathbb{F} }[/math] is a finite subset. And:

  • If [math]\displaystyle{ f\equiv 0 }[/math], the algorithm always returns "yes", so it is always correct.
  • If [math]\displaystyle{ f\not\equiv 0 }[/math], the algorithm may wrongly return "yes" (a false positive). But this happens only when the random [math]\displaystyle{ \vec{r}=(r_1,r_2,\ldots,r_n) }[/math] is a root of [math]\displaystyle{ f }[/math]. The probability of this bad event is upper bounded by the following famous result due to Schwartz (1980) and Zippel (1979).
Schwartz-Zippel Theorem
Let [math]\displaystyle{ f\in\mathbb{F}[x_1,x_2,\ldots,x_n] }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] over a field [math]\displaystyle{ \mathbb{F} }[/math]. If [math]\displaystyle{ f\not\equiv 0 }[/math], then for any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and [math]\displaystyle{ r_1,r_2\ldots,r_n\in S }[/math] chosen uniformly and independently at random,
[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}. }[/math]

The Schwartz-Zippel Theorem states that for any nonzero [math]\displaystyle{ n }[/math]-variate polynomial of degree at most [math]\displaystyle{ d }[/math], the number of roots in any cube [math]\displaystyle{ S^n }[/math] is at most [math]\displaystyle{ d\cdot |S|^{n-1} }[/math].
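As a sketch of the resulting tester (our own illustration; the field, the finite set [math]\displaystyle{ S }[/math], and the number of repetitions are parameters supplied by the caller):

  import random

  def pit_multivariate(f, n, S, trials):
      # f: black box for an n-variate polynomial of total degree at most d over a field;
      # if |S| >= 2d, each trial reports a false "zero" with probability at most 1/2,
      # so the overall false-positive probability is at most 2^(-trials)
      for _ in range(trials):
          r = [random.choice(S) for _ in range(n)]
          if f(*r) != 0:
              return False       # a witness that f is not identically zero
      return True                # probably identically zero

  # example over Z_10007: is (x + y)^2 - x^2 - 2xy - y^2 identically zero?  (yes)
  p = 10007
  f = lambda x, y: ((x + y) ** 2 - x * x - 2 * x * y - y * y) % p
  print(pit_multivariate(f, n=2, S=range(p), trials=20))   # True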

Dana Moshkovitz gave a surprisingly simple and elegant proof of the Schwartz-Zippel Theorem, using some advanced ideas. Here we introduce the standard proof by induction.

Proof.

The theorem is proved by induction on [math]\displaystyle{ n }[/math].

Induction basis:

For [math]\displaystyle{ n=1 }[/math], this is the univariate case. Assume that [math]\displaystyle{ f\not\equiv 0 }[/math]. Due to the fundamental theorem of algebra, any nonzero polynomial [math]\displaystyle{ f(x) }[/math] of degree at most [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots, thus

[math]\displaystyle{ \Pr[f(r)=0]\le\frac{d}{|S|}. }[/math]
Induction hypothesis:

Assume the theorem holds for any [math]\displaystyle{ m }[/math]-variate polynomials for all [math]\displaystyle{ m\lt n }[/math].

Induction step:

For any [math]\displaystyle{ n }[/math]-variate polynomial [math]\displaystyle{ f(x_1,x_2,\ldots,x_n) }[/math] of degree at most [math]\displaystyle{ d }[/math], we write [math]\displaystyle{ f }[/math] as

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1}) }[/math],

where [math]\displaystyle{ k }[/math] is the highest degree of [math]\displaystyle{ x_n }[/math], which means the degree of [math]\displaystyle{ f_k }[/math] is at most [math]\displaystyle{ d-k }[/math] and [math]\displaystyle{ f_k\not\equiv 0 }[/math].

In particular, we write [math]\displaystyle{ f }[/math] as a sum of two parts:

[math]\displaystyle{ f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n) }[/math],

where both [math]\displaystyle{ f_k }[/math] and [math]\displaystyle{ \bar{f} }[/math] are polynomials, such that

  • [math]\displaystyle{ f_k\not\equiv 0 }[/math] is as above, whose degree is at most [math]\displaystyle{ d-k }[/math];
  • [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1}) }[/math], thus [math]\displaystyle{ \bar{f}(x_1,x_2,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^{k} }[/math] factor in any term.
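For example, writing [math]\displaystyle{ f(x_1,x_2)=x_1x_2^2+x_1^2x_2+x_1 }[/math] this way with respect to [math]\displaystyle{ x_2 }[/math] gives [math]\displaystyle{ k=2 }[/math], [math]\displaystyle{ f_k(x_1)=x_1 }[/math] and [math]\displaystyle{ \bar{f}(x_1,x_2)=x_1^2x_2+x_1 }[/math].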

By the law of total probability, we have

[math]\displaystyle{ \begin{align} &\Pr[f(r_1,r_2,\ldots,r_n)=0]\\ = &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\ &+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0]. \end{align} }[/math]

Note that [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1}) }[/math] is a polynomial in [math]\displaystyle{ n-1 }[/math] variables of degree at most [math]\displaystyle{ d-k }[/math] such that [math]\displaystyle{ f_k\not\equiv 0 }[/math]. By the induction hypothesis, we have

[math]\displaystyle{ \begin{align} (*) &\qquad &\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}. \end{align} }[/math]

Now we look at the case conditioning on [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math]. Recall that [math]\displaystyle{ \bar{f}(x_1,\ldots,x_n) }[/math] has no [math]\displaystyle{ x_n^k }[/math] factor in any term, thus the condition [math]\displaystyle{ f_k(r_1,r_2,\ldots,r_{n-1})\neq0 }[/math] guarantees that

[math]\displaystyle{ f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,r_2,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n) }[/math]

is a nonzero univariate polynomial in [math]\displaystyle{ x_n }[/math] of degree [math]\displaystyle{ k }[/math], i.e. [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}\not\equiv 0 }[/math]. Since [math]\displaystyle{ r_n }[/math] is chosen independently of [math]\displaystyle{ r_1,\ldots,r_{n-1} }[/math], we already know from the univariate case that the probability that [math]\displaystyle{ g_{r_1,\ldots,r_{n-1}}(r_n)=0 }[/math] is at most [math]\displaystyle{ \frac{k}{|S|} }[/math]. Therefore,

[math]\displaystyle{ \begin{align} (**) &\qquad &\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|} \end{align} }[/math].

Substituting both [math]\displaystyle{ (*) }[/math] and [math]\displaystyle{ (**) }[/math] back in the total probability, we have

[math]\displaystyle{ \Pr[f(r_1,r_2,\ldots,r_n)=0] \le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}, }[/math]

which proves the theorem.

[math]\displaystyle{ \square }[/math]