随机算法 (Fall 2011)/Identity checking

From TCS Wiki
Revision as of 07:45, 13 November 2011 by imported>Etone (Created page with "Many applications in Computer Science require to efficiently check whether two complex objects are identical, while the objects are presented implicitly (e.g., as black-boxes). W…")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Many applications in Computer Science require to efficiently check whether two complex objects are identical, while the objects are presented implicitly (e.g., as black-boxes). We consider two examples. One is to check the result of multiplying two matrices, and the other is to check the identity of two polynomials.

Example: Checking matrix multiplication

Consider the following problem:

  • Given as the input three [math]\displaystyle{ n\times n }[/math] matrices [math]\displaystyle{ A,B }[/math] and [math]\displaystyle{ C }[/math],
  • check whether [math]\displaystyle{ C=AB }[/math].

We could compute [math]\displaystyle{ AB }[/math] and compare the result to [math]\displaystyle{ C }[/math]. The time complexity of fastest matrix multiplication algorithm (in theory) is [math]\displaystyle{ O(n^{2.376}) }[/math], and so is the time complexity of this method.

Here’s a very simple randomized algorithm, due to Freivalds, that runs in only [math]\displaystyle{ O(n^2) }[/math] time:

Algorithm (Freivalds)
  • pick a vector [math]\displaystyle{ r \in\{0, 1\}^n }[/math] uniformly at random;
  • if [math]\displaystyle{ A(Br) = Cr }[/math] then return "yes" else return "no";

The running time of the algorithm is [math]\displaystyle{ O(n^2) }[/math] because it does only 3 matrix-vector multiplications.

If [math]\displaystyle{ AB=C }[/math] then [math]\displaystyle{ A(Br) = Cr }[/math] for any [math]\displaystyle{ r \in\{0, 1\}^n }[/math], thus the algorithm always returns "yes". But if [math]\displaystyle{ AB \neq C }[/math] then the algorithm will make a mistake if it happens to choose an [math]\displaystyle{ r }[/math] for which [math]\displaystyle{ ABr = Cr }[/math]. This, however, is unlikely:

Lemma
If [math]\displaystyle{ AB\neq C }[/math] then for a uniformly random [math]\displaystyle{ r \in\{0, 1\}^n }[/math],
[math]\displaystyle{ \Pr[ABr = Cr]\le \frac{1}{2} }[/math].
Proof.
Let [math]\displaystyle{ D=AB-C }[/math]. We will show that if [math]\displaystyle{ D\neq \boldsymbol{0} }[/math], then [math]\displaystyle{ \Pr[Dr = \boldsymbol{0}]\le \frac{1}{2} }[/math] for a uniform random [math]\displaystyle{ r \in\{0, 1\}^n }[/math].

Since [math]\displaystyle{ D\neq \boldsymbol{0} }[/math], it must have at least one non-zero entry, say [math]\displaystyle{ D(i,j)\neq 0 }[/math]. The [math]\displaystyle{ i }[/math]-th entry of [math]\displaystyle{ Dr }[/math] is

[math]\displaystyle{ (Dr)_i=\sum_{k=1}^nD(i,k)r_k }[/math].

If to the contrary, [math]\displaystyle{ Dr=\boldsymbol{0} }[/math], then [math]\displaystyle{ (Dr)_i=\sum_{k=1}^nD(i,k)r_k=0 }[/math], which is equivalent to that

[math]\displaystyle{ r_j=-\frac{1}{D(i,j)}\sum_{k\neq j}^nD(i,k)r_k }[/math],

i.e. once [math]\displaystyle{ r_k }[/math] for [math]\displaystyle{ k\neq j }[/math] have been chosen, there is only one value of [math]\displaystyle{ r_j }[/math] that would give us a zero [math]\displaystyle{ Dr }[/math]. However, there are two possible values [math]\displaystyle{ \{0,1\} }[/math] for [math]\displaystyle{ r_j }[/math] which are equal-probable, so with at least [math]\displaystyle{ \frac{1}{2} }[/math] probability, the choice of [math]\displaystyle{ r }[/math] fails to give us a zero [math]\displaystyle{ Dr }[/math].

[math]\displaystyle{ \square }[/math]

Example: Checking polynomial identities

Consider the following problem:

  • Given as the input two multivariate polynomials [math]\displaystyle{ P_1(x_1,\ldots,x_n) }[/math] and [math]\displaystyle{ P_2(x_1,\ldots,x_n) }[/math],
  • check whether the two polynomials are identical, denoted [math]\displaystyle{ P_1\equiv P_2 }[/math].

Obviously, if [math]\displaystyle{ P_1, P_2 }[/math] are written out explicitly, the question is trivially answered in linear time just by comparing their coefficients. But in practice they are usually given in very compact form (e.g., as determinants of matrices), so that we can evaluate them efficiently, but expanding them out and looking at their coefficients is out of the question.

Example

Consider the polynomial

[math]\displaystyle{ P(x_1,\ldots,x_n)=\prod_{\overset{i\lt j}{i,j\neq 1}}(x_i-x_j)-\prod_{\overset{i\lt j}{i,j\neq 2}}(x_i-x_j)+\prod_{\overset{i\lt j}{i,j\neq 3}}(x_i-x_j)-\cdots+(-1)^{n-1}\prod_{\overset{i\lt j}{i,j\neq n}}(x_i-x_j) }[/math]

Show that evaluating [math]\displaystyle{ P }[/math] at any given point can be done efficiently, but that expanding out [math]\displaystyle{ P }[/math] to find all its coefficients is computationally infeasible even for moderate values of [math]\displaystyle{ n }[/math].

Here is a very simple randomized algorithm, due to Schwartz and Zippel. Testing [math]\displaystyle{ P_1\equiv P_2 }[/math] is equivalent to testing [math]\displaystyle{ P\equiv 0 }[/math], where [math]\displaystyle{ P = P_1 - P_2 }[/math].

Algorithm (Schwartz-Zippel)
  • pick [math]\displaystyle{ r_1, \ldots , r_n }[/math] independently and uniformly at random from a set [math]\displaystyle{ S }[/math];
  • if [math]\displaystyle{ P_1(r_1, \ldots , r_n) = P_2(r_1, \ldots , r_n) }[/math] then return “yes” else return “no”;

This algorithm requires only the evaluation of [math]\displaystyle{ P }[/math] at a single point. And if [math]\displaystyle{ P\equiv 0 }[/math] it is always correct.

In the Theorem below, we’ll see that if [math]\displaystyle{ P\neq 0 }[/math] then the algorithm is incorrect with probability at most [math]\displaystyle{ \frac{d}{|S|} }[/math], where [math]\displaystyle{ d }[/math] is the maximum degree of the polynomial [math]\displaystyle{ P }[/math].

Theorem (Schwartz-Zippel)
Let [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] defined over a field [math]\displaystyle{ \mathbb{F} }[/math]. Fix any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and let [math]\displaystyle{ r_1,\ldots,r_n }[/math] be chosen independently and uniformly at random from [math]\displaystyle{ S }[/math]. Then
[math]\displaystyle{ \Pr[Q(r_1,\ldots,r_n)=0\mid Q\not\equiv 0]\le\frac{d}{|S|}. }[/math]
Proof.
The theorem holds if [math]\displaystyle{ Q }[/math] is a single-variate polynomial, because a single-variate polynomial [math]\displaystyle{ Q }[/math] of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots, i.e. there are at most [math]\displaystyle{ d }[/math] many choices of [math]\displaystyle{ r }[/math] having [math]\displaystyle{ Q(r)=0 }[/math], so the theorem follows immediately.

For multi-variate [math]\displaystyle{ Q }[/math], we prove by induction on the number of variables [math]\displaystyle{ n }[/math].

Write [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math] as

[math]\displaystyle{ Q(x_1,\ldots,x_n) = \sum_{i=0}^kx_n^kQ_i(x_1,\ldots,x_{n-1}) }[/math]

where [math]\displaystyle{ k }[/math] is the largest exponent of [math]\displaystyle{ x_n }[/math] in [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math]. So [math]\displaystyle{ Q_k(x_1,\ldots,x_{n-1}) \not\equiv 0 }[/math] by our definition of [math]\displaystyle{ k }[/math], and its degree is at most [math]\displaystyle{ d-k }[/math].

Thus by the induction hypothesis we have that [math]\displaystyle{ \Pr[Q_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|} }[/math].

Conditioning on the event [math]\displaystyle{ Q_k(r_1,\ldots,r_{n-1})\neq 0 }[/math], the single-variate polynomial [math]\displaystyle{ Q'(x_n)=Q(r_1,\ldots,r_{n-1}, x_n)=\sum_{i=0}^kx_n^kQ_i(r_1,\ldots,r_{n-1}) }[/math] has degree [math]\displaystyle{ k }[/math] and [math]\displaystyle{ Q'(x_n)\not\equiv 0 }[/math], thus

[math]\displaystyle{ \begin{align} &\quad\,\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\ &= \Pr[Q'(r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\ &\le \frac{k}{|S|} \end{align} }[/math].

Therefore, due to the law of total probability,

[math]\displaystyle{ \begin{align} &\quad\,\Pr[Q(r_1,\ldots,r_{n})=0]\\ &= \Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\Pr[Q_k(r_1,\ldots,r_{n-1})\neq 0]\\ &\quad\,\,+\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})= 0]\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\ &\le \Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]+\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\ &\le \frac{k}{|S|}+\frac{d-k}{|S|}\\ &=\frac{d}{|S|}. \end{align} }[/math]


[math]\displaystyle{ \square }[/math]

The idea of fingerprinting

Suppose we want to compare two items [math]\displaystyle{ Z_1 }[/math] and [math]\displaystyle{ Z_2 }[/math]. Instead of comparing them directly, we compute random fingerprints [math]\displaystyle{ \mathrm{FING}(Z_1) }[/math] and [math]\displaystyle{ \mathrm{FING}(Z_2) }[/math] and compare these. The fingerprints has the following properties:

  • [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] is a function, which means that if [math]\displaystyle{ Z_1= Z_2 }[/math] then [math]\displaystyle{ \mathrm{FING}(Z_1)=\mathrm{FING}(Z_2) }[/math].
  • If [math]\displaystyle{ Z_1\neq Z_2 }[/math] then [math]\displaystyle{ \Pr[\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)] }[/math] is small.
  • It is much more to compute and compare the fingerprints than to compare [math]\displaystyle{ Z_1 }[/math] and [math]\displaystyle{ Z_2 }[/math] directly.

For Freivald's algorithm, the items to compare are two [math]\displaystyle{ n\times n }[/math] matrices [math]\displaystyle{ AB }[/math] and [math]\displaystyle{ C }[/math], and given an [math]\displaystyle{ n\times n }[/math] matrix [math]\displaystyle{ M }[/math], its random fingerprint is computed as [math]\displaystyle{ \mathrm{FING}(M)=Mr }[/math] for a uniformly random [math]\displaystyle{ r\in\{0,1\}^n }[/math].

For the Schwartz-Zippel algorithm, the items to compare are two polynomials [math]\displaystyle{ P_1(x_1,\ldots,x_n) }[/math] and [math]\displaystyle{ P_2(x_1,\ldots,x_n) }[/math], and given a polynomial [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math], its random fingerprint is computed as [math]\displaystyle{ \mathrm{FING}(Q)=Q(r_1,\ldots,r_n) }[/math] for [math]\displaystyle{ r_i }[/math] chosen independently and uniformly at random from some fixed set [math]\displaystyle{ S }[/math].

For different problems, we may have different definitions of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math].