Randomized Algorithms (Spring 2010)/Fingerprinting: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>WikiSysop
imported>WikiSysop
No edit summary
 
(36 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== Fingerprinting ==
== Checking identities ==
Many applications in Computer Science require to efficiently check whether two complex objects are identical, while the objects are presented implicitly (e.g., as black-boxes). We consider two examples. One is to check the result of multiplying two matrices, and the other is to check the identity of two polynomials.


=== Evaluating at random points ===
=== Example: Checking matrix multiplication ===
Consider the following problem:
* Given as the input three <math>n\times n</math> matrices <math>A,B</math> and <math>C</math>,
* check whether <math>C=AB</math>.


==== Example: Checking matrix multiplication ====
We could compute <math>AB</math> and compare the result to <math>C</math>. The time complexity of fastest matrix multiplication algorithm (in theory) is <math>O(n^{2.376})</math>, and so is the time complexity of this method.
{|border="1"
|
*'''Input''': Three <math>n\times n</math> matrices <math>A,B</math> and <math>C</math>.


*'''Question''': Is <math>C=AB</math>?
Here’s a very simple randomized algorithm, due to Freivalds, that runs in only <math>O(n^2)</math> time:
|}


==== Example: Checking polynomial identities ====
{{Theorem|Algorithm (Freivalds)|
*pick a vector <math>r \in\{0, 1\}^n</math> uniformly at random;
*if <math>A(Br) = Cr</math> then return "yes" else return "no";
}}
The running time of the algorithm is <math>O(n^2)</math> because it does only 3 matrix-vector multiplications.


===Evaluating over a random field ===  
If <math>AB=C</math> then <math>A(Br) = Cr</math> for any <math>r \in\{0, 1\}^n</math>, thus the algorithm always returns "yes". But if <math>AB \neq C</math> then the algorithm will make a mistake if it happens to choose an <math>r</math> for which <math>ABr = Cr</math>. This, however, is unlikely:


==== Example: Identity checking ====
{{Theorem|Lemma|
:If <math>AB\neq C</math> then for a uniformly random <math>r \in\{0, 1\}^n</math>,
::<math>\Pr[ABr = Cr]\le \frac{1}{2}</math>.
}}
{{Proof| Let <math>D=AB-C</math>. We will show that if <math>D\neq \boldsymbol{0}</math>, then <math>\Pr[Dr = \boldsymbol{0}]\le \frac{1}{2}</math> for a uniform random <math>r \in\{0, 1\}^n</math>.


==== Example: Randomized pattern matching ====
Since <math>D\neq \boldsymbol{0}</math>, it must have at least one non-zero entry, say <math>D(i,j)\neq 0</math>. The <math>i</math>-th entry of <math>Dr</math> is
:<math>(Dr)_i=\sum_{k=1}^nD(i,k)r_k</math>.
If to the contrary, <math>Dr=\boldsymbol{0}</math>, then <math>(Dr)_i=\sum_{k=1}^nD(i,k)r_k=0</math>, which is equivalent to that
:<math>r_j=-\frac{1}{D(i,j)}\sum_{k\neq j}^nD(i,k)r_k</math>,
i.e. once <math>r_k</math> for <math>k\neq j</math> have been chosen, there is only '''one''' value of <math>r_j</math> that would give us a zero <math>Dr</math>. However, there are '''two''' possible values <math>\{0,1\}</math> for <math>r_j</math> which are equal-probable, so with at least <math>\frac{1}{2}</math> probability, the choice of <math>r</math> fails to give us a zero <math>Dr</math>.
}}


=== Universal hashing ===
=== Example: Checking polynomial identities ===
Consider the following problem:
* Given as the input two multivariate polynomials <math>P_1(x_1,\ldots,x_n)</math> and <math>P_2(x_1,\ldots,x_n)</math>,
* check whether the two polynomials are identical, denoted <math>P_1\equiv P_2</math>.


;Example<nowiki>:</nowiki> checking distinctness
Obviously, if <math>P_1, P_2</math> are written out explicitly, the question is trivially answered in linear time just by comparing their coefficients. But in practice they are usually given in very compact form (e.g., as determinants of matrices), so that we can evaluate them efficiently, but expanding them out and looking at their coefficients is out of the question.


== Probabilistic Checkable Proofs (PCPs) ==
{{Theorem|Example|
Consider the polynomial
:<math>
P(x_1,\ldots,x_n)=\prod_{\overset{i<j}{i,j\neq 1}}(x_i-x_j)-\prod_{\overset{i<j}{i,j\neq 2}}(x_i-x_j)+\prod_{\overset{i<j}{i,j\neq 3}}(x_i-x_j)-\cdots+(-1)^{n-1}\prod_{\overset{i<j}{i,j\neq n}}(x_i-x_j)
</math>
Show that evaluating <math>P</math> at any given point can be done efficiently, but that expanding out <math>P</math>
to find all its coefficients is computationally infeasible even for moderate values of <math>n</math>.
}}
 
Here is a very simple randomized algorithm, due to Schwartz and Zippel. Testing <math>P_1\equiv P_2</math>
is equivalent to testing <math>P\equiv 0</math>, where <math>P = P_1 - P_2</math>.
 
{{Theorem|Algorithm (Schwartz-Zippel)|
*pick <math>r_1, \ldots , r_n</math> independently and uniformly at random from a set <math>S</math>;
*if <math>P_1(r_1, \ldots , r_n) = P_2(r_1, \ldots , r_n)</math> then return “yes” else return “no”;
}}
 
This algorithm requires only the evaluation of <math>P</math> at a single point. And if <math>P\equiv 0</math> it is always correct.
 
In the Theorem below, we’ll see that if <math>P\neq 0</math> then the algorithm is incorrect with probability
at most <math>\frac{d}{|S|}</math>, where <math>d</math> is the maximum degree of the polynomial <math>P</math>.
 
{{Theorem|Theorem (Schwartz-Zippel)|
: Let <math>Q(x_1,\ldots,x_n)</math> be a multivariate polynomial of degree <math>d</math> defined over a field <math>\mathbb{F}</math>. Fix any finite set <math>S\subset\mathbb{F}</math>, and let <math>r_1,\ldots,r_n</math> be chosen independently and uniformly at random from <math>S</math>. Then
::<math>\Pr[Q(r_1,\ldots,r_n)=0\mid Q\not\equiv 0]\le\frac{d}{|S|}.</math>
}}
{{Proof| The theorem holds if <math>Q</math> is a single-variate polynomial, because a single-variate polynomial <math>Q</math> of degree <math>d</math> has at most <math>d</math> roots, i.e. there are at most <math>d</math> many choices of <math>r</math> having <math>Q(r)=0</math>, so the theorem follows immediately.
 
For multi-variate <math>Q</math>, we prove by induction on the number of variables <math>n</math>.
 
Write <math>Q(x_1,\ldots,x_n)</math> as
:<math>
Q(x_1,\ldots,x_n) = \sum_{i=0}^kx_n^kQ_i(x_1,\ldots,x_{n-1})
</math>
where <math>k</math> is the largest exponent of <math>x_n</math> in <math>Q(x_1,\ldots,x_n)</math>. So <math>Q_k(x_1,\ldots,x_{n-1}) \not\equiv 0</math> by our definition of <math>k</math>, and its degree is at most <math>d-k</math>.
 
Thus by the induction hypothesis we have that <math>\Pr[Q_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}</math>.
 
Conditioning on the event <math>Q_k(r_1,\ldots,r_{n-1})\neq 0</math>, the single-variate polynomial <math>Q'(x_n)=Q(r_1,\ldots,r_{n-1}, x_n)=\sum_{i=0}^kx_n^kQ_i(r_1,\ldots,r_{n-1})</math> has degree <math>k</math> and <math>Q'(x_n)\not\equiv 0</math>, thus
:<math>
\begin{align}
&\quad\,\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\
&=
\Pr[Q'(r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\
&\le
\frac{k}{|S|}
\end{align}
</math>.
 
Therefore, due to the law of total probability,
:<math>
\begin{align}
&\quad\,\Pr[Q(r_1,\ldots,r_{n})=0]\\
&=
\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\Pr[Q_k(r_1,\ldots,r_{n-1})\neq 0]\\
&\quad\,\,+\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})= 0]\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\
&\le
\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]+\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\
&\le
\frac{k}{|S|}+\frac{d-k}{|S|}\\
&=\frac{d}{|S|}.
\end{align}
</math>
 
}}
 
=== The idea of fingerprinting ===
Suppose we want to compare two items <math>Z_1</math> and <math>Z_2</math>. Instead of comparing them directly, we compute random '''fingerprints''' <math>\mathrm{FING}(Z_1)</math> and <math>\mathrm{FING}(Z_2)</math> and compare these. The fingerprints has the following properties:
* <math>\mathrm{FING}(\cdot)</math> is a function, which means that if <math>Z_1= Z_2</math> then <math>\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)</math>.
* If <math>Z_1\neq Z_2</math> then <math>\Pr[\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)]</math> is small.
* It is much more to compute and compare the fingerprints than to compare <math>Z_1</math> and <math>Z_2</math> directly.
 
For Freivald's algorithm, the items to compare are two <math>n\times n</math> matrices <math>AB</math> and <math>C</math>, and given an <math>n\times n</math> matrix <math>M</math>, its random fingerprint is computed as <math>\mathrm{FING}(M)=Mr</math> for a uniformly random <math>r\in\{0,1\}^n</math>.
 
For the Schwartz-Zippel algorithm, the items to compare are two polynomials <math>P_1(x_1,\ldots,x_n)</math> and <math>P_2(x_1,\ldots,x_n)</math>, and given a polynomial <math>Q(x_1,\ldots,x_n)</math>, its random fingerprint is computed as <math>\mathrm{FING}(Q)=Q(r_1,\ldots,r_n)</math> for <math>r_i</math> chosen independently and uniformly at random from some fixed set <math>S</math>.
 
For different problems, we may have different definitions of <math>\mathrm{FING}(\cdot)</math>.
 
== Communication complexity ==
Alice and Bob are two entities. Alice has a private input <math>x</math> and Bob has a private input <math>y</math>. Together they want to compute a function <math>f(x,y)</math> by communicating with each other. This is the model of '''communication complexity''' introduced by Yao in 1979.
 
In the communication complexity model, the local computational costs are ignored. The complexity of algorithms (also called communication protocols here) are measured by the number of bits communicated between Alice and Bob.
 
A basic function is EQ, defined as
:<math>
\mathrm{EQ}(x,y)=
\begin{cases}
1& \mbox{if } x=y,\\
0& \mbox{otherwise.}
\end{cases}</math>
This function corresponds to the problem that two far apart entities Alice and Bob, each has a copy of a database (Alice's copy is <math>x</math>, and Bob's copy is <math>y</math>), and they want to compare whether their copies of the database are identical.
 
A trivial way to solve EQ is to let Bob send <math>y</math> to Alice. Supposed that <math>x,y\in\{0,1\}^n</math>, this costs <math>n</math> bits of communications.
 
It is known that for deterministic communication protocols, this is the best we can get for computing EQ.
 
{{Theorem|Theorem (Yao 1979)|
:Any deterministic communication protocol computing EQ on two <math>n</math>-bit strings costs <math>n</math> bits of communication in the worst-case.
}}
 
This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. How to prove such lower bounds is not today's topic.
 
If the randomness is allowed, we can use the idea of fingerprinting to solve this problem with significantly less communications. The general framework for the algorithm is as follows:
* Alice choose a random fingerprint function <math>\mathrm{FING}(\cdot)</math> and compute the fingerprint of her input <math>\mathrm{FING}(x)</math>;
* Alice sends both the description of <math>\mathrm{FING}(\cdot)</math> and the value of <math>\mathrm{FING}(x)</math> to Bob;
* Bob computes <math>\mathrm{FING}(y)</math> and check whether <math>\mathrm{FING}(x)=\mathrm{FING}(y)</math>.
 
So the question is, how to design this random fingerprint function <math>\mathrm{FING}(\cdot)</math> to guarantee:
# A random <math>\mathrm{FING}(\cdot)</math> can be described succinctly.
# The range of <math>\mathrm{FING}(\cdot)</math> is small, so the fingerprints are succinct.
# If <math>x\neq y</math>, the probability <math>\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)]</math> is small.
 
The fingerprint function we choose is as follows: by treating the input string <math>x\in\{0,1\}^n</math> as the binary representation of a number, let <math>\mathrm{FING}(x)=x\bmod p</math> for some random prime <math>p</math>. The prime <math>p</math> can uniquely specify a random fingerprint function <math>\mathrm{FING}(\cdot)</math>, thus can be used as a description of the function, and alos the range of the fingerprints is <math>[p]</math>, thus we want the prime <math>p</math> to be reasonably small, but still has a good chance to distinguish different <math>x</math> and <math>y</math> after modulo <math>p</math>.
 
{{Theorem|A randomized protocol for EQ|
'''Alice does''':
:for some parameter <math>k</math> (to be specified),
:* choose uniformly at random a prime <math>p\in[k]</math>;
:* send <math>p</math> and <math>x\bmod p</math> to Bob;
'''Upon receiving''' <math>p</math> and <math>x\bmod p</math>, '''Bob does''':
:* check whether <math>x\bmod p=y\bmod p</math>.
}}
The number of bits to be communicated is <math>O(\log k)</math>. We then bound  the probability of error <math>\Pr[x\bmod p=y\bmod p]</math> for <math>x\neq y</math>, in terms of <math>k</math>.
 
Suppose without loss of generality <math>x>y</math>. Let <math>z=x-y</math>. Then <math>z<2^n</math> since <math>x,y\in[2^n]</math>, and
<math>z\neq 0</math> for <math>x\neq y</math>.  It holds that <math>x\bmod p=y\bmod p</math> if and only if <math>z</math> is dividable by <math>p</math>. Note that <math>z<2^n</math> since <math>x,y\in[2^n]</math>. We only need to bound the probability
:<math>\Pr[z\bmod p=0]</math> for <math>0<z<2^n</math>, where <math>p</math> is a random prime chosen from <math>[k]</math>.
 
The probability <math>\Pr[z\bmod p=0]</math> is computed directly as
:<math>\Pr[z\bmod p=0]\le\frac{\mbox{the number of prime divisors of }z}{\mbox{the number of primes in }[k]}</math>.
 
For the numerator, we have the following lemma.
 
{{Theorem|Lemma|
:The number of distinct prime divisors of any natural number less than <math>2^n</math> is at most <math>n</math>.
}}
{{Proof| Each prime number is <math>\ge2</math>. If an <math>N>0</math> has more than <math>n</math> distinct prime divisors, then <math>N\ge 2^n</math>.
}}
 
Due to this lemma, <math>z</math> has at most <math>n</math> prime divisors.
 
We then lower bound the number of primes in <math>[k]</math>. This is given by the celebrated [http://en.wikipedia.org/wiki/Prime_number_theorem '''Prime Number Theorem (PNT)'''].
 
{{Theorem|Prime Number Theorem|
:Let <math>\pi(k)</math> denote the number of primes less than <math>k</math>. Then <math>\pi(k)\sim\frac{k}{\ln k}</math> as <math>k\rightarrow\infty</math>.
}}
 
Therefore, by choosing <math>k=tn\ln tn</math> for some <math>t</math>, we have that for a <math>0<z<2^n</math>, and a random prime <math>p\in[k]</math>,
:<math>\Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{t}</math>.
We can make this error probability polynomially small and the number of bits to be communicated is still <math>O(\log k)=O(\log n)</math>.
 
=== Application: Randomized pattern matching ===
Consider the following problem of pattern matching, which has nothing to do with communication complexity.
 
*Input: a string <math>x\in\{0,1\}^n</math> and a "pattern" <math>y\in\{0,1\}^m</math>.
*Determine whether the pattern <math>y</math> is a contiguous  substring of <math>x</math>. Usually, we are also asked to find the location of the substring.
 
A naive algorithm trying every possible match runs in <math>O(nm)</math> time. The more sophisticated KMP algorithm inspired by automaton theory runs in <math>O(n+m)</math> time.
 
A simple randomized algorithm, due to Karp and Rabin, uses the idea of fingerprinting and also runs in <math>O(n + m)</math> time.
 
Let <math>X(j)=x_jx_{j+1}\cdots x_{j+m-1}</math> denote the substring of <math>x</math> of length <math>m</math> starting at position <math>j</math>.
 
{{Theorem|Algorithm (Karp-Rabin)|
:pick a random prime <math>p\in[k]</math>;
:'''for''' <math>j = 1</math> to <math>n -m + 1</math> '''do'''
::'''if''' <math>X(j)\bmod p = y \bmod p</math> then report a match;
:'''return''' "no match";
}}
 
So the algorithm just compares the <math>\mathrm{FING}(X(j))</math> and <math>\mathrm{FING}(y)</math> for every <math>j</math>, with the same definition of fingerprint function <math>\mathrm{FING}(\cdot)</math> as in the communication protocol for EQ.
 
By the same analysis, by choosing <math>k=n^2m\ln (n^2m)</math>, the probability of a single false match is
:<math>\Pr[X(j)\bmod p=y\bmod p\mid X(j)\neq y ]=O\left(\frac{1}{n^2}\right)</math>.
By the union bound, the probability that a false match occurs is <math>O\left(\frac{1}{n}\right)</math>.
 
The algorithm runs in linear time if we assume that we can compute <math>X(j)\bmod p </math> for each <math>j</math> in constant time. This outrageous assumption can be made realistic by the following observation.
 
{{Theorem|Lemma|
:Let <math>\mathrm{FING}(a)=a\bmod p</math>.
::<math>\mathrm{FING}(X(j+1))\equiv2(\mathrm{FING}(X(j))-2^{m-1}x_j)+x_{j+m}\pmod p\,</math>.
}}
{{Proof| It holds that
:<math>X(j+1)=2(X(j)-2^{m-1}x_j)+x_{j+m}\,</math>.
So the equation holds on the finite field modulo <math>p</math>.
}}
 
Due to this lemma, each fingerprint <math>\mathrm{FING}(X(j))</math> can be computed in an incremental way, each in constant time. The running time of the algorithm is <math>O(n+m)</math>.
 
== Checking distinctness ==
Consider the following problem:
*Given a sequence <math>x_1,x_2,\ldots,x_n\in\{1,2,\ldots,n\}</math>, check whether every member of <math>\{1,2,\ldots,n\}</math> appears '''exactly''' once.
 
This problem is called detecting duplicate or checking distinctness. It can be solved in linear time and linear space by a straightforward algorithm.
 
For many real applications, the <math>n</math> is enormously large, and we would like to have an algorithm using very limited extra space.
 
=== Fingerprinting multisets ===
A randomized algorithm due to Lipton, checks distinctness by solving a more general problem '''fingerprinting multisets'''.
 
Given a multiset (each member may appear more than once) <math>M=\{x_1,x_2,\ldots,x_n\}</math>, its fingerprint is defined as
:<math>
\mathrm{FING}(M)=\prod_{i=1}^n(r-x_i)\bmod p,
</math>
where <math>p</math> is a random prime chosen from the interval <math>[(n\log n)^2,2(n\log n)^2]</math>, and <math>r</math> is chosen uniformly at random from <math>[p]</math>.
 
We first see that the space reuqired to compute <math>\mathrm{FING}(M)</math> is only <math>O(\log n)</math> bits. We do not need to compute the value of the product <math>\prod_{i=1}^n(r-x_i)</math>. Instead, all the computation can be done in the finite field <math>\mathbb{Z}_p</math> (you need some knowledge in algebra to understand this). So the space requirement is only <math>O(\log p)=O(\log n)</math>.
 
It is easy to see that the above fingerprint function is invariant under permutations, thus one multiset has only one fingerprint. The next theorem due to Lipton, states that the probability that two distinct multisets have the same fingerprint is small.
 
{{Theorem|Theorem (Lipton 1989)|
:Let <math>M_1=\{x_1,x_2,\ldots,x_n\}</math> and <math>M_2=\{y_1,y_2,\ldots,y_n\}</math> be two multisets whose members are from <math>\{1,2,\ldots,n\}</math>. If <math>M_1\neq M_2</math>, then
::<math>\Pr[\mathrm{FING}(M_1)= \mathrm{FING}(M_2)]=O\left(\frac{1}{n}\right)</math>.
}}
{{Proof| Let <math>P_1(u)=\prod_{i=1}^n(u-x_i)</math> and <math>P_2(u)=\prod_{i=1}^n(u-y_i)</math>, and let <math>Q(u)=P_1(u)-P_2(u)</math>.
 
If <math>M_1\neq M_2</math>, then the polynomials <math>P_1</math> and <math>P_2</math> are not identical, and the polynomial <math>Q\not\equiv 0</math>. We only need to show that <math>\Pr[Q(r)\bmod =0]\,</math> is small for our choice of random <math>p</math> and <math>r</math>.
 
We first show that the probability that <math>Q\equiv 0 \bmod p</math> is small. Then apply the Schwartz-Zippel technique to show that conditioning on that <math>Q\not\equiv 0 \bmod p</math>, the probability <math>Q(r) \bmod p=0</math> is small.
 
Expand <math>Q(u)</math> in form of <math>Q(u)=\sum_{k=0}^n a_i u^k</math>. It can be verified by induction on <math>n</math> that for any coefficient <math>a_k</math>, it holds that <math>|a_k|\le 2n\cdot 2^{n}</math>.
 
Since <math>Q\not\equiv 0</math>, it has at least one nonzero coefficient <math>c\neq 0</math>, and it holds that <math>|c|\le 2n\cdot 2^{n}</math>. Then with the same analysis as in the problem of EQ, by our choice of random <math>p</math>,
:<math>\Pr[c\bmod p=0]=O\left(\frac{1}{n}\right)</math>.
 
Conditioning on that <math>Q\not\equiv 0</math> after modulo <math>p</math>, since the degree of <math>Q</math> is at most <math>n</math>, <math>Q</math> has at most <math>n</math> roots. Since <math>p\in[(n\log n)^2,2(n\log n)^2]</math>, therefore <math>r</math> is uniformly distributed over a set of size at least <math>(n\log n)^2</math>. Therefore, the probability that <math>Q(r)\bmod p =0</math> conditioning on that <math>Q\not\equiv 0\bmod p</math> is at most <math>\frac{1}{n(\log n)^2}</math>.
 
By the union bound,
<math>\Pr[\mathrm{FING}(M_1)= \mathrm{FING}(M_2)]=O\left(\frac{1}{n}\right)</math>.
}}
 
=== Checking distinctness by fingerprinting multisets ===
We now have a fingerprint function for multisets, which is useful for checking the identities of multisets. To solve the problem of checking distinctness, we just check whether the input multiset <math>M</math> is identical to the set <math>\{1,2,\ldots,n\}</math>.

Latest revision as of 01:24, 8 June 2010

Checking identities

Many applications in Computer Science require to efficiently check whether two complex objects are identical, while the objects are presented implicitly (e.g., as black-boxes). We consider two examples. One is to check the result of multiplying two matrices, and the other is to check the identity of two polynomials.

Example: Checking matrix multiplication

Consider the following problem:

  • Given as the input three [math]\displaystyle{ n\times n }[/math] matrices [math]\displaystyle{ A,B }[/math] and [math]\displaystyle{ C }[/math],
  • check whether [math]\displaystyle{ C=AB }[/math].

We could compute [math]\displaystyle{ AB }[/math] and compare the result to [math]\displaystyle{ C }[/math]. The time complexity of fastest matrix multiplication algorithm (in theory) is [math]\displaystyle{ O(n^{2.376}) }[/math], and so is the time complexity of this method.

Here’s a very simple randomized algorithm, due to Freivalds, that runs in only [math]\displaystyle{ O(n^2) }[/math] time:

Algorithm (Freivalds)
  • pick a vector [math]\displaystyle{ r \in\{0, 1\}^n }[/math] uniformly at random;
  • if [math]\displaystyle{ A(Br) = Cr }[/math] then return "yes" else return "no";

The running time of the algorithm is [math]\displaystyle{ O(n^2) }[/math] because it does only 3 matrix-vector multiplications.

If [math]\displaystyle{ AB=C }[/math] then [math]\displaystyle{ A(Br) = Cr }[/math] for any [math]\displaystyle{ r \in\{0, 1\}^n }[/math], thus the algorithm always returns "yes". But if [math]\displaystyle{ AB \neq C }[/math] then the algorithm will make a mistake if it happens to choose an [math]\displaystyle{ r }[/math] for which [math]\displaystyle{ ABr = Cr }[/math]. This, however, is unlikely:

Lemma
If [math]\displaystyle{ AB\neq C }[/math] then for a uniformly random [math]\displaystyle{ r \in\{0, 1\}^n }[/math],
[math]\displaystyle{ \Pr[ABr = Cr]\le \frac{1}{2} }[/math].
Proof.
Let [math]\displaystyle{ D=AB-C }[/math]. We will show that if [math]\displaystyle{ D\neq \boldsymbol{0} }[/math], then [math]\displaystyle{ \Pr[Dr = \boldsymbol{0}]\le \frac{1}{2} }[/math] for a uniform random [math]\displaystyle{ r \in\{0, 1\}^n }[/math].

Since [math]\displaystyle{ D\neq \boldsymbol{0} }[/math], it must have at least one non-zero entry, say [math]\displaystyle{ D(i,j)\neq 0 }[/math]. The [math]\displaystyle{ i }[/math]-th entry of [math]\displaystyle{ Dr }[/math] is

[math]\displaystyle{ (Dr)_i=\sum_{k=1}^nD(i,k)r_k }[/math].

If to the contrary, [math]\displaystyle{ Dr=\boldsymbol{0} }[/math], then [math]\displaystyle{ (Dr)_i=\sum_{k=1}^nD(i,k)r_k=0 }[/math], which is equivalent to that

[math]\displaystyle{ r_j=-\frac{1}{D(i,j)}\sum_{k\neq j}^nD(i,k)r_k }[/math],

i.e. once [math]\displaystyle{ r_k }[/math] for [math]\displaystyle{ k\neq j }[/math] have been chosen, there is only one value of [math]\displaystyle{ r_j }[/math] that would give us a zero [math]\displaystyle{ Dr }[/math]. However, there are two possible values [math]\displaystyle{ \{0,1\} }[/math] for [math]\displaystyle{ r_j }[/math] which are equal-probable, so with at least [math]\displaystyle{ \frac{1}{2} }[/math] probability, the choice of [math]\displaystyle{ r }[/math] fails to give us a zero [math]\displaystyle{ Dr }[/math].

[math]\displaystyle{ \square }[/math]

Example: Checking polynomial identities

Consider the following problem:

  • Given as the input two multivariate polynomials [math]\displaystyle{ P_1(x_1,\ldots,x_n) }[/math] and [math]\displaystyle{ P_2(x_1,\ldots,x_n) }[/math],
  • check whether the two polynomials are identical, denoted [math]\displaystyle{ P_1\equiv P_2 }[/math].

Obviously, if [math]\displaystyle{ P_1, P_2 }[/math] are written out explicitly, the question is trivially answered in linear time just by comparing their coefficients. But in practice they are usually given in very compact form (e.g., as determinants of matrices), so that we can evaluate them efficiently, but expanding them out and looking at their coefficients is out of the question.

Example

Consider the polynomial

[math]\displaystyle{ P(x_1,\ldots,x_n)=\prod_{\overset{i\lt j}{i,j\neq 1}}(x_i-x_j)-\prod_{\overset{i\lt j}{i,j\neq 2}}(x_i-x_j)+\prod_{\overset{i\lt j}{i,j\neq 3}}(x_i-x_j)-\cdots+(-1)^{n-1}\prod_{\overset{i\lt j}{i,j\neq n}}(x_i-x_j) }[/math]

Show that evaluating [math]\displaystyle{ P }[/math] at any given point can be done efficiently, but that expanding out [math]\displaystyle{ P }[/math] to find all its coefficients is computationally infeasible even for moderate values of [math]\displaystyle{ n }[/math].

Here is a very simple randomized algorithm, due to Schwartz and Zippel. Testing [math]\displaystyle{ P_1\equiv P_2 }[/math] is equivalent to testing [math]\displaystyle{ P\equiv 0 }[/math], where [math]\displaystyle{ P = P_1 - P_2 }[/math].

Algorithm (Schwartz-Zippel)
  • pick [math]\displaystyle{ r_1, \ldots , r_n }[/math] independently and uniformly at random from a set [math]\displaystyle{ S }[/math];
  • if [math]\displaystyle{ P_1(r_1, \ldots , r_n) = P_2(r_1, \ldots , r_n) }[/math] then return “yes” else return “no”;

This algorithm requires only the evaluation of [math]\displaystyle{ P }[/math] at a single point. And if [math]\displaystyle{ P\equiv 0 }[/math] it is always correct.

In the Theorem below, we’ll see that if [math]\displaystyle{ P\neq 0 }[/math] then the algorithm is incorrect with probability at most [math]\displaystyle{ \frac{d}{|S|} }[/math], where [math]\displaystyle{ d }[/math] is the maximum degree of the polynomial [math]\displaystyle{ P }[/math].

Theorem (Schwartz-Zippel)
Let [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math] be a multivariate polynomial of degree [math]\displaystyle{ d }[/math] defined over a field [math]\displaystyle{ \mathbb{F} }[/math]. Fix any finite set [math]\displaystyle{ S\subset\mathbb{F} }[/math], and let [math]\displaystyle{ r_1,\ldots,r_n }[/math] be chosen independently and uniformly at random from [math]\displaystyle{ S }[/math]. Then
[math]\displaystyle{ \Pr[Q(r_1,\ldots,r_n)=0\mid Q\not\equiv 0]\le\frac{d}{|S|}. }[/math]
Proof.
The theorem holds if [math]\displaystyle{ Q }[/math] is a single-variate polynomial, because a single-variate polynomial [math]\displaystyle{ Q }[/math] of degree [math]\displaystyle{ d }[/math] has at most [math]\displaystyle{ d }[/math] roots, i.e. there are at most [math]\displaystyle{ d }[/math] many choices of [math]\displaystyle{ r }[/math] having [math]\displaystyle{ Q(r)=0 }[/math], so the theorem follows immediately.

For multi-variate [math]\displaystyle{ Q }[/math], we prove by induction on the number of variables [math]\displaystyle{ n }[/math].

Write [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math] as

[math]\displaystyle{ Q(x_1,\ldots,x_n) = \sum_{i=0}^kx_n^kQ_i(x_1,\ldots,x_{n-1}) }[/math]

where [math]\displaystyle{ k }[/math] is the largest exponent of [math]\displaystyle{ x_n }[/math] in [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math]. So [math]\displaystyle{ Q_k(x_1,\ldots,x_{n-1}) \not\equiv 0 }[/math] by our definition of [math]\displaystyle{ k }[/math], and its degree is at most [math]\displaystyle{ d-k }[/math].

Thus by the induction hypothesis we have that [math]\displaystyle{ \Pr[Q_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|} }[/math].

Conditioning on the event [math]\displaystyle{ Q_k(r_1,\ldots,r_{n-1})\neq 0 }[/math], the single-variate polynomial [math]\displaystyle{ Q'(x_n)=Q(r_1,\ldots,r_{n-1}, x_n)=\sum_{i=0}^kx_n^kQ_i(r_1,\ldots,r_{n-1}) }[/math] has degree [math]\displaystyle{ k }[/math] and [math]\displaystyle{ Q'(x_n)\not\equiv 0 }[/math], thus

[math]\displaystyle{ \begin{align} &\quad\,\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\ &= \Pr[Q'(r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\ &\le \frac{k}{|S|} \end{align} }[/math].

Therefore, due to the law of total probability,

[math]\displaystyle{ \begin{align} &\quad\,\Pr[Q(r_1,\ldots,r_{n})=0]\\ &= \Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\Pr[Q_k(r_1,\ldots,r_{n-1})\neq 0]\\ &\quad\,\,+\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})= 0]\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\ &\le \Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]+\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\ &\le \frac{k}{|S|}+\frac{d-k}{|S|}\\ &=\frac{d}{|S|}. \end{align} }[/math]


[math]\displaystyle{ \square }[/math]

The idea of fingerprinting

Suppose we want to compare two items [math]\displaystyle{ Z_1 }[/math] and [math]\displaystyle{ Z_2 }[/math]. Instead of comparing them directly, we compute random fingerprints [math]\displaystyle{ \mathrm{FING}(Z_1) }[/math] and [math]\displaystyle{ \mathrm{FING}(Z_2) }[/math] and compare these. The fingerprints has the following properties:

  • [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] is a function, which means that if [math]\displaystyle{ Z_1= Z_2 }[/math] then [math]\displaystyle{ \mathrm{FING}(Z_1)=\mathrm{FING}(Z_2) }[/math].
  • If [math]\displaystyle{ Z_1\neq Z_2 }[/math] then [math]\displaystyle{ \Pr[\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)] }[/math] is small.
  • It is much more to compute and compare the fingerprints than to compare [math]\displaystyle{ Z_1 }[/math] and [math]\displaystyle{ Z_2 }[/math] directly.

For Freivald's algorithm, the items to compare are two [math]\displaystyle{ n\times n }[/math] matrices [math]\displaystyle{ AB }[/math] and [math]\displaystyle{ C }[/math], and given an [math]\displaystyle{ n\times n }[/math] matrix [math]\displaystyle{ M }[/math], its random fingerprint is computed as [math]\displaystyle{ \mathrm{FING}(M)=Mr }[/math] for a uniformly random [math]\displaystyle{ r\in\{0,1\}^n }[/math].

For the Schwartz-Zippel algorithm, the items to compare are two polynomials [math]\displaystyle{ P_1(x_1,\ldots,x_n) }[/math] and [math]\displaystyle{ P_2(x_1,\ldots,x_n) }[/math], and given a polynomial [math]\displaystyle{ Q(x_1,\ldots,x_n) }[/math], its random fingerprint is computed as [math]\displaystyle{ \mathrm{FING}(Q)=Q(r_1,\ldots,r_n) }[/math] for [math]\displaystyle{ r_i }[/math] chosen independently and uniformly at random from some fixed set [math]\displaystyle{ S }[/math].

For different problems, we may have different definitions of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math].

Communication complexity

Alice and Bob are two entities. Alice has a private input [math]\displaystyle{ x }[/math] and Bob has a private input [math]\displaystyle{ y }[/math]. Together they want to compute a function [math]\displaystyle{ f(x,y) }[/math] by communicating with each other. This is the model of communication complexity introduced by Yao in 1979.

In the communication complexity model, the local computational costs are ignored. The complexity of algorithms (also called communication protocols here) are measured by the number of bits communicated between Alice and Bob.

A basic function is EQ, defined as

[math]\displaystyle{ \mathrm{EQ}(x,y)= \begin{cases} 1& \mbox{if } x=y,\\ 0& \mbox{otherwise.} \end{cases} }[/math]

This function corresponds to the problem that two far apart entities Alice and Bob, each has a copy of a database (Alice's copy is [math]\displaystyle{ x }[/math], and Bob's copy is [math]\displaystyle{ y }[/math]), and they want to compare whether their copies of the database are identical.

A trivial way to solve EQ is to let Bob send [math]\displaystyle{ y }[/math] to Alice. Supposed that [math]\displaystyle{ x,y\in\{0,1\}^n }[/math], this costs [math]\displaystyle{ n }[/math] bits of communications.

It is known that for deterministic communication protocols, this is the best we can get for computing EQ.

Theorem (Yao 1979)
Any deterministic communication protocol computing EQ on two [math]\displaystyle{ n }[/math]-bit strings costs [math]\displaystyle{ n }[/math] bits of communication in the worst-case.

This theorem is much more nontrivial to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. How to prove such lower bounds is not today's topic.

If the randomness is allowed, we can use the idea of fingerprinting to solve this problem with significantly less communications. The general framework for the algorithm is as follows:

  • Alice choose a random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] and compute the fingerprint of her input [math]\displaystyle{ \mathrm{FING}(x) }[/math];
  • Alice sends both the description of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] and the value of [math]\displaystyle{ \mathrm{FING}(x) }[/math] to Bob;
  • Bob computes [math]\displaystyle{ \mathrm{FING}(y) }[/math] and check whether [math]\displaystyle{ \mathrm{FING}(x)=\mathrm{FING}(y) }[/math].

So the question is, how to design this random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] to guarantee:

  1. A random [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] can be described succinctly.
  2. The range of [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] is small, so the fingerprints are succinct.
  3. If [math]\displaystyle{ x\neq y }[/math], the probability [math]\displaystyle{ \Pr[\mathrm{FING}(x)=\mathrm{FING}(y)] }[/math] is small.

The fingerprint function we choose is as follows: by treating the input string [math]\displaystyle{ x\in\{0,1\}^n }[/math] as the binary representation of a number, let [math]\displaystyle{ \mathrm{FING}(x)=x\bmod p }[/math] for some random prime [math]\displaystyle{ p }[/math]. The prime [math]\displaystyle{ p }[/math] can uniquely specify a random fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math], thus can be used as a description of the function, and alos the range of the fingerprints is [math]\displaystyle{ [p] }[/math], thus we want the prime [math]\displaystyle{ p }[/math] to be reasonably small, but still has a good chance to distinguish different [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] after modulo [math]\displaystyle{ p }[/math].

A randomized protocol for EQ

Alice does:

for some parameter [math]\displaystyle{ k }[/math] (to be specified),
  • choose uniformly at random a prime [math]\displaystyle{ p\in[k] }[/math];
  • send [math]\displaystyle{ p }[/math] and [math]\displaystyle{ x\bmod p }[/math] to Bob;

Upon receiving [math]\displaystyle{ p }[/math] and [math]\displaystyle{ x\bmod p }[/math], Bob does:

  • check whether [math]\displaystyle{ x\bmod p=y\bmod p }[/math].

The number of bits to be communicated is [math]\displaystyle{ O(\log k) }[/math]. We then bound the probability of error [math]\displaystyle{ \Pr[x\bmod p=y\bmod p] }[/math] for [math]\displaystyle{ x\neq y }[/math], in terms of [math]\displaystyle{ k }[/math].

Suppose without loss of generality [math]\displaystyle{ x\gt y }[/math]. Let [math]\displaystyle{ z=x-y }[/math]. Then [math]\displaystyle{ z\lt 2^n }[/math] since [math]\displaystyle{ x,y\in[2^n] }[/math], and [math]\displaystyle{ z\neq 0 }[/math] for [math]\displaystyle{ x\neq y }[/math]. It holds that [math]\displaystyle{ x\bmod p=y\bmod p }[/math] if and only if [math]\displaystyle{ z }[/math] is dividable by [math]\displaystyle{ p }[/math]. Note that [math]\displaystyle{ z\lt 2^n }[/math] since [math]\displaystyle{ x,y\in[2^n] }[/math]. We only need to bound the probability

[math]\displaystyle{ \Pr[z\bmod p=0] }[/math] for [math]\displaystyle{ 0\lt z\lt 2^n }[/math], where [math]\displaystyle{ p }[/math] is a random prime chosen from [math]\displaystyle{ [k] }[/math].

The probability [math]\displaystyle{ \Pr[z\bmod p=0] }[/math] is computed directly as

[math]\displaystyle{ \Pr[z\bmod p=0]\le\frac{\mbox{the number of prime divisors of }z}{\mbox{the number of primes in }[k]} }[/math].

For the numerator, we have the following lemma.

Lemma
The number of distinct prime divisors of any natural number less than [math]\displaystyle{ 2^n }[/math] is at most [math]\displaystyle{ n }[/math].
Proof.
Each prime number is [math]\displaystyle{ \ge2 }[/math]. If an [math]\displaystyle{ N\gt 0 }[/math] has more than [math]\displaystyle{ n }[/math] distinct prime divisors, then [math]\displaystyle{ N\ge 2^n }[/math].
[math]\displaystyle{ \square }[/math]

Due to this lemma, [math]\displaystyle{ z }[/math] has at most [math]\displaystyle{ n }[/math] prime divisors.

We then lower bound the number of primes in [math]\displaystyle{ [k] }[/math]. This is given by the celebrated Prime Number Theorem (PNT).

Prime Number Theorem
Let [math]\displaystyle{ \pi(k) }[/math] denote the number of primes less than [math]\displaystyle{ k }[/math]. Then [math]\displaystyle{ \pi(k)\sim\frac{k}{\ln k} }[/math] as [math]\displaystyle{ k\rightarrow\infty }[/math].

Therefore, by choosing [math]\displaystyle{ k=tn\ln tn }[/math] for some [math]\displaystyle{ t }[/math], we have that for a [math]\displaystyle{ 0\lt z\lt 2^n }[/math], and a random prime [math]\displaystyle{ p\in[k] }[/math],

[math]\displaystyle{ \Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{t} }[/math].

We can make this error probability polynomially small and the number of bits to be communicated is still [math]\displaystyle{ O(\log k)=O(\log n) }[/math].

Application: Randomized pattern matching

Consider the following problem of pattern matching, which has nothing to do with communication complexity.

  • Input: a string [math]\displaystyle{ x\in\{0,1\}^n }[/math] and a "pattern" [math]\displaystyle{ y\in\{0,1\}^m }[/math].
  • Determine whether the pattern [math]\displaystyle{ y }[/math] is a contiguous substring of [math]\displaystyle{ x }[/math]. Usually, we are also asked to find the location of the substring.

A naive algorithm trying every possible match runs in [math]\displaystyle{ O(nm) }[/math] time. The more sophisticated KMP algorithm inspired by automaton theory runs in [math]\displaystyle{ O(n+m) }[/math] time.

A simple randomized algorithm, due to Karp and Rabin, uses the idea of fingerprinting and also runs in [math]\displaystyle{ O(n + m) }[/math] time.

Let [math]\displaystyle{ X(j)=x_jx_{j+1}\cdots x_{j+m-1} }[/math] denote the substring of [math]\displaystyle{ x }[/math] of length [math]\displaystyle{ m }[/math] starting at position [math]\displaystyle{ j }[/math].

Algorithm (Karp-Rabin)
pick a random prime [math]\displaystyle{ p\in[k] }[/math];
for [math]\displaystyle{ j = 1 }[/math] to [math]\displaystyle{ n -m + 1 }[/math] do
if [math]\displaystyle{ X(j)\bmod p = y \bmod p }[/math] then report a match;
return "no match";

So the algorithm just compares the [math]\displaystyle{ \mathrm{FING}(X(j)) }[/math] and [math]\displaystyle{ \mathrm{FING}(y) }[/math] for every [math]\displaystyle{ j }[/math], with the same definition of fingerprint function [math]\displaystyle{ \mathrm{FING}(\cdot) }[/math] as in the communication protocol for EQ.

By the same analysis, by choosing [math]\displaystyle{ k=n^2m\ln (n^2m) }[/math], the probability of a single false match is

[math]\displaystyle{ \Pr[X(j)\bmod p=y\bmod p\mid X(j)\neq y ]=O\left(\frac{1}{n^2}\right) }[/math].

By the union bound, the probability that a false match occurs is [math]\displaystyle{ O\left(\frac{1}{n}\right) }[/math].

The algorithm runs in linear time if we assume that we can compute [math]\displaystyle{ X(j)\bmod p }[/math] for each [math]\displaystyle{ j }[/math] in constant time. This outrageous assumption can be made realistic by the following observation.

Lemma
Let [math]\displaystyle{ \mathrm{FING}(a)=a\bmod p }[/math].
[math]\displaystyle{ \mathrm{FING}(X(j+1))\equiv2(\mathrm{FING}(X(j))-2^{m-1}x_j)+x_{j+m}\pmod p\, }[/math].
Proof.
It holds that
[math]\displaystyle{ X(j+1)=2(X(j)-2^{m-1}x_j)+x_{j+m}\, }[/math].

So the equation holds on the finite field modulo [math]\displaystyle{ p }[/math].

[math]\displaystyle{ \square }[/math]

Due to this lemma, each fingerprint [math]\displaystyle{ \mathrm{FING}(X(j)) }[/math] can be computed in an incremental way, each in constant time. The running time of the algorithm is [math]\displaystyle{ O(n+m) }[/math].

Checking distinctness

Consider the following problem:

  • Given a sequence [math]\displaystyle{ x_1,x_2,\ldots,x_n\in\{1,2,\ldots,n\} }[/math], check whether every member of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] appears exactly once.

This problem is called detecting duplicate or checking distinctness. It can be solved in linear time and linear space by a straightforward algorithm.

For many real applications, the [math]\displaystyle{ n }[/math] is enormously large, and we would like to have an algorithm using very limited extra space.

Fingerprinting multisets

A randomized algorithm due to Lipton, checks distinctness by solving a more general problem fingerprinting multisets.

Given a multiset (each member may appear more than once) [math]\displaystyle{ M=\{x_1,x_2,\ldots,x_n\} }[/math], its fingerprint is defined as

[math]\displaystyle{ \mathrm{FING}(M)=\prod_{i=1}^n(r-x_i)\bmod p, }[/math]

where [math]\displaystyle{ p }[/math] is a random prime chosen from the interval [math]\displaystyle{ [(n\log n)^2,2(n\log n)^2] }[/math], and [math]\displaystyle{ r }[/math] is chosen uniformly at random from [math]\displaystyle{ [p] }[/math].

We first see that the space reuqired to compute [math]\displaystyle{ \mathrm{FING}(M) }[/math] is only [math]\displaystyle{ O(\log n) }[/math] bits. We do not need to compute the value of the product [math]\displaystyle{ \prod_{i=1}^n(r-x_i) }[/math]. Instead, all the computation can be done in the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math] (you need some knowledge in algebra to understand this). So the space requirement is only [math]\displaystyle{ O(\log p)=O(\log n) }[/math].

It is easy to see that the above fingerprint function is invariant under permutations, thus one multiset has only one fingerprint. The next theorem due to Lipton, states that the probability that two distinct multisets have the same fingerprint is small.

Theorem (Lipton 1989)
Let [math]\displaystyle{ M_1=\{x_1,x_2,\ldots,x_n\} }[/math] and [math]\displaystyle{ M_2=\{y_1,y_2,\ldots,y_n\} }[/math] be two multisets whose members are from [math]\displaystyle{ \{1,2,\ldots,n\} }[/math]. If [math]\displaystyle{ M_1\neq M_2 }[/math], then
[math]\displaystyle{ \Pr[\mathrm{FING}(M_1)= \mathrm{FING}(M_2)]=O\left(\frac{1}{n}\right) }[/math].
Proof.
Let [math]\displaystyle{ P_1(u)=\prod_{i=1}^n(u-x_i) }[/math] and [math]\displaystyle{ P_2(u)=\prod_{i=1}^n(u-y_i) }[/math], and let [math]\displaystyle{ Q(u)=P_1(u)-P_2(u) }[/math].

If [math]\displaystyle{ M_1\neq M_2 }[/math], then the polynomials [math]\displaystyle{ P_1 }[/math] and [math]\displaystyle{ P_2 }[/math] are not identical, and the polynomial [math]\displaystyle{ Q\not\equiv 0 }[/math]. We only need to show that [math]\displaystyle{ \Pr[Q(r)\bmod =0]\, }[/math] is small for our choice of random [math]\displaystyle{ p }[/math] and [math]\displaystyle{ r }[/math].

We first show that the probability that [math]\displaystyle{ Q\equiv 0 \bmod p }[/math] is small. Then apply the Schwartz-Zippel technique to show that conditioning on that [math]\displaystyle{ Q\not\equiv 0 \bmod p }[/math], the probability [math]\displaystyle{ Q(r) \bmod p=0 }[/math] is small.

Expand [math]\displaystyle{ Q(u) }[/math] in form of [math]\displaystyle{ Q(u)=\sum_{k=0}^n a_i u^k }[/math]. It can be verified by induction on [math]\displaystyle{ n }[/math] that for any coefficient [math]\displaystyle{ a_k }[/math], it holds that [math]\displaystyle{ |a_k|\le 2n\cdot 2^{n} }[/math].

Since [math]\displaystyle{ Q\not\equiv 0 }[/math], it has at least one nonzero coefficient [math]\displaystyle{ c\neq 0 }[/math], and it holds that [math]\displaystyle{ |c|\le 2n\cdot 2^{n} }[/math]. Then with the same analysis as in the problem of EQ, by our choice of random [math]\displaystyle{ p }[/math],

[math]\displaystyle{ \Pr[c\bmod p=0]=O\left(\frac{1}{n}\right) }[/math].

Conditioning on that [math]\displaystyle{ Q\not\equiv 0 }[/math] after modulo [math]\displaystyle{ p }[/math], since the degree of [math]\displaystyle{ Q }[/math] is at most [math]\displaystyle{ n }[/math], [math]\displaystyle{ Q }[/math] has at most [math]\displaystyle{ n }[/math] roots. Since [math]\displaystyle{ p\in[(n\log n)^2,2(n\log n)^2] }[/math], therefore [math]\displaystyle{ r }[/math] is uniformly distributed over a set of size at least [math]\displaystyle{ (n\log n)^2 }[/math]. Therefore, the probability that [math]\displaystyle{ Q(r)\bmod p =0 }[/math] conditioning on that [math]\displaystyle{ Q\not\equiv 0\bmod p }[/math] is at most [math]\displaystyle{ \frac{1}{n(\log n)^2} }[/math].

By the union bound, [math]\displaystyle{ \Pr[\mathrm{FING}(M_1)= \mathrm{FING}(M_2)]=O\left(\frac{1}{n}\right) }[/math].

[math]\displaystyle{ \square }[/math]

Checking distinctness by fingerprinting multisets

We now have a fingerprint function for multisets, which is useful for checking the identities of multisets. To solve the problem of checking distinctness, we just check whether the input multiset [math]\displaystyle{ M }[/math] is identical to the set [math]\displaystyle{ \{1,2,\ldots,n\} }[/math].