Checking Matrix Multiplication

Figure: the evolution of the time complexity ${\displaystyle O(n^{\omega })}$ of matrix multiplication.

Let ${\displaystyle \mathbb {F} }$ be a field (you may think of it as the field ${\displaystyle \mathbb {Q} }$ of rational numbers, or the finite field ${\displaystyle \mathbb {Z} _{p}}$ of integers modulo a prime ${\displaystyle p}$). We suppose that each field operation (addition, subtraction, multiplication, division) has unit cost. This model is called the unit-cost RAM model, which is an idealized abstraction of a computer.

Consider the following problem:

• Input: Three ${\displaystyle n\times n}$ matrices ${\displaystyle A}$, ${\displaystyle B}$, and ${\displaystyle C}$ over the field ${\displaystyle \mathbb {F} }$.
• Output: "yes" if ${\displaystyle C=AB}$ and "no" if otherwise.

A naive way to solve this is to multiply ${\displaystyle A}$ and ${\displaystyle B}$ and compare the result with ${\displaystyle C}$. The straightforward algorithm for matrix multiplication takes ${\displaystyle O(n^{3})}$ time, assuming that each arithmetic operation takes unit time. Strassen's algorithm, discovered in 1969 and now implemented by many numerical libraries, runs in time ${\displaystyle O(n^{\log _{2}7})\approx O(n^{2.81})}$. Strassen's algorithm started the search for fast matrix multiplication algorithms. The Coppersmith–Winograd algorithm, discovered in 1987, runs in time ${\displaystyle O(n^{2.376})}$ but is only faster than Strassen's algorithm on extremely large matrices, due to its very large constant coefficient. This remained the best known bound for decades, until Stothers obtained an ${\displaystyle O(n^{2.374})}$ algorithm in his PhD thesis in 2010, and Vassilevska Williams independently obtained an ${\displaystyle O(n^{2.373})}$ algorithm in 2012. Both improvements are based on generalizations of the Coppersmith–Winograd algorithm. It is unknown whether matrix multiplication can be done in time ${\displaystyle O(n^{2+o(1)})}$.

Freivalds Algorithm

The following is a very simple randomized algorithm due to Freivalds, running in ${\displaystyle O(n^{2})}$ time:

 Algorithm (Freivalds, 1979)
  pick a vector ${\displaystyle r\in \{0,1\}^{n}}$ uniformly at random;
  if ${\displaystyle A(Br)=Cr}$ then return "yes" else return "no";

The product ${\displaystyle A(Br)}$ is computed by first computing the vector ${\displaystyle Br}$ and then multiplying ${\displaystyle A}$ by it, so no matrix-matrix product is ever formed. The running time of Freivalds' algorithm is ${\displaystyle O(n^{2})}$ because the algorithm computes 3 matrix-vector multiplications.
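A single round can be sketched in Python as follows. This is only an illustration under assumptions not made in the text: matrices are lists of rows with exact Python integers standing in for elements of ${\displaystyle \mathbb {F} }$, and the helper names `mat_vec` and `freivalds_round` are hypothetical.

```python
import random

def mat_vec(M, v):
    """Multiply an n x n matrix (list of rows) by a length-n vector in O(n^2)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def freivalds_round(A, B, C, rng=random):
    """One round of the check: True stands for "yes", False for "no"."""
    n = len(A)
    r = [rng.randint(0, 1) for _ in range(n)]   # uniform vector in {0,1}^n
    # Three matrix-vector products: Br, A(Br) and Cr.
    return mat_vec(A, mat_vec(B, r)) == mat_vec(C, r)

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 1], [4, 3]]               # this C equals AB
print(freivalds_round(A, B, C))    # True on every run, because AB = C
```

Only matrix-vector products appear, which is where the quadratic running time comes from.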

If ${\displaystyle AB=C}$ then ${\displaystyle A(Br)=Cr}$ for any ${\displaystyle r\in \{0,1\}^{n}}$, thus the algorithm returns "yes" for any positive instance (${\displaystyle AB=C}$). But if ${\displaystyle AB\neq C}$ then the algorithm makes a mistake whenever it chooses an ${\displaystyle r}$ with ${\displaystyle ABr=Cr}$. However, the following lemma states that the probability of this event is bounded.

 Lemma If ${\displaystyle AB\neq C}$ then for a uniformly random ${\displaystyle r\in \{0,1\}^{n}}$, ${\displaystyle \Pr[ABr=Cr]\leq {\frac {1}{2}}}$.
Proof.
 Let ${\displaystyle D=AB-C}$. The event ${\displaystyle ABr=Cr}$ is equivalent to ${\displaystyle Dr={\boldsymbol {0}}}$. It is then sufficient to show that for any ${\displaystyle D\neq {\boldsymbol {0}}}$, it holds that ${\displaystyle \Pr[Dr={\boldsymbol {0}}]\leq {\frac {1}{2}}}$. Since ${\displaystyle D\neq {\boldsymbol {0}}}$, it must have at least one non-zero entry. Suppose that ${\displaystyle D_{ij}\neq 0}$. Assume that the event ${\displaystyle Dr={\boldsymbol {0}}}$ occurs. In particular, the ${\displaystyle i}$-th entry of ${\displaystyle Dr}$ is ${\displaystyle (Dr)_{i}=\sum _{k=1}^{n}D_{ik}r_{k}=0.}$ Solving for ${\displaystyle r_{j}}$ gives ${\displaystyle r_{j}=-{\frac {1}{D_{ij}}}\sum _{k\neq j}D_{ik}r_{k}.}$ Once all other entries ${\displaystyle r_{k}}$ with ${\displaystyle k\neq j}$ are fixed, at most one value of ${\displaystyle r_{j}\in \{0,1\}}$ satisfies this equation. Therefore, the number of ${\displaystyle r\in \{0,1\}^{n}}$ satisfying ${\displaystyle Dr={\boldsymbol {0}}}$ is at most ${\displaystyle 2^{n-1}}$. The probability that ${\displaystyle ABr=Cr}$ is bounded as ${\displaystyle \Pr[ABr=Cr]=\Pr[Dr={\boldsymbol {0}}]\leq {\frac {2^{n-1}}{2^{n}}}={\frac {1}{2}}}$.
${\displaystyle \square }$
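For small ${\displaystyle n}$ the lemma can be checked by brute force. The sketch below counts the vectors ${\displaystyle r\in \{0,1\}^{n}}$ with ${\displaystyle Dr={\boldsymbol {0}}}$ for a hand-picked non-zero ${\displaystyle D}$; the function name `fraction_fooled` and the example matrix are illustrative choices, not part of the original notes.

```python
from itertools import product

def fraction_fooled(D):
    """Fraction of r in {0,1}^n with D r = 0, by exhaustive enumeration."""
    n = len(D)
    bad = sum(
        all(sum(d * x for d, x in zip(row, r)) == 0 for row in D)
        for r in product((0, 1), repeat=n)
    )
    return bad / 2 ** n

# D r = 0 exactly when r_2 = r_3, which holds for 4 of the 8 vectors.
D = [[0, 0, 0],
     [0, 1, -1],
     [0, 0, 0]]
print(fraction_fooled(D))   # 0.5
```

This particular ${\displaystyle D}$ attains the bound ${\displaystyle 1/2}$, so the lemma cannot be improved in general for ${\displaystyle r\in \{0,1\}^{n}}$.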

When ${\displaystyle AB=C}$, Freivalds' algorithm always returns "yes"; and when ${\displaystyle AB\neq C}$, Freivalds' algorithm returns "no" with probability at least 1/2.

To improve its accuracy, we can run Freivalds' algorithm ${\displaystyle k}$ times, each time with an independent ${\displaystyle r\in \{0,1\}^{n}}$, and return "yes" if and only if all runs return "yes".

 Freivalds' Algorithm (multi-round)
  pick ${\displaystyle k}$ vectors ${\displaystyle r_{1},r_{2},\ldots ,r_{k}\in \{0,1\}^{n}}$ uniformly and independently at random;
  if ${\displaystyle A(Br_{i})=Cr_{i}}$ for all ${\displaystyle i=1,\ldots ,k}$ then return "yes" else return "no";

If ${\displaystyle AB=C}$, then the algorithm returns "yes" with probability 1. If ${\displaystyle AB\neq C}$, then due to independence, the probability that all ${\displaystyle r_{i}}$ satisfy ${\displaystyle ABr_{i}=Cr_{i}}$ is at most ${\displaystyle 2^{-k}}$, so the algorithm returns "no" with probability at least ${\displaystyle 1-2^{-k}}$. For any ${\displaystyle 0<\epsilon <1}$, choose ${\displaystyle k=\left\lceil \log _{2}{\frac {1}{\epsilon }}\right\rceil }$. The algorithm runs in time ${\displaystyle O(n^{2}\log _{2}{\frac {1}{\epsilon }})}$ and has a one-sided error (false positive) bounded by ${\displaystyle \epsilon }$.
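The multi-round version, with the number of rounds derived from a target error ${\displaystyle \epsilon }$, might look as follows. As before this is a sketch: exact Python integers stand in for field elements, and `rounds_needed` and `freivalds` are hypothetical names.

```python
import math
import random

def rounds_needed(eps):
    """Smallest k with 2^-k <= eps, i.e. k = ceil(log2(1/eps))."""
    return max(1, math.ceil(math.log2(1 / eps)))

def freivalds(A, B, C, eps=1e-9, rng=random):
    """Return True ("yes") or False ("no"); false positives have probability <= eps."""
    n = len(A)
    for _ in range(rounds_needed(eps)):
        r = [rng.randint(0, 1) for _ in range(n)]
        Br = [sum(b * x for b, x in zip(row, r)) for row in B]
        ABr = [sum(a * x for a, x in zip(row, Br)) for row in A]
        Cr = [sum(c * x for c, x in zip(row, r)) for row in C]
        if ABr != Cr:
            return False      # r is a witness: AB != C for certain
    return True               # all rounds agreed

print(rounds_needed(1e-9))    # 30
```

A "no" answer is always correct because the vector ${\displaystyle r}$ that triggered it is a certificate; only "yes" answers carry the error probability.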

Polynomial Identity Testing (PIT)

Polynomial Identity Testing (PIT) is the following problem: given two polynomials as input, determine whether they are identical. This problem plays a fundamental role in computer science.

First let's consider the following simplified version of Polynomial Identity Testing (PIT) which takes only the single-variate polynomials:

• Input: two polynomials ${\displaystyle P_{1},P_{2}\in \mathbb {F} [x]}$ of degree ${\displaystyle d}$.
• Output: "yes" if two polynomials are identical, i.e. ${\displaystyle P_{1}\equiv P_{2}}$, and "no" if otherwise.

Here ${\displaystyle \mathbb {F} [x]}$ denotes the ring of polynomials over the field ${\displaystyle \mathbb {F} }$.

Alternatively, we can consider the following equivalent problem:

• Input: a polynomial ${\displaystyle P\in \mathbb {F} [x]}$ of degree ${\displaystyle d}$.
• Output: "yes" if ${\displaystyle P\equiv 0}$, and "no" if otherwise.

The problem is trivial if ${\displaystyle P}$ is presented in its explicit form ${\displaystyle P(x)=\sum _{i=0}^{d}a_{i}x^{i}}$. But we assume that ${\displaystyle P}$ is given in product form or as a black box.

A straightforward deterministic algorithm that solves PIT is to query ${\displaystyle d+1}$ points ${\displaystyle P(1),P(2),\ldots ,P(d+1)}$ and check whether they are all zero. This determines whether ${\displaystyle P\equiv 0}$ by interpolation: a polynomial of degree at most ${\displaystyle d}$ is determined by its values at ${\displaystyle d+1}$ distinct points.

We now introduce a simple randomized algorithm for the problem.

 Algorithm for PIT
  pick ${\displaystyle x\in \{1,2,\ldots ,2d\}}$ uniformly at random;
  if ${\displaystyle P(x)=0}$ then return “yes” else return “no”;

This algorithm requires only the evaluation of ${\displaystyle P}$ at a single point. If ${\displaystyle P\equiv 0}$ it is always correct. If ${\displaystyle P\not \equiv 0}$, the probability that the algorithm wrongly returns "yes" is bounded as follows.

 Theorem Let ${\displaystyle P\in \mathbb {F} [x]}$ be a polynomial of degree ${\displaystyle d}$ over the field ${\displaystyle \mathbb {F} }$. Let ${\displaystyle S\subset \mathbb {F} }$ be an arbitrary finite set and let ${\displaystyle x\in S}$ be chosen uniformly at random from ${\displaystyle S}$. If ${\displaystyle P\not \equiv 0}$ then ${\displaystyle \Pr[P(x)=0]\leq {\frac {d}{|S|}}.}$
Proof.
 A non-zero ${\displaystyle d}$-degree polynomial ${\displaystyle P}$ has at most ${\displaystyle d}$ distinct roots, thus at most ${\displaystyle d}$ members ${\displaystyle x}$ of ${\displaystyle S}$ satisfy that ${\displaystyle P(x)=0}$. Therefore, ${\displaystyle \Pr[P(x)=0]\leq {\frac {d}{|S|}}}$.
${\displaystyle \square }$

By the theorem with ${\displaystyle S=\{1,2,\ldots ,2d\}}$, so that ${\displaystyle {\frac {d}{|S|}}={\frac {1}{2}}}$, the algorithm distinguishes a non-zero polynomial from 0 with probability at least ${\displaystyle 1/2}$. This is achieved by evaluating the polynomial at only one point, using ${\displaystyle 1+\log _{2}d}$ random bits.
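The randomized test can be tried on small black boxes. In the sketch below the polynomial is any Python callable on integers (so ${\displaystyle \mathbb {F} =\mathbb {Q} }$ is assumed), `pit_round` and `pit` are hypothetical names, and repeating the round several times is an added amplification step in the same spirit as the multi-round Freivalds check.

```python
import random

def pit_round(P, d, rng=random):
    """Evaluate the black box P at a uniform point of {1, ..., 2d}."""
    x = rng.randint(1, 2 * d)
    return P(x) == 0          # True stands for "yes, P looks identically zero"

def pit(P, d, rounds=40, rng=random):
    """Answer "yes" only if every independent round answers "yes"."""
    return all(pit_round(P, d, rng) for _ in range(rounds))

zero = lambda x: (x + 1) ** 2 - (x * x + 2 * x + 1)   # identically zero, product form
nonzero = lambda x: (x - 1) * (x - 2)                 # degree 2, roots 1 and 2

print(pit(zero, 2))   # True: a zero polynomial is never rejected
```

For `nonzero` with ${\displaystyle d=2}$, two of the four points in ${\displaystyle \{1,2,3,4\}}$ are roots, so a single round is fooled with probability exactly ${\displaystyle 1/2}$, which matches the bound of the theorem.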

Communication Complexity of Equality

Communication complexity was introduced by Andrew Chi-Chih Yao as a model of computation which involves multiple participants, each with partial information of the input.

Assume that there are two entities, say Alice and Bob. Alice has a private input ${\displaystyle a}$ and Bob has a private input ${\displaystyle b}$. Together they want to compute a function ${\displaystyle f(a,b)}$ by communicating with each other. The communication follows a predefined communication protocol (the "algorithm" in this model) whose logic depends only on the problem ${\displaystyle f}$ but not on the inputs. The complexity of a communication protocol is measured by the number of bits communicated between Alice and Bob in the worst case.

The problem of checking identity is formally defined by the function EQ as follows: ${\displaystyle \mathrm {EQ} :\{0,1\}^{n}\times \{0,1\}^{n}\rightarrow \{0,1\}}$ and for any ${\displaystyle a,b\in \{0,1\}^{n}}$,

${\displaystyle \mathrm {EQ} (a,b)={\begin{cases}1&{\mbox{if }}a=b,\\0&{\mbox{otherwise.}}\end{cases}}}$

A trivial way to solve EQ is to let Bob send his entire input string ${\displaystyle b}$ to Alice and let Alice check whether ${\displaystyle a=b}$. This costs ${\displaystyle n}$ bits of communication.

It is known that for deterministic communication protocols, this is the best we can get for computing EQ.

 Theorem (Yao 1979) Any deterministic communication protocol computing EQ on two ${\displaystyle n}$-bit strings costs at least ${\displaystyle n}$ bits of communication in the worst case.

This theorem is harder to prove than it looks, because Alice and Bob are allowed to interact with each other in arbitrary ways. The proof of this theorem in Yao's 1979 paper initiated the field of communication complexity.

If randomness is allowed, we can solve this problem up to a tolerable probabilistic error with significantly less communication. The inputs ${\displaystyle a,b\in \{0,1\}^{n}}$ are two strings ${\displaystyle a=a_{0}a_{1}\cdots a_{n-1},b=b_{0}b_{1}\cdots b_{n-1}}$ of ${\displaystyle n}$ bits. Let ${\displaystyle k=\lceil \log _{2}(2n)\rceil }$ and let ${\displaystyle p\in [2^{k},2^{k+1}]}$ be an arbitrary prime number. (Such a prime ${\displaystyle p}$ always exists, by Bertrand's postulate.) The input strings ${\displaystyle a,b}$ can be respectively represented as two polynomials ${\displaystyle f,g\in \mathbb {Z} _{p}[x]}$ of degree at most ${\displaystyle n-1}$ such that ${\displaystyle f(x)=\sum _{i=0}^{n-1}a_{i}x^{i}}$ and ${\displaystyle g(x)=\sum _{i=0}^{n-1}b_{i}x^{i}}$, where all additions and multiplications are modulo ${\displaystyle p}$. The randomized communication protocol is given as follows:

 A randomized protocol for EQ
  Alice does:
    pick ${\displaystyle r\in [p]}$ uniformly at random;
    send ${\displaystyle r}$ and ${\displaystyle f(r)}$ to Bob;
  Upon receiving ${\displaystyle r}$ and ${\displaystyle f(r)}$, Bob does:
    if ${\displaystyle f(r)=g(r)}$ return "yes"; else return "no".

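Repeat this protocol 100 times. Each round sends two elements of ${\displaystyle [p]}$, so the total number of bits to communicate is bounded by ${\displaystyle 200\lceil \log _{2}p\rceil =O(\log n)}$. By the analysis of the randomized algorithm for PIT, if ${\displaystyle a=b}$ the protocol is always correct. If ${\displaystyle a\neq b}$ then ${\displaystyle f-g}$ is a non-zero polynomial of degree at most ${\displaystyle n-1}$, so a single round fails to detect the difference with probability at most ${\displaystyle {\frac {n-1}{p}}<{\frac {1}{2}}}$ because ${\displaystyle p\geq 2n}$, and the protocol fails to report a difference with probability less than ${\displaystyle 2^{-100}}$.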
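A simulation of the protocol in a single process might look as follows. It is a sketch under stated assumptions: inputs are lists of 0/1 bits with ${\displaystyle a_{0}}$ first, the prime is found by trial division starting at ${\displaystyle 2^{k}}$, and the names `is_prime`, `pick_prime`, `poly_eval` and `eq_protocol` are hypothetical.

```python
import math
import random

def is_prime(m):
    """Trial division; adequate here because p is only about 4n."""
    return m >= 2 and all(m % q for q in range(2, math.isqrt(m) + 1))

def pick_prime(n):
    """Smallest prime p >= 2^k, where k = ceil(log2(2n))."""
    k = (2 * n - 1).bit_length()      # equals ceil(log2(2n)) for n >= 1
    p = 2 ** k
    while not is_prime(p):
        p += 1
    return p

def poly_eval(bits, r, p):
    """Evaluate sum_i bits[i] * r^i modulo p by Horner's rule."""
    acc = 0
    for bit in reversed(bits):
        acc = (acc * r + bit) % p
    return acc

def eq_protocol(a, b, rounds=100, rng=random):
    """True stands for "yes" (a = b), False for "no"."""
    p = pick_prime(len(a))
    for _ in range(rounds):
        r = rng.randrange(p)                  # Alice's random point in [p]
        message = (r, poly_eval(a, r, p))     # the two numbers Alice sends
        if message[1] != poly_eval(b, message[0], p):
            return False                      # Bob sees f(r) != g(r)
    return True

a = [1, 0, 1, 1, 0, 0, 1, 0]
print(pick_prime(len(a)))       # 17
print(eq_protocol(a, list(a)))  # True
```

Only the pair `message` crosses from Alice to Bob in each round, which is two numbers below ${\displaystyle p}$.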

Schwartz-Zippel Theorem

Now let's move on to the true form of Polynomial Identity Testing (PIT), which works on multi-variate polynomials:

• Input: two ${\displaystyle n}$-variate polynomials ${\displaystyle f,g\in \mathbb {F} [x_{1},x_{2},\ldots ,x_{n}]}$ of degree ${\displaystyle d}$.
• Output: "yes" if ${\displaystyle f\equiv g}$, and "no" if otherwise.

The ${\displaystyle \mathbb {F} [x_{1},x_{2},\ldots ,x_{n}]}$ is the ring of multi-variate polynomials over field ${\displaystyle \mathbb {F} }$. The most natural way to represent an ${\displaystyle n}$-variate polynomial of degree ${\displaystyle d}$ is to write it as a sum of monomials:

${\displaystyle f(x_{1},x_{2},\ldots ,x_{n})=\sum _{i_{1},i_{2},\ldots ,i_{n}\geq 0 \atop i_{1}+i_{2}+\cdots +i_{n}\leq d}a_{i_{1},i_{2},\ldots ,i_{n}}x_{1}^{i_{1}}x_{2}^{i_{2}}\cdots x_{n}^{i_{n}}}$.

The degree or total degree of a monomial ${\displaystyle a_{i_{1},i_{2},\ldots ,i_{n}}x_{1}^{i_{1}}x_{2}^{i_{2}}\cdots x_{n}^{i_{n}}}$ is given by ${\displaystyle i_{1}+i_{2}+\cdots +i_{n}}$ and the degree of a polynomial ${\displaystyle f}$ is the maximum degree of monomials of nonzero coefficients.

Alternatively, we can consider the following equivalent problem:

• Input: a polynomial ${\displaystyle f\in \mathbb {F} [x_{1},x_{2},\ldots ,x_{n}]}$ of degree ${\displaystyle d}$.
• Output: "yes" if ${\displaystyle f\equiv 0}$, and "no" if otherwise.

If ${\displaystyle f}$ is written explicitly as a sum of monomials, then the problem is trivial. Again we allow ${\displaystyle f}$ to be represented in product form.

 Example The Vandermonde matrix ${\displaystyle M=M(x_{1},x_{2},\ldots ,x_{n})}$ is defined by ${\displaystyle M_{ij}=x_{i}^{j-1}}$, that is ${\displaystyle M={\begin{bmatrix}1&x_{1}&x_{1}^{2}&\dots &x_{1}^{n-1}\\1&x_{2}&x_{2}^{2}&\dots &x_{2}^{n-1}\\1&x_{3}&x_{3}^{2}&\dots &x_{3}^{n-1}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&x_{n}&x_{n}^{2}&\dots &x_{n}^{n-1}\end{bmatrix}}}$. Let ${\displaystyle f}$ be the polynomial defined as ${\displaystyle f(x_{1},\ldots ,x_{n})=\det(M)=\prod _{j<i}(x_{i}-x_{j})}$. It is pretty easy to evaluate ${\displaystyle f(x_{1},x_{2},\ldots ,x_{n})}$ on any particular ${\displaystyle x_{1},x_{2},\ldots ,x_{n}}$, however it is prohibitively expensive to symbolically expand ${\displaystyle f(x_{1},\ldots ,x_{n})}$ to its sum-of-monomial form.

Here is a very simple randomized algorithm, due to Schwartz and Zippel.

 Randomized algorithm for multi-variate PIT
  fix an arbitrary set ${\displaystyle S\subseteq \mathbb {F} }$ whose size is to be fixed later;
  pick ${\displaystyle r_{1},r_{2},\ldots ,r_{n}\in S}$ uniformly and independently at random;
  if ${\displaystyle f({\vec {r}})=f(r_{1},r_{2},\ldots ,r_{n})=0}$ then return “yes” else return “no”;

This algorithm requires only the evaluation of ${\displaystyle f}$ at a single point ${\displaystyle {\vec {r}}}$. And if ${\displaystyle f\equiv 0}$ it is always correct.

In the Theorem below, we’ll see that if ${\displaystyle f\not \equiv 0}$ then the algorithm is incorrect with probability at most ${\displaystyle {\frac {d}{|S|}}}$, where ${\displaystyle d}$ is the degree of the polynomial ${\displaystyle f}$.
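As an illustration, the test can be applied to the Vandermonde identity ${\displaystyle \det(M)=\prod _{j<i}(x_{i}-x_{j})}$ by checking that the difference of the two sides evaluates to zero at random points. The sketch assumes ${\displaystyle \mathbb {F} =\mathbb {Z} _{p}}$ with the Mersenne prime ${\displaystyle p=2^{31}-1}$ and ${\displaystyle S=\mathbb {F} }$; both choices, and the names `det_mod`, `vandermonde_gap` and `looks_zero`, are assumptions made for the example.

```python
import random

P = 2_147_483_647            # the Mersenne prime 2^31 - 1 (assumed field Z_P)

def det_mod(M, p=P):
    """Determinant of a square matrix modulo a prime p, by Gaussian elimination."""
    M = [[x % p for x in row] for row in M]
    n, det = len(M), 1
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c]), None)
        if pivot is None:
            return 0                          # a zero column: singular matrix
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]   # a row swap flips the sign
            det = -det
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)             # modular inverse (Python 3.8+)
        for i in range(c + 1, n):
            factor = M[i][c] * inv % p
            for j in range(c, n):
                M[i][j] = (M[i][j] - factor * M[c][j]) % p
    return det % p

def vandermonde_gap(xs, p=P):
    """det(M) - prod_{j<i} (x_i - x_j) modulo p, where M_ij = x_i^(j-1)."""
    n = len(xs)
    M = [[pow(x, j, p) for j in range(n)] for x in xs]
    prod = 1
    for i in range(n):
        for j in range(i):
            prod = prod * (xs[i] - xs[j]) % p
    return (det_mod(M, p) - prod) % p

def looks_zero(f, n, p=P, rounds=5, rng=random):
    """Random-evaluation test with S = Z_p: True stands for "yes"."""
    return all(f([rng.randrange(p) for _ in range(n)]) == 0 for _ in range(rounds))

print(looks_zero(vandermonde_gap, 5))   # True, since the identity holds
```

Neither side of the identity is ever expanded into monomials; each round costs one determinant and one product of ${\displaystyle n(n-1)/2}$ factors.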

 Schwartz-Zippel Theorem Let ${\displaystyle f\in \mathbb {F} [x_{1},x_{2},\ldots ,x_{n}]}$ be a multivariate polynomial of degree ${\displaystyle d}$ over a field ${\displaystyle \mathbb {F} }$ such that ${\displaystyle f\not \equiv 0}$. Fix any finite set ${\displaystyle S\subset \mathbb {F} }$, and let ${\displaystyle r_{1},r_{2}\ldots ,r_{n}}$ be chosen uniformly and independently at random from ${\displaystyle S}$. Then ${\displaystyle \Pr[f(r_{1},r_{2},\ldots ,r_{n})=0]\leq {\frac {d}{|S|}}.}$
Proof.
 We prove the theorem by induction on ${\displaystyle n}$, the number of variables. For ${\displaystyle n=1}$, assuming that ${\displaystyle f\not \equiv 0}$, the degree-${\displaystyle d}$ polynomial ${\displaystyle f(x)}$ has at most ${\displaystyle d}$ roots, as does any non-zero degree-${\displaystyle d}$ polynomial over a field, thus ${\displaystyle \Pr[f(r)=0]\leq {\frac {d}{|S|}}.}$ Assume the induction hypothesis for multi-variate polynomials with up to ${\displaystyle n-1}$ variables. An ${\displaystyle n}$-variate polynomial ${\displaystyle f(x_{1},x_{2},\ldots ,x_{n})}$ can be represented as ${\displaystyle f(x_{1},x_{2},\ldots ,x_{n})=\sum _{i=0}^{k}x_{n}^{i}f_{i}(x_{1},x_{2},\ldots ,x_{n-1})}$, where ${\displaystyle k}$ is the largest exponent of ${\displaystyle x_{n}}$ that appears in ${\displaystyle f}$, which means that the degree of ${\displaystyle f_{k}}$ is at most ${\displaystyle d-k}$ and ${\displaystyle f_{k}\not \equiv 0}$. In particular, we write ${\displaystyle f}$ as a sum of two parts: ${\displaystyle f(x_{1},x_{2},\ldots ,x_{n})=x_{n}^{k}f_{k}(x_{1},x_{2},\ldots ,x_{n-1})+{\bar {f}}(x_{1},x_{2},\ldots ,x_{n})}$, where both ${\displaystyle f_{k}}$ and ${\displaystyle {\bar {f}}}$ are polynomials such that, as argued above, the degree of ${\displaystyle f_{k}}$ is at most ${\displaystyle d-k}$ and ${\displaystyle f_{k}\not \equiv 0}$; and ${\displaystyle {\bar {f}}(x_{1},x_{2},\ldots ,x_{n})=\sum _{i=0}^{k-1}x_{n}^{i}f_{i}(x_{1},x_{2},\ldots ,x_{n-1})}$, thus ${\displaystyle {\bar {f}}(x_{1},x_{2},\ldots ,x_{n})}$ has no ${\displaystyle x_{n}^{k}}$ factor in any term.
By the law of total probability, it holds that ${\displaystyle {\begin{aligned}&\Pr[f(r_{1},r_{2},\ldots ,r_{n})=0]\\=&\Pr[f({\vec {r}})=0\mid f_{k}(r_{1},r_{2},\ldots ,r_{n-1})=0]\cdot \Pr[f_{k}(r_{1},r_{2},\ldots ,r_{n-1})=0]\\&+\Pr[f({\vec {r}})=0\mid f_{k}(r_{1},r_{2},\ldots ,r_{n-1})\neq 0]\cdot \Pr[f_{k}(r_{1},r_{2},\ldots ,r_{n-1})\neq 0].\end{aligned}}}$ Note that ${\displaystyle f_{k}}$ is a polynomial in ${\displaystyle n-1}$ variables of degree at most ${\displaystyle d-k}$ such that ${\displaystyle f_{k}\not \equiv 0}$. By the induction hypothesis, we have ${\displaystyle {\begin{aligned}(*)&\qquad &\Pr[f_{k}(r_{1},r_{2},\ldots ,r_{n-1})=0]\leq {\frac {d-k}{|S|}}.\end{aligned}}}$ For the second case, recall that ${\displaystyle {\bar {f}}(x_{1},\ldots ,x_{n})}$ has no ${\displaystyle x_{n}^{k}}$ factor in any term, thus the condition ${\displaystyle f_{k}(r_{1},r_{2},\ldots ,r_{n-1})\neq 0}$ guarantees that ${\displaystyle f(r_{1},\ldots ,r_{n-1},x_{n})=x_{n}^{k}f_{k}(r_{1},r_{2},\ldots ,r_{n-1})+{\bar {f}}(r_{1},r_{2},\ldots ,r_{n-1},x_{n})=g_{r_{1},\ldots ,r_{n-1}}(x_{n})}$ is a single-variate polynomial such that the degree of ${\displaystyle g_{r_{1},\ldots ,r_{n-1}}(x_{n})}$ is ${\displaystyle k}$ and ${\displaystyle g_{r_{1},\ldots ,r_{n-1}}\not \equiv 0}$, for which we already know that the probability that ${\displaystyle g_{r_{1},\ldots ,r_{n-1}}(r_{n})=0}$ is at most ${\displaystyle {\frac {k}{|S|}}}$. Therefore, ${\displaystyle {\begin{aligned}(**)&\qquad &\Pr[f({\vec {r}})=0\mid f_{k}(r_{1},r_{2},\ldots ,r_{n-1})\neq 0]=\Pr[g_{r_{1},\ldots ,r_{n-1}}(r_{n})=0\mid f_{k}(r_{1},r_{2},\ldots ,r_{n-1})\neq 0]\leq {\frac {k}{|S|}}.\end{aligned}}}$ Substituting both ${\displaystyle (*)}$ and ${\displaystyle (**)}$ back into the total probability, and bounding the two remaining factors by 1, we have ${\displaystyle \Pr[f(r_{1},r_{2},\ldots ,r_{n})=0]\leq {\frac {d-k}{|S|}}+{\frac {k}{|S|}}={\frac {d}{|S|}},}$ which proves the theorem.
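In the above proof, for the second case that ${\displaystyle f_{k}(r_{1},\ldots ,r_{n-1})\neq 0}$, we use a "probabilistic argument" to deal with the random choices in the condition. Here we give a more rigorous proof by enumerating all elementary events in applying the law of total probability. You can make your own judgement about which proof is better. By the law of total probability, ${\displaystyle {\begin{aligned}&\Pr[f({\vec {r}})=0]\\=&\sum _{x_{1},\ldots ,x_{n-1}\in S}\Pr[f({\vec {r}})=0\mid \forall i<n,r_{i}=x_{i}]\cdot \Pr[\forall i<n,r_{i}=x_{i}]\\=&\sum _{x_{1},\ldots ,x_{n-1}\in S \atop f_{k}(x_{1},\ldots ,x_{n-1})=0}\Pr[f({\vec {r}})=0\mid \forall i<n,r_{i}=x_{i}]\cdot \Pr[\forall i<n,r_{i}=x_{i}]\\&+\sum _{x_{1},\ldots ,x_{n-1}\in S \atop f_{k}(x_{1},\ldots ,x_{n-1})\neq 0}\Pr[f({\vec {r}})=0\mid \forall i<n,r_{i}=x_{i}]\cdot \Pr[\forall i<n,r_{i}=x_{i}]\\\leq &\sum _{x_{1},\ldots ,x_{n-1}\in S \atop f_{k}(x_{1},\ldots ,x_{n-1})=0}\Pr[\forall i<n,r_{i}=x_{i}]\\&+\sum _{x_{1},\ldots ,x_{n-1}\in S \atop f_{k}(x_{1},\ldots ,x_{n-1})\neq 0}\Pr[f(x_{1},\ldots ,x_{n-1},r_{n})=0\mid \forall i<n,r_{i}=x_{i}]\cdot \Pr[\forall i<n,r_{i}=x_{i}]\\=&\Pr[f_{k}(r_{1},\ldots ,r_{n-1})=0]+\sum _{x_{1},\ldots ,x_{n-1}\in S \atop f_{k}(x_{1},\ldots ,x_{n-1})\neq 0}\Pr[f(x_{1},\ldots ,x_{n-1},r_{n})=0]\cdot \Pr[\forall i<n,r_{i}=x_{i}],\end{aligned}}}$ where the last step uses that ${\displaystyle r_{n}}$ is independent of ${\displaystyle r_{1},\ldots ,r_{n-1}}$. We have argued that ${\displaystyle f_{k}\not \equiv 0}$ and the degree of ${\displaystyle f_{k}}$ is at most ${\displaystyle d-k}$. By the induction hypothesis, we have ${\displaystyle \Pr[f_{k}(r_{1},\ldots ,r_{n-1})=0]\leq {\frac {d-k}{|S|}}.}$ And for every fixed ${\displaystyle x_{1},\ldots ,x_{n-1}\in S}$ such that ${\displaystyle f_{k}(x_{1},\ldots ,x_{n-1})\neq 0}$, we have argued that ${\displaystyle f(x_{1},\ldots ,x_{n-1},x_{n})}$ is a polynomial in ${\displaystyle x_{n}}$ of degree ${\displaystyle k}$, thus ${\displaystyle \Pr[f(x_{1},\ldots ,x_{n-1},r_{n})=0]\leq {\frac {k}{|S|}},}$ which holds for all ${\displaystyle x_{1},\ldots ,x_{n-1}\in S}$ such that ${\displaystyle f_{k}(x_{1},\ldots ,x_{n-1})\neq 0}$, therefore the weighted average satisfies ${\displaystyle \sum _{x_{1},\ldots ,x_{n-1}\in S \atop f_{k}(x_{1},\ldots ,x_{n-1})\neq 0}\Pr[f(x_{1},\ldots ,x_{n-1},r_{n})=0]\cdot \Pr[\forall i<n,r_{i}=x_{i}]\leq {\frac {k}{|S|}}.}$ Substituting these inequalities back into the total probability, we have ${\displaystyle \Pr[f({\vec {r}})=0]\leq {\frac {d-k}{|S|}}+{\frac {k}{|S|}}={\frac {d}{|S|}}.}$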
${\displaystyle \square }$
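On tiny instances the bound ${\displaystyle d/|S|}$ can be compared with an exact count of zeros. The sketch below enumerates ${\displaystyle S^{n}}$ for two hand-picked polynomials of total degree 2 over the integers with ${\displaystyle S=\{0,1,2,3,4\}}$; the function name `zero_fraction` and both polynomials are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def zero_fraction(f, n, S):
    """Exact Pr[f(r_1, ..., r_n) = 0] for r_i uniform and independent in S."""
    points = list(product(S, repeat=n))
    zeros = sum(f(*pt) == 0 for pt in points)
    return Fraction(zeros, len(points))

S = range(5)
f = lambda x, y: (x - 1) * (y - 2)    # zero when x = 1 or y = 2
g = lambda x, y: x * (x - 1)          # zero when x is 0 or 1

print(zero_fraction(f, 2, S))   # 9/25, below d/|S| = 2/5
print(zero_fraction(g, 2, S))   # 2/5, so the bound is attained
```

The second polynomial shows that the bound of the theorem cannot be improved in general.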

Fingerprinting

Freivalds' algorithm and the Schwartz-Zippel theorem can be abstracted as the following procedure: Suppose we want to compare two items ${\displaystyle Z_{1}}$ and ${\displaystyle Z_{2}}$. Instead of comparing them directly, we compute random fingerprints ${\displaystyle \mathrm {FING} (Z_{1})}$ and ${\displaystyle \mathrm {FING} (Z_{2})}$ of them and compare the fingerprints. The fingerprints have the following properties:

• ${\displaystyle \mathrm {FING} (\cdot )}$ is a function, so if ${\displaystyle Z_{1}=Z_{2}}$ then ${\displaystyle \mathrm {FING} (Z_{1})=\mathrm {FING} (Z_{2})}$.
• If ${\displaystyle Z_{1}\neq Z_{2}}$ then ${\displaystyle \Pr[\mathrm {FING} (Z_{1})=\mathrm {FING} (Z_{2})]}$ is small.
• It is much easier to compute and compare the fingerprints than to compare ${\displaystyle Z_{1}}$ and ${\displaystyle Z_{2}}$ directly.

In Freivalds' algorithm, the items to compare are two ${\displaystyle n\times n}$ matrices ${\displaystyle AB}$ and ${\displaystyle C}$, and given an ${\displaystyle n\times n}$ matrix ${\displaystyle M}$, its random fingerprint is computed as ${\displaystyle \mathrm {FING} (M)=Mr}$ for a uniformly random ${\displaystyle r\in \{0,1\}^{n}}$.

In the Schwartz-Zippel theorem, the items to compare are two polynomials ${\displaystyle P_{1}(x_{1},\ldots ,x_{n})}$ and ${\displaystyle P_{2}(x_{1},\ldots ,x_{n})}$, and given a polynomial ${\displaystyle Q(x_{1},\ldots ,x_{n})}$, its random fingerprint is computed as ${\displaystyle \mathrm {FING} (Q)=Q(r_{1},\ldots ,r_{n})}$ for ${\displaystyle r_{i}}$ chosen independently and uniformly at random from some fixed set ${\displaystyle S}$.

For different problems, we may have different definitions of ${\displaystyle \mathrm {FING} (\cdot )}$.

Communication complexity revisited

Now consider again the communication model where the two players Alice with a private input ${\displaystyle x\in \{0,1\}^{n}}$ and Bob with a private input ${\displaystyle y\in \{0,1\}^{n}}$ together compute a function ${\displaystyle f(x,y)}$ by running a communication protocol.

We still consider the communication protocols for the equality function EQ

${\displaystyle \mathrm {EQ} (x,y)={\begin{cases}1&{\mbox{if }}x=y,\\0&{\mbox{otherwise.}}\end{cases}}}$

With the language of fingerprinting, this communication problem can be solved by the following generic scheme:

• Alice chooses a random fingerprint function ${\displaystyle \mathrm {FING} (\cdot )}$ and computes the fingerprint of her input ${\displaystyle \mathrm {FING} (x)}$;
• Alice sends both the description of ${\displaystyle \mathrm {FING} (\cdot )}$ and the value of ${\displaystyle \mathrm {FING} (x)}$ to Bob;
• Bob computes ${\displaystyle \mathrm {FING} (y)}$ and checks whether ${\displaystyle \mathrm {FING} (x)=\mathrm {FING} (y)}$.

In this way we have a randomized communication protocol for the equality function EQ with one-sided error (false positives only). The communication cost and the error probability both reduce to the question of how to design the random fingerprint function ${\displaystyle \mathrm {FING} (\cdot )}$ so as to guarantee:

1. A random ${\displaystyle \mathrm {FING} (\cdot )}$ can be described succinctly.
2. The range of ${\displaystyle \mathrm {FING} (\cdot )}$ is small, so the fingerprints are succinct.
3. If ${\displaystyle x\neq y}$, the probability ${\displaystyle \Pr[\mathrm {FING} (x)=\mathrm {FING} (y)]}$ is small.

In the above application of single-variate PIT, we know that ${\displaystyle \mathrm {FING} (x)=\sum _{i=1}^{n}x_{i}r^{i}}$, where ${\displaystyle r}$ is a random element from a finite field and the additions and multiplications are defined over the finite field, is a good fingerprint function. Now we introduce another fingerprint and hence a new communication protocol.

The new fingerprint function is designed as follows: treating the input string ${\displaystyle x\in \{0,1\}^{n}}$ as the binary representation of a number, let ${\displaystyle \mathrm {FING} (x)=x{\bmod {p}}}$ for some random prime ${\displaystyle p}$. The prime ${\displaystyle p}$ uniquely specifies the random fingerprint function ${\displaystyle \mathrm {FING} (\cdot )}$, so it can be used as a description of the function, and the range of the fingerprints is ${\displaystyle [p]}$. We therefore want the prime ${\displaystyle p}$ to be reasonably small, while still having a good chance of distinguishing different ${\displaystyle x}$ and ${\displaystyle y}$ after reduction modulo ${\displaystyle p}$.

 A randomized protocol for EQ
  Alice does:
    for some parameter ${\displaystyle k}$ (to be specified), choose uniformly at random a prime ${\displaystyle p\in [k]}$;
    send ${\displaystyle p}$ and ${\displaystyle x{\bmod {p}}}$ to Bob;
  Upon receiving ${\displaystyle p}$ and ${\displaystyle x{\bmod {p}}}$, Bob does:
    check whether ${\displaystyle x{\bmod {p}}=y{\bmod {p}}}$.

The number of bits to be communicated is ${\displaystyle O(\log k)}$. We then bound the probability of error ${\displaystyle \Pr[x{\bmod {p}}=y{\bmod {p}}]}$ for ${\displaystyle x\neq y}$, in terms of ${\displaystyle k}$.

Suppose without loss of generality ${\displaystyle x>y}$. Let ${\displaystyle z=x-y}$. Then ${\displaystyle z<2^{n}}$ since ${\displaystyle x,y\in [2^{n}]}$, and ${\displaystyle z\neq 0}$ since ${\displaystyle x\neq y}$. It holds that ${\displaystyle x{\bmod {p}}=y{\bmod {p}}}$ if and only if ${\displaystyle z}$ is divisible by ${\displaystyle p}$. We only need to bound the probability

${\displaystyle \Pr[z{\bmod {p}}=0]}$ for ${\displaystyle 0<z<2^{n}}$, where ${\displaystyle p}$ is a random prime chosen from ${\displaystyle [k]}$.

Since ${\displaystyle p}$ is uniform over the primes in ${\displaystyle [k]}$, the probability ${\displaystyle \Pr[z{\bmod {p}}=0]}$ is bounded as

${\displaystyle \Pr[z{\bmod {p}}=0]\leq {\frac {{\mbox{the number of prime divisors of }}z}{{\mbox{the number of primes in }}[k]}}}$.

For the numerator, we have the following lemma.

 Lemma The number of distinct prime divisors of any natural number less than ${\displaystyle 2^{n}}$ is at most ${\displaystyle n}$.
Proof.
 Each prime number is ${\displaystyle \geq 2}$. If an ${\displaystyle N>0}$ has more than ${\displaystyle n}$ distinct prime divisors, then ${\displaystyle N\geq 2^{n}}$.
${\displaystyle \square }$

Due to this lemma, ${\displaystyle z}$ has at most ${\displaystyle n}$ prime divisors.
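The lemma is far from tight for small ${\displaystyle n}$, which a quick exhaustive check makes visible. The sketch below uses trial division; the function name `distinct_prime_divisors` and the choice ${\displaystyle n=10}$ are illustrative.

```python
def distinct_prime_divisors(z):
    """Return the distinct prime divisors of z >= 1 by trial division."""
    primes, q = [], 2
    while q * q <= z:
        if z % q == 0:
            primes.append(q)
            while z % q == 0:
                z //= q
        q += 1
    if z > 1:
        primes.append(z)      # whatever remains is itself prime
    return primes

n = 10
worst = max(len(distinct_prime_divisors(z)) for z in range(1, 2 ** n))
print(worst)   # 4, e.g. 210 = 2 * 3 * 5 * 7, well below n = 10
```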

We then lower bound the number of primes in ${\displaystyle [k]}$. This is given by the celebrated Prime Number Theorem (PNT).

 Prime Number Theorem Let ${\displaystyle \pi (k)}$ denote the number of primes less than ${\displaystyle k}$. Then ${\displaystyle \pi (k)\sim {\frac {k}{\ln k}}}$ as ${\displaystyle k\rightarrow \infty }$.

Therefore, by choosing ${\displaystyle k=tn\ln(tn)}$ for some ${\displaystyle t}$, we have that for any ${\displaystyle 0<z<2^{n}}$ and a random prime ${\displaystyle p\in [k]}$,

${\displaystyle \Pr[z{\bmod {p}}=0]\leq {\frac {n}{\pi (k)}}\sim {\frac {1}{t}}}$.
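To see where the ${\displaystyle {\frac {1}{t}}}$ comes from, one can substitute ${\displaystyle k=tn\ln(tn)}$ into the Prime Number Theorem estimate. The following is a sketch that ignores lower-order terms and is valid as ${\displaystyle tn\rightarrow \infty }$:

${\displaystyle \pi (k)\sim {\frac {k}{\ln k}}={\frac {tn\ln(tn)}{\ln(tn)+\ln \ln(tn)}}\sim tn,\qquad {\mbox{so}}\qquad {\frac {n}{\pi (k)}}\sim {\frac {n}{tn}}={\frac {1}{t}}.}$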

By taking ${\displaystyle t}$ polynomial in ${\displaystyle n}$, we can make this error probability polynomially small while the number of bits to be communicated is still ${\displaystyle O(\log k)=O(\log n)}$.
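The protocol can be simulated as follows. This is a sketch with several assumptions: inputs are Python strings of '0' and '1', the random prime is drawn from a sieve of all primes up to ${\displaystyle k=\lceil tn\ln(tn)\rceil }$, the default ${\displaystyle t=1000}$ is arbitrary, and the names `primes_up_to` and `eq_mod_prime` are hypothetical.

```python
import math
import random

def primes_up_to(k):
    """Sieve of Eratosthenes: all primes in {2, ..., k}."""
    sieve = [True] * (k + 1)
    sieve[0:2] = [False, False]
    for q in range(2, math.isqrt(k) + 1):
        if sieve[q]:
            for multiple in range(q * q, k + 1, q):
                sieve[multiple] = False
    return [q for q, is_p in enumerate(sieve) if is_p]

def eq_mod_prime(x_bits, y_bits, t=1000, rng=random):
    """True stands for "yes" (x = y); the false-positive rate is roughly 1/t."""
    n = len(x_bits)
    k = math.ceil(t * n * math.log(t * n))
    p = rng.choice(primes_up_to(k))           # Alice's random prime in [k]
    fingerprint = int(x_bits, 2) % p          # Alice sends (p, fingerprint)
    return fingerprint == int(y_bits, 2) % p  # Bob compares with y mod p

print(eq_mod_prime("10110011", "10110011"))   # True
```

Sieving all of ${\displaystyle [k]}$ is a convenience for the demonstration; it takes time about ${\displaystyle k}$, which is harmless here since only communication is being measured.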

Randomized pattern matching

Consider the following problem of pattern matching, which has nothing to do with communication complexity.

• Input: a string ${\displaystyle x\in \{0,1\}^{n}}$ and a "pattern" ${\displaystyle y\in \{0,1\}^{m}}$.
• Determine whether the pattern ${\displaystyle y}$ is a contiguous substring of ${\displaystyle x}$. Usually, we are also asked to find the location of the substring.

A naive algorithm trying every possible match runs in ${\displaystyle O(nm)}$ time. The more sophisticated KMP algorithm inspired by automaton theory runs in ${\displaystyle O(n+m)}$ time.

A simple randomized algorithm, due to Karp and Rabin, uses the idea of fingerprinting and also runs in ${\displaystyle O(n+m)}$ time.

Let ${\displaystyle X(j)=x_{j}x_{j+1}\cdots x_{j+m-1}}$ denote the substring of ${\displaystyle x}$ of length ${\displaystyle m}$ starting at position ${\displaystyle j}$.

 Algorithm (Karp-Rabin)
  pick a random prime ${\displaystyle p\in [k]}$;
  for ${\displaystyle j=1}$ to ${\displaystyle n-m+1}$ do
    if ${\displaystyle X(j){\bmod {p}}=y{\bmod {p}}}$ then report a match;
  return "no match";

So the algorithm just compares the ${\displaystyle \mathrm {FING} (X(j))}$ and ${\displaystyle \mathrm {FING} (y)}$ for every ${\displaystyle j}$, with the same definition of fingerprint function ${\displaystyle \mathrm {FING} (\cdot )}$ as in the communication protocol for EQ.

By the same analysis, by choosing ${\displaystyle k=n^{2}m\ln(n^{2}m)}$, the probability of a single false match is

${\displaystyle \Pr[X(j){\bmod {p}}=y{\bmod {p}}\mid X(j)\neq y]=O\left({\frac {1}{n^{2}}}\right)}$.

By the union bound, the probability that a false match occurs is ${\displaystyle O\left({\frac {1}{n}}\right)}$.

The algorithm runs in linear time if we assume that we can compute ${\displaystyle X(j){\bmod {p}}}$ for each ${\displaystyle j}$ in constant time. This outrageous assumption can be made realistic by the following observation.

 Lemma Let ${\displaystyle \mathrm {FING} (a)=a{\bmod {p}}}$. ${\displaystyle \mathrm {FING} (X(j+1))\equiv 2(\mathrm {FING} (X(j))-2^{m-1}x_{j})+x_{j+m}{\pmod {p}}\,}$.
Proof.
 It holds that ${\displaystyle X(j+1)=2(X(j)-2^{m-1}x_{j})+x_{j+m}\,}$. So the equation holds on the finite field modulo ${\displaystyle p}$.
${\displaystyle \square }$

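Due to this lemma, each fingerprint ${\displaystyle \mathrm {FING} (X(j))}$ can be computed in an incremental way, each in constant time, once ${\displaystyle 2^{m-1}{\bmod {p}}}$ has been precomputed and ${\displaystyle \mathrm {FING} (X(1))}$ and ${\displaystyle \mathrm {FING} (y)}$ have been computed in ${\displaystyle O(m)}$ time. The running time of the algorithm is ${\displaystyle O(n+m)}$.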
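The incremental computation can be sketched as follows. The code takes the prime ${\displaystyle p}$ as an argument rather than drawing it at random from ${\displaystyle [k]}$, and it reports every position whose fingerprint agrees with that of the pattern instead of stopping at the first one; both are simplifications for the example, and the name `karp_rabin` is hypothetical.

```python
def karp_rabin(x, y, p):
    """Return the 1-based positions j with FING(X(j)) = FING(y).

    x and y are lists of bits; p is the prime defining FING(a) = a mod p.
    """
    n, m = len(x), len(y)
    if m > n:
        return []
    top = pow(2, m - 1, p)                # 2^(m-1) mod p, computed once
    fy = fx = 0
    for i in range(m):                    # FING(y) and FING(X(1)) in O(m)
        fy = (2 * fy + y[i]) % p
        fx = (2 * fx + x[i]) % p
    matches = []
    for j in range(n - m + 1):            # j is the 0-based window start
        if fx == fy:
            matches.append(j + 1)
        if j + m < n:                     # slide: drop x[j], append x[j+m]
            fx = (2 * (fx - top * x[j]) + x[j + m]) % p
    return matches

x = [1, 0, 1, 1, 0, 1, 1, 0]
y = [1, 1, 0]
print(karp_rabin(x, y, 2_147_483_647))   # [3, 6]
print(karp_rabin(x, y, 3))               # [2, 3, 5, 6]
```

With the large prime every window value is below ${\displaystyle p}$, so the fingerprints are exact and only the true occurrences at positions 3 and 6 are reported. With ${\displaystyle p=3}$ the windows 011 (value 3) and 110 (value 6) collide, which produces the false matches at positions 2 and 5.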