随机算法 (Fall 2011)/Identity checking and W.D. Hamilton: Difference between pages

From TCS Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>Etone
(Created page with "Many applications in Computer Science require to efficiently check whether two complex objects are identical, while the objects are presented implicitly (e.g., as black-boxes). W…")
 
imported>Macdonald-ross
(prose...)
 
Line 1: Line 1:
Many applications in Computer Science require to efficiently check whether two complex objects are identical, while the objects are presented implicitly (e.g., as black-boxes). We consider two examples. One is to check the result of multiplying two matrices, and the other is to check the identity of two polynomials.
'''William Donald Hamilton''' [[Royal Society|FRS]] (1 August 1936 – 7 March 2000) was an [[English people|English]] [[evolutionary biology|evolutionary biologist]] whom [[Richard Dawkins]] praised as one of the greatest [[evolution]]ary theorists of the 20th century.<ref>[http://www.edge.org/3rd_culture/hamilton/hamilton_index.html Obituary by Richard Dawkins – ''The Independent'' – 10 March 2000]</ref>


== Example: Checking matrix multiplication ==
Hamilton became famous through his [[Theory|theoretical]] work on [[kin selection]] and [[altruism]]. He explained its [[Genetics|genetic]] basis, and this was a key part of the gene-centered view of [[evolution]]. In doing this, he became one of the forerunners of [[sociobiology]], as popularized by [[E.O. Wilson]]. Hamilton was certainly a big influence on Dawkins. He also published important work on [[sex ratio]]s and the [[evolution of sex]]. From 1984 to his death in 2000, he was the [[Royal Society]] Research Professor at [[Oxford University]]. He died of [[malaria]] contracted in the [[Democratic Republic of the Congo]].
Consider the following problem:
* Given as the input three <math>n\times n</math> matrices <math>A,B</math> and <math>C</math>,
* check whether <math>C=AB</math>.


We could compute <math>AB</math> and compare the result to <math>C</math>. The time complexity of fastest matrix multiplication algorithm (in theory) is <math>O(n^{2.376})</math>, and so is the time complexity of this method.
== Hamilton's equation ==
Hamilton's equation describes whether or not a gene for altruistic behaviour will spread in a population.<ref>Hamilton W.D. 1996. ''Narrow roads of geneland: the collected papers of W.D. Hamilton'', vol 1. Freeman, Oxford.</ref> The gene will spread if '''r'''x'''b''' is greater than '''c''':
:<math>rb > c \ </math>    
where:
* <math>c \ </math> is the reproductive cost to the altruist,
* <math>b \ </math> is the reproductive benefit to the recipient of the altruistic behavior, and
* <math>r \ </math> is the probability, above the population average, of the individuals sharing an altruistic gene – the "degree of relatedness".


Here’s a very simple randomized algorithm, due to Freivalds, that runs in only <math>O(n^2)</math> time:
== Collected papers ==
Hamilton started to publish his collected papers starting in 1996, with short essays giving each paper context. He died after the preparation of the second volume, so the commentaries for the third volume came from his coauthors.


{{Theorem|Algorithm (Freivalds)|
* Hamilton W.D. 1996. ''Narrow roads of gene land vol. 1: Evolution of social behaviour''. Freeman, Oxford. ISBN 0-7167-4530-5
*pick a vector <math>r \in\{0, 1\}^n</math> uniformly at random;
* Hamilton W.D. 2002. ''Narrow roads of gene land vol. 2: Evolution of sex''. Oxford University Press, Oxford. ISBN 0-19-850336-9
*if <math>A(Br) = Cr</math> then return "yes" else return "no";
* Hamilton W.D. 2005. ''Narrow roads of gene land, vol. 3: Last words'' (with essays by coauthors, ed. M. Ridley). Oxford University Press, Oxford. ISBN 0-19-856690-5
}}
The running time of the algorithm is <math>O(n^2)</math> because it does only 3 matrix-vector multiplications.


If <math>AB=C</math> then <math>A(Br) = Cr</math> for any <math>r \in\{0, 1\}^n</math>, thus the algorithm always returns "yes". But if <math>AB \neq C</math> then the algorithm will make a mistake if it happens to choose an <math>r</math> for which <math>ABr = Cr</math>. This, however, is unlikely:
== References ==
{{Reflist}}


{{Theorem|Lemma|
{{DEFAULTSORT:Hamilton, William Donald}}
:If <math>AB\neq C</math> then for a uniformly random <math>r \in\{0, 1\}^n</math>,
[[Category:1936 births]]
::<math>\Pr[ABr = Cr]\le \frac{1}{2}</math>.
[[Category:2000 deaths]]
}}
[[Category:English mathematicians]]
{{Proof| Let <math>D=AB-C</math>. We will show that if <math>D\neq \boldsymbol{0}</math>, then <math>\Pr[Dr = \boldsymbol{0}]\le \frac{1}{2}</math> for a uniform random <math>r \in\{0, 1\}^n</math>.
[[Category:Geneticists]]
 
[[Category:English evolutionary biologists]]
Since <math>D\neq \boldsymbol{0}</math>, it must have at least one non-zero entry, say <math>D(i,j)\neq 0</math>. The <math>i</math>-th entry of <math>Dr</math> is
[[Category:Fellows of the Royal Society]]
:<math>(Dr)_i=\sum_{k=1}^nD(i,k)r_k</math>.
If to the contrary, <math>Dr=\boldsymbol{0}</math>, then <math>(Dr)_i=\sum_{k=1}^nD(i,k)r_k=0</math>, which is equivalent to that
:<math>r_j=-\frac{1}{D(i,j)}\sum_{k\neq j}^nD(i,k)r_k</math>,
i.e. once <math>r_k</math> for <math>k\neq j</math> have been chosen, there is only '''one''' value of <math>r_j</math> that would give us a zero <math>Dr</math>. However, there are '''two''' possible values <math>\{0,1\}</math> for <math>r_j</math> which are equal-probable, so with at least <math>\frac{1}{2}</math> probability, the choice of <math>r</math> fails to give us a zero <math>Dr</math>.
}}
 
== Example: Checking polynomial identities ==
Consider the following problem:
* Given as the input two multivariate polynomials <math>P_1(x_1,\ldots,x_n)</math> and <math>P_2(x_1,\ldots,x_n)</math>,
* check whether the two polynomials are identical, denoted <math>P_1\equiv P_2</math>.
 
Obviously, if <math>P_1, P_2</math> are written out explicitly, the question is trivially answered in linear time just by comparing their coefficients. But in practice they are usually given in very compact form (e.g., as determinants of matrices), so that we can evaluate them efficiently, but expanding them out and looking at their coefficients is out of the question.
 
{{Theorem|Example|
Consider the polynomial
:<math>
P(x_1,\ldots,x_n)=\prod_{\overset{i<j}{i,j\neq 1}}(x_i-x_j)-\prod_{\overset{i<j}{i,j\neq 2}}(x_i-x_j)+\prod_{\overset{i<j}{i,j\neq 3}}(x_i-x_j)-\cdots+(-1)^{n-1}\prod_{\overset{i<j}{i,j\neq n}}(x_i-x_j)
</math>
Show that evaluating <math>P</math> at any given point can be done efficiently, but that expanding out <math>P</math>
to find all its coefficients is computationally infeasible even for moderate values of <math>n</math>.
}}
 
Here is a very simple randomized algorithm, due to Schwartz and Zippel. Testing <math>P_1\equiv P_2</math>
is equivalent to testing <math>P\equiv 0</math>, where <math>P = P_1 - P_2</math>.
 
{{Theorem|Algorithm (Schwartz-Zippel)|
*pick <math>r_1, \ldots , r_n</math> independently and uniformly at random from a set <math>S</math>;
*if <math>P_1(r_1, \ldots , r_n) = P_2(r_1, \ldots , r_n)</math> then return “yes” else return “no”;
}}
 
This algorithm requires only the evaluation of <math>P</math> at a single point. And if <math>P\equiv 0</math> it is always correct.
 
In the Theorem below, we’ll see that if <math>P\neq 0</math> then the algorithm is incorrect with probability
at most <math>\frac{d}{|S|}</math>, where <math>d</math> is the maximum degree of the polynomial <math>P</math>.
 
{{Theorem|Theorem (Schwartz-Zippel)|
: Let <math>Q(x_1,\ldots,x_n)</math> be a multivariate polynomial of degree <math>d</math> defined over a field <math>\mathbb{F}</math>. Fix any finite set <math>S\subset\mathbb{F}</math>, and let <math>r_1,\ldots,r_n</math> be chosen independently and uniformly at random from <math>S</math>. Then
::<math>\Pr[Q(r_1,\ldots,r_n)=0\mid Q\not\equiv 0]\le\frac{d}{|S|}.</math>
}}
{{Proof| The theorem holds if <math>Q</math> is a single-variate polynomial, because a single-variate polynomial <math>Q</math> of degree <math>d</math> has at most <math>d</math> roots, i.e. there are at most <math>d</math> many choices of <math>r</math> having <math>Q(r)=0</math>, so the theorem follows immediately.
 
For multi-variate <math>Q</math>, we prove by induction on the number of variables <math>n</math>.
 
Write <math>Q(x_1,\ldots,x_n)</math> as
:<math>
Q(x_1,\ldots,x_n) = \sum_{i=0}^kx_n^kQ_i(x_1,\ldots,x_{n-1})
</math>
where <math>k</math> is the largest exponent of <math>x_n</math> in <math>Q(x_1,\ldots,x_n)</math>. So <math>Q_k(x_1,\ldots,x_{n-1}) \not\equiv 0</math> by our definition of <math>k</math>, and its degree is at most <math>d-k</math>.
 
Thus by the induction hypothesis we have that <math>\Pr[Q_k(r_1,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}</math>.
 
Conditioning on the event <math>Q_k(r_1,\ldots,r_{n-1})\neq 0</math>, the single-variate polynomial <math>Q'(x_n)=Q(r_1,\ldots,r_{n-1}, x_n)=\sum_{i=0}^kx_n^kQ_i(r_1,\ldots,r_{n-1})</math> has degree <math>k</math> and <math>Q'(x_n)\not\equiv 0</math>, thus
:<math>
\begin{align}
&\quad\,\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\
&=
\Pr[Q'(r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\\
&\le
\frac{k}{|S|}
\end{align}
</math>.
 
Therefore, due to the law of total probability,
:<math>
\begin{align}
&\quad\,\Pr[Q(r_1,\ldots,r_{n})=0]\\
&=
\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]\Pr[Q_k(r_1,\ldots,r_{n-1})\neq 0]\\
&\quad\,\,+\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})= 0]\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\
&\le
\Pr[Q(r_1,\ldots,r_{n})=0\mid Q_k(r_1,\ldots,r_{n-1})\neq 0]+\Pr[Q_k(r_1,\ldots,r_{n-1})= 0]\\
&\le
\frac{k}{|S|}+\frac{d-k}{|S|}\\
&=\frac{d}{|S|}.
\end{align}
</math>
 
}}
 
== The idea of fingerprinting ==
Suppose we want to compare two items <math>Z_1</math> and <math>Z_2</math>. Instead of comparing them directly, we compute random '''fingerprints''' <math>\mathrm{FING}(Z_1)</math> and <math>\mathrm{FING}(Z_2)</math> and compare these. The fingerprints has the following properties:
* <math>\mathrm{FING}(\cdot)</math> is a function, which means that if <math>Z_1= Z_2</math> then <math>\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)</math>.
* If <math>Z_1\neq Z_2</math> then <math>\Pr[\mathrm{FING}(Z_1)=\mathrm{FING}(Z_2)]</math> is small.
* It is much more to compute and compare the fingerprints than to compare <math>Z_1</math> and <math>Z_2</math> directly.
 
For Freivald's algorithm, the items to compare are two <math>n\times n</math> matrices <math>AB</math> and <math>C</math>, and given an <math>n\times n</math> matrix <math>M</math>, its random fingerprint is computed as <math>\mathrm{FING}(M)=Mr</math> for a uniformly random <math>r\in\{0,1\}^n</math>.
 
For the Schwartz-Zippel algorithm, the items to compare are two polynomials <math>P_1(x_1,\ldots,x_n)</math> and <math>P_2(x_1,\ldots,x_n)</math>, and given a polynomial <math>Q(x_1,\ldots,x_n)</math>, its random fingerprint is computed as <math>\mathrm{FING}(Q)=Q(r_1,\ldots,r_n)</math> for <math>r_i</math> chosen independently and uniformly at random from some fixed set <math>S</math>.
 
For different problems, we may have different definitions of <math>\mathrm{FING}(\cdot)</math>.

Latest revision as of 09:43, 22 December 2013

William Donald Hamilton FRS (1 August 1936 – 7 March 2000) was an English evolutionary biologist whom Richard Dawkins praised as one of the greatest evolutionary theorists of the 20th century.[1]

Hamilton became famous through his theoretical work on kin selection and altruism. He explained its genetic basis, and this was a key part of the gene-centered view of evolution. In doing this, he became one of the forerunners of sociobiology, as popularized by E.O. Wilson. Hamilton was certainly a big influence on Dawkins. He also published important work on sex ratios and the evolution of sex. From 1984 to his death in 2000, he was the Royal Society Research Professor at Oxford University. He died of malaria contracted in the Democratic Republic of the Congo.

Hamilton's equation

Hamilton's equation describes whether or not a gene for altruistic behaviour will spread in a population.[2] The gene will spread if rxb is greater than c:

[math]\displaystyle{ rb \gt c \ }[/math]

where:

  • [math]\displaystyle{ c \ }[/math] is the reproductive cost to the altruist,
  • [math]\displaystyle{ b \ }[/math] is the reproductive benefit to the recipient of the altruistic behavior, and
  • [math]\displaystyle{ r \ }[/math] is the probability, above the population average, of the individuals sharing an altruistic gene – the "degree of relatedness".

Collected papers

Hamilton started to publish his collected papers starting in 1996, with short essays giving each paper context. He died after the preparation of the second volume, so the commentaries for the third volume came from his coauthors.

  • Hamilton W.D. 1996. Narrow roads of gene land vol. 1: Evolution of social behaviour. Freeman, Oxford. ISBN 0-7167-4530-5
  • Hamilton W.D. 2002. Narrow roads of gene land vol. 2: Evolution of sex. Oxford University Press, Oxford. ISBN 0-19-850336-9
  • Hamilton W.D. 2005. Narrow roads of gene land, vol. 3: Last words (with essays by coauthors, ed. M. Ridley). Oxford University Press, Oxford. ISBN 0-19-856690-5

References

Template:Reflist

  1. Obituary by Richard Dawkins – The Independent – 10 March 2000
  2. Hamilton W.D. 1996. Narrow roads of geneland: the collected papers of W.D. Hamilton, vol 1. Freeman, Oxford.