组合数学 (Spring 2013)/Existence problems and Fitness: Difference between pages

From TCS Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>Etone
 
imported>Macdonald-ross
mNo edit summary
 
Line 1: Line 1:
== Existence by Counting ==
{{otheruse|biological fitness}}
=== Shannon's circuit lower bound===
'''Fitness''' in [[biology]] is the relative ability of an organism to survive and pass on its [[gene]]s to the next generation.<ref>King R.C. Stansfield W.D. & Mulligan P.K. 2006. ''A dictionary of genetics'', 7th ed. Oxford.</ref><sup>p160</sup> It is a central idea in [[evolution|evolutionary theory]]. Fitness is usually equal to the proportion of the individual's [[gene]]s in all the genes of the next generation.
This is a fundamental problem in in Computer Science.


A '''boolean function''' is a function in the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.
Like all terms in evolutionary biology, fitness is defined in terms of an interbreeding [[Population genetics|population]], which might or might not be a whole [[species]]. If differences in individual genotypes affect fitness, then the frequencies of the genotypes will change over generations; the genotypes with higher fitness become more common. This is the process called [[natural selection]].


[http://en.wikipedia.org/wiki/Boolean_circuit Boolean circuit] is a mathematical model of computation.
An individual's fitness is caused by its [[phenotype]], and passed on by its [[genotype]]. The fitnesses of different individuals with the same genotype are not necessarily equal. It depends on the [[environment]] in which the individuals live, and on accidental [[event]]s. However, since the fitness of the genotype is an [[average]]d quantity, it reflects the reproductive outcomes of ''all'' individuals with that genotype.
Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).


Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.
== Relatedness ==
Fitness measures the number of the ''copies'' of the genes of an individual in the next generation. It doesn't really matter how the genes arrive in the next generation. For an individual, it is equally "beneficial" to reproduce itself, or to help relatives with similar genes to reproduce, ''as long as similar number of copies of individual's genes get passed on to the next generation''. Selection which promotes this kind of helper behaviour is called [[kin selection]].


The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.
Our closest relatives (parents, siblings, and our own children) share on average 50% (half) of our genes. One step further removed are grandparents. With each of them we share on average 25% (a quarter) of our genes. That is a measure of our relatedness to them. Next come first cousins (children of our parents' siblings). We share 12.5% (1/8) of their genes.<ref name=JMS>Maynard Smith, John. 1999. ''Evolutionary genetics''. 2nd ed, Cambridge University Press.</ref><sup>p100</sup>


{{Theorem
=== Hamilton's rule ===
|Theorem (Shannon 1949)|
[[W.D. Hamilton|William Hamilton]] added various ideas to the notion of fitness. His rule suggests that a costly action should be performed if:
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
:<math>C < R \times B </math>   where:
}}
* <math>c \ </math> is the reproductive cost to the altruist,
{{Proof|
* <math>b \ </math> is the reproductive benefit to the recipient of the altruistic behavior, and
We first count the number of boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. There are <math>2^{2^n}</math> boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.
* <math>r \ </math> is the probability, above the population average, of the individuals sharing an altruistic gene – the "degree of relatedness".
Fitness costs and benefits are measured in [[fecundity]].<ref>Hamilton W.D. 1964. The genetical evolution of social behavior. ''Journal of Theoretical Biology'' '''7''' (1): 1–52. doi:10.1016/0022-5193(64)90038-4.</ref>


Then we count the number of boolean circuit with fixed number of gates.
=== Inclusive fitness ===
Fix an integer <math>t</math>, we count the number of circuits with <math>t</math> gates. By the [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of the two types (AND or OR), and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.  
Inclusive fitness is a term which is essentially the same as fitness, but emphasises the group of genes rather than individuals.  


If <math>t=2^n/3n</math>, then
Biological fitness says how well an organism can reproduce, and spread its genes to its offspring. The theory of inclusive fitness says that the fitness of an organism is also increased to the extent that its close relatives also reproduce. This is because relatives share genes in proportion to their relationship.
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math>      thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>


Each boolean circuit computes one boolean function. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuits with <math>2^n/3n</math> gates.
Another way of saying it: ''the inclusive fitness of an organism is not a property of itself, but a property of its set of [[genes]]''. It is calculated from from the reproductive success of the individual, plus the reproductive success of its relatives, each one weighed by an appropriate coefficient of relatedness.<ref>Adapted from Dawkins R. 1982. ''The extended phenotype''. Oxford: Oxford University Press, p186.  ISBN 0-19-288051-9</ref>
}}


Note that by Shannon's theorem, not only there exists a boolean function with exponentially large circuit complexity, but ''almost all'' boolean functions have exponentially large circuit complexity.
== History ==
The [[British]] [[Sociology|social]] [[philosopher]] [[Herbert Spencer]] coined the phrase ''[[survival of the fittest]]'' in his 1864 work ''Principles of biology'' to mean what [[Charles Darwin]] called [[natural selection]].<ref> Herbert Spencer 1864. ''Principles of Biology'' London, vol 1, 444, wrote “This survival of the fittest, which I have here sought to express in mechanical terms, is that which Mr. Darwin has called ‘natural selection’, or the preservation of favoured races in the struggle for life. </ref> The original phrase was "survival of the best fitted".  


=== Double counting ===
== References ==
The double counting principle states the following obvious fact: if the elements of a set are counted in two different ways, the answers are the same.
{{Reflist}}
==== Handshaking lemma ====
The following lemma is a standard demonstration of double counting.
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}


We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hand. The handshaking lemma states that in any undirected graph, the number of vertices whose degrees are odd is even. It is sufficient to show that the sum of odd degrees is even.
[[Category:Classical genetics]]
 
[[Category:Evolutionary biology]]
The handshaking lemma is a direct consequence of the following lemma, which is proved by Euler in his 1736 paper on [http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg Seven Bridges of Königsberg] that began the study of graph theory.
 
{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.
 
First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.
 
On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.
 
It is obvious that the two terms are equal, since we just count the same thing twice with different methods. The lemma follows.
}}
 
The handshaking lemma is implied directly by the above lemma, since the sum of even degrees is even.
==== Sperner's lemma ====
A '''triangulation''' of a triangle <math>abc</math> is a decomposition of <math>abc</math> to small triangles (called ''cells''), such that any two different cells are either disjoint, or share an edge, or a vertex.
 
A '''proper coloring''' of a triangulation of triangle <math>abc</math> is a coloring of all vertices in the triangulation with three colors: <font color=red>red</font>, <font color=blue>blue</font>, and <font color=green>green</font>, such that the following constraints are satisfied:
* The three vertices <math>a,b</math>, and <math>c</math> of the big triangle receive all three colors.
* The vertices in each of the three lines <math>ab</math>, <math>bc</math>, and <math>ac</math> receive two colors.
 
The following figure is an example of a properly colored triangulation.
[[Image:sperner-triangle.png|260px|center]]
 
In 1928 young Emanuel Sperner gave a combinatorial proof of the famous Brouwer's fixed point theorem by proving the following lemma (now called Sperner's lemma), with an extremely elegant proof.
{{Theorem|Sperner's Lemma (1928)|
:For any properly colored triangulation, there exists a cell receiving all three colors.
}}
{{Proof|
The proof is done by appropriately constructing a dual graph of the triangulation.
 
The dual graph is defined as follows:
* Each cell in the triangulation corresponds to a distinct vertex in the dual graph.
* The outer space corresponds to a distinct vertex in the dual graph.
* An edge is added between two vertices in the dual graph if the corresponding cells share a <math>{\color{Red}\mbox{red}}\mbox{--}{\color{Blue}\mbox{blue}}</math> edge.
 
The following is an example of the dual graph of a properly colored triangulation:
[[Image:sperner-dual.png|260px|center]]
 
For vertices in the dual graph:
* If a cell receives all three colors, the corresponding vertex in the dual graph has degree 1;
* if a cell receives only <math>{\color{Red}\mbox{red}}</math> and <math>{\color{Blue}\mbox{blue}}</math>, the corresponding vertex has degree 2;
* for all other cases (the cell is monochromatic, or does not have blue or red), the corresponding vertex has degree 0.
 
Besides, the unique vertex corresponding to the outer space must have odd degree, since the number of <math>{\color{Red}\mbox{red}}\mbox{--}{\color{Blue}\mbox{blue}}</math> transitions between a <math>{\color{Red}\mbox{red}}</math> endpoint and a <math>{\color{Blue}\mbox{blue}}</math> endpoint must be odd.
 
By handshaking lemma, the number of odd-degree vertices in the dual graph is even, thus the number of cells receiving all three colors must be odd, which cannot be zero.
}}
 
== The Pigeonhole Principle ==
The '''pigeonhole principle''' states the following "obvious" fact:
:''<math>n+1</math> pigeons cannot sit in <math>n</math> holes so that every pigeon is alone in its hole.''
This is one of the oldest '''non-constructive''' principles: it states only the ''existence'' of a pigeonhole with more than one pigeons and says nothing about how to ''find'' such a pigeonhole.
 
The general form of pigeonhole principle, also known as the '''averaging principle''', is stated as follows.
{{Theorem|Generalized pigeonhole principle|
:If a set consisting of more than <math>mn</math> objects is partitioned into <math>n</math> classes, then some class receives more than <math>m</math> objects.
}}
 
=== Inevitable divisors ===
The following is one of Erdős' favorite initiation questions to mathematics. The proof uses the Pigeonhole Principle.
 
{{Theorem|Theorem|
:For any subset <math>S\subseteq\{1,2,\ldots,2n\}</math> of size <math>|S|>n\,</math>, there are two numbers <math>a,b\in S</math> such that <math>a|b\,</math>.
}}
{{Proof|
For every odd number <math>m\in\{1,2,\ldots,2n\}</math>, let
:<math>C_m=\{2^km\mid k\ge 0, 2^km\le 2n\}</math>.
It is easy to see that for any <math>b<a</math> from the same <math>C_m</math>, it holds that <math>a|b</math>.
 
Every number <math>a\in S</math> can be uniquely represented as <math>a=2^km</math> for some odd number <math>m</math>, thus belongs to exactly one of <math>C_m</math>, for odd <math>m\in\{1,2,\ldots, 2n\}</math>.  There are <math>n</math> odd numbers in <math>\{1,2,\ldots,2n\}</math>, thus <math>n</math> different <math>C_m</math>, but <math>|S|>n</math>, thus there must exist distinct <math>a,b\in S</math>, supposed that <math>b<a</math>, belonging to the same <math>C_m</math>, which implies that <math>a|b</math>.
}}
 
=== Monotonic subsequences ===
Let <math>(a_1,a_2,\ldots,a_n)</math> be a sequence of <math>n</math> distinct real numbers. A '''subsequence''' is a sequence of distinct terms of <math>(a_1,a_2,\ldots,a_n)</math> appearing in the same order in which they appear in <math>(a_1,a_2,\ldots,a_n)</math>. Formally, a subsequence of <math>(a_1,a_2,\ldots,a_n)</math> is an <math>(a_{i_1},a_{i_2},\ldots,a_{i_k})</math>, with <math>i_1<i_2<\cdots<i_k</math>.
 
A sequence <math>(a_1,a_2,\ldots,a_n)</math> is '''increasing''' if <math>a_1<a_2<\cdots<a_n</math>, and '''decreasing''' if <math>a_1>a_2>\cdots>a_n</math>.
 
We are interested in the ''longest'' increasing and decreasing subsequences of an <math>a_1<a_2<\cdots<a_n</math>. It is intuitive that the length of both the longest increasing subsequence and the longest decreasing subsequence cannot be small simultaneously. A famous result of Erdős and Szekeres formally justifies this intuition. This is one of the first results in extremal combinatorics, published in the influential 1935 paper of Erdős and Szekeres.
 
{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
Let <math>(a_1,a_2,\ldots,a_{N})</math> be the original sequence of <math>N>mn</math> distinct real numbers. Associate each <math>a_i</math> a pair <math>(x_i,y_i)</math>, defined as:
*<math>x_i</math>: the length of the longest ''increasing'' subsequence ''ending'' at <math>a_i</math>;
*<math>y_i</math>: the length of the longest ''decreasing'' subsequence ''starting'' at <math>a_i</math>.
A key observation is that <math>(x_i,y_i)\neq (x_j,y_j)</math> whenever <math>i\neq j</math>. This is proved as follows:
: '''Case 1:''' If <math>a_i<a_j</math>, then the longest increasing subsequence ending at <math>a_i</math> can be extended by adding on <math>a_j</math>, so <math>x_i<x_j</math>.
: '''Case 2:'''  If <math>a_i>a_j</math>, then the longest decreasing subsequence starting at <math>a_j</math> can be preceded by <math>a_i</math>, so <math>y_i>y_j</math>.
Now we put <math>N</math> "pigeons" <math>a_1,a_2,\ldots,a_N</math> into "pigeonholes" <math>\{1,2,\ldots,N\}\times\{1,2,\ldots,N\}</math>, such that <math>a_i</math> is put into hole <math>(x_i,y_i)</math>, with at most one pigeon per each hole (since different <math>a_i</math> has different <math>(x_i,y_i)</math>).
 
The number of pigeons is <math>N>mn</math>. Due to pigeonhole principle, there must be a pigeon which is outside the region <math>\{1,2,\ldots,m\}\times\{1,2,\ldots,n\}</math>, which implies that there exists an <math>a_i</math> with either <math>x_i>m</math> or <math>y_i>n</math>. Due to our definition of <math>(x_i,y_i)</math>, there must be either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
 
=== Dirichlet's approximation ===
Let <math>x</math> be an irrational number. We now want to approximate <math>x</math> be a rational number (a fraction).
 
Since every real interval <math>[a,b]</math> with <math>a<b</math> contains infinitely many rational numbers, there must exist rational numbers arbitrarily close to <math>x</math>. The trick is to let the denominator of the fraction sufficiently large.
 
Suppose however we restrict the rationals we may select to have denominators bounded by <math>n</math>. How closely we can approximate <math>x</math> now?
 
The following important theorem is due to Dirichlet and his ''Schubfachprinzip'' ("drawer principle"). The theorem is fundamental in numer theory and real analysis, but the proof is combinatorial.
 
{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be an irrational number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}
{{Proof|
Let <math>\{x\}=x-\lfloor x\rfloor</math> denote the '''fractional part''' of the real number <math>x</math>. It is obvious that <math>\{x\}\in[0,1)</math> for any real number <math>x</math>.
 
Consider the <math>n+1</math> numbers <math>\{kx\}</math>, <math>k=1,2,\ldots,n+1</math>. These <math>n+1</math> numbers (pigeons) belong to the following <math>n</math> intervals (pigeonholes):
:<math>\left(0,\frac{1}{n}\right),\left(\frac{1}{n},\frac{2}{n}\right),\ldots,\left(\frac{n-1}{n},1\right)</math>.
Since <math>x</math> is irrational, <math>\{kx\}</math> cannot coincide with any endpoint of the above intervals.
 
By the pigeonhole principle, there exist <math>1\le a<b\le n+1</math>, such that <math>\{ax\},\{bx\}</math> are in the same interval, thus
:<math>|\{bx\}-\{ax\}|<\frac{1}{n}</math>.
Therefore,
:<math>|(b-a)x-\left(\lfloor bx\rfloor-\lfloor ax\rfloor\right)|<\frac{1}{n}</math>.
Let <math>q=b-a</math> and <math>p=\lfloor bx\rfloor-\lfloor ax\rfloor</math>. We have <math>|qx-p|<\frac{1}{n}</math> and <math>1\le q\le n</math>. Dividing both sides by <math>q</math>, the theorem is proved.
}}

Latest revision as of 14:26, 23 June 2016

Template:Otheruse Fitness in biology is the relative ability of an organism to survive and pass on its genes to the next generation.[1]p160 It is a central idea in evolutionary theory. Fitness is usually equal to the proportion of the individual's genes in all the genes of the next generation.

Like all terms in evolutionary biology, fitness is defined in terms of an interbreeding population, which might or might not be a whole species. If differences in individual genotypes affect fitness, then the frequencies of the genotypes will change over generations; the genotypes with higher fitness become more common. This is the process called natural selection.

An individual's fitness is caused by its phenotype, and passed on by its genotype. The fitnesses of different individuals with the same genotype are not necessarily equal. It depends on the environment in which the individuals live, and on accidental events. However, since the fitness of the genotype is an averaged quantity, it reflects the reproductive outcomes of all individuals with that genotype.

Relatedness

Fitness measures the number of the copies of the genes of an individual in the next generation. It doesn't really matter how the genes arrive in the next generation. For an individual, it is equally "beneficial" to reproduce itself, or to help relatives with similar genes to reproduce, as long as similar number of copies of individual's genes get passed on to the next generation. Selection which promotes this kind of helper behaviour is called kin selection.

Our closest relatives (parents, siblings, and our own children) share on average 50% (half) of our genes. One step further removed are grandparents. With each of them we share on average 25% (a quarter) of our genes. That is a measure of our relatedness to them. Next come first cousins (children of our parents' siblings). We share 12.5% (1/8) of their genes.[2]p100

Hamilton's rule

William Hamilton added various ideas to the notion of fitness. His rule suggests that a costly action should be performed if:

[math]\displaystyle{ C \lt R \times B }[/math] where:
  • [math]\displaystyle{ c \ }[/math] is the reproductive cost to the altruist,
  • [math]\displaystyle{ b \ }[/math] is the reproductive benefit to the recipient of the altruistic behavior, and
  • [math]\displaystyle{ r \ }[/math] is the probability, above the population average, of the individuals sharing an altruistic gene – the "degree of relatedness".

Fitness costs and benefits are measured in fecundity.[3]

Inclusive fitness

Inclusive fitness is a term which is essentially the same as fitness, but emphasises the group of genes rather than individuals.

Biological fitness says how well an organism can reproduce, and spread its genes to its offspring. The theory of inclusive fitness says that the fitness of an organism is also increased to the extent that its close relatives also reproduce. This is because relatives share genes in proportion to their relationship.

Another way of saying it: the inclusive fitness of an organism is not a property of itself, but a property of its set of genes. It is calculated from from the reproductive success of the individual, plus the reproductive success of its relatives, each one weighed by an appropriate coefficient of relatedness.[4]

History

The British social philosopher Herbert Spencer coined the phrase survival of the fittest in his 1864 work Principles of biology to mean what Charles Darwin called natural selection.[5] The original phrase was "survival of the best fitted".

References

Template:Reflist

  1. King R.C. Stansfield W.D. & Mulligan P.K. 2006. A dictionary of genetics, 7th ed. Oxford.
  2. Maynard Smith, John. 1999. Evolutionary genetics. 2nd ed, Cambridge University Press.
  3. Hamilton W.D. 1964. The genetical evolution of social behavior. Journal of Theoretical Biology 7 (1): 1–52. doi:10.1016/0022-5193(64)90038-4.
  4. Adapted from Dawkins R. 1982. The extended phenotype. Oxford: Oxford University Press, p186. ISBN 0-19-288051-9
  5. Herbert Spencer 1864. Principles of Biology London, vol 1, 444, wrote “This survival of the fittest, which I have here sought to express in mechanical terms, is that which Mr. Darwin has called ‘natural selection’, or the preservation of favoured races in the struggle for life.