组合数学 (Spring 2013)/Cayley's formula and Fitness: Difference between pages

From TCS Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>Etone
 
imported>Macdonald-ross
mNo edit summary
 
Line 1: Line 1:
= Cayley's Formula =
{{otheruse|biological fitness}}
We now present a theorem of the number of labeled trees on a fixed number of vertices. It is due to [http://en.wikipedia.org/wiki/Arthur_Cayley Cayley] in 1889. The theorem is often referred by the name [http://en.wikipedia.org/wiki/Cayley's_formula Cayley's formula].
'''Fitness''' in [[biology]] is the relative ability of an organism to survive and pass on its [[gene]]s to the next generation.<ref>King R.C. Stansfield W.D. & Mulligan P.K. 2006. ''A dictionary of genetics'', 7th ed. Oxford.</ref><sup>p160</sup> It is a central idea in [[evolution|evolutionary theory]]. Fitness is usually equal to the proportion of the individual's [[gene]]s in all the genes of the next generation.


{{Theorem|Cayley's formula for trees|
Like all terms in evolutionary biology, fitness is defined in terms of an interbreeding [[Population genetics|population]], which might or might not be a whole [[species]]. If differences in individual genotypes affect fitness, then the frequencies of the genotypes will change over generations; the genotypes with higher fitness become more common. This is the process called [[natural selection]].
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}


The theorem has several proofs. Classical methods include the bijection which encodes a tree by a [http://en.wikipedia.org/wiki/Pr%C3%BCfer_sequence Prüfer code], through the [http://en.wikipedia.org/wiki/Kirchhoff's_matrix_tree_theorem Kirchhoff's matrix tree theorem], and by double counting.
An individual's fitness is caused by its [[phenotype]], and passed on by its [[genotype]]. The fitnesses of different individuals with the same genotype are not necessarily equal. It depends on the [[environment]] in which the individuals live, and on accidental [[event]]s. However, since the fitness of the genotype is an [[average]]d quantity, it reflects the reproductive outcomes of ''all'' individuals with that genotype.


== Prüfer code ==
== Relatedness ==
The Prüfer code encodes a labeled tree to a sequence of labels. This gives a bijections between trees and tuples.
Fitness measures the number of the ''copies'' of the genes of an individual in the next generation. It doesn't really matter how the genes arrive in the next generation. For an individual, it is equally "beneficial" to reproduce itself, or to help relatives with similar genes to reproduce, ''as long as similar number of copies of individual's genes get passed on to the next generation''. Selection which promotes this kind of helper behaviour is called [[kin selection]].


In a tree, the vertices of degree 1 are called leaves. It is easy to see that:
Our closest relatives (parents, siblings, and our own children) share on average 50% (half) of our genes. One step further removed are grandparents. With each of them we share on average 25% (a quarter) of our genes. That is a measure of our relatedness to them. Next come first cousins (children of our parents' siblings). We share 12.5% (1/8) of their genes.<ref name=JMS>Maynard Smith, John. 1999. ''Evolutionary genetics''. 2nd ed, Cambridge University Press.</ref><sup>p100</sup>
* each tree has at least two leaves; and
* after removing a leaf (along with the edge adjacent to it) from a tree, the resulting graph is still a tree.  


The following algorithm transforms a tree <math>T</math> of <math>n</math> vertices <math>1,2,\ldots,n</math>, to a tuple <math>(v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2}</math>.
=== Hamilton's rule ===
{{Theorem| Prüfer code (encoder)|
[[W.D. Hamilton|William Hamilton]] added various ideas to the notion of fitness. His rule suggests that a costly action should be performed if:
:'''Input''': A tree <math>T</math> of <math>n</math> distinct vertices, labeled by <math>1,2,\ldots,n</math>.
:<math>C < R \times B </math>   where:
:
* <math>c \ </math> is the reproductive cost to the altruist,
:let <math>T_1=T</math>;
* <math>b \ </math> is the reproductive benefit to the recipient of the altruistic behavior, and
:for <math>i=1</math> to <math>n-1</math>, do
* <math>r \ </math> is the probability, above the population average, of the individuals sharing an altruistic gene – the "degree of relatedness".
::let <math>u_i</math> be the leaf in <math>T_i</math> with the smallest label, and <math>v_i</math> be its neighbor;
Fitness costs and benefits are measured in [[fecundity]].<ref>Hamilton W.D. 1964. The genetical evolution of social behavior. ''Journal of Theoretical Biology'' '''7''' (1): 1–52. doi:10.1016/0022-5193(64)90038-4.</ref>
::let <math>T_{i+1}</math> be the new tree obtained from deleting the leaf <math>u_i</math> from <math>T_i</math>;
:end
:return <math>(v_1,v_2,\ldots,v_{n-2})</math>;
}}


It is trivial to observe the following lemma:
=== Inclusive fitness ===
{{Theorem|Lemma 1|
Inclusive fitness is a term which is essentially the same as fitness, but emphasises the group of genes rather than individuals.  
:For each <math>1\le i\le n-1</math>, <math>T_i</math> is a tree of <math>n-i+1</math> vertices. In particular, the vertices of <math>T_i</math> are  <math>u_i,u_{i+1},\ldots,u_{n-1},v_{n-1}</math>, and the edges of <math>T_i</math> are precisely <math>\{u_j,v_j\}</math>, <math>i\le j\le n-1</math>.
}}


And there is a reason that we do not need to store <math>v_{n-1}</math> in the Prüfer code.
Biological fitness says how well an organism can reproduce, and spread its genes to its offspring. The theory of inclusive fitness says that the fitness of an organism is also increased to the extent that its close relatives also reproduce. This is because relatives share genes in proportion to their relationship.
{{Theorem|Lemma 2|
:It always holds that <math>v_{n-1}=n</math>.
}}
{{Proof|
Every tree (of at least two vertices) has at least two leaves. The <math>u_i</math>, <math>1\le i\le n-1</math>, are the leaf of the smallest label in <math>T_i</math>, which can never be <math>n</math>, thus <math>n</math> is never deleted.
}}


Lemma 1 and 2 together imply that given a Prüfer code <math>(v_1,v_2,\ldots,v_{n-2})</math>, the only remaining task to reconstruct the tree <math>T</math> is to figure out those <math>u_i</math>, <math>1\le i\le n-1</math>. The following lemma state how to obtain <math>u_i</math>, <math>1\le i\le n-1</math>, from a Prüfer code <math>(v_1,v_2,\ldots,v_{n-2})</math>.
Another way of saying it: ''the inclusive fitness of an organism is not a property of itself, but a property of its set of [[genes]]''. It is calculated from from the reproductive success of the individual, plus the reproductive success of its relatives, each one weighed by an appropriate coefficient of relatedness.<ref>Adapted from Dawkins R. 1982. ''The extended phenotype''. Oxford: Oxford University Press, p186.  ISBN 0-19-288051-9</ref>


{{Theorem|Lemma 3|
== History ==
:For <math>i=1,2,\ldots,n-1</math>, <math>u_i</math> is the smallest element of <math>\{1,2,\ldots,n\}</math> not in <math>\{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\}</math>.
The [[British]] [[Sociology|social]] [[philosopher]] [[Herbert Spencer]] coined the phrase ''[[survival of the fittest]]'' in his 1864 work ''Principles of biology'' to mean what [[Charles Darwin]] called [[natural selection]].<ref> Herbert Spencer 1864. ''Principles of Biology'' London, vol 1, 444, wrote “This survival of the fittest, which I have here sought to express in mechanical terms, is that which Mr. Darwin has called ‘natural selection’, or the preservation of favoured races in the struggle for life. </ref> The original phrase was "survival of the best fitted".  
}}
{{Proof|
Note that <math>u_1,u_2,\ldots,u_{n-1},v_{n-1}</math> is a sequence of distinct vertices, because <math>u_1,u_2,\ldots,u_{n-1}</math> are deleted one by one from the tree, and <math>v_{n-1}=n</math> is never deleted. Thus, each vertex <math>v</math> appears among <math>u_1,u_2,\ldots,u_{n-1},v_{n-1}</math> exactly once. And each vertex <math>v</math> appears for <math>deg(v)</math> times among the edges <math>\{u_i,v_i\}</math>, <math>1\le i\le n-1</math>, where <math>deg(v)</math> denotes the degree of vertex <math>v</math> in the original tree <math>T</math>. Therefore, each vertex <math>v</math> appears among  <math>v_1,v_2,\ldots,v_{n-2}</math> for <math>deg(v)-1</math> times.


Similarly, each vertex <math>v</math> of <math>T_i</math> appears among <math>v_i,v_{i+1},\ldots,v_{n-2}</math> for <math>deg_i(v)-1</math> times, where <math>deg_i(v)</math> is the degree of vertex <math>v</math> in the tree <math>T_i</math>. In particular, the leaves of <math>T_i</math> are not among <math>\{v_i,v_{i+1},\ldots,v_{n-2}\}</math>. Recall that the vertices of <math>T_i</math> are <math>u_i,u_{i+1},\ldots,u_{n-1},v_{n-1}</math>. Then the leaves of <math>T_i</math> are the elements of <math>\{1,2,\ldots,n\}</math> not in <math>\{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\}</math>. By definition of Prüfer code, <math>u_i</math> is the leaf in <math>T_i</math> of smallest label, hence the smallest element of <math>\{1,2,\ldots,n\}</math> not in <math>\{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\}</math>.
== References ==
}}
{{Reflist}}


Applying Lemma 3, we have the following decoder for the Prüfer code:
[[Category:Classical genetics]]
{{Theorem| Prüfer code (decoder)|
[[Category:Evolutionary biology]]
:'''Input''': A tuple <math>(v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2}</math>.
:
:let <math>T</math> be empty graph, and <math>v_{n-1}=n</math>;
:for <math>i=1</math> to <math>n-1</math>, do
::let <math>u_i</math> be the smallest label not in <math>\{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\}</math>;
::add an edge <math>\{u_i,v_i\}</math> to <math>T</math>;
:end
:return <math>T</math>;
}}
 
In other words, the encoding of trees to tuples by the Prüfer code is reversible, thus the mapping is injective (1-1). To see it is also surjective, we need to show that for every possible <math>(v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2}</math>, the above decoder recovers a tree from it.
 
It is easy to see that the decoder always returns a graph of <math>n-1</math> edges on the <math>n</math> vertices. The only thing remaining to verify is that the returned graph has no cycle in it, which can be easily proved by a timeline argument (left as an exercise).
 
Therefore, the Prüfer code establishes a bijection between the set of trees on <math>n</math> distinct vertices and the tuples from <math>\{1,2,\ldots,n\}^{n-2}</math>. This proves Cayley's formula.
 
== Double counting ==
We now present a proof of the Cayley's formula by double counting, which is considered by the [http://en.wikipedia.org/wiki/Proofs_from_THE_BOOK Proofs from THE BOOK] "the most beautiful of them all".
{{Prooftitle|Proof of Cayley's formula by double counting|
(Due to Pitman 1999)
 
Let <math>T_n</math> be the number of different trees defined on <math>n</math> distinct vertices.
 
A '''rooted tree''' is a tree with a special vertex. That is, one of the <math>n</math> vertices is marked as the "root" of the tree. A rooted tree defines a natural direction of all edges, such that an edge <math>uv</math> of the tree is directed from <math>u</math> to <math>v</math> if <math>u</math> is before <math>v</math> along the unique path from the root.
 
We count the number of different ''sequences'' of directed edges that can be added to an empty graph on <math>n</math> vertices to form from it a ''rooted'' tree. We note that such a sequence can be formed in two ways:
# Starting with an unrooted tree, choose one of its vertices as root, and fix an total order of edges to specify the order in which the edges are added.
# Starting from an empty graph, add the edges one by one in steps.
 
In the first method, we pick one of the <math>T_n</math> unrooted trees, choose one of the <math>n</math> vertices as the root, and pick one of the <math>(n-1)!</math> total orders of the <math>n-1</math> edges. This gives us <math>T_nn(n-1)!=T_nn!</math> ways.
 
In the second method, we consider the number of choices in one step, and multiply the numbers of choices in all steps. This is done as follows.
 
Given a sequence of ''adding'' <math>n-1</math> edges to an empty graph to form a rooted tree, we reverse this sequence and get a sequence of ''removing'' edges one by one from the final rooted tree until no edge left. We observe that:
* At first, we remove an edge from the rooted tree. Suppose that the root of the tree is <math>r</math>, and the removed directed edge is <math>(u,v)</math>.  After removing <math>(u,v)</math>, the original rooted tree is disconnected into two rooted trees, one rooted at <math>r</math> and the other rooted at <math>v</math>.
* After removing <math>k-1</math> edges, there are <math>k</math> rooted trees. In the <math>k</math>th step, a directed edge <math>(u,v)</math> in the current forest is removed and the tree containing <math>(u,v)</math> is disconnected into two trees, one rooted at the old root of that tree, and the other rooted at <math>v</math>.
 
We now again reverse the above procedure, and consider the sequence of adding directed edges to an empty graph to form a rooted tree.
* At first, we have <math>n</math> rooted trees, each of 0 edge (<math>n</math> isolated vertices).
* After adding <math>n-k</math> edges, there are <math>k</math> rooted trees. Denoting the directed edge added next as <math>(u,v)</math>. As observed above, <math>u</math> can be any one of the <math>n</math> vertices; but <math>v</math> must be the root of one of the <math>k</math> trees, except the tree which contains <math>u</math>. There are <math>n(k-1)</math> choices of such <math>(u,v)</math>.
Multiplying the numbers of choices in all steps, the number of sequences of adding directed edges to an empty graph to form a rooted tree is given by
:<math>\prod_{k=2}^nn(k-1)=n^{n-2}n!</math>.
 
By the principle of double counting, counting the same thing by different methods yield the same result.
:<math>T_nn!=n^{n-2}n!</math>,
which gives that <math>T_n=n^{n-2}</math>.
}}
 
== Kirchhoff's Matrix-Tree Theorem ==
{To be added}

Latest revision as of 14:26, 23 June 2016

Template:Otheruse Fitness in biology is the relative ability of an organism to survive and pass on its genes to the next generation.[1]p160 It is a central idea in evolutionary theory. Fitness is usually equal to the proportion of the individual's genes in all the genes of the next generation.

Like all terms in evolutionary biology, fitness is defined in terms of an interbreeding population, which might or might not be a whole species. If differences in individual genotypes affect fitness, then the frequencies of the genotypes will change over generations; the genotypes with higher fitness become more common. This is the process called natural selection.

An individual's fitness is caused by its phenotype, and passed on by its genotype. The fitnesses of different individuals with the same genotype are not necessarily equal. It depends on the environment in which the individuals live, and on accidental events. However, since the fitness of the genotype is an averaged quantity, it reflects the reproductive outcomes of all individuals with that genotype.

Relatedness

Fitness measures the number of the copies of the genes of an individual in the next generation. It doesn't really matter how the genes arrive in the next generation. For an individual, it is equally "beneficial" to reproduce itself, or to help relatives with similar genes to reproduce, as long as similar number of copies of individual's genes get passed on to the next generation. Selection which promotes this kind of helper behaviour is called kin selection.

Our closest relatives (parents, siblings, and our own children) share on average 50% (half) of our genes. One step further removed are grandparents. With each of them we share on average 25% (a quarter) of our genes. That is a measure of our relatedness to them. Next come first cousins (children of our parents' siblings). We share 12.5% (1/8) of their genes.[2]p100

Hamilton's rule

William Hamilton added various ideas to the notion of fitness. His rule suggests that a costly action should be performed if:

[math]\displaystyle{ C \lt R \times B }[/math] where:
  • [math]\displaystyle{ c \ }[/math] is the reproductive cost to the altruist,
  • [math]\displaystyle{ b \ }[/math] is the reproductive benefit to the recipient of the altruistic behavior, and
  • [math]\displaystyle{ r \ }[/math] is the probability, above the population average, of the individuals sharing an altruistic gene – the "degree of relatedness".

Fitness costs and benefits are measured in fecundity.[3]

Inclusive fitness

Inclusive fitness is a term which is essentially the same as fitness, but emphasises the group of genes rather than individuals.

Biological fitness says how well an organism can reproduce, and spread its genes to its offspring. The theory of inclusive fitness says that the fitness of an organism is also increased to the extent that its close relatives also reproduce. This is because relatives share genes in proportion to their relationship.

Another way of saying it: the inclusive fitness of an organism is not a property of itself, but a property of its set of genes. It is calculated from from the reproductive success of the individual, plus the reproductive success of its relatives, each one weighed by an appropriate coefficient of relatedness.[4]

History

The British social philosopher Herbert Spencer coined the phrase survival of the fittest in his 1864 work Principles of biology to mean what Charles Darwin called natural selection.[5] The original phrase was "survival of the best fitted".

References

Template:Reflist

  1. King R.C. Stansfield W.D. & Mulligan P.K. 2006. A dictionary of genetics, 7th ed. Oxford.
  2. Maynard Smith, John. 1999. Evolutionary genetics. 2nd ed, Cambridge University Press.
  3. Hamilton W.D. 1964. The genetical evolution of social behavior. Journal of Theoretical Biology 7 (1): 1–52. doi:10.1016/0022-5193(64)90038-4.
  4. Adapted from Dawkins R. 1982. The extended phenotype. Oxford: Oxford University Press, p186. ISBN 0-19-288051-9
  5. Herbert Spencer 1864. Principles of Biology London, vol 1, 444, wrote “This survival of the fittest, which I have here sought to express in mechanical terms, is that which Mr. Darwin has called ‘natural selection’, or the preservation of favoured races in the struggle for life.