组合数学 (Spring 2013)/Cayley's formula
Cayley's Formula
We now present a theorem of the number of labeled trees on a fixed number of vertices. It is due to Cayley in 1889. The theorem is often referred by the name Cayley's formula.
Cayley's formula for trees - There are [math]\displaystyle{ n^{n-2} }[/math] different trees on [math]\displaystyle{ n }[/math] distinct vertices.
The theorem has several proofs, including the bijection which encodes a tree by a Prüfer code, through the Kirchhoff's matrix tree theorem, and by double counting.
Proof of Cayley's formula by double counting
We now present a double counting proof, which is considered by the Proofs from THE BOOK "the most beautiful of them all".
Proof of Cayley's formula by double counting (Due to Pitman 1999)
Let [math]\displaystyle{ T_n }[/math] be the number of different trees defined on [math]\displaystyle{ n }[/math] distinct vertices.
A rooted tree is a tree with a special vertex. That is, one of the [math]\displaystyle{ n }[/math] vertices is marked as the "root" of the tree. A rooted tree defines a natural direction of all edges, such that an edge [math]\displaystyle{ uv }[/math] of the tree is directed from [math]\displaystyle{ u }[/math] to [math]\displaystyle{ v }[/math] if [math]\displaystyle{ u }[/math] is before [math]\displaystyle{ v }[/math] along the unique path from the root.
We count the number of different sequences of directed edges that can be added to an empty graph on [math]\displaystyle{ n }[/math] vertices to form from it a rooted tree. We note that such a sequence can be formed in two ways:
- Starting with an unrooted tree, choose one of its vertices as root, and fix an total order of edges to specify the order in which the edges are added.
- Starting from an empty graph, add the edges one by one in steps.
In the first method, we pick one of the [math]\displaystyle{ T_n }[/math] unrooted trees, choose one of the [math]\displaystyle{ n }[/math] vertices as the root, and pick one of the [math]\displaystyle{ (n-1)! }[/math] total orders of the [math]\displaystyle{ n-1 }[/math] edges. This gives us [math]\displaystyle{ T_nn(n-1)!=T_nn! }[/math] ways.
In the second method, we consider the number of choices in one step, and multiply the numbers of choices in all steps. This is done as follows.
Given a sequence of adding [math]\displaystyle{ n-1 }[/math] edges to an empty graph to form a rooted tree, we reverse this sequence and get a sequence of removing edges one by one from the final rooted tree until no edge left. We observe that:
- At first, we remove an edge from the rooted tree. Suppose that the root of the tree is [math]\displaystyle{ r }[/math], and the removed directed edge is [math]\displaystyle{ (u,v) }[/math]. After removing [math]\displaystyle{ (u,v) }[/math], the original rooted tree is disconnected into two rooted trees, one rooted at [math]\displaystyle{ r }[/math] and the other rooted at [math]\displaystyle{ v }[/math].
- After removing [math]\displaystyle{ k-1 }[/math] edges, there are [math]\displaystyle{ k }[/math] rooted trees. In the [math]\displaystyle{ k }[/math]th step, a directed edge [math]\displaystyle{ (u,v) }[/math] in the current forest is removed and the tree containing [math]\displaystyle{ (u,v) }[/math] is disconnected into two trees, one rooted at the old root of that tree, and the other rooted at [math]\displaystyle{ v }[/math].
We now again reverse the above procedure, and consider the sequence of adding directed edges to an empty graph to form a rooted tree.
- At first, we have [math]\displaystyle{ n }[/math] rooted trees, each of 0 edge ([math]\displaystyle{ n }[/math] isolated vertices).
- After adding [math]\displaystyle{ n-k }[/math] edges, there are [math]\displaystyle{ k }[/math] rooted trees. Denoting the directed edge added next as [math]\displaystyle{ (u,v) }[/math]. As observed above, [math]\displaystyle{ u }[/math] can be any one of the [math]\displaystyle{ n }[/math] vertices; but [math]\displaystyle{ v }[/math] must be the root of one of the [math]\displaystyle{ k }[/math] trees, except the tree which contains [math]\displaystyle{ u }[/math]. There are [math]\displaystyle{ n(k-1) }[/math] choices of such [math]\displaystyle{ (u,v) }[/math].
Multiplying the numbers of choices in all steps, the number of sequences of adding directed edges to an empty graph to form a rooted tree is given by
- [math]\displaystyle{ \prod_{k=2}^nn(k-1)=n^{n-2}n! }[/math].
By the principle of double counting, counting the same thing by different methods yield the same result.
- [math]\displaystyle{ T_nn!=n^{n-2}n! }[/math],
which gives that [math]\displaystyle{ T_n=n^{n-2} }[/math].
- [math]\displaystyle{ \square }[/math]
Prüfer code
The Prüfer code encodes a labeled tree to a sequence of labels. This gives a bijections between trees and tuples.
Encoding
In a tree, the vertices of degree 1 are called leaves. It is easy to see that:
- each tree has at least two leaves; and
- after removing a leaf (along with the edge adjacent to it) from a tree, the resulting graph is still a tree.
The following algorithm transforms a tree [math]\displaystyle{ T }[/math] of [math]\displaystyle{ n }[/math] vertices [math]\displaystyle{ 1,2,\ldots,n }[/math], to a tuple [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2} }[/math].
Prüfer code (encoder) - Input: A tree [math]\displaystyle{ T }[/math] of [math]\displaystyle{ n }[/math] distinct vertices, labeled by [math]\displaystyle{ 1,2,\ldots,n }[/math].
- let [math]\displaystyle{ T_1=T }[/math];
- for [math]\displaystyle{ i=1 }[/math] to [math]\displaystyle{ n-1 }[/math], do
- let [math]\displaystyle{ u_i }[/math] be the leaf in [math]\displaystyle{ T_i }[/math] with the smallest label, and [math]\displaystyle{ v_i }[/math] be its neighbor;
- let [math]\displaystyle{ T_{i+1} }[/math] be the new tree obtained from deleting the leaf [math]\displaystyle{ u_i }[/math] from [math]\displaystyle{ T_i }[/math];
- end
- return [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2}) }[/math];
Decoding
It is trivial to observe the following lemma:
Lemma 1 - For each [math]\displaystyle{ 1\le i\le n-1 }[/math], [math]\displaystyle{ T_i }[/math] is a tree of [math]\displaystyle{ n-i+1 }[/math] vertices. In particular, the vertices of [math]\displaystyle{ T_i }[/math] are [math]\displaystyle{ u_i,u_{i+1},\ldots,u_{n-1},v_{n-1} }[/math], and the edges of [math]\displaystyle{ T_i }[/math] are precisely [math]\displaystyle{ \{u_j,v_j\} }[/math], [math]\displaystyle{ i\le j\le n-1 }[/math].
And there is a reason that we do not need to store [math]\displaystyle{ v_{n-1} }[/math] in the Prüfer code.
Lemma 2 - It always holds that [math]\displaystyle{ v_{n-1}=n }[/math].
Proof. Every tree (of at least two vertices) has at least two leaves. The [math]\displaystyle{ u_i }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math], are the leaf of the smallest label in [math]\displaystyle{ T_i }[/math], which can never be [math]\displaystyle{ n }[/math], thus [math]\displaystyle{ n }[/math] is never deleted.
- [math]\displaystyle{ \square }[/math]
Lemma 1 and 2 together imply that given a Prüfer code [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2}) }[/math], the only remaining task to reconstruct the tree [math]\displaystyle{ T }[/math] is to figure out those [math]\displaystyle{ u_i }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math]. The following lemma state how to obtain [math]\displaystyle{ u_i }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math], from a Prüfer code [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2}) }[/math].
Lemma 3 - For [math]\displaystyle{ i=1,2,\ldots,n-1 }[/math], [math]\displaystyle{ u_i }[/math] is the smallest element of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math].
Proof. Note that [math]\displaystyle{ u_1,u_2,\ldots,u_{n-1},v_{n-1} }[/math] is a sequence of distinct vertices, because [math]\displaystyle{ u_1,u_2,\ldots,u_{n-1} }[/math] are deleted one by one from the tree, and [math]\displaystyle{ v_{n-1}=n }[/math] is never deleted. Thus, each vertex [math]\displaystyle{ v }[/math] appears among [math]\displaystyle{ u_1,u_2,\ldots,u_{n-1},v_{n-1} }[/math] exactly once. And each vertex [math]\displaystyle{ v }[/math] appears for [math]\displaystyle{ deg(v) }[/math] times among the edges [math]\displaystyle{ \{u_i,v_i\} }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math], where [math]\displaystyle{ deg(v) }[/math] denotes the degree of vertex [math]\displaystyle{ v }[/math] in the original tree [math]\displaystyle{ T }[/math]. Therefore, each vertex [math]\displaystyle{ v }[/math] appears among [math]\displaystyle{ v_1,v_2,\ldots,v_{n-2} }[/math] for [math]\displaystyle{ deg(v)-1 }[/math] times.
Similarly, each vertex [math]\displaystyle{ v }[/math] of [math]\displaystyle{ T_i }[/math] appears among [math]\displaystyle{ v_i,v_{i+1},\ldots,v_{n-2} }[/math] for [math]\displaystyle{ deg_i(v)-1 }[/math] times, where [math]\displaystyle{ deg_i(v) }[/math] is the degree of vertex [math]\displaystyle{ v }[/math] in the tree [math]\displaystyle{ T_i }[/math]. In particular, the leaves of [math]\displaystyle{ T_i }[/math] are not among [math]\displaystyle{ \{v_i,v_{i+1},\ldots,v_{n-2}\} }[/math]. Recall that the vertices of [math]\displaystyle{ T_i }[/math] are [math]\displaystyle{ u_i,u_{i+1},\ldots,u_{n-1},v_{n-1} }[/math]. Then the leaves of [math]\displaystyle{ T_i }[/math] are the elements of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math]. By definition of Prüfer code, [math]\displaystyle{ u_i }[/math] is the leaf in [math]\displaystyle{ T_i }[/math] of smallest label, hence the smallest element of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math].
- [math]\displaystyle{ \square }[/math]
Applying Lemma 3, we have the following decoder for the Prüfer code:
Prüfer code (decoder) - Input: A tuple [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2} }[/math].
- let [math]\displaystyle{ T }[/math] be empty graph, and [math]\displaystyle{ v_{n-1}=n }[/math];
- for [math]\displaystyle{ i=1 }[/math] to [math]\displaystyle{ n-1 }[/math], do
- let [math]\displaystyle{ u_i }[/math] be the smallest label not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math];
- add an edge [math]\displaystyle{ \{u_i,v_i\} }[/math] to [math]\displaystyle{ T }[/math];
- end
- return [math]\displaystyle{ T }[/math];
In other words, the encoding of trees to tuples by the Prüfer code is reversible, thus the mapping is injective (1-1). To see it is also surjective, we need to show that for every possible [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2} }[/math], the above decoder recovers a tree from it.
It is easy to see that the decoder always returns a graph of [math]\displaystyle{ n-1 }[/math] edges on the [math]\displaystyle{ n }[/math] vertices. The only thing remaining to verify is that the returned graph has no cycle in it, which can be easily proved by a timeline argument (left as an exercise).
Bijection proof of Cayley's formula
Therefore, the Prüfer code establishes a bijection between the set of trees on [math]\displaystyle{ n }[/math] distinct vertices and the tuples from [math]\displaystyle{ \{1,2,\ldots,n\}^{n-2} }[/math]. This proves Cayley's formula.
Kirchhoff's Matrix-Tree Theorem
Given an undirected graph [math]\displaystyle{ G([n],E) }[/math], the adjacency matrix [math]\displaystyle{ A }[/math] of graph [math]\displaystyle{ G }[/math] is an [math]\displaystyle{ n\times n }[/math] matrix such that
- [math]\displaystyle{ A(i,j)=\begin{cases} 1 & \{i,j\}\in E,\\ 0 & \{i,j\}\not\in E. \end{cases} }[/math]
Graph Laplacian
Let [math]\displaystyle{ D }[/math] be a [math]\displaystyle{ n\times n }[/math] diagonal matrix such that
- [math]\displaystyle{ D(i,j)=\begin{cases} \text{deg}(i) & i=j,\\ 0 & i\neq j, \end{cases} }[/math]
where [math]\displaystyle{ \text{deg}(i) }[/math] denotes the degree of vertex [math]\displaystyle{ i }[/math].
The Laplacian matrix [math]\displaystyle{ L }[/math] of graph [math]\displaystyle{ G }[/math] is defined as [math]\displaystyle{ L=D-A }[/math], that is,
- [math]\displaystyle{ L(i,j)=\begin{cases} \text{deg}(i) & i=j,\\ -1 & i\neq j\text{ and } \{i,j\}\in E,\\ 0 & \text{otherwise}. \end{cases} }[/math]
Suppose [math]\displaystyle{ G([n],E) }[/math] has [math]\displaystyle{ m }[/math] edges. The incidence matrix [math]\displaystyle{ B }[/math] of graph [math]\displaystyle{ G }[/math] is an [math]\displaystyle{ n\times m }[/math] matrix such that
- [math]\displaystyle{ \forall i\in[n], \forall e\in E,\quad B(i,e)=\begin{cases} 1 & e=\{i,j\}\text{ and } i\lt j,\\ -1 & e=\{i,j\}\text{ and } i\gt j,\\ 0 & \text{otherwise}. \end{cases} }[/math]
The following proposition is easy to verify.
Proposition - [math]\displaystyle{ L=BB^T }[/math].
Proof. For any [math]\displaystyle{ i,j\in[n] }[/math], we have
- [math]\displaystyle{ (BB^T)(i,j)=\sum_{e\in E}B(i,e)B^T(e,j)=\sum_{e\in E}B(i,e)B(j,e) }[/math].
It is easy to verify that
- [math]\displaystyle{ \sum_{e\in E}B(i,e)B(j,e)= \begin{cases} \text{deg}(i) & i=j,\\ -1 & i\neq j\text{ and } \{i,j\}\in E,\\ 0 & \text{otherwise}, \end{cases} }[/math]
which is equal to the definition of [math]\displaystyle{ L }[/math].
- [math]\displaystyle{ \square }[/math]
The Matrix-tree Theorem
The matrix-tree theorem of Kirchhoff states a striking fact: The number of spanning trees in any connected graph can be computed as the determinant of some appropriate graph matrix.
Kirchhoff's Matrix-Tree Theorem - For any connected graph [math]\displaystyle{ G }[/math] of [math]\displaystyle{ n }[/math] vertices, the number of spanning trees in [math]\displaystyle{ G }[/math] is [math]\displaystyle{ \det(L_{i,i})\, }[/math] for all [math]\displaystyle{ i\in[n] }[/math], where [math]\displaystyle{ L_{i,i}\, }[/math] is the [math]\displaystyle{ (n-1)\times(n-1) }[/math] matrix resulting from the Laplacian matrix [math]\displaystyle{ L }[/math] of graph [math]\displaystyle{ G }[/math] by deleting the [math]\displaystyle{ i }[/math]-th row and the [math]\displaystyle{ i }[/math]-th column.
The determinant can be computed as fast as matrix multiplication, thus is quite efficient, especially when compared to our task: counting the number of subgraphs satisfying certain nontrivial global constraint (e.g. spanning tree). Such efficient algorithm is rarely seen for counting problems, which are usually #P-hard to compute (e.g. number of matchings in a graph).
Cauchy-Binet Theorem
The key to prove the matrix-tree theorem is the Cauchy-Binet theorem in linear algebra, whose proof is beyond the scope of class.
Cauchy-Binet Theorem - Let [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] be, respectively, [math]\displaystyle{ n\times m }[/math] and [math]\displaystyle{ m\times n }[/math] matrix. For any [math]\displaystyle{ S\subseteq [m] }[/math] of size [math]\displaystyle{ n }[/math], let [math]\displaystyle{ A_{[n], S} }[/math] and [math]\displaystyle{ B_{S,[n]} }[/math] denote, respectively, the [math]\displaystyle{ n\times n }[/math] submatrices of [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], consisting of the columns of [math]\displaystyle{ A }[/math], or the rows of [math]\displaystyle{ B }[/math], indexed by elements of [math]\displaystyle{ S }[/math]. Then
- [math]\displaystyle{ \det(AB)=\sum_{S\in{[m]\choose n}}\det(A_{[n],S})\det(B_{S,[n]}). }[/math]
- Let [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] be, respectively, [math]\displaystyle{ n\times m }[/math] and [math]\displaystyle{ m\times n }[/math] matrix. For any [math]\displaystyle{ S\subseteq [m] }[/math] of size [math]\displaystyle{ n }[/math], let [math]\displaystyle{ A_{[n], S} }[/math] and [math]\displaystyle{ B_{S,[n]} }[/math] denote, respectively, the [math]\displaystyle{ n\times n }[/math] submatrices of [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], consisting of the columns of [math]\displaystyle{ A }[/math], or the rows of [math]\displaystyle{ B }[/math], indexed by elements of [math]\displaystyle{ S }[/math]. Then
Proof of matrix-tree theorem
Let [math]\displaystyle{ B }[/math] be the incidence matrix of graph [math]\displaystyle{ G }[/math]. Fix any vertex [math]\displaystyle{ i }[/math], let [math]\displaystyle{ C }[/math] be the [math]\displaystyle{ (n-1)\times m }[/math] matrix resulting from [math]\displaystyle{ B }[/math] by deleting the [math]\displaystyle{ i }[/math]-th row.
Recall that [math]\displaystyle{ L=BB^T }[/math]. It is easy to verify that
- [math]\displaystyle{ L_{i,i}=CC^T. }[/math]
Due to the Cauchy-Binet Theorem, we have
- [math]\displaystyle{ \begin{align} \det(L)=\det(CC^T) &=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})\det(C^T_{S,[n-1]})\\ &=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})^2. \end{align} }[/math]
The next lemma gives a key observation to prove the matrix-tree theorem.
Lemma - For any [math]\displaystyle{ S\subseteq [m] }[/math] of size [math]\displaystyle{ n-1 }[/math], the value of [math]\displaystyle{ \det(C_{[n-1],S}) }[/math] is either 1, or -1, or 0. Moreover, [math]\displaystyle{ \det(C_{[n-1],S})=\pm1 }[/math] if and only if [math]\displaystyle{ S }[/math] indicates a spanning tree in [math]\displaystyle{ G }[/math].
Proof. We first show [math]\displaystyle{ \det(C_{[n-1],S})\in\{0,1,-1\} }[/math] by induction on [math]\displaystyle{ n }[/math]. Note that [math]\displaystyle{ C_{[n-1],S} }[/math] is an [math]\displaystyle{ (n-1)\times (n-1) }[/math] matrix such that each column contains at most one 1 and at most one -1, and all other entries are 0. For such matrix, when [math]\displaystyle{ n-1=1 }[/math], the induction hypothesis [math]\displaystyle{ \det(C_{[n-1],S})\in\{0,1,-1\} }[/math] is trivially true. And for general [math]\displaystyle{ n }[/math], if every column has a 1 and a -1, then the sum of all rows is the zero vector, so the matrix is singular. Otherwise, expand the determinant by a column with one nonzero entry to find it is [math]\displaystyle{ \pm1 }[/math] times the determinant of a smaller matrix of the same property, which by induction hypothesis has value 0 or [math]\displaystyle{ \pm1 }[/math].
We then show that [math]\displaystyle{ \det(C_{[n-1],S}) }[/math] is nonzero if and only if [math]\displaystyle{ S }[/math] is a spanning tree.
Suppose the [math]\displaystyle{ n-1 }[/math] edges corresponding to [math]\displaystyle{ S }[/math] is not a spanning tree of [math]\displaystyle{ G }[/math]. Then [math]\displaystyle{ S }[/math] must have more than one components, and there must be a component [math]\displaystyle{ R }[/math] not containing vertex [math]\displaystyle{ i }[/math]. The rows of [math]\displaystyle{ C_{[n-1],S} }[/math] corresponding to [math]\displaystyle{ R }[/math] add to 0, thus these rows are linearly dependent, and hence [math]\displaystyle{ \det(C_{[n-1],S})=0 }[/math].
Suppose the [math]\displaystyle{ n-1 }[/math] edges corresponding to [math]\displaystyle{ S }[/math] is a spanning tree of [math]\displaystyle{ G }[/math]. Then there is a vertex [math]\displaystyle{ j_1\neq i }[/math] of degree 1 in [math]\displaystyle{ S }[/math]; let [math]\displaystyle{ e_1 }[/math] be the edge in [math]\displaystyle{ S }[/math] incident to [math]\displaystyle{ j_1 }[/math]. Deleting vertex [math]\displaystyle{ j_1 }[/math] and edge [math]\displaystyle{ e_1 }[/math] from [math]\displaystyle{ S }[/math], we obtain a tree of [math]\displaystyle{ n-2 }[/math] edges. Again there is a vertex [math]\displaystyle{ j_2\neq i }[/math] of degree 1 with incident edge [math]\displaystyle{ e_2 }[/math]. Continue this we enumerate all vertices except [math]\displaystyle{ i }[/math] as [math]\displaystyle{ j_1,j_2,\ldots,j_{n-1} }[/math] and all edges in [math]\displaystyle{ S }[/math] as [math]\displaystyle{ e_1,e_2,\ldots,e_{n-1} }[/math]. Now permute the rows and columns of [math]\displaystyle{ C_{[n-1],S} }[/math] such that the [math]\displaystyle{ k }[/math]-th row corresponds to vertex [math]\displaystyle{ j_k }[/math] and the [math]\displaystyle{ k }[/math]-th column corresponds to [math]\displaystyle{ e_k }[/math]. By our construction the permuted [math]\displaystyle{ C_{[n-1],S} }[/math] is lower triangle with [math]\displaystyle{ \pm1 }[/math] diagonal entries, since when each [math]\displaystyle{ j_k }[/math] is removed it is of degree 1 in the remaining tree and is incident to edge [math]\displaystyle{ e_k }[/math]. Therefore, [math]\displaystyle{ \det(C_{[n-1],S})=\pm1 }[/math].
- [math]\displaystyle{ \square }[/math]