组合数学 (Spring 2013)/Cayley's formula: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
imported>Etone
 
(8 intermediate revisions by the same user not shown)
Line 155: Line 155:


=== The Matrix-tree Theorem ===
=== The Matrix-tree Theorem ===
The matrix-tree theorem of Kirchhoff states a striking fact: The number of spanning trees in any connected graph can be computed as the determinant of some appropriate graph matrix.
{{Theorem|Kirchhoff's Matrix-Tree Theorem|
{{Theorem|Kirchhoff's Matrix-Tree Theorem|
:For any connected graph <math>G</math> of <math>n</math> vertices, the number of spanning trees in <math>G</math> is <math>\det(L_{i,i})\,</math> for all <math>i\in[n]</math>, where <math>L_{i,i}\,</math> is the <math>(n-1)\times(n-1)</math> matrix resulting from the Laplacian matrix <math>L</math> of graph <math>G</math> by deleting the <math>i</math>-th row and the <math>i</math>-th column.
:For any connected graph <math>G</math> of <math>n</math> vertices, the number of spanning trees in <math>G</math> is <math>\det(L_{i,i})\,</math> for all <math>i\in[n]</math>, where <math>L_{i,i}\,</math> is the <math>(n-1)\times(n-1)</math> matrix resulting from the Laplacian matrix <math>L</math> of graph <math>G</math> by deleting the <math>i</math>-th row and the <math>i</math>-th column.
}}
}}
The determinant can be computed as fast as matrix multiplication, thus is quite efficient, especially when compared to our task: counting the number of subgraphs satisfying certain nontrivial global constraint (e.g. spanning tree). Such efficient algorithm is rarely seen for counting problems, which are usually #P-hard to compute (e.g. number of matchings in a graph).


=== Cauchy-Binet Theorem ===
The key to prove the matrix-tree theorem is the Cauchy-Binet theorem in linear algebra, whose proof is beyond the scope of this class.
{{Theorem|Cauchy-Binet Theorem|
{{Theorem|Cauchy-Binet Theorem|
:Let <math>A</math> and <math>B</math> be, respectively, <math>n\times m</math> and <math>m\times n</math> matrix. For any <math>S\subseteq [m]</math> of size <math>n</math>, let <math>A_{[n], S}</math> and <math>B_{S,[n]}</math> denote, respectively, the <math>n\times n</math> submatrices of <math>A</math> and <math>B</math>, consisting of the columns of <math>A</math>, or the rows of <math>B</math>, indexed by elements of <math>S</math>. Then
:Let <math>A</math> and <math>B</math> be, respectively, <math>n\times m</math> and <math>m\times n</math> matrix. For any <math>S\subseteq [m]</math> of size <math>n</math>, let <math>A_{[n], S}</math> and <math>B_{S,[n]}</math> denote, respectively, the <math>n\times n</math> submatrices of <math>A</math> and <math>B</math>, consisting of the columns of <math>A</math>, or the rows of <math>B</math>, indexed by elements of <math>S</math>. Then
Line 165: Line 167:
}}
}}


=== Proof of matrix-tree theorem===
Let <math>B</math> be the incidence matrix of graph <math>G</math>. Fix any vertex <math>i</math>, let <math>C</math> be the <math>(n-1)\times m</math> matrix resulting from <math>B</math> by deleting the <math>i</math>-th row.  
Let <math>B</math> be the incidence matrix of graph <math>G</math>. Fix any vertex <math>i</math>, let <math>C</math> be the <math>(n-1)\times m</math> matrix resulting from <math>B</math> by deleting the <math>i</math>-th row.  


Line 176: Line 177:
:<math>
:<math>
\begin{align}
\begin{align}
\det(L)=\det(CC^T)
\det(L_{i,i})=\det(CC^T)
&=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})\det(C^T_{S,[n-1]})\\
&=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})\det(C^T_{S,[n-1]})\\
&=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})^2.
&=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})^2.
Line 191: Line 192:
We then show that <math>\det(C_{[n-1],S})</math> is nonzero if and only if <math>S</math> is a spanning tree.  
We then show that <math>\det(C_{[n-1],S})</math> is nonzero if and only if <math>S</math> is a spanning tree.  


Suppose the <math>n-1</math> edges corresponding to <math>S</math> is not a spanning tree of <math>G</math>. Then <math>S</math> must have more than one components, and there must be a component <math>R</math> not containing vertex <math>i</math>.  
Suppose the <math>n-1</math> edges corresponding to <math>S</math> is not a spanning tree of <math>G</math>. Then <math>S</math> must have more than one components, and there must be a component <math>R</math> not containing vertex <math>i</math>. The rows of <math>C_{[n-1],S}</math> corresponding to <math>R</math> add to 0, thus these rows are linearly dependent, and hence <math>\det(C_{[n-1],S})=0</math>.
 
Suppose the <math>n-1</math> edges corresponding to <math>S</math> is a spanning tree of <math>G</math>. Then there is a vertex <math>j_1\neq i</math> of degree 1 in <math>S</math>; let <math>e_1</math> be the edge in <math>S</math> incident to <math>j_1</math>. Deleting vertex <math>j_1</math> and edge <math>e_1</math> from <math>S</math>, we obtain a tree of <math>n-2</math> edges. Again there is a vertex <math>j_2\neq i</math> of degree 1 with incident edge <math>e_2</math>. Continue this we enumerate all vertices except <math>i</math> as <math>j_1,j_2,\ldots,j_{n-1}</math> and all edges in <math>S</math> as <math>e_1,e_2,\ldots,e_{n-1}</math>. Now permute the rows and columns of <math>C_{[n-1],S}</math> such that the <math>k</math>-th row corresponds to vertex <math>j_k</math> and the <math>k</math>-th column corresponds to <math>e_k</math>. By our construction the permuted <math>C_{[n-1],S}</math> is lower triangle with <math>\pm1</math> diagonal entries, since when each <math>j_k</math> is removed it is of degree 1 in the remaining tree and is incident to edge <math>e_k</math>. Therefore, <math>\det(C_{[n-1],S})=\pm1</math>.
}}
}}
The matrix-tree theorem follows as consequence. Recall we show that
:<math>
\begin{align}
\det(L_{i,i})
&=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})^2.
\end{align}</math>
The sum enumerates over all subgraphs <math>S</math> of <math>n-1</math> edges, and by the above lemma <math>\det(C_{[n-1],S})^2=1</math> if and only if <math>S</math> is a spanning tree in <math>G</math>. Therefore, <math>\det(L_{i,i})</math> gives the number of spanning trees in <math>G</math>.
=== Cayley's formula by the matrix-tree theorem ===
The number of trees of <math>n</math> distinct vertices equals the number of spanning trees in the complete graph <math>K_n</math>. For <math>G=K_n</math>, for any <math>i\in[n]</math> the <math>(n-1)\times (n-1)</math> matrix <math>L_{ii}</math> is given by
:<math>
L_{i,i}
=
\begin{bmatrix}
n-1 & -1 & \cdots & -1\\
-1 & n-1 & \cdots & -1\\
\vdots & \vdots & \ddots & -1\\
-1 & -1 & \cdots & n-1
\end{bmatrix},
</math>
whose determinant is <math>\det(L_{i,i})=n^{n-2}</math>.

Latest revision as of 12:43, 18 April 2013

Cayley's Formula

We now present a theorem of the number of labeled trees on a fixed number of vertices. It is due to Cayley in 1889. The theorem is often referred by the name Cayley's formula.

Cayley's formula for trees
There are [math]\displaystyle{ n^{n-2} }[/math] different trees on [math]\displaystyle{ n }[/math] distinct vertices.

The theorem has several proofs, including the bijection which encodes a tree by a Prüfer code, through the Kirchhoff's matrix tree theorem, and by double counting.

Proof of Cayley's formula by double counting

We now present a double counting proof, which is considered by the Proofs from THE BOOK "the most beautiful of them all".

Proof of Cayley's formula by double counting

(Due to Pitman 1999)

Let [math]\displaystyle{ T_n }[/math] be the number of different trees defined on [math]\displaystyle{ n }[/math] distinct vertices.

A rooted tree is a tree with a special vertex. That is, one of the [math]\displaystyle{ n }[/math] vertices is marked as the "root" of the tree. A rooted tree defines a natural direction of all edges, such that an edge [math]\displaystyle{ uv }[/math] of the tree is directed from [math]\displaystyle{ u }[/math] to [math]\displaystyle{ v }[/math] if [math]\displaystyle{ u }[/math] is before [math]\displaystyle{ v }[/math] along the unique path from the root.

We count the number of different sequences of directed edges that can be added to an empty graph on [math]\displaystyle{ n }[/math] vertices to form from it a rooted tree. We note that such a sequence can be formed in two ways:

  1. Starting with an unrooted tree, choose one of its vertices as root, and fix an total order of edges to specify the order in which the edges are added.
  2. Starting from an empty graph, add the edges one by one in steps.

In the first method, we pick one of the [math]\displaystyle{ T_n }[/math] unrooted trees, choose one of the [math]\displaystyle{ n }[/math] vertices as the root, and pick one of the [math]\displaystyle{ (n-1)! }[/math] total orders of the [math]\displaystyle{ n-1 }[/math] edges. This gives us [math]\displaystyle{ T_nn(n-1)!=T_nn! }[/math] ways.

In the second method, we consider the number of choices in one step, and multiply the numbers of choices in all steps. This is done as follows.

Given a sequence of adding [math]\displaystyle{ n-1 }[/math] edges to an empty graph to form a rooted tree, we reverse this sequence and get a sequence of removing edges one by one from the final rooted tree until no edge left. We observe that:

  • At first, we remove an edge from the rooted tree. Suppose that the root of the tree is [math]\displaystyle{ r }[/math], and the removed directed edge is [math]\displaystyle{ (u,v) }[/math]. After removing [math]\displaystyle{ (u,v) }[/math], the original rooted tree is disconnected into two rooted trees, one rooted at [math]\displaystyle{ r }[/math] and the other rooted at [math]\displaystyle{ v }[/math].
  • After removing [math]\displaystyle{ k-1 }[/math] edges, there are [math]\displaystyle{ k }[/math] rooted trees. In the [math]\displaystyle{ k }[/math]th step, a directed edge [math]\displaystyle{ (u,v) }[/math] in the current forest is removed and the tree containing [math]\displaystyle{ (u,v) }[/math] is disconnected into two trees, one rooted at the old root of that tree, and the other rooted at [math]\displaystyle{ v }[/math].

We now again reverse the above procedure, and consider the sequence of adding directed edges to an empty graph to form a rooted tree.

  • At first, we have [math]\displaystyle{ n }[/math] rooted trees, each of 0 edge ([math]\displaystyle{ n }[/math] isolated vertices).
  • After adding [math]\displaystyle{ n-k }[/math] edges, there are [math]\displaystyle{ k }[/math] rooted trees. Denoting the directed edge added next as [math]\displaystyle{ (u,v) }[/math]. As observed above, [math]\displaystyle{ u }[/math] can be any one of the [math]\displaystyle{ n }[/math] vertices; but [math]\displaystyle{ v }[/math] must be the root of one of the [math]\displaystyle{ k }[/math] trees, except the tree which contains [math]\displaystyle{ u }[/math]. There are [math]\displaystyle{ n(k-1) }[/math] choices of such [math]\displaystyle{ (u,v) }[/math].

Multiplying the numbers of choices in all steps, the number of sequences of adding directed edges to an empty graph to form a rooted tree is given by

[math]\displaystyle{ \prod_{k=2}^nn(k-1)=n^{n-2}n! }[/math].

By the principle of double counting, counting the same thing by different methods yield the same result.

[math]\displaystyle{ T_nn!=n^{n-2}n! }[/math],

which gives that [math]\displaystyle{ T_n=n^{n-2} }[/math].

[math]\displaystyle{ \square }[/math]

Prüfer code

The Prüfer code encodes a labeled tree to a sequence of labels. This gives a bijections between trees and tuples.

Encoding

In a tree, the vertices of degree 1 are called leaves. It is easy to see that:

  • each tree has at least two leaves; and
  • after removing a leaf (along with the edge adjacent to it) from a tree, the resulting graph is still a tree.

The following algorithm transforms a tree [math]\displaystyle{ T }[/math] of [math]\displaystyle{ n }[/math] vertices [math]\displaystyle{ 1,2,\ldots,n }[/math], to a tuple [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2} }[/math].

Prüfer code (encoder)
Input: A tree [math]\displaystyle{ T }[/math] of [math]\displaystyle{ n }[/math] distinct vertices, labeled by [math]\displaystyle{ 1,2,\ldots,n }[/math].
let [math]\displaystyle{ T_1=T }[/math];
for [math]\displaystyle{ i=1 }[/math] to [math]\displaystyle{ n-1 }[/math], do
let [math]\displaystyle{ u_i }[/math] be the leaf in [math]\displaystyle{ T_i }[/math] with the smallest label, and [math]\displaystyle{ v_i }[/math] be its neighbor;
let [math]\displaystyle{ T_{i+1} }[/math] be the new tree obtained from deleting the leaf [math]\displaystyle{ u_i }[/math] from [math]\displaystyle{ T_i }[/math];
end
return [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2}) }[/math];

Decoding

It is trivial to observe the following lemma:

Lemma 1
For each [math]\displaystyle{ 1\le i\le n-1 }[/math], [math]\displaystyle{ T_i }[/math] is a tree of [math]\displaystyle{ n-i+1 }[/math] vertices. In particular, the vertices of [math]\displaystyle{ T_i }[/math] are [math]\displaystyle{ u_i,u_{i+1},\ldots,u_{n-1},v_{n-1} }[/math], and the edges of [math]\displaystyle{ T_i }[/math] are precisely [math]\displaystyle{ \{u_j,v_j\} }[/math], [math]\displaystyle{ i\le j\le n-1 }[/math].

And there is a reason that we do not need to store [math]\displaystyle{ v_{n-1} }[/math] in the Prüfer code.

Lemma 2
It always holds that [math]\displaystyle{ v_{n-1}=n }[/math].
Proof.

Every tree (of at least two vertices) has at least two leaves. The [math]\displaystyle{ u_i }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math], are the leaf of the smallest label in [math]\displaystyle{ T_i }[/math], which can never be [math]\displaystyle{ n }[/math], thus [math]\displaystyle{ n }[/math] is never deleted.

[math]\displaystyle{ \square }[/math]

Lemma 1 and 2 together imply that given a Prüfer code [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2}) }[/math], the only remaining task to reconstruct the tree [math]\displaystyle{ T }[/math] is to figure out those [math]\displaystyle{ u_i }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math]. The following lemma state how to obtain [math]\displaystyle{ u_i }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math], from a Prüfer code [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2}) }[/math].

Lemma 3
For [math]\displaystyle{ i=1,2,\ldots,n-1 }[/math], [math]\displaystyle{ u_i }[/math] is the smallest element of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math].
Proof.

Note that [math]\displaystyle{ u_1,u_2,\ldots,u_{n-1},v_{n-1} }[/math] is a sequence of distinct vertices, because [math]\displaystyle{ u_1,u_2,\ldots,u_{n-1} }[/math] are deleted one by one from the tree, and [math]\displaystyle{ v_{n-1}=n }[/math] is never deleted. Thus, each vertex [math]\displaystyle{ v }[/math] appears among [math]\displaystyle{ u_1,u_2,\ldots,u_{n-1},v_{n-1} }[/math] exactly once. And each vertex [math]\displaystyle{ v }[/math] appears for [math]\displaystyle{ deg(v) }[/math] times among the edges [math]\displaystyle{ \{u_i,v_i\} }[/math], [math]\displaystyle{ 1\le i\le n-1 }[/math], where [math]\displaystyle{ deg(v) }[/math] denotes the degree of vertex [math]\displaystyle{ v }[/math] in the original tree [math]\displaystyle{ T }[/math]. Therefore, each vertex [math]\displaystyle{ v }[/math] appears among [math]\displaystyle{ v_1,v_2,\ldots,v_{n-2} }[/math] for [math]\displaystyle{ deg(v)-1 }[/math] times.

Similarly, each vertex [math]\displaystyle{ v }[/math] of [math]\displaystyle{ T_i }[/math] appears among [math]\displaystyle{ v_i,v_{i+1},\ldots,v_{n-2} }[/math] for [math]\displaystyle{ deg_i(v)-1 }[/math] times, where [math]\displaystyle{ deg_i(v) }[/math] is the degree of vertex [math]\displaystyle{ v }[/math] in the tree [math]\displaystyle{ T_i }[/math]. In particular, the leaves of [math]\displaystyle{ T_i }[/math] are not among [math]\displaystyle{ \{v_i,v_{i+1},\ldots,v_{n-2}\} }[/math]. Recall that the vertices of [math]\displaystyle{ T_i }[/math] are [math]\displaystyle{ u_i,u_{i+1},\ldots,u_{n-1},v_{n-1} }[/math]. Then the leaves of [math]\displaystyle{ T_i }[/math] are the elements of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math]. By definition of Prüfer code, [math]\displaystyle{ u_i }[/math] is the leaf in [math]\displaystyle{ T_i }[/math] of smallest label, hence the smallest element of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math] not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math].

[math]\displaystyle{ \square }[/math]

Applying Lemma 3, we have the following decoder for the Prüfer code:

Prüfer code (decoder)
Input: A tuple [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2} }[/math].
let [math]\displaystyle{ T }[/math] be empty graph, and [math]\displaystyle{ v_{n-1}=n }[/math];
for [math]\displaystyle{ i=1 }[/math] to [math]\displaystyle{ n-1 }[/math], do
let [math]\displaystyle{ u_i }[/math] be the smallest label not in [math]\displaystyle{ \{u_1,\ldots,u_{i-1}\}\cup\{v_i,\ldots,v_{n-1}\} }[/math];
add an edge [math]\displaystyle{ \{u_i,v_i\} }[/math] to [math]\displaystyle{ T }[/math];
end
return [math]\displaystyle{ T }[/math];

In other words, the encoding of trees to tuples by the Prüfer code is reversible, thus the mapping is injective (1-1). To see it is also surjective, we need to show that for every possible [math]\displaystyle{ (v_1,v_2,\ldots,v_{n-2})\in\{1,2,\ldots,n\}^{n-2} }[/math], the above decoder recovers a tree from it.

It is easy to see that the decoder always returns a graph of [math]\displaystyle{ n-1 }[/math] edges on the [math]\displaystyle{ n }[/math] vertices. The only thing remaining to verify is that the returned graph has no cycle in it, which can be easily proved by a timeline argument (left as an exercise).

Bijection proof of Cayley's formula

Therefore, the Prüfer code establishes a bijection between the set of trees on [math]\displaystyle{ n }[/math] distinct vertices and the tuples from [math]\displaystyle{ \{1,2,\ldots,n\}^{n-2} }[/math]. This proves Cayley's formula.

Kirchhoff's Matrix-Tree Theorem

Given an undirected graph [math]\displaystyle{ G([n],E) }[/math], the adjacency matrix [math]\displaystyle{ A }[/math] of graph [math]\displaystyle{ G }[/math] is an [math]\displaystyle{ n\times n }[/math] matrix such that

[math]\displaystyle{ A(i,j)=\begin{cases} 1 & \{i,j\}\in E,\\ 0 & \{i,j\}\not\in E. \end{cases} }[/math]

Graph Laplacian

Let [math]\displaystyle{ D }[/math] be a [math]\displaystyle{ n\times n }[/math] diagonal matrix such that

[math]\displaystyle{ D(i,j)=\begin{cases} \text{deg}(i) & i=j,\\ 0 & i\neq j, \end{cases} }[/math]

where [math]\displaystyle{ \text{deg}(i) }[/math] denotes the degree of vertex [math]\displaystyle{ i }[/math].

The Laplacian matrix [math]\displaystyle{ L }[/math] of graph [math]\displaystyle{ G }[/math] is defined as [math]\displaystyle{ L=D-A }[/math], that is,

[math]\displaystyle{ L(i,j)=\begin{cases} \text{deg}(i) & i=j,\\ -1 & i\neq j\text{ and } \{i,j\}\in E,\\ 0 & \text{otherwise}. \end{cases} }[/math]

Suppose [math]\displaystyle{ G([n],E) }[/math] has [math]\displaystyle{ m }[/math] edges. The incidence matrix [math]\displaystyle{ B }[/math] of graph [math]\displaystyle{ G }[/math] is an [math]\displaystyle{ n\times m }[/math] matrix such that

[math]\displaystyle{ \forall i\in[n], \forall e\in E,\quad B(i,e)=\begin{cases} 1 & e=\{i,j\}\text{ and } i\lt j,\\ -1 & e=\{i,j\}\text{ and } i\gt j,\\ 0 & \text{otherwise}. \end{cases} }[/math]

The following proposition is easy to verify.

Proposition
[math]\displaystyle{ L=BB^T }[/math].
Proof.

For any [math]\displaystyle{ i,j\in[n] }[/math], we have

[math]\displaystyle{ (BB^T)(i,j)=\sum_{e\in E}B(i,e)B^T(e,j)=\sum_{e\in E}B(i,e)B(j,e) }[/math].

It is easy to verify that

[math]\displaystyle{ \sum_{e\in E}B(i,e)B(j,e)= \begin{cases} \text{deg}(i) & i=j,\\ -1 & i\neq j\text{ and } \{i,j\}\in E,\\ 0 & \text{otherwise}, \end{cases} }[/math]

which is equal to the definition of [math]\displaystyle{ L }[/math].

[math]\displaystyle{ \square }[/math]

The Matrix-tree Theorem

The matrix-tree theorem of Kirchhoff states a striking fact: The number of spanning trees in any connected graph can be computed as the determinant of some appropriate graph matrix.

Kirchhoff's Matrix-Tree Theorem
For any connected graph [math]\displaystyle{ G }[/math] of [math]\displaystyle{ n }[/math] vertices, the number of spanning trees in [math]\displaystyle{ G }[/math] is [math]\displaystyle{ \det(L_{i,i})\, }[/math] for all [math]\displaystyle{ i\in[n] }[/math], where [math]\displaystyle{ L_{i,i}\, }[/math] is the [math]\displaystyle{ (n-1)\times(n-1) }[/math] matrix resulting from the Laplacian matrix [math]\displaystyle{ L }[/math] of graph [math]\displaystyle{ G }[/math] by deleting the [math]\displaystyle{ i }[/math]-th row and the [math]\displaystyle{ i }[/math]-th column.

The determinant can be computed as fast as matrix multiplication, thus is quite efficient, especially when compared to our task: counting the number of subgraphs satisfying certain nontrivial global constraint (e.g. spanning tree). Such efficient algorithm is rarely seen for counting problems, which are usually #P-hard to compute (e.g. number of matchings in a graph).

The key to prove the matrix-tree theorem is the Cauchy-Binet theorem in linear algebra, whose proof is beyond the scope of this class.

Cauchy-Binet Theorem
Let [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] be, respectively, [math]\displaystyle{ n\times m }[/math] and [math]\displaystyle{ m\times n }[/math] matrix. For any [math]\displaystyle{ S\subseteq [m] }[/math] of size [math]\displaystyle{ n }[/math], let [math]\displaystyle{ A_{[n], S} }[/math] and [math]\displaystyle{ B_{S,[n]} }[/math] denote, respectively, the [math]\displaystyle{ n\times n }[/math] submatrices of [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], consisting of the columns of [math]\displaystyle{ A }[/math], or the rows of [math]\displaystyle{ B }[/math], indexed by elements of [math]\displaystyle{ S }[/math]. Then
[math]\displaystyle{ \det(AB)=\sum_{S\in{[m]\choose n}}\det(A_{[n],S})\det(B_{S,[n]}). }[/math]

Let [math]\displaystyle{ B }[/math] be the incidence matrix of graph [math]\displaystyle{ G }[/math]. Fix any vertex [math]\displaystyle{ i }[/math], let [math]\displaystyle{ C }[/math] be the [math]\displaystyle{ (n-1)\times m }[/math] matrix resulting from [math]\displaystyle{ B }[/math] by deleting the [math]\displaystyle{ i }[/math]-th row.

Recall that [math]\displaystyle{ L=BB^T }[/math]. It is easy to verify that

[math]\displaystyle{ L_{i,i}=CC^T. }[/math]

Due to the Cauchy-Binet Theorem, we have

[math]\displaystyle{ \begin{align} \det(L_{i,i})=\det(CC^T) &=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})\det(C^T_{S,[n-1]})\\ &=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})^2. \end{align} }[/math]

The next lemma gives a key observation to prove the matrix-tree theorem.

Lemma
For any [math]\displaystyle{ S\subseteq [m] }[/math] of size [math]\displaystyle{ n-1 }[/math], the value of [math]\displaystyle{ \det(C_{[n-1],S}) }[/math] is either 1, or -1, or 0. Moreover, [math]\displaystyle{ \det(C_{[n-1],S})=\pm1 }[/math] if and only if [math]\displaystyle{ S }[/math] indicates a spanning tree in [math]\displaystyle{ G }[/math].
Proof.

We first show [math]\displaystyle{ \det(C_{[n-1],S})\in\{0,1,-1\} }[/math] by induction on [math]\displaystyle{ n }[/math]. Note that [math]\displaystyle{ C_{[n-1],S} }[/math] is an [math]\displaystyle{ (n-1)\times (n-1) }[/math] matrix such that each column contains at most one 1 and at most one -1, and all other entries are 0. For such matrix, when [math]\displaystyle{ n-1=1 }[/math], the induction hypothesis [math]\displaystyle{ \det(C_{[n-1],S})\in\{0,1,-1\} }[/math] is trivially true. And for general [math]\displaystyle{ n }[/math], if every column has a 1 and a -1, then the sum of all rows is the zero vector, so the matrix is singular. Otherwise, expand the determinant by a column with one nonzero entry to find it is [math]\displaystyle{ \pm1 }[/math] times the determinant of a smaller matrix of the same property, which by induction hypothesis has value 0 or [math]\displaystyle{ \pm1 }[/math].

We then show that [math]\displaystyle{ \det(C_{[n-1],S}) }[/math] is nonzero if and only if [math]\displaystyle{ S }[/math] is a spanning tree.

Suppose the [math]\displaystyle{ n-1 }[/math] edges corresponding to [math]\displaystyle{ S }[/math] is not a spanning tree of [math]\displaystyle{ G }[/math]. Then [math]\displaystyle{ S }[/math] must have more than one components, and there must be a component [math]\displaystyle{ R }[/math] not containing vertex [math]\displaystyle{ i }[/math]. The rows of [math]\displaystyle{ C_{[n-1],S} }[/math] corresponding to [math]\displaystyle{ R }[/math] add to 0, thus these rows are linearly dependent, and hence [math]\displaystyle{ \det(C_{[n-1],S})=0 }[/math].

Suppose the [math]\displaystyle{ n-1 }[/math] edges corresponding to [math]\displaystyle{ S }[/math] is a spanning tree of [math]\displaystyle{ G }[/math]. Then there is a vertex [math]\displaystyle{ j_1\neq i }[/math] of degree 1 in [math]\displaystyle{ S }[/math]; let [math]\displaystyle{ e_1 }[/math] be the edge in [math]\displaystyle{ S }[/math] incident to [math]\displaystyle{ j_1 }[/math]. Deleting vertex [math]\displaystyle{ j_1 }[/math] and edge [math]\displaystyle{ e_1 }[/math] from [math]\displaystyle{ S }[/math], we obtain a tree of [math]\displaystyle{ n-2 }[/math] edges. Again there is a vertex [math]\displaystyle{ j_2\neq i }[/math] of degree 1 with incident edge [math]\displaystyle{ e_2 }[/math]. Continue this we enumerate all vertices except [math]\displaystyle{ i }[/math] as [math]\displaystyle{ j_1,j_2,\ldots,j_{n-1} }[/math] and all edges in [math]\displaystyle{ S }[/math] as [math]\displaystyle{ e_1,e_2,\ldots,e_{n-1} }[/math]. Now permute the rows and columns of [math]\displaystyle{ C_{[n-1],S} }[/math] such that the [math]\displaystyle{ k }[/math]-th row corresponds to vertex [math]\displaystyle{ j_k }[/math] and the [math]\displaystyle{ k }[/math]-th column corresponds to [math]\displaystyle{ e_k }[/math]. By our construction the permuted [math]\displaystyle{ C_{[n-1],S} }[/math] is lower triangle with [math]\displaystyle{ \pm1 }[/math] diagonal entries, since when each [math]\displaystyle{ j_k }[/math] is removed it is of degree 1 in the remaining tree and is incident to edge [math]\displaystyle{ e_k }[/math]. Therefore, [math]\displaystyle{ \det(C_{[n-1],S})=\pm1 }[/math].

[math]\displaystyle{ \square }[/math]

The matrix-tree theorem follows as consequence. Recall we show that

[math]\displaystyle{ \begin{align} \det(L_{i,i}) &=\sum_{S\in{[m]\choose n-1}}\det(C_{[n-1],S})^2. \end{align} }[/math]

The sum enumerates over all subgraphs [math]\displaystyle{ S }[/math] of [math]\displaystyle{ n-1 }[/math] edges, and by the above lemma [math]\displaystyle{ \det(C_{[n-1],S})^2=1 }[/math] if and only if [math]\displaystyle{ S }[/math] is a spanning tree in [math]\displaystyle{ G }[/math]. Therefore, [math]\displaystyle{ \det(L_{i,i}) }[/math] gives the number of spanning trees in [math]\displaystyle{ G }[/math].

Cayley's formula by the matrix-tree theorem

The number of trees of [math]\displaystyle{ n }[/math] distinct vertices equals the number of spanning trees in the complete graph [math]\displaystyle{ K_n }[/math]. For [math]\displaystyle{ G=K_n }[/math], for any [math]\displaystyle{ i\in[n] }[/math] the [math]\displaystyle{ (n-1)\times (n-1) }[/math] matrix [math]\displaystyle{ L_{ii} }[/math] is given by

[math]\displaystyle{ L_{i,i} = \begin{bmatrix} n-1 & -1 & \cdots & -1\\ -1 & n-1 & \cdots & -1\\ \vdots & \vdots & \ddots & -1\\ -1 & -1 & \cdots & n-1 \end{bmatrix}, }[/math]

whose determinant is [math]\displaystyle{ \det(L_{i,i})=n^{n-2} }[/math].