Randomized Algorithms (随机算法, Spring 2013)/Conditional Probability and Combinatorics (组合数学, Spring 2013)/Generating Functions

<font color=red size=3> This part of the lecture note is still under editing. This notice will be removed once the lecture note has been fully updated.</font>

= Conditional Probability =
In probability theory, the word "condition" is a verb. "Conditioning on the event ..." means that it is assumed that the event occurs.

{{Theorem
|Definition (conditional probability)|
:The '''conditional probability''' that event <math>\mathcal{E}_1</math> occurs given that event <math>\mathcal{E}_2</math> occurs is
::<math>
\Pr[\mathcal{E}_1\mid \mathcal{E}_2]=\frac{\Pr[\mathcal{E}_1\wedge \mathcal{E}_2]}{\Pr[\mathcal{E}_2]}.
</math>
}}

The conditional probability is well-defined only if <math>\Pr[\mathcal{E}_2]\neq0</math>.

For independent events <math>\mathcal{E}_1</math> and <math>\mathcal{E}_2</math>, it holds that
:<math>
\Pr[\mathcal{E}_1\mid \mathcal{E}_2]=\frac{\Pr[\mathcal{E}_1\wedge \mathcal{E}_2]}{\Pr[\mathcal{E}_2]}
=\frac{\Pr[\mathcal{E}_1]\cdot\Pr[\mathcal{E}_2]}{\Pr[\mathcal{E}_2]}
=\Pr[\mathcal{E}_1].
</math>
This supports our intuition that for two independent events, whether one of them occurs does not affect the chance of the other.
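To see the definition in action, here is a minimal Python sketch that computes conditional probabilities by enumerating a finite sample space and checks the independence identity above; the two-dice sample space and the helper names are chosen only for illustration.
<pre>
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def pr_cond(e1, e2):
    """Pr[E1 | E2] = Pr[E1 and E2] / Pr[E2]; defined only when Pr[E2] > 0."""
    return pr(lambda w: e1(w) and e2(w)) / pr(e2)

first_is_six = lambda w: w[0] == 6
total_is_odd = lambda w: (w[0] + w[1]) % 2 == 1

# These two events are independent: conditioning does not change the probability.
assert pr_cond(first_is_six, total_is_odd) == pr(first_is_six)  # both equal 1/6
</pre>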


== Law of total probability ==
The following fact is known as the law of total probability. It computes a probability by averaging over all possible cases.
{{Theorem
|Theorem (law of total probability)|
:Let <math>\mathcal{E}_1,\mathcal{E}_2,\ldots,\mathcal{E}_n</math> be mutually disjoint events, and <math>\bigvee_{i=1}^n\mathcal{E}_i=\Omega</math> be the sample space.
:Then for any event <math>\mathcal{E}</math>,
::<math>
\Pr[\mathcal{E}]=\sum_{i=1}^n\Pr[\mathcal{E}\mid\mathcal{E}_i]\cdot\Pr[\mathcal{E}_i].
</math>
}}
{{Proof| Since <math>\mathcal{E}_1,\mathcal{E}_2,\ldots,\mathcal{E}_n</math> are mutually disjoint and <math>\bigvee_{i=1}^n\mathcal{E}_i=\Omega</math>, the events <math>\mathcal{E}\wedge\mathcal{E}_1,\mathcal{E}\wedge\mathcal{E}_2,\ldots,\mathcal{E}\wedge\mathcal{E}_n</math> are also mutually disjoint, and <math>\mathcal{E}=\bigvee_{i=1}^n\left(\mathcal{E}\wedge\mathcal{E}_i\right)</math>. Then
:<math>
\Pr[\mathcal{E}]=\sum_{i=1}^n\Pr[\mathcal{E}\wedge\mathcal{E}_i],
</math>
which, according to the definition of conditional probability, is <math>\sum_{i=1}^n\Pr[\mathcal{E}\mid\mathcal{E}_i]\cdot\Pr[\mathcal{E}_i]</math>.
}}

The law of total probability provides a standard tool for breaking a probability into sub-cases. Sometimes this simplifies the analysis.

== A Chain of Conditioning ==
By the definition of conditional probability, <math>\Pr[A\mid B]=\frac{\Pr[A\wedge B]}{\Pr[B]}</math>. Thus, <math>\Pr[A\wedge B] =\Pr[B]\cdot\Pr[A\mid B]</math>. This suggests that we can compute the probability of the AND of events by conditional probabilities. Formally, we have the following theorem:
{{Theorem|Theorem|
:Let <math>\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n</math> be any <math>n</math> events. Then
::<math>\begin{align}
\Pr\left[\bigwedge_{i=1}^n\mathcal{E}_i\right]
&=
\prod_{k=1}^n\Pr\left[\mathcal{E}_k \mid \bigwedge_{i<k}\mathcal{E}_i\right].
\end{align}</math>
}}
{{Proof|It holds that <math>\Pr[A\wedge B] =\Pr[B]\cdot\Pr[A\mid B]</math>. Thus, let <math>A=\mathcal{E}_n</math> and <math>B=\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_{n-1}</math>; then
:<math>\begin{align}
\Pr[\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_n]
&=
\Pr[\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_{n-1}]\cdot\Pr\left[\mathcal{E}_n\mid \bigwedge_{i<n}\mathcal{E}_i\right].
\end{align}
</math>
Recursively applying this equation to <math>\Pr[\mathcal{E}_1\wedge\mathcal{E}_2\wedge\cdots\wedge\mathcal{E}_{n-1}]</math> until only <math>\mathcal{E}_1</math> is left, the theorem is proved.
}}
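As a concrete illustration of the chain rule (the card example is ours, chosen only for illustration): the probability that the first three cards drawn from a standard deck are all aces is <math>\frac{4}{52}\cdot\frac{3}{51}\cdot\frac{2}{50}</math>, which the following Python sketch confirms by brute force.
<pre>
from fractions import Fraction
from itertools import permutations

# E_k = "the k-th card drawn is an ace"; by the chain rule,
# Pr[E_1 and E_2 and E_3] = (4/52) * (3/51) * (2/50).
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Brute force over all ordered triples of distinct cards (aces are cards 0..3).
triples = list(permutations(range(52), 3))
direct = Fraction(sum(1 for t in triples if all(c < 4 for c in t)), len(triples))

assert chain == direct   # both equal 1/5525
</pre>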
= Polynomial Identity Testing (PIT) =
Consider the following problem of '''Polynomial Identity Testing (PIT)''':
* '''Input:''' two <math>n</math>-variate polynomials <math>f, g\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> of degree <math>d</math>.
* '''Output:''' "yes" if <math>f\equiv g</math>, and "no" otherwise.
Here <math>\mathbb{F}[x_1,x_2,\ldots,x_n]</math> is the [http://en.wikipedia.org/wiki/Polynomial_ring#The_polynomial_ring_in_several_variables ring of multivariate polynomials] over the field <math>\mathbb{F}</math>. The most natural way to represent an <math>n</math>-variate polynomial of degree <math>d</math> is to write it as a sum of monomials:
:<math>f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n}</math>.
The '''degree''' or '''total degree''' of a monomial <math>a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n}</math> is given by <math>i_1+i_2+\cdots+i_n</math>, and the degree of a polynomial <math>f</math> is the maximum degree of its monomials with nonzero coefficients.

Alternatively, we can consider the following equivalent problem:
* '''Input:''' a polynomial <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> of degree <math>d</math>.
* '''Output:''' "yes" if <math>f\equiv 0</math>, and "no" otherwise.

If <math>f</math> is written explicitly as a sum of monomials, then the problem is trivial. Again we allow <math>f</math> to be represented in product form.

{{Theorem|Example|
The [http://en.wikipedia.org/wiki/Vandermonde_matrix Vandermonde matrix] <math>M=M(x_1,x_2,\ldots,x_n)</math> is defined by <math>M_{ij}=x_i^{j-1}</math>, that is
:<math>M=\begin{bmatrix}
1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\
1 & x_2 & x_2^2 & \dots & x_2^{n-1}\\
1 & x_3 & x_3^2 & \dots & x_3^{n-1}\\
\vdots & \vdots & \vdots & \ddots &\vdots \\
1 & x_n & x_n^2 & \dots & x_n^{n-1}
\end{bmatrix}</math>.
Let <math>f</math> be the polynomial defined as
:<math>
f(x_1,\ldots,x_n)=\det(M)=\prod_{j<i}(x_i-x_j).
</math>
It is pretty easy to evaluate <math>f(x_1,x_2,\ldots,x_n)</math> on any particular <math>x_1,x_2,\ldots,x_n</math>, but it is prohibitively expensive to symbolically expand <math>f(x_1,\ldots,x_n)</math> to its sum-of-monomial form.
}}

== Schwartz-Zippel Theorem ==
Here is a very simple randomized algorithm, due to Schwartz and Zippel.
{{Theorem|Randomized algorithm for multi-variate PIT|
* fix an arbitrary set <math>S\subseteq \mathbb{F}</math> whose size will be fixed later;
* pick <math>r_1,r_2,\ldots,r_n\in S</math> uniformly and independently at random;
* if <math>f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0</math> then return “yes” else return “no”;
}}

This algorithm requires only the evaluation of <math>f</math> at a single point <math>\vec{r}</math>, and if <math>f\equiv 0</math> it is always correct.
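The following Python sketch applies this idea to the Vandermonde example above: it tests the identity <math>\det(M)=\prod_{j<i}(x_i-x_j)</math> by evaluating both sides at random points of a prime field. The field size, the dimension and the helper names are our own choices for illustration.
<pre>
import random

def vandermonde_det_product(xs):
    # Evaluate prod_{j<i} (x_i - x_j) directly.
    d = 1
    for i in range(len(xs)):
        for j in range(i):
            d *= xs[i] - xs[j]
    return d

def vandermonde_det_gauss(xs, p):
    # Evaluate det(M) mod p by Gaussian elimination on the explicit matrix.
    n = len(xs)
    m = [[pow(x, j, p) for j in range(n)] for x in xs]
    det = 1
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c]), None)
        if piv is None:
            return 0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            det = -det
        det = det * m[c][c] % p
        inv = pow(m[c][c], p - 2, p)           # p is prime, so inverses exist
        for r in range(c + 1, n):
            factor = m[r][c] * inv % p
            for k in range(c, n):
                m[r][k] = (m[r][k] - factor * m[c][k]) % p
    return det % p

p = 2**31 - 1                                  # the field F_p
n = 6
for _ in range(20):
    r = [random.randrange(p) for _ in range(n)]
    assert vandermonde_det_gauss(r, p) == vandermonde_det_product(r) % p
</pre>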
In the theorem below, we will see that if <math>f\not\equiv 0</math> then the algorithm errs with probability at most <math>\frac{d}{|S|}</math>, where <math>d</math> is the degree of the polynomial <math>f</math>.

{{Theorem|Theorem (Schwartz-Zippel)|
: Let <math>f\in\mathbb{F}[x_1,x_2,\ldots,x_n]</math> be a multivariate polynomial of degree <math>d</math> over a field <math>\mathbb{F}</math> such that <math>f\not\equiv 0</math>. Fix any finite set <math>S\subset\mathbb{F}</math>, and let <math>r_1,r_2,\ldots,r_n</math> be chosen uniformly and independently at random from <math>S</math>. Then
::<math>\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.</math>
}}
{{Proof|
We prove it by induction on <math>n</math>, the number of variables.

For <math>n=1</math>, assuming that <math>f\not\equiv 0</math>, due to the fundamental theorem of algebra, the degree-<math>d</math> polynomial <math>f(x)</math> has at most <math>d</math> roots, thus
:<math>\Pr[f(r)=0]\le\frac{d}{|S|}.
</math>

Assume the induction hypothesis for multivariate polynomials in up to <math>n-1</math> variables.

An <math>n</math>-variate polynomial <math>f(x_1,x_2,\ldots,x_n)</math> can be represented as
:<math>f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1})</math>,
where <math>k</math> is the largest power of <math>x_n</math>, which means that the degree of <math>f_k</math> is at most <math>d-k</math> and <math>f_k\not\equiv 0</math>.

In particular, we write <math>f</math> as a sum of two parts:
:<math>f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n)</math>,
where both <math>f_k</math> and <math>\bar{f}</math> are polynomials, such that
* as argued above, the degree of <math>f_k</math> is at most <math>d-k</math> and <math>f_k\not\equiv 0</math>;
* <math>\bar{f}(x_1,x_2,\ldots,x_n)=\sum_{i=0}^{k-1}x_n^i f_i(x_1,x_2,\ldots,x_{n-1})</math>, thus <math>\bar{f}(x_1,x_2,\ldots,x_n)</math> has no <math>x_n^{k}</math> factor in any term.

By the law of total probability, it holds that
:<math>
\begin{align}
&\Pr[f(r_1,r_2,\ldots,r_n)=0]\\
=
&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\
&+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0].
\end{align}
</math>
Note that <math>f_k(r_1,r_2,\ldots,r_{n-1})</math> is a polynomial in <math>n-1</math> variables of degree at most <math>d-k</math> such that <math>f_k\not\equiv 0</math>.
By the induction hypothesis, we have
:<math>
\begin{align}
(*)
&\qquad
&\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.
\end{align}
</math>

For the second case, recall that <math>\bar{f}(x_1,\ldots,x_n)</math> has no <math>x_n^k</math> factor in any term, thus the condition <math>f_k(r_1,r_2,\ldots,r_{n-1})\neq0</math> guarantees that
:<math>f(r_1,\ldots,r_{n-1},x_n)=x_n^k f_k(r_1,r_2,\ldots,r_{n-1})+\bar{f}(r_1,\ldots,r_{n-1},x_n)=g_{r_1,\ldots,r_{n-1}}(x_n)</math>
is a univariate polynomial such that the degree of <math>g_{r_1,\ldots,r_{n-1}}(x_n)</math> is <math>k</math> and <math>g_{r_1,\ldots,r_{n-1}}\not\equiv 0</math>, for which we already know that the probability that <math>g_{r_1,\ldots,r_{n-1}}(r_n)=0</math> is at most <math>\frac{k}{|S|}</math>.
Therefore,
:<math>
\begin{align}
(**)
&\qquad
&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|}.
\end{align}
</math>
Substituting both <math>(*)</math> and <math>(**)</math> back into the total probability, we have
:<math>
\Pr[f(r_1,r_2,\ldots,r_n)=0]
\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}.
</math>
}}
= Min-Cut in a Graph =
Let <math>G(V, E)</math> be a graph. Suppose that we want to partition the vertex set <math>V</math> into two parts <math>S</math> and <math>T</math> such that the number of ''crossing edges'', edges with one endpoint in each part, is as small as possible. This is the min-cut problem.

For a connected graph <math>G(V, E)</math>, a '''cut''' is a set <math>C\subseteq E</math> of edges whose removal disconnects <math>G</math>. The min-cut problem is to find a cut of minimum cardinality. A canonical deterministic algorithm for this problem goes through the [http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem max-flow min-cut theorem]: a global minimum cut is the minimum over all pairs <math>s,t</math> of the <math>s</math>-<math>t</math> min-cut, which is equal to the <math>s</math>-<math>t</math> max-flow.

Do we have to rely on "advanced" tools like flows? The answer is "no", with a little help from randomness.

== Karger's Min-Cut Algorithm ==
We will introduce an extremely simple algorithm discovered by [http://people.csail.mit.edu/karger/ David Karger]. The algorithm works on multigraphs, graphs allowing multiple edges between vertices.

We define an operation on multigraphs called ''contraction'':
For a multigraph <math>G(V, E)</math> and any edge <math>uv\in E</math>, let <math>contract(G,uv)</math> be a new multigraph constructed as follows: <math>u</math> and <math>v</math> in <math>V</math> are replaced by a single new vertex whose neighbors are all the old neighbors of <math>u</math> and <math>v</math>. In other words, <math>u</math> and <math>v</math> are merged into one vertex. The old edges between <math>u</math> and <math>v</math> are deleted.

Karger's min-cut algorithm is described as follows:

'''MinCut(multigraph <math>G(V, E)</math>)'''
* while <math>|V|>2</math> do
** choose an edge <math>uv\in E</math> uniformly at random;
** <math>G=contract(G,uv)</math>;
* return the edges between the only two vertices in <math>V</math>;

----
A better way to understand Karger's min-cut algorithm is to describe it as randomly merging sets of vertices. Initially, each vertex <math>v\in V</math> corresponds to a singleton set <math>\{v\}</math>. At each step, (1) a crossing edge (an edge whose endpoints are in different sets) is chosen uniformly at random from all crossing edges; and (2) the two sets connected by the chosen crossing edge are merged into one set. Repeat this process until there are only two sets. The crossing edges between the two sets are returned.
----
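Here is a short Python sketch of the merging-sets description above; the union-find helper and the example graph are our own illustrative choices. Picking a uniform edge of the original multigraph and rejecting non-crossing ones is the same as picking a uniform crossing edge.
<pre>
import random

def karger_min_cut(edges, trials):
    """Random-contraction min cut; `edges` lists the (u, v) pairs of a
    connected multigraph, with parallel edges simply repeated."""
    best = None
    for _ in range(trials):
        parent = {v: v for e in edges for v in e}   # every vertex is a singleton set

        def find(v):                                # union-find representative
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        components = len(parent)
        while components > 2:
            u, v = edges[random.randrange(len(edges))]
            ru, rv = find(u), find(v)
            if ru != rv:                            # contract a crossing edge
                parent[ru] = rv
                components -= 1
        cut = [e for e in edges if find(e[0]) != find(e[1])]
        if best is None or len(cut) < len(best):
            best = cut
    return best

# A 4-cycle with one chord: the global minimum cut has size 2.
g = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(len(karger_min_cut(g, trials=20)))            # typically prints 2
</pre>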
== Analysis ==
For a multigraph <math>G(V, E)</math>, fix a minimum cut <math>C</math> (there might be more than one minimum cut); we analyze the probability that <math>C</math> is returned by the MinCut algorithm. <math>C</math> is returned by MinCut if and only if no edge in <math>C</math> is contracted during the execution of MinCut. We will bound this probability
<math>\Pr[\mbox{no edge in }C\mbox{ is contracted}]</math>.

{{Theorem
|Lemma 1|
:Let <math>G(V, E)</math> be a multigraph with <math>n</math> vertices. If the size of the minimum cut of <math>G</math> is <math>k</math>, then <math>|E|\ge nk/2</math>.
}}
{{Proof|
:Every vertex has at least <math>k</math> incident edges, because if there exists <math>v</math> with fewer than <math>k</math> incident edges, then those edges disconnect <math>v</math> from the rest of <math>G</math>, forming a cut of size smaller than <math>k</math>. Therefore <math>|E|\ge kn/2</math>.
}}

{{Theorem
|Lemma 2|
:Let <math>G(V, E)</math> be a multigraph with <math>n</math> vertices, and <math>C</math> a minimum cut of <math>G</math>. If <math>e\not\in C</math>, then <math>C</math> is still a minimum cut of <math>contract(G, e)</math>.
}}
{{Proof|
:We first show that no edge in <math>C</math> is lost during the contraction. Due to the definition of contraction, the only edges removed from <math>G</math> in a contraction <math>contract(G, e)</math> are the parallel edges sharing both endpoints with <math>e</math>. Since <math>e\not\in C</math>, none of these edges can be in <math>C</math>, or otherwise <math>C</math> could not be a minimum cut of <math>G</math>. Thus every edge in <math>C</math> remains in <math>contract(G, e)</math>.

:It is then easy to see that <math>C</math> is a cut of <math>contract(G, e)</math>. All paths in a contracted graph can be revived in the original multigraph by inserting the contracted edges into the path, thus a connected <math>contract(G, e)-C</math> would imply a connected <math>G-C</math>, which contradicts that <math>C</math> is a cut in <math>G</math>.

:Notice that a cut in a contracted graph must also be a cut in the original graph. This can be easily verified by seeing contraction as taking the union of two sets of vertices. Therefore a contraction can never reduce the size of minimum cuts of a multigraph. The minimum cut <math>C</math> must still be a minimum cut in the contracted graph as long as it is still a cut.

:Combining the above arguments, we have that <math>C</math> is a minimum cut of <math>contract(G, e)</math> for any <math>e\not\in C</math>.
}}

Let <math>G(V, E)</math> be a multigraph, and <math>C</math> a minimum cut of <math>G</math>. Initially <math>|V|=n</math>.
After <math>(i-1)</math> contractions, denote the current multigraph as <math>G_i(V_i, E_i)</math>. Suppose that no edge in <math>C</math> has been chosen to be contracted yet. According to Lemma 2, <math>C</math> must be a minimum cut of <math>G_i</math>. Then due to Lemma 1, the current number of edges is <math>|E_i|\ge |V_i||C|/2</math>. Uniformly choosing an edge <math>e\in E_i</math> to contract, the probability that the <math>i</math>th contraction contracts an edge in <math>C</math> is given by:

:<math>\Pr_{e\in E_i}[e\in C] = \frac{|C|}{|E_i|}
\le |C|\cdot\frac{2}{|V_i||C|}
= \frac{2}{|V_i|}.</math>

Therefore, assuming that <math>C</math> is intact after <math>(i-1)</math> contractions, the probability that <math>C</math> survives the <math>i</math>th contraction is at least <math>1-2/|V_i|</math>. Note that <math>|V_i|=n-i+1</math>, because each contraction decreases the number of vertices by 1.

The probability that no edge in the minimum cut <math>C</math> is ever contracted is:

:<math>\begin{align}
&\quad\,\Pr[\,C\mbox{ survives all }(n-2)\mbox{ contractions}]\\
&=
\prod_{i=1}^{n-2}\Pr[\,C\mbox{ survives the }i\mbox{-th contraction}\mid C\mbox{ survives the first }(i-1)\mbox{ contractions}]\\
&=
\prod_{i=1}^{n-2}\left(1-\frac{2}{|V_i|}\right) \\
&=
\prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)\\
&=
\prod_{k=3}^{n}\frac{k-2}{k}\\
&= \frac{2}{n(n-1)}.
\end{align}</math>

Therefore, we have proved the following theorem.

{{Theorem
|Theorem|
: For any multigraph with <math>n</math> vertices, the MinCut algorithm returns a minimum cut with probability at least <math>\frac{2}{n(n-1)}</math>.
}}

Run MinCut independently <math>n(n-1)/2</math> times and return the smallest cut found. The probability that the minimum cut is found is:

:<math>\begin{align}
1-\Pr[\mbox{failed every time}] &= 1-\Pr[\mbox{MinCut fails}]^{n(n-1)/2} \\
&\ge 1- \left(1-\frac{2}{n(n-1)}\right)^{n(n-1)/2} \\
&\ge 1-\frac{1}{e}.
\end{align}</math>

A constant probability!
== Generating Functions ==
In Stanley's magnificent book ''Enumerative Combinatorics'', he describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on them. We use an example to demonstrate this.

=== Combinations ===
Suppose we wish to enumerate all subsets of an <math>n</math>-set. To construct a subset, we specify for every element of the <math>n</math>-set whether the element is chosen or not. Let us denote the choice to omit an element by <math>x_0</math>, and the choice to include it by <math>x_1</math>. Using "<math>+</math>" to represent "OR", and using multiplication to denote "AND", the choices of subsets of the <math>n</math>-set are expressed as
:<math>\underbrace{(x_0+x_1)(x_0+x_1)\cdots (x_0+x_1)}_{n\mbox{ elements}}=(x_0+x_1)^n</math>.

For example, when <math>n=3</math>, we have
:<math>\begin{align}
(x_0+x_1)^3
&=x_0x_0x_0+x_0x_0x_1+x_0x_1x_0+x_0x_1x_1\\
&\quad +x_1x_0x_0+x_1x_0x_1+x_1x_1x_0+x_1x_1x_1
\end{align}</math>.

So it "generates" all subsets of the 3-set. Writing <math>1</math> for <math>x_0</math> and <math>x</math> for <math>x_1</math>, we have <math>(1+x)^3=1+3x+3x^2+x^3</math>. The coefficient of <math>x^k</math> is the number of <math>k</math>-subsets of a 3-element set.

In general, the coefficients of <math>(1+x)^n</math> count the subsets of fixed sizes of an <math>n</math>-element set.

----

Suppose that we have twelve balls: <font color="red">3 red</font>, <font color="blue">4 blue</font>, and <font color="green">5 green</font>. Balls with the same color are indistinguishable.

We want to determine the number of ways to select <math>k</math> balls from these twelve balls, for some <math>0\le k\le 12</math>.

The generating function of this sequence is
:<math>\begin{align}
&\quad {\color{Red}(1+x+x^2+x^3)}{\color{Blue}(1+x+x^2+x^3+x^4)}{\color{OliveGreen}(1+x+x^2+x^3+x^4+x^5)}\\
&=1+3x+6x^2+10x^3+14x^4+17x^5+18x^6+17x^7+14x^8+10x^9+6x^{10}+3x^{11}+x^{12}.
\end{align}</math>
The coefficient of <math>x^k</math> gives the number of ways to select <math>k</math> balls.
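The product above is just a convolution of coefficient sequences, so it can be checked mechanically. A minimal Python sketch (the function name is ours):
<pre>
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (a convolution)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

red, blue, green = [1] * 4, [1] * 5, [1] * 6   # 1+x+x^2+x^3, ..., 1+...+x^5
print(poly_mul(poly_mul(red, blue), green))
# [1, 3, 6, 10, 14, 17, 18, 17, 14, 10, 6, 3, 1]
# the coefficient of x^k is the number of ways to select k balls
</pre>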


=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as an ordered sum of ones and twos.
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>x_1+x_2+\cdots+x_k=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.

In both problems, the solution is given by <math>F_n</math>, which satisfies the following recursion.
:<math>F_n=\begin{cases}
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2,\\
1 & \mbox{if }n=1,\\
0 & \mbox{if }n=0.
\end{cases}</math>

<math>F_n</math> is called the [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.
The ordinary generating function for the Fibonacci numbers <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
=
\sum_{n\ge 0}F_n x^n
=F_0+F_1x+\sum_{n\ge 2}F_n x^n
=x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}</math>
For generating functions, there are general ways to generate <math>F_{n-1}</math> and <math>F_{n-2}</math>, or the coefficients with any smaller indices.
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series for this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculus is rather painful.

----
It is easier to expand the generating function by breaking it into two geometric series.
{{Theorem|Proposition|
:Let <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>. It holds that
::<math>\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>.
}}

It is easy to verify the above equation, but to deduce it, we need some (high school) calculation.

{|border="2" width="100%" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|
:{|
|
<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Denote <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math>.

Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
where <math>\alpha</math> and <math>\beta</math> satisfy
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>
Solving this we have <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. Thus,
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>.
|}
:<math>\square</math>
|}

Note that the expression <math>\frac{1}{1-z}</math> has a well known geometric expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.
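A quick numerical check of the closed form against the recurrence (plain Python; the cutoff of 20 terms is an arbitrary choice, and floating-point accuracy limits how far this check can go):
<pre>
from math import sqrt

def fib(n):
    a, b = 0, 1                 # F_0 = 0, F_1 = 1
    for _ in range(n):
        a, b = b, a + b
    return a

phi, phi_hat = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
for n in range(20):
    assert round((phi**n - phi_hat**n) / sqrt(5)) == fib(n)
</pre>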
== Solving recurrences ==
The following steps describe a general methodology for solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>. In the case of the Fibonacci sequence,
::<math>a_n=a_{n-1}+a_{n-2}</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}(a_{n-1}+a_{n-2})x^n</math>.
:: Then manipulate the right hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
::<math>G(x)=x+(x+x^2)G(x)\,</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
::<math>G(x)=\frac{x}{1-x-x^2}</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

The first step is usually established by combinatorial observations, or is explicitly given by the problem. The third step is trivial.

The second and the fourth steps need some non-trivial analytic techniques.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy; and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations to formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----
::<math>
\begin{align}
&\text{shift:}
&x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
&\text{addition:}
& F(x)+G(x)
&= \sum_{n\ge 0} (f_n+ g_n)x^n\\
&\text{convolution:}
&F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
&\text{differentiation:}
&G'(x)
&=\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand side to the left-hand side.
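These rules act on coefficient sequences, so they are easy to mirror on truncated coefficient lists. The sketch below (helper names are our own) implements the four rules and re-checks the Fibonacci identity <math>G(x)=x+(x+x^2)G(x)</math> on the first ten coefficients:
<pre>
def shift(g, k):                 # x^k G(x)
    return [0] * k + g

def add(f, g):                   # F(x) + G(x)
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def convolve(f, g):              # F(x) G(x)
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return c

def differentiate(g):            # G'(x)
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]

fib = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
rhs = add([0, 1], convolve(add(shift([1], 1), shift([1], 2)), fib))  # x + (x + x^2) G(x)
assert rhs[:len(fib)] == fib
</pre>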
=== Expanding generating functions ===
The last step of solving recurrences by generating functions is expanding the closed form generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we used the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such a <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By the Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}
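As a numerical sanity check of Newton's formula (the choice <math>\alpha=1/2</math> and <math>x=0.2</math> is our own), the partial sums of the series converge to <math>\sqrt{1+x}</math>:
<pre>
from fractions import Fraction
from math import factorial, sqrt

def gen_binom(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1)/n!."""
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(alpha) - i
    return p / factorial(n)

x = 0.2
partial = sum(float(gen_binom(Fraction(1, 2), n)) * x**n for n in range(30))
assert abs(partial - sqrt(1 + x)) < 1e-12
</pre>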
=== Example: multisets ===
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

Due to the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. We thus obtain, by an analytic (generating function) proof, the same value of <math>\left({n\choose k}\right)</math> as from the combinatorial proof.
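A quick enumeration check of the result <math>\left({n\choose k}\right)={n+k-1\choose k}</math> for small parameters (plain Python):
<pre>
from itertools import combinations_with_replacement
from math import comb

for n in range(1, 6):
    for k in range(6):
        # k-multisets on an n-set, enumerated explicitly
        assert len(list(combinations_with_replacement(range(n), k))) == comb(n + k - 1, k)
</pre>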
== Catalan Number ==
We now introduce a class of counting problems, all with the same solution, called the [http://en.wikipedia.org/wiki/Catalan_number '''Catalan number'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, cited from Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY &nbsp;&nbsp;&nbsp; XYXXYY &nbsp;&nbsp;&nbsp; XYXYXY &nbsp;&nbsp;&nbsp; XXYYXY &nbsp;&nbsp;&nbsp; XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((())) &nbsp;&nbsp;&nbsp; ()(()) &nbsp;&nbsp;&nbsp; ()()() &nbsp;&nbsp;&nbsp; (())() &nbsp;&nbsp;&nbsp; (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n''&nbsp;+&nbsp;1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n''&nbsp;+&nbsp;1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n''&nbsp;+&nbsp;2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') =&nbsp;(1,&nbsp;...,&nbsp;''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' =&nbsp;''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') =&nbsp;''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n''&nbsp;=&nbsp;4:
[[Image:Catalan stairsteps 4.png|400px|center]]

=== Solving the Catalan numbers ===
{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=1</math>, and for <math>n\ge1</math>,
::<math>
C_n=
\sum_{k=0}^{n-1}C_kC_{n-1-k}</math>.
}}
{{Proof|
Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. Then
:<math>
\begin{align}
G(x)^2
&=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n\\
xG(x)^2
&=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^{n+1}=\sum_{n\ge 1}\sum_{k=0}^{n-1}C_kC_{n-1-k}x^n.
\end{align}
</math>
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=C_0+\sum_{n\ge 1}\sum_{k=0}^{n-1}C_kC_{n-1-k}x^n=1+xG(x)^2</math>.
Solving <math>xG(x)^2-G(x)+1=0</math>, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2x}</math>.
Only one of these functions can be the generating function for <math>C_n</math>, and it must satisfy
:<math>\lim_{x\rightarrow 0}G(x)=C_0=1</math>.
It is easy to check that the correct function is
:<math>G(x)=\frac{1-(1-4x)^{1/2}}{2x}</math>.
Expanding <math>(1-4x)^{1/2}</math> by Newton's formula,
:<math>
\begin{align}
(1-4x)^{1/2}
&=
\sum_{n\ge 0}{1/2\choose n}(-4x)^n\\
&=
1+\sum_{n\ge 1}{1/2\choose n}(-4x)^n\\
&=
1-4x\sum_{n\ge 0}{1/2\choose n+1}(-4x)^n.
\end{align}
</math>
Then, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2x}\\
&=
2\sum_{n\ge 0}{1/2\choose n+1}(-4x)^n.
\end{align}
</math>
Thus,
:<math>
\begin{align}
C_n
&=2{1/2\choose n+1}(-4)^n\\
&=2\cdot\left(\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-1)}{2}\right)\cdot\frac{1}{(n+1)!}\cdot(-4)^n\\
&=\frac{2^n}{(n+1)!}\prod_{k=1}^n(2k-1)\\
&=\frac{2^n}{(n+1)!}\prod_{k=1}^n\frac{(2k-1)2k}{2k}\\
&=\frac{1}{n!(n+1)!}\prod_{k=1}^n (2k-1)2k\\
&=\frac{(2n)!}{n!(n+1)!}\\
&=\frac{1}{n+1}{2n\choose n}.
\end{align}
</math>
}}
This proves the following closed form for the Catalan numbers.
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n+1}{2n\choose n}</math>.
}}
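The recurrence and the closed form are easy to cross-check numerically (plain Python):
<pre>
from math import comb

def catalan_by_recurrence(m):
    C = [1]                                   # C_0 = 1
    for n in range(1, m + 1):
        C.append(sum(C[k] * C[n - 1 - k] for k in range(n)))
    return C

C = catalan_by_recurrence(15)
assert C[:8] == [1, 1, 2, 5, 14, 42, 132, 429]
assert all(C[n] == comb(2 * n, n) // (n + 1) for n in range(16))
</pre>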
== Analysis of Quicksort ==
Given as input a set <math>S</math> of <math>n</math> numbers, we want to sort the numbers in <math>S</math> in increasing order. One of the most famous algorithms for this problem is the [http://en.wikipedia.org/wiki/Quicksort Quicksort] algorithm.
{{Theorem|Quicksort algorithm|
'''Input''': a set <math>S</math> of <math>n</math> numbers.
* if <math>|S|>1</math> do:
** pick an <math>x\in S</math> as the ''pivot'';
** partition <math>S</math> into <math>S_1=\{y\in S\mid y<x\}</math> and <math>S_2=\{y\in S\mid y>x\}</math>;
** recursively sort <math>S_1</math> and <math>S_2</math>;
}}

Usually the input set <math>S</math> is given as an array of the <math>n</math> elements in an arbitrary order. The pivot is picked from a fixed position in the array (e.g. the first number in the array).

The time complexity of this sorting algorithm is measured by the '''number of comparisons'''.

=== The quicksort recursion ===
It is easy to observe that the running time of the algorithm depends only on the relative order of the elements in the input array.

Let <math>T_n</math> be the average number of comparisons used by Quicksort to sort an array of <math>n</math> numbers, where the average is taken over all <math>n!</math> total orders of the elements in the array.

{{Theorem|The Quicksort recursion|
:<math>T_n=
\frac{1}{n}\sum_{k=1}^n\left(n-1+T_{k-1}+T_{n-k}\right)
</math>
:and <math>T_0=T_1=0\,</math>.
}}
The recursion is obtained by averaging over the <math>n</math> sub-cases that the pivot is chosen as the <math>k</math>-th smallest element, for <math>k=1,2,\ldots,n</math>. Partitioning the input set <math>S</math> into <math>S_1</math> and <math>S_2</math> takes exactly <math>n-1</math> comparisons regardless of the choice of the pivot. Given that the pivot is chosen as the <math>k</math>-th smallest element, the sizes of <math>S_1</math> and <math>S_2</math> are <math>k-1</math> and <math>n-k</math> respectively, thus the costs of sorting <math>S_1</math> and <math>S_2</math> are given recursively by <math>T_{k-1}</math> and <math>T_{n-k}</math>.

=== Manipulating the OGF ===
We write the ordinary generating function (OGF) for the quicksort:
:<math>
\begin{align}
G(x)
&=\sum_{n\ge 0}T_nx^n.
\end{align}
</math>

The quicksort recursion also gives us another equation for formal power series:
:<math>
\begin{align}
\sum_{n\ge 0}nT_nx^n
&=\sum_{n\ge 0}\left(\sum_{k=1}^n\left(n-1+T_{k-1}+T_{n-k}\right)\right)x^n\\
&=\sum_{n\ge 0}n(n-1)x^n+2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n.
\end{align}
</math>

We express the three terms <math>\sum_{n\ge 0}n(n-1)x^n</math>, <math>2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n</math> and <math>\sum_{n\ge 0}nT_nx^n</math> in closed form involving <math>G(x)</math> as follows:
# Evaluate the power series: <math>\sum_{n\ge 0}n(n-1)x^n=x^2\sum_{n\ge 0}n(n-1)x^{n-2}=\frac{2x^2}{(1-x)^3}</math>.
# Apply the convolution rule of OGF: <math>2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n=2x\sum_{n\ge 0}\left(\sum_{k=0}^{n}T_{k}\right)x^{n}=2xF(x)G(x)</math>,
#:where <math>F(x)=\sum_{n\ge 0}x^n=\frac{1}{1-x}</math>,
#:therefore, <math>2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n=2xF(x)G(x)=\frac{2x}{1-x}G(x)</math>.
# Apply the differentiation rule of OGF: <math>\sum_{n\ge 0}nT_nx^n=x\sum_{n\ge 0}(n+1)T_{n+1}x^{n}=xG'(x)</math>.
Therefore we have the following identity for the OGF for quicksort:
{{Theorem|Equation for the generating function|
:<math>xG'(x)=\frac{2x^2}{(1-x)^3}+\frac{2x}{1-x}G(x)</math>.
}}

=== Solving the equation ===
The above equation for the generating function <math>G(x)</math> is a first-order linear differential equation, for which there is a standard method of solution. Dividing both sides by <math>x</math> and multiplying by the integrating factor <math>(1-x)^2</math> gives
:<math>\left((1-x)^2G(x)\right)'=\frac{2x}{1-x}</math>.
Integrating both sides and using <math>G(0)=T_0=0</math>,
:<math>(1-x)^2G(x)=2\ln\frac{1}{1-x}-2x</math>,
hence
:<math>G(x)=\frac{2}{(1-x)^2}\ln\frac{1}{1-x}-\frac{2x}{(1-x)^2}</math>.

=== Expanding ===
Due to Taylor's expansion,
:<math>\frac{2}{(1-x)^2}=2\sum_{n\ge 0}(n+1) x^{n}</math>,
:<math>\ln\frac{1}{1-x}=\sum_{n\ge 1}\frac{x^n}{n}</math>,
:<math>\frac{2x}{(1-x)^2}=2\sum_{n\ge 1}n x^{n}</math>.

The first term of <math>G(x)</math> is a convolution product of the first two series.
:<math>
\begin{align}
G(x)
&=\frac{2}{(1-x)^2}\ln\frac{1}{1-x}-\frac{2x}{(1-x)^2}\\
&=2\sum_{n\ge 0}(n+1) x^{n}\sum_{n\ge 1}\frac{x^n}{n}-2\sum_{n\ge 1}n x^{n}\\
&=2\sum_{n\ge 1}\left(\sum_{k=1}^{n}(n-k+1)\frac{1}{k}\right)x^n-2\sum_{n\ge 1}n x^{n}.
\end{align}
</math>

Thus the coefficient of <math>x^n</math> in <math>G(x)</math>, denoted as <math>[x^n]G(x)</math>, is:
:<math>
\begin{align}
\,[x^n]G(x)
&=2\sum_{k=1}^{n}(n-k+1)\frac{1}{k}-2n\\
&=2(n+1)\sum_{k=1}^n\frac{1}{k}-2\sum_{k=1}^nk\cdot\frac{1}{k}-2n\\
&=2(n+1)H(n)-4n,
\end{align}
</math>
where <math>H(n)</math> is the <math>n</math>th [http://en.wikipedia.org/wiki/Harmonic_number harmonic number] defined as <math>H(n)=\sum_{k=1}^n\frac{1}{k}</math>.

Therefore, the average number of comparisons used by quicksort to sort lists of length <math>n</math> is
:<math>T_n=2(n+1)H(n)-4n= 2n\ln n+O(n)\,</math>.
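The closed form can be sanity-checked against both the recursion and a direct simulation of quicksort on random inputs (plain Python; the parameters n = 10 and trials = 20000 are arbitrary choices):
<pre>
import random
from fractions import Fraction

def quicksort_comparisons(arr):
    """Number of comparisons made by quicksort with the first element as pivot."""
    if len(arr) <= 1:
        return 0
    pivot = arr[0]
    s1 = [y for y in arr[1:] if y < pivot]
    s2 = [y for y in arr[1:] if y > pivot]
    return len(arr) - 1 + quicksort_comparisons(s1) + quicksort_comparisons(s2)

def T(n):
    """Exact average via T_m = (1/m) * sum_k (m-1 + T_{k-1} + T_{m-k})."""
    t = [Fraction(0), Fraction(0)]
    for m in range(2, n + 1):
        t.append(Fraction(m - 1) + Fraction(2, m) * sum(t[:m]))
    return t[n]

n = 10
H = sum(Fraction(1, k) for k in range(1, n + 1))
assert T(n) == 2 * (n + 1) * H - 4 * n           # the closed form derived above

trials = 20000
avg = sum(quicksort_comparisons(random.sample(range(10**6), n))
          for _ in range(trials)) / trials
print(avg, float(T(n)))                           # the two numbers should be close
</pre>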
== Reference ==
* ''Graham, Knuth, and Patashnik'', Concrete Mathematics: A Foundation for Computer Science, Chapter 7.
* ''van Lint and Wilson'', A Course in Combinatorics, Chapter 14.

Revision as of 05:50, 20 March 2013

Generating Functions

In Stanley's magnificent book Enumerative Combinatorics, he comments the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some [math]\displaystyle{ a_n }[/math] depending a parameter [math]\displaystyle{ n }[/math]. Sometimes this [math]\displaystyle{ a_n }[/math] is called a counting function as it is a function of the parameter [math]\displaystyle{ n }[/math]. [math]\displaystyle{ a_n }[/math] can also be treated as a infinite series:

[math]\displaystyle{ a_0,a_1,a_2,\ldots }[/math]

The ordinary generating function (OGF) defined by [math]\displaystyle{ a_n }[/math] is

[math]\displaystyle{ G(x)=\sum_{n\ge 0} a_nx^n. }[/math]

So [math]\displaystyle{ G(x)=a_0+a_1x+a_2x^2+\cdots }[/math]. An expression in this form is called a formal power series, and [math]\displaystyle{ a_0,a_1,a_2,\ldots }[/math] is the sequence of coefficients.

Furthermore, the generating function can be expanded as

G(x)=[math]\displaystyle{ (\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots }[/math]

so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function [math]\displaystyle{ GF(x) }[/math] on any particular value. [math]\displaystyle{ x }[/math] remains as a formal variable without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence

[math]\displaystyle{ (a_0,a_1,a_2,\ldots\ldots) }[/math].

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

Combinations

Suppose we wish to enumerate all subsets of an [math]\displaystyle{ n }[/math]-set. To construct a subset, we specifies for every element of the [math]\displaystyle{ n }[/math]-set whether the element is chosen or not. Let us denote the choice to omit an element by [math]\displaystyle{ x_0 }[/math], and the choice to include it by [math]\displaystyle{ x_1 }[/math]. Using "[math]\displaystyle{ + }[/math]" to represent "OR", and using the multiplication to denote "AND", the choices of subsets of the [math]\displaystyle{ n }[/math]-set are expressed as

[math]\displaystyle{ \underbrace{(x_0+x_1)(x_0+x_1)\cdots (x_0+x_1)}_{n\mbox{ elements}}=(x_0+x_1)^n }[/math].

For example, when [math]\displaystyle{ n=3 }[/math], we have

[math]\displaystyle{ \begin{align} (x_0+x_1)^3 &=x_0x_0x_0+x_0x_0x_1+x_0x_1x_0+x_0x_1x_1\\ &\quad +x_1x_0x_0+x_1x_0x_1+x_1x_1x_0+x_1x_1x_1 \end{align} }[/math].

So it "generate" all subsets of the 3-set. Writing [math]\displaystyle{ 1 }[/math] for [math]\displaystyle{ x_0 }[/math] and [math]\displaystyle{ x }[/math] for [math]\displaystyle{ x_1 }[/math], we have [math]\displaystyle{ (1+x)^3=1+3x+3x^2+x^3 }[/math]. The coefficient of [math]\displaystyle{ x^k }[/math] is the number of [math]\displaystyle{ k }[/math]-subsets of a 3-element set.

In general, [math]\displaystyle{ (1+x)^n }[/math] has the coefficients which are the number of subsets of fixed sizes of an [math]\displaystyle{ n }[/math]-element set.


Suppose that we have twelve balls: 3 red, 4 blue, and 5 green. Balls with the same color are indistinguishable.

We want to determine the number of ways to select [math]\displaystyle{ k }[/math] balls from these twelve balls, for some [math]\displaystyle{ 0\le k\le 12 }[/math].

The generating function of this sequence is

[math]\displaystyle{ \begin{align} &\quad {\color{Red}(1+x+x^2+x^3)}{\color{Blue}(1+x+x^2+x^3+x^4)}{\color{OliveGreen}(1+x+x^2+x^3+x^4+x^5)}\\ &=1+3x+6x^2+10x^3+14x^4+17x^5+18x^6+17x^7+14x^8+10x^9+6x^{10}+3x^{11}+x^{12}. \end{align} }[/math]

The coefficient of [math]\displaystyle{ x^k }[/math] gives the number of ways to select [math]\displaystyle{ k }[/math] balls.

Fibonacci numbers

Consider the following counting problems.

  • Count the number of ways that the nonnegative integer [math]\displaystyle{ n }[/math] can be written as a sum of ones and twos (in order).
The problem asks for the number of compositions of [math]\displaystyle{ n }[/math] with summands from [math]\displaystyle{ \{1,2\} }[/math]. Formally, we are counting the number of tuples [math]\displaystyle{ (x_1,x_2,\ldots,x_k) }[/math] for some [math]\displaystyle{ k\le n }[/math] such that [math]\displaystyle{ x_i\in\{1,2\} }[/math] and [math]\displaystyle{ x_1+x_2+\cdots+x_k=n }[/math].
Let [math]\displaystyle{ F_n }[/math] be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of [math]\displaystyle{ n-1 }[/math]; or starts with a 2, in which case the rest is a composition of [math]\displaystyle{ n-2 }[/math]. So we have the recursion for [math]\displaystyle{ F_n }[/math] that
[math]\displaystyle{ F_n=F_{n-1}+F_{n-2} }[/math].
  • Count the ways to completely cover a [math]\displaystyle{ 2\times n }[/math] rectangle with [math]\displaystyle{ 2\times 1 }[/math] dominos without any overlaps.
Dominos are identical [math]\displaystyle{ 2\times 1 }[/math] rectangles, so that only their orientations --- vertical or horizontal matter.
Let [math]\displaystyle{ F_n }[/math] be the solution. It also holds that [math]\displaystyle{ F_n=F_{n-1}+F_{n-2} }[/math]. The proof is left as an exercise.

In both problems, the solution is given by [math]\displaystyle{ F_n }[/math] which satisfies the following recursion.

[math]\displaystyle{ F_n=\begin{cases} F_{n-1}+F_{n-2} & \mbox{if }n\ge 2,\\ 1 & \mbox{if }n=1\\ 0 & \mbox{if }n=0. \end{cases} }[/math]

[math]\displaystyle{ F_n }[/math] is called the Fibonacci number.

Theorem
[math]\displaystyle{ F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right) }[/math],
where [math]\displaystyle{ \phi=\frac{1+\sqrt{5}}{2} }[/math] and [math]\displaystyle{ \hat{\phi}=\frac{1-\sqrt{5}}{2} }[/math].

The quantity [math]\displaystyle{ \phi=\frac{1+\sqrt{5}}{2} }[/math] is the so-called golden ratio, a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions. The ordinary generating function for the Fibonacci number [math]\displaystyle{ F_{n} }[/math] is

[math]\displaystyle{ G(x)=\sum_{n\ge 0}F_n x^n }[/math].

We have that [math]\displaystyle{ F_{n}=F_{n-1}+F_{n-2} }[/math] for [math]\displaystyle{ n\ge 2 }[/math], thus

[math]\displaystyle{ \begin{align} G(x) &= \sum_{n\ge 0}F_n x^n &= F_0+F_1x+\sum_{n\ge 2}F_n x^n &= x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n. \end{align} }[/math]

For generating functions, there are general ways to generate [math]\displaystyle{ F_{n-1} }[/math] and [math]\displaystyle{ F_{n-2} }[/math], or the coefficients with any smaller indices.

[math]\displaystyle{ \begin{align} xG(x) &=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\ x^2G(x) &=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n. \end{align} }[/math]

So we have

[math]\displaystyle{ G(x)=x+(x+x^2)G(x)\, }[/math],

hence

[math]\displaystyle{ G(x)=\frac{x}{1-x-x^2} }[/math].

The value of [math]\displaystyle{ F_n }[/math] is the coefficient of [math]\displaystyle{ x^n }[/math] in the Taylor series for this formular, which is [math]\displaystyle{ \frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n }[/math]. Although this expansion works in principle, the detailed calculus is rather painful.


It is easier to expand the generating function by breaking it into two geometric series.

Proposition
Let [math]\displaystyle{ \phi=\frac{1+\sqrt{5}}{2} }[/math] and [math]\displaystyle{ \hat{\phi}=\frac{1-\sqrt{5}}{2} }[/math]. It holds that
[math]\displaystyle{ \frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x} }[/math].

It is easy to verify the above equation, but to deduce it, we need some (high school) calculation.

[math]\displaystyle{ 1-x-x^2 }[/math] has two roots [math]\displaystyle{ \frac{-1\pm\sqrt{5}}{2} }[/math].

Denote that [math]\displaystyle{ \phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2} }[/math] and [math]\displaystyle{ \hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2} }[/math].

Then [math]\displaystyle{ (1-x-x^2)=(1-\phi x)(1-\hat{\phi}x) }[/math], so we can write

[math]\displaystyle{ \begin{align} \frac{x}{1-x-x^2} &=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\ &=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)}, \end{align} }[/math]

where [math]\displaystyle{ \alpha }[/math] and [math]\displaystyle{ \beta }[/math] satisfying that

[math]\displaystyle{ \begin{cases} \alpha+\beta=0\\ \alpha\phi+\beta\hat{\phi}= -1. \end{cases} }[/math]

Solving this we have that [math]\displaystyle{ \alpha=\frac{1}{\sqrt{5}} }[/math] and [math]\displaystyle{ \beta=-\frac{1}{\sqrt{5}} }[/math]. Thus,

[math]\displaystyle{ G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x} }[/math].
[math]\displaystyle{ \square }[/math]

Note that the expression [math]\displaystyle{ \frac{1}{1-z} }[/math] has a well known geometric expansion:

[math]\displaystyle{ \frac{1}{1-z}=\sum_{n\ge 0}z^n }[/math].

Therefore, [math]\displaystyle{ G(x) }[/math] can be expanded as

[math]\displaystyle{ \begin{align} G(x) &=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\ &=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\ &=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n. \end{align} }[/math]

So the [math]\displaystyle{ n }[/math]th Fibonacci number is given by

[math]\displaystyle{ F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n }[/math].

Solving recurrences

The following steps describe a general methodology of solving recurrences by generating functions.

1. Give a recursion that computes [math]\displaystyle{ a_n }[/math]. In the case of Fibonacci sequence
[math]\displaystyle{ a_n=a_{n-1}+a_{n-2} }[/math].
2. Multiply both sides of the equation by [math]\displaystyle{ x^n }[/math] and sum over all [math]\displaystyle{ n }[/math]. This gives the generating function
[math]\displaystyle{ G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}(a_{n-1}+a_{n-2})x^n }[/math].
And manipulate the right hand side of the equation so that it becomes some other expression involving [math]\displaystyle{ G(x) }[/math].
[math]\displaystyle{ G(x)=x+(x+x^2)G(x)\, }[/math].
3. Solve the resulting equation to derive an explicit formula for [math]\displaystyle{ G(x) }[/math].
[math]\displaystyle{ G(x)=\frac{x}{1-x-x^2} }[/math].
4. Expand [math]\displaystyle{ G(x) }[/math] into a power series and read off the coefficient of [math]\displaystyle{ x^n }[/math], which is a closed form for [math]\displaystyle{ a_n }[/math].

The first step is usually established by combinatorial observations, or explicitly given by the problem. The third step is trivial.

The second and the forth steps need some non-trivial analytic techniques.

Algebraic operations on generating functions

The second step in the above methodology is somehow tricky. It involves first applying the recurrence to the coefficients of [math]\displaystyle{ G(x) }[/math], which is easy; and then manipulating the resulting formal power series to express it in terms of [math]\displaystyle{ G(x) }[/math], which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

Generating function manipulation
Let [math]\displaystyle{ G(x)=\sum_{n\ge 0}g_nx^n }[/math] and [math]\displaystyle{ F(x)=\sum_{n\ge 0}f_nx^n }[/math].


[math]\displaystyle{ \begin{align} &\text{shift:} &x^k G(x) &= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\ &\text{addition:} & F(x)+G(x) &= \sum_{n\ge 0} (f_n+ g_n)x^n\\ &\text{convolution:} &F(x)G(x) &= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\ &\text{differentiation:} &G'(x) &=\sum_{n\ge 0}(n+1)g_{n+1}x^n \end{align} }[/math]

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.

Expanding generating functions

The last step of solving recurrences by generating function is expanding the closed form generating function [math]\displaystyle{ G(x) }[/math] to evaluate its [math]\displaystyle{ n }[/math]-th coefficient. In principle, we can always use the Taylor series

[math]\displaystyle{ G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n }[/math],

where [math]\displaystyle{ G^{(n)}(0) }[/math] is the value of the [math]\displaystyle{ n }[/math]-th derivative of [math]\displaystyle{ G(x) }[/math] evaluated at [math]\displaystyle{ x=0 }[/math].

Some interesting special cases are very useful.

Geometric sequence

In the example of Fibonacci numbers, we use the well known geometric series:

[math]\displaystyle{ \frac{1}{1-x}=\sum_{n\ge 0}x^n }[/math].

It is useful when we can express the generating function in the form of [math]\displaystyle{ G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx} }[/math]. The coefficient of [math]\displaystyle{ x^n }[/math] in such [math]\displaystyle{ G(x) }[/math] is [math]\displaystyle{ a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n }[/math].

Binomial theorem

The [math]\displaystyle{ n }[/math]-th derivative of [math]\displaystyle{ (1+x)^\alpha }[/math] for some real [math]\displaystyle{ \alpha }[/math] is

[math]\displaystyle{ \alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n} }[/math].

By Taylor series, we get a generalized version of the binomial theorem known as Newton's formula:

Newton's formular (generalized binomial theorem)

If [math]\displaystyle{ |x|\lt 1 }[/math], then

[math]\displaystyle{ (1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n} }[/math],

where [math]\displaystyle{ {\alpha\choose n} }[/math] is the generalized binomial coefficient defined by

[math]\displaystyle{ {\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!} }[/math].

Example: multisets

In the last lecture we gave a combinatorial proof of the number of [math]\displaystyle{ k }[/math]-multisets on an [math]\displaystyle{ n }[/math]-set. Now we give a generating function approach to the problem.

Let [math]\displaystyle{ S=\{x_1,x_2,\ldots,x_n\} }[/math] be an [math]\displaystyle{ n }[/math]-element set. We have

[math]\displaystyle{ (1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)} }[/math],

where each [math]\displaystyle{ m:S\rightarrow\mathbb{N} }[/math] specifies a possible multiset on [math]\displaystyle{ S }[/math] with multiplicity function [math]\displaystyle{ m }[/math].

Let all [math]\displaystyle{ x_i=x }[/math]. Then

[math]\displaystyle{ \begin{align} (1+x+x^2+\cdots)^n &= \sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\ &= \sum_{\text{multiset }M\text{ on }S}x^{|M|}\\ &= \sum_{k\ge 0}\left({n\choose k}\right)x^k. \end{align} }[/math]

The last equation is due to the definition of [math]\displaystyle{ \left({n\choose k}\right) }[/math]. Our task is to evaluate [math]\displaystyle{ \left({n\choose k}\right) }[/math].

By the geometric series and Newton's formula,

[math]\displaystyle{ (1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k. }[/math]

So

[math]\displaystyle{ \left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}. }[/math]

The last equation is due to the definition of the generalized binomial coefficient. This analytic (generating function) proof gives the same formula for [math]\displaystyle{ \left({n\choose k}\right) }[/math] as the combinatorial proof.
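The identity can also be confirmed by brute force for small parameters. The following sketch (ours, not part of the notes) enumerates all [math]\displaystyle{ k }[/math]-multisets of an [math]\displaystyle{ n }[/math]-set directly.

# Brute-force check that the number of k-multisets of an n-set is C(n+k-1, k).
from itertools import combinations_with_replacement
from math import comb

for n in range(1, 6):
    for k in range(0, 6):
        count = sum(1 for _ in combinations_with_replacement(range(n), k))
        assert count == comb(n + k - 1, k)
print("multiset counts match C(n+k-1, k) for n <= 5, k <= 5")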

Catalan Number

We now introduce a class of counting problems which all share the same solution, given by the Catalan numbers.

The [math]\displaystyle{ n }[/math]th Catalan number is denoted as [math]\displaystyle{ C_n }[/math]. In Volume 2 of Stanley's Enumerative Combinatorics, a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, cited from Wikipedia.

  • Cn is the number of Dyck words of length 2n. A Dyck word is a string consisting of n X's and n Y's such that no initial segment of the string has more Y's than X's (see also Dyck language). For example, the following are the Dyck words of length 6:
XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.
  • Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, Cn counts the number of expressions containing n pairs of parentheses which are correctly matched:
((()))     ()(())     ()()()     (())()     (()())
  • Cn is the number of different ways n + 1 factors can be completely parenthesized (or the number of ways of associating n applications of a binary operator). For n = 3, for example, we have the following five different parenthesizations of four factors:
[math]\displaystyle{ ((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd)) }[/math]
  • Successive applications of a binary operator can be represented in terms of a full binary tree. (A rooted binary tree is full if every vertex has either two children or no children.) It follows that Cn is the number of full binary trees with n + 1 leaves:
  • Cn is the number of monotonic paths along the edges of a grid with n × n square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case n = 4:
  • Cn is the number of different ways a convex polygon with n + 2 sides can be cut into triangles by connecting vertices with straight lines. The following hexagons illustrate the case n = 4:
  • Cn is the number of stack-sortable permutations of {1, ..., n}. A permutation w is called stack-sortable if S(w) = (1, ..., n), where S(w) is defined recursively as follows: write w = unv where n is the largest element in w and u and v are shorter sequences, and set S(w) = S(u)S(v)n, with S being the identity for one-element sequences.
  • Cn is the number of ways to tile a stairstep shape of height n with n rectangles. The following figure illustrates the case n = 4:
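Any of these interpretations can be used to compute the first few Catalan numbers by brute force. For example, counting Dyck words directly (a small sketch of ours, not part of the notes) already produces the sequence 1, 1, 2, 5, 14, 42, ..., the first Catalan numbers.

# Brute-force count of Dyck words of length 2n: strings of n X's and n Y's
# in which no prefix contains more Y's than X's.
from itertools import product

def count_dyck(n):
    count = 0
    for word in product("XY", repeat=2 * n):
        if word.count("X") != n:
            continue
        balance, ok = 0, True
        for c in word:
            balance += 1 if c == "X" else -1
            if balance < 0:
                ok = False
                break
        count += ok
    return count

print([count_dyck(n) for n in range(6)])   # [1, 1, 2, 5, 14, 42]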

Solving the Catalan numbers

Recurrence relation for Catalan numbers
[math]\displaystyle{ C_0=1 }[/math], and for [math]\displaystyle{ n\ge1 }[/math],
[math]\displaystyle{ C_n= \sum_{k=0}^{n-1}C_kC_{n-1-k} }[/math].

Let [math]\displaystyle{ G(x)=\sum_{n\ge 0}C_nx^n }[/math] be the generating function. Then

[math]\displaystyle{ \begin{align} G(x)^2 &=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n\\ xG(x)^2 &=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^{n+1}=\sum_{n\ge 1}\sum_{k=0}^{n-1}C_kC_{n-1-k}x^n. \end{align} }[/math]

Due to the recurrence,

[math]\displaystyle{ G(x)=\sum_{n\ge 0}C_nx^n=C_0+\sum_{n\ge 1}\sum_{k=0}^{n-1}C_kC_{n-1-k}x^n=1+xG(x)^2 }[/math].

Solving [math]\displaystyle{ xG(x)^2-G(x)+1=0 }[/math], we obtain

[math]\displaystyle{ G(x)=\frac{1\pm(1-4x)^{1/2}}{2x} }[/math].

Only one of these functions can be the generating function for [math]\displaystyle{ C_n }[/math], and it must satisfy

[math]\displaystyle{ \lim_{x\rightarrow 0}G(x)=C_0=1 }[/math].

It is easy to check that the correct function is

[math]\displaystyle{ G(x)=\frac{1-(1-4x)^{1/2}}{2x} }[/math].

Expanding [math]\displaystyle{ (1-4x)^{1/2} }[/math] by Newton's formula,

[math]\displaystyle{ \begin{align} (1-4x)^{1/2} &= \sum_{n\ge 0}{1/2\choose n}(-4x)^n\\ &= 1+\sum_{n\ge 1}{1/2\choose n}(-4x)^n\\ &= 1-4x\sum_{n\ge 0}{1/2\choose n+1}(-4x)^n \end{align} }[/math]

Then, we have

[math]\displaystyle{ \begin{align} G(x) &= \frac{1-(1-4x)^{1/2}}{2x}\\ &= 2\sum_{n\ge 0}{1/2\choose n+1}(-4x)^n \end{align} }[/math]

Thus,

[math]\displaystyle{ \begin{align} C_n &=2{1/2\choose n+1}(-4)^n\\ &=2\cdot\left(\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-1)}{2}\right)\cdot\frac{1}{(n+1)!}\cdot(-4)^n\\ &=\frac{2^n}{(n+1)!}\prod_{k=1}^n(2k-1)\\ &=\frac{2^n}{(n+1)!}\prod_{k=1}^n\frac{(2k-1)2k}{2k}\\ &=\frac{1}{n!(n+1)!}\prod_{k=1}^n (2k-1)2k\\ &=\frac{(2n)!}{n!(n+1)!}\\ &=\frac{1}{n+1}{2n\choose n}. \end{align} }[/math]

This proves the following closed form for the Catalan numbers.

Theorem
[math]\displaystyle{ C_n=\frac{1}{n+1}{2n\choose n} }[/math].
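The closed form can be checked against the recurrence for small [math]\displaystyle{ n }[/math], for instance with the following sketch (ours, for illustration only).

# Check the Catalan recurrence against the closed form C_n = C(2n, n)/(n+1).
from math import comb

N = 15
C = [1]
for n in range(1, N + 1):
    C.append(sum(C[k] * C[n - 1 - k] for k in range(n)))

assert all(C[n] == comb(2 * n, n) // (n + 1) for n in range(N + 1))
print(C[:8])   # [1, 1, 2, 5, 14, 42, 132, 429]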

Analysis of Quicksort

Given as input a set [math]\displaystyle{ S }[/math] of [math]\displaystyle{ n }[/math] numbers, we want to sort the numbers in [math]\displaystyle{ S }[/math] in increasing order. One of the most famous algorithms for this problem is the Quicksort algorithm.

Quicksort algorithm

Input: a set [math]\displaystyle{ S }[/math] of [math]\displaystyle{ n }[/math] numbers.

  • if [math]\displaystyle{ |S|\gt 1 }[/math] do:
    • pick an [math]\displaystyle{ x\in S }[/math] as the pivot;
    • partition [math]\displaystyle{ S }[/math] into [math]\displaystyle{ S_1=\{y\in S\mid y\lt x\} }[/math] and [math]\displaystyle{ S_2=\{y\in S\mid y\gt x\} }[/math];
    • recursively sort [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math];

Usually the input set [math]\displaystyle{ S }[/math] is given as an array of the [math]\displaystyle{ n }[/math] elements in an arbitrary order. The pivot is picked from a fixed position in the array (e.g. the first number in the array).

The time complexity of this sorting algorithm is measured by the number of comparisons.
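For concreteness, here is a minimal Python sketch of the procedure (ours; the notes do not prescribe an implementation). It also reports the number of comparisons, charging |S|-1 comparisons to each partition step, i.e. one comparison of every non-pivot element with the pivot.

# Quicksort on a list of distinct numbers, with the first element as the pivot;
# returns the sorted list together with the number of comparisons, where a
# partition of a size-m subproblem is charged m-1 comparisons.
def quicksort(S):
    if len(S) <= 1:
        return list(S), 0
    x = S[0]                                  # pivot from a fixed position
    S1 = [y for y in S[1:] if y < x]
    S2 = [y for y in S[1:] if y > x]
    sorted1, c1 = quicksort(S1)
    sorted2, c2 = quicksort(S2)
    return sorted1 + [x] + sorted2, (len(S) - 1) + c1 + c2

print(quicksort([3, 1, 4, 5, 9, 2, 6]))       # ([1, 2, 3, 4, 5, 6, 9], 13)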

The quicksort recursion

It is easy to observe that the running time of the algorithm depends only on the relative order of the elements in the input array.

Let [math]\displaystyle{ T_n }[/math] be the average number of comparison used by the Quicksort to sort an array of [math]\displaystyle{ n }[/math] numbers, where the average is taken over all [math]\displaystyle{ n! }[/math] total orders of the elements in the array.

The Quicksort recursion
[math]\displaystyle{ T_n= \frac{1}{n}\sum_{k=1}^n\left(n-1+T_{k-1}+T_{n-k}\right) }[/math]
and [math]\displaystyle{ T_0=T_1=0\, }[/math].

The recursion is obtained by averaging over the [math]\displaystyle{ n }[/math] sub-cases in which the pivot is chosen as the [math]\displaystyle{ k }[/math]-th smallest element, for [math]\displaystyle{ k=1,2,\ldots,n }[/math]. Partitioning the input set [math]\displaystyle{ S }[/math] into [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math] takes exactly [math]\displaystyle{ n-1 }[/math] comparisons regardless of the choice of the pivot. Given that the pivot is chosen as the [math]\displaystyle{ k }[/math]-th smallest element, the sizes of [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math] are [math]\displaystyle{ k-1 }[/math] and [math]\displaystyle{ n-k }[/math] respectively, so the costs of sorting [math]\displaystyle{ S_1 }[/math] and [math]\displaystyle{ S_2 }[/math] are given recursively by [math]\displaystyle{ T_{k-1} }[/math] and [math]\displaystyle{ T_{n-k} }[/math].
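As a sanity check, the recursion can be compared with a brute-force average over all [math]\displaystyle{ n! }[/math] input orders for small [math]\displaystyle{ n }[/math]. The sketch below is our own illustration and assumes distinct keys.

# Check the quicksort recursion against a brute-force average over all n! orders.
from itertools import permutations
from math import factorial
from fractions import Fraction

def comparisons(arr):
    # comparisons used by quicksort with the first element as pivot
    if len(arr) <= 1:
        return 0
    x = arr[0]
    left = [y for y in arr if y < x]
    right = [y for y in arr if y > x]
    return len(arr) - 1 + comparisons(left) + comparisons(right)

def T(n):
    # the recursion from the notes, computed with exact rational arithmetic
    t = [Fraction(0), Fraction(0)]
    for m in range(2, n + 1):
        t.append(sum(m - 1 + t[k - 1] + t[m - k] for k in range(1, m + 1)) / m)
    return t[n]

for n in range(1, 7):
    avg = Fraction(sum(comparisons(list(p)) for p in permutations(range(n))), factorial(n))
    assert avg == T(n)
print("the recursion matches the brute-force average for n = 1..6")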

Manipulating the OGF

We write the ordinary generating function (OGF) for the quicksort:

[math]\displaystyle{ \begin{align} G(x) &=\sum_{n\ge 0}T_nx^n. \end{align} }[/math]

The quicksort recursion also gives us another equation for formal power series:

[math]\displaystyle{ \begin{align} \sum_{n\ge 0}nT_nx^n &=\sum_{n\ge 0}\left(\sum_{k=1}^n\left(n-1+T_{k-1}+T_{n-k}\right)\right)x^n\\ &=\sum_{n\ge 0}n(n-1)x^n+2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n. \end{align} }[/math]

We express the three terms [math]\displaystyle{ \sum_{n\ge 0}n(n-1)x^n }[/math], [math]\displaystyle{ 2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n }[/math] and [math]\displaystyle{ \sum_{n\ge 0}nT_nx^n }[/math] in closed form involving [math]\displaystyle{ G(x) }[/math] as follows:

  1. Evaluate the power series: [math]\displaystyle{ \sum_{n\ge 0}n(n-1)x^n=x^2\sum_{n\ge 0}n(n-1)x^{n-2}=\frac{2x^2}{(1-x)^3} }[/math].
  2. Apply the convolution rule of OGF: [math]\displaystyle{ 2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n=2x\sum_{n\ge 0}\left(\sum_{k=0}^{n}T_{k}\right)x^{n}=2xF(x)G(x) }[/math],
    where [math]\displaystyle{ F(x)=\sum_{n\ge 0}x^n=\frac{1}{1-x} }[/math],
    therefore, [math]\displaystyle{ 2\sum_{n\ge 0}\left(\sum_{k=0}^{n-1}T_{k}\right)x^n=2xF(x)G(x)=\frac{2x}{1-x}G(x) }[/math].
  3. Apply the differentiation rule of OGF: [math]\displaystyle{ \sum_{n\ge 0}nT_nx^n=x\sum_{n\ge 0}(n+1)T_{n+1}x^{n}=xG'(x) }[/math].

Therefore we have the following identity for the OGF for quicksort:

Equation for the generating function
[math]\displaystyle{ xG'(x)=\frac{2x^2}{(1-x)^3}+\frac{2x}{1-x}G(x) }[/math].

Solving the equation

The above equation for the generating function [math]\displaystyle{ G(x) }[/math] is a first-order linear ordinary differential equation, which can be solved by the standard integrating-factor method. Dividing both sides by [math]\displaystyle{ x }[/math] and rearranging,

[math]\displaystyle{ G'(x)-\frac{2}{1-x}G(x)=\frac{2x}{(1-x)^3} }[/math].

Multiplying both sides by the integrating factor [math]\displaystyle{ (1-x)^2 }[/math] turns the left-hand side into an exact derivative:

[math]\displaystyle{ \left((1-x)^2G(x)\right)'=\frac{2x}{1-x}=\frac{2}{1-x}-2 }[/math].

Integrating and using the initial condition [math]\displaystyle{ G(0)=T_0=0 }[/math] to fix the constant, we obtain

[math]\displaystyle{ G(x)=\frac{2}{(1-x)^2}\ln\frac{1}{1-x}-\frac{2x}{(1-x)^2} }[/math].

Expanding

By Taylor expansion,

[math]\displaystyle{ \frac{2}{(1-x)^2}=2\sum_{n\ge 0}(n+1) x^{n} }[/math],
[math]\displaystyle{ \ln\frac{1}{1-x}=\sum_{n\ge 1}\frac{x^n}{n} }[/math],
[math]\displaystyle{ \frac{2x}{(1-x)^2}=2\sum_{n\ge 1}n x^{n} }[/math].

The first term of [math]\displaystyle{ G(x) }[/math] is the convolution product of the first two series, and the second term is already a power series. Therefore,

[math]\displaystyle{ \begin{align} G(x) &=\frac{2}{(1-x)^2}\ln\frac{1}{1-x}-\frac{2x}{(1-x)^2}\\ &=2\left(\sum_{n\ge 0}(n+1) x^{n}\right)\left(\sum_{n\ge 1}\frac{x^n}{n}\right)-2\sum_{n\ge 1}n x^{n}\\ &=2\sum_{n\ge 1}\left(\sum_{k=1}^{n}(n-k+1)\frac{1}{k}-n\right)x^n. \end{align} }[/math]

Thus the coefficient of [math]\displaystyle{ x^n }[/math] in [math]\displaystyle{ G(x) }[/math], denoted as [math]\displaystyle{ [x^n]G(x) }[/math], is:

[math]\displaystyle{ \begin{align} \,[x^n]G(x) &=2\sum_{k=1}^{n}(n-k+1)\frac{1}{k}-2n\\ &=2(n+1)\sum_{k=1}^n\frac{1}{k}-2\sum_{k=1}^nk\cdot\frac{1}{k}-2n\\ &=2(n+1)H(n)-4n, \end{align} }[/math]

where [math]\displaystyle{ H(n) }[/math] is the [math]\displaystyle{ n }[/math]th harmonic number defined as [math]\displaystyle{ H(n)=\sum_{k=1}^n\frac{1}{k} }[/math].

Therefore, the average number of comparisons used by the quicksort to sort lists of length [math]\displaystyle{ n }[/math] is

[math]\displaystyle{ T_n=2(n+1)H(n)-4n= 2n\ln n+O(n)\, }[/math].
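This closed form can again be verified against the recursion by exact rational arithmetic, as in the following sketch (ours, for illustration only).

# Check the closed form T_n = 2(n+1)H(n) - 4n against the quicksort recursion.
from fractions import Fraction

N = 20
t = [Fraction(0), Fraction(0)]
for n in range(2, N + 1):
    t.append(sum(n - 1 + t[k - 1] + t[n - k] for k in range(1, n + 1)) / n)

for n in range(1, N + 1):
    H = sum(Fraction(1, k) for k in range(1, n + 1))
    assert t[n] == 2 * (n + 1) * H - 4 * n
print("T_n = 2(n+1)H(n) - 4n holds for all n <= 20")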

References

  • Graham, Knuth, and Patashnik, Concrete Mathematics: A Foundation for Computer Science, Chapter 7.
  • van Lint and Wilson, A Course in Combinatorics, Chapter 14.