TCS Wiki - User contributions [en]

Combinatorics (Fall 2010)/Problem set 4

2011-01-09T02:42:02Z

210.28.131.82: ZVfrbdZzSxsGbchsz


Combinatorics (Fall 2010)/Extremal set theory

2011-01-08T21:33:12Z

210.28.131.82: BhwjZHfGMvmLLQ


Talk:Combinatorics (Fall 2010)/Problem set 4

2011-01-08T21:12:12Z

210.28.131.82: DmVZrwrGcaVUX


Template:Theorem

2011-01-08T20:58:12Z

210.28.131.82: oYFCrQTnqRGJO


Combinatorics (Fall 2010)/Problem set 4

2011-01-08T16:21:29Z

210.28.131.82: DdjSGhFwrXr


Template:Proof

2011-01-08T14:38:46Z

210.28.131.82: lrPuyPJxyPhVgHbhZiJ


Combinatorics (Fall 2010)/Problem set 4

2011-01-08T11:13:16Z

210.28.131.82: jFNVIvtmiuXn


Combinatorics (Fall 2010)/Problem set 6

2010-12-24T01:15:42Z

210.28.131.82: /* Problem 2 */

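== Problem 1 ==
There are <math>2m</math> balls in <math>m</math> colors, two balls of each color; balls of the same color are indistinguishable.

In how many ways can <math>k</math> balls be chosen?

== Problem 2 ==
There are <math>r\ge 5</math> colors. For the following graph, how many colorings of the vertices are there such that no two adjacent vertices have the same color?

== Problem 3 ==
Let <math>\phi(x_1,x_2,\ldots,x_n)</math> be a conjunctive normal form (CNF) boolean formula with <math>m</math> clauses over <math>n</math> variables.

Prove: for any such <math>\phi</math>, there always exists an assignment <math>(x_1,x_2,\ldots,x_n)\in\{\text{true},\text{false}\}^n</math> that satisfies at least <math>\frac{m}{2}</math> clauses.

== Problem 4 ==
Let <math>S_1,\ldots,S_m</math> be a sequence of sets such that
* every <math>S_i</math> has at least <math>k</math> elements;
* every element belongs to at most <math>k</math> of the sets.
Prove: <math>S_1,\ldots,S_m</math> has a system of distinct representatives (SDR).

== Problem 5 ==
Let <math>\mathcal{F}\subseteq 2^{[n]}</math> be an antichain, i.e. <math>\forall S,T\in\mathcal{F}, S\not\subset T</math>. In addition, <math>\forall S\in\mathcal{F}, |S|\le k</math>.

Prove: <math>|\mathcal{F}|\le{n\choose k}</math>.

(Use a theorem taught in class.)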

Combinatorics (Fall 2010)/Problem set 6

2010-12-24T01:11:03Z

210.28.131.82: /* Problem 5 */

== Problem 1 ==
There are <math>2m</math> balls in <math>m</math> colors, two balls of each color; balls of the same color are indistinguishable.

In how many ways can <math>k</math> balls be chosen?

== Problem 2 ==
A fixed point of a permutation <math>\pi</math> of <math>[n]</math> is an <math>i</math> satisfying <math>\pi(i)=i</math>.

Give the number of permutations with at most one fixed point (a closed form is not required).

== Problem 3 ==
Let <math>\phi(x_1,x_2,\ldots,x_n)</math> be a conjunctive normal form (CNF) boolean formula with <math>m</math> clauses over <math>n</math> variables.

Prove: for any such <math>\phi</math>, there always exists an assignment <math>(x_1,x_2,\ldots,x_n)\in\{\text{true},\text{false}\}^n</math> that satisfies at least <math>\frac{m}{2}</math> clauses.

== Problem 4 ==
Let <math>S_1,\ldots,S_m</math> be a sequence of sets such that
* every <math>S_i</math> has at least <math>k</math> elements;
* every element belongs to at most <math>k</math> of the sets.
Prove: <math>S_1,\ldots,S_m</math> has a system of distinct representatives (SDR).

== Problem 5 ==
Let <math>\mathcal{F}\subseteq 2^{[n]}</math> be an antichain, i.e. <math>\forall S,T\in\mathcal{F}, S\not\subset T</math>. In addition, <math>\forall S\in\mathcal{F}, |S|\le k</math>.

Prove: <math>|\mathcal{F}|\le{n\choose k}</math>.

(Use a theorem taught in class.)

Combinatorics (Fall 2010)/Problem set 6

2010-12-24T01:00:31Z

210.28.131.82: /* Problem 4 */

Talk:Combinatorics (Fall 2010)/Problem set 4

2010-11-19T14:14:45Z

210.28.131.82: /* Typos */

=Typos =
#<math>n \geq N(n)</math> should be <math>n \geq N(k)</math>
=Suggestion=
#Tournament Graph is <math>K_n</math>
by 詹宇森

Talk:Combinatorics (Fall 2010)/Extremal set theory II

2010-11-15T14:05:03Z

210.28.131.82:

=Heredity should be mentioned before down-shift=
Heredity should be covered before down-shift! Otherwise down-shift comes out of nowhere.
by 詹宇森

Talk:Combinatorics (Fall 2010)/Extremal set theory

2010-11-15T14:03:22Z

210.28.131.82: /* Typos */

=Typos=
#Proof of Erdős-Ko-Rado theorem: <math>\frac{(n-1)|}{(k-1)!(n-k)!}</math> should be <math>\frac{(n-1)!}{(k-1)!(n-k)!}</math>
by 詹宇森
----
:Thanks! Corrected. Next time you leave a message, you can sign your name. --etone

Talk:Combinatorics (Fall 2010)/Extremal set theory

2010-11-07T06:59:39Z

210.28.131.82: /* Typos */

=Typos=
#Proof of Erdős-Ko-Rado theorem: <math>\frac{(n-1)|}{(k-1)!(n-k)!}</math> should be <math>\frac{(n-1)!}{(k-1)!(n-k)!}</math>

Talk:Combinatorics (Fall 2010)/Extremal set theory

2010-11-07T06:58:27Z

210.28.131.82: Created page with '=Typos= #Proof of Erdős-Ko-Rado theorem <math>(n-1)|</math> should be <math>(n-1)!</math>'

=Typos=
#Proof of Erdős-Ko-Rado theorem: <math>(n-1)|</math> should be <math>(n-1)!</math>

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:50:21Z

210.28.131.82: /* Linearity of expectation */

== Counting arguments ==
;Circuit complexity

Determining the circuit complexity of boolean functions is a fundamental problem in computer science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>: each of the <math>2^n</math> inputs can be mapped to 0 or 1, so there are <math>2^{2^n}</math> such functions.

Then we count the boolean circuits with a fixed number of gates.
Fix an integer <math>t</math>; we count the circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of two types (AND or OR) and has two inputs. Each input to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1=t+2n+1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

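Each boolean circuit computes exactly one boolean function, so circuits with <math>t</math> gates compute fewer than <math>2^{2^n}</math> distinct functions. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.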
}}
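
The counting inequality in the proof can be checked numerically for small <math>n</math>. The following is a minimal Python sketch, not part of the original notes; the function name <code>circuit_bound_holds</code> and the choice to round <math>t</math> down to <math>\lfloor 2^n/3n\rfloor</math> are assumptions made for illustration.

```python
def circuit_bound_holds(n: int) -> bool:
    """Check 2^t * (t + 2n + 1)^(2t) < 2^(2^n) for t = floor(2^n / (3n))."""
    t = (2 ** n) // (3 * n)                         # gate count, rounded down (assumption)
    circuits = 2 ** t * (t + 2 * n + 1) ** (2 * t)  # upper bound on circuits with t gates
    functions = 2 ** (2 ** n)                       # number of boolean functions on n bits
    return circuits < functions


# Python integers are arbitrary precision, so the comparison is exact.
results = {n: circuit_bound_holds(n) for n in range(1, 13)}
print(results)
```

The check only covers the listed values of <math>n</math>; the asymptotic statement still relies on the estimate in the proof.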

Note that Shannon's argument shows not only that there exists a boolean function with exponentially large circuit complexity, but that ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

It is obvious that the two terms are equal, since we just count the same thing twice with different methods. The lemma follows.
}}

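The handshaking lemma follows directly from the above lemma: the total <math>2|E|</math> is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which forces the number of odd-degree vertices to be even.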
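
Both sides of the double count can be computed on a concrete graph. The sketch below is illustrative only; the edge list and the helper name <code>degrees</code> are made up for this example.

```python
from collections import Counter


def degrees(edges):
    """Return a Counter mapping each vertex to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1   # the directed edge (u, v) is charged to u
        deg[v] += 1   # the directed edge (v, u) is charged to v
    return deg


# A hypothetical party: each pair is one handshake.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
deg = degrees(edges)
odd_guests = [v for v, d in deg.items() if d % 2 == 1]

print(sum(deg.values()), 2 * len(edges))  # the two ways of counting directed edges
print(len(odd_guests))                    # number of odd-degree vertices
```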

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}
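
The proof body above is left empty in the source. As an illustration only, the following sketch checks the statement on one random sequence using a common pigeonhole labelling: each element gets the pair (length of the longest increasing subsequence ending there, length of the longest decreasing subsequence ending there). Whether this is exactly the argument intended here is an assumption, and the names <code>labels</code>, <code>seq</code> and the parameter values are hypothetical.

```python
import random


def labels(seq):
    """For each position i return (a_i, b_i): the lengths of the longest
    increasing and longest decreasing subsequences ending at seq[i]."""
    inc, dec = [], []
    for i, x in enumerate(seq):
        a = 1 + max((inc[j] for j in range(i) if seq[j] < x), default=0)
        b = 1 + max((dec[j] for j in range(i) if seq[j] > x), default=0)
        inc.append(a)
        dec.append(b)
    return list(zip(inc, dec))


m, n = 3, 4
rng = random.Random(0)                      # fixed seed, chosen arbitrarily
seq = rng.sample(range(1000), m * n + 1)    # mn + 1 distinct numbers
pairs = labels(seq)
longest_inc = max(a for a, _ in pairs)
longest_dec = max(b for _, b in pairs)
print(longest_inc, longest_dec)
```

For distinct numbers the pairs are pairwise distinct, so more than <math>mn</math> of them cannot all fit in the grid <math>\{1,\ldots,m\}\times\{1,\ldots,n\}</math>.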

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}
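
For a concrete <math>x</math>, a suitable fraction can be found by trying every denominator <math>q\le n</math>. The sketch below is a brute-force illustration rather than the pigeonhole argument itself; it uses an exact rational stand-in for <math>x</math>, and the name <code>dirichlet_approx</code> is hypothetical.

```python
from fractions import Fraction


def dirichlet_approx(x: Fraction, n: int):
    """Return (p, q) with 1 <= q <= n and |x - p/q| < 1/(n*q)."""
    for q in range(1, n + 1):
        p = round(q * x)   # the nearest integer to q*x is the best numerator for this q
        if abs(x - Fraction(p, q)) < Fraction(1, n * q):
            return p, q
    raise AssertionError("no approximation found; this would contradict the theorem")


x = Fraction(314159, 100000)   # rational stand-in for a real number (assumption)
p, q = dirichlet_approx(x, 10)
print(p, q)
```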

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) chosen at random from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it provides a powerful tool for existence proofs.

===Ramsey number===

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Thus there exists a two-coloring in which no <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>. Then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.
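
The hypothesis of the theorem can be evaluated exactly for <math>n=\lfloor2^{k/2}\rfloor</math>. The sketch below is illustrative; the helper name <code>union_bound</code> and the range of <math>k</math> are assumptions.

```python
from fractions import Fraction
from math import comb, isqrt


def union_bound(n: int, k: int) -> Fraction:
    """Exact value of C(n, k) * 2^(1 - C(k, 2))."""
    return Fraction(2 * comb(n, k), 2 ** comb(k, 2))


for k in range(3, 11):
    n = isqrt(2 ** k)              # integer square root, equal to floor(2^(k/2))
    print(k, n, float(union_bound(n, k)))
```

Exact rational arithmetic avoids any rounding doubt about whether a value is below 1.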

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very unlikely to contain a monochromatic <math>K_{2\log n}</math> (logarithm base 2). This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without a monochromatic <math>K_{2\log n}</math>.
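
A minimal sketch of this randomized algorithm follows. The function names, the seed and the choice <math>k=6</math> are assumptions for illustration, and the verification step is a brute-force search that is only practical for small <math>k</math>.

```python
import random
from itertools import combinations
from math import isqrt


def random_coloring(n, rng):
    """Color each edge of K_n red (0) or blue (1) by an independent fair coin flip."""
    return {e: rng.randrange(2) for e in combinations(range(n), 2)}


def has_monochromatic_clique(coloring, n, k):
    """Brute-force test for a monochromatic K_k (exponential in k)."""
    for S in combinations(range(n), k):
        colors = {coloring[e] for e in combinations(S, 2)}
        if len(colors) == 1:
            return True
    return False


k = 6
n = isqrt(2 ** k)                  # floor(2^(k/2)) = 8
rng = random.Random(2010)          # fixed seed, chosen arbitrarily
coloring = random_coloring(n, rng)
while has_monochromatic_clique(coloring, n, k):   # retry on the rare failure
    coloring = random_coloring(n, rng)
```

The retry loop turns the "very likely" guarantee into a certain one, at the cost of re-checking each attempt.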

===Tournament===
A '''[http://en.wikipedia.org/wiki/Tournament_(graph_theory) tournament]''' (竞赛图) on a set <math>V</math> of <math>n</math> players is an '''orientation''' of the edges of the complete graph on the set of vertices <math>V</math>. Thus for every two distinct vertices <math>u,v</math> in <math>V</math>, either <math>(u,v)\in E</math> or <math>(v,u)\in E</math>, but not both.

We can think of the set <math>V</math> as a set of <math>n</math> players in which each pair participates in a single match, where <math>(u,v)</math> is in the tournament iff player <math>u</math> beats player <math>v</math>.

{{Theorem|Definition|
:We say that a tournament has '''property <math>S_k</math>''' if for every set of <math>k</math> players there is one who beats them all.
}}

Is it true that for every finite <math>k</math> there is a tournament (on more than <math>k</math> vertices, of course) with the property <math>S_k</math>? This problem was first raised by Schütte and, as shown by Erdős, can be solved almost trivially by the probabilistic method.

{{Theorem|Theorem (Erdős 1963)|
:If <math>{n\choose k}\left(1-2^{-k}\right)^{n-k}<1</math> then there is a tournament on <math>n</math> vertices that has the property <math>S_k</math>.
}}
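
The theorem is stated here without proof. The sketch below only evaluates its hypothesis, searching for the smallest <math>n</math> that satisfies the inequality for a given <math>k</math>; the name <code>smallest_n</code> is hypothetical.

```python
from fractions import Fraction
from math import comb


def smallest_n(k: int) -> int:
    """Smallest n >= k with C(n, k) * (1 - 2^-k)^(n - k) < 1."""
    base = 1 - Fraction(1, 2 ** k)   # the factor (1 - 2^-k) from the hypothesis
    n = k
    while comb(n, k) * base ** (n - k) >= 1:
        n += 1
    return n


print(smallest_n(1), smallest_n(2))
```

The values returned are what this sufficient condition yields; smaller tournaments with property <math>S_k</math> may exist.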

=== Linearity of expectation ===
;Hamiltonian paths
The following result of Szele in 1943 is often considered the first use of the probabilistic method.
{{Theorem|Theorem (Szele 1943)|
:There is a tournament on <math>n</math> players with at least <math>n!2^{-(n-1)}</math> Hamiltonian paths.
}}
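
For very small <math>n</math> the statement can be checked by enumerating every tournament. The sketch below is illustrative (the name <code>hamiltonian_path_counts</code> is hypothetical); it computes the average number of Hamiltonian paths over all tournaments on <math>n=4</math> players, which is the expectation under a uniformly random orientation.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial


def hamiltonian_path_counts(n):
    """Yield the number of Hamiltonian paths of every tournament on n players."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(pairs)):
        beats = {}                       # beats[(u, v)] is True iff u beats v
        for (u, v), b in zip(pairs, bits):
            beats[(u, v)] = bool(b)
            beats[(v, u)] = not b
        yield sum(
            all(beats[(p[i], p[i + 1])] for i in range(n - 1))
            for p in permutations(range(n))
        )


n = 4
counts = list(hamiltonian_path_counts(n))
average = Fraction(sum(counts), len(counts))
print(average, max(counts))
```

Since some tournament must reach at least the average, the maximum is at least <math>n!2^{-(n-1)}</math>.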

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute a min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: it is '''NP-hard''', and its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}
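
The random bipartition in the proof is easy to simulate. The following is a minimal sketch; the example graph, the seed and the name <code>random_cut</code> are assumptions, and the retry loop is an addition that keeps sampling until the guaranteed size <math>\frac{m}{2}</math> is reached.

```python
import random


def random_cut(vertices, edges, rng):
    """Put each vertex in S or T by a fair coin; return (S, crossing edges)."""
    S = {v for v in vertices if rng.random() < 0.5}
    crossing = [(u, v) for u, v in edges if (u in S) != (v in S)]
    return S, crossing


# Hypothetical example graph: a 5-cycle with one chord.
vertices = range(5)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
rng = random.Random(0)                    # fixed seed, chosen arbitrarily
S, crossing = random_cut(vertices, edges, rng)
while 2 * len(crossing) < len(edges):     # retry until the bound m/2 is met
    S, crossing = random_cut(vertices, edges, rng)
print(sorted(S), len(crossing))
```

The loop terminates with probability 1 because a cut of size at least <math>\frac{m}{2}</math> occurs with positive probability.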

=== Independent sets ===
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
where <math>p</math> is to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e=uv\in E</math>, let <math>Y_{e}</math> be the random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. Then
:<math>
\mathbf{E}[Y_{e}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices are deleted from <math>S</math> to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized when <math>p=\frac{n}{2m}</math> (a valid probability provided <math>n\le 2m</math>), thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
Therefore, there exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}
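
The sample-then-delete construction from the proof can be sketched as follows. The example graph, the seed and the name <code>sample_and_delete</code> are assumptions, and capping <math>p</math> at 1 is an added safeguard for graphs with fewer than <math>n/2</math> edges.

```python
import random


def sample_and_delete(vertices, edges, rng):
    """Sample S with probability p = n/(2m) per vertex, then delete one
    endpoint of every edge that still has both endpoints in S."""
    n, m = len(vertices), len(edges)
    p = min(1.0, n / (2 * m))          # capped at 1 (assumption for sparse graphs)
    S = {v for v in vertices if rng.random() < p}
    for u, v in edges:
        if u in S and v in S:
            S.discard(u)               # arbitrary choice of endpoint
    return S


# Hypothetical example graph: a 5-cycle with one chord.
vertices = list(range(5))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
independent = sample_and_delete(vertices, edges, random.Random(0))
print(sorted(independent))
```

A single run always returns an independent set, but only its expected size is bounded below by <math>\frac{n^2}{4m}</math>; repeating the sampling and keeping the largest result is one way to reach the bound in practice.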

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:32:24Z

210.28.131.82: /* Independent sets */

== Counting arguments ==
;Circuit complexity

Determining the circuit complexity of boolean functions is a fundamental problem in computer science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>: each of the <math>2^n</math> inputs can be mapped to 0 or 1, so there are <math>2^{2^n}</math> such functions.

Then we count the boolean circuits with a fixed number of gates.
Fix an integer <math>t</math>; we count the circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of two types (AND or OR) and has two inputs. Each input to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1=t+2n+1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

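Each boolean circuit computes exactly one boolean function, so circuits with <math>t</math> gates compute fewer than <math>2^{2^n}</math> distinct functions. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.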
}}

Note that Shannon's argument shows not only that there exists a boolean function with exponentially large circuit complexity, but that ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

It is obvious that the two terms are equal, since we just count the same thing twice with different methods. The lemma follows.
}}

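The handshaking lemma follows directly from the above lemma: the total <math>2|E|</math> is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which forces the number of odd-degree vertices to be even.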

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) chosen at random from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it provides a powerful tool for existence proofs.

===Ramsey number===

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Thus there exists a two-coloring in which no <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>. Then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very unlikely to contain a monochromatic <math>K_{2\log n}</math> (logarithm base 2). This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without a monochromatic <math>K_{2\log n}</math>.

===Tournament===
A '''[http://en.wikipedia.org/wiki/Tournament_(graph_theory) tournament]''' (竞赛图) on a set <math>V</math> of <math>n</math> players is an '''orientation''' of the edges of the complete graph on the set of vertices <math>V</math>. Thus for every two distinct vertices <math>u,v</math> in <math>V</math>, either <math>(u,v)\in E</math> or <math>(v,u)\in E</math>, but not both.

We can think of the set <math>V</math> as a set of <math>n</math> players in which each pair participates in a single match, where <math>(u,v)</math> is in the tournament iff player <math>u</math> beats player <math>v</math>.

{{Theorem|Definition|
:We say that a tournament has '''property <math>S_k</math>''' if for every set of <math>k</math> players there is one who beats them all.
}}

Is it true that for every finite <math>k</math> there is a tournament (on more than <math>k</math> vertices, of course) with the property <math>S_k</math>? This problem was first raised by Schütte and, as shown by Erdős, can be solved almost trivially by the probabilistic method.

{{Theorem|Theorem (Erdős 1963)|
:If <math>{n\choose k}\left(1-2^{-k}\right)^{n-k}<1</math> then there is a tournament on <math>n</math> vertices that has the property <math>S_k</math>.
}}

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: it is '''NP-hard''', and its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}

=== Independent sets ===
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
<math>p</math> to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e=uv\in E</math>, let <math>Y_{e}</math> be the 0-1 random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. Then
:<math>
\mathbf{E}[Y_{e}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices are deleted from <math>S</math> to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized at <math>p=\frac{n}{2m}</math> (a valid probability as long as <math>m\ge\frac{n}{2}</math>), thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
There exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:25:59Z

210.28.131.82: /* Alterations */

== Counting arguments ==
;Circuit complexity

This is a fundamental problem in Computer Science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the number of boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. There are <math>2^{2^n}</math> boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Then we count the number of boolean circuits with a fixed number of gates.
Fix an integer <math>t</math> and count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of the two types (AND or OR), and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

Each boolean circuit computes one boolean function. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.
}}
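
The counting inequality in the proof can be checked numerically for small <math>n</math>. The following is only a sketch: the function names are illustrative, and rounding <math>2^n/3n</math> down to an integer is an assumption made here so that the number of gates is a whole number.

```python
def circuits_upper_bound(n: int, t: int) -> int:
    """Upper bound 2^t * (t + 2n + 1)^(2t) on the number of circuits with t gates."""
    return 2**t * (t + 2 * n + 1) ** (2 * t)


def num_boolean_functions(n: int) -> int:
    """Number of boolean functions on n variables, i.e. 2^(2^n)."""
    return 2 ** (2**n)


def counting_gap_holds(n: int) -> bool:
    """Check that circuits with t = floor(2^n / (3n)) gates are fewer than functions."""
    t = 2**n // (3 * n)  # assumption: round 2^n/(3n) down to an integer
    return circuits_upper_bound(n, t) < num_boolean_functions(n)


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, counting_gap_holds(n))
```

Python integers have arbitrary precision, so the comparison is exact rather than a floating-point estimate.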

Note that by Shannon's theorem, not only does there exist a boolean function with exponentially large circuit complexity, but ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> vertices standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which is the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

The two counts are equal, since we count the same set of directed edges in two different ways. The lemma follows.
}}

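The handshaking lemma follows directly from the above lemma: the total degree <math>2|E|</math> is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which forces the number of odd-degree vertices to be even.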
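
Both statements are easy to check on a concrete graph. The sketch below uses a small hypothetical guest list (the vertex names and edges are made up for illustration) and assumes a simple graph without self-loops.

```python
from collections import Counter


def degrees(edges):
    """Return a Counter mapping each vertex to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # the directed edge (u, v) is charged to u
        deg[v] += 1  # the directed edge (v, u) is charged to v
    return deg


# Hypothetical party: each pair is one handshake.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
deg = degrees(edges)
degree_sum = sum(deg.values())
odd_vertices = [v for v, d in deg.items() if d % 2 == 1]

if __name__ == "__main__":
    print(degree_sum, 2 * len(edges))  # the two ways of counting directed edges
    print(odd_vertices)                # guests with an odd number of handshakes
```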

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different labeled trees on <math>n</math> distinct vertices.
}}
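
For small <math>n</math> the formula can be confirmed by brute force. The sketch below is not a proof and not the double-counting argument; it simply enumerates all sets of <math>n-1</math> edges of <math>K_n</math> and keeps the acyclic ones, which are exactly the trees. The helper names are illustrative.

```python
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_labeled_trees(n: int) -> int:
    """Count trees on vertex set {0, ..., n-1} by exhaustive enumeration."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(all_edges, n - 1):
        parent = list(range(n))
        acyclic = True
        for u, v in subset:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:      # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv   # merge the two components
        if acyclic:           # n-1 edges and no cycle: a spanning tree
            count += 1
    return count


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_labeled_trees(n), n ** (n - 2))
```

The enumeration grows very quickly, so this is only practical for single-digit <math>n</math>.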

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}
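
The statement can be tested exhaustively on short sequences. The sketch below is a check of the theorem, not the omitted proof; it computes the longest increasing and decreasing subsequences with a quadratic dynamic program, and the function names are illustrative.

```python
from itertools import permutations


def longest_monotone(seq, increasing=True):
    """Length of the longest strictly increasing (or decreasing) subsequence."""
    best = [1] * len(seq)  # best[i]: longest such subsequence ending at position i
    for i in range(len(seq)):
        for j in range(i):
            extends = seq[j] < seq[i] if increasing else seq[j] > seq[i]
            if extends:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


def erdos_szekeres_holds(seq, m, n):
    """True if seq has an increasing run of m+1 or a decreasing run of n+1 elements."""
    return (longest_monotone(seq, True) >= m + 1
            or longest_monotone(seq, False) >= n + 1)


if __name__ == "__main__":
    # Every ordering of 5 > 2*2 distinct numbers satisfies the conclusion.
    print(all(erdos_szekeres_holds(p, 2, 2) for p in permutations(range(5))))
    # With exactly mn = 4 numbers the conclusion can fail.
    print(erdos_szekeres_holds([2, 1, 4, 3], 2, 2))
```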

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}
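
The theorem guarantees that a simple search over denominators always succeeds. The sketch below assumes <math>x</math> is given as an exact rational (a <code>Fraction</code>) so that the comparison is free of rounding error; the function name is illustrative, and this search is not the pigeonhole argument itself.

```python
from fractions import Fraction


def dirichlet_approximation(x: Fraction, n: int):
    """Return (p, q) with 1 <= q <= n and |x - p/q| < 1/(n*q)."""
    for q in range(1, n + 1):
        p = round(x * q)  # the nearest integer is the best numerator for this q
        if abs(x - Fraction(p, q)) < Fraction(1, n * q):
            return p, q
    # Dirichlet's theorem says the loop above always returns.
    raise AssertionError("no approximation found")


if __name__ == "__main__":
    x = Fraction(314159, 100000)  # a rational stand-in for pi
    print(dirichlet_approximation(x, 10))
```

With <math>n=10</math> the search stops at the first denominator that works, which for this input is the familiar <math>\frac{22}{7}</math>.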

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) randomly chosen from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it provides us with a powerful tool for existence proofs.

===Ramsey number===

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Hence there exists a two-coloring in which there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>; then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.
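
The union-bound quantity <math>{n\choose k}\cdot 2^{1-{k\choose 2}}</math> can be evaluated exactly for concrete values. The sketch below uses exact rational arithmetic; the function names are illustrative, and <math>\lfloor 2^{k/2}\rfloor</math> is computed as the integer square root of <math>2^k</math>.

```python
from fractions import Fraction
from math import comb, isqrt


def union_bound(n: int, k: int) -> Fraction:
    """Exact value of C(n, k) * 2^(1 - C(k, 2))."""
    return Fraction(2 * comb(n, k), 2 ** comb(k, 2))


def floor_pow2_half(k: int) -> int:
    """floor(2^(k/2)), computed as the integer square root of 2^k."""
    return isqrt(2**k)


if __name__ == "__main__":
    for k in range(3, 12):
        n = floor_pow2_half(k)
        print(k, n, float(union_bound(n, k)))
```

Note that the condition is only sufficient: for <math>n=6</math> and <math>k=3</math> the quantity is at least 1, so the theorem says nothing there, consistent with every two-coloring of <math>K_6</math> containing a monochromatic triangle.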

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without monochromatic <math>K_{2\log n}</math>.

===Tournament===
A '''[http://en.wikipedia.org/wiki/Tournament_(graph_theory) tournament]''' (竞赛图) on a set <math>V</math> of <math>n</math> players is an '''orientation''' of the edges of the complete graph on the set of vertices <math>V</math>. Thus for every two distinct vertices <math>u,v</math> in <math>V</math>, either <math>(u,v)\in E</math> or <math>(v,u)\in E</math>, but not both.

We can think of the set <math>V</math> as a set of <math>n</math> players in which each pair participates in a single match, where <math>(u,v)</math> is in the tournament iff player <math>u</math> beats player <math>v</math>.

{{Theorem|Definition|
:We say that a tournament has '''property <math>S_k</math>''' if for every set of <math>k</math> players there is one who beats them all.
}}

Is it true that for every finite <math>k</math>, there is a tournament (on more than <math>k</math> vertices, of course) with the property <math>S_k</math>? This problem was first raised by Schütte and, as shown by Erdős, can be solved almost trivially by the probabilistic method.

{{Theorem|Theorem (Erdős 1963)|
:If <math>{n\choose k}\left(1-2^{-k}\right)^{n-k}<1</math> then there is a tournament on <math>n</math> vertices that has the property <math>S_k</math>.
}}
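
To get a feeling for how large <math>n</math> must be for the condition to hold, one can search for the smallest <math>n</math> satisfying it. The sketch below is illustrative only: the function names and the search limit are assumptions, and the value found is the smallest <math>n</math> meeting this sufficient condition, not necessarily the size of the smallest tournament with property <math>S_k</math>.

```python
from fractions import Fraction
from math import comb


def erdos_condition(n: int, k: int) -> bool:
    """Check C(n, k) * (1 - 2^-k)^(n - k) < 1 using exact arithmetic."""
    return comb(n, k) * (1 - Fraction(1, 2**k)) ** (n - k) < 1


def smallest_n(k: int, limit: int = 10_000):
    """Smallest n > k satisfying the condition, or None if none is below limit."""
    for n in range(k + 1, limit):
        if erdos_condition(n, k):
            return n
    return None


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, smallest_n(k))
```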

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: it is '''NP-hard''', and its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}
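
On a small graph the expectation in the proof can be computed exactly by averaging over all <math>2^n</math> equally likely assignments. The sketch below does this for a hypothetical 5-vertex graph (a 5-cycle with one chord, chosen only for illustration); the function names are assumptions.

```python
from fractions import Fraction
from itertools import product


def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def cut_statistics(vertices, edges):
    """Return (average cut size, maximum cut size) over all 2^n assignments."""
    sizes = []
    for bits in product((0, 1), repeat=len(vertices)):
        side = dict(zip(vertices, bits))  # 0 means S, 1 means T
        sizes.append(cut_size(edges, side))
    return Fraction(sum(sizes), len(sizes)), max(sizes)


# Hypothetical example graph: a 5-cycle plus the chord {0, 2}.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
average, best = cut_statistics(vertices, edges)

if __name__ == "__main__":
    print(average, best)  # the average equals m/2, so the maximum is at least m/2
```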

=== Independent sets ===
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
<math>p</math> to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e=uv\in E</math>, let <math>Y_{e}</math> be the 0-1 random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. Then
:<math>
\mathbf{E}[Y_{e}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices are deleted from <math>S</math> to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized at <math>p=\frac{n}{2m}</math> (a valid probability as long as <math>m\ge\frac{n}{2}</math>), thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
There exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof in fact yields a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}

Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.
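
The two steps of the algorithm can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function names are made up, and capping the sampling probability at 1 (for graphs with fewer than <math>\frac{n}{2}</math> edges) is an assumption added here.

```python
import random


def random_independent_set(vertices, edges, rng):
    """Sample each vertex with probability 1/d, then repair remaining edges."""
    n, m = len(vertices), len(edges)
    # 1/d = n/(2m); assumption: cap at 1 when the graph has very few edges.
    p = 1.0 if m == 0 else min(1.0, n / (2 * m))
    s = {v for v in vertices if rng.random() < p}
    for u, v in edges:
        if u in s and v in s:
            s.discard(v)  # alteration step: drop one endpoint of a surviving edge
    return s


def is_independent(s, edges):
    """True if no edge has both endpoints in s."""
    return not any(u in s and v in s for u, v in edges)


if __name__ == "__main__":
    vertices = list(range(8))
    edges = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
    result = random_independent_set(vertices, edges, random.Random(0))
    print(sorted(result), is_independent(result, edges))
```

The output is always an independent set; only its size is random, and the analysis above bounds that size in expectation rather than for every run.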

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:25:42Z

210.28.131.82: /* Linearity of expectation */

== Counting arguments ==
;Circuit complexity

This is a fundamental problem in Computer Science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the number of boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. There are <math>2^{2^n}</math> boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Then we count the number of boolean circuits with a fixed number of gates.
Fix an integer <math>t</math> and count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of the two types (AND or OR), and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

Each boolean circuit computes one boolean function. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.
}}

Note that by Shannon's theorem, not only does there exist a boolean function with exponentially large circuit complexity, but ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> vertices standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which is the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

The two counts are equal, since we count the same set of directed edges in two different ways. The lemma follows.
}}

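The handshaking lemma follows directly from the above lemma: the total degree <math>2|E|</math> is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which forces the number of odd-degree vertices to be even.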

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different labeled trees on <math>n</math> distinct vertices.
}}

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) randomly chosen from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it provides us with a powerful tool for existence proofs.

===Ramsey number===

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Hence there exists a two-coloring in which there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>; then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without monochromatic <math>K_{2\log n}</math>.

===Tournament===
A '''[http://en.wikipedia.org/wiki/Tournament_(graph_theory) tournament]''' (竞赛图) on a set <math>V</math> of <math>n</math> players is an '''orientation''' of the edges of the complete graph on the set of vertices <math>V</math>. Thus for every two distinct vertices <math>u,v</math> in <math>V</math>, either <math>(u,v)\in E</math> or <math>(v,u)\in E</math>, but not both.

We can think of the set <math>V</math> as a set of <math>n</math> players in which each pair participates in a single match, where <math>(u,v)</math> is in the tournament iff player <math>u</math> beats player <math>v</math>.

{{Theorem|Definition|
:We say that a tournament has '''property <math>S_k</math>''' if for every set of <math>k</math> players there is one who beats them all.
}}

Is it true that for every finite <math>k</math>, there is a tournament (on more than <math>k</math> vertices, of course) with the property <math>S_k</math>? This problem was first raised by Schütte and, as shown by Erdős, can be solved almost trivially by the probabilistic method.

{{Theorem|Theorem (Erdős 1963)|
:If <math>{n\choose k}\left(1-2^{-k}\right)^{n-k}<1</math> then there is a tournament on <math>n</math> vertices that has the property <math>S_k</math>.
}}

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: the optimization problem is '''NP-hard''', and its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always contains at least half of the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, since a random variable takes at least one value that is no smaller than its expectation. That is, there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}
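
The random experiment in the proof can be turned into a simple procedure: sample a bipartition and repeat until the cut has at least <math>\frac{m}{2}</math> edges. The following is a minimal Python sketch, assuming the graph is given as a list of vertices and a list of edges (pairs of distinct vertices, no self-loops). The names <code>random_cut</code> and <code>cut_of_at_least_half</code> are hypothetical.

```python
import random


def random_cut(vertices, edges, rng):
    """Put every vertex into S or T independently with probability 1/2."""
    side = {v: rng.choice("ST") for v in vertices}
    crossing = [(u, v) for (u, v) in edges if side[u] != side[v]]
    return side, crossing


def cut_of_at_least_half(vertices, edges, seed=0):
    """Resample until the cut contains at least half of the edges.

    Such a bipartition exists by the theorem and is sampled with positive
    probability, so the loop terminates with probability 1.
    """
    rng = random.Random(seed)
    while True:
        side, crossing = random_cut(vertices, edges, rng)
        if 2 * len(crossing) >= len(edges):
            return side, crossing


if __name__ == "__main__":
    # Complete graph K_4: 6 edges, so the cut found has at least 3 edges.
    vs = list(range(4))
    es = [(u, v) for u in vs for v in vs if u < v]
    side, crossing = cut_of_at_least_half(vs, es)
    print(side, len(crossing))
```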

=== Alterations ===
;Independent sets
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
where the probability <math>p</math> is to be determined later.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e=uv\in E</math>, let <math>Y_{e}=Y_{uv}</math> be the 0-1 random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. Then
:<math>
\mathbf{E}[Y_{uv}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices of <math>S</math> are deleted to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized when <math>p=\frac{n}{2m}</math> (a valid probability as long as <math>m\ge\frac{n}{2}</math>), thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
There exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof actually gives a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}

Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.
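
The two steps of the algorithm can be sketched in Python as follows. This is an illustration under our own assumptions: vertices are numbered <math>0,\ldots,n-1</math>, edges are given as pairs, the function name <code>random_independent_set</code> is hypothetical, and the sampling probability is capped at 1 for graphs with fewer than <math>\frac{n}{2}</math> edges.

```python
import random


def random_independent_set(n, edges, seed=None):
    """Sample-and-alter construction of an independent set.

    Step 1: keep each vertex independently with probability 1/d = n/(2m).
    Step 2: for every edge still inside the sample, delete one endpoint.
    """
    rng = random.Random(seed)
    m = len(edges)
    # Assumption: cap the probability at 1 when the graph is very sparse.
    p = min(1.0, n / (2 * m)) if m > 0 else 1.0
    S = {v for v in range(n) if rng.random() < p}
    for u, v in edges:
        if u in S and v in S:
            S.remove(u)  # removing either endpoint destroys the edge
    return S


if __name__ == "__main__":
    # A cycle on 10 vertices: n = m = 10, so the guarantee is n^2/(4m) = 2.5
    # vertices in expectation.
    cycle = [(i, (i + 1) % 10) for i in range(10)]
    print(sorted(random_independent_set(10, cycle, seed=0)))
```

Every edge is examined once and loses an endpoint if it is still inside the sample, so the returned set is always independent; only its size is random.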

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:25:10Z

210.28.131.82: /* Ramsey number */

== Counting arguments ==
;Circuit complexity

This is a fundamental problem in computer science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. Each of the <math>2^n</math> inputs can be mapped to 0 or 1 independently, so there are <math>2^{2^n}</math> such functions.

Then we count the number of boolean circuits with a fixed number of gates.
Fix an integer <math>t</math>; we count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of the two types (AND or OR), and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1</math> possible gate inputs. Each of the <math>t</math> gates therefore has 2 choices of type and at most <math>(t+2n+1)^2</math> choices for its pair of inputs, so the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

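Each boolean circuit computes exactly one boolean function, so the circuits with <math>2^n/3n</math> gates compute fewer than <math>2^{2^n}</math> distinct functions. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.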
}}

Note that Shannon's theorem shows not only that there exists a boolean function with exponentially large circuit complexity, but that ''almost all'' boolean functions have exponentially large circuit complexity.
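
The counting inequality in the proof can be checked exactly for small <math>n</math> with Python's arbitrary-precision integers. In this sketch the function names are hypothetical, and rounding <math>t=2^n/3n</math> down to an integer is our own assumption, since the proof does not say how to round.

```python
def circuits_upper_bound(n: int, t: int) -> int:
    """Upper bound 2^t * (t + 2n + 1)^(2t) on the number of t-gate circuits."""
    return 2**t * (t + 2 * n + 1) ** (2 * t)


def num_boolean_functions(n: int) -> int:
    """Number of functions from {0,1}^n to {0,1}."""
    return 2 ** (2**n)


def counting_bound_holds(n: int) -> bool:
    """Are there fewer circuits with t = floor(2^n / 3n) gates than functions?"""
    t = 2**n // (3 * n)
    return circuits_upper_bound(n, t) < num_boolean_functions(n)


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, 2**n // (3 * n), counting_bound_holds(n))
```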

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

It is obvious that the two terms are equal, since we just count the same thing twice with different methods. The lemma follows.
}}

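The handshaking lemma follows directly from the above lemma: the sum of all degrees is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which is possible only if the number of odd-degree vertices is even.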
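
Both statements are easy to check on a concrete graph. A small Python sketch, assuming the graph is given as a list of edges (pairs of distinct vertices); the helper name <code>degrees</code> and the sample graph are made up for illustration:

```python
from collections import Counter


def degrees(edges):
    """Count, for every vertex, the number of edges incident to it."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # the directed edge (u, v)
        deg[v] += 1  # the directed edge (v, u)
    return deg


if __name__ == "__main__":
    handshakes = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
    deg = degrees(handshakes)
    odd_guests = [v for v in deg if deg[v] % 2 == 1]
    print(sum(deg.values()), 2 * len(handshakes), odd_guests)
```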

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}
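
For small <math>n</math> the formula can be confirmed by brute force. The Python sketch below (the function name <code>count_labeled_trees</code> is hypothetical) enumerates all sets of <math>n-1</math> edges on the vertices <math>0,\ldots,n-1</math> and counts those that are acyclic, using a small union-find structure; an acyclic graph with <math>n-1</math> edges on <math>n</math> vertices is a tree.

```python
from itertools import combinations


def count_labeled_trees(n: int) -> int:
    """Count trees on the vertex set {0, ..., n-1} by exhaustive search."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    for edge_set in combinations(all_edges, n - 1):
        parent = list(range(n))  # union-find forest, one root per component

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        acyclic = True
        for u, v in edge_set:
            ru, rv = find(u), find(v)
            if ru == rv:  # u and v already connected: this edge closes a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            count += 1
    return count


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_labeled_trees(n), n ** (n - 2))
```

The search is exponential in <math>n</math> and is only meant as a sanity check for very small cases.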

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}
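
The body of the proof is not given above. As a hedged illustration only, the pigeonhole argument usually attributed to Seidenberg labels the <math>i</math>th term with the pair (length of the longest increasing subsequence ending at it, length of the longest decreasing subsequence ending at it), and observes that distinct terms receive distinct pairs. The Python sketch below computes these labels; the function name <code>seidenberg_labels</code> is hypothetical.

```python
def seidenberg_labels(seq):
    """For each term, the lengths of the longest increasing and the longest
    decreasing subsequence ending at that term (terms must be distinct)."""
    labels = []
    for i, x in enumerate(seq):
        inc = 1 + max((labels[j][0] for j in range(i) if seq[j] < x), default=0)
        dec = 1 + max((labels[j][1] for j in range(i) if seq[j] > x), default=0)
        labels.append((inc, dec))
    return labels


if __name__ == "__main__":
    seq = [3, 1, 4, 2, 7, 5, 8, 6, 9, 0]  # 10 > 3 * 3 distinct numbers
    labels = seidenberg_labels(seq)
    print(labels)
    print(len(set(labels)) == len(seq))  # all labels are distinct
```

If there were no increasing subsequence of length <math>m+1</math> and no decreasing subsequence of length <math>n+1</math>, all labels would lie in a grid of only <math>mn</math> pairs, which is impossible for more than <math>mn</math> distinct labels.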

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}
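
A rational approximation of the promised quality can be found by trying every denominator <math>q\le n</math>. The Python sketch below assumes that <math>x</math> is given as an exact <code>Fraction</code>, so that all comparisons are exact; the function name <code>dirichlet_approximation</code> is hypothetical.

```python
from fractions import Fraction


def dirichlet_approximation(x: Fraction, n: int):
    """Return (p, q) with 1 <= q <= n and |x - p/q| < 1/(n*q)."""
    for q in range(1, n + 1):
        p = round(q * x)  # the integer closest to q*x is the best numerator
        if abs(x - Fraction(p, q)) < Fraction(1, n * q):
            return p, q
    # The theorem guarantees that the loop above always returns.
    raise AssertionError("no approximation found")


if __name__ == "__main__":
    x = Fraction(314159, 100000)
    print(dirichlet_approximation(x, 10))
```

For <math>x=3.14159</math> and <math>n=10</math> the search returns <math>\frac{22}{7}</math>.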

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) randomly chosen from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it gives us a powerful tool for existence proofs.

===Ramsey number===

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By the assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Thus there exists a two-coloring in which no <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>. Then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without monochromatic <math>K_{2\log n}</math>.
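
The randomized algorithm mentioned here can be sketched in Python: color every edge by a fair coin flip, check all <math>k</math>-subsets of vertices, and resample in the unlikely event that a monochromatic <math>K_k</math> appears. The function names are hypothetical, and the exhaustive check over all <math>{n\choose k}</math> subsets is only practical for small parameters.

```python
import random
from itertools import combinations


def random_coloring(n, rng):
    """Color each edge {u, v} of K_n (stored as (u, v) with u < v) with 0 or 1."""
    return {e: rng.randrange(2) for e in combinations(range(n), 2)}


def has_monochromatic_clique(coloring, n, k):
    """Is there a set of k vertices whose C(k,2) edges all share one color?"""
    for S in combinations(range(n), k):
        colors = {coloring[e] for e in combinations(S, 2)}
        if len(colors) == 1:
            return True
    return False


def coloring_without_monochromatic_clique(n, k, seed=0):
    """Resample random colorings until none of the bad events occurs."""
    rng = random.Random(seed)
    while True:
        coloring = random_coloring(n, rng)
        if not has_monochromatic_clique(coloring, n, k):
            return coloring


if __name__ == "__main__":
    k = 6
    n = int(2 ** (k / 2))  # n = 8
    coloring = coloring_without_monochromatic_clique(n, k)
    print(len(coloring), has_monochromatic_clique(coloring, n, k))
```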

===Tournament===
A '''[http://en.wikipedia.org/wiki/Tournament_(graph_theory) tournament]''' (竞赛图) on a set <math>V</math> of <math>n</math> players is an '''orientation''' of the edges of the complete graph on the set of vertices <math>V</math>. Thus for every two distinct vertices <math>u,v</math> in <math>V</math>, either <math>(u,v)\in E</math> or <math>(v,u)\in E</math>, but not both.

We can think of the set <math>V</math> as a set of <math>n</math> players in which each pair participates in a single match, where <math>(u,v)</math> is in the tournament iff player <math>u</math> beats player <math>v</math>.

{{Theorem|Definition|
:We say that a tournament has '''property <math>S_k</math>''' if for every set of <math>k</math> players there is one who beats them all.
}}

Is it true that for every finite <math>k</math> there is a tournament (on more than <math>k</math> vertices, of course) with the property <math>S_k</math>? This problem was first raised by Schütte and, as shown by Erdős, can be solved almost trivially by the probabilistic method.

{{Theorem|Theorem (Erdős 1963)|
:If <math>{n\choose k}\left(1-2^{-k}\right)^{n-k}<1</math> then there is a tournament on <math>n</math> vertices that has the property <math>S_k</math>.
}}

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: the optimization problem is '''NP-hard''', and its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always contains at least half of the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, since a random variable takes at least one value that is no smaller than its expectation. That is, there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}

;Maximum satisfiability

Suppose that we have a number of boolean variables <math>x_1,x_2,\ldots,x_n\in\{\mathrm{true},\mathrm{false}\}</math>. A '''literal''' is either a variable <math>x_i</math> itself or its negation <math>\neg x_i</math>. A logic expression is in '''conjunctive normal form (CNF)''' if it is written as the conjunction (AND) of a set of '''clauses''', where each clause is a disjunction (OR) of literals. For example:
:<math>
(x_1\vee \neg x_2 \vee \neg x_3)\wedge (\neg x_1\vee \neg x_3)\wedge (x_1\vee x_2\vee x_4)\wedge (x_4\vee \neg x_3)\wedge (x_4\vee \neg x_1).
</math>

The satisfiability (SAT) problem asks whether the CNF is satisfiable, i.e. whether there exists an assignment of the values true and false to the variables so that all clauses are true. The maximum satisfiability (MAXSAT) problem is the optimization version of SAT, which asks for an assignment that maximizes the number of satisfied clauses.

SAT is the first problem known to be '''NP-complete''' (the Cook-Levin theorem). MAXSAT is '''NP-hard''', and its decision version is '''NP-complete'''. We now show that there always exists a reasonably good truth assignment, one which satisfies at least half of the clauses.

{{Theorem
|Theorem|
:For any set of <math>m</math> clauses, there is a truth assignment that satisfies at least <math>\frac{m}{2}</math> clauses.
}}
{{Proof| For each variable, independently assign a random value in <math>\{\mathrm{true},\mathrm{false}\}</math> with equal probability. For the <math>i</math>th clause, let <math>X_i</math> be the 0-1 random variable which indicates whether the <math>i</math>th clause is satisfied. Suppose that there are <math>k\ge 1</math> literals in the clause, on distinct variables. The clause is unsatisfied only if all of its <math>k</math> literals are false, which happens with probability <math>2^{-k}</math>, so the probability that the clause is satisfied is
:<math>\Pr[X_i=1]\ge 1-2^{-k}\ge\frac{1}{2}</math>.

Let <math>X=\sum_{i=1}^m X_i</math> be the number of satisfied clauses. By the linearity of expectation,
:<math>
\mathbf{E}[X]=\sum_{i=1}^{m}\mathbf{E}[X_i]\ge \frac{m}{2}.
</math>
Therefore, there exists an assignment such that at least <math>\frac{m}{2}</math> clauses are satisfied.
}}
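
As with max-cut, the proof suggests a simple randomized procedure: draw uniform random assignments until one satisfies at least half of the clauses. The following Python sketch encodes a literal <math>x_i</math> as the integer <code>i</code> and <math>\neg x_i</math> as <code>-i</code>; this encoding and the function names are our own assumptions, and every clause is assumed to be non-empty.

```python
import random


def count_satisfied(clauses, assignment):
    """Number of clauses containing at least one true literal."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def assignment_satisfying_half(clauses, seed=0):
    """Resample uniform random assignments until at least m/2 clauses hold."""
    rng = random.Random(seed)
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    while True:
        assignment = {v: rng.random() < 0.5 for v in variables}
        if 2 * count_satisfied(clauses, assignment) >= len(clauses):
            return assignment


if __name__ == "__main__":
    # The example CNF from the text.
    cnf = [[1, -2, -3], [-1, -3], [1, 2, 4], [4, -3], [4, -1]]
    a = assignment_satisfying_half(cnf)
    print(a, count_satisfied(cnf, a))
```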

=== Alterations ===
;Independent sets
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
where the probability <math>p</math> is to be determined later.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e=uv\in E</math>, let <math>Y_{e}=Y_{uv}</math> be the 0-1 random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. Then
:<math>
\mathbf{E}[Y_{uv}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices of <math>S</math> are deleted to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized when <math>p=\frac{n}{2m}</math> (a valid probability as long as <math>m\ge\frac{n}{2}</math>), thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
There exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof actually gives a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}

Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:24:51Z

210.28.131.82: /* Sampling */

== Counting arguments ==
;Circuit complexity

This is a fundamental problem in computer science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. Each of the <math>2^n</math> inputs can be mapped to 0 or 1 independently, so there are <math>2^{2^n}</math> such functions.

Then we count the number of boolean circuits with a fixed number of gates.
Fix an integer <math>t</math>; we count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of the two types (AND or OR), and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1</math> possible gate inputs. Each of the <math>t</math> gates therefore has 2 choices of type and at most <math>(t+2n+1)^2</math> choices for its pair of inputs, so the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

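Each boolean circuit computes exactly one boolean function, so the circuits with <math>2^n/3n</math> gates compute fewer than <math>2^{2^n}</math> distinct functions. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.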
}}

Note that Shannon's theorem shows not only that there exists a boolean function with exponentially large circuit complexity, but that ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

It is obvious that the two terms are equal, since we just count the same thing twice with different methods. The lemma follows.
}}

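The handshaking lemma follows directly from the above lemma: the sum of all degrees is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which is possible only if the number of odd-degree vertices is even.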

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) randomly chosen from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it gives us a powerful tool for existence proofs.

===Ramsey number===

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By the assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Thus there exists a two-coloring in which no <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>. Then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without monochromatic <math>K_{2\log n}</math>.
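The randomized algorithm mentioned above can be sketched as follows. This is an illustration under assumptions: the names <code>random_coloring</code>, <code>has_mono_clique</code> and <code>find_coloring</code> are hypothetical, colors are encoded as 0 and 1, and the clique check is a brute-force search over all <math>k</math>-subsets, so it is only practical for small <math>n</math>.

```python
import random
from itertools import combinations

def random_coloring(n, rng):
    """Color each edge of K_n independently with a fair coin (0 or 1)."""
    return {e: rng.randrange(2) for e in combinations(range(n), 2)}

def has_mono_clique(n, k, color):
    """Brute-force check for a monochromatic K_k among all k-subsets."""
    for S in combinations(range(n), k):
        if len({color[e] for e in combinations(S, 2)}) == 1:
            return True
    return False

def find_coloring(n, k, seed=0, max_tries=1000):
    """Resample random colorings until one has no monochromatic K_k."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        color = random_coloring(n, rng)
        if not has_mono_clique(n, k, color):
            return color
    return None  # not expected when C(n,k) * 2^(1-C(k,2)) is small

if __name__ == "__main__":
    coloring = find_coloring(8, 6)  # n = floor(2^(6/2)) = 8
    print(coloring is not None)
```

Each attempt fails with probability at most <math>{n\choose k}\cdot 2^{1-{k\choose 2}}</math> by the union bound, so few retries are expected.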

;Tournament
A '''[http://en.wikipedia.org/wiki/Tournament_(graph_theory) tournament]''' on a set <math>V</math> of <math>n</math> players is an '''orientation''' of the edges of the complete graph on the set of vertices <math>V</math>. Thus for every two distinct vertices <math>u,v</math> in <math>V</math>, either <math>(u,v)\in E</math> or <math>(v,u)\in E</math>, but not both.

We can think of the set <math>V</math> as a set of <math>n</math> players in which each pair participates in a single match, where <math>(u,v)</math> is in the tournament iff player <math>u</math> beats player <math>v</math>.

{{Theorem|Definition|
:We say that a tournament has '''property <math>S_k</math>''' if for every set of <math>k</math> players there is one who beats them all.
}}

Is it true that for every finite <math>k</math> there is a tournament (on more than <math>k</math> vertices, of course) with the property <math>S_k</math>? This problem was first raised by Schütte and, as shown by Erdős, can be solved almost trivially by the probabilistic method.

{{Theorem|Theorem (Erdős 1963)|
:If <math>{n\choose k}\left(1-2^{-k}\right)^{n-k}<1</math> then there is a tournament on <math>n</math> vertices that has the property <math>S_k</math>.
}}
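To get a feeling for the condition in the theorem, one can search for the smallest <math>n</math> that satisfies it for a given <math>k</math>. The sketch below uses exact rational arithmetic; the names <code>schutte_condition</code> and <code>smallest_n</code> and the search limit are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def schutte_condition(n: int, k: int) -> bool:
    """True if C(n,k) * (1 - 2^-k)^(n-k) < 1, evaluated exactly."""
    return comb(n, k) * (1 - Fraction(1, 2 ** k)) ** (n - k) < 1

def smallest_n(k: int, limit: int = 10_000):
    """Smallest n > k below `limit` satisfying the condition, or None."""
    for n in range(k + 1, limit):
        if schutte_condition(n, k):
            return n
    return None

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, smallest_n(k))
```

The value returned is only the smallest <math>n</math> for which this particular sufficient condition holds; tournaments with property <math>S_k</math> may exist on fewer vertices.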

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute a min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Since a random variable takes at least one value no smaller than its expectation, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}
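The proof suggests a simple randomized procedure: sample a bipartition and retry until the cut has at least <math>\frac{m}{2}</math> edges. The following is a minimal sketch; the function names and the representation of edges as pairs of vertices are assumptions made for illustration.

```python
import random

def random_cut(vertices, edges, rng):
    """Put each vertex on side 0 or 1 with equal probability; return the crossing edges."""
    side = {v: rng.randrange(2) for v in vertices}
    cut = [(u, v) for (u, v) in edges if side[u] != side[v]]
    return side, cut

def cut_at_least_half(vertices, edges, seed=0):
    """Resample until the cut contains at least half of the edges.

    The loop ends with probability 1, because E[Y] = m/2 implies
    Pr[Y >= m/2] > 0 for each independent attempt.
    """
    rng = random.Random(seed)
    while True:
        side, cut = random_cut(vertices, edges, rng)
        if 2 * len(cut) >= len(edges):
            return side, cut

if __name__ == "__main__":
    V = list(range(5))
    E = [(u, v) for u in V for v in V if u < v]  # the complete graph K_5
    side, cut = cut_at_least_half(V, E)
    print(len(cut), "of", len(E), "edges cross the cut")
```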

;Maximum satisfiability

Suppose that we have a number of boolean variables <math>x_1,x_2,\ldots\in\{\mathrm{true},\mathrm{false}\}</math>. A '''literal''' is either a variable <math>x_i</math> itself or its negation <math>\neg x_i</math>. A logic expression is in '''conjunctive normal form (CNF)''' if it is written as the conjunction (AND) of a set of '''clauses''', where each clause is a disjunction (OR) of literals. For example:
:<math>
(x_1\vee \neg x_2 \vee \neg x_3)\wedge (\neg x_1\vee \neg x_3)\wedge (x_1\vee x_2\vee x_4)\wedge (x_4\vee \neg x_3)\wedge (x_4\vee \neg x_1).
</math>

The satisfiability (SAT) problem asks whether the CNF is satisfiable, i.e. whether there exists an assignment of the values true and false to the variables so that all clauses are true. The maximum satisfiability (MAXSAT) problem is the optimization version of SAT, which asks for an assignment that maximizes the number of satisfied clauses.

SAT was the first problem known to be '''NP-complete''' (the Cook-Levin theorem). MAXSAT is '''NP-hard''' as well. We now show that there always exists a truth assignment which satisfies at least half of the clauses.

{{Theorem
|Theorem|
:For any set of <math>m</math> clauses, there is a truth assignment that satisfies at least <math>\frac{m}{2}</math> clauses.
}}
{{Proof| For each variable, independently assign a random value in <math>\{\mathrm{true},\mathrm{false}\}</math> with equal probability. For the <math>i</math>th clause, let <math>X_i</math> be the 0-1 random variable which indicates whether the <math>i</math>th clause is satisfied. Suppose that there are <math>k\ge 1</math> distinct literals in the clause. The clause is unsatisfied only if all of its literals are false, so the probability that the clause is satisfied is
:<math>\Pr[X_i=1]\ge 1-2^{-k}\ge\frac{1}{2}</math>.

Let <math>X=\sum_{i=1}^m X_i</math> be the number of satisfied clauses. By the linearity of expectation,
:<math>
\mathbf{E}[X]=\sum_{i=1}^{m}\mathbf{E}[X_i]\ge \frac{m}{2}.
</math>
Therefore, there exists an assignment such that at least <math>\frac{m}{2}</math> clauses are satisfied.
}}
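The argument can be turned into a small randomized procedure. The sketch below assumes a DIMACS-like encoding that is not part of the notes: a clause is a list of nonzero integers, where <code>i</code> stands for <math>x_i</math> and <code>-i</code> for <math>\neg x_i</math>. The function names are illustrative.

```python
import random

def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal under `assignment`."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def random_assignment(num_vars, rng):
    """Assign each variable true or false independently with probability 1/2."""
    return {i: rng.random() < 0.5 for i in range(1, num_vars + 1)}

def satisfy_half(clauses, num_vars, seed=0):
    """Resample until at least half of the clauses are satisfied."""
    rng = random.Random(seed)
    while True:
        a = random_assignment(num_vars, rng)
        if 2 * count_satisfied(clauses, a) >= len(clauses):
            return a

if __name__ == "__main__":
    # The example CNF from the text, in the assumed integer encoding.
    cnf = [[1, -2, -3], [-1, -3], [1, 2, 4], [4, -3], [4, -1]]
    a = satisfy_half(cnf, 4)
    print(a, count_satisfied(cnf, a))
```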

=== Alterations ===
;Independent sets
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m\ge\frac{n}{2}</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
where <math>p</math> is to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e\in E</math>, let <math>Y_{e}</math> be the random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. For <math>e=uv</math>,
:<math>
\mathbf{E}[Y_{uv}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices are deleted from <math>S</math> to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized at <math>p=\frac{n}{2m}</math>, which is a valid probability provided <math>m\ge\frac{n}{2}</math>; thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
Therefore, there exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof actually gives a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}

Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.
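The two steps of the algorithm can be sketched in code as follows. This is an illustration under assumptions: vertices are numbered <math>0,\ldots,n-1</math>, edges are given as pairs, the name <code>random_independent_set</code> is hypothetical, and the inclusion probability is capped at 1 to cover graphs with average degree below 1.

```python
import random

def random_independent_set(n, edges, rng):
    """Sample-then-alter construction of an independent set.

    Step 1: include each vertex independently with probability 1/d = n/(2m).
    Step 2: for every edge with both endpoints still in S, delete one endpoint.
    """
    m = len(edges)
    p = min(1.0, n / (2 * m)) if m > 0 else 1.0
    S = {v for v in range(n) if rng.random() < p}
    for u, v in edges:
        if u in S and v in S:
            S.discard(v)  # remove one endpoint of a remaining edge
    return S

if __name__ == "__main__":
    n = 10
    cycle = [(i, (i + 1) % n) for i in range(n)]  # here n^2/(4m) = 2.5
    rng = random.Random(0)
    sizes = [len(random_independent_set(n, cycle, rng)) for _ in range(1000)]
    print(sum(sizes) / len(sizes))
```

A single run may return a set smaller than <math>\frac{n^2}{4m}</math>; the guarantee is on the expectation, so in practice one would repeat the procedure and keep the largest set found.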

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:24:26Z

210.28.131.82: /* Sampling */

== Counting arguments ==
;Circuit complexity

This is a fundamental problem in Computer Science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the number of boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. There are <math>2^{2^n}</math> boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Then we count the number of boolean circuits with a fixed number of gates.
Fix an integer <math>t</math>; we count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of two types (AND or OR) and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1=t+2n+1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

Each boolean circuit computes one boolean function, and there are fewer circuits with <math>t</math> gates than boolean functions. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.
}}
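The counting inequality in the proof can be checked numerically by comparing base-2 logarithms, which avoids forming the huge numbers themselves. The sketch below uses hypothetical function names and floating-point arithmetic, so it is a sanity check rather than a proof.

```python
from math import log2

def log2_circuit_bound(n: int, t: float) -> float:
    """log2 of the upper bound 2^t * (t + 2n + 1)^(2t) on the number of circuits."""
    return t + 2 * t * log2(t + 2 * n + 1)

def fewer_circuits_than_functions(n: int) -> bool:
    """Compare the circuit bound for t = 2^n/(3n) with the 2^(2^n) boolean functions."""
    t = 2 ** n / (3 * n)
    return log2_circuit_bound(n, t) < 2 ** n  # log2 of 2^(2^n) is 2^n

if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, fewer_circuits_than_functions(n))
```

The ratio of the two logarithms is roughly <math>\frac{2}{3}</math> for large <math>n</math>, which is why the fraction of functions computable with <math>2^n/3n</math> gates tends to 0.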

Note that by Shannon's theorem, not only does there exist a boolean function with exponentially large circuit complexity, but ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices with odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

It is obvious that the two terms are equal, since we just count the same thing twice with different methods. The lemma follows.
}}
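The two ways of counting directed edges can be carried out explicitly on a small example. The following is a minimal sketch for simple graphs; the function names and the edge-list representation are assumptions made for illustration.

```python
def directed_pairs_by_edges(edges):
    """Enumerate by edges: each undirected edge uv yields (u,v) and (v,u)."""
    return [(u, v) for u, v in edges] + [(v, u) for u, v in edges]

def directed_pairs_by_vertices(vertices, edges):
    """Enumerate by vertices: each vertex v yields (v,u) for every neighbor u."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return [(v, u) for v in vertices for u in neighbors[v]]

if __name__ == "__main__":
    V = [0, 1, 2, 3, 4]
    E = [(0, 1), (0, 2), (1, 2), (2, 3)]
    by_edges = directed_pairs_by_edges(E)
    by_vertices = directed_pairs_by_vertices(V, E)
    # Both enumerations list the same 2|E| ordered pairs.
    print(sorted(by_edges) == sorted(by_vertices), len(by_edges))
```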

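The handshaking lemma follows directly from the above lemma: the sum of all degrees is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which forces the number of odd-degree vertices to be even.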

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}
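For small <math>n</math>, the formula can be confirmed by brute force: a set of <math>n-1</math> edges on <math>n</math> vertices is a tree exactly when it contains no cycle. The sketch below enumerates all such edge sets with a union-find structure; the function names are illustrative, and the running time grows quickly with <math>n</math>.

```python
from itertools import combinations

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def count_labeled_trees(n: int) -> int:
    """Count (n-1)-edge subsets of K_n that are acyclic, i.e. labeled trees."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(all_edges, n - 1):
        parent = list(range(n))
        acyclic = True
        for u, v in subset:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:  # u and v already connected: this edge closes a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            count += 1
    return count

if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_labeled_trees(n), n ** (n - 2))
```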

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}
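The statement can be checked exhaustively for small cases. The sketch below labels each position with the lengths of the longest increasing and longest decreasing subsequences ending there, a labelling commonly used in pigeonhole proofs of this theorem; the function names are assumptions, and the code is a verification aid, not a reconstruction of the omitted proof.

```python
from itertools import permutations

def labels(seq):
    """For each position i: (longest increasing, longest decreasing) subsequence ending at i."""
    out = []
    for i, x in enumerate(seq):
        inc = 1 + max((out[j][0] for j in range(i) if seq[j] < x), default=0)
        dec = 1 + max((out[j][1] for j in range(i) if seq[j] > x), default=0)
        out.append((inc, dec))
    return out

def has_monotone(seq, m, n):
    """True if seq has an increasing subsequence of length m+1 or a decreasing one of length n+1."""
    return any(a >= m + 1 or b >= n + 1 for a, b in labels(seq))

if __name__ == "__main__":
    m, n = 2, 2
    # Every sequence of mn + 1 = 5 distinct numbers should pass the check.
    print(all(has_monotone(p, m, n) for p in permutations(range(m * n + 1))))
    # The bound is tight: a sequence of length mn = 4 can avoid both.
    print(has_monotone([1, 0, 3, 2], m, n))
```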

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}
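For a concrete <math>x</math> and <math>n</math>, a fraction promised by the theorem can be found by trying every denominator <math>q\le n</math> with the nearest numerator. The sketch below is illustrative: the name <code>dirichlet_approx</code> is hypothetical, and a floating-point input is converted to the exact rational it represents, so the result approximates that rational rather than the underlying irrational number.

```python
from fractions import Fraction

def dirichlet_approx(x, n: int):
    """Return (p, q) with 1 <= q <= n and |x - p/q| < 1/(n*q), or None if none is found."""
    x = Fraction(x)  # exact value of the input (floats are converted exactly)
    for q in range(1, n + 1):
        p = round(q * x)  # nearest integer to q*x
        if abs(x - Fraction(p, q)) < Fraction(1, n * q):
            return p, q
    return None

if __name__ == "__main__":
    print(dirichlet_approx(3.141592653589793, 10))
    print(dirichlet_approx(Fraction(1, 3), 2))
```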

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) randomly chosen from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is >0, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it gives us a powerful tool for existence proofs.

=== Sampling ===
;Ramsey number

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Thus there exists a two-coloring in which no <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>. Then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without monochromatic <math>K_{2\log n}</math>.

;Tournament
A '''[http://en.wikipedia.org/wiki/Tournament_(graph_theory) tournament]''' on a set <math>V</math> of <math>n</math> players is an '''orientation''' of the edges of the complete graph on the set of vertices <math>V</math>. Thus for every two distinct vertices <math>u,v</math> in <math>V</math>, either <math>(u,v)\in E</math> or <math>(v,u)\in E</math>, but not both.

We can think of the set <math>V</math> as a set of <math>n</math> players in which each pair participates in a single match, where <math>(u,v)</math> is in the tournament iff player <math>u</math> beats player <math>v</math>.

{{Theorem|Definition|
:We say that a tournament has '''property <math>S_k</math>''' if for every set of <math>k</math> players there is one who beats them all.
}}

Is it true that for every finite <math>k</math> there is a tournament (on more than <math>k</math> vertices, of course) with the property <math>S_k</math>? This problem was first raised by Schütte and, as shown by Erdős, can be solved almost trivially by the probabilistic method.

{{Theorem|Theorem (Erdős 1963)|
:If <math>{n\choose k}\left(1-2^{-k}\right)^{n-k}<1</math> then there is a tournament on <math>n</math> vertices that has the property <math>S_k</math>.
}}

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute a min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is among [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Since a random variable takes at least one value no smaller than its expectation, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}

;Maximum satisfiability

Suppose that we have a number of boolean variables <math>x_1,x_2,\ldots\in\{\mathrm{true},\mathrm{false}\}</math>. A '''literal''' is either a variable <math>x_i</math> itself or its negation <math>\neg x_i</math>. A logic expression is in '''conjunctive normal form (CNF)''' if it is written as the conjunction (AND) of a set of '''clauses''', where each clause is a disjunction (OR) of literals. For example:
:<math>
(x_1\vee \neg x_2 \vee \neg x_3)\wedge (\neg x_1\vee \neg x_3)\wedge (x_1\vee x_2\vee x_4)\wedge (x_4\vee \neg x_3)\wedge (x_4\vee \neg x_1).
</math>

The satisfiability (SAT) problem asks whether the CNF is satisfiable, i.e. whether there exists an assignment of the values true and false to the variables so that all clauses are true. The maximum satisfiability (MAXSAT) problem is the optimization version of SAT, which asks for an assignment that maximizes the number of satisfied clauses.

SAT was the first problem known to be '''NP-complete''' (the Cook-Levin theorem). MAXSAT is '''NP-hard''' as well. We now show that there always exists a truth assignment which satisfies at least half of the clauses.

{{Theorem
|Theorem|
:For any set of <math>m</math> clauses, there is a truth assignment that satisfies at least <math>\frac{m}{2}</math> clauses.
}}
{{Proof| For each variable, independently assign a random value in <math>\{\mathrm{true},\mathrm{false}\}</math> with equal probability. For the <math>i</math>th clause, let <math>X_i</math> be the 0-1 random variable which indicates whether the <math>i</math>th clause is satisfied. Suppose that there are <math>k\ge 1</math> distinct literals in the clause. The clause is unsatisfied only if all of its literals are false, so the probability that the clause is satisfied is
:<math>\Pr[X_i=1]\ge 1-2^{-k}\ge\frac{1}{2}</math>.

Let <math>X=\sum_{i=1}^m X_i</math> be the number of satisfied clauses. By the linearity of expectation,
:<math>
\mathbf{E}[X]=\sum_{i=1}^{m}\mathbf{E}[X_i]\ge \frac{m}{2}.
</math>
Therefore, there exists an assignment such that at least <math>\frac{m}{2}</math> clauses are satisfied.
}}

=== Alterations ===
;Independent sets
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m\ge\frac{n}{2}</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
where <math>p</math> is to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e\in E</math>, let <math>Y_{e}</math> be the random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. For <math>e=uv</math>,
:<math>
\mathbf{E}[Y_{uv}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices are deleted from <math>S</math> to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized at <math>p=\frac{n}{2m}</math>, which is a valid probability provided <math>m\ge\frac{n}{2}</math>; thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
Therefore, there exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof actually gives a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}

Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T13:00:09Z

210.28.131.82: /* The Probabilistic Method */

== Counting arguments ==
;Circuit complexity

This is a fundamental problem in Computer Science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the number of boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. There are <math>2^{2^n}</math> boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Then we count the number of boolean circuits with a fixed number of gates.
Fix an integer <math>t</math>; we count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of two types (AND or OR) and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1=t+2n+1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

Each boolean circuit computes one boolean function, and there are fewer circuits with <math>t</math> gates than boolean functions. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.
}}

Note that by Shannon's theorem, not only does there exist a boolean function with exponentially large circuit complexity, but ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices with odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

It is obvious that the two terms are equal, since we just count the same thing twice with different methods. The lemma follows.
}}

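The handshaking lemma follows directly from the above lemma: the sum of all degrees is even and the sum of the even degrees is even, so the sum of the odd degrees is even, which forces the number of odd-degree vertices to be even.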

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}

== The Probabilistic Method ==
The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) randomly chosen from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it provides a powerful tool for existence proofs.

=== Sampling ===
;Ramsey number

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Hence there exists a two-coloring in which no <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>. Then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.
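The condition of the theorem can be evaluated exactly for concrete <math>k</math>. The following is a small sketch using exact rational arithmetic; the names <code>erdos_bound</code> and <code>floor_pow2_half</code> are illustrative, and <math>\lfloor 2^{k/2}\rfloor</math> is computed as the integer square root of <math>2^k</math> to avoid floating-point error.

```python
from fractions import Fraction
from math import comb, isqrt

def erdos_bound(n, k):
    """Exact value of C(n, k) * 2^(1 - C(k, 2))."""
    return Fraction(2 * comb(n, k), 2 ** comb(k, 2))

def floor_pow2_half(k):
    """floor(2^(k/2)), computed exactly as the integer square root of 2^k."""
    return isqrt(2 ** k)

# Evaluate the bound at n = floor(2^(k/2)) for a range of k.
checks = {k: erdos_bound(floor_pow2_half(k), k) for k in range(3, 21)}
```

Every value in <code>checks</code> is below 1, in agreement with the chain of inequalities above.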

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without monochromatic <math>K_{2\log n}</math>.

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is one of [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}

;Maximum satisfiability

Suppose that we have a number of boolean variables <math>x_1,x_2,\ldots\in\{\mathrm{true},\mathrm{false}\}</math>. A '''literal''' is either a variable <math>x_i</math> itself or its negation <math>\neg x_i</math>. A logic expression is in '''conjunctive normal form (CNF)''' if it is written as the conjunction (AND) of a set of '''clauses''', where each clause is a disjunction (OR) of literals. For example:
:<math>
(x_1\vee \neg x_2 \vee \neg x_3)\wedge (\neg x_1\vee \neg x_3)\wedge (x_1\vee x_2\vee x_4)\wedge (x_4\vee \neg x_3)\wedge (x_4\vee \neg x_1).
</math>

The satisfiability (SAT) problem asks whether the CNF is satisfiable, i.e. whether there exists an assignment of the values true and false to the variables so that all clauses are true. The maximum satisfiability (MAXSAT) problem is the optimization version of SAT, which asks for an assignment that maximizes the number of satisfied clauses.

SAT is the first problem known to be '''NP-complete''' (the Cook-Levin theorem). MAXSAT is '''NP-hard'''. We now show that there always exists a reasonably good truth assignment, one which satisfies at least half the clauses.

{{Theorem
|Theorem|
:For any set of <math>m</math> clauses, there is a truth assignment that satisfies at least <math>\frac{m}{2}</math> clauses.
}}
{{Proof| For each variable, independently assign a random value in <math>\{\mathrm{true},\mathrm{false}\}</math> with equal probability. For the <math>i</math>th clause, let <math>X_i</math> be the random variable which indicates whether the <math>i</math>th clause is satisfied. Suppose that there are <math>k</math> literals in the clause. The probability that the clause is satisfied is
:<math>\Pr[X_i=1]\ge 1-2^{-k}\ge\frac{1}{2}</math>.

Let <math>X=\sum_{i=1}^m X_i</math> be the number of satisfied clauses. By the linearity of expectation,
:<math>
\mathbf{E}[X]=\sum_{i=1}^{m}\mathbf{E}[X_i]\ge \frac{m}{2}.
</math>
Therefore, there exists an assignment such that at least <math>\frac{m}{2}</math> clauses are satisfied.
}}

=== Alterations ===
;Independent sets
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
<math>p</math> to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e\in E</math>, let <math>Y_{e}</math> be the random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. For <math>e=uv</math>,
:<math>
\mathbf{E}[Y_{uv}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices are deleted from <math>S</math> to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized at <math>p=\frac{n}{2m}</math> (a valid probability when <math>m\ge\frac{n}{2}</math>), thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
Therefore, there exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof actually gives a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}

Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.

Combinatorics (Fall 2010)/Existence, the probabilistic method

2010-09-20T12:59:29Z

210.28.131.82: /* The Probabilistic Method */

== Counting arguments ==
;Circuit complexity

This is a fundamental problem in computer science.

A '''boolean function''' is a function of the form <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Formally, a boolean circuit is a directed acyclic graph. Nodes with indegree zero are input nodes, labeled <math>x_1, x_2, \ldots , x_n</math>. A circuit has a unique node with outdegree zero, called the output node. Every other node is a gate. There are three types of gates: AND, OR (both with indegree two), and NOT (with indegree one).

Computations in Turing machines can be simulated by circuits, and any boolean function in '''P''' can be computed by a circuit with polynomially many gates. Thus, if we can find a function in '''NP''' that cannot be computed by any circuit with polynomially many gates, then '''NP'''<math>\neq</math>'''P'''.

The following theorem due to Shannon says that functions with exponentially large circuit complexity do exist.

{{Theorem
|Theorem (Shannon 1949)|
:There is a boolean function <math>f:\{0,1\}^n\rightarrow \{0,1\}</math> with circuit complexity greater than <math>\frac{2^n}{3n}</math>.
}}
{{Proof|
We first count the number of boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>. There are <math>2^{2^n}</math> boolean functions <math>f:\{0,1\}^n\rightarrow \{0,1\}</math>.

Then we count the number of boolean circuits with a fixed number of gates.
Fix an integer <math>t</math>; we count the number of circuits with <math>t</math> gates. By [http://en.wikipedia.org/wiki/De_Morgan's_laws De Morgan's laws], we can assume that all NOTs are pushed back to the inputs. Each gate has one of the two types (AND or OR), and has two inputs. Each of the inputs to a gate is either a constant 0 or 1, an input variable <math>x_i</math>, an inverted input variable <math>\neg x_i</math>, or the output of another gate; thus, there are at most <math>2+2n+t-1</math> possible gate inputs. It follows that the number of circuits with <math>t</math> gates is at most <math>2^t(t+2n+1)^{2t}</math>.

If <math>t=2^n/3n</math>, then
:<math>
\frac{2^t(t+2n+1)^{2t}}{2^{2^n}}=o(1)<1,</math> thus, <math>2^t(t+2n+1)^{2t} < 2^{2^n}.</math>

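Each boolean circuit computes exactly one boolean function, so circuits with <math>t</math> gates compute fewer than <math>2^{2^n}</math> distinct functions. Therefore, there must exist a boolean function <math>f</math> which cannot be computed by any circuit with <math>2^n/3n</math> gates.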
}}
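The counting inequality in the proof can be checked exactly for concrete <math>n</math> with arbitrary-precision integers. This is a sketch under one assumption not stated in the text: since <math>2^n/3n</math> is generally not an integer, <math>t</math> is rounded down. The function names are illustrative.

```python
def circuit_count_bound(n, t):
    """Upper bound 2^t * (t + 2n + 1)^(2t) on the number of circuits with t gates."""
    return 2 ** t * (t + 2 * n + 1) ** (2 * t)

def bound_is_below_function_count(n):
    """Compare the circuit bound at t = floor(2^n / (3n)) with the 2^(2^n) functions."""
    t = 2 ** n // (3 * n)  # assumption: round 2^n / (3n) down to an integer
    return circuit_count_bound(n, t) < 2 ** (2 ** n)

results = [bound_is_below_function_count(n) for n in range(1, 13)]
```

For these values of <math>n</math> the number of circuits is strictly smaller than the number of boolean functions, so some function has no circuit of that size.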

Note that by Shannon's theorem, not only does there exist a boolean function with exponentially large circuit complexity, but ''almost all'' boolean functions have exponentially large circuit complexity.

=== Double counting ===
;Handshaking lemma
{{Theorem|Handshaking Lemma|
:At a party, the number of guests who shake hands an odd number of times is even.
}}

We model this scenario as an undirected graph <math>G(V,E)</math> with <math>|V|=n</math> standing for the <math>n</math> guests. There is an edge <math>uv\in E</math> if <math>u</math> and <math>v</math> shake hands. Let <math>d(v)</math> be the degree of vertex <math>v</math>, which represents the number of times that <math>v</math> shakes hands. The handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even.

The handshaking lemma is a direct consequence of the following lemma, which was proved by Euler in a 1736 paper that began the study of graph theory.

{{Theorem|Lemma (Euler 1736)|
:<math>\sum_{v\in V}d(v)=2|E|</math>
}}
{{Proof|
We count the number of '''directed''' edges. A directed edge is an ordered pair <math>(u,v)</math> such that <math>\{u,v\}\in E</math>. There are two ways to count the directed edges.

First, we can enumerate by edges. Pick every edge <math>uv\in E</math> and apply two directions <math>(u,v)</math> and <math>(v,u)</math> to the edge. This gives us <math>2|E|</math> directed edges.

On the other hand, we can enumerate by vertices. Pick every vertex <math>v\in V</math> and for each of its <math>d(v)</math> neighbors, say <math>u</math>, generate a directed edge <math>(v,u)</math>. This gives us <math>\sum_{v\in V}d(v)</math> directed edges.

The two counts are equal, since we count the same set in two different ways. The lemma follows.
}}

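The handshaking lemma follows directly from the above lemma: the sum of all degrees is even, and the sum of the even degrees is even, so the sum of the odd degrees is also even, which is possible only if the number of odd-degree vertices is even.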

;Cayley's formula
{{Theorem|Cayley's formula for trees|
: There are <math>n^{n-2}</math> different trees on <math>n</math> distinct vertices.
}}

== The Pigeonhole Principle ==

=== Monotonic subsequences ===

{{Theorem|Theorem (Erdős-Szekeres 1935)|
:A sequence of more than <math>mn</math> different real numbers must contain either an increasing subsequence of length <math>m+1</math>, or a decreasing subsequence of length <math>n+1</math>.
}}
{{Proof|(due to Seidenberg 1959)
}}
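The statement can be verified exhaustively for small <math>m</math> and <math>n</math>. The sketch below is a brute-force check, not the proof cited above: it computes the longest increasing and decreasing subsequence lengths by dynamic programming and tests every ordering of <math>mn+1</math> distinct numbers. The function names are illustrative.

```python
from itertools import permutations

def longest_monotone(seq):
    """Return (longest increasing, longest decreasing) subsequence lengths."""
    inc = [1] * len(seq)  # inc[i]: longest increasing subsequence ending at i
    dec = [1] * len(seq)  # dec[i]: longest decreasing subsequence ending at i
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i]:
                inc[i] = max(inc[i], inc[j] + 1)
            elif seq[j] > seq[i]:
                dec[i] = max(dec[i], dec[j] + 1)
    return max(inc, default=0), max(dec, default=0)

def holds_for_all(m, n):
    """Check the theorem on every ordering of mn + 1 distinct numbers."""
    for perm in permutations(range(m * n + 1)):
        inc, dec = longest_monotone(perm)
        if inc < m + 1 and dec < n + 1:
            return False
    return True
```

The bound <math>mn</math> cannot be lowered in general: for <math>m=n=2</math>, the sequence 2, 1, 4, 3 has length <math>mn=4</math> and no monotone subsequence of length 3.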

=== Dirichlet's theorem ===

{{Theorem|Theorem (Dirichlet 1879)|
:Let <math>x</math> be a real number. For any natural number <math>n</math>, there is a rational number <math>\frac{p}{q}</math> such that <math>1\le q\le n</math> and
::<math>\left|x-\frac{p}{q}\right|<\frac{1}{nq}</math>.
}}
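For a concrete <math>x</math> and <math>n</math>, a suitable <math>\frac{p}{q}</math> can be found by trying each denominator in turn. The following is a minimal sketch assuming <math>x</math> is given as an exact rational (a <code>Fraction</code>); the function name and the sample value, a decimal truncation of <math>\pi</math>, are illustrative.

```python
from fractions import Fraction

def dirichlet_approximation(x, n):
    """Search q = 1..n for p/q with |x - p/q| < 1/(n*q); x should be a Fraction."""
    for q in range(1, n + 1):
        p = round(x * q)  # nearest integer to q*x
        if abs(x - Fraction(p, q)) < Fraction(1, n * q):
            return p, q
    return None  # not reached if the theorem holds

x = Fraction(314159, 100000)
p, q = dirichlet_approximation(x, 10)
```

With <math>n=10</math> the search stops at the familiar approximation <math>\frac{22}{7}</math>.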

== The Probabilistic Method ==

Suppose we want to prove the existence of mathematical objects with certain properties. One way to do so is to explicitly construct such an object. This kind of proof can be interpreted as a ''deterministic algorithm'' which finds an object with the desired properties.

The probabilistic method provides another way of proving the existence of objects: instead of explicitly constructing an object, we define a probability space of objects in which the probability is positive that a randomly selected object has the required property.

The basic principle of the probabilistic method is very simple, and can be stated in intuitive ways:
*If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe that satisfies that property.
:For example, for a ball (the object) randomly chosen from a box (the universe) of balls, if the probability that the chosen ball is blue (the property) is positive, then there must be a blue ball in the box.
*Any random variable assumes at least one value that is no smaller than its expectation, and at least one value that is no greater than the expectation.
:For example, if we know the average height of the students in the class is <math>\ell</math>, then we know there is a student whose height is at least <math>\ell</math>, and there is a student whose height is at most <math>\ell</math>.

Although the idea of the probabilistic method is simple, it provides a powerful tool for existence proofs.

=== Sampling ===
;Ramsey number

Recall the Ramsey theorem which states that in a meeting of at least six people, there are either three people knowing each other or three people not knowing each other. In graph theoretical terms, this means that no matter how we color the edges of <math>K_6</math> (the complete graph on six vertices), there must be a '''monochromatic''' <math>K_3</math> (a triangle whose edges have the same color).

Generally, the '''Ramsey number''' <math>R(k,\ell)</math> is the smallest integer <math>n</math> such that in any two-coloring of the edges of a complete graph on <math>n</math> vertices <math>K_n</math> by red and blue, either there is a red <math>K_k</math> or there is a blue <math>K_\ell</math>.

Ramsey showed in 1929 that <math>R(k,\ell)</math> is finite for any <math>k</math> and <math>\ell</math>. It is extremely hard to compute the exact value of <math>R(k,\ell)</math>. Here we give a lower bound of <math>R(k,k)</math> by the probabilistic method.

{{Theorem
|Theorem (Erdős 1947)|
:If <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math> then it is possible to color the edges of <math>K_n</math> with two colors so that there is no monochromatic <math>K_k</math> subgraph.
}}
{{Proof| Consider a random two-coloring of edges of <math>K_n</math> obtained as follows:
* For each edge of <math>K_n</math>, independently flip a fair coin to decide the color of the edge.

For any fixed set <math>S</math> of <math>k</math> vertices, let <math>\mathcal{E}_S</math> be the event that the <math>K_k</math> subgraph induced by <math>S</math> is monochromatic. There are <math>{k\choose 2}</math> many edges in <math>K_k</math>, therefore
:<math>\Pr[\mathcal{E}_S]=2\cdot 2^{-{k\choose 2}}=2^{1-{k\choose 2}}.</math>

Since there are <math>{n\choose k}</math> possible choices of <math>S</math>, by the union bound
:<math>
\Pr[\exists S, \mathcal{E}_S]\le {n\choose k}\cdot\Pr[\mathcal{E}_S]={n\choose k}\cdot 2^{1-{k\choose 2}}.
</math>
By assumption, <math>{n\choose k}\cdot 2^{1-{k\choose 2}}<1</math>, so with positive probability none of the events <math>\mathcal{E}_S</math> occurs. Hence there exists a two-coloring in which no <math>\mathcal{E}_S</math> occurs, which means there is no monochromatic <math>K_k</math> subgraph.
}}

For <math>k\ge 3</math>, take <math>n=\lfloor2^{k/2}\rfloor</math>. Then
:<math>
\begin{align}
{n\choose k}\cdot 2^{1-{k\choose 2}}
&<
\frac{n^k}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&\le
\frac{2^{k^2/2}}{k!}\cdot\frac{2^{1+\frac{k}{2}}}{2^{k^2/2}}\\
&=
\frac{2^{1+\frac{k}{2}}}{k!}\\
&<1.
\end{align}
</math>
By the above theorem, there exists a two-coloring of <math>K_n</math> with no monochromatic <math>K_k</math>. Therefore, the Ramsey number <math>R(k,k)>\lfloor2^{k/2}\rfloor</math> for all <math>k\ge 3</math>.

Note that for sufficiently large <math>k</math>, if <math>n= \lfloor 2^{k/2}\rfloor</math>, then the probability that there exists a monochromatic <math>K_k</math> is bounded by
:<math>
{n\choose k}\cdot 2^{1-{k\choose 2}}
<
\frac{2^{1+\frac{k}{2}}}{k!}
\ll 1,
</math>
which means that a random two-coloring of <math>K_n</math> is very likely not to contain a monochromatic <math>K_{2\log n}</math>. This gives us a very simple randomized algorithm for finding a two-coloring of <math>K_n</math> without monochromatic <math>K_{2\log n}</math>.

=== Linearity of expectation ===

;Maximum cut

Given an undirected graph <math>G(V,E)</math>, a set <math>C</math> of edges of <math>G</math> is called a '''cut''' if <math>G</math> is disconnected after removing the edges in <math>C</math>. We can represent a cut by <math>c(S,T)</math> where <math>(S,T)</math> is a bipartition of the vertex set <math>V</math>, and <math>c(S,T)=\{uv\in E\mid u\in S,v\in T\}</math> is the set of edges crossing between <math>S</math> and <math>T</math>.

We have seen how to compute min-cut: either by a deterministic max-flow algorithm, or by Karger's randomized algorithm. On the other hand, max-cut is hard to compute: its decision version is '''NP-complete'''. In fact, the weighted version of max-cut is one of [http://en.wikipedia.org/wiki/Karp's_21_NP-complete_problems Karp's 21 NP-complete problems].

We now show by the probabilistic method that a max-cut always has at least half the edges.

{{Theorem
|Theorem|
:Given an undirected graph <math>G</math> with <math>n</math> vertices and <math>m</math> edges, there is a cut of size at least <math>\frac{m}{2}</math>.
}}
{{Proof| Enumerate the vertices in an arbitrary order. Partition the vertex set <math>V</math> into two disjoint sets <math>S</math> and <math>T</math> as follows.
:For each vertex <math>v\in V</math>,
:* independently choose one of <math>S</math> and <math>T</math> with equal probability, and let <math>v</math> join the chosen set.

For each vertex <math>v\in V</math>, let <math>X_v\in\{S,T\}</math> be the random variable which represents the set that <math>v</math> joins. For each edge <math>uv\in E</math>, let <math>Y_{uv}</math> be the 0-1 random variable which indicates whether <math>uv</math> crosses between <math>S</math> and <math>T</math>. Clearly,
:<math>
\Pr[Y_{uv}=1]=\Pr[X_u\neq X_v]=\frac{1}{2}.
</math>

The size of <math>c(S,T)</math> is given by <math>Y=\sum_{uv\in E}Y_{uv}</math>. By the linearity of expectation,
:<math>
\mathbf{E}[Y]=\sum_{uv\in E}\mathbf{E}[Y_{uv}]=\sum_{uv\in E}\Pr[Y_{uv}=1]=\frac{m}{2}.
</math>
Therefore, there exists a bipartition <math>(S,T)</math> of <math>V</math> such that <math>|c(S,T)|\ge\frac{m}{2}</math>, i.e. there exists a cut of <math>G</math> which contains at least <math>\frac{m}{2}</math> edges.
}}
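On a small graph, the expectation in the proof can be computed exactly by averaging over all <math>2^n</math> equally likely bipartitions. The sketch below assumes vertices labeled <math>0,\ldots,n-1</math>; the function names and the sample graph (a 4-cycle with one chord) are illustrative.

```python
from fractions import Fraction
from itertools import product

def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])

def average_and_best_cut(n, edges):
    """Enumerate all 2^n assignments of vertices 0..n-1 to S (0) or T (1)."""
    sizes = [cut_size(edges, side) for side in product((0, 1), repeat=n)]
    return Fraction(sum(sizes), len(sizes)), max(sizes)

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
average, best = average_and_best_cut(4, edges)
```

The average is exactly <math>\frac{m}{2}</math>, so the best cut is at least that large. Exhaustive enumeration takes exponential time and is meant only as a check of the argument.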

;Maximum satisfiability

Suppose that we have a number of boolean variables <math>x_1,x_2,\ldots\in\{\mathrm{true},\mathrm{false}\}</math>. A '''literal''' is either a variable <math>x_i</math> itself or its negation <math>\neg x_i</math>. A logic expression is in '''conjunctive normal form (CNF)''' if it is written as the conjunction (AND) of a set of '''clauses''', where each clause is a disjunction (OR) of literals. For example:
:<math>
(x_1\vee \neg x_2 \vee \neg x_3)\wedge (\neg x_1\vee \neg x_3)\wedge (x_1\vee x_2\vee x_4)\wedge (x_4\vee \neg x_3)\wedge (x_4\vee \neg x_1).
</math>

The satisfiability (SAT) problem asks whether the CNF is satisfiable, i.e. whether there exists an assignment of the values true and false to the variables so that all clauses are true. The maximum satisfiability (MAXSAT) problem is the optimization version of SAT, which asks for an assignment that maximizes the number of satisfied clauses.

SAT is the first problem known to be '''NP-complete''' (the Cook-Levin theorem). MAXSAT is '''NP-hard'''. We now show that there always exists a reasonably good truth assignment, one which satisfies at least half the clauses.

{{Theorem
|Theorem|
:For any set of <math>m</math> clauses, there is a truth assignment that satisfies at least <math>\frac{m}{2}</math> clauses.
}}
{{Proof| For each variable, independently assign a random value in <math>\{\mathrm{true},\mathrm{false}\}</math> with equal probability. For the <math>i</math>th clause, let <math>X_i</math> be the random variable which indicates whether the <math>i</math>th clause is satisfied. Suppose that there are <math>k</math> literals in the clause. The probability that the clause is satisfied is
:<math>\Pr[X_i=1]\ge 1-2^{-k}\ge\frac{1}{2}</math>.

Let <math>X=\sum_{i=1}^m X_i</math> be the number of satisfied clauses. By the linearity of expectation,
:<math>
\mathbf{E}[X]=\sum_{i=1}^{m}\mathbf{E}[X_i]\ge \frac{m}{2}.
</math>
Therefore, there exists an assignment such that at least <math>\frac{m}{2}</math> clauses are satisfied.
}}
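The same averaging can be carried out on the example CNF given earlier in this section. In this sketch a literal is encoded as a signed integer (<code>+i</code> for <math>x_i</math>, <code>-i</code> for <math>\neg x_i</math>); this encoding and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# The example CNF from the text; literal +i means x_i, -i means NOT x_i.
clauses = [(1, -2, -3), (-1, -3), (1, 2, 4), (4, -3), (4, -1)]

def satisfied_count(clauses, assignment):
    """Count satisfied clauses; assignment maps variable index -> bool."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def average_and_best(clauses, num_vars):
    """Average and maximum number of satisfied clauses over all assignments."""
    counts = []
    for values in product((False, True), repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        counts.append(satisfied_count(clauses, assignment))
    return Fraction(sum(counts), len(counts)), max(counts)

average, best = average_and_best(clauses, 4)
```

Here the average is <math>\frac{7}{8}+\frac{3}{4}+\frac{7}{8}+\frac{3}{4}+\frac{3}{4}=4</math>, which already exceeds <math>\frac{m}{2}=2.5</math>.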

=== Alterations ===
;Independent sets
An independent set of a graph is a set of vertices with no edges between them. The following theorem gives a lower bound on the size of the largest independent set.
{{Theorem
|Theorem|
:Let <math>G(V,E)</math> be a graph on <math>n</math> vertices with <math>m</math> edges. Then <math>G</math> has an independent set with at least <math>\frac{n^2}{4m}</math> vertices.
}}
{{Proof| Let <math>S</math> be a set of vertices constructed as follows:
:For each vertex <math>v\in V</math>:
:* <math>v</math> is included in <math>S</math> independently with probability <math>p</math>,
<math>p</math> to be determined.

Let <math>X=|S|</math>. It is obvious that <math>\mathbf{E}[X]=np</math>.

For each edge <math>e\in E</math>, let <math>Y_{e}</math> be the random variable which indicates whether both endpoints of <math>e</math> are in <math>S</math>. For <math>e=uv</math>,
:<math>
\mathbf{E}[Y_{uv}]=\Pr[u\in S\wedge v\in S]=p^2.
</math>
Let <math>Y</math> be the number of edges in the subgraph of <math>G</math> induced by <math>S</math>. It holds that <math>Y=\sum_{e\in E}Y_e</math>. By linearity of expectation,
:<math>\mathbf{E}[Y]=\sum_{e\in E}\mathbf{E}[Y_e]=mp^2</math>.

Note that although <math>S</math> is not necessarily an independent set, it can be modified into one: for each edge <math>e</math> of the induced subgraph <math>G(S)</math>, we delete one of the endpoints of <math>e</math> from <math>S</math>. Let <math>S^*</math> be the resulting set. Then <math>S^*</math> is an independent set, since there is no edge left in the induced subgraph <math>G(S^*)</math>.

Since there are <math>Y</math> edges in <math>G(S)</math>, at most <math>Y</math> vertices are deleted from <math>S</math> to obtain <math>S^*</math>. Therefore, <math>|S^*|\ge X-Y</math>. By linearity of expectation,
:<math>
\mathbf{E}[|S^*|]\ge\mathbf{E}[X-Y]=\mathbf{E}[X]-\mathbf{E}[Y]=np-mp^2.
</math>
The lower bound <math>np-mp^2</math> is maximized at <math>p=\frac{n}{2m}</math> (a valid probability when <math>m\ge\frac{n}{2}</math>), thus
:<math>
\mathbf{E}[|S^*|]\ge n\cdot\frac{n}{2m}-m\left(\frac{n}{2m}\right)^2=\frac{n^2}{4m}.
</math>
Therefore, there exists an independent set which contains at least <math>\frac{n^2}{4m}</math> vertices.
}}

The proof actually gives a randomized algorithm for constructing a large independent set:

{{Theorem
|Algorithm|
Given a graph on <math>n</math> vertices with <math>m</math> edges, let <math>d=\frac{2m}{n}</math> be the average degree.
#For each vertex <math>v\in V</math>, <math>v</math> is included in <math>S</math> independently with probability <math>\frac{1}{d}</math>.
#For each remaining edge in the induced subgraph <math>G(S)</math>, remove one of the endpoints from <math>S</math>.
}}

Let <math>S^*</math> be the resulting set. We have shown that <math>S^*</math> is an independent set and <math>\mathbf{E}[|S^*|]\ge\frac{n^2}{4m}</math>.
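The two steps of the algorithm translate almost directly into code. This is a minimal sketch assuming vertices labeled <math>0,\ldots,n-1</math> and an edge list without self-loops; the function names are illustrative. Clamping the sampling probability to 1 when the average degree is below 1 is an added assumption, since the text does not treat that case.

```python
import random

def random_independent_set(n, edges, rng):
    """Sample each vertex with probability 1/d, then delete one endpoint per surviving edge."""
    m = len(edges)
    d = 2 * m / n if m > 0 else 0.0  # average degree
    # Assumption: clamp the probability to 1 when the average degree is below 1.
    p = 1.0 if d < 1 else 1.0 / d
    s = {v for v in range(n) if rng.random() < p}
    for u, v in edges:
        if u in s and v in s:
            s.discard(u)  # arbitrary choice of which endpoint to remove
    return s

def is_independent(s, edges):
    """True if no edge has both endpoints in s."""
    return not any(u in s and v in s for u, v in edges)

# Hypothetical input: a cycle on 10 vertices plus a few chords.
n = 10
edges = [(i, (i + 1) % n) for i in range(n)] + [(0, 5), (2, 7), (3, 8)]
sample = random_independent_set(n, edges, random.Random(0))
```

Every run returns an independent set; only its size is random, and the guarantee <math>\frac{n^2}{4m}</math> holds in expectation rather than for each individual run.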

Combinatorics (Fall 2010)/Partitions, sieve methods

2010-09-15T00:53:38Z

210.28.131.82: /* Principle of Inclusion-Exclusion */

== Partitions ==
We count the ways of partitioning <math>n</math> ''identical'' objects into <math>k</math> ''unordered'' groups. This is equivalent to counting the ways of partitioning a number <math>n</math> into <math>k</math> unordered parts.

A '''<math>k</math>-partition''' of a number <math>n</math> is a multiset <math>\{x_1,x_2,\ldots,x_k\}</math> with <math>x_i\ge 1</math> for every element <math>x_i</math> and <math>x_1+x_2+\cdots+x_k=n</math>.

We define <math>p_k(n)</math> as the number of <math>k</math>-partitions of <math>n</math>.

For example, number 7 has the following partitions:
<div class="center"><math>
\begin{align}
&\{7\}
& p_1(7)=1\\
&\{1,6\},\{2,5\},\{3,4\}
& p_2(7)=3\\
&\{1,1,5\}, \{1,2,4\}, \{1,3,3\}, \{2,2,3\}
& p_3(7)=4\\
&\{1,1,1,4\},\{1,1,2,3\}, \{1,2,2,2\}
& p_4(7)=3\\
&\{1,1,1,1,3\},\{1,1,1,2,2\}
& p_5(7)=2\\
&\{1,1,1,1,1,2\}
& p_6(7)=1\\
&\{1,1,1,1,1,1,1\}
& p_7(7)=1
\end{align}
</math></div>

Equivalently, we can define a <math>k</math>-partition of a number <math>n</math> as a <math>k</math>-tuple <math>(x_1,x_2,\ldots,x_k)</math> with:
* <math>x_1\ge x_2\ge\cdots\ge x_k\ge 1</math>;
* <math>x_1+x_2+\cdots+x_k=n</math>.

Then <math>p_k(n)</math> is the number of integral solutions to the above system.

Let <math>p(n)=\sum_{k=1}^n p_k(n)</math> be the total number of partitions of <math>n</math>. The function <math>p(n)</math> is called the '''partition number'''.

=== Counting <math>p_k(n)</math>===
We now try to determine <math>p_k(n)</math>. Unlike most problems we learned in the last lecture, <math>p_k(n)</math> does not have a nice closed form formula. We now give a recurrence for <math>p_k(n)</math>.

{{Theorem|Proposition|
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Note that it must hold that
:<math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.
There are two cases: <math>x_k=1</math> or <math>x_k>1</math>.
;Case 1.
:If <math>x_k=1</math>, then <math>(x_1,\cdots,x_{k-1})</math> is a distinct <math>(k-1)</math>-partition of <math>n-1</math>. And every <math>(k-1)</math>-partition of <math>n-1</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k-1}(n-1)</math>.
;Case 2.
:If <math>x_k>1</math>, then <math>(x_1-1,\cdots,x_{k}-1)</math> is a distinct <math>k</math>-partition of <math>n-k</math>. And every <math>k</math>-partition of <math>n-k</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k}(n-k)</math>.
In conclusion, the number of <math>k</math>-partitions of <math>n</math> is <math>p_{k-1}(n-1)+p_k(n-k)</math>, i.e.
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}

Using the above recurrence, we can compute <math>p_k(n)</math> for moderately large <math>n</math> and <math>k</math> by computer.
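A memoized implementation of the recurrence might look as follows. The base cases (<math>p_0(0)=1</math>, and <math>p_k(n)=0</math> when <math>n<k</math> or when <math>k=0<n</math>) are not spelled out in the text and are supplied here as assumptions consistent with the definition; the function name <code>p</code> is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(k, n):
    """Number of partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0  # only the empty partition of 0
    if n < k:
        return 0  # k positive parts sum to at least k
    # Either the smallest part is 1 (drop it),
    # or every part is > 1 (subtract 1 from each part).
    return p(k - 1, n - 1) + p(k, n - k)

row_for_seven = [p(k, 7) for k in range(1, 8)]
```

The values in <code>row_for_seven</code> reproduce the table of partitions of 7 given above, and their sum is the partition number <math>p(7)=15</math>.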

If we do not insist on the exact value of <math>p_k(n)</math>, the next theorem gives an asymptotic estimate of <math>p_k(n)</math>. Note that it only holds for '''constant''' <math>k</math>, i.e. <math>k</math> does not depend on <math>n</math>.

{{Theorem|Theorem|
For any fixed <math>k</math>,
:<math>p_k(n)\sim\frac{n^{k-1}}{k!(k-1)!}</math>,
as <math>n\rightarrow \infty</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Then <math>x_1+x_2+\cdots+x_k=n</math> and <math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.

The <math>k!</math> permutations of <math>(x_1,\ldots,x_k)</math> yield at most <math>k!</math> many <math>k</math>-compositions (the ''ordered'' sum of <math>k</math> positive integers). There are <math>{n-1\choose k-1}</math> many <math>k</math>-compositions of <math>n</math>, every one of which can be yielded in this way by permuting a partition. Thus,
:<math>k!p_k(n)\ge{n-1\choose k-1}</math>.

Let <math>y_i=x_i+k-i</math>. That is, <math>y_k=x_k, y_{k-1}=x_{k-1}+1, y_{k-2}=x_{k-2}+2,\ldots, y_{1}=x_1+k-1</math>. Then, it holds that
* <math>y_1>y_2>\cdots>y_k\ge 1</math>; and
* <math>y_1+y_2+\cdots+y_k=n+\frac{k(k-1)}{2}</math>.
Each permutation of <math>(y_1,y_2,\ldots,y_k)</math> yields a '''distinct''' <math>k</math>-composition of <math>n+\frac{k(k-1)}{2}</math>, because all <math>y_i</math> are distinct.
Thus,
:<math>k!p_k(n)\le {n+\frac{k(k-1)}{2}-1\choose k-1}</math>.

Combining the two inequalities, we have
:<math>\frac{{n-1\choose k-1}}{k!}\le p_k(n)\le \frac{{n+\frac{k(k-1)}{2}-1\choose k-1}}{k!}</math>.
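For fixed <math>k</math>, both <math>{n-1\choose k-1}</math> and <math>{n+\frac{k(k-1)}{2}-1\choose k-1}</math> are <math>(1+o(1))\frac{n^{k-1}}{(k-1)!}</math> as <math>n\rightarrow\infty</math>, so the theorem follows.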
}}
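
The two inequalities in the proof can be checked numerically. The sketch below fills a table of <math>p_k(n)</math> bottom-up and compares <math>k!\,p_k(n)</math> with both binomial bounds; the helper names <code>partition_table</code> and <code>sandwich</code>, and the base case <math>p_0(0)=1</math>, are assumptions made for this illustration.

```python
from math import comb, factorial


def partition_table(max_k, max_n):
    """table[k][n] = p_k(n), filled with p_k(n) = p_{k-1}(n-1) + p_k(n-k)."""
    table = [[0] * (max_n + 1) for _ in range(max_k + 1)]
    table[0][0] = 1  # assumed base case: the empty partition of 0
    for k in range(1, max_k + 1):
        for n in range(k, max_n + 1):
            table[k][n] = table[k - 1][n - 1] + table[k][n - k]
    return table


def sandwich(k, n, table):
    """Return (lower bound, k! * p_k(n), upper bound) from the proof."""
    lower = comb(n - 1, k - 1)
    upper = comb(n + k * (k - 1) // 2 - 1, k - 1)
    return lower, factorial(k) * table[k][n], upper


table = partition_table(5, 300)
for n in (10, 100, 300):
    exact = table[3][n]
    estimate = n ** 2 / 12  # n^(k-1) / (k! (k-1)!) with k = 3
    print(n, exact, round(exact / estimate, 4))
```

The printed ratio should approach 1 as <math>n</math> grows with <math>k=3</math> held fixed.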

=== Ferrers diagram ===
A partition of a number <math>n</math> can be represented as a diagram of dots (or squares), called a '''Ferrers diagram''' (the square version of the Ferrers diagram is also called a '''Young diagram''', named after a structure called the Young tableau).

Let <math>(x_1,x_2,\ldots,x_k)</math> with <math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math> be a partition of <math>n</math>. Its Ferrers diagram consists of <math>k</math> rows, where the <math>i</math>-th row contains <math>x_i</math> dots (or squares).

<div class="center">
{|border="0"
|
{|border="0"
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|Ferrers diagram (''dot version'') of (5,4,2,1)||
|align=center|Ferrers diagram (''square version'') of (5,4,2,1)
|}
</div>

;Conjugate partition
The partition we get by reading the Ferrers diagram by columns instead of rows is called the '''conjugate''' of the original partition.
<div class="center">
{|border="0"
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|<math>(6,4,4,2,1)</math>||
|align=center|conjugate: <math>(5,4,3,3,1,1)</math>
|}
</div>

Clearly,
* different partitions cannot have the same conjugate, and
* every partition of <math>n</math> is the conjugate of some partition of <math>n</math>,
so the conjugation mapping is a permutation on the set of partitions of <math>n</math>. This fact is very useful in proving theorems about partition numbers.
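
Conjugation is easy to compute: the <math>j</math>-th part of the conjugate is the number of rows of the Ferrers diagram that reach column <math>j</math>. A minimal sketch, assuming partitions are given as non-increasing tuples (the function name <code>conjugate</code> is chosen here for illustration):

```python
def conjugate(partition):
    """Conjugate of a partition given as a non-increasing tuple of positive ints."""
    if not partition:
        return ()
    # Column j (0-based) contains one cell for every row longer than j.
    return tuple(sum(1 for x in partition if x > j) for j in range(partition[0]))


print(conjugate((6, 4, 4, 2, 1)))             # (5, 4, 3, 3, 1, 1)
print(conjugate(conjugate((6, 4, 4, 2, 1))))  # (6, 4, 4, 2, 1)
```

Applying the function twice returns the original partition, which is one way to see that conjugation is a bijection.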

Some theorems about partitions can be easily proved by representing partitions as Ferrers diagrams.

{{Theorem|Proposition|
# The number of partitions of <math>n</math> whose largest summand is <math>k</math> is <math>p_k(n)</math>.
# The number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts. Formally,
::<math>p_k(n)=\sum_{j=1}^k p_j(n-k)</math>.
}}
{{Proof|
# For every <math>k</math>-partition, the conjugate partition has largest part <math>k</math>, and vice versa.
# For a <math>k</math>-partition of <math>n</math>, remove the leftmost cell of every row of the Ferrers diagram. In total <math>k</math> cells are removed, and the remaining diagram is a partition of <math>n-k</math> into at most <math>k</math> parts. Conversely, for a partition of <math>n-k</math> into at most <math>k</math> parts, add a cell to each of the <math>k</math> rows (including the empty ones). This gives a <math>k</math>-partition of <math>n</math>. It is easy to see that the two mappings are inverse to each other, hence 1-1 correspondences. Thus, the number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts.
}}
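
Both statements of the proposition can be checked by brute-force enumeration for small <math>n</math>. The generator <code>partitions</code> and the helper <code>check</code> below are illustrative names, not part of the lecture; the check is restricted to <math>1\le k<n</math>, since for <math>n=k</math> the sum in the second statement would need a <math>j=0</math> term for the empty partition.

```python
def partitions(n, max_part=None):
    """Yield all partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def check(n, k):
    """Return the three counts that the proposition claims are equal (1 <= k < n)."""
    all_parts = list(partitions(n))
    with_k_parts = sum(1 for x in all_parts if len(x) == k)
    with_largest_k = sum(1 for x in all_parts if x[0] == k)
    at_most_k_parts = sum(1 for x in partitions(n - k) if len(x) <= k)
    return with_k_parts, with_largest_k, at_most_k_parts


print(check(7, 3))  # (4, 4, 4)
```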

== Principle of Inclusion-Exclusion ==
Let <math>A</math> and <math>B</math> be two finite sets. The cardinality of their union is
:<math>|A\cup B|=|A|+|B|-{\color{Blue}|A\cap B|}</math>.
For three sets <math>A</math>, <math>B</math>, and <math>C</math>, the cardinality of the union of these three sets is computed as
:<math>|A\cup B\cup C|=|A|+|B|+|C|-{\color{Blue}|A\cap B|}-{\color{Blue}|A\cap C|}-{\color{Blue}|B\cap C|}+{\color{Red}|A\cap B\cap C|}</math>.
This is illustrated by the following figure.
::[[Image:Inclusion-exclusion.png|200px|border|center]]

Generally, the '''Principle of Inclusion-Exclusion''' gives the cardinality of the union of <math>n</math> finite sets <math>A_1,A_2,\ldots,A_n</math>:
{{Equation|
<math>
\begin{align}
\left|\bigcup_{i=1}^nA_i\right|
&=
\sum_{\emptyset\neq I\subseteq\{1,\ldots,n\}}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
-----

In combinatorial enumeration, the Principle of Inclusion-Exclusion is usually applied in its complement form.

Let <math>A_1,A_2,\ldots,A_n\subseteq U</math> be subsets of some finite set <math>U</math>. Here <math>U</math> is some universe of combinatorial objects, whose cardinality is easy to calculate (e.g. all strings, tuples, permutations), and each <math>A_i</math> contains the objects with some specific property (e.g. a "pattern") which we want to avoid. The problem is to count the number of objects without any of the <math>n</math> properties. We write <math>\bar{A_i}=U-A_i</math>. The number of objects without any of the properties <math>A_1,A_2,\ldots,A_n</math> is
{{Equation|
<math>
\begin{align}
\left|\bar{A_1}\cap\bar{A_2}\cap\cdots\cap\bar{A_n}\right|=\left|U-\bigcup_{i=1}^nA_i\right|
&=
|U|+\sum_{\emptyset\neq I\subseteq\{1,\ldots,n\}}(-1)^{|I|}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
For an <math>I\subseteq\{1,2,\ldots,n\}</math>, we denote
:<math>A_I=\bigcap_{i\in I}A_i</math>
with the convention that <math>A_\emptyset=U</math>. The above equation is stated as:
{{Theorem|Principle of Inclusion-Exclusion|
:Let <math>A_1,A_2,\ldots,A_n</math> be a family of subsets of <math>U</math>. Then the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> is
::<math>\sum_{I\subseteq\{1,\ldots, n\}}(-1)^{|I|}|A_I|</math>.
}}

Let <math>S_k=\sum_{|I|=k}|A_I|\,</math>. Conventionally, <math>S_0=|A_\emptyset|=|U|</math>. Then the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> can be expressed as
{{Equation|<math>
S_0-S_1+S_2-\cdots+(-1)^nS_n.
</math>
}}
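
The complement form can be evaluated mechanically by summing over all index sets <math>I</math>. The sketch below does this for an arbitrary small family of sets; the function name <code>count_avoiding_all</code> and the example (integers up to 100 divisible by none of 4, 6, 10) are assumptions chosen for illustration. The loop visits all <math>2^n</math> index sets, so it is only practical for small <math>n</math>.

```python
from itertools import combinations


def count_avoiding_all(universe, subsets):
    """Number of elements of `universe` lying in none of `subsets`,
    computed as the sum over I of (-1)^|I| * |A_I|, with A_empty = U."""
    universe = set(universe)
    total = 0
    for k in range(len(subsets) + 1):
        for chosen in combinations(subsets, k):
            intersection = set(universe)
            for s in chosen:
                intersection &= s
            total += (-1) ** k * len(intersection)
    return total


U = range(1, 101)
A = [{x for x in U if x % d == 0} for d in (4, 6, 10)]
by_sieve = count_avoiding_all(U, A)
directly = sum(1 for x in U if all(x % d != 0 for d in (4, 6, 10)))
print(by_sieve, directly)  # the two counts agree
```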

=== Surjections ===
In the twelvefold way, we discussed the counting problems arising from mappings <math>f:N\rightarrow M</math>. The basic case is that elements from both <math>N</math> and <math>M</math> are distinguishable. In this case, it is easy to count the number of arbitrary mappings (which is <math>m^n</math>) and the number of injective (one-to-one) mappings (which is <math>(m)_n</math>), but counting the surjective mappings is harder. Here we apply the principle of inclusion-exclusion to count the number of surjective (onto) mappings.
{{Theorem|Theorem|
:The number of surjective mappings from an <math>n</math>-set to an <math>m</math>-set is given by
::<math>\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
{{Proof|
Let <math>U=\{f:[n]\rightarrow[m]\}</math> be the set of mappings from <math>[n]</math> to <math>[m]</math>. Then <math>|U|=m^n</math>.

For <math>i\in[m]</math>, let <math>A_i</math> be the set of mappings <math>f:[n]\rightarrow[m]</math> under which no <math>j\in[n]</math> is mapped to <math>i</math>, i.e. <math>A_i=\{f:[n]\rightarrow[m]\setminus\{i\}\}</math>, thus <math>|A_i|=(m-1)^n</math>.

More generally, for <math>I\subseteq [m]</math>, <math>A_I=\bigcap_{i\in I}A_i</math> contains the mappings <math>f:[n]\rightarrow[m]\setminus I</math>. And <math>|A_I|=(m-|I|)^n\,</math>.

A mapping <math>f:[n]\rightarrow[m]</math> is surjective if and only if <math>f</math> lies in none of the sets <math>A_i</math>. By the principle of inclusion-exclusion, the number of surjective <math>f:[n]\rightarrow[m]</math> is
:<math>\sum_{I\subseteq[m]}(-1)^{|I|}\left|A_I\right|=\sum_{I\subseteq[m]}(-1)^{|I|}(m-|I|)^n=\sum_{j=0}^m(-1)^j{m\choose j}(m-j)^n</math>.
Substituting <math>k=m-j</math>, and noting that the <math>k=0</math> term vanishes for <math>n\ge 1</math>, gives the stated formula.
}}
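
For small parameters the formula can be compared against direct enumeration of all <math>m^n</math> mappings. This is a sketch with illustrative function names, restricted to <math>n,m\ge 1</math>.

```python
from itertools import product
from math import comb


def surjections_formula(n, m):
    """Inclusion-exclusion count of surjections from an n-set onto an m-set."""
    return sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))


def surjections_brute(n, m):
    """Enumerate every mapping as a tuple of images and keep the onto ones."""
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == m)


print(surjections_formula(4, 3), surjections_brute(4, 3))  # 36 36
```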

Recall that, in the twelvefold way, we established a relation between surjections and partitions.

* Surjection to ordered partition:
:For a surjective <math>f:[n]\rightarrow[m]</math>, <math>(f^{-1}(0),f^{-1}(1),\ldots,f^{-1}(m-1))</math> is an '''ordered partition''' of <math>[n]</math>.
* Ordered partition to surjection:
:For an ordered <math>m</math>-partition <math>(B_0,B_1,\ldots, B_{m-1})</math> of <math>[n]</math>, we can define a function <math>f:[n]\rightarrow[m]</math> by letting <math>f(i)=j</math> if and only if <math>i\in B_j</math>. <math>f</math> is surjective since as a partition, none of <math>B_i</math> is empty.

Therefore, we have a one-to-one correspondence between surjective mappings from an <math>n</math>-set to an <math>m</math>-set and the ordered <math>m</math>-partitions of an <math>n</math>-set.

The Stirling number of the second kind <math>S(n,m)</math> is the number of <math>m</math>-partitions of an <math>n</math>-set. There are <math>m!</math> ways to order an <math>m</math>-partition, thus the number of surjective mappings <math>f:[n]\rightarrow[m]</math> is <math>m! S(n,m)</math>. Combining with what we have proved for surjections, we give the following result for the Stirling number of the second kind.

{{Theorem|Proposition|
:<math>S(n,m)=\frac{1}{m!}\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
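
The factor <math>\frac{1}{m!}</math> can be checked by collecting the unordered partitions <math>\{f^{-1}(0),\ldots,f^{-1}(m-1)\}</math> induced by all surjections and counting the distinct ones. The sketch below assumes <math>[n]=\{0,1,\ldots,n-1\}</math>; both function names are chosen here for illustration.

```python
from itertools import product
from math import comb, factorial


def stirling2_formula(n, m):
    """S(n, m) as (number of surjections) / m!."""
    surjections = sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))
    return surjections // factorial(m)


def stirling2_brute(n, m):
    """Count distinct unordered partitions of [n] into m nonempty blocks."""
    seen = set()
    for f in product(range(m), repeat=n):
        blocks = [frozenset(i for i in range(n) if f[i] == j) for j in range(m)]
        if all(blocks):  # f is surjective iff every preimage is nonempty
            seen.add(frozenset(blocks))
    return len(seen)


print(stirling2_formula(4, 2), stirling2_brute(4, 2))  # 7 7
```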

=== Derangements ===
We now count the number of bijections from a set to itself with no fixed points. This is the '''derangement problem'''.

For a permutation <math>\pi</math> of <math>\{1,2,\ldots,n\}</math>, a '''fixed point''' is such an <math>i\in\{1,2,\ldots,n\}</math> that <math>\pi(i)=i</math>.
A [http://en.wikipedia.org/wiki/Derangement '''derangement'''] of <math>\{1,2,\ldots,n\}</math> is a permutation of <math>\{1,2,\ldots,n\}</math> that has no fixed points.

{{Theorem|Theorem|
:The number of derangements of <math>\{1,2,\ldots,n\}</math> is given by
::<math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}\approx \frac{n!}{\mathrm{e}}</math>.
}}
{{Proof|
Let <math>U</math> be the set of all permutations of <math>\{1,2,\ldots,n\}</math>. So <math>|U|=n!</math>.

Let <math>A_i</math> be the set of permutations with fixed point <math>i</math>; so <math>|A_i|=(n-1)!</math>. More generally, for any <math>I\subseteq \{1,2,\ldots,n\}</math>, <math>A_I=\bigcap_{i\in I}A_i</math>, and <math>|A_I|=(n-|I|)!</math>, since permutations in <math>A_I</math> fix every point in <math>I</math> and permute the remaining points arbitrarily. A permutation is a derangement if and only if it lies in none of the sets <math>A_i</math>. So the number of derangements is
:<math>\sum_{I\subseteq\{1,2,\ldots,n\}}(-1)^{|I|}(n-|I|)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=n!\sum_{k=0}^n\frac{(-1)^k}{k!}.</math>
By Taylor's series,
:<math>\frac{1}{\mathrm{e}}=\sum_{k=0}^\infty\frac{(-1)^k}{k!}=\sum_{k=0}^n\frac{(-1)^k}{k!}\pm o\left(\frac{1}{n!}\right)</math>.
It is not hard to see that <math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}</math> is the closest integer to <math>\frac{n!}{\mathrm{e}}</math>.
}}

Therefore, about a <math>\frac{1}{\mathrm{e}}</math> fraction of all permutations have no fixed points.
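
The derangement formula, a brute-force count, and the nearest-integer statement can be compared for small <math>n</math>. A minimal sketch with illustrative function names; the floating-point rounding is only trusted here for small <math>n</math>.

```python
from itertools import permutations
from math import e, factorial


def derangements_formula(n):
    """n! * sum_{k=0}^{n} (-1)^k / k!, computed in exact integer arithmetic."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))


def derangements_brute(n):
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(1 for pi in permutations(range(n))
               if all(pi[i] != i for i in range(n)))


for n in range(1, 8):
    print(n, derangements_formula(n), derangements_brute(n), round(factorial(n) / e))
```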

=== Permutations with restricted positions ===
We introduce a general theory of counting permutations with restricted positions. In the derangement problem, we count the number of permutations <math>\pi</math> such that <math>\pi(i)\neq i</math> for all <math>i</math>. We now generalize to the problem of counting permutations which avoid a set of arbitrarily specified positions.

It is traditionally described using terminology from the game of chess. Let <math>B\subseteq \{1,\ldots,n\}\times \{1,\ldots,n\}</math>, called a '''board'''. As illustrated below, we can think of <math>B</math> as a chess board, with the positions in <math>B</math> marked by "<math>\times</math>".
{{Chess diagram small
|
|
|=
8 |__|xx|xx|__|xx|__|__|xx|=
7 |xx|__|__|xx|__|__|xx|__|=
6 |xx|__|xx|xx|__|xx|xx|__|=
5 |__|xx|__|__|xx|__|xx|__|=
4 |xx|__|__|__|xx|xx|xx|__|=
3 |__|xx|__|xx|__|__|__|xx|=
2 |__|__|xx|__|xx|__|__|xx|=
1 |xx|__|__|xx|__|xx|__|__|=
a b c d e f g h
|
}}
For a permutation <math>\pi</math> of <math>\{1,\ldots,n\}</math>, define its '''graph''' <math>G_\pi</math> as
:<math>
\begin{align}
G_\pi &= \{(i,\pi(i))\mid i\in \{1,2,\ldots,n\}\}.
\end{align}
</math>
This can also be viewed as a set of marked positions on a chess board. Each row and each column has only one marked position, because <math>\pi</math> is a permutation. Thus, we can identify each <math>G_\pi</math> as a placement of <math>n</math> rooks (the "castle", which moves like the chariot in Chinese chess) without attacking each other.

For example, the following is the <math>G_\pi</math> of the identity permutation <math>\pi(i)=i</math>.
{{Chess diagram small
|
|
|=
8 |rl|__|__|__|__|__|__|__|=
7 |__|rl|__|__|__|__|__|__|=
6 |__|__|rl|__|__|__|__|__|=
5 |__|__|__|rl|__|__|__|__|=
4 |__|__|__|__|rl|__|__|__|=
3 |__|__|__|__|__|rl|__|__|=
2 |__|__|__|__|__|__|rl|__|=
1 |__|__|__|__|__|__|__|rl|=
a b c d e f g h
|
}}
Now define
:<math>\begin{align}
N_0 &= \left|\left\{\pi\mid B\cap G_\pi=\emptyset\right\}\right|\\
r_k &= \mbox{number of }k\mbox{-subsets of }B\mbox{ such that no two elements have a common coordinate}\\
&=\left|\left\{S\in{B\choose k} \,\bigg|\, \forall (i_1,j_1),(i_2,j_2)\in S, i_1\neq i_2, j_1\neq j_2 \right\}\right|
\end{align}
</math>
In chess terms:
* <math>B</math>: a set of marked positions in an <math>[n]\times [n]</math> chess board.
* <math>N_0</math>: the number of ways of placing <math>n</math> non-attacking rooks on the chess board such that none of these rooks lie in <math>B</math>.
* <math>r_k</math>: number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>.

Our goal is to count <math>N_0</math> in terms of <math>r_k</math>. This gives the number of permutations that avoid all positions in <math>B</math>.

{{Theorem|Theorem|
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
{{Proof|
For each <math>i\in[n]</math>, let <math>A_i=\{\pi\mid (i,\pi(i))\in B\}</math> be the set of permutations <math>\pi</math> whose <math>i</math>-th position is in <math>B</math>.

<math>N_0</math> is the number of permutations that avoid all positions in <math>B</math>. Thus, our goal is to count the number of permutations <math>\pi</math> that lie in none of the sets <math>A_i</math> for <math>i\in [n]</math>.

For each <math>I\subseteq [n]</math>, let <math>A_I=\bigcap_{i\in I}A_i</math>, which is the set of permutations <math>\pi</math> such that <math>(i,\pi(i))\in B</math> for all <math>i\in I</math>. Due to the principle of inclusion-exclusion,
:<math>N_0=\sum_{I\subseteq [n]} (-1)^{|I|}|A_I|=\sum_{k=0}^n(-1)^k\sum_{I\in{[n]\choose k}}|A_I|</math>.

The next observation is that
:<math>\sum_{I\in{[n]\choose k}}|A_I|=r_k(n-k)!</math>,
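because both sides count the ways of first placing <math>k</math> non-attacking rooks on <math>B</math> (these occupy the rows in <math>I</math>) and then placing <math>n-k</math> additional rooks on the remaining rows and columns of <math>[n]\times [n]</math>, so that all <math>n</math> rooks are non-attacking, in <math>(n-k)!</math> ways.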

Therefore,
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
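
The theorem can be tested on small boards by computing the rook numbers <math>r_k</math> by brute force and comparing with a direct count of the permutations avoiding <math>B</math>. The sketch below uses 0-based coordinates and an arbitrary example board; the function names and the example board are assumptions made for illustration.

```python
from itertools import combinations, permutations
from math import factorial


def rook_numbers(board, n):
    """r_k for k = 0..n: k-subsets of the board with no shared row or column."""
    cells = sorted(board)
    numbers = []
    for k in range(n + 1):
        count = 0
        for chosen in combinations(cells, k):
            rows = {i for i, _ in chosen}
            cols = {j for _, j in chosen}
            if len(rows) == k and len(cols) == k:
                count += 1
        numbers.append(count)
    return numbers


def avoiding_formula(board, n):
    """N_0 = sum_k (-1)^k r_k (n-k)!."""
    r = rook_numbers(board, n)
    return sum((-1) ** k * r[k] * factorial(n - k) for k in range(n + 1))


def avoiding_brute(board, n):
    """Directly count permutations pi with (i, pi(i)) not in the board for all i."""
    return sum(1 for pi in permutations(range(n))
               if all((i, pi[i]) not in board for i in range(n)))


B = {(0, 0), (0, 1), (1, 1), (2, 3), (3, 0)}  # arbitrary example board, n = 4
print(rook_numbers(B, 4), avoiding_formula(B, 4), avoiding_brute(B, 4))
```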

====Derangement problem====
We use the above general method to solve the derangement problem again.

Take <math>B=\{(1,1),(2,2),\ldots,(n,n)\}</math> as the chess board. A derangement <math>\pi</math> is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|__|__|__|__|__|__|__|=
7 |__|xx|__|__|__|__|__|__|=
6 |__|__|xx|__|__|__|__|__|=
5 |__|__|__|xx|__|__|__|__|=
4 |__|__|__|__|xx|__|__|__|=
3 |__|__|__|__|__|xx|__|__|=
2 |__|__|__|__|__|__|xx|__|=
1 |__|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
Clearly, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math> is <math>r_k={n\choose k}</math>. We want to count <math>N_0</math>, which gives the number of ways of placing <math>n</math> non-attacking rooks such that none of these rooks lie in <math>B</math>.

By the above theorem
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=\sum_{k=0}^n(-1)^k\frac{n!}{k!}=n!\sum_{k=0}^n(-1)^k\frac{1}{k!}\approx\frac{n!}{e}.
</math>

====Problème des ménages====
Suppose that in a banquet, we want to seat <math>n</math> couples at a circular table, satisfying the following constraints:
* Men and women are in alternate places.
* No one sits next to his/her spouse.

In how many ways can this be done?

(For convenience, we assume that every seat at the table is marked differently so that rotating the seats clockwise or anti-clockwise will end up with a '''different''' solution.)

First, let the <math>n</math> ladies find their seats. They may either sit at the odd numbered seats or even numbered seats, in either case, there are <math>n!</math> different orders. Thus, there are <math>2(n!)</math> ways to seat the <math>n</math> ladies.

After seating the wives, we label the remaining <math>n</math> places clockwise as <math>0,1,\ldots, n-1</math>. A seating of the <math>n</math> husbands is then given by a permutation <math>\pi</math> of <math>[n]</math> defined as follows. Let <math>\pi(i)</math> be the seat of the husband of the lady sitting at the <math>i</math>-th place.

It is easy to see that <math>\pi</math> satisfies that <math>\pi(i)\neq i</math> and <math>\pi(i)\not\equiv i+1\pmod n</math>, and every permutation <math>\pi</math> with these properties gives a feasible seating of the <math>n</math> husbands. Thus, we only need to count the number of permutations <math>\pi</math> such that <math>\pi(i)\not\equiv i, i+1\pmod n</math>.

Take <math>B=\{(0,0),(1,1),\ldots,(n-1,n-1), (0,1),(1,2),\ldots,(n-2,n-1),(n-1,0)\}</math> as the chess board. A permutation <math>\pi</math> which defines a way of seating the husbands, is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|xx|__|__|__|__|__|__|=
7 |__|xx|xx|__|__|__|__|__|=
6 |__|__|xx|xx|__|__|__|__|=
5 |__|__|__|xx|xx|__|__|__|=
4 |__|__|__|__|xx|xx|__|__|=
3 |__|__|__|__|__|xx|xx|__|=
2 |__|__|__|__|__|__|xx|xx|=
1 |xx|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
We need to compute <math>r_k</math>, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>. For our choice of <math>B</math>, <math>r_k</math> is the number of ways of choosing <math>k</math> points, no two consecutive, from a collection of <math>2n</math> points arranged in a circle.

We first see how to do this in a ''line''.
{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''line'', is <math>{m-k+1\choose k}</math>.
}}
{{Proof|
We draw a line of <math>m-k</math> black points, and then insert <math>k</math> red points into the <math>m-k+1</math> spaces between the black points (including the beginning and end).
::<math>
\begin{align}
&\sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \\
&\qquad\qquad\qquad\quad\Downarrow\\
&\sqcup \, \bullet \,\, {\color{Red}\bullet} \, \bullet \,\, {\color{Red}\bullet} \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}\, \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}
\end{align}
</math>
This gives us a line of <math>m</math> points, and the red points specify the chosen objects, which are non-consecutive. The mapping is a 1-1 correspondence.
There are <math>{m-k+1\choose k}</math> ways of placing <math>k</math> red points into <math>m-k+1</math> spaces.
}}
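
The lemma can be confirmed by brute force for small <math>m</math>. In this sketch the objects on the line are labeled <math>0,\ldots,m-1</math> (an assumption for illustration), and a selection is valid when no two chosen labels differ by 1.

```python
from itertools import combinations
from math import comb


def nonconsecutive_on_line(m, k):
    """Count k-subsets of {0, ..., m-1} containing no two consecutive integers."""
    return sum(1 for c in combinations(range(m), k)
               if all(b - a > 1 for a, b in zip(c, c[1:])))


print(nonconsecutive_on_line(7, 3), comb(7 - 3 + 1, 3))  # 10 10
```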

The problem of choosing non-consecutive objects in a circle can be reduced to the case that the objects are in a line.

{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''circle'', is <math>\frac{m}{m-k}{m-k\choose k}</math>.
}}
{{Proof|
Let <math>f(m,k)</math> be the desired number; and let <math>g(m,k)</math> be the number of ways of choosing <math>k</math> non-consecutive points from <math>m</math> points arranged in a circle, next coloring the <math>k</math> points red, and then coloring one of the uncolored points blue.

Clearly, <math>g(m,k)=(m-k)f(m,k)</math>.

But we can also compute <math>g(m,k)</math> as follows:
* Choose one of the <math>m</math> points and color it blue. This gives us <math>m</math> ways.
* Cut the circle to make a line of <math>m-1</math> points by removing the blue point.
* Choose <math>k</math> non-consecutive points from the line of <math>m-1</math> points and color them red. This gives <math>{m-k\choose k}</math> ways due to the previous lemma.

Thus, <math>g(m,k)=m{m-k\choose k}</math>. Therefore we have the desired number <math>f(m,k)=\frac{m}{m-k}{m-k\choose k}</math>.
}}
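
The circular version can be checked the same way. The sketch below labels the points <math>0,\ldots,m-1</math> around the circle, so that <math>m-1</math> and <math>0</math> are also neighbours; it assumes <math>m\ge 3</math> and <math>k<m</math> to avoid degenerate circles, and the function names are illustrative.

```python
from itertools import combinations
from math import comb


def nonconsecutive_on_circle(m, k):
    """Count k-subsets of Z_m with no two cyclically adjacent elements (m >= 3)."""
    count = 0
    for c in combinations(range(m), k):
        linear_ok = all(b - a > 1 for a, b in zip(c, c[1:]))
        wrap_ok = not (k >= 2 and c[0] == 0 and c[-1] == m - 1)
        if linear_ok and wrap_ok:
            count += 1
    return count


def circle_formula(m, k):
    """m / (m - k) * C(m - k, k), which is always an integer for k < m."""
    return m * comb(m - k, k) // (m - k)


print(nonconsecutive_on_circle(8, 3), circle_formula(8, 3))  # 16 16
```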

By the above lemma, we have that <math>r_k=\frac{2n}{2n-k}{2n-k\choose k}</math>. Then apply the theorem of counting permutations with restricted positions,
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k\frac{2n}{2n-k}{2n-k\choose k}(n-k)!.
</math>

This gives the number of ways of seating the <math>n</math> husbands ''after the ladies are seated''. Recall that there are <math>2(n!)</math> ways of seating the <math>n</math> ladies. Thus, the total number of ways of seating <math>n</math> couples as required by problème des ménages is
:<math>
2(n!)\sum_{k=0}^n(-1)^k\frac{2n}{2n-k}{2n-k\choose k}(n-k)!.
</math>
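
For small <math>n</math> the count of husband seatings can be compared with a direct enumeration of the permutations satisfying <math>\pi(i)\not\equiv i, i+1\pmod n</math>. This is a sketch with illustrative function names; it assumes <math>n\ge 3</math>, where the <math>2n</math> cells of <math>B</math> are distinct and form a proper circle.

```python
from itertools import permutations
from math import comb, factorial


def husbands_formula(n):
    """Sum over k of (-1)^k * r_k * (n-k)!, with r_k = 2n/(2n-k) * C(2n-k, k)."""
    total = 0
    for k in range(n + 1):
        r_k = 2 * n * comb(2 * n - k, k) // (2 * n - k)
        total += (-1) ** k * r_k * factorial(n - k)
    return total


def husbands_brute(n):
    """Count permutations with pi(i) != i and pi(i) != i + 1 (mod n) for all i."""
    return sum(1 for pi in permutations(range(n))
               if all(pi[i] != i and pi[i] != (i + 1) % n for i in range(n)))


def menages_total(n):
    """Total seatings: 2 * n! ways for the ladies times the husband count."""
    return 2 * factorial(n) * husbands_formula(n)


for n in range(3, 8):
    print(n, husbands_formula(n), husbands_brute(n), menages_total(n))
```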

=== The Euler totient function ===
Two integers <math>m, n</math> are said to be '''relatively prime''' if their greatest common divisor <math>\mathrm{gcd}(m,n)=1</math>. For a positive integer <math>n</math>, let <math>\phi(n)</math> be the number of positive integers from <math>\{1,2,\ldots,n\}</math> that are relatively prime to <math>n</math>. This function, called the Euler <math>\phi</math> function or '''the Euler totient function''', is fundamental in number theory.

We now derive a formula for this function by using the principle of inclusion-exclusion.
{{Theorem|Theorem (The Euler totient function)|
Suppose <math>n</math> is divisible by precisely <math>r</math> different primes, denoted <math>p_1,\ldots,p_r</math>. Then
:<math>\phi(n)=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right)</math>.
}}
{{Proof|
Let <math>U=\{1,2,\ldots,n\}</math> be the universe. For any distinct <math>p_{i_1},p_{i_2},\ldots,p_{i_s}\in\{p_1,\ldots,p_r\}</math>, the number of integers from <math>U</math> which are divisible by all of <math>p_{i_1},p_{i_2},\ldots,p_{i_s}</math> is <math>\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_s}}</math>.

<math>\phi(n)</math> is the number of integers from <math>U</math> which are not divisible by any of <math>p_1,\ldots,p_r</math>.
By principle of inclusion-exclusion,
:<math>
\begin{align}
\phi(n)
&=n+\sum_{k=1}^r(-1)^k\sum_{1\le i_1<i_2<\cdots <i_k\le r}\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_k}}\\
&=n-\sum_{1\le i\le r}\frac{n}{p_i}+\sum_{1\le i<j\le r}\frac{n}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{n}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{n}{p_{1}p_{2}\cdots p_{r}}\\
&=n\left(1-\sum_{1\le i\le r}\frac{1}{p_i}+\sum_{1\le i<j\le r}\frac{1}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{1}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{1}{p_{1}p_{2}\cdots p_{r}}\right)\\
&=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right).
\end{align}
</math>
}}
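
The product formula can be compared with the definition of <math>\phi(n)</math> for small <math>n</math>. The sketch below finds the distinct prime divisors by trial division (a choice made here for simplicity, not taken from the lecture) and evaluates the product in integer arithmetic.

```python
from math import gcd


def prime_divisors(n):
    """Distinct prime divisors of n, found by trial division."""
    primes = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_formula(n):
    """n * prod(1 - 1/p) over the distinct primes p dividing n."""
    result = n
    for p in prime_divisors(n):
        result = result // p * (p - 1)  # exact, since p divides result here
    return result


def phi_by_definition(n):
    """Count the integers in {1, ..., n} that are relatively prime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


print(phi_formula(30), phi_by_definition(30))  # 8 8
```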

== Reference ==
* ''Stanley,'' Enumerative Combinatorics, Volume 1, Chapter 2.
* ''van Lint and Wilson'', A course in combinatorics, Chapter 10, 15.

Combinatorics (Fall 2010)/Partitions, sieve methods

2010-09-15T00:52:23Z

210.28.131.82: /* Principle of Inclusion-Exclusion */

== Partitions ==
We count the ways of partitioning <math>n</math> ''identical'' objects into <math>k</math> ''unordered'' groups. This is equivalent to counting the ways of partitioning a number <math>n</math> into <math>k</math> unordered parts.

A '''<math>k</math>-partition''' of a number <math>n</math> is a multiset <math>\{x_1,x_2,\ldots,x_k\}</math> with <math>x_i\ge 1</math> for every element <math>x_i</math> and <math>x_1+x_2+\cdots+x_k=n</math>.

We define <math>p_k(n)</math> as the number of <math>k</math>-partitions of <math>n</math>.

For example, number 7 has the following partitions:
<div class="center"><math>
\begin{align}
&\{7\}
& p_1(7)=1\\
&\{1,6\},\{2,5\},\{3,4\}
& p_2(7)=3\\
&\{1,1,5\}, \{1,2,4\}, \{1,3,3\}, \{2,2,3\}
& p_3(7)=4\\
&\{1,1,1,4\},\{1,1,2,3\}, \{1,2,2,2\}
& p_4(7)=3\\
&\{1,1,1,1,3\},\{1,1,1,2,2\}
& p_5(7)=2\\
&\{1,1,1,1,1,2\}
& p_6(7)=1\\
&\{1,1,1,1,1,1,1\}
& p_7(7)=1
\end{align}
</math></div>

Equivalently, a <math>k</math>-partition of a number <math>n</math> is a <math>k</math>-tuple <math>(x_1,x_2,\ldots,x_k)</math> with:
* <math>x_1\ge x_2\ge\cdots\ge x_k\ge 1</math>;
* <math>x_1+x_2+\cdots+x_k=n</math>.

Then <math>p_k(n)</math> is the number of integral solutions to the above system.

Let <math>p(n)=\sum_{k=1}^n p_k(n)</math> be the total number of partitions of <math>n</math>. The function <math>p(n)</math> is called the '''partition number'''.

=== Counting <math>p_k(n)</math>===
We now try to determine <math>p_k(n)</math>. Unlike most of the counting problems in the last lecture, <math>p_k(n)</math> does not have a nice closed-form formula. Instead, we give a recurrence for <math>p_k(n)</math>.

{{Theorem|Proposition|
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Note that it must hold that
:<math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.
There are two cases: <math>x_k=1</math> or <math>x_k>1</math>.
;Case 1.
:If <math>x_k=1</math>, then <math>(x_1,\cdots,x_{k-1})</math> is a distinct <math>(k-1)</math>-partition of <math>n-1</math>. And every <math>(k-1)</math>-partition of <math>n-1</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k-1}(n-1)</math>.
;Case 2.
:If <math>x_k>1</math>, then <math>(x_1-1,\cdots,x_{k}-1)</math> is a distinct <math>k</math>-partition of <math>n-k</math>. And every <math>k</math>-partition of <math>n-k</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k}(n-k)</math>.
In conclusion, the number of <math>k</math>-partitions of <math>n</math> is <math>p_{k-1}(n-1)+p_k(n-k)</math>, i.e.
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}

Using the above recurrence, we can compute <math>p_k(n)</math> for reasonably large <math>n</math> and <math>k</math> by computer.

If we do not insist on the exact value of <math>p_k(n)</math>, the next theorem gives an asymptotic estimate of <math>p_k(n)</math>. Note that it only holds for '''constant''' <math>k</math>, i.e. <math>k</math> does not depend on <math>n</math>.

{{Theorem|Theorem|
For any fixed <math>k</math>,
:<math>p_k(n)\sim\frac{n^{k-1}}{k!(k-1)!}</math>,
as <math>n\rightarrow \infty</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Then <math>x_1+x_2+\cdots+x_k=n</math> and <math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.

The <math>k!</math> permutations of <math>(x_1,\ldots,x_k)</math> yield at most <math>k!</math> many <math>k</math>-compositions (the ''ordered'' sum of <math>k</math> positive integers). There are <math>{n-1\choose k-1}</math> many <math>k</math>-compositions of <math>n</math>, every one of which can be yielded in this way by permuting a partition. Thus,
:<math>k!p_k(n)\ge{n-1\choose k-1}</math>.

Let <math>y_i=x_i+k-i</math>. That is, <math>y_k=x_k, y_{k-1}=x_{k-1}+1, y_{k-2}=x_{k-2}+2,\ldots, y_{1}=x_1+k-1</math>. Then, it holds that
* <math>y_1>y_2>\cdots>y_k\ge 1</math>; and
* <math>y_1+y_2+\cdots+y_k=n+\frac{k(k-1)}{2}</math>.
Each permutation of <math>(y_1,y_2,\ldots,y_k)</math> yields a '''distinct''' <math>k</math>-composition of <math>n+\frac{k(k-1)}{2}</math>, because all <math>y_i</math> are distinct.
Thus,
:<math>k!p_k(n)\le {n+\frac{k(k-1)}{2}-1\choose k-1}</math>.

Combining the two inequalities, we have
:<math>\frac{{n-1\choose k-1}}{k!}\le p_k(n)\le \frac{{n+\frac{k(k-1)}{2}-1\choose k-1}}{k!}</math>.
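For fixed <math>k</math>, both <math>{n-1\choose k-1}</math> and <math>{n+\frac{k(k-1)}{2}-1\choose k-1}</math> are <math>(1+o(1))\frac{n^{k-1}}{(k-1)!}</math> as <math>n\rightarrow\infty</math>, so the theorem follows.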
}}

=== Ferrers diagram ===
A partition of a number <math>n</math> can be represented as a diagram of dots (or squares), called a '''Ferrers diagram''' (the square version of the Ferrers diagram is also called a '''Young diagram''', named after a structure called the Young tableau).

Let <math>(x_1,x_2,\ldots,x_k)</math> with <math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math> be a partition of <math>n</math>. Its Ferrers diagram consists of <math>k</math> rows, where the <math>i</math>-th row contains <math>x_i</math> dots (or squares).

<div class="center">
{|border="0"
|
{|border="0"
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|Ferrers diagram (''dot version'') of (5,4,2,1)||
|align=center|Ferrers diagram (''square version'') of (5,4,2,1)
|}
</div>

;Conjugate partition
The partition we get by reading the Ferrers diagram by columns instead of by rows is called the '''conjugate''' of the original partition.
<div class="center">
{|border="0"
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|<math>(6,4,4,2,1)</math>||
|align=center|conjugate: <math>(5,4,3,3,1,1)</math>
|}
</div>

Clearly,
* different partitions cannot have the same conjugate, and
* every partition of <math>n</math> is the conjugate of some partition of <math>n</math>,
so the conjugation mapping is a permutation on the set of partitions of <math>n</math>. This fact is very useful in proving theorems about partition numbers.
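
Conjugation is easy to compute directly from the definition. The following Python sketch is an illustration under the assumption that a partition is given as a non-increasing list of positive integers; the function name <code>conjugate</code> is hypothetical.

```python
def conjugate(partition):
    """Conjugate of a partition given as a non-increasing list of positive ints."""
    if not partition:
        return []
    # Column j (1-indexed) of the Ferrers diagram has one cell
    # for every row of length at least j.
    return [sum(1 for part in partition if part >= j)
            for j in range(1, partition[0] + 1)]
```

Applied to the example above, <code>conjugate([6, 4, 4, 2, 1])</code> reads off the column lengths of the diagram, and applying <code>conjugate</code> twice returns the original partition.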

Some theorems of partitions can be easily proved by representing partitions in Ferrers diagrams.

{{Theorem|Proposition|
# The number of partitions of <math>n</math> which have largest summand <math>k</math>, is <math>p_k(n)</math>.
# The number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts. Formally, for <math>n>k</math>,
::<math>p_k(n)=\sum_{j=1}^k p_j(n-k)</math>.
}}
{{Proof|
# The conjugate of every <math>k</math>-partition has largest part <math>k</math>, and vice versa.
# For a <math>k</math>-partition of <math>n</math>, remove the leftmost cell of every row of the Ferrers diagram. In total <math>k</math> cells are removed, and the remaining diagram is a partition of <math>n-k</math> into at most <math>k</math> parts. Conversely, for a partition of <math>n-k</math> into at most <math>k</math> parts, add a cell to each of the <math>k</math> rows (including the empty ones). This gives a <math>k</math>-partition of <math>n</math>. The two mappings are inverse to each other, so they are 1-1 correspondences. Thus, the number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts.
}}
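
Both statements of the proposition can be verified by brute force for small <math>n</math>. The Python sketch below is not from the original notes; all function names are hypothetical, and the empty partition of <math>0</math> is counted as a partition with zero parts.

```python
def partitions(n, max_part=None):
    """Yield every partition of n as a non-increasing tuple of positive parts."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def count_with_k_parts(n, k):
    return sum(1 for lam in partitions(n) if len(lam) == k)


def count_with_largest_part(n, k):
    return sum(1 for lam in partitions(n) if lam and lam[0] == k)


def count_with_at_most_k_parts(n, k):
    return sum(1 for lam in partitions(n) if len(lam) <= k)
```

For every small pair <math>(n,k)</math> one can compare <code>count_with_k_parts(n, k)</code> with <code>count_with_largest_part(n, k)</code> and with <code>count_with_at_most_k_parts(n - k, k)</code>.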

== Principle of Inclusion-Exclusion ==
Let <math>A</math> and <math>B</math> be two finite sets. The cardinality of their union is
:<math>|A\cup B|=|A|+|B|-{\color{Blue}|A\cap B|}</math>.
For three sets <math>A</math>, <math>B</math>, and <math>C</math>, the cardinality of the union of these three sets is computed as
:<math>|A\cup B\cup C|=|A|+|B|+|C|-{\color{Blue}|A\cap B|}-{\color{Blue}|A\cap C|}-{\color{Blue}|B\cap C|}+{\color{Red}|A\cap B\cap C|}</math>.
This is illustrated by the following figure.
::[[Image:Inclusion-exclusion.png|200px|border|center]]

Generally, the '''Principle of Inclusion-Exclusion''' gives the rule for computing the cardinality of the union of <math>n</math> finite sets <math>A_1,A_2,\ldots,A_n</math>:
{{Equation|
<math>
\begin{align}
\left|\bigcup_{i=1}^nA_i\right|
&=
\sum_{I\subseteq\{1,\ldots,n\},\,I\neq\emptyset}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
-----

In combinatorial enumeration, the Principle of Inclusion-Exclusion is usually applied in its complement form.

Let <math>A_1,A_2,\ldots,A_n\subseteq U</math> be subsets of some finite set <math>U</math>. Here <math>U</math> is some universe of combinatorial objects, whose cardinality is easy to calculate (e.g. all strings, tuples, permutations), and each <math>A_i</math> contains the objects with some specific property (e.g. a "pattern") which we want to avoid. The problem is to count the number of objects without any of the <math>n</math> properties. We write <math>\bar{A_i}=U-A_i</math>. The number of objects without any of the properties <math>A_1,A_2,\ldots,A_n</math> is
{{Equation|
<math>
\begin{align}
\left|\bar{A_1}\cap\bar{A_2}\cap\cdots\cap\bar{A_n}\right|=\left|U-\bigcup_{i=1}^nA_i\right|
&=
|U|-\sum_{I\subseteq\{1,\ldots,n\},\,I\neq\emptyset}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
For an <math>I\subseteq\{1,2,\ldots,n\}</math>, we denote
:<math>A_I=\bigcap_{i\in I}A_i</math>
with the convention that <math>A_\emptyset=U</math>. The above equation is stated as:
{{Theorem|Principle of Inclusion-Exclusion|
:Let <math>A_1,A_2,\ldots,A_n</math> be a family of subsets of <math>U</math>. Then the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> is
::<math>\sum_{I\subseteq\{1,\ldots, n\}}(-1)^{|I|}|A_I|</math>.
}}

Let <math>S_k=\sum_{|I|=k}|A_I|\,</math>. Conventionally, <math>S_0=|A_\emptyset|=|U|</math>. The principle of inclusion-exclusion can then be expressed as
{{Equation|<math>
\left|\bar{A_1}\cap\bar{A_2}\cap\cdots\cap\bar{A_n}\right|=S_0-S_1+S_2-\cdots+(-1)^nS_n.
</math>
}}
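
The alternating sum <math>S_0-S_1+S_2-\cdots</math> can be evaluated mechanically for explicitly given sets. The following Python sketch is only an illustration; the function name <code>count_avoiding_all</code> and the choice of representing each <math>A_i</math> as a Python set are assumptions of this sketch.

```python
from itertools import combinations


def count_avoiding_all(universe, subsets):
    """Count elements of `universe` lying in none of `subsets` by inclusion-exclusion."""
    n = len(subsets)
    total = 0
    for k in range(n + 1):
        # S_k: sum of |A_I| over all index sets I of size k (A_emptyset = universe).
        s_k = 0
        for index_set in combinations(range(n), k):
            common = set(universe)
            for i in index_set:
                common &= subsets[i]
            s_k += len(common)
        total += (-1) ** k * s_k
    return total
```

For instance, taking the universe <math>\{1,\ldots,30\}</math> and the sets of multiples of 2, 3 and 5 counts the integers in that range divisible by none of these primes.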

=== Surjections ===
In the twelvefold way, we discuss the counting problems incurred by the mappings <math>f:N\rightarrow M</math>. The basic case is that elements from both <math>N</math> and <math>M</math> are distinguishable. In this case, it is easy to count the number of arbitrary mappings (which is <math>m^n</math>) and the number of injective (one-to-one) mappings (which is <math>(m)_n</math>), but counting the surjective mappings is harder. Here we apply the principle of inclusion-exclusion to count the number of surjective (onto) mappings.
{{Theorem|Theorem|
:The number of surjective mappings from an <math>n</math>-set to an <math>m</math>-set is given by
::<math>\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
{{Proof|
Let <math>U=\{f:[n]\rightarrow[m]\}</math> be the set of mappings from <math>[n]</math> to <math>[m]</math>. Then <math>|U|=m^n</math>.

For <math>i\in[m]</math>, let <math>A_i</math> be the set of mappings <math>f:[n]\rightarrow[m]</math> under which no <math>j\in[n]</math> is mapped to <math>i</math>, i.e. <math>A_i=\{f:[n]\rightarrow[m]\setminus\{i\}\}</math>; thus <math>|A_i|=(m-1)^n</math>.

More generally, for <math>I\subseteq [m]</math>, <math>A_I=\bigcap_{i\in I}A_i</math> contains the mappings <math>f:[n]\rightarrow[m]\setminus I</math>. And <math>|A_I|=(m-|I|)^n\,</math>.

A mapping <math>f:[n]\rightarrow[m]</math> is surjective if and only if <math>f</math> lies in none of the <math>A_i</math>. By the principle of inclusion-exclusion, the number of surjective <math>f:[n]\rightarrow[m]</math> is
:<math>\sum_{I\subseteq[m]}(-1)^{|I|}\left|A_I\right|=\sum_{I\subseteq[m]}(-1)^{|I|}(m-|I|)^n=\sum_{j=0}^m(-1)^j{m\choose j}(m-j)^n</math>.
Let <math>k=m-j</math>. The theorem is proved.
}}
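
As a sanity check, the formula can be compared with a direct enumeration of all <math>m^n</math> mappings for small <math>n</math> and <math>m</math>. The Python sketch below is not part of the original notes; both function names are hypothetical, and it assumes <math>n,m\ge 1</math>.

```python
from itertools import product
from math import comb


def surjections_formula(n, m):
    """Inclusion-exclusion count of surjections from an n-set onto an m-set."""
    return sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))


def surjections_brute_force(n, m):
    """Enumerate all mappings [n] -> [m] as tuples and keep the onto ones."""
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == m)
```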

Recall that, in the twelvefold way, we establish a relation between surjections and partitions.

* Surjection to ordered partition:
:For a surjective <math>f:[n]\rightarrow[m]</math>, <math>(f^{-1}(0),f^{-1}(1),\ldots,f^{-1}(m-1))</math> is an '''ordered partition''' of <math>[n]</math>.
* Ordered partition to surjection:
:For an ordered <math>m</math>-partition <math>(B_0,B_1,\ldots, B_{m-1})</math> of <math>[n]</math>, we can define a function <math>f:[n]\rightarrow[m]</math> by letting <math>f(i)=j</math> if and only if <math>i\in B_j</math>. <math>f</math> is surjective since as a partition, none of <math>B_i</math> is empty.

Therefore, we have a one-to-one correspondence between surjective mappings from an <math>n</math>-set to an <math>m</math>-set and the ordered <math>m</math>-partitions of an <math>n</math>-set.

The Stirling number of the second kind <math>S(n,m)</math> is the number of <math>m</math>-partitions of an <math>n</math>-set. There are <math>m!</math> ways to order an <math>m</math>-partition, thus the number of surjective mappings <math>f:[n]\rightarrow[m]</math> is <math>m! S(n,m)</math>. Combining with what we have proved for surjections, we give the following result for the Stirling number of the second kind.

{{Theorem|Proposition|
:<math>S(n,m)=\frac{1}{m!}\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
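
The proposition translates directly into a short computation. In the following Python sketch (the name <code>stirling2</code> is hypothetical, and <math>n,m\ge 1</math> is assumed), the integer division is exact because the alternating sum equals <math>m!\,S(n,m)</math>.

```python
from math import comb, factorial


def stirling2(n, m):
    """Stirling number of the second kind S(n, m) via inclusion-exclusion."""
    surjections = sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))
    # surjections = m! * S(n, m), so the division leaves no remainder.
    return surjections // factorial(m)
```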

=== Derangements ===
We now count the number of bijections from a set to itself with no fixed points. This is the '''derangement problem'''.

For a permutation <math>\pi</math> of <math>\{1,2,\ldots,n\}</math>, a '''fixed point''' is an <math>i\in\{1,2,\ldots,n\}</math> such that <math>\pi(i)=i</math>.
A [http://en.wikipedia.org/wiki/Derangement '''derangement'''] of <math>\{1,2,\ldots,n\}</math> is a permutation of <math>\{1,2,\ldots,n\}</math> that has no fixed points.

{{Theorem|Theorem|
:The number of derangements of <math>\{1,2,\ldots,n\}</math> is given by
::<math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}\approx \frac{n!}{\mathrm{e}}</math>.
}}
{{Proof|
Let <math>U</math> be the set of all permutations of <math>\{1,2,\ldots,n\}</math>. So <math>|U|=n!</math>.

Let <math>A_i</math> be the set of permutations with fixed point <math>i</math>; so <math>|A_i|=(n-1)!</math>. More generally, for any <math>I\subseteq \{1,2,\ldots,n\}</math>, <math>A_I=\bigcap_{i\in I}A_i</math>, and <math>|A_I|=(n-|I|)!</math>, since permutations in <math>A_I</math> fix every point in <math>I</math> and permute the remaining points arbitrarily. A permutation is a derangement if and only if it lies in none of the sets <math>A_i</math>. So the number of derangements is
:<math>\sum_{I\subseteq\{1,2,\ldots,n\}}(-1)^{|I|}(n-|I|)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=n!\sum_{k=0}^n\frac{(-1)^k}{k!}.</math>
By Taylor's series,
:<math>\frac{1}{\mathrm{e}}=\sum_{k=0}^\infty\frac{(-1)^k}{k!}=\sum_{k=0}^n\frac{(-1)^k}{k!}\pm o\left(\frac{1}{n!}\right)</math>.
It is not hard to see that, for <math>n\ge 1</math>, <math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}</math> is the closest integer to <math>\frac{n!}{\mathrm{e}}</math>.
}}

Therefore, about a <math>\frac{1}{\mathrm{e}}</math> fraction of all permutations have no fixed points.
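
The formula and the approximation can both be checked for small <math>n</math>. The following Python sketch is only an illustration with hypothetical function names; the comparison with <math>n!/\mathrm{e}</math> uses floating-point arithmetic and is therefore reliable only for small <math>n</math>.

```python
from itertools import permutations
from math import comb, e, factorial


def derangements_formula(n):
    """Inclusion-exclusion count of permutations of n points with no fixed point."""
    return sum((-1) ** k * comb(n, k) * factorial(n - k) for k in range(n + 1))


def derangements_brute_force(n):
    return sum(1 for pi in permutations(range(n))
               if all(pi[i] != i for i in range(n)))


def nearest_integer_to_n_factorial_over_e(n):
    return round(factorial(n) / e)
```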

=== Permutations with restricted positions ===
We introduce a general theory of counting permutations with restricted positions. In the derangement problem, we count the number of permutations <math>\pi</math> such that <math>\pi(i)\neq i</math> for all <math>i</math>. We now generalize to the problem of counting permutations which avoid a set of arbitrarily specified positions.

The problem is traditionally described using terminology from the game of chess. Let <math>B\subseteq \{1,\ldots,n\}\times \{1,\ldots,n\}</math>, called a '''board'''. As illustrated below, we can think of <math>B</math> as a chess board, with the positions in <math>B</math> marked by "<math>\times</math>".
{{Chess diagram small
|
|
|=
8 |__|xx|xx|__|xx|__|__|xx|=
7 |xx|__|__|xx|__|__|xx|__|=
6 |xx|__|xx|xx|__|xx|xx|__|=
5 |__|xx|__|__|xx|__|xx|__|=
4 |xx|__|__|__|xx|xx|xx|__|=
3 |__|xx|__|xx|__|__|__|xx|=
2 |__|__|xx|__|xx|__|__|xx|=
1 |xx|__|__|xx|__|xx|__|__|=
a b c d e f g h
|
}}
For a permutation <math>\pi</math> of <math>\{1,\ldots,n\}</math>, define the '''graph''' <math>G_\pi</math> as
:<math>
\begin{align}
G_\pi &= \{(i,\pi(i))\mid i\in \{1,2,\ldots,n\}\}.
\end{align}
</math>
This can also be viewed as a set of marked positions on a chess board. Each row and each column has exactly one marked position, because <math>\pi</math> is a permutation. Thus, we can identify each <math>G_\pi</math> with a placement of <math>n</math> non-attacking rooks (also called "castles"; they move by the same rules as the chariot in Chinese chess).

For example, the following is the <math>G_\pi</math> of the identity permutation <math>\pi(i)=i</math>.
{{Chess diagram small
|
|
|=
8 |rl|__|__|__|__|__|__|__|=
7 |__|rl|__|__|__|__|__|__|=
6 |__|__|rl|__|__|__|__|__|=
5 |__|__|__|rl|__|__|__|__|=
4 |__|__|__|__|rl|__|__|__|=
3 |__|__|__|__|__|rl|__|__|=
2 |__|__|__|__|__|__|rl|__|=
1 |__|__|__|__|__|__|__|rl|=
a b c d e f g h
|
}}
Now define
:<math>\begin{align}
N_0 &= \left|\left\{\pi\mid B\cap G_\pi=\emptyset\right\}\right|\\
r_k &= \mbox{number of }k\mbox{-subsets of }B\mbox{ such that no two elements have a common coordinate}\\
&=\left|\left\{S\in{B\choose k} \,\bigg|\, \forall (i_1,j_1)\neq(i_2,j_2)\in S,\ i_1\neq i_2 \mbox{ and } j_1\neq j_2 \right\}\right|
\end{align}
</math>
Interpreted in terms of chess,
* <math>B</math>: a set of marked positions in an <math>[n]\times [n]</math> chess board.
* <math>N_0</math>: the number of ways of placing <math>n</math> non-attacking rooks on the chess board such that none of these rooks lie in <math>B</math>.
* <math>r_k</math>: number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>.

Our goal is to count <math>N_0</math> in terms of <math>r_k</math>. This gives the number of permutations that avoid all positions in <math>B</math>.

{{Theorem|Theorem|
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
{{Proof|
For each <math>i\in[n]</math>, let <math>A_i=\{\pi\mid (i,\pi(i))\in B\}</math> be the set of permutations <math>\pi</math> whose <math>i</math>-th position is in <math>B</math>.

<math>N_0</math> is the number of permutations that avoid all positions in <math>B</math>. Thus, our goal is to count the number of permutations <math>\pi</math> that lie in none of the <math>A_i</math> for <math>i\in [n]</math>.

For each <math>I\subseteq [n]</math>, let <math>A_I=\bigcap_{i\in I}A_i</math>, which is the set of permutations <math>\pi</math> such that <math>(i,\pi(i))\in B</math> for all <math>i\in I</math>. Due to the principle of inclusion-exclusion,
:<math>N_0=\sum_{I\subseteq [n]} (-1)^{|I|}|A_I|=\sum_{k=0}^n(-1)^k\sum_{I\in{[n]\choose k}}|A_I|</math>.

The next observation is that
:<math>\sum_{I\in{[n]\choose k}}|A_I|=r_k(n-k)!</math>,
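because both sides count the pairs <math>(S,\pi)</math> where <math>S</math> is a placement of <math>k</math> non-attacking rooks on <math>B</math> and <math>\pi</math> is a permutation with <math>S\subseteq G_\pi</math>. On the left, we first choose the set <math>I</math> of rows and then <math>\pi\in A_I</math>, which determines <math>S=\{(i,\pi(i))\mid i\in I\}</math>. On the right, we first place the <math>k</math> non-attacking rooks on <math>B</math> in <math>r_k</math> ways, and then place <math>n-k</math> additional non-attacking rooks in the remaining rows and columns in <math>(n-k)!</math> ways.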

Therefore,
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
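
The theorem can be tested on small boards by computing the rook numbers <math>r_k</math> by exhaustive search. The Python sketch below is not from the original notes; it assumes that a board is given as a set of 0-indexed pairs <code>(i, j)</code>, and all function names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial


def rook_numbers(board, n):
    """Return [r_0, ..., r_n]: r_k counts k-subsets of board with distinct rows and columns."""
    r = []
    for k in range(n + 1):
        count = 0
        for cells in combinations(sorted(board), k):
            rows = {i for i, _ in cells}
            cols = {j for _, j in cells}
            if len(rows) == k and len(cols) == k:
                count += 1
        r.append(count)
    return r


def avoiding_formula(board, n):
    """N_0 computed as the alternating sum of r_k * (n-k)!."""
    r = rook_numbers(board, n)
    return sum((-1) ** k * r[k] * factorial(n - k) for k in range(n + 1))


def avoiding_brute_force(board, n):
    """N_0 computed by checking every permutation of {0, ..., n-1}."""
    return sum(1 for pi in permutations(range(n))
               if all((i, pi[i]) not in board for i in range(n)))
```

The exhaustive search over subsets of <math>B</math> is exponential, so this is meant only for checking small cases.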

====Derangement problem====
We use the above general method to solve the derangement problem again.

Take <math>B=\{(1,1),(2,2),\ldots,(n,n)\}</math> as the chess board. A derangement <math>\pi</math> is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|__|__|__|__|__|__|__|=
7 |__|xx|__|__|__|__|__|__|=
6 |__|__|xx|__|__|__|__|__|=
5 |__|__|__|xx|__|__|__|__|=
4 |__|__|__|__|xx|__|__|__|=
3 |__|__|__|__|__|xx|__|__|=
2 |__|__|__|__|__|__|xx|__|=
1 |__|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
Clearly, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math> is <math>r_k={n\choose k}</math>. We want to count <math>N_0</math>, which gives the number of ways of placing <math>n</math> non-attacking rooks such that none of these rooks lie in <math>B</math>.

By the above theorem
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=\sum_{k=0}^n(-1)^k\frac{n!}{k!}=n!\sum_{k=0}^n(-1)^k\frac{1}{k!}\approx\frac{n!}{e}.
</math>

====Problème des ménages====
Suppose that in a banquet, we want to seat <math>n</math> couples at a circular table, satisfying the following constraints:
* Men and women are in alternate places.
* No one sits next to his/her spouse.

In how many ways can this be done?

(For convenience, we assume that every seat at the table is marked differently, so that rotating the seating clockwise or anti-clockwise gives a '''different''' solution.)

First, let the <math>n</math> ladies find their seats. They may sit either at the odd-numbered seats or at the even-numbered seats; in either case, there are <math>n!</math> different orders. Thus, there are <math>2(n!)</math> ways to seat the <math>n</math> ladies.

After seating the ladies, we label the remaining <math>n</math> places clockwise as <math>0,1,\ldots, n-1</math>, and label the ladies so that the <math>i</math>-th lady sits between places <math>i</math> and <math>i+1 \pmod n</math>. A seating of the <math>n</math> husbands is then given by a permutation <math>\pi</math> of <math>\{0,1,\ldots,n-1\}</math> defined as follows. Let <math>\pi(i)</math> be the place of the husband of the <math>i</math>-th lady.

It is easy to see that <math>\pi</math> satisfies that <math>\pi(i)\neq i</math> and <math>\pi(i)\not\equiv i+1\pmod n</math>, and every permutation <math>\pi</math> with these properties gives a feasible seating of the <math>n</math> husbands. Thus, we only need to count the number of permutations <math>\pi</math> such that <math>\pi(i)\not\equiv i, i+1\pmod n</math>.

Take <math>B=\{(0,0),(1,1),\ldots,(n-1,n-1), (0,1),(1,2),\ldots,(n-2,n-1),(n-1,0)\}</math> as the chess board. A permutation <math>\pi</math> which defines a way of seating the husbands, is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|xx|__|__|__|__|__|__|=
7 |__|xx|xx|__|__|__|__|__|=
6 |__|__|xx|xx|__|__|__|__|=
5 |__|__|__|xx|xx|__|__|__|=
4 |__|__|__|__|xx|xx|__|__|=
3 |__|__|__|__|__|xx|xx|__|=
2 |__|__|__|__|__|__|xx|xx|=
1 |xx|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
We need to compute <math>r_k</math>, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>. For our choice of <math>B</math>, <math>r_k</math> is the number of ways of choosing <math>k</math> points, no two consecutive, from a collection of <math>2n</math> points arranged in a circle.

We first see how to do this in a ''line''.
{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''line'', is <math>{m-k+1\choose k}</math>.
}}
{{Proof|
We draw a line of <math>m-k</math> black points, and then insert <math>k</math> red points into the <math>m-k+1</math> spaces between the black points (including the beginning and end), with at most one red point per space.
::<math>
\begin{align}
&\sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \\
&\qquad\qquad\qquad\quad\Downarrow\\
&\sqcup \, \bullet \,\, {\color{Red}\bullet} \, \bullet \,\, {\color{Red}\bullet} \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}\, \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}
\end{align}
</math>
This gives us a line of <math>m</math> points, and the red points specify the chosen objects, which are non-consecutive. The mapping is a 1-1 correspondence.
There are <math>{m-k+1\choose k}</math> ways of placing <math>k</math> red points into <math>m-k+1</math> spaces.
}}

The problem of choosing non-consecutive objects in a circle can be reduced to the case that the objects are in a line.

{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''circle'', is <math>\frac{m}{m-k}{m-k\choose k}</math>.
}}
{{Proof|
Let <math>f(m,k)</math> be the desired number; and let <math>g(m,k)</math> be the number of ways of choosing <math>k</math> non-consecutive points from <math>m</math> points arranged in a circle, next coloring the <math>k</math> points red, and then coloring one of the uncolored points blue.

Clearly, <math>g(m,k)=(m-k)f(m,k)</math>.

But we can also compute <math>g(m,k)</math> as follows:
* Choose one of the <math>m</math> points and color it blue. This gives us <math>m</math> ways.
* Cut the circle to make a line of <math>m-1</math> points by removing the blue point.
* Choose <math>k</math> non-consecutive points from the line of <math>m-1</math> points and color them red. This gives <math>{m-k\choose k}</math> ways due to the previous lemma.

Thus, <math>g(m,k)=m{m-k\choose k}</math>. Therefore we have the desired number <math>f(m,k)=\frac{m}{m-k}{m-k\choose k}</math>.
}}
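
Both lemmas can be confirmed by enumerating all <math>k</math>-subsets for small <math>m</math>. The following Python sketch is only an illustration with hypothetical function names; positions are numbered <math>0,\ldots,m-1</math>, and the circular formula is checked in the cross-multiplied form <math>(m-k)f(m,k)=m{m-k\choose k}</math> to stay within integer arithmetic.

```python
from itertools import combinations
from math import comb


def line_count(m, k):
    """Number of k-subsets of m points in a line with no two adjacent."""
    return sum(1 for c in combinations(range(m), k)
               if all(b - a > 1 for a, b in zip(c, c[1:])))


def circle_count(m, k):
    """Same on a circle, where positions m-1 and 0 are also adjacent."""
    return sum(1 for c in combinations(range(m), k)
               if all(b - a > 1 for a, b in zip(c, c[1:]))
               and not (k >= 2 and c[0] == 0 and c[-1] == m - 1))
```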

By the above lemma, we have that <math>r_k=\frac{2n}{2n-k}{2n-k\choose k}</math>. Applying the theorem for counting permutations with restricted positions, we get
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k\frac{2n}{2n-k}{2n-k\choose k}(n-k)!.
</math>

This gives the number of ways of seating the <math>n</math> husbands ''after the ladies are seated''. Recall that there are <math>2(n!)</math> ways of seating the <math>n</math> ladies. Thus, the total number of ways of seating <math>n</math> couples as required by the problème des ménages is
:<math>
2(n!)\sum_{k=0}^n(-1)^k\frac{2n}{2n-k}{2n-k\choose k}(n-k)!.
</math>
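
The count of husband seatings can be checked against a direct enumeration of permutations with <math>\pi(i)\not\equiv i,i+1\pmod n</math>. The Python sketch below is not part of the original notes; the function names are hypothetical, and the comparison is meant for <math>n\ge 2</math> (for <math>n=1</math> the two forbidden positions coincide, so the board does not have <math>2n</math> distinct cells).

```python
from itertools import permutations
from math import comb, factorial


def husbands_formula(n):
    """Seatings of the husbands once the ladies are seated (inclusion-exclusion)."""
    total = 0
    for k in range(n + 1):
        # r_k = 2n/(2n-k) * C(2n-k, k); the division is exact.
        r_k = 2 * n * comb(2 * n - k, k) // (2 * n - k)
        total += (-1) ** k * r_k * factorial(n - k)
    return total


def husbands_brute_force(n):
    return sum(1 for pi in permutations(range(n))
               if all(pi[i] != i and pi[i] != (i + 1) % n for i in range(n)))


def menage_seatings(n):
    """Total seatings of n couples, with every seat marked differently."""
    return 2 * factorial(n) * husbands_formula(n)
```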

=== The Euler totient function ===
Two integers <math>m, n</math> are said to be '''relatively prime''' if their greatest common divisor <math>\mathrm{gcd}(m,n)=1</math>. For a positive integer <math>n</math>, let <math>\phi(n)</math> be the number of positive integers from <math>\{1,2,\ldots,n\}</math> that are relatively prime to <math>n</math>. This function, called the Euler <math>\phi</math> function or '''the Euler totient function''', is fundamental in number theory.

We now derive a formula for this function by using the principle of inclusion-exclusion.
{{Theorem|Theorem (The Euler totient function)|
Suppose <math>n</math> is divisible by precisely <math>r</math> different primes, denoted <math>p_1,\ldots,p_r</math>. Then
:<math>\phi(n)=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right)</math>.
}}
{{Proof|
Let <math>U=\{1,2,\ldots,n\}</math> be the universe. The number of positive integers from <math>U</math> which are divisible by all of <math>p_{i_1},p_{i_2},\ldots,p_{i_s}\in\{p_1,\ldots,p_r\}</math> is <math>\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_s}}</math>.

<math>\phi(n)</math> is the number of integers from <math>U</math> which are not divisible by any of <math>p_1,\ldots,p_r</math>.
By the principle of inclusion-exclusion,
:<math>
\begin{align}
\phi(n)
&=n+\sum_{k=1}^r(-1)^k\sum_{1\le i_1<i_2<\cdots <i_k\le r}\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_k}}\\
&=n-\sum_{1\le i\le r}\frac{n}{p_i}+\sum_{1\le i<j\le r}\frac{n}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{n}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{n}{p_{1}p_{2}\cdots p_{r}}\\
&=n\left(1-\sum_{1\le i\le r}\frac{1}{p_i}+\sum_{1\le i<j\le r}\frac{1}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{1}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{1}{p_{1}p_{2}\cdots p_{r}}\right)\\
&=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right).
\end{align}
</math>
}}
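
The product formula is easy to compare with the definition of <math>\phi(n)</math>. The following Python sketch is only an illustration; the function names are hypothetical, and the prime factors are found by simple trial division.

```python
from math import gcd


def prime_factors(n):
    """Distinct prime divisors of n, found by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def phi_formula(n):
    result = n
    for p in prime_factors(n):
        # Multiply by (1 - 1/p) exactly: p always divides the running value.
        result = result // p * (p - 1)
    return result


def phi_brute_force(n):
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)
```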

== Reference ==
* ''Stanley,'' Enumerative Combinatorics, Volume 1, Chapter 2.
* ''van Lint and Wilson'', A course in combinatorics, Chapter 10, 15.

Combinatorics (Fall 2010)/Generating functions

2010-09-13T00:21:52Z

210.28.131.82: /* Catalan Number */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Combinations ===
Suppose we wish to enumerate all subsets of an <math>n</math>-set. To construct a subset, we specify for every element of the <math>n</math>-set whether the element is chosen or not. Let us denote the choice to omit an element by <math>x_0</math>, and the choice to include it by <math>x_1</math>. Using "<math>+</math>" to represent "OR", and using the multiplication to denote "AND", the choices of subsets of the <math>n</math>-set are expressed as
:<math>\underbrace{(x_0+x_1)(x_0+x_1)\cdots (x_0+x_1)}_{n\mbox{ elements}}=(x_0+x_1)^n</math>.

For example, when <math>n=3</math>, we have
:<math>\begin{align}
(x_0+x_1)^3
&=x_0x_0x_0+x_0x_0x_1+x_0x_1x_0+x_0x_1x_1\\
&\quad +x_1x_0x_0+x_1x_0x_1+x_1x_1x_0+x_1x_1x_1
\end{align}</math>.

So it "generates" all subsets of the 3-set. Writing <math>1</math> for <math>x_0</math> and <math>x</math> for <math>x_1</math>, we have <math>(1+x)^3=1+3x+3x^2+x^3</math>. The coefficient of <math>x^k</math> is the number of <math>k</math>-subsets of a 3-element set.

In general, the coefficient of <math>x^k</math> in <math>(1+x)^n</math> is the number of <math>k</math>-subsets of an <math>n</math>-element set.

-----

Suppose that we have twelve balls: <font color="red">3 red</font>, <font color="blue">4 blue</font>, and <font color="green">5 green</font>. Balls with the same color are indistinguishable.

We want to determine the number of ways to select <math>k</math> balls from these twelve balls, for some <math>0\le k\le 12</math>.

The generating function of this sequence is
:<math>\begin{align}
&\quad {\color{Red}(1+x+x^2+x^3)}{\color{Blue}(1+x+x^2+x^3+x^4)}{\color{OliveGreen}(1+x+x^2+x^3+x^4+x^5)}\\
&=1+3x+6x^2+10x^3+14x^4+17x^5+18x^6+17x^7+14x^8+10x^9+6x^{10}+3x^{11}+x^{12}.
\end{align}</math>
The coefficient of <math>x^k</math> gives the number of ways to select <math>k</math> balls.
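
The expansion is just a product of polynomials, which can be carried out on coefficient lists. The Python sketch below is not from the original notes; the function name <code>poly_mul</code> and the variable names are hypothetical, and a polynomial is represented by the list of its coefficients in increasing degree.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    result = [0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            result[i + j] += a_i * b_j
    return result


# 1 + x + ... + x^c for c = 3 red, 4 blue and 5 green balls.
red, blue, green = [1] * 4, [1] * 5, [1] * 6
selections = poly_mul(poly_mul(red, blue), green)
```

The entry <code>selections[k]</code> is then the number of ways to select <math>k</math> balls.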

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>x_1+x_2+\cdots+x_k=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so that only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.

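In both problems, the solution satisfies the recursion <math>F_n=F_{n-1}+F_{n-2}</math>. With the standard initial values <math>F_0=0</math> and <math>F_1=1</math> (under which the answer to either problem for parameter <math>n</math> is <math>F_{n+1}</math>, since there is exactly one solution for each of <math>n=0</math> and <math>n=1</math>), the sequence is defined by
:<math>F_n=\begin{cases}
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2,\\
1 & \mbox{if }n=1,\\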
0 & \mbox{if }n=0.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>-th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.
The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=
\sum_{n\ge 0}F_n x^n\\
&=
x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
For generating functions, there are general ways to generate <math>F_{n-1}</math> and <math>F_{n-2}</math>, or the coefficients with any smaller indices.
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series for this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
It is easier to expand the generating function by breaking it into two geometric series.
{{Theorem|Proposition|
:Let <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>. It holds that
::<math>\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>.
}}

It is easy to verify the above equation, but to deduce it, we need some (high school) calculation.

{|border="2" width="100%" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|
:{|
|
<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Denote the reciprocals of these roots by <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math>.

Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
where <math>\alpha</math> and <math>\beta</math> satisfy <math>\alpha(1-\hat{\phi}x)+\beta(1-\phi x)=x</math>, that is,
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>
Solving this we have that <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. Thus,
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>.
|}
:<math>\square</math>
|}

Note that the expression <math>\frac{1}{1-z}</math> has a well known geometric expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.
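
The closed form can be checked numerically against the recurrence. The following is a minimal sketch; the function names are illustrative, and the floating-point evaluation is only reliable for moderate <math>n</math> (roughly <math>n\le 70</math> with double precision).

```python
import math

def fib_recurrence(n):
    """F_n computed directly from F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    """F_n from the closed form (phi^n - phi_hat^n) / sqrt(5), rounded to an integer."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    phi_hat = (1 - sqrt5) / 2
    return round((phi ** n - phi_hat ** n) / sqrt5)

# Compare the two on the first few values.
agree = all(fib_closed_form(n) == fib_recurrence(n) for n in range(40))
print(agree)
```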

== Solving recurrences ==
The following steps describe a general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>. In the case of Fibonacci sequence
::<math>a_n=a_{n-1}+a_{n-2}</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}(a_{n-1}+a_{n-2})x^n</math>.
:: And manipulate the right hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
::<math>G(x)=x+(x+x^2)G(x)\,</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
::<math>G(x)=\frac{x}{1-x-x^2}</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

The first step is usually established by combinatorial observations, or explicitly given by the problem. The third step is trivial.

The second and the fourth steps need some non-trivial analytic techniques.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy; and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.
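
The manipulation rules can be tried out on truncated coefficient lists. The sketch below is illustrative only: the helper names (<code>shift</code>, <code>add</code>, <code>multiply</code>, <code>derivative</code>) and the truncation order <code>TERMS</code> are assumptions, not part of the lecture. It checks the Fibonacci identity <math>G(x)=x+(x+x^2)G(x)</math> and the equivalent statement <math>(1-x-x^2)G(x)=x</math> coefficient by coefficient.

```python
def shift(g, k):
    """Coefficients of x^k G(x), truncated to the same number of terms as g."""
    return ([0] * k + list(g))[:len(g)]

def add(f, g):
    """Coefficients of F(x) + G(x)."""
    return [a + b for a, b in zip(f, g)]

def multiply(f, g):
    """Coefficients of F(x) G(x): the n-th one is sum_{k=0}^{n} f_k g_{n-k}."""
    terms = min(len(f), len(g))
    return [sum(f[k] * g[n - k] for k in range(n + 1)) for n in range(terms)]

def derivative(g):
    """Coefficients of G'(x): the n-th one is (n+1) g_{n+1} (one term is lost)."""
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]

TERMS = 12  # truncation order, chosen arbitrarily for the illustration

# Coefficients F_0, ..., F_{TERMS-1} of the Fibonacci generating function.
fib = [0, 1]
while len(fib) < TERMS:
    fib.append(fib[-1] + fib[-2])

# G(x) = x + x G(x) + x^2 G(x), compared coefficient by coefficient.
x_series = [0, 1] + [0] * (TERMS - 2)
rhs = add(x_series, add(shift(fib, 1), shift(fib, 2)))
identity_holds = (rhs == fib)

# Equivalently, (1 - x - x^2) G(x) = x.
denominator = [1, -1, -1] + [0] * (TERMS - 3)
product_is_x = (multiply(fib, denominator) == x_series)

print(identity_holds, product_is_x)
```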

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is to expand the closed-form generating function <math>G(x)</math> and evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}
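
The generalized binomial coefficient is easy to evaluate exactly with rational arithmetic. The following sketch (function names are illustrative) computes <math>{\alpha\choose n}</math> for rational <math>\alpha</math> and compares a partial sum of Newton's formula with <math>(1+x)^{1/2}</math> at a sample point with <math>|x|<1</math>.

```python
from fractions import Fraction

def gen_binom(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1) / n!."""
    result = Fraction(1)
    for i in range(n):
        result *= Fraction(alpha) - i
        result /= i + 1
    return result

def newton_partial_sum(alpha, x, terms):
    """Sum of the first `terms` terms of sum_n C(alpha, n) x^n, as a float."""
    return sum(float(gen_binom(alpha, n)) * x ** n for n in range(terms))

# For alpha = 1/2 and x = 0.3 the partial sum should be close to sqrt(1.3).
print(newton_partial_sum(Fraction(1, 2), 0.3, 60), 1.3 ** 0.5)
```

For a nonnegative integer <math>\alpha</math> the coefficient vanishes once <math>n>\alpha</math>, so the series reduces to the ordinary binomial theorem.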

=== Example: multisets ===
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient: <math>{-n\choose k}=\frac{(-n)(-n-1)\cdots(-n-k+1)}{k!}=(-1)^k\frac{n(n+1)\cdots(n+k-1)}{k!}</math>. The analytic (generating function) proof thus gives the same value of <math>\left({n\choose k}\right)</math> as the combinatorial proof.
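
As a sanity check, the coefficients of <math>(1+x+x^2+\cdots)^n</math> can be computed directly and compared with <math>{n+k-1\choose k}</math>. The sketch below uses the fact that multiplying a power series by <math>\frac{1}{1-x}</math> replaces its coefficients by their prefix sums; the function name is illustrative.

```python
from math import comb

def multiset_counts(n, terms):
    """Coefficients of (1 + x + x^2 + ...)^n up to x^(terms-1)."""
    coeffs = [1] + [0] * (terms - 1)  # the constant series 1
    for _ in range(n):
        # Multiplying by 1/(1-x) turns the coefficients into prefix sums.
        running = 0
        new_coeffs = []
        for c in coeffs:
            running += c
            new_coeffs.append(running)
        coeffs = new_coeffs
    return coeffs

# Number of k-multisets on a 3-set for k = 0..4, next to the binomial formula.
print(multiset_counts(3, 5))
print([comb(3 + k - 1, k) for k in range(5)])
```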

=== Example: Quicksort ===

== Catalan Number ==
We now introduce a class of counting problems that all have the same solution, given by the [http://en.wikipedia.org/wiki/Catalan_number '''Catalan numbers'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describe 66 different interpretations of the Catalan numbers. We give a few examples, cited from Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=1</math>, and for <math>n\ge1</math>,
::<math>
C_n=
\sum_{k=0}^{n-1}C_kC_{n-1-k}</math>.
}}

Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. Then
:<math>
\begin{align}
G(x)^2
&=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n\\
xG(x)^2
&=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^{n+1}=\sum_{n\ge 1}\sum_{k=0}^{n-1}C_kC_{n-1-k}x^n.
\end{align}
</math>
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=C_0+\sum_{n\ge 1}\sum_{k=0}^{n-1}C_kC_{n-1-k}x^n=1+xG(x)^2</math>.
Solving <math>xG(x)^2-G(x)+1=0</math>, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2x}</math>.
Only one of these functions can be the generating function for <math>C_n</math>, and it must satisfy
:<math>\lim_{x\rightarrow 0}G(x)=C_0=1</math>.
It is easy to check that the correct function is
:<math>G(x)=\frac{1-(1-4x)^{1/2}}{2x}</math>.
Expanding <math>(1-4x)^{1/2}</math> by Newton's formula,
:<math>
\begin{align}
(1-4x)^{1/2}
&=
\sum_{n\ge 0}{1/2\choose n}(-4x)^n\\
&=
1+\sum_{n\ge 1}{1/2\choose n}(-4x)^n\\
&=
1-4x\sum_{n\ge 0}{1/2\choose n+1}(-4x)^n
\end{align}
</math>
Then, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2x}\\
&=
2\sum_{n\ge 0}{1/2\choose n+1}(-4x)^n
\end{align}
</math>
Thus,
:<math>
\begin{align}
C_n
&=2{1/2\choose n+1}(-4)^n\\
&=2\cdot\left(\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-1)}{2}\right)\cdot\frac{1}{(n+1)!}\cdot(-4)^n\\
&=\frac{2^n}{(n+1)!}\prod_{k=1}^n(2k-1)\\
&=\frac{2^n}{(n+1)!}\prod_{k=1}^n\frac{(2k-1)2k}{2k}\\
&=\frac{1}{n!(n+1)!}\prod_{k=1}^n (2k-1)2k\\
&=\frac{(2n)!}{n!(n+1)!}\\
&=\frac{1}{n+1}{2n\choose n}.
\end{align}
</math>
This proves the following closed form for the Catalan numbers.
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n+1}{2n\choose n}</math>.
}}
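
The closed form can be compared with the recurrence directly. A minimal sketch, with illustrative function names:

```python
from math import comb

def catalan_by_recurrence(count):
    """C_0, ..., C_{count-1} from C_0 = 1 and C_n = sum_{k=0}^{n-1} C_k C_{n-1-k}."""
    c = [1]
    for n in range(1, count):
        c.append(sum(c[k] * c[n - 1 - k] for k in range(n)))
    return c

def catalan_closed_form(n):
    """C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

print(catalan_by_recurrence(7))
print([catalan_closed_form(n) for n in range(7)])
```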

== Reference ==
* ''Graham, Knuth, and Patashnik'', Concrete Mathematics: A Foundation for Computer Science, Chapter 7.
* ''Cameron'', Combinatorics: Topics, Techniques, Algorithms, Chapter 4.
* ''van Lint and Wilson'', A course in combinatorics, Chapter 14.

Combinatorics (Fall 2010)/Partitions, sieve methods

2010-09-12T00:32:00Z

210.28.131.82: /* Ferrers diagram */

== Partitions ==
We count the ways of partitioning <math>n</math> ''identical'' objects into <math>k</math> ''unordered'' groups. This is equivalent to counting the ways of partitioning a number <math>n</math> into <math>k</math> unordered parts.

A '''<math>k</math>-partition''' of a number <math>n</math> is a multiset <math>\{x_1,x_2,\ldots,x_k\}</math> with <math>x_i\ge 1</math> for every element <math>x_i</math> and <math>x_1+x_2+\cdots+x_k=n</math>.

We define <math>p_k(n)</math> as the number of <math>k</math>-partitions of <math>n</math>.

For example, number 7 has the following partitions:
<div class="center"><math>
\begin{align}
&\{7\}
& p_1(7)=1\\
&\{1,6\},\{2,5\},\{3,4\}
& p_2(7)=3\\
&\{1,1,5\}, \{1,2,4\}, \{1,3,3\}, \{2,2,3\}
& p_3(7)=4\\
&\{1,1,1,4\},\{1,1,2,3\}, \{1,2,2,2\}
& p_4(7)=3\\
&\{1,1,1,1,3\},\{1,1,1,2,2\}
& p_5(7)=2\\
&\{1,1,1,1,1,2\}
& p_6(7)=1\\
&\{1,1,1,1,1,1,1\}
& p_7(7)=1
\end{align}
</math></div>

Equivalently, we can define a <math>k</math>-partition of a number <math>n</math> as a <math>k</math>-tuple <math>(x_1,x_2,\ldots,x_k)</math> with:
* <math>x_1\ge x_2\ge\cdots\ge x_k\ge 1</math>;
* <math>x_1+x_2+\cdots+x_k=n</math>.

Then <math>p_k(n)</math> is the number of integral solutions to the above system.

Let <math>p(n)=\sum_{k=1}^n p_k(n)</math> be the total number of partitions of <math>n</math>. The function <math>p(n)</math> is called the '''partition number'''.

=== Counting <math>p_k(n)</math>===
We now try to determine <math>p_k(n)</math>. Unlike most problems we learned in the last lecture, <math>p_k(n)</math> does not have a nice closed form formula. We now give a recurrence for <math>p_k(n)</math>.

{{Theorem|Proposition|
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Note that it must hold that
:<math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.
There are two cases: <math>x_k=1</math> or <math>x_k>1</math>.
;Case 1.
:If <math>x_k=1</math>, then <math>(x_1,\cdots,x_{k-1})</math> is a distinct <math>(k-1)</math>-partition of <math>n-1</math>. And every <math>(k-1)</math>-partition of <math>n-1</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k-1}(n-1)</math>.
;Case 2.
:If <math>x_k>1</math>, then <math>(x_1-1,\cdots,x_{k}-1)</math> is a distinct <math>k</math>-partition of <math>n-k</math>. And every <math>k</math>-partition of <math>n-k</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k}(n-k)</math>.
In conclusion, the number of <math>k</math>-partitions of <math>n</math> is <math>p_{k-1}(n-1)+p_k(n-k)</math>, i.e.
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}

Using the above recurrence, we can compute <math>p_k(n)</math> for moderately large <math>n</math> and <math>k</math> by computer.
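
A minimal sketch of such a computation follows. The recurrence needs base cases, which the text does not spell out; the sketch assumes the conventions <math>p_0(0)=1</math>, <math>p_0(n)=0</math> for <math>n>0</math>, and <math>p_k(n)=0</math> for <math>n<k</math>. The function name <code>p</code> is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(k, n):
    """Number of partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0  # assumed convention: only 0 has a 0-partition
    if n < k:
        return 0                   # k positive parts sum to at least k
    # Smallest part equals 1, or every part is at least 2.
    return p(k - 1, n - 1) + p(k, n - k)

# Reproduces the table for n = 7 given earlier in this section.
print([p(k, 7) for k in range(1, 8)])
```

Because the recursion depth grows with <math>n</math>, a table-filling loop is a better fit than recursion for large <math>n</math>.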

If we do not need the exact value of <math>p_k(n)</math>, the next theorem gives an asymptotic estimate. Note that it only holds for '''constant''' <math>k</math>, i.e. <math>k</math> does not depend on <math>n</math>.

{{Theorem|Theorem|
For any fixed <math>k</math>,
:<math>p_k(n)\sim\frac{n^{k-1}}{k!(k-1)!}</math>,
as <math>n\rightarrow \infty</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Then <math>x_1+x_2+\cdots+x_k=n</math> and <math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.

The <math>k!</math> permutations of <math>(x_1,\ldots,x_k)</math> yield at most <math>k!</math> many <math>k</math>-compositions (the ''ordered'' sum of <math>k</math> positive integers). There are <math>{n-1\choose k-1}</math> many <math>k</math>-compositions of <math>n</math>, every one of which can be yielded in this way by permuting a partition. Thus,
:<math>k!p_k(n)\ge{n-1\choose k-1}</math>.

Let <math>y_i=x_i+k-i</math>. That is, <math>y_k=x_k, y_{k-1}=x_{k-1}+1, y_{k-2}=x_{k-2}+2,\ldots, y_{1}=x_1+k-1</math>. Then, it holds that
* <math>y_1>y_2>\cdots>y_k\ge 1</math>; and
* <math>y_1+y_2+\cdots+y_k=n+\frac{k(k-1)}{2}</math>.
Each permutation of <math>(y_1,y_2,\ldots,y_k)</math> yields a '''distinct''' <math>k</math>-composition of <math>n+\frac{k(k-1)}{2}</math>, because all <math>y_i</math> are distinct.
Thus,
:<math>k!p_k(n)\le {n+\frac{k(k-1)}{2}-1\choose k-1}</math>.

Combining the two inequalities, we have
:<math>\frac{{n-1\choose k-1}}{k!}\le p_k(n)\le \frac{{n+\frac{k(k-1)}{2}-1\choose k-1}}{k!}</math>.
The theorem follows.
}}
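
The two inequalities and the asymptotic estimate can be observed numerically. The sketch below fills a table of <math>p_k(n)</math> iteratively from the recurrence (assuming the convention <math>p_0(0)=1</math>) and then evaluates the bounds; all function names are illustrative.

```python
from math import comb, factorial

def partition_table(k_max, n_max):
    """table[k][n] = p_k(n), filled using p_k(n) = p_{k-1}(n-1) + p_k(n-k)."""
    table = [[0] * (n_max + 1) for _ in range(k_max + 1)]
    table[0][0] = 1  # assumed convention: the empty partition of 0
    for k in range(1, k_max + 1):
        for n in range(k, n_max + 1):
            table[k][n] = table[k - 1][n - 1] + table[k][n - k]
    return table

def bounds_hold(k, n, table):
    """Check binom(n-1, k-1) <= k! p_k(n) <= binom(n + k(k-1)/2 - 1, k-1)."""
    lower = comb(n - 1, k - 1)
    upper = comb(n + k * (k - 1) // 2 - 1, k - 1)
    return lower <= factorial(k) * table[k][n] <= upper

def ratio_to_estimate(k, n, table):
    """p_k(n) divided by n^(k-1) / (k! (k-1)!); tends to 1 as n grows."""
    return table[k][n] * factorial(k) * factorial(k - 1) / n ** (k - 1)

table = partition_table(4, 1000)
print(bounds_hold(3, 100, table), ratio_to_estimate(3, 1000, table))
```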

=== Ferrers diagram ===
A partition of a number <math>n</math> can be represented as a diagram of dots (or squares), called a '''Ferrers diagram''' (the square version of the Ferrers diagram is also called a '''Young diagram''', named after a structure called the Young tableau).

Let <math>(x_1,x_2,\ldots,x_k)</math> with <math>x_1\ge x_2\ge \cdots\ge x_k\ge 1</math> be a partition of <math>n</math>. Its Ferrers diagram consists of <math>k</math> rows, where the <math>i</math>-th row contains <math>x_i</math> dots (or squares).

<div class="center">
{|border="0"
|
{|border="0"
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|Ferrers diagram (''dot version'') of (5,4,2,1)||
|align=center|Ferrers diagram (''square version'') of (5,4,2,1)
|}
</div>

;Conjugate partition
The partition we get by reading the Ferrers diagram by column instead of rows is called the '''conjugate''' of the original partition.
<div class="center">
{|border="0"
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|<math>(6,4,4,2,1)</math>||
|align=center|conjugate: <math>(5,4,3,3,1,1)</math>
|}
</div>

Clearly,
* different partitions cannot have the same conjugate, and
* every partition of <math>n</math> is the conjugate of some partition of <math>n</math>,
so the conjugation mapping is a permutation on the set of partitions of <math>n</math>. This fact is very useful in proving theorems about partition numbers.
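
Conjugation is simple to compute: the <math>j</math>-th part of the conjugate is the number of rows of the Ferrers diagram that reach column <math>j</math>. A small sketch, with an illustrative function name and partitions represented as non-increasing lists:

```python
def conjugate(partition):
    """Conjugate of a partition given as a non-increasing list of positive ints."""
    if not partition:
        return []
    # Column j (0-based) contains one cell for every row longer than j.
    return [sum(1 for part in partition if part > j) for j in range(partition[0])]

print(conjugate([6, 4, 4, 2, 1]))             # the example shown above
print(conjugate(conjugate([6, 4, 4, 2, 1])))  # conjugating twice gives the original
```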

Some theorems of partitions can be easily proved by representing partitions in Ferrers diagrams.

{{Theorem|Proposition|
# The number of partitions of <math>n</math> which have largest summand <math>k</math>, is <math>p_k(n)</math>.
# The number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts. Formally,
::<math>p_k(n)=\sum_{j=1}^k p_j(n-k)</math>.
}}
{{Proof|
# For every <math>k</math>-partition, the conjugate partition has largest part <math>k</math>. And vice versa.
# For a <math>k</math>-partition of <math>n</math>, remove the leftmost cell of every row of the Ferrers diagram. In total <math>k</math> cells are removed, and the remaining diagram is a partition of <math>n-k</math> into at most <math>k</math> parts. Conversely, for a partition of <math>n-k</math> into at most <math>k</math> parts, add a cell to each of the <math>k</math> rows (including the empty ones). This gives a <math>k</math>-partition of <math>n</math>. It is easy to see that the above mappings are 1-1 correspondences. Thus, the number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts.
}}

== Principle of Inclusion-Exclusion ==
Let <math>A</math> and <math>B</math> be two finite sets. The cardinality of their union is
:<math>|A\cup B|=|A|+|B|-{\color{Blue}|A\cap B|}</math>.
For three sets <math>A</math>, <math>B</math>, and <math>C</math>, the cardinality of the union of these three sets is computed as
:<math>|A\cup B\cup C|=|A|+|B|+|C|-{\color{Blue}|A\cap B|}-{\color{Blue}|A\cap C|}-{\color{Blue}|B\cap C|}+{\color{Red}|A\cap B\cap C|}</math>.
This is illustrated by the following figure.
::[[Image:Inclusion-exclusion.png|200px|border|center]]

Generally, the '''Principle of Inclusion-Exclusion''' states that the cardinality of the union of <math>n</math> finite sets <math>A_1,A_2,\ldots,A_n</math> is
{{Equation|
<math>
\begin{align}
\left|\bigcup_{i=1}^nA_i\right|
&=
\sum_{\emptyset\neq I\subseteq\{1,\ldots,n\}}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
-----

In combinatorial enumeration, the Principle of Inclusion-Exclusion is usually applied in its complement form.

Let <math>A_1,A_2,\ldots,A_n\subseteq U</math> be subsets of some finite set <math>U</math>. Here <math>U</math> is some universe of combinatorial objects, whose cardinality is easy to calculate (e.g. all strings, tuples, permutations), and each <math>A_i</math> contains the objects with some specific property (e.g. a "pattern") which we want to avoid. The problem is to count the number of objects without any of the <math>n</math> properties. We write <math>\bar{A_i}=U-A_i</math>. The number of objects without any of the properties <math>A_1,A_2,\ldots,A_n</math> is
{{Equation|
<math>
\begin{align}
\left|\bar{A_1}\cap\bar{A_2}\cap\cdots\cap\bar{A_n}\right|=\left|U-\bigcup_{i=1}^nA_i\right|
&=
|U|-\sum_{\emptyset\neq I\subseteq\{1,\ldots,n\}}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
For an <math>I\subseteq\{1,2,\ldots,n\}</math>, we denote
:<math>A_I=\bigcap_{i\in I}A_i</math>
with the convention that <math>A_\emptyset=U</math>. The above equation is stated as:
{{Theorem|Principle of Inclusion-Exclusion|
:Let <math>A_1,A_2,\ldots,A_n</math> be a family of subsets of <math>U</math>. Then the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> is
::<math>\sum_{I\subseteq\{1,\ldots, n\}}(-1)^{|I|}|A_I|</math>.
}}

Let <math>S_k=\sum_{|I|=k}|A_I|\,</math>. Conventionally, <math>S_0=|A_\emptyset|=|U|</math>. Grouping the terms by <math>|I|</math>, the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> can be expressed as
{{Equation|<math>
S_0-S_1+S_2-\cdots+(-1)^nS_n.
</math>
}}
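
The principle translates directly into a (slow, exponential-time) counting procedure, which is useful for checking small cases. The sketch below sums <math>(-1)^{|I|}|A_I|</math> over all index sets <math>I</math>; the function name and the example (integers in <math>\{1,\ldots,30\}</math> divisible by none of 2, 3, 5) are illustrative choices.

```python
from itertools import combinations

def count_avoiding_all(universe, subsets):
    """Sum over all index sets I of (-1)^|I| |A_I|, where A_emptyset = universe."""
    total = 0
    for size in range(len(subsets) + 1):
        for index_set in combinations(range(len(subsets)), size):
            common = set(universe)
            for i in index_set:
                common &= subsets[i]
            total += (-1) ** size * len(common)
    return total

universe = set(range(1, 31))
subsets = [{u for u in universe if u % d == 0} for d in (2, 3, 5)]

# Inclusion-exclusion count next to a direct count of the complement of the union.
print(count_avoiding_all(universe, subsets))
print(len(universe - set().union(*subsets)))
```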

=== Surjections ===
In the twelvefold way, we discuss the counting problems arising from mappings <math>f:N\rightarrow M</math>. The basic case is that elements from both <math>N</math> and <math>M</math> are distinguishable. In this case, it is easy to count the number of arbitrary mappings (which is <math>m^n</math>) and the number of injective (one-to-one) mappings (which is <math>(m)_n</math>), but counting surjective mappings is more difficult. Here we apply the principle of inclusion-exclusion to count the number of surjective (onto) mappings.
{{Theorem|Theorem|
:The number of surjective mappings from an <math>n</math>-set to an <math>m</math>-set is given by
::<math>\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
{{Proof|
Let <math>U=\{f:[n]\rightarrow[m]\}</math> be the set of mappings from <math>[n]</math> to <math>[m]</math>. Then <math>|U|=m^n</math>.

For <math>i\in[m]</math>, let <math>A_i</math> be the set of mappings <math>f:[n]\rightarrow[m]</math> under which no <math>j\in[n]</math> is mapped to <math>i</math>, i.e. <math>A_i=\{f:[n]\rightarrow[m]\setminus\{i\}\}</math>; thus <math>|A_i|=(m-1)^n</math>.

More generally, for <math>I\subseteq [m]</math>, <math>A_I=\bigcap_{i\in I}A_i</math> contains the mappings <math>f:[n]\rightarrow[m]\setminus I</math>. And <math>|A_I|=(m-|I|)^n\,</math>.

A mapping <math>f:[n]\rightarrow[m]</math> is surjective if and only if <math>f</math> lies in none of the sets <math>A_i</math>. By the principle of inclusion-exclusion, the number of surjective <math>f:[n]\rightarrow[m]</math> is
:<math>\sum_{I\subseteq[m]}(-1)^{|I|}\left|A_I\right|=\sum_{I\subseteq[m]}(-1)^{|I|}(m-|I|)^n=\sum_{j=0}^m(-1)^j{m\choose j}(m-j)^n</math>.
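Let <math>k=m-j</math>. Since <math>{m\choose j}={m\choose m-j}</math> and the <math>k=0</math> term vanishes for <math>n\ge 1</math>, the theorem is proved.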
}}
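
For small parameters the formula can be compared with a brute-force enumeration of all <math>m^n</math> mappings. A minimal sketch, assuming <math>n,m\ge 1</math>; the function names are illustrative.

```python
from itertools import product
from math import comb

def surjections_formula(n, m):
    """sum_{k=1}^{m} (-1)^(m-k) binom(m, k) k^n."""
    return sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))

def surjections_brute_force(n, m):
    """Enumerate every mapping [n] -> [m] as a tuple of images and keep the onto ones."""
    targets = set(range(m))
    return sum(1 for f in product(range(m), repeat=n) if set(f) == targets)

print(surjections_formula(4, 3), surjections_brute_force(4, 3))
```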

Recall that, in the twelvefold way, we establish a relation between surjections and partitions.

* Surjection to ordered partition:
:For a surjective <math>f:[n]\rightarrow[m]</math>, <math>(f^{-1}(0),f^{-1}(1),\ldots,f^{-1}(m-1))</math> is an '''ordered partition''' of <math>[n]</math>.
* Ordered partition to surjection:
:For an ordered <math>m</math>-partition <math>(B_0,B_1,\ldots, B_{m-1})</math> of <math>[n]</math>, we can define a function <math>f:[n]\rightarrow[m]</math> by letting <math>f(i)=j</math> if and only if <math>i\in B_j</math>. <math>f</math> is surjective since as a partition, none of <math>B_i</math> is empty.

Therefore, we have a one-to-one correspondence between surjective mappings from an <math>n</math>-set to an <math>m</math>-set and the ordered <math>m</math>-partitions of an <math>n</math>-set.

The Stirling number of the second kind <math>S(n,m)</math> is the number of <math>m</math>-partitions of an <math>n</math>-set. There are <math>m!</math> ways to order an <math>m</math>-partition, thus the number of surjective mappings <math>f:[n]\rightarrow[m]</math> is <math>m! S(n,m)</math>. Combining with what we have proved for surjections, we give the following result for the Stirling number of the second kind.

{{Theorem|Proposition|
:<math>S(n,m)=\frac{1}{m!}\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
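
The formula can be cross-checked against the standard recurrence <math>S(n,m)=mS(n-1,m)+S(n-1,m-1)</math>. That recurrence is not derived in this section; it is brought in here only as an independent check. Function names are illustrative.

```python
from math import comb, factorial

def stirling2_formula(n, m):
    """S(n, m) = (1/m!) sum_{k=1}^{m} (-1)^(m-k) binom(m, k) k^n, for n, m >= 1."""
    total = sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))
    return total // factorial(m)  # the sum counts surjections, a multiple of m!

def stirling2_table(n_max):
    """s[n][m] = S(n, m) via S(n, m) = m S(n-1, m) + S(n-1, m-1), with S(0, 0) = 1."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = 1
    for n in range(1, n_max + 1):
        for m in range(1, n + 1):
            s[n][m] = m * s[n - 1][m] + s[n - 1][m - 1]
    return s

print(stirling2_formula(5, 3), stirling2_table(5)[5][3])
```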

=== Derangements ===
We now count the number of bijections from a set to itself with no fixed points. This is the '''derangement problem'''.

For a permutation <math>\pi</math> of <math>\{1,2,\ldots,n\}</math>, a '''fixed point''' is an <math>i\in\{1,2,\ldots,n\}</math> such that <math>\pi(i)=i</math>.
A [http://en.wikipedia.org/wiki/Derangement '''derangement'''] of <math>\{1,2,\ldots,n\}</math> is a permutation of <math>\{1,2,\ldots,n\}</math> that has no fixed points.

{{Theorem|Theorem|
:The number of derangements of <math>\{1,2,\ldots,n\}</math> is given by
::<math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}\approx \frac{n!}{\mathrm{e}}</math>.
}}
{{Proof|
Let <math>U</math> be the set of all permutations of <math>\{1,2,\ldots,n\}</math>. So <math>|U|=n!</math>.

Let <math>A_i</math> be the set of permutations with fixed point <math>i</math>; so <math>|A_i|=(n-1)!</math>. More generally, for any <math>I\subseteq \{1,2,\ldots,n\}</math>, <math>A_I=\bigcap_{i\in I}A_i</math>, and <math>|A_I|=(n-|I|)!</math>, since permutations in <math>A_I</math> fix every point in <math>I</math> and permute the remaining points arbitrarily. A permutation is a derangement if and only if it lies in none of the sets <math>A_i</math>. So the number of derangements is
:<math>\sum_{I\subseteq\{1,2,\ldots,n\}}(-1)^{|I|}(n-|I|)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=n!\sum_{k=0}^n\frac{(-1)^k}{k!}.</math>
By Taylor's series,
:<math>\frac{1}{\mathrm{e}}=\sum_{k=0}^\infty\frac{(-1)^k}{k!}=\sum_{k=0}^n\frac{(-1)^k}{k!}\pm o\left(\frac{1}{n!}\right)</math>.
It is not hard to see that, for <math>n\ge 1</math>, <math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}</math> is the closest integer to <math>\frac{n!}{\mathrm{e}}</math>.
}}

Therefore, about a <math>\frac{1}{\mathrm{e}}</math> fraction of all permutations have no fixed points.
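
Both the exact formula and the nearest-integer statement can be checked for small <math>n</math>. The sketch below is illustrative; permutations are taken over <math>\{0,\ldots,n-1\}</math> for convenience, and the floating-point comparison with <math>n!/\mathrm{e}</math> is only trustworthy for moderate <math>n</math>.

```python
from itertools import permutations
from math import e, factorial

def derangements_formula(n):
    """n! * sum_{k=0}^{n} (-1)^k / k!, computed exactly as a sum of integers n!/k!."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

def derangements_brute_force(n):
    """Count permutations of {0, ..., n-1} with no fixed point by enumeration."""
    return sum(1 for pi in permutations(range(n))
               if all(pi[i] != i for i in range(n)))

print([derangements_formula(n) for n in range(7)])
print([round(factorial(n) / e) for n in range(1, 7)])
```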

=== Permutations with restricted positions ===
We introduce a general theory of counting permutations with restricted positions. In the derangement problem, we count the number of permutations <math>\pi</math> such that <math>\pi(i)\neq i</math> for all <math>i</math>. We now generalize to the problem of counting permutations which avoid a set of arbitrarily specified positions.

It is traditionally described using terminology from the game of chess. Let <math>B\subseteq \{1,\ldots,n\}\times \{1,\ldots,n\}</math>, called a '''board'''. As illustrated below, we can think of <math>B</math> as a chess board, with the positions in <math>B</math> marked by "<math>\times</math>".
{{Chess diagram small
|
|
|=
8 |__|xx|xx|__|xx|__|__|xx|=
7 |xx|__|__|xx|__|__|xx|__|=
6 |xx|__|xx|xx|__|xx|xx|__|=
5 |__|xx|__|__|xx|__|xx|__|=
4 |xx|__|__|__|xx|xx|xx|__|=
3 |__|xx|__|xx|__|__|__|xx|=
2 |__|__|xx|__|xx|__|__|xx|=
1 |xx|__|__|xx|__|xx|__|__|=
a b c d e f g h
|
}}
For a permutation <math>\pi</math> of <math>\{1,\ldots,n\}</math>, define the '''graph''' <math>G_\pi</math> of <math>\pi</math> as
:<math>
\begin{align}
G_\pi &= \{(i,\pi(i))\mid i\in \{1,2,\ldots,n\}\}.
\end{align}
</math>
This can also be viewed as a set of marked positions on a chess board. Each row and each column has exactly one marked position, because <math>\pi</math> is a permutation. Thus, we can identify each <math>G_\pi</math> with a placement of <math>n</math> rooks (a rook moves like the chariot, "车", in Chinese chess) such that no two rooks attack each other.

For example, the following is the <math>G_\pi</math> of the identity permutation <math>\pi(i)=i</math>.
{{Chess diagram small
|
|
|=
8 |rl|__|__|__|__|__|__|__|=
7 |__|rl|__|__|__|__|__|__|=
6 |__|__|rl|__|__|__|__|__|=
5 |__|__|__|rl|__|__|__|__|=
4 |__|__|__|__|rl|__|__|__|=
3 |__|__|__|__|__|rl|__|__|=
2 |__|__|__|__|__|__|rl|__|=
1 |__|__|__|__|__|__|__|rl|=
a b c d e f g h
|
}}
Now define
:<math>\begin{align}
N_0 &= \left|\left\{\pi\mid B\cap G_\pi=\emptyset\right\}\right|\\
r_k &= \mbox{number of }k\mbox{-subsets of }B\mbox{ such that no two elements have a common coordinate}\\
&=\left|\left\{S\in{B\choose k} \,\bigg|\, \forall (i_1,j_1),(i_2,j_2)\in S, i_1\neq i_2, j_1\neq j_2 \right\}\right|
\end{align}
</math>
In terms of chess:
* <math>B</math>: a set of marked positions in an <math>[n]\times [n]</math> chess board.
* <math>N_0</math>: the number of ways of placing <math>n</math> non-attacking rooks on the chess board such that none of these rooks lie in <math>B</math>.
* <math>r_k</math>: number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>.

Our goal is to express <math>N_0</math> in terms of the <math>r_k</math>. This gives the number of permutations that avoid all positions in <math>B</math>.

{{Theorem|Theorem|
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
{{Proof|
For each <math>i\in[n]</math>, let <math>A_i=\{\pi\mid (i,\pi(i))\in B\}</math> be the set of permutations <math>\pi</math> whose <math>i</math>-th position is in <math>B</math>.

<math>N_0</math> is the number of permutations that avoid all positions in <math>B</math>. Thus, our goal is to count the number of permutations <math>\pi</math> that lie in none of the sets <math>A_i</math> for <math>i\in [n]</math>.

For each <math>I\subseteq [n]</math>, let <math>A_I=\bigcap_{i\in I}A_i</math>, which is the set of permutations <math>\pi</math> such that <math>(i,\pi(i))\in B</math> for all <math>i\in I</math>. Due to the principle of inclusion-exclusion,
:<math>N_0=\sum_{I\subseteq [n]} (-1)^{|I|}|A_I|=\sum_{k=0}^n(-1)^k\sum_{I\in{[n]\choose k}}|A_I|</math>.

The next observation is that
:<math>\sum_{I\in{[n]\choose k}}|A_I|=r_k(n-k)!</math>,
because we can count both sides by first placing <math>k</math> non-attacking rooks on <math>B</math> and placing <math>n-k</math> additional non-attacking rooks on <math>[n]\times [n]</math> in <math>(n-k)!</math> ways.

Therefore,
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
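
The theorem can be checked numerically on small boards. The following Python sketch is an illustration only: the function names and the example board are assumptions made here, and rows and columns are indexed from 0 rather than from 1. It computes the rook numbers <math>r_k</math> by enumeration and compares the formula with a direct count of the permutations that avoid <math>B</math>.

```python
from itertools import combinations, permutations
from math import factorial


def rook_numbers(B, n):
    """Return [r_0, ..., r_n]: r_k counts k-subsets of B with no shared row or column."""
    cells = sorted(B)
    r = []
    for k in range(n + 1):
        count = 0
        for S in combinations(cells, k):
            rows = {i for i, _ in S}
            cols = {j for _, j in S}
            if len(rows) == k and len(cols) == k:
                count += 1
        r.append(count)
    return r


def count_by_formula(B, n):
    """N_0 computed as sum_k (-1)^k r_k (n-k)!."""
    r = rook_numbers(B, n)
    return sum((-1) ** k * r[k] * factorial(n - k) for k in range(n + 1))


def count_by_brute_force(B, n):
    """N_0 computed by testing every permutation of {0, ..., n-1}."""
    return sum(
        1
        for pi in permutations(range(n))
        if all((i, pi[i]) not in B for i in range(n))
    )


# Hypothetical example board: the diagonal of a 4x4 board plus one extra cell.
B = {(0, 0), (1, 1), (2, 2), (3, 3), (0, 1)}
print(count_by_formula(B, 4), count_by_brute_force(B, 4))
```

Both numbers printed for the example board should agree, since they count the same set of permutations.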

====Derangement problem====
We use the above general method to solve the derangement problem again.

Take <math>B=\{(1,1),(2,2),\ldots,(n,n)\}</math> as the chess board. A derangement <math>\pi</math> is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|__|__|__|__|__|__|__|=
7 |__|xx|__|__|__|__|__|__|=
6 |__|__|xx|__|__|__|__|__|=
5 |__|__|__|xx|__|__|__|__|=
4 |__|__|__|__|xx|__|__|__|=
3 |__|__|__|__|__|xx|__|__|=
2 |__|__|__|__|__|__|xx|__|=
1 |__|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
Clearly, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math> is <math>r_k={n\choose k}</math>. We want to count <math>N_0</math>, which gives the number of ways of placing <math>n</math> non-attacking rooks such that none of these rooks lie in <math>B</math>.

By the above theorem
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=\sum_{k=0}^n(-1)^k\frac{n!}{k!}=n!\sum_{k=0}^n(-1)^k\frac{1}{k!}\approx\frac{n!}{e}.
</math>
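
As a sanity check, the sum above can be compared with a brute-force count of fixed-point-free permutations. This is a minimal Python sketch; the function names are chosen here for illustration and are not part of the notes.

```python
from itertools import permutations
from math import comb, e, factorial


def derangements_formula(n):
    """sum_k (-1)^k C(n,k) (n-k)!, the rook-number formula with r_k = C(n,k)."""
    return sum((-1) ** k * comb(n, k) * factorial(n - k) for k in range(n + 1))


def derangements_brute_force(n):
    """Count permutations of {0, ..., n-1} without fixed points."""
    return sum(
        1 for pi in permutations(range(n)) if all(pi[i] != i for i in range(n))
    )


for n in range(1, 8):
    d = derangements_formula(n)
    assert d == derangements_brute_force(n)
    # n!/e rounded to the nearest integer, for comparison with the exact count.
    print(n, d, round(factorial(n) / e))
```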

====Problème des ménages====
Suppose that in a banquet, we want to seat <math>n</math> couples at a circular table, satisfying the following constraints:
* Men and women are in alternate places.
* No one sits next to his/her spouse.

In how many ways can this be done?

(For convenience, we assume that every seat at the table is marked differently, so that rotating a seating clockwise or anti-clockwise gives a '''different''' solution.)

First, let the <math>n</math> ladies find their seats. They may sit either at the odd-numbered seats or at the even-numbered seats, and in either case there are <math>n!</math> different orders. Thus, there are <math>2(n!)</math> ways to seat the <math>n</math> ladies.

After seating the wives, we label the remaining <math>n</math> places clockwise as <math>0,1,\ldots, n-1</math>. A seating of the <math>n</math> husbands is then given by a permutation <math>\pi</math> of <math>\{0,1,\ldots,n-1\}</math> defined as follows: <math>\pi(i)</math> is the seat of the husband of the lady sitting at the <math>i</math>-th place.

It is easy to see that <math>\pi(i)\neq i</math> and <math>\pi(i)\not\equiv i+1\pmod n</math> for every <math>i</math>, and that every permutation <math>\pi</math> with these properties gives a feasible seating of the <math>n</math> husbands. Thus, we only need to count the number of permutations <math>\pi</math> such that <math>\pi(i)\not\equiv i, i+1\pmod n</math>.

Take <math>B=\{(0,0),(1,1),\ldots,(n-1,n-1), (0,1),(1,2),\ldots,(n-2,n-1),(n-1,0)\}</math> as the chess board. A permutation <math>\pi</math> which defines a way of seating the husbands, is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|xx|__|__|__|__|__|__|=
7 |__|xx|xx|__|__|__|__|__|=
6 |__|__|xx|xx|__|__|__|__|=
5 |__|__|__|xx|xx|__|__|__|=
4 |__|__|__|__|xx|xx|__|__|=
3 |__|__|__|__|__|xx|xx|__|=
2 |__|__|__|__|__|__|xx|xx|=
1 |xx|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
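We need to compute <math>r_k</math>, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>. The <math>2n</math> cells of <math>B</math> can be arranged in the cyclic order <math>(0,0),(0,1),(1,1),(1,2),\ldots,(n-1,n-1),(n-1,0)</math>, in which two cells share a row or a column exactly when they are adjacent. Hence, for our choice of <math>B</math>, <math>r_k</math> is the number of ways of choosing <math>k</math> points, no two consecutive, from a collection of <math>2n</math> points arranged in a circle.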

We first see how to do this in a ''line''.
{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''line'', is <math>{m-k+1\choose k}</math>.
}}
{{Proof|
We draw a line of <math>m-k</math> black points, and then insert <math>k</math> red points into the <math>m-k+1</math> spaces between the black points (including the beginning and end), with at most one red point per space.
::<math>
\begin{align}
&\sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \\
&\qquad\qquad\qquad\quad\Downarrow\\
&\sqcup \, \bullet \,\, {\color{Red}\bullet} \, \bullet \,\, {\color{Red}\bullet} \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}\, \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}
\end{align}
</math>
This gives us a line of <math>m</math> points in which the red points specify the chosen objects, which are non-consecutive. This mapping is a one-to-one correspondence.
There are <math>{m-k+1\choose k}</math> ways of placing <math>k</math> red points into <math>m-k+1</math> spaces.
}}

The problem of choosing non-consecutive objects in a circle can be reduced to the case that the objects are in a line.

{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''circle'', is <math>\frac{m}{m-k}{m-k\choose k}</math>.
}}
{{Proof|
Let <math>f(m,k)</math> be the desired number, and let <math>g(m,k)</math> be the number of ways of choosing <math>k</math> non-consecutive points from <math>m</math> points arranged in a circle, coloring these <math>k</math> points red, and then coloring one of the uncolored points blue.

Clearly, <math>g(m,k)=(m-k)f(m,k)</math>.

But we can also compute <math>g(m,k)</math> as follows:
* Choose one of the <math>m</math> points and color it blue. This gives us <math>m</math> ways.
* Cut the circle to make a line of <math>m-1</math> points by removing the blue point.
* Choose <math>k</math> non-consecutive points from the line of <math>m-1</math> points and color them red. This gives <math>{m-k\choose k}</math> ways due to the previous lemma.

Thus, <math>g(m,k)=m{m-k\choose k}</math>. Therefore we have the desired number <math>f(m,k)=\frac{m}{m-k}{m-k\choose k}</math>.
}}
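
Both lemmas can be verified by exhaustive enumeration for small <math>m</math>. The Python sketch below is illustrative only (the helper names are assumptions of this example); the points are numbered <math>0,\ldots,m-1</math>, and in the circular case point <math>m-1</math> is adjacent to point <math>0</math>. The circular formula is only evaluated for <math>k<m</math>, where <math>\frac{m}{m-k}</math> is defined.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def line_count(m, k):
    """Number of k-subsets of a line of m points with no two adjacent points."""
    return sum(
        1
        for S in combinations(range(m), k)
        if all(b - a > 1 for a, b in zip(S, S[1:]))
    )


def circle_count(m, k):
    """Number of k-subsets of a circle of m points with no two adjacent points."""
    return sum(
        1
        for S in combinations(range(m), k)
        if all((a + 1) % m not in S for a in S)
    )


for m in range(1, 9):
    for k in range(m + 1):
        assert line_count(m, k) == comb(m - k + 1, k)
    for k in range(m):
        # Fraction keeps m/(m-k) exact before multiplying by the binomial.
        assert circle_count(m, k) == Fraction(m, m - k) * comb(m - k, k)
print("both lemmas agree with enumeration for m = 1, ..., 8")
```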

By the above lemma, we have that <math>r_k=\frac{2n}{2n-k}{2n-k\choose k}</math>. Then apply the theorem of counting permutations with restricted positions,
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k\frac{2n}{2n-k}{2n-k\choose k}(n-k)!.
</math>

This gives the number of ways of seating the <math>n</math> husbands ''after the ladies are seated''. Recall that there are <math>2(n!)</math> ways of seating the <math>n</math> ladies. Thus, the total number of ways of seating <math>n</math> couples as required by the problème des ménages is
:<math>
2\cdot n!\sum_{k=0}^n(-1)^k\frac{2n}{2n-k}{2n-k\choose k}(n-k)!.
</math>
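
The formula can be checked against a direct enumeration of the permutations <math>\pi</math> with <math>\pi(i)\not\equiv i, i+1\pmod n</math>. The following Python sketch is a hedged illustration with function names chosen here; it assumes <math>n\ge 2</math>, since for <math>n=1</math> the two forbidden cells coincide and the board no longer has <math>2n</math> cells.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def husbands_formula(n):
    """sum_k (-1)^k * 2n/(2n-k) * C(2n-k, k) * (n-k)!, for n >= 2."""
    total = Fraction(0)
    for k in range(n + 1):
        r_k = Fraction(2 * n, 2 * n - k) * comb(2 * n - k, k)
        total += (-1) ** k * r_k * factorial(n - k)
    return int(total)


def husbands_brute_force(n):
    """Count permutations with pi(i) != i and pi(i) != i+1 (mod n)."""
    return sum(
        1
        for pi in permutations(range(n))
        if all(pi[i] != i and pi[i] != (i + 1) % n for i in range(n))
    )


def menage_total(n):
    """Total number of seatings: 2 * n! ways for the ladies times the husbands' count."""
    return 2 * factorial(n) * husbands_formula(n)


for n in range(2, 8):
    assert husbands_formula(n) == husbands_brute_force(n)
    print(n, husbands_formula(n), menage_total(n))
```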

=== The Euler totient function ===
Two integers <math>m, n</math> are said to be '''relatively prime''' if their greatest common divisor <math>\mathrm{gcd}(m,n)=1</math>. For a positive integer <math>n</math>, let <math>\phi(n)</math> be the number of positive integers from <math>\{1,2,\ldots,n\}</math> that are relatively prime to <math>n</math>. This function, called the Euler <math>\phi</math> function or '''the Euler totient function''', is fundamental in number theory.

We now derive a formula for this function by using the principle of inclusion-exclusion.
{{Theorem|Theorem (The Euler totient function)|
Suppose <math>n</math> is divisible by precisely <math>r</math> different primes, denoted <math>p_1,\ldots,p_r</math>. Then
:<math>\phi(n)=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right)</math>.
}}
{{Proof|
Let <math>U=\{1,2,\ldots,n\}</math> be the universe. For any distinct primes <math>p_{i_1},p_{i_2},\ldots,p_{i_s}\in\{p_1,\ldots,p_r\}</math>, the number of integers in <math>U</math> that are divisible by all of them is <math>\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_s}}</math>.

<math>\phi(n)</math> is the number of integers in <math>U</math> that are not divisible by any of <math>p_1,\ldots,p_r</math>.
By the principle of inclusion-exclusion,
:<math>
\begin{align}
\phi(n)
&=n+\sum_{k=1}^r(-1)^k\sum_{1\le i_1<i_2<\cdots <i_k\le r}\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_k}}\\
&=n-\sum_{1\le i\le r}\frac{n}{p_i}+\sum_{1\le i<j\le r}\frac{n}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{n}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{n}{p_{1}p_{2}\cdots p_{r}}\\
&=n\left(1-\sum_{1\le i\le r}\frac{1}{p_i}+\sum_{1\le i<j\le r}\frac{1}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{1}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{1}{p_{1}p_{2}\cdots p_{r}}\right)\\
&=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right).
\end{align}
</math>
}}
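
The product formula is easy to test against the definition. The Python sketch below is only an illustration: the helper names are assumptions made here, and the prime divisors are found by plain trial division, which is adequate for small <math>n</math>.

```python
from math import gcd


def prime_divisors(n):
    """Distinct prime divisors of n, found by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_formula(n):
    """n * prod(1 - 1/p) over the prime divisors p of n, in integer arithmetic."""
    result = n
    for p in prime_divisors(n):
        result = result // p * (p - 1)
    return result


def phi_by_definition(n):
    """Count the integers in {1, ..., n} that are relatively prime to n."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


for n in range(1, 200):
    assert phi_formula(n) == phi_by_definition(n)
print(phi_formula(12), phi_formula(36), phi_formula(97))
```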

== Reference ==
* Stanley, ''Enumerative Combinatorics'', Volume 1, Chapter 2.
* van Lint and Wilson, ''A Course in Combinatorics'', Chapters 10, 15.

Combinatorics (Fall 2010)/Generating functions

2010-09-11T06:03:26Z

210.28.131.82: /* Fibonacci numbers */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>x_1+x_2+\cdots+x_k=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.
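
The first counting problem can be explored by brute force. The Python sketch below is illustrative only, and the function name is an assumption of this example. It enumerates all tuples with entries in <math>\{1,2\}</math> that sum to <math>n</math>, and checks that the counts satisfy the recursion. For <math>n=0,1,2,3,4,5</math> the counts are 1, 1, 2, 3, 5, 8, where the empty tuple is the single composition of 0.

```python
from itertools import product


def count_compositions(n):
    """Number of tuples with entries in {1, 2} whose entries sum to n."""
    return sum(
        1
        for k in range(n + 1)
        for parts in product((1, 2), repeat=k)
        if sum(parts) == n
    )


counts = [count_compositions(n) for n in range(10)]
# Each count from n = 2 on is the sum of the two previous counts.
assert all(counts[n] == counts[n - 1] + counts[n - 2] for n in range(2, 10))
print(counts)
```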

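In both problems, the solution satisfies the recursion <math>F_n=F_{n-1}+F_{n-2}</math>. We fix the initial values <math>F_0=0</math> and <math>F_1=1</math>, which gives the following definition. (With these initial values, the number of solutions to either counting problem with parameter <math>n</math> is the shifted term <math>F_{n+1}</math>, since each problem has exactly one solution for <math>n=0</math> and for <math>n=1</math>.)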
:<math>F_n=\begin{cases}
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2,\\
0 & \mbox{if }n=0,\\
1 & \mbox{if }n=1.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>-th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=\sum_{n\ge 0}F_n x^n\\
&=x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
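Multiplying a generating function by a power of <math>x</math> shifts its coefficients. This yields series whose coefficients are <math>F_{n-1}</math> and <math>F_{n-2}</math>, and the same works for any smaller index (the last equality in the first line uses <math>F_0=0</math>):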
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series of this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
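where <math>\alpha</math> and <math>\beta</math> are obtained by comparing the coefficients of <math>1</math> and <math>x</math> in <math>\alpha(1-\hat{\phi}x)+\beta(1-\phi x)=x</math>, that is,
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>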
Solving this we have that <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. And
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.
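
The closed form can be compared numerically with the recursion. This is a minimal Python sketch with function names chosen here for illustration. It uses floating-point arithmetic and rounds to the nearest integer, so it is only reliable for moderate <math>n</math>.

```python
from math import sqrt


def fib_recurrence(n):
    """F_n from F_0 = 0, F_1 = 1 and F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """(phi^n - phi_hat^n) / sqrt(5), rounded to absorb floating-point error."""
    phi = (1 + sqrt(5)) / 2
    phi_hat = (1 - sqrt(5)) / 2
    return round((phi ** n - phi_hat ** n) / sqrt(5))


for n in range(41):
    assert fib_recurrence(n) == fib_closed_form(n)
print([fib_closed_form(n) for n in range(10)])
```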

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we apply the following general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: And manipulate the right hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.
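
When <math>G(x)</math> is a ratio of two polynomials, the last step can also be carried out mechanically. The Python sketch below is an illustration under stated assumptions: polynomials are represented as coefficient lists with the lowest degree first, the denominator is assumed to have constant term 1, and the function name is hypothetical. It uses the identity <math>G(x)Q(x)=P(x)</math> to solve for the coefficients of <math>G(x)=P(x)/Q(x)</math> one at a time.

```python
def series_coefficients(numerator, denominator, terms):
    """First `terms` coefficients of numerator(x)/denominator(x) as a formal power series.

    Comparing coefficients of x^n in G(x) * Q(x) = P(x), with q_0 = 1, gives
    g_n = p_n - sum_{j >= 1} q_j * g_{n-j}.
    """
    if denominator[0] != 1:
        raise ValueError("this sketch assumes the denominator has constant term 1")
    coeffs = []
    for n in range(terms):
        value = numerator[n] if n < len(numerator) else 0
        for j in range(1, min(n, len(denominator) - 1) + 1):
            value -= denominator[j] * coeffs[n - j]
        coeffs.append(value)
    return coeffs


# G(x) = x / (1 - x - x^2), the generating function of the Fibonacci numbers.
print(series_coefficients([0, 1], [1, -1, -1], 10))
```

For the Fibonacci generating function the output reproduces the sequence <math>0,1,1,2,3,5,\ldots</math>, which is the recursion in disguise.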

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy, and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}
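
These rules translate directly into operations on coefficient lists. The following Python sketch is illustrative only: a series is represented by the list of its first few coefficients, so every result is a truncation, and all function names are assumptions of this example.

```python
def shift(g, k):
    """Coefficients of x^k G(x), truncated to the length of g."""
    return ([0] * k + g)[: len(g)]


def linear_combination(alpha, f, beta, g):
    """Coefficients of alpha F(x) + beta G(x)."""
    return [alpha * a + beta * b for a, b in zip(f, g)]


def multiply(f, g):
    """Coefficients of F(x) G(x): the n-th one is sum_k f_k g_{n-k}."""
    n = min(len(f), len(g))
    return [sum(f[k] * g[m - k] for k in range(m + 1)) for m in range(n)]


def scale_argument(g, c):
    """Coefficients of G(cx)."""
    return [c ** n * g_n for n, g_n in enumerate(g)]


def derivative(g):
    """Coefficients of G'(x); one coefficient is lost to truncation."""
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]


ones = [1] * 6                      # 1/(1-x) = 1 + x + x^2 + ...
print(multiply(ones, ones))         # coefficients of 1/(1-x)^2
print(derivative(ones))             # also 1/(1-x)^2, one term shorter
print(scale_argument(ones, 2))      # coefficients of 1/(1-2x)

# The Fibonacci identity G(x) = x + (x + x^2) G(x) on truncated lists.
fib = [0, 1, 1, 2, 3, 5, 8, 13]
x = [0, 1] + [0] * 6
rhs = linear_combination(1, x, 1, linear_combination(1, shift(fib, 1), 1, shift(fib, 2)))
print(rhs == fib)
```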

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is to expand the closed-form generating function <math>G(x)</math> and evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}

=== Example: multisets ===
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient: <math>(-1)^k{-n\choose k}=(-1)^k\frac{(-n)(-n-1)\cdots(-n-k+1)}{k!}=\frac{n(n+1)\cdots(n+k-1)}{k!}</math>. The analytic (generating function) proof thus gives the same value of <math>\left({n\choose k}\right)</math> as the combinatorial proof.
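
The identity can be confirmed numerically. The Python sketch below is a hedged illustration with names chosen here: it evaluates the generalized binomial coefficient exactly with rational arithmetic and compares the result with a direct enumeration of <math>k</math>-multisets on an <math>n</math>-set.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb


def generalized_binomial(alpha, n):
    """alpha (alpha-1) ... (alpha-n+1) / n!, computed exactly as a Fraction."""
    result = Fraction(1)
    for i in range(n):
        result *= (Fraction(alpha) - i) / (i + 1)
    return result


def count_multisets(n, k):
    """Number of k-multisets on an n-set, by direct enumeration."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))


for n in range(1, 6):
    for k in range(6):
        value = (-1) ** k * generalized_binomial(-n, k)
        assert value == comb(n + k - 1, k) == count_multisets(n, k)
print(generalized_binomial(Fraction(1, 2), 3))
```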

== Pólya's problem of changing money ==

== Catalan Number ==
We now introduce a class of counting problems that all have the same solution, given by the [http://en.wikipedia.org/wiki/Catalan_number '''Catalan numbers'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, as listed on Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. Apply the product rule,
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, or otherwise the constant term is not zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n.
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
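So we have proved the following closed form for the numbers <math>C_n</math> defined by the recurrence above. Under this indexing the sequence starts <math>C_1=1, C_2=1, C_3=2, C_4=5</math>, so it is shifted by one position relative to the combinatorial interpretations listed earlier (for example, the five Dyck words of length 6 correspond to <math>C_4=5</math>).
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math> for <math>n\ge 1</math>.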
}}
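
The closed form can be checked against the recurrence for small <math>n</math>. This Python sketch is illustrative only, with function names chosen here. It follows the indexing of the recurrence above, with <math>C_0=0</math> and <math>C_1=1</math>, and evaluates the closed form for <math>n\ge 1</math>.

```python
from math import comb


def catalan_by_recurrence(limit):
    """[C_0, ..., C_limit] with C_0 = 0, C_1 = 1, C_n = sum_{i=1}^{n-1} C_i C_{n-i}."""
    C = [0, 1]
    for n in range(2, limit + 1):
        C.append(sum(C[i] * C[n - i] for i in range(1, n)))
    return C[: limit + 1]


def catalan_closed_form(n):
    """(1/n) * C(2n-2, n-1), valid for n >= 1; the division is always exact."""
    return comb(2 * n - 2, n - 1) // n


values = catalan_by_recurrence(12)
assert all(values[n] == catalan_closed_form(n) for n in range(1, 13))
print(values)
```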

Combinatorics (Fall 2010)/Generating functions

2010-09-11T06:01:33Z

210.28.131.82: /* Fibonacci numbers */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>\sum_{i=1}^k x_i=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.

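In both problems, the solution satisfies the recursion <math>F_n=F_{n-1}+F_{n-2}</math>. We fix the initial values <math>F_0=0</math> and <math>F_1=1</math>, which gives the following definition. (With these initial values, the number of solutions to either counting problem with parameter <math>n</math> is the shifted term <math>F_{n+1}</math>, since each problem has exactly one solution for <math>n=0</math> and for <math>n=1</math>.)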
:<math>F_n=\begin{cases}
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2,\\
0 & \mbox{if }n=0,\\
1 & \mbox{if }n=1.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>-th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=\sum_{n\ge 0}F_n x^n\\
&=x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
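Multiplying a generating function by a power of <math>x</math> shifts its coefficients. This yields series whose coefficients are <math>F_{n-1}</math> and <math>F_{n-2}</math>, and the same works for any smaller index (the last equality in the first line uses <math>F_0=0</math>):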
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series of this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
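where <math>\alpha</math> and <math>\beta</math> are obtained by comparing the coefficients of <math>1</math> and <math>x</math> in <math>\alpha(1-\hat{\phi}x)+\beta(1-\phi x)=x</math>, that is,
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>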
Solving this we have that <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. And
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we apply the following general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: Then manipulate the right-hand side of the equation so that it becomes an expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy, and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.
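
These rules can be tried out on truncated coefficient lists. The sketch below is a minimal illustration in Python, assuming a power series is represented by its first <code>N</code> coefficients; the helper names and the truncation order are choices made here, not part of the lecture. It checks the identity <math>G(x)=x+xG(x)+x^2G(x)</math> for the Fibonacci numbers coefficient by coefficient.

```python
N = 12  # number of coefficients kept (an arbitrary truncation order)

def shift(g, k):
    """Coefficients of x^k * G(x), truncated to N terms."""
    return ([0] * k + g)[:N]

def add(f, g):
    """Coefficients of F(x) + G(x)."""
    return [a + b for a, b in zip(f, g)]

def multiply(f, g):
    """Coefficients of F(x) * G(x): the n-th one is sum_k f_k * g_{n-k}."""
    return [sum(f[k] * g[n - k] for k in range(n + 1)) for n in range(N)]

def scale_arg(g, c):
    """Coefficients of G(c x)."""
    return [c ** n * g[n] for n in range(N)]

def derivative(g):
    """Coefficients of G'(x); truncation leaves one fewer term."""
    return [(n + 1) * g[n + 1] for n in range(N - 1)]

# Fibonacci coefficients F_0, ..., F_{N-1}
fib = [0, 1]
while len(fib) < N:
    fib.append(fib[-1] + fib[-2])

x = [0, 1] + [0] * (N - 2)
# right-hand side of G(x) = x + x G(x) + x^2 G(x)
rhs = add(x, add(shift(fib, 1), shift(fib, 2)))
print(rhs == fib)  # True
```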

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is expanding the closed-form generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}
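
The generalized binomial coefficient is easy to compute exactly. The following is a minimal sketch in Python using rational arithmetic; the function name <code>gen_binom</code> is an illustrative choice. It lists the first coefficients of <math>(1+x)^{1/2}</math>.

```python
from fractions import Fraction

def gen_binom(alpha, n):
    """alpha (alpha-1) ... (alpha-n+1) / n!, computed exactly."""
    result = Fraction(1)
    for i in range(n):
        result *= Fraction(alpha) - i
        result /= i + 1
    return result

# first coefficients of (1+x)^(1/2): 1, 1/2, -1/8, 1/16, -5/128
sqrt_coeffs = [gen_binom(Fraction(1, 2), n) for n in range(5)]
print([str(c) for c in sqrt_coeffs])
```

For a nonnegative integer <math>\alpha</math> the function returns the ordinary binomial coefficient, so the usual binomial theorem is recovered as a special case.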

=== Example: multisets ===
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math> as the number of <math>k</math>-multisets on an <math>n</math>-set. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. The analytic (generating function) proof thus gives the same value of <math>\left({n\choose k}\right)</math> as the combinatorial proof.
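
The result can be cross-checked in two ways for small cases. The following is a minimal sketch in Python with illustrative function names: one function lists the multisets directly, another computes the coefficients of <math>(1+x+x^2+\cdots)^n</math> using the fact that multiplying a series by <math>1+x+x^2+\cdots</math> replaces its coefficients by their prefix sums.

```python
from itertools import combinations_with_replacement
from math import comb

def multiset_count_brute(n, k):
    """Count k-multisets on an n-set by listing them."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

def geometric_power_coeffs(n, terms):
    """First `terms` coefficients of (1 + x + x^2 + ...)^n."""
    coeffs = [1] + [0] * (terms - 1)  # the constant series 1
    for _ in range(n):
        # multiplying by 1/(1-x) turns coefficients into prefix sums
        coeffs = [sum(coeffs[: i + 1]) for i in range(terms)]
    return coeffs

def multiset_count_formula(n, k):
    """The closed form binom(n+k-1, k), for n >= 1."""
    return comb(n + k - 1, k)

print(geometric_power_coeffs(3, 5))  # [1, 3, 6, 10, 15]
```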

== Pólya's problem of changing money ==

== Catalan Number ==
We now introduce a class of counting problems that all have the same solution, the [http://en.wikipedia.org/wiki/Catalan_number '''Catalan numbers'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, taken from Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]
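
The first interpretation above is easy to enumerate by brute force for small <math>n</math>. The following is a minimal sketch in Python; the function names are illustrative, and the exhaustive search over all <math>2^{2n}</math> strings is only meant for tiny cases.

```python
from itertools import product

def is_dyck(word):
    """True if no initial segment has more Y's than X's and the totals match."""
    balance = 0
    for ch in word:
        balance += 1 if ch == "X" else -1
        if balance < 0:
            return False
    return balance == 0

def dyck_words(n):
    """All Dyck words of length 2n, found by filtering every X/Y string."""
    return ["".join(w) for w in product("XY", repeat=2 * n) if is_dyck(w)]

print(dyck_words(3))
# ['XXXYYY', 'XXYXYY', 'XXYYXY', 'XYXXYY', 'XYXYXY']
```

Replacing X by an open parenthesis and Y by a close parenthesis in the output gives the correctly matched expressions of the second interpretation.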

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

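Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function of the sequence defined by this recurrence. Note that its index is shifted by one relative to the interpretations listed above: for example, it counts the full binary trees with <math>n</math> leaves, split at the root into subtrees with <math>i</math> and <math>n-i</math> leaves. By the product rule and <math>C_0=0</math>,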
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, since otherwise the constant term would not be zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n.
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
This proves the following closed form for the Catalan numbers.
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math>.
}}
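
The closed form can be compared with the recurrence for small <math>n</math>. The following is a minimal sketch in Python with illustrative function names; it uses the indexing of the recurrence above (<math>C_0=0</math>, <math>C_1=1</math>), and the closed form is applied only for <math>n\ge 1</math>.

```python
from math import comb

def catalan_by_recurrence(limit):
    """C_0 = 0, C_1 = 1, C_n = sum_{i=1}^{n-1} C_i C_{n-i}; returns C_0..C_limit."""
    c = [0, 1]
    for n in range(2, limit + 1):
        c.append(sum(c[i] * c[n - i] for i in range(1, n)))
    return c[: limit + 1]

def catalan_closed_form(n):
    """(1/n) * binom(2n-2, n-1), for n >= 1."""
    return comb(2 * n - 2, n - 1) // n

print(catalan_by_recurrence(6))  # [0, 1, 1, 2, 5, 14, 42]
```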

Combinatorics (Fall 2010)/Generating functions

2010-09-11T06:01:22Z

210.28.131.82: /* Fibonacci numbers */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'', as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>\sum_{i=1}^k x_i=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.
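
The first counting problem can be checked by brute force for small <math>n</math>. The following is a minimal sketch in Python; the function name is illustrative, and the exhaustive enumeration is only practical for small inputs.

```python
from itertools import product

def compositions_of_ones_and_twos(n):
    """All tuples over {1, 2} that sum to n (order matters)."""
    found = []
    for k in range(n + 1):
        for parts in product((1, 2), repeat=k):
            if sum(parts) == n:
                found.append(parts)
    return found

counts = [len(compositions_of_ones_and_twos(n)) for n in range(10)]
print(counts)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Each count is the sum of the two preceding ones, as the argument about the first summand predicts; the empty composition accounts for the single solution at <math>n=0</math>.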

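In both problems the counts satisfy the recurrence <math>F_n=F_{n-1}+F_{n-2}</math>. With the initial values below (for which the two counts above equal <math>F_{n+1}</math>, since each problem has exactly one solution for <math>n=0</math> and for <math>n=1</math>), this recurrence defines the following sequence.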
:<math>F_n=\begin{cases}
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2,\\
0 & \mbox{if }n=0,\\
1 & \mbox{if }n=1.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=
\sum_{n\ge 0}F_n x^n\\
&=
x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
For generating functions, there are general ways to generate <math>F_{n-1}</math> and <math>F_{n-2}</math>, or the coefficients with any smaller indices.
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series for this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.
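
For small <math>n</math> the coefficients of <math>\frac{x}{1-x-x^2}</math> can also be read off mechanically by power-series division, without taking any derivatives. The following is a minimal sketch in Python; the function <code>series_quotient</code> is an illustrative helper, not part of the lecture, and it assumes the denominator has a nonzero constant term.

```python
from fractions import Fraction

def series_quotient(num, den, terms):
    """First `terms` coefficients of num(x) / den(x); requires den[0] != 0.

    Uses den[0] * q_n + den[1] * q_{n-1} + ... = num_n, solved for q_n.
    """
    num = list(num) + [0] * terms
    out = []
    for n in range(terms):
        acc = Fraction(num[n])
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * out[n - k]
        out.append(acc / den[0])
    return out

# numerator x, denominator 1 - x - x^2
coeffs = series_quotient([0, 1], [1, -1, -1], 10)
print([int(c) for c in coeffs])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

This only reproduces the recurrence in another form, so it gives values but no closed formula.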

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
where, comparing coefficients on both sides of <math>x=\alpha(1-\hat{\phi}x)+\beta(1-\phi x)</math>, the constants <math>\alpha</math> and <math>\beta</math> satisfy
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>
Solving this gives <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. Hence
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
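
As a quick numerical sanity check of this decomposition, both sides can be evaluated at a few sample points away from the roots of <math>1-x-x^2</math>. The following is a minimal sketch in Python; the sample points and function names are arbitrary choices, and agreement at finitely many points is evidence rather than a proof.

```python
from math import sqrt, isclose

phi = (1 + sqrt(5)) / 2
phi_hat = (1 - sqrt(5)) / 2
alpha = 1 / sqrt(5)
beta = -1 / sqrt(5)

def original(x):
    """x / (1 - x - x^2)."""
    return x / (1 - x - x * x)

def decomposed(x):
    """alpha / (1 - phi x) + beta / (1 - phi_hat x)."""
    return alpha / (1 - phi * x) + beta / (1 - phi_hat * x)

samples = [-0.5, -0.1, 0.1, 0.3, 0.5]
agree = all(isclose(original(x), decomposed(x)) for x in samples)
print(agree)  # True
```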

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we applied the following general method for solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: Then manipulate the right-hand side of the equation so that it becomes an expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy, and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is expanding the closed-form generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.
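
A small example beyond the Fibonacci numbers: since <math>\frac{1}{(1-x)(1-2x)}=\frac{-1}{1-x}+\frac{2}{1-2x}</math>, the coefficient of <math>x^n</math> is <math>-1+2\cdot 2^n</math>. The following is a minimal sketch in Python that compares this with the coefficients obtained by multiplying the two geometric series directly; the example and the function name are chosen here for illustration.

```python
def coefficient(terms, n):
    """n-th coefficient of the sum of a / (1 - b x) over the pairs (a, b)."""
    return sum(a * b ** n for a, b in terms)

# 1/((1-x)(1-2x)) = -1/(1-x) + 2/(1-2x)
terms = [(-1, 1), (2, 2)]
from_formula = [coefficient(terms, n) for n in range(8)]

# the same coefficients from the product of sum x^k and sum (2x)^j
from_product = [sum(2 ** (n - k) for k in range(n + 1)) for n in range(8)]

print(from_formula)  # [1, 3, 7, 15, 31, 63, 127, 255]
```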

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}

=== Example: multisets ===
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math> as the number of <math>k</math>-multisets on an <math>n</math>-set. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. The analytic (generating function) proof thus gives the same value of <math>\left({n\choose k}\right)</math> as the combinatorial proof.

== Pólya's problem of changing money ==

== Catalan Number ==
We now introduce a class of counting problems that all have the same solution, the [http://en.wikipedia.org/wiki/Catalan_number '''Catalan numbers'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, taken from Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

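Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function of the sequence defined by this recurrence. Note that its index is shifted by one relative to the interpretations listed above: for example, it counts the full binary trees with <math>n</math> leaves, split at the root into subtrees with <math>i</math> and <math>n-i</math> leaves. By the product rule and <math>C_0=0</math>,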
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, since otherwise the constant term would not be zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n.
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
This proves the following closed form for the Catalan numbers.
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math>.
}}

Combinatorics (Fall 2010)/Generating functions

2010-09-11T06:00:28Z

210.28.131.82: /* Multisets */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'', as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>\sum_{i=1}^k x_i=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.
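
The domino problem can be explored by exhaustive placement for small boards. The following is a minimal sketch in Python; the function name and the board representation are illustrative choices. It always covers the first empty cell in column-by-column order, so each tiling is counted exactly once.

```python
def count_domino_tilings(n):
    """Count tilings of a 2 x n board by 2 x 1 dominoes via exhaustive placement."""
    board = [[False] * n for _ in range(2)]

    def fill():
        # locate the first empty cell, scanning column by column
        for col in range(n):
            for row in range(2):
                if board[row][col]:
                    continue
                total = 0
                # vertical domino covering both cells of this column
                if row == 0 and not board[1][col]:
                    board[0][col] = board[1][col] = True
                    total += fill()
                    board[0][col] = board[1][col] = False
                # horizontal domino covering this cell and its right neighbour
                if col + 1 < n and not board[row][col + 1]:
                    board[row][col] = board[row][col + 1] = True
                    total += fill()
                    board[row][col] = board[row][col + 1] = False
                return total
        return 1  # no empty cell left: one complete tiling

    return fill()

tilings = [count_domino_tilings(n) for n in range(8)]
print(tilings)  # [1, 1, 2, 3, 5, 8, 13, 21]
```

The counts obey the same recurrence as the compositions: a tiling either ends with one vertical domino or with two stacked horizontal ones.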

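In both problems the counts satisfy the recurrence <math>F_n=F_{n-1}+F_{n-2}</math>. With the initial values below (for which the two counts above equal <math>F_{n+1}</math>, since each problem has exactly one solution for <math>n=0</math> and for <math>n=1</math>), this recurrence defines the following sequence.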
:<math>F_n=\begin{cases}
0 & \mbox{if }n=0\\
1 & \mbox{if }n=1\\
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=
\sum_{n\ge 0}F_n x^n\\
&=
x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
For generating functions, there are general ways to generate <math>F_{n-1}</math> and <math>F_{n-2}</math>, or the coefficients with any smaller indices.
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series for this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
where, comparing coefficients on both sides of <math>x=\alpha(1-\hat{\phi}x)+\beta(1-\phi x)</math>, the constants <math>\alpha</math> and <math>\beta</math> satisfy
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>
Solving this gives <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. Hence
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.
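The derivation can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names <code>series_coefficients</code> and <code>binet</code> are hypothetical, and the expansion routine assumes the denominator has constant term 1. It expands <math>\frac{x}{1-x-x^2}</math> as a power series and compares the coefficients with the closed form.

```python
from math import sqrt

def series_coefficients(num, den, count):
    """First `count` power-series coefficients of num(x)/den(x).

    `num` and `den` are coefficient lists, lowest degree first.
    Assumption for this sketch: den[0] == 1, so no division is needed.
    """
    coeffs = []
    for n in range(count):
        # From den(x) * G(x) = num(x):
        #   g_n = num_n - sum_{k>=1} den_k * g_{n-k}
        total = num[n] if n < len(num) else 0
        for k in range(1, min(n, len(den) - 1) + 1):
            total -= den[k] * coeffs[n - k]
        coeffs.append(total)
    return coeffs

def binet(n):
    """Closed form (phi^n - phi_hat^n)/sqrt(5), rounded to the nearest integer."""
    phi = (1 + sqrt(5)) / 2
    phi_hat = (1 - sqrt(5)) / 2
    return round((phi ** n - phi_hat ** n) / sqrt(5))

# Coefficients of x / (1 - x - x^2)
fib = series_coefficients([0, 1], [1, -1, -1], 20)
print(fib[:10])
print(all(binet(n) == fib[n] for n in range(20)))
```

The closed form is evaluated in floating point here, so the comparison is only reliable for moderate <math>n</math>; the series expansion itself uses exact integer arithmetic.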

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we apply the following general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: And manipulate the right hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy; and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.
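As an illustration, some of these rules can be sketched on truncated coefficient lists. This is a hedged sketch, not part of the original notes: the function names <code>shift</code>, <code>product</code> and <code>derivative</code> are hypothetical, and a series is represented by a finite list of its first coefficients.

```python
def shift(g, k):
    """Coefficients of x^k * G(x), truncated to len(g) terms."""
    return ([0] * k + g)[:len(g)]

def product(f, g):
    """Coefficients of F(x) * G(x): sum_{k=0}^{n} f_k * g_{n-k}."""
    size = min(len(f), len(g))
    return [sum(f[k] * g[n - k] for k in range(n + 1)) for n in range(size)]

def derivative(g):
    """Coefficients of G'(x): the n-th entry is (n+1) * g_{n+1}."""
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]

# Check G(x) = x + x*G(x) + x^2*G(x) for the Fibonacci numbers,
# coefficient by coefficient on the first ten terms.
fib = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
x = [0, 1] + [0] * 8
rhs = [a + b + c for a, b, c in zip(x, shift(fib, 1), shift(fib, 2))]
print(rhs == fib)
```

Truncation is harmless here because the coefficient of <math>x^n</math> in each result depends only on coefficients of index at most <math>n</math> (or <math>n+1</math> for the derivative).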

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is expanding the closed-form generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}
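The generalized binomial coefficient is easy to compute exactly. The following is a minimal sketch under the assumption that <math>\alpha</math> is rational; the helper name <code>gen_binom</code> is hypothetical. As a sanity check it expands <math>(1+x)^{1/2}</math> by Newton's formula and squares the truncated series, which should give <math>1+x</math>.

```python
from fractions import Fraction

def gen_binom(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1)/n!."""
    result = Fraction(1)
    for i in range(n):
        result = result * (alpha - i) / (i + 1)
    return result

# First eight coefficients of (1+x)^(1/2) by Newton's formula.
half = Fraction(1, 2)
root = [gen_binom(half, n) for n in range(8)]

# Cauchy product of the series with itself; should be 1 + x + 0x^2 + ...
square = [sum(root[k] * root[n - k] for k in range(n + 1)) for n in range(8)]
print(square)
```

For a nonnegative integer <math>\alpha</math> the function reduces to the ordinary binomial coefficient, for example <code>gen_binom(5, 2)</code> equals 10.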

=== Example: multisets ===
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. The analytic (generating function) proof thus gives the same value of <math>\left({n\choose k}\right)</math> as the combinatorial proof.
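The identity can also be checked by expanding the power series directly. This is a small sketch, not from the original notes; the function name <code>multiset_counts</code> is hypothetical. It uses the fact that multiplying a series by <math>\frac{1}{1-x}=1+x+x^2+\cdots</math> replaces its coefficients by their prefix sums.

```python
from math import comb

def multiset_counts(n, max_k):
    """Coefficients of (1 + x + x^2 + ...)^n up to x^max_k, for n >= 1."""
    coeffs = [1] + [0] * max_k  # start from the constant series 1
    for _ in range(n):
        # Multiply by 1/(1-x): each coefficient becomes a prefix sum.
        running = 0
        for k in range(max_k + 1):
            running += coeffs[k]
            coeffs[k] = running
    return coeffs

for n in range(1, 6):
    counts = multiset_counts(n, 6)
    print(n, counts, all(counts[k] == comb(n + k - 1, k) for k in range(7)))
```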

== Pólya's problem of changing money ==

== Catalan Number ==
We now introduce a class of counting problems, all with the same solution, called [http://en.wikipedia.org/wiki/Catalan_number '''Catalan number'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, as listed on Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. Apply the product rule,
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, or otherwise the constant term is not zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n.
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
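This proves the following closed form. Note that with the convention <math>C_0=0</math>, <math>C_1=1</math> used in the recurrence, <math>C_n</math> is the <math>(n-1)</math>-th Catalan number in the standard indexing used in the list of interpretations above; for example, it counts the ways to completely parenthesize <math>n</math> factors.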
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math>.
}}
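The closed form can be compared against the recurrence for small <math>n</math>. The following is a minimal sketch with hypothetical function names, using the indexing of the recurrence above (<math>C_0=0</math>, <math>C_1=1</math>).

```python
from math import comb

def catalan_by_recurrence(limit):
    """List [C_0, ..., C_limit] with C_0 = 0, C_1 = 1 and
    C_n = sum_{i=1}^{n-1} C_i * C_{n-i} for n > 1 (assumes limit >= 1)."""
    c = [0, 1]
    for n in range(2, limit + 1):
        c.append(sum(c[i] * c[n - i] for i in range(1, n)))
    return c

def catalan_closed_form(n):
    """C_n = (1/n) * binom(2n-2, n-1), valid for n >= 1."""
    return comb(2 * n - 2, n - 1) // n

values = catalan_by_recurrence(10)
print(values)
print(all(values[n] == catalan_closed_form(n) for n in range(1, 11)))
```

The integer division is exact because <math>\frac{1}{n}{2n-2\choose n-1}</math> is always an integer.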

Combinatorics (Fall 2010)/Generating functions

2010-09-11T05:59:58Z

210.28.131.82: /* Binomial theorem */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes generating functions as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>\sum_{i=1}^k x_i=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.

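In both problems, the solution satisfies the same recurrence. Up to a shift of the index by one (both counts are <math>1,2,3,5,\ldots</math> for <math>n=1,2,3,4,\ldots</math>), it is the sequence <math>F_n</math> defined by the following recursion.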
:<math>F_n=\begin{cases}
0 & \mbox{if }n=0\\
1 & \mbox{if }n=1\\
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].
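The first counting problem can be checked by brute force for small <math>n</math>. This is a hedged sketch with hypothetical function names: it enumerates all tuples over <math>\{1,2\}</math> of length at most <math>n</math> and counts those summing to <math>n</math>. With the initial values <math>F_0=0</math>, <math>F_1=1</math>, the number of compositions of <math>n</math> comes out as <math>F_{n+1}</math>.

```python
from itertools import product

def count_compositions(n):
    """Number of ordered ways to write n as a sum of 1s and 2s (brute force)."""
    return sum(
        1
        for k in range(n + 1)
        for parts in product((1, 2), repeat=k)
        if sum(parts) == n
    )

def fibonacci(n):
    """F_n with F_0 = 0, F_1 = 1, computed from the recurrence."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(1, 11):
    print(n, count_compositions(n), fibonacci(n + 1))
```

The enumeration takes time exponential in <math>n</math>, so it is only meant as a check on small cases.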

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=
\sum_{n\ge 0}F_n x^n\\
&=
x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
For generating functions, there are general ways to generate <math>F_{n-1}</math> and <math>F_{n-2}</math>, or the coefficients with any smaller indices.
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series of this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
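where <math>\alpha</math> and <math>\beta</math> satisfy <math>\alpha(1-\hat{\phi}x)+\beta(1-\phi x)=x</math>. Comparing the constant terms and the coefficients of <math>x</math> gives
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>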
Solving this we have that <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. And
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we apply the following general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: And manipulate the right hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy; and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.

=== Expanding generating functions ===
The last step of solving recurrences by generating function is expanding the closed form generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}

=== Multisets ===
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. We use an analytic (generating function) proof to get the same result of <math>\left({n\choose k}\right)</math> as the combinatorial proof.

== Pólya's problem of changing money ==

== Catalan Number ==
We now introduce a class of counting problems, all with the same solution, called [http://en.wikipedia.org/wiki/Catalan_number '''Catalan number'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, as listed on Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. Apply the product rule,
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, or otherwise the constant term is not zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n.
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
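This proves the following closed form. Note that with the convention <math>C_0=0</math>, <math>C_1=1</math> used in the recurrence, <math>C_n</math> is the <math>(n-1)</math>-th Catalan number in the standard indexing used in the list of interpretations above; for example, it counts the ways to completely parenthesize <math>n</math> factors.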
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math>.
}}

Combinatorics (Fall 2010)/Generating functions

2010-09-11T05:56:20Z

210.28.131.82: /* Expanding generating functions */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes generating functions as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value. <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>\sum_{i=1}^k x_i=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.

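In both problems, the solution satisfies the same recurrence. Up to a shift of the index by one (both counts are <math>1,2,3,5,\ldots</math> for <math>n=1,2,3,4,\ldots</math>), it is the sequence <math>F_n</math> defined by the following recursion.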
:<math>F_n=\begin{cases}
0 & \mbox{if }n=0\\
1 & \mbox{if }n=1\\
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=
\sum_{n\ge 0}F_n x^n\\
&=
x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
For generating functions, there are general ways to generate <math>F_{n-1}</math> and <math>F_{n-2}</math>, or the coefficients with any smaller indices.
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series of this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
where <math>\alpha</math> and <math>\beta</math> satisfy
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>
Solving this system gives <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. Hence
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.
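The closed form can be checked numerically against the recurrence. The following is only a sketch for illustration: the function names are hypothetical, and floating-point arithmetic with rounding is assumed to be accurate enough for the small values of <math>n</math> tested.

```python
import math


def fib_recurrence(n):
    """F_n computed from F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """F_n from (phi^n - phi_hat^n) / sqrt(5), rounded to the nearest integer."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    phi_hat = (1 - sqrt5) / 2
    return round((phi ** n - phi_hat ** n) / sqrt5)


# The two computations agree for small n (floating point limits larger n).
for n in range(40):
    assert fib_recurrence(n) == fib_closed_form(n)
```

For large <math>n</math> the floating-point version loses precision, so the check is restricted to small indices.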

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we apply the following general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: Then manipulate the right-hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy; and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.
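Some of these rules can be tried out on truncated coefficient lists. The sketch below is illustrative only: the helper names <code>shift</code>, <code>product</code> and <code>derivative</code> are hypothetical, and a series is assumed to be represented by the list of its first few coefficients.

```python
def shift(g, k):
    """Coefficients of x^k * G(x), truncated to len(g) terms."""
    return ([0] * k + g)[:len(g)]


def product(f, g):
    """Coefficients of F(x) * G(x) (Cauchy product), truncated to the shorter length."""
    n = min(len(f), len(g))
    return [sum(f[k] * g[m - k] for k in range(m + 1)) for m in range(n)]


def derivative(g):
    """Coefficients of G'(x); one term shorter than the input."""
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]


# Fibonacci check: G(x) = x + x*G(x) + x^2*G(x), compared coefficient by coefficient.
fib = [0, 1, 1, 2, 3, 5, 8, 13]
x_term = [0, 1] + [0] * 6
rhs = [a + b + c for a, b, c in zip(x_term, shift(fib, 1), shift(fib, 2))]
assert rhs == fib
```

Truncation is harmless here because each rule determines the coefficient of <math>x^n</math> from coefficients of index at most <math>n+1</math>.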

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is expanding the closed-form generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

====Geometric sequence====
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.

====Binomial theorem====
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. The analytic (generating function) proof thus gives the same value of <math>\left({n\choose k}\right)</math> as the combinatorial proof.
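The identity can also be checked on truncated series. This is a minimal sketch with a hypothetical function name: it multiplies <math>n</math> copies of <math>1+x+x^2+\cdots</math>, keeps only the first few coefficients, and compares them with <math>{n+k-1\choose k}</math>.

```python
from math import comb


def geometric_power_coeffs(n, terms):
    """First `terms` coefficients of (1 + x + x^2 + ...)^n."""
    coeffs = [1] + [0] * (terms - 1)      # the constant series 1
    geometric = [1] * terms               # truncated 1 + x + x^2 + ...
    for _ in range(n):
        # Cauchy product with one more geometric factor, truncated.
        coeffs = [
            sum(coeffs[j] * geometric[k - j] for j in range(k + 1))
            for k in range(terms)
        ]
    return coeffs


# The coefficient of x^k should be the number of k-multisets on an n-set.
for n in range(1, 6):
    series = geometric_power_coeffs(n, 8)
    assert series == [comb(n + k - 1, k) for k in range(8)]
```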

== Pólya's problem of changing money ==

== Catalan Number ==
We now introduce a class of counting problems, all with the same solution, called the [http://en.wikipedia.org/wiki/Catalan_number '''Catalan numbers'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, as listed on Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

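Note that this recurrence shifts the index by one relative to the interpretations listed above: the <math>C_n</math> defined here counts, for instance, the full binary trees with <math>n</math> leaves. Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. By the product rule for generating functions,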
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, since otherwise the constant term would not be zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n.
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
This proves the following closed form for the Catalan numbers.
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math> for <math>n\ge 1</math>.
}}
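As a sanity check, the recurrence and the closed form can be compared for small <math>n</math>. The sketch below uses hypothetical function names and follows the indexing of the recurrence stated above (<math>C_0=0</math>, <math>C_1=1</math>).

```python
from math import comb


def catalan_by_recurrence(limit):
    """List [C_0, ..., C_limit] with C_0 = 0, C_1 = 1, C_n = sum_{i=1}^{n-1} C_i C_{n-i}."""
    c = [0, 1]
    for n in range(2, limit + 1):
        c.append(sum(c[i] * c[n - i] for i in range(1, n)))
    return c[:limit + 1]


def catalan_closed_form(n):
    """C_n = (1/n) * binom(2n-2, n-1), valid for n >= 1."""
    return comb(2 * n - 2, n - 1) // n


values = catalan_by_recurrence(12)
for n in range(1, 13):
    assert values[n] == catalan_closed_form(n)
```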

Combinatorics (Fall 2010)/Generating functions

2010-09-11T05:55:16Z

210.28.131.82: /* Pólya's problem of changing money */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value; <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>\sum_{i=1}^k x_i=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.
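The recurrence for the first problem can be confirmed by brute force for small <math>n</math>. This is only an illustrative sketch with a hypothetical function name; it enumerates all tuples over <math>\{1,2\}</math> of length at most <math>n</math>, so it is practical only for small inputs. Note that the brute-force counts for <math>n=0</math> and <math>n=1</math> are both 1.

```python
from itertools import product


def count_compositions(n):
    """Number of tuples (x_1, ..., x_k) with entries in {1, 2} and sum n."""
    return sum(
        1
        for k in range(n + 1)
        for parts in product((1, 2), repeat=k)
        if sum(parts) == n
    )


counts = [count_compositions(n) for n in range(12)]

# Each count is the sum of the two previous counts.
for n in range(2, 12):
    assert counts[n] == counts[n - 1] + counts[n - 2]
```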

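In both problems, the solution satisfies the same recurrence. Up to a shift of index (each count equals 1 for both <math>n=0</math> and <math>n=1</math>), it is given by <math>F_n</math>, which satisfies the following recursion.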
:<math>F_n=\begin{cases}
0 & \mbox{if }n=0\\
1 & \mbox{if }n=1\\
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2.
\end{cases}</math>

<math>F_n</math> is called the <math>n</math>th [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=
\sum_{n\ge 0}F_n x^n\\
&=
x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
For generating functions, there are general ways to generate <math>F_{n-1}</math> and <math>F_{n-2}</math>, or the coefficients with any smaller indices.
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series of this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
where <math>\alpha</math> and <math>\beta</math> satisfy
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>
Solving this system gives <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. Hence
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we apply the following general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: Then manipulate the right-hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.
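When step 3 yields a rational function, the expansion in step 4 can also be carried out mechanically by power series division. The following sketch is an illustration rather than part of the method above: the function name is hypothetical, polynomials are assumed to be given as coefficient lists with lowest degree first, and the denominator is assumed to have a nonzero constant term.

```python
from fractions import Fraction


def expand_rational(p, q, terms):
    """First `terms` coefficients of P(x)/Q(x), where q[0] != 0.

    Uses Q(x) * G(x) = P(x), i.e. sum_k q[k] * g[n-k] = p[n] for every n.
    """
    g = []
    for n in range(terms):
        p_n = p[n] if n < len(p) else 0
        # Contribution of the already known coefficients g[0..n-1].
        known = sum(q[k] * g[n - k] for k in range(1, min(n, len(q) - 1) + 1))
        g.append(Fraction(p_n - known, q[0]))
    return g


# x / (1 - x - x^2) expands to the Fibonacci numbers.
fib_coeffs = expand_rational([0, 1], [1, -1, -1], 8)
assert fib_coeffs == [0, 1, 1, 2, 3, 5, 8, 13]
```

This produces individual coefficients but not a closed form; obtaining a formula in <math>n</math> still requires an expansion such as the partial fraction argument used for the Fibonacci numbers.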

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy; and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is expanding the closed-form generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

;Geometric sequence
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.
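As a small worked example (the function is chosen here only for illustration and does not come from the lecture), take <math>G(x)=\frac{1}{(1-x)(1-2x)}</math>. Partial fractions give
:<math>G(x)=\frac{2}{1-2x}-\frac{1}{1-x}</math>,
so in the notation above <math>a_1=2</math>, <math>b_1=2</math>, <math>a_2=-1</math>, <math>b_2=1</math>, and the coefficient of <math>x^n</math> is <math>2\cdot 2^n-1=2^{n+1}-1</math>.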
;Binomial theorem
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}
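The generalized binomial coefficient is easy to compute exactly for rational <math>\alpha</math>. The sketch below uses hypothetical function names and exact fractions; as a check, it squares the truncated series for <math>(1+x)^{1/2}</math> and recovers <math>1+x</math>.

```python
from fractions import Fraction


def generalized_binomial(alpha, n):
    """alpha * (alpha - 1) * ... * (alpha - n + 1) / n! for rational alpha."""
    result = Fraction(1)
    for i in range(n):
        result *= Fraction(alpha) - i
        result /= i + 1
    return result


def newton_series(alpha, terms):
    """First `terms` coefficients of (1 + x)^alpha."""
    return [generalized_binomial(alpha, n) for n in range(terms)]


# Squaring the series for (1 + x)^(1/2) gives 1 + x, term by term.
half = newton_series(Fraction(1, 2), 6)
square = [sum(half[k] * half[n - k] for k in range(n + 1)) for n in range(6)]
assert square == [1, 1, 0, 0, 0, 0]
```

For a nonnegative integer <math>\alpha</math> the function returns the ordinary binomial coefficient, and it returns zero once <math>n>\alpha</math>.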
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. The analytic (generating function) proof thus gives the same value of <math>\left({n\choose k}\right)</math> as the combinatorial proof.

== Pólya's problem of changing money ==

== Catalan Number ==
We now introduce a class of counting problems, all with the same solution, called the [http://en.wikipedia.org/wiki/Catalan_number '''Catalan numbers'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, as listed on Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

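Note that this recurrence shifts the index by one relative to the interpretations listed above: the <math>C_n</math> defined here counts, for instance, the full binary trees with <math>n</math> leaves. Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. By the product rule for generating functions,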
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, since otherwise the constant term would not be zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n.
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
This proves the following closed form for the Catalan numbers.
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math> for <math>n\ge 1</math>.
}}

Combinatorics (Fall 2010)/Generating functions

2010-09-11T05:55:03Z

210.28.131.82: /* Solving recurrences */

== Generating Functions ==
In his magnificent book ''Enumerative Combinatorics'', Stanley describes the generating function as "the most useful but most difficult to understand method (for counting)".

The solution to a counting problem is usually represented as some <math>a_n</math> depending on a parameter <math>n</math>. Sometimes this <math>a_n</math> is called a ''counting function'' as it is a function of the parameter <math>n</math>. <math>a_n</math> can also be treated as an infinite sequence:
:<math>a_0,a_1,a_2,\ldots</math>

The '''ordinary generating function (OGF)''' defined by <math>a_n</math> is
:<math>
G(x)=\sum_{n\ge 0} a_nx^n.
</math>

So <math>G(x)=a_0+a_1x+a_2x^2+\cdots</math>. An expression in this form is called a [http://en.wikipedia.org/wiki/Formal_power_series '''formal power series'''], and <math>a_0,a_1,a_2,\ldots</math> is the sequence of '''coefficients'''.

Furthermore, the generating function can be expanded as
:<math>G(x)=(\underbrace{1+\cdots+1}_{a_0})+(\underbrace{x+\cdots+x}_{a_1})+(\underbrace{x^2+\cdots+x^2}_{a_2})+\cdots+(\underbrace{x^n+\cdots+x^n}_{a_n})+\cdots</math>
so it indeed "generates" all the possible instances of the objects we want to count.

Usually, we do not evaluate the generating function <math>G(x)</math> at any particular value; <math>x</math> remains a '''formal variable''' without assuming any value. The numbers that we want to count are the coefficients carried by the terms in the formal power series. So far the generating function is just another way to represent the sequence
:<math>(a_0,a_1,a_2,\ldots)</math>.

The true power of generating functions comes from the various algebraic operations that we can perform on these generating functions. We use an example to demonstrate this.

=== Fibonacci numbers ===
Consider the following counting problems.
* Count the number of ways that the nonnegative integer <math>n</math> can be written as a sum of ones and twos (in order).
: The problem asks for the number of compositions of <math>n</math> with summands from <math>\{1,2\}</math>. Formally, we are counting the number of tuples <math>(x_1,x_2,\ldots,x_k)</math> for some <math>k\le n</math> such that <math>x_i\in\{1,2\}</math> and <math>\sum_{i=1}^k x_i=n</math>.
: Let <math>F_n</math> be the solution. We observe that a composition either starts with a 1, in which case the rest is a composition of <math>n-1</math>; or starts with a 2, in which case the rest is a composition of <math>n-2</math>. So we have the recursion for <math>F_n</math> that
::<math>F_n=F_{n-1}+F_{n-2}</math>.
* Count the ways to completely cover a <math>2\times n</math> rectangle with <math>2\times 1</math> dominos without any overlaps.
: Dominos are identical <math>2\times 1</math> rectangles, so only their orientations (vertical or horizontal) matter.
: Let <math>F_n</math> be the solution. It also holds that <math>F_n=F_{n-1}+F_{n-2}</math>. The proof is left as an exercise.

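In both problems, the solution satisfies the same recurrence. Up to a shift of index (each count equals 1 for both <math>n=0</math> and <math>n=1</math>), it is given by <math>F_n</math>, which satisfies the following recursion.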
:<math>F_n=\begin{cases}
0 & \mbox{if }n=0\\
1 & \mbox{if }n=1\\
F_{n-1}+F_{n-2} & \mbox{if }n\ge 2.
\end{cases}</math>

<math>F_n</math> is called the [http://en.wikipedia.org/wiki/Fibonacci_number Fibonacci number].
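The first counting problem can be checked by brute force. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and it assumes that the empty sum counts as the single composition of 0, so that the counts start at 1, 1, 2, 3, 5.

```python
from itertools import product

def count_compositions(n):
    """Count ordered sums of 1s and 2s equal to n by brute force."""
    total = 0
    for k in range(n + 1):                      # k = number of summands
        for parts in product((1, 2), repeat=k):
            if sum(parts) == n:
                total += 1
    return total

def count_by_recurrence(n):
    """Same count via a_n = a_{n-1} + a_{n-2}, assuming a_0 = a_1 = 1."""
    a, b = 1, 1                                 # a_0, a_1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(10):
    assert count_compositions(n) == count_by_recurrence(n)
```

The brute-force enumeration is exponential in <math>n</math> and is only meant as a sanity check for small values.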

{{Theorem|Theorem|
::<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)</math>,
:where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.
}}
The quantity <math>\phi=\frac{1+\sqrt{5}}{2}</math> is the so-called [http://en.wikipedia.org/wiki/Golden_ratio golden ratio], a constant with some significance in mathematics and aesthetics.

We now prove this theorem by using generating functions.

The ordinary generating function for the Fibonacci number <math>F_{n}</math> is
:<math>G(x)=\sum_{n\ge 0}F_n x^n</math>.
We have that <math>F_{n}=F_{n-1}+F_{n-2}</math> for <math>n\ge 2</math>, thus
:<math>\begin{align}
G(x)
&=\sum_{n\ge 0}F_n x^n\\
&=x+\sum_{n\ge 2}(F_{n-1}+F_{n-2})x^n.
\end{align}
</math>
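Multiplying a generating function by a power of <math>x</math> shifts its coefficients, which lets us express the sums involving <math>F_{n-1}</math> and <math>F_{n-2}</math> (or coefficients with any smaller indices) in terms of <math>G(x)</math>. The last step in the first line below uses <math>F_0=0</math>.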
:<math>
\begin{align}
xG(x)
&=\sum_{n\ge 0}F_n x^{n+1}=\sum_{n\ge 1}F_{n-1} x^n=\sum_{n\ge 2}F_{n-1} x^n\\
x^2G(x)
&=\sum_{n\ge 0}F_n x^{n+2}=\sum_{n\ge 2}F_{n-2} x^n.
\end{align}
</math>
So we have
:<math>G(x)=x+(x+x^2)G(x)\,</math>,
hence
:<math>G(x)=\frac{x}{1-x-x^2}</math>.
The value of <math>F_n</math> is the coefficient of <math>x^n</math> in the Taylor series for this formula, which is <math>\frac{G^{(n)}(0)}{n!}=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>. Although this expansion works in principle, the detailed calculation is rather painful.

----
There is an easier way to get this coefficient than directly expanding the Taylor series.

<math>1-x-x^2</math> has two roots <math>\frac{-1\pm\sqrt{5}}{2}</math>.

Let <math>\phi=\frac{2}{-1+\sqrt{5}}=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{2}{-1-\sqrt{5}}=\frac{1-\sqrt{5}}{2}</math> be the reciprocals of these roots. Then <math>(1-x-x^2)=(1-\phi x)(1-\hat{\phi}x)</math>, so we can write
:<math>
\begin{align}
\frac{x}{1-x-x^2}
&=\frac{x}{(1-\phi x)(1-\hat{\phi} x)}\\
&=\frac{\alpha}{(1-\phi x)}+\frac{\beta}{(1-\hat{\phi} x)},
\end{align}
</math>
where <math>\alpha</math> and <math>\beta</math> satisfy <math>x=\alpha(1-\hat{\phi}x)+\beta(1-\phi x)</math>, that is,
:<math>\begin{cases}
\alpha+\beta=0\\
\alpha\hat{\phi}+\beta\phi= -1.
\end{cases}</math>
Solving this we have that <math>\alpha=\frac{1}{\sqrt{5}}</math> and <math>\beta=-\frac{1}{\sqrt{5}}</math>. And
:<math>G(x)=\frac{x}{1-x-x^2}=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}</math>
where <math>\phi=\frac{1+\sqrt{5}}{2}</math> and <math>\hat{\phi}=\frac{1-\sqrt{5}}{2}</math>.

Note that the expression <math>\frac{1}{1-z}</math> has a well known expansion:
:<math>\frac{1}{1-z}=\sum_{n\ge 0}z^n</math>.

Therefore, <math>G(x)</math> can be expanded as
:<math>
\begin{align}
G(x)
&=\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\phi x}-\frac{1}{\sqrt{5}}\cdot\frac{1}{1-\hat{\phi} x}\\
&=\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\phi x)^n-\frac{1}{\sqrt{5}}\sum_{n\ge 0}(\hat{\phi} x)^n\\
&=\sum_{n\ge 0}\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)x^n.
\end{align}</math>
So the <math>n</math>th Fibonacci number is given by
:<math>F_n=\frac{1}{\sqrt{5}}\left(\phi^n-\hat{\phi}^n\right)=\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n-\frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n</math>.
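The derivation can be checked numerically. The sketch below is an illustration rather than part of the original notes: the helper <code>series_coeffs</code> is a hypothetical name, and it extracts power-series coefficients of a rational function by comparing coefficients in <math>\mathrm{den}(x)\cdot G(x)=\mathrm{num}(x)</math>. It then compares the coefficients of <math>\frac{x}{1-x-x^2}</math> with the recurrence and with the closed form, using floating-point arithmetic for the latter.

```python
from fractions import Fraction

def series_coeffs(num, den, terms):
    """First `terms` power-series coefficients of num(x)/den(x).

    num and den are coefficient lists (constant term first); den[0] must be nonzero.
    """
    coeffs = []
    for n in range(terms):
        # Compare the coefficient of x^n on both sides of den(x) * G(x) = num(x).
        acc = Fraction(num[n]) if n < len(num) else Fraction(0)
        for j in range(1, min(n, len(den) - 1) + 1):
            acc -= den[j] * coeffs[n - j]
        coeffs.append(acc / den[0])
    return coeffs

def fib(n):
    """F_n from the recurrence with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def closed_form(n):
    """The closed form (phi^n - phi_hat^n)/sqrt(5), rounded to the nearest integer."""
    phi = (1 + 5 ** 0.5) / 2
    phi_hat = (1 - 5 ** 0.5) / 2
    return round((phi ** n - phi_hat ** n) / 5 ** 0.5)

g = series_coeffs([0, 1], [1, -1, -1], 20)      # x / (1 - x - x^2)
assert all(g[n] == fib(n) == closed_form(n) for n in range(20))
```

The rounding in <code>closed_form</code> only compensates for floating-point error; the formula itself is exact.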

== Solving recurrences ==
In the above analysis of Fibonacci numbers, we apply the following general methodology of solving recurrences by generating functions.
:1. Give a recursion that computes <math>a_n</math>; that is, an equation expressing <math>a_n</math> in terms of other elements of the sequence, such as
::<math>a_n=f(a_0,a_1,\ldots,a_{n-1})</math> for some function <math>f</math>.
:2. Multiply both sides of the equation by <math>x^n</math> and sum over all <math>n</math>. This gives the generating function
::<math>G(x)=\sum_{n\ge 0}a_nx^n=\sum_{n\ge 0}f(a_0,a_1,\ldots,a_{n-1})x^n</math>.
:: Then manipulate the right hand side of the equation so that it becomes some other expression involving <math>G(x)</math>.
:3. Solve the resulting equation to derive an explicit formula for <math>G(x)</math>.
:4. Expand <math>G(x)</math> into a power series and read off the coefficient of <math>x^n</math>, which is a closed form for <math>a_n</math>.

=== Algebraic operations on generating functions ===
The second step in the above methodology is somewhat tricky. It involves first applying the recurrence to the coefficients of <math>G(x)</math>, which is easy; and then manipulating the resulting formal power series to express it in terms of <math>G(x)</math>, which is more difficult (because it works backwards).

We can apply several natural algebraic operations on the formal power series.

{{Theorem|Generating function manipulation|
:Let <math>G(x)=\sum_{n\ge 0}g_nx^n</math> and <math>F(x)=\sum_{n\ge 0}f_nx^n</math>.
----

::<math>
\begin{align}
x^k G(x)
&= \sum_{n\ge k}g_{n-k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\frac{G(x)-\sum_{i=0}^{k-1}g_ix^i}{x^k}
&=\sum_{n\ge 0}g_{n+k}x^n, &\qquad (\mbox{integer }k\ge 0)\\
\alpha F(x)+\beta G(x)
&= \sum_{n\ge 0} (\alpha f_n+\beta g_n)x^n\\
F(x)G(x)
&= \sum_{n\ge 0}\sum_{k=0}^nf_kg_{n-k}x^n\\
G(cx)
&= \sum_{n\ge 0} c^ng_n x^n\\
G'(x)
&=
\sum_{n\ge 0}(n+1)g_{n+1}x^n
\end{align}
</math>
}}

When manipulating generating functions, these rules are applied backwards; that is, from the right-hand-side to the left-hand-side.
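Some of these rules can be tried out on truncated coefficient lists. The following is a minimal sketch under the assumption that a power series is represented by the list of its first few coefficients; the function names are illustrative only.

```python
def shift(g, k):
    """Coefficients of x^k * G(x), truncated to len(g) terms."""
    return ([0] * k + g)[:len(g)]

def multiply(f, g):
    """Coefficients of F(x) * G(x): the n-th one is sum_k f_k * g_{n-k}."""
    terms = min(len(f), len(g))
    return [sum(f[k] * g[n - k] for k in range(n + 1)) for n in range(terms)]

def derivative(g):
    """Coefficients of G'(x): the n-th one is (n+1) * g_{n+1}."""
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]

def scale_argument(g, c):
    """Coefficients of G(cx): the n-th one is c^n * g_n."""
    return [c ** n * gn for n, gn in enumerate(g)]

ones = [1] * 8                       # 1/(1-x) = 1 + x + x^2 + ...
# Both 1/(1-x)^2 and the derivative of 1/(1-x) have coefficients 1, 2, 3, ...
assert multiply(ones, ones)[:7] == derivative(ones)
```

Truncation is harmless here because the <math>n</math>-th coefficient of each result depends only on coefficients of index at most <math>n+1</math> of the inputs.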

=== Expanding generating functions ===
The last step of solving recurrences by generating functions is expanding the closed form of the generating function <math>G(x)</math> to evaluate its <math>n</math>-th coefficient. In principle, we can always use the [http://en.wikipedia.org/wiki/Taylor_series Taylor series]
:<math>G(x)=\sum_{n\ge 0}\frac{G^{(n)}(0)}{n!}x^n</math>,
where <math>G^{(n)}(0)</math> is the value of the <math>n</math>-th derivative of <math>G(x)</math> evaluated at <math>x=0</math>.

Some interesting special cases are very useful.

;Geometric series
In the example of Fibonacci numbers, we use the well known geometric series:
:<math>\frac{1}{1-x}=\sum_{n\ge 0}x^n</math>.
It is useful when we can express the generating function in the form of <math>G(x)=\frac{a_1}{1-b_1x}+\frac{a_2}{1-b_2x}+\cdots+\frac{a_k}{1-b_kx}</math>. The coefficient of <math>x^n</math> in such <math>G(x)</math> is <math>a_1b_1^n+a_2b_2^n+\cdots+a_kb_k^n</math>.
;Binomial theorem
The <math>n</math>-th derivative of <math>(1+x)^\alpha</math> for some real <math>\alpha</math> is
:<math>\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)(1+x)^{\alpha-n}</math>.
By Taylor series, we get a generalized version of the binomial theorem known as [http://en.wikipedia.org/wiki/Binomial_coefficient#Newton.27s_binomial_series '''Newton's formula''']:
{{Theorem|Newton's formula (generalized binomial theorem)|
If <math>|x|<1</math>, then
:<math>(1+x)^\alpha=\sum_{n\ge 0}{\alpha\choose n}x^{n}</math>,
where <math>{\alpha\choose n}</math> is the '''generalized binomial coefficient''' defined by
:<math>{\alpha\choose n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}</math>.
}}
In the last lecture we gave a combinatorial proof of the number of <math>k</math>-multisets on an <math>n</math>-set. Now we give a generating function approach to the problem.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set. We have
:<math>(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots)=\sum_{m:S\rightarrow\mathbb{N}} \prod_{x_i\in S}x_i^{m(x_i)}</math>,
where each <math>m:S\rightarrow\mathbb{N}</math> specifies a possible multiset on <math>S</math> with multiplicity function <math>m</math>.

Let all <math>x_i=x</math>. Then
:<math>
\begin{align}
(1+x+x^2+\cdots)^n
&=
\sum_{m:S\rightarrow\mathbb{N}}x^{m(x_1)+\cdots+m(x_n)}\\
&=
\sum_{\text{multiset }M\text{ on }S}x^{|M|}\\
&=
\sum_{k\ge 0}\left({n\choose k}\right)x^k.
\end{align}
</math>
The last equation is due to the definition of <math>\left({n\choose k}\right)</math>. Our task is to evaluate <math>\left({n\choose k}\right)</math>.

By the geometric series and Newton's formula,
:<math>
(1+x+x^2+\cdots)^n=(1-x)^{-n}=\sum_{k\ge 0}{-n\choose k}(-x)^k.
</math>
So
:<math>
\left({n\choose k}\right)=(-1)^k{-n\choose k}={n+k-1\choose k}.
</math>
The last equation is due to the definition of the generalized binomial coefficient. We use an analytic (generating function) proof to get the same result of <math>\left({n\choose k}\right)</math> as the combinatorial proof.
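As a sanity check, the coefficients of <math>(1+x+x^2+\cdots)^n</math> can be computed directly and compared with <math>{n+k-1\choose k}</math>. The sketch below is illustrative; it relies on the observation that multiplying a series by <math>\frac{1}{1-x}</math> replaces its coefficients by their prefix sums, and the function name is hypothetical.

```python
from math import comb

def power_of_geometric(n, terms):
    """Coefficients of (1 + x + x^2 + ...)^n up to x^(terms-1)."""
    coeffs = [1] + [0] * (terms - 1)            # the constant series 1
    for _ in range(n):
        # Multiplying by 1/(1-x) turns the coefficients into prefix sums.
        running = 0
        for k in range(terms):
            running += coeffs[k]
            coeffs[k] = running
    return coeffs

n, terms = 4, 10
assert power_of_geometric(n, terms) == [comb(n + k - 1, k) for k in range(terms)]
```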

=== Pólya's problem of changing money ===

== Catalan Number ==
We now introduce a class of counting problems, all with the same solution, called [http://en.wikipedia.org/wiki/Catalan_number '''Catalan number'''].

The <math>n</math>th Catalan number is denoted as <math>C_n</math>.
In Volume 2 of Stanley's ''Enumerative Combinatorics'', a set of exercises describes 66 different interpretations of the Catalan numbers. We give a few examples, as listed on Wikipedia.
* ''C''<sub>''n''</sub> is the number of '''Dyck words''' of length 2''n''. A Dyck word is a string consisting of ''n'' X's and ''n'' Y's such that no initial segment of the string has more Y's than X's (see also [http://en.wikipedia.org/wiki/Dyck_language Dyck language]). For example, the following are the Dyck words of length 6:
<div class="center"><big> XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.</big></div>

* Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, ''C''<sub>''n''</sub> counts the number of expressions containing ''n'' pairs of parentheses which are correctly matched:
<div class="center"><big> ((()))     ()(())     ()()()     (())()     (()()) </big></div>

* ''C''<sub>''n''</sub> is the number of different ways ''n'' + 1 factors can be completely parenthesized (or the number of ways of associating ''n'' applications of a '''binary operator'''). For ''n'' = 3, for example, we have the following five different parenthesizations of four factors:
<div class="center"><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))</math></div>

* Successive applications of a binary operator can be represented in terms of a '''full binary tree'''. (A rooted binary tree is ''full'' if every vertex has either two children or no children.) It follows that ''C''<sub>''n''</sub> is the number of full binary trees with ''n'' + 1 leaves:
[[Image:Catalan number binary tree example.png|center]]

* ''C''<sub>''n''</sub> is the number of '''monotonic paths''' along the edges of a grid with ''n'' × ''n'' square cells, which do not pass above the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan number 4x4 grid example.svg.png|450px|center]]

* ''C''<sub>''n''</sub> is the number of different ways a [http://en.wikipedia.org/wiki/Convex_polygon '''convex polygon'''] with ''n'' + 2 sides can be cut into '''triangles''' by connecting vertices with straight lines. The following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.png|400px|center]]

* ''C''<sub>''n''</sub> is the number of [http://en.wikipedia.org/wiki/Stack_(data_structure) '''stack''']-sortable permutations of {1, ..., ''n''}. A permutation ''w'' is called '''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where ''S''(''w'') is defined recursively as follows: write ''w'' = ''unv'' where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are shorter sequences, and set ''S''(''w'') = ''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-element sequences.

* ''C''<sub>''n''</sub> is the number of ways to tile a stairstep shape of height ''n'' with ''n'' rectangles. The following figure illustrates the case ''n'' = 4:
[[Image:Catalan stairsteps 4.png|400px|center]]

{{Theorem|Recurrence relation for Catalan numbers|
:<math>C_0=0</math>, <math>C_1=1</math>, and for <math>n>1</math>,
::<math>
C_n=\sum_{i=1}^{n-1}C_iC_{n-i}.
</math>
}}

Let <math>G(x)=\sum_{n\ge 0}C_nx^n</math> be the generating function. Apply the product rule,
:<math>G(x)^2=\sum_{n\ge 0}\sum_{k=0}^{n}C_kC_{n-k}x^n=\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n</math>.
Due to the recurrence,
:<math>G(x)=\sum_{n\ge 0}C_nx^n=x+\sum_{n\ge 2}\sum_{k=1}^{n-1}C_kC_{n-k}x^n=x+G(x)^2</math>.
Solving this, we obtain
:<math>G(x)=\frac{1\pm(1-4x)^{1/2}}{2}</math>.
Because <math>C_0=0</math>, it must hold that <math>G(x)=\frac{1-(1-4x)^{1/2}}{2}</math>, or otherwise the constant term is not zero. Expanding <math>(1-4x)^{1/2}</math> by Newton's formula, we have
:<math>
\begin{align}
G(x)
&=
\frac{1-(1-4x)^{1/2}}{2}\\
&=
\frac{1}{2}-\frac{1}{2}\sum_{n\ge 0}{1/2\choose n}(-4x)^n
\end{align}
</math>
Thus, for <math>n\ge 1</math>,
:<math>
\begin{align}
C_n
&=-\frac{1}{2}{1/2\choose n}(-4)^n\\
&=-\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{-(2n-3)}{2}\cdot(-4)^n/n!\\
&=\frac{(2n-2)!}{(n-1)!n!}\\
&=\frac{1}{n}{2n-2\choose n-1}.
\end{align}
</math>
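So we have proved the following closed form for the Catalan numbers. Note that the recurrence above uses an indexing shifted by one relative to the interpretations listed earlier: with <math>C_0=0</math> and <math>C_1=1</math>, the sequence begins <math>0,1,1,2,5,14,\ldots</math>, so for example the number of Dyck words of length <math>2n</math> is <math>C_{n+1}</math> in this indexing.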
{{Theorem|Theorem|
:<math>C_n=\frac{1}{n}{2n-2\choose n-1}</math>.
}}
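The closed form can be compared with the recurrence for small <math>n</math>. The following sketch uses the indexing of the recurrence stated above (<math>C_0=0</math>, <math>C_1=1</math>); the function names are illustrative.

```python
from math import comb

def catalan_by_recurrence(n_max):
    """C_0..C_{n_max} with C_0 = 0, C_1 = 1, C_n = sum_{i=1}^{n-1} C_i * C_{n-i}."""
    c = [0, 1]
    for n in range(2, n_max + 1):
        c.append(sum(c[i] * c[n - i] for i in range(1, n)))
    return c[:n_max + 1]

def catalan_closed_form(n):
    """(1/n) * binom(2n-2, n-1), valid for n >= 1."""
    return comb(2 * n - 2, n - 1) // n

c = catalan_by_recurrence(12)
assert all(c[n] == catalan_closed_form(n) for n in range(1, 13))
```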

Combinatorics (Fall 2010)/Partitions, sieve methods

2010-09-10T01:11:38Z

210.28.131.82: /* Counting p_k(n) */

== Partitions ==
We count the ways of partitioning <math>n</math> ''identical'' objects into <math>k</math> ''unordered'' groups. This is the problem of counting the ways of partitioning a number <math>n</math> into <math>k</math> unordered parts.

A '''<math>k</math>-partition''' of a number <math>n</math> is a multiset <math>\{x_1,x_2,\ldots,x_k\}</math> with <math>x_i\ge 1</math> for every element <math>x_i</math> and <math>x_1+x_2+\cdots+x_k=n</math>.

We define <math>p_k(n)</math> as the number of <math>k</math>-partitions of <math>n</math>.

For example, the number 7 has the following partitions:
<div class="center"><math>
\begin{align}
&\{7\}
& p_1(7)=1\\
&\{1,6\},\{2,5\},\{3,4\}
& p_2(7)=3\\
&\{1,1,5\}, \{1,2,4\}, \{1,3,3\}, \{2,2,3\}
& p_3(7)=4\\
&\{1,1,1,4\},\{1,1,2,3\}, \{1,2,2,2\}
& p_4(7)=3\\
&\{1,1,1,1,3\},\{1,1,1,2,2\}
& p_5(7)=2\\
&\{1,1,1,1,1,2\}
& p_6(7)=1\\
&\{1,1,1,1,1,1,1\}
& p_7(7)=1
\end{align}
</math></div>

Equivalently, we can define a <math>k</math>-partition of a number <math>n</math> as a <math>k</math>-tuple <math>(x_1,x_2,\ldots,x_k)</math> with:
* <math>x_1\ge x_2\ge\cdots\ge x_k\ge 1</math>;
* <math>x_1+x_2+\cdots+x_k=n</math>.

Then <math>p_k(n)</math> is the number of integral solutions to the above system.

Let <math>p(n)=\sum_{k=1}^n p_k(n)</math> be the total number of partitions of <math>n</math>. The function <math>p(n)</math> is called the '''partition number'''.

=== Counting <math>p_k(n)</math>===

{{Theorem|Proposition|
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Note that it must hold that
:<math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.
There are two cases: <math>x_k=1</math> or <math>x_k>1</math>.
;Case 1.
:If <math>x_k=1</math>, then <math>(x_1,\cdots,x_{k-1})</math> is a distinct <math>(k-1)</math>-partition of <math>n-1</math>. And every <math>(k-1)</math>-partition of <math>n-1</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k-1}(n-1)</math>.
;Case 2.
:If <math>x_k>1</math>, then <math>(x_1-1,\cdots,x_{k}-1)</math> is a distinct <math>k</math>-partition of <math>n-k</math>. And every <math>k</math>-partition of <math>n-k</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k}(n-k)</math>.
In conclusion, the number of <math>k</math>-partitions of <math>n</math> is <math>p_{k-1}(n-1)+p_k(n-k)</math>, i.e.
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}
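The recurrence translates directly into a memoized function. This is a minimal sketch; the base cases (<math>p_0(0)=1</math>, and <math>p_k(n)=0</math> when <math>n<k</math>) are assumptions added here so that the recursion terminates, and they are not stated in the proposition itself.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(k, n):
    """Number of partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0     # assumed base case: only the empty partition of 0
    if n < k:
        return 0                      # k positive parts need a total of at least k
    return p(k - 1, n - 1) + p(k, n - k)

# Reproduces the table for the number 7 given earlier.
assert [p(k, 7) for k in range(1, 8)] == [1, 3, 4, 3, 2, 1, 1]
```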

{{Theorem|Theorem|
For any fixed <math>k</math>,
:<math>p_k(n)\sim\frac{n^{k-1}}{k!(k-1)!}</math>,
as <math>n\rightarrow \infty</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Then <math>x_1+x_2+\cdots+x_k=n</math> and <math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.

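Every <math>k</math>-composition of <math>n</math> (an ''ordered'' sum of <math>k</math> positive integers) is a permutation of some <math>k</math>-partition of <math>n</math>, and the <math>k!</math> permutations of a <math>k</math>-partition <math>(x_1,\ldots,x_k)</math> yield at most <math>k!</math> distinct <math>k</math>-compositions. Since there are <math>{n-1\choose k-1}</math> many <math>k</math>-compositions of <math>n</math>, we have
:<math>k!p_k(n)\ge{n-1\choose k-1}</math>.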

Let <math>y_i=x_i+k-i</math>. That is, <math>y_k=x_k, y_{k-1}=x_{k-1}+1, y_{k-2}=x_{k-2}+2,\ldots, y_{1}=x_1+k-1</math>. Then, it holds that
* <math>y_1>y_2>\cdots>y_k\ge 1</math>; and
* <math>y_1+y_2+\cdots+y_k=n+\frac{k(k-1)}{2}</math>.
Each permutation of <math>(y_1,y_2,\ldots,y_k)</math> yields a '''distinct''' <math>k</math>-composition of <math>n+\frac{k(k-1)}{2}</math>, because all <math>y_i</math> are distinct.
Thus,
:<math>k!p_k(n)\le {n+\frac{k(k-1)}{2}-1\choose k-1}</math>.

Combining the two inequalities, we have
:<math>\frac{{n-1\choose k-1}}{k!}\le p_k(n)\le \frac{{n+\frac{k(k-1)}{2}-1\choose k-1}}{k!}</math>.
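For fixed <math>k</math>, both bounds are asymptotically <math>\frac{n^{k-1}}{k!(k-1)!}</math> as <math>n\rightarrow\infty</math>, so the theorem follows.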
}}
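The two inequalities used in the proof can be checked numerically for small <math>k</math>. The sketch below fills a table of <math>p_k(n)</math> iteratively from the recurrence <math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)</math>, assuming <math>p_0(0)=1</math> as a base case; the function name is hypothetical.

```python
from math import comb, factorial

def partition_table(k_max, n_max):
    """table[k][n] = p_k(n), filled by p_k(n) = p_{k-1}(n-1) + p_k(n-k)."""
    table = [[0] * (n_max + 1) for _ in range(k_max + 1)]
    table[0][0] = 1                   # assumed base case: the empty partition of 0
    for k in range(1, k_max + 1):
        for n in range(k, n_max + 1):
            table[k][n] = table[k - 1][n - 1] + table[k][n - k]
    return table

table = partition_table(4, 200)
for k in range(1, 5):
    for n in range(k, 201):
        lower = comb(n - 1, k - 1)
        upper = comb(n + k * (k - 1) // 2 - 1, k - 1)
        assert lower <= factorial(k) * table[k][n] <= upper
```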

=== Ferrers diagram ===
:{|border="0"
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]
|}
:{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}

<div class="center">
{|border="0"
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|<math>(6,4,4,2,1)</math>||
|align=center|conjugate: <math>(5,4,3,3,1,1)</math>
|}
</div>

{{Theorem|Proposition|
# The number of partitions of <math>n</math> which have largest summand <math>k</math>, is <math>p_k(n)</math>.
# The number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts. Formally,
::<math>p_k(n)=\sum_{j=1}^k p_j(n-k)</math>.
}}
{{Proof|
# For every <math>k</math>-partition, the conjugate partition has largest part <math>k</math>. And vice versa.
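# Given a <math>k</math>-partition of <math>n</math>, subtract 1 from every part and discard the parts that become 0. The result is a partition of <math>n-k</math> into at most <math>k</math> parts. Conversely, padding a partition of <math>n-k</math> into at most <math>k</math> parts with zeros up to length <math>k</math> and adding 1 to every part recovers a <math>k</math>-partition of <math>n</math>.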
}}
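The conjugate of a partition can be read off column by column from its Ferrers diagram. The following is a small sketch, assuming a partition is given as a non-increasing list of positive parts; the function name is illustrative.

```python
def conjugate(partition):
    """Conjugate of a partition given as a non-increasing list of positive parts."""
    if not partition:
        return []
    # Column j of the Ferrers diagram has one cell for each row longer than j.
    return [sum(1 for part in partition if part > j) for j in range(partition[0])]

example = [6, 4, 4, 2, 1]
assert conjugate(example) == [5, 4, 3, 3, 1, 1]
assert conjugate(conjugate(example)) == example
# The largest part of the conjugate equals the number of parts of the original.
assert conjugate(example)[0] == len(example)
```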

We now consider partitions of <math>n</math> into '''distinct''' parts, that is, tuples <math>(x_1,x_2,\ldots,x_k)</math> with <math>x_1>x_2>\cdots>x_k\ge 1</math> and <math>x_1+\cdots+x_k=n</math>.

Define <math>q_e(n)</math> to be the number of partitions of <math>n</math> into an ''even'' number of ''distinct'' parts, and <math>q_o(n)</math> to be the number of partitions of <math>n</math> into an ''odd'' number of ''distinct'' parts.

{{Theorem|Theorem (Euler's Pentagonal Number Theorem)|
:<math>q_e(n)-q_o(n)=\begin{cases}
(-1)^k & \mbox{if }n=\frac{k(3k\pm 1)}{2},\\
0 & \mbox{otherwise.}
\end{cases}</math>
}}
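The theorem can be verified for small <math>n\ge 1</math> by enumerating partitions into distinct parts. The sketch below is illustrative: <code>signed_count</code> and <code>pentagonal_value</code> are hypothetical names, and the enumeration is exponential, so it is only suitable for small inputs.

```python
def signed_count(n, max_part):
    """Sum of (-1)^(number of parts) over partitions of n into distinct parts <= max_part.

    For max_part = n this equals q_e(n) - q_o(n).
    """
    if n == 0:
        return 1                      # the empty partition has an even number of parts
    total = 0
    for part in range(min(n, max_part), 0, -1):
        # Choosing `part` as the largest part adds one part, which flips the sign.
        total -= signed_count(n - part, part - 1)
    return total

def pentagonal_value(n):
    """Right-hand side of the theorem for n >= 1."""
    k = 1
    while k * (3 * k - 1) // 2 <= n:
        if n in (k * (3 * k - 1) // 2, k * (3 * k + 1) // 2):
            return (-1) ** k
        k += 1
    return 0

assert all(signed_count(n, n) == pentagonal_value(n) for n in range(1, 31))
```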

== Principle of Inclusion-Exclusion ==
Let <math>A</math> and <math>B</math> be two finite sets. The cardinality of their union is
:<math>|A\cup B|=|A|+|B|-{\color{Blue}|A\cap B|}</math>.
For three sets <math>A</math>, <math>B</math>, and <math>C</math>, the cardinality of the union of these three sets is computed as
:<math>|A\cup B\cup C|=|A|+|B|+|C|-{\color{Blue}|A\cap B|}-{\color{Blue}|A\cap C|}-{\color{Blue}|B\cap C|}+{\color{Red}|A\cap B\cap C|}</math>.
This is illustrated by the following figure.
::[[Image:Inclusion-exclusion.png|200px|border|center]]

Generally, the '''Principle of Inclusion-Exclusion''' states the rule for computing the cardinality of the union of <math>n</math> finite sets <math>A_1,A_2,\ldots,A_n</math>:
{{Equation|
<math>
\begin{align}
\left|\bigcup_{i=1}^nA_i\right|
&=
\sum_{\emptyset\neq I\subseteq\{1,\ldots,n\}}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
-----

In combinatorial enumeration, the Principle of Inclusion-Exclusion is usually applied in its complement form.

Let <math>A_1,A_2,\ldots,A_n\subseteq U</math> be subsets of some finite set <math>U</math>. Here <math>U</math> is some universe of combinatorial objects, whose cardinality is easy to calculate (e.g. all strings, tuples, permutations), and each <math>A_i</math> contains the objects with some specific property (e.g. a "pattern") which we want to avoid. The problem is to count the number of objects without any of the <math>n</math> properties. We write <math>\bar{A_i}=U-A_i</math>. The number of objects without any of the properties <math>A_1,A_2,\ldots,A_n</math> is
{{Equation|
<math>
\begin{align}
\left|\bar{A_1}\cap\bar{A_2}\cap\cdots\cap\bar{A_n}\right|=\left|U-\bigcup_{i=1}^nA_i\right|
&=
|U|-\sum_{\emptyset\neq I\subseteq\{1,\ldots,n\}}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
For an <math>I\subseteq\{1,2,\ldots,n\}</math>, we denote
:<math>A_I=\bigcap_{i\in I}A_i</math>
with the convention that <math>A_\emptyset=U</math>. The above equation is stated as:
{{Theorem|Principle of Inclusion-Exclusion|
:Let <math>A_1,A_2,\ldots,A_n</math> be a family of subsets of <math>U</math>. Then the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> is
::<math>\sum_{I\subseteq\{1,\ldots, n\}}(-1)^{|I|}|A_I|</math>.
}}

Let <math>S_k=\sum_{|I|=k}|A_I|\,</math>. Conventionally, <math>S_0=|A_\emptyset|=|U|</math>. By the principle of inclusion-exclusion, the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> can be expressed as
{{Equation|<math>
S_0-S_1+S_2-\cdots+(-1)^nS_n.
</math>
}}
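The alternating sum <math>S_0-S_1+S_2-\cdots</math> can be compared with a direct count on a small example. The following sketch uses randomly chosen subsets of a 30-element universe; the function name and the test data are illustrative assumptions.

```python
from itertools import combinations
import random

def count_avoiding_all(universe, sets):
    """Number of elements of the universe in none of the sets, via S_0 - S_1 + S_2 - ..."""
    total = 0
    for k in range(len(sets) + 1):
        s_k = 0
        for chosen in combinations(sets, k):
            inter = set(universe)     # the empty intersection is the whole universe
            for a in chosen:
                inter &= a
            s_k += len(inter)
        total += (-1) ** k * s_k
    return total

random.seed(0)
universe = set(range(30))
sets = [set(random.sample(sorted(universe), 10)) for _ in range(4)]
direct = len(universe - set().union(*sets))
assert count_avoiding_all(universe, sets) == direct
```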

=== Surjections ===
In the twelvefold way, we discussed the counting problems arising from mappings <math>f:N\rightarrow M</math>. The basic case is when the elements of both <math>N</math> and <math>M</math> are distinguishable. In this case, it is easy to count the arbitrary mappings (there are <math>m^n</math>) and the injective (one-to-one) mappings (there are <math>(m)_n</math>), but counting the surjective mappings is harder. Here we apply the principle of inclusion-exclusion to count the surjective (onto) mappings.
{{Theorem|Theorem|
:The number of surjective mappings from an <math>n</math>-set to an <math>m</math>-set is given by
::<math>\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
{{Proof|
Let <math>U=\{f:[n]\rightarrow[m]\}</math> be the set of mappings from <math>[n]</math> to <math>[m]</math>. Then <math>|U|=m^n</math>.

For <math>i\in[m]</math>, let <math>A_i</math> be the set of mappings <math>f:[n]\rightarrow[m]</math> under which no <math>j\in[n]</math> is mapped to <math>i</math>, i.e. <math>A_i=\{f:[n]\rightarrow[m]\setminus\{i\}\}</math>, thus <math>|A_i|=(m-1)^n</math>.

More generally, for <math>I\subseteq [m]</math>, <math>A_I=\bigcap_{i\in I}A_i</math> contains the mappings <math>f:[n]\rightarrow[m]\setminus I</math>. And <math>|A_I|=(m-|I|)^n\,</math>.

A mapping <math>f:[n]\rightarrow[m]</math> is surjective if and only if <math>f</math> lies in none of the sets <math>A_i</math>. By the principle of inclusion-exclusion, the number of surjective <math>f:[n]\rightarrow[m]</math> is
:<math>\sum_{I\subseteq[m]}(-1)^{|I|}\left|A_I\right|=\sum_{I\subseteq[m]}(-1)^{|I|}(m-|I|)^n=\sum_{j=0}^m(-1)^j{m\choose j}(m-j)^n</math>.
Substituting <math>k=m-j</math> and noting that the <math>k=0</math> term vanishes for <math>n\ge 1</math>, the theorem is proved.
}}

Recall that, in the twelvefold way, we establish a relation between surjections and partitions.

* Surjection to ordered partition:
:For a surjective <math>f:[n]\rightarrow[m]</math>, <math>(f^{-1}(0),f^{-1}(1),\ldots,f^{-1}(m-1))</math> is an '''ordered partition''' of <math>[n]</math>.
* Ordered partition to surjection:
:For an ordered <math>m</math>-partition <math>(B_0,B_1,\ldots, B_{m-1})</math> of <math>[n]</math>, we can define a function <math>f:[n]\rightarrow[m]</math> by letting <math>f(i)=j</math> if and only if <math>i\in B_j</math>. The function <math>f</math> is surjective because no block <math>B_j</math> of a partition is empty.

Therefore, we have a one-to-one correspondence between surjective mappings from an <math>n</math>-set to an <math>m</math>-set and the ordered <math>m</math>-partitions of an <math>n</math>-set.

The Stirling number of the second kind <math>S(n,m)</math> is the number of <math>m</math>-partitions of an <math>n</math>-set. There are <math>m!</math> ways to order an <math>m</math>-partition, thus the number of surjective mappings <math>f:[n]\rightarrow[m]</math> is <math>m! S(n,m)</math>. Combining with what we have proved for surjections, we give the following result for the Stirling number of the second kind.

{{Theorem|Proposition|
:<math>S(n,m)=\frac{1}{m!}\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
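Both formulas can be compared with brute-force enumeration for small parameters. The following is a minimal sketch with hypothetical function names; it assumes <math>n\ge 1</math> and <math>m\ge 1</math>.

```python
from itertools import product
from math import comb, factorial

def surjections_formula(n, m):
    """Number of surjections from an n-set onto an m-set, by inclusion-exclusion."""
    return sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))

def surjections_brute_force(n, m):
    """Enumerate all m^n mappings and keep those whose image is all of [m]."""
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == m)

def stirling2(n, m):
    """Stirling number of the second kind S(n, m) = surjections / m!."""
    return surjections_formula(n, m) // factorial(m)

for n in range(1, 7):
    for m in range(1, 5):
        assert surjections_formula(n, m) == surjections_brute_force(n, m)
```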

=== Derangements ===
We now count the number of bijections from a set to itself with no fixed points. This is the '''derangement problem'''.

For a permutation <math>\pi</math> of <math>\{1,2,\ldots,n\}</math>, a '''fixed point''' is an <math>i\in\{1,2,\ldots,n\}</math> such that <math>\pi(i)=i</math>.
A [http://en.wikipedia.org/wiki/Derangement '''derangement'''] of <math>\{1,2,\ldots,n\}</math> is a permutation of <math>\{1,2,\ldots,n\}</math> that has no fixed points.

{{Theorem|Theorem|
:The number of derangements of <math>\{1,2,\ldots,n\}</math> is given by
::<math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}\approx \frac{n!}{\mathrm{e}}</math>.
}}
{{Proof|
Let <math>U</math> be the set of all permutations of <math>\{1,2,\ldots,n\}</math>. So <math>|U|=n!</math>.

Let <math>A_i</math> be the set of permutations with fixed point <math>i</math>; so <math>|A_i|=(n-1)!</math>. More generally, for any <math>I\subseteq \{1,2,\ldots,n\}</math>, <math>A_I=\bigcap_{i\in I}A_i</math>, and <math>|A_I|=(n-|I|)!</math>, since permutations in <math>A_I</math> fix every point in <math>I</math> and permute the remaining points arbitrarily. A permutation is a derangement if and only if it lies in none of the sets <math>A_i</math>. So the number of derangements is
:<math>\sum_{I\subseteq\{1,2,\ldots,n\}}(-1)^{|I|}(n-|I|)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=n!\sum_{k=0}^n\frac{(-1)^k}{k!}.</math>
By Taylor's series,
:<math>\frac{1}{\mathrm{e}}=\sum_{k=0}^\infty\frac{(-1)^k}{k!}=\sum_{k=0}^n\frac{(-1)^k}{k!}\pm o\left(\frac{1}{n!}\right)</math>.
It is not hard to see that, for <math>n\ge 1</math>, <math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}</math> is the closest integer to <math>\frac{n!}{\mathrm{e}}</math>.
}}

Therefore, about a <math>\frac{1}{\mathrm{e}}</math> fraction of all permutations have no fixed points.
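The formula, the brute-force count, and the rounding of <math>\frac{n!}{\mathrm{e}}</math> can be compared for small <math>n\ge 1</math>. The sketch below is illustrative only; it uses floating-point arithmetic for <math>\mathrm{e}</math>, which is accurate enough for the small values checked here.

```python
from itertools import permutations
from math import e, factorial

def derangements_formula(n):
    """n! * sum_{k=0}^{n} (-1)^k / k!, computed with exact integer arithmetic."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

def derangements_brute_force(n):
    """Count permutations of {0, ..., n-1} without fixed points."""
    return sum(1 for pi in permutations(range(n))
               if all(pi[i] != i for i in range(n)))

for n in range(1, 9):
    d = derangements_formula(n)
    assert d == derangements_brute_force(n)
    assert d == round(factorial(n) / e)
```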

=== Permutations with restricted positions ===
We introduce a general theory of counting permutations with restricted positions. In the derangement problem, we count the permutations <math>\pi</math> such that <math>\pi(i)\neq i</math> for all <math>i</math>. We now generalize to the problem of counting permutations which avoid a set of arbitrarily specified positions.

It is traditionally described using terminology from the game of chess. Let <math>B\subseteq \{1,\ldots,n\}\times \{1,\ldots,n\}</math>, called a '''board'''. As illustrated below, we can think of <math>B</math> as a chess board, with the positions in <math>B</math> marked by "<math>\times</math>".
{{Chess diagram small
|
|
|=
8 |__|xx|xx|__|xx|__|__|xx|=
7 |xx|__|__|xx|__|__|xx|__|=
6 |xx|__|xx|xx|__|xx|xx|__|=
5 |__|xx|__|__|xx|__|xx|__|=
4 |xx|__|__|__|xx|xx|xx|__|=
3 |__|xx|__|xx|__|__|__|xx|=
2 |__|__|xx|__|xx|__|__|xx|=
1 |xx|__|__|xx|__|xx|__|__|=
a b c d e f g h
|
}}
For a permutation <math>\pi</math> of <math>\{1,\ldots,n\}</math>, define the '''graph''' <math>G_\pi</math> of <math>\pi</math> as
:<math>
\begin{align}
G_\pi &= \{(i,\pi(i))\mid i\in \{1,2,\ldots,n\}\}.
\end{align}
</math>
This can also be viewed as a set of marked positions on a chess board. Each row and each column has exactly one marked position, because <math>\pi</math> is a permutation. Thus, we can identify each <math>G_\pi</math> with a placement of <math>n</math> rooks (the "castle" piece, which moves by the same rules as the chariot in Chinese chess) such that no two attack each other.

For example, the following is the <math>G_\pi</math> of the identity permutation, i.e. the <math>\pi</math> with <math>\pi(i)=i</math> for all <math>i</math>.
{{Chess diagram small
|
|
|=
8 |rl|__|__|__|__|__|__|__|=
7 |__|rl|__|__|__|__|__|__|=
6 |__|__|rl|__|__|__|__|__|=
5 |__|__|__|rl|__|__|__|__|=
4 |__|__|__|__|rl|__|__|__|=
3 |__|__|__|__|__|rl|__|__|=
2 |__|__|__|__|__|__|rl|__|=
1 |__|__|__|__|__|__|__|rl|=
a b c d e f g h
|
}}
Now define
:<math>\begin{align}
N_0 &= \left|\left\{\pi\mid B\cap G_\pi=\emptyset\right\}\right|\\
r_k &= \mbox{number of }k\mbox{-subsets of }B\mbox{ such that no two elements have a common coordinate}\\
&=\left|\left\{S\in{B\choose k} \,\bigg|\, \forall (i_1,j_1),(i_2,j_2)\in S, i_1\neq i_2, j_1\neq j_2 \right\}\right|
\end{align}
</math>
In chess terms:
* <math>B</math>: a set of marked positions in an <math>[n]\times [n]</math> chess board.
* <math>N_0</math>: the number of ways of placing <math>n</math> non-attacking rooks on the chess board such that none of these rooks lie in <math>B</math>.
* <math>r_k</math>: number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>.

Our goal is to express <math>N_0</math> in terms of the <math>r_k</math>. This gives the number of permutations that avoid all positions in <math>B</math>.

{{Theorem|Theorem|
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
{{Proof|
For each <math>i\in[n]</math>, let <math>A_i=\{\pi\mid (i,\pi(i))\in B\}</math> be the set of permutations <math>\pi</math> whose <math>i</math>-th position is in <math>B</math>.

<math>N_0</math> is the number of permutations that avoid all positions in <math>B</math>. Thus, our goal is to count the permutations <math>\pi</math> that lie in none of the sets <math>A_i</math> for <math>i\in [n]</math>.

For each <math>I\subseteq [n]</math>, let <math>A_I=\bigcap_{i\in I}A_i</math>, which is the set of permutations <math>\pi</math> such that <math>(i,\pi(i))\in B</math> for all <math>i\in I</math>. Due to the principle of inclusion-exclusion,
:<math>N_0=\sum_{I\subseteq [n]} (-1)^{|I|}|A_I|=\sum_{k=0}^n(-1)^k\sum_{I\in{[n]\choose k}}|A_I|</math>.

The next observation is that
:<math>\sum_{I\in{[n]\choose k}}|A_I|=r_k(n-k)!</math>,
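because both sides count the pairs <math>(S,\pi)</math> in which <math>S</math> is a placement of <math>k</math> non-attacking rooks on <math>B</math> and <math>\pi</math> is a permutation with <math>S\subseteq G_\pi</math>: there are <math>r_k</math> choices of <math>S</math>, and each of them extends to a full placement of <math>n</math> non-attacking rooks on <math>[n]\times [n]</math> in <math>(n-k)!</math> ways.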

Therefore,
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}

====Derangement problem====
We use the above general method to solve the derangement problem again.

Take <math>B=\{(1,1),(2,2),\ldots,(n,n)\}</math> as the chess board. A derangement <math>\pi</math> is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|__|__|__|__|__|__|__|=
7 |__|xx|__|__|__|__|__|__|=
6 |__|__|xx|__|__|__|__|__|=
5 |__|__|__|xx|__|__|__|__|=
4 |__|__|__|__|xx|__|__|__|=
3 |__|__|__|__|__|xx|__|__|=
2 |__|__|__|__|__|__|xx|__|=
1 |__|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
Clearly, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math> is <math>r_k={n\choose k}</math>. We want to count <math>N_0</math>, which gives the number of ways of placing <math>n</math> non-attacking rooks such that none of these rooks lie in <math>B</math>.

By the above theorem
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=\sum_{k=0}^n(-1)^k\frac{n!}{k!}=n!\sum_{k=0}^n(-1)^k\frac{1}{k!}\approx\frac{n!}{e}.
</math>

====Problème des ménages====

{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''line'', is <math>{m-k+1\choose k}</math>.
}}
{{Proof|
We draw a line of <math>m-k</math> black points, and then insert <math>k</math> red points into the <math>m-k+1</math> spaces between the black points (including the beginning and end).
::<math>
\begin{align}
&\sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \\
&\qquad\qquad\qquad\quad\Downarrow\\
&\sqcup \, \bullet \,\, {\color{Red}\bullet} \, \bullet \,\, {\color{Red}\bullet} \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}\, \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}
\end{align}
</math>
This gives us a line of <math>m</math> points in which the red points specify the chosen objects, which are non-consecutive because at most one red point is inserted into each space. The mapping is a 1-1 correspondence.
There are <math>{m-k+1\choose k}</math> ways of placing <math>k</math> red points into <math>m-k+1</math> spaces.
}}

{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''circle'', is <math>\frac{m}{m-k}{m-k\choose k}</math>.
}}
{{Proof|
Let <math>f(m,k)</math> be the desired number, and let <math>g(m,k)</math> be the number of ways of choosing <math>k</math> non-consecutive points from <math>m</math> points arranged in a circle, coloring these <math>k</math> points red, and then coloring one of the uncolored points blue.

Clearly, <math>g(m,k)=(m-k)f(m,k)</math>.

But we can also compute <math>g(m,k)</math> as follows:
* Choose one of the <math>m</math> points and color it blue. This gives us <math>m</math> ways.
* Cut the circle to make a line of <math>m-1</math> points by removing the blue point.
* Choose <math>k</math> non-consecutive points from the line of <math>m-1</math> points and color them red. This gives <math>{m-k\choose k}</math> ways due to the previous lemma.

Thus, <math>g(m,k)=m{m-k\choose k}</math>. Therefore we have the desired number <math>f(m,k)=\frac{m}{m-k}{m-k\choose k}</math>.
}}
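The two lemmas can be checked by brute force for small parameters. The following Python sketch is only an illustration; the function names <code>count_line</code> and <code>count_circle</code> are hypothetical and not part of the lecture. It enumerates all <math>k</math>-subsets of <math>m</math> positions and keeps those without adjacent elements.

```python
from itertools import combinations
from math import comb


def count_line(m, k):
    """Count k-subsets of positions 0..m-1 on a line with no two adjacent."""
    return sum(
        1
        for s in combinations(range(m), k)
        if all(b - a > 1 for a, b in zip(s, s[1:]))
    )


def count_circle(m, k):
    """Same count on a circle, where positions 0 and m-1 are also adjacent."""
    return sum(
        1
        for s in combinations(range(m), k)
        if all((b - a) % m != 1 and (a - b) % m != 1
               for a in s for b in s if a != b)
    )


if __name__ == "__main__":
    m, k = 8, 3
    # Line lemma: C(m-k+1, k).
    print(count_line(m, k), comb(m - k + 1, k))
    # Circle lemma, written without division: (m-k) * f(m,k) = m * C(m-k, k).
    print((m - k) * count_circle(m, k), m * comb(m - k, k))
```

The circle identity is compared in the form <math>(m-k)f(m,k)=m{m-k\choose k}</math>, which is the double counting used in the proof and avoids fractions.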

=== The Euler totient function ===
Two integers <math>m, n</math> are said to be '''relatively prime''' if their greatest common divisor <math>\mathrm{gcd}(m,n)=1</math>. For a positive integer <math>n</math>, let <math>\phi(n)</math> be the number of positive integers from <math>\{1,2,\ldots,n\}</math> that are relatively prime to <math>n</math>. This function, called the Euler <math>\phi</math> function or '''the Euler totient function''', is fundamental in number theory.

We now derive a formula for this function by using the principle of inclusion-exclusion.
{{Theorem|Theorem (The Euler totient function)|
Suppose <math>n</math> is divisible by precisely <math>r</math> different primes, denoted <math>p_1,\ldots,p_r</math>. Then
:<math>\phi(n)=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right)</math>.
}}
{{Proof|
Let <math>U=\{1,2,\ldots,n\}</math> be the universe. For any distinct primes <math>p_{i_1},p_{i_2},\ldots,p_{i_s}\in\{p_1,\ldots,p_r\}</math>, the number of integers in <math>U</math> that are divisible by all of <math>p_{i_1},p_{i_2},\ldots,p_{i_s}</math> is <math>\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_s}}</math>, because each of these primes divides <math>n</math>.

<math>\phi(n)</math> is the number of integers in <math>U</math> that are not divisible by any of <math>p_1,\ldots,p_r</math>.
By the principle of inclusion-exclusion,
:<math>
\begin{align}
\phi(n)
&=n+\sum_{k=1}^r(-1)^k\sum_{1\le i_1<i_2<\cdots <i_k\le r}\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_k}}\\
&=n-\sum_{1\le i\le r}\frac{n}{p_i}+\sum_{1\le i<j\le r}\frac{n}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{n}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{n}{p_{1}p_{2}\cdots p_{r}}\\
&=n\left(1-\sum_{1\le i\le r}\frac{1}{p_i}+\sum_{1\le i<j\le r}\frac{1}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{1}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{1}{p_{1}p_{2}\cdots p_{r}}\right)\\
&=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right).
\end{align}
</math>
}}
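As a sanity check, the inclusion-exclusion sum can be evaluated directly and compared with a count based on the definition. This is a minimal Python sketch; the helper names <code>prime_factors</code>, <code>phi_inclusion_exclusion</code> and <code>phi_brute</code> are hypothetical.

```python
from itertools import combinations
from math import gcd, prod


def prime_factors(n):
    """Return the distinct primes dividing n, by trial division."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_inclusion_exclusion(n):
    """Sum (-1)^k * n / (p_{i_1} ... p_{i_k}) over all subsets of the primes."""
    primes = prime_factors(n)
    total = 0
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            total += (-1) ** k * (n // prod(subset))
    return total


def phi_brute(n):
    """Count the integers in {1,...,n} that are relatively prime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


if __name__ == "__main__":
    for n in (1, 12, 30, 97):
        print(n, phi_inclusion_exclusion(n), phi_brute(n))
```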

Combinatorics (Fall 2010)/Partitions, sieve methods

2010-09-09T10:54:09Z

210.28.131.82: /* Young tableaux */

== Partitions ==
We count the ways of partitioning <math>n</math> ''identical'' objects into <math>k</math> ''unordered'' groups. This is the problem of counting the ways of partitioning a number <math>n</math> into <math>k</math> unordered parts.

A '''<math>k</math>-partition''' of a number <math>n</math> is a multiset <math>\{x_1,x_2,\ldots,x_k\}</math> with <math>x_i\ge 1</math> for every element <math>x_i</math> and <math>x_1+x_2+\cdots+x_k=n</math>.

We define <math>p_k(n)</math> as the number of <math>k</math>-partitions of <math>n</math>.

For example, number 7 has the following partitions:
<div class="center"><math>
\begin{align}
&\{7\}
& p_1(7)=1\\
&\{1,6\},\{2,5\},\{3,4\}
& p_2(7)=3\\
&\{1,1,5\}, \{1,2,4\}, \{1,3,3\}, \{2,2,3\}
& p_3(7)=4\\
&\{1,1,1,4\},\{1,1,2,3\}, \{1,2,2,2\}
& p_4(7)=3\\
&\{1,1,1,1,3\},\{1,1,1,2,2\}
& p_5(7)=2\\
&\{1,1,1,1,1,2\}
& p_6(7)=1\\
&\{1,1,1,1,1,1,1\}
& p_7(7)=1
\end{align}
</math></div>

Equivalently, we can define a <math>k</math>-partition of a number <math>n</math> as a <math>k</math>-tuple <math>(x_1,x_2,\ldots,x_k)</math> with:
* <math>x_1\ge x_2\ge\cdots\ge x_k\ge 1</math>;
* <math>x_1+x_2+\cdots+x_k=n</math>.

Then <math>p_k(n)</math> is the number of integral solutions to the above system.

Let <math>p(n)=\sum_{k=1}^n p_k(n)</math> be the total number of partitions of <math>n</math>. The function <math>p(n)</math> is called the '''partition number'''.

=== Counting <math>p_k(n)</math>===

{{Theorem|Proposition|
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Note that it must hold that
:<math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.
There are two cases: <math>x_k=1</math> or <math>x_k>1</math>.
;Case 1.
:If <math>x_k=1</math>, then <math>(x_1,\cdots,x_{k-1})</math> is a distinct <math>(k-1)</math>-partition of <math>n-1</math>. And every <math>(k-1)</math>-partition of <math>n-1</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k-1}(n-1)</math>.
;Case 2.
:If <math>x_k>1</math>, then <math>(x_1-1,\cdots,x_{k}-1)</math> is a distinct <math>k</math>-partition of <math>n-k</math>. And every <math>k</math>-partition of <math>n-k</math> can be obtained in this way. Thus the number of <math>k</math>-partitions of <math>n</math> in this case is <math>p_{k}(n-k)</math>.
In conclusion, the number of <math>k</math>-partitions of <math>n</math> is <math>p_{k-1}(n-1)+p_k(n-k)</math>, i.e.
:<math>p_k(n)=p_{k-1}(n-1)+p_k(n-k)\,</math>.
}}
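The recurrence translates directly into a memoized computation. The sketch below is illustrative only; the function name <code>p</code> and the chosen boundary values (<math>p_0(0)=1</math>, and <math>p_k(n)=0</math> when <math>n<k</math>) are assumptions made here to start the recursion.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p(k, n):
    """Number of partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0   # only the empty partition of 0
    if n < k:
        return 0                    # k positive parts need a sum of at least k
    # Case 1: smallest part is 1; Case 2: every part is at least 2.
    return p(k - 1, n - 1) + p(k, n - k)


if __name__ == "__main__":
    # Reproduces the table for n = 7 given earlier in this section.
    print([p(k, 7) for k in range(1, 8)])
```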

{{Theorem|Theorem|
For any fixed <math>k</math>,
:<math>p_k(n)\sim\frac{n^{k-1}}{k!(k-1)!}</math>,
as <math>n\rightarrow \infty</math>.
}}
{{Proof|
Suppose that <math>(x_1,\ldots,x_k)</math> is a <math>k</math>-partition of <math>n</math>. Then <math>x_1+x_2+\cdots+x_k=n</math> and <math>x_1\ge x_2\ge \cdots \ge x_k\ge 1</math>.

The <math>k!</math> permutations of <math>(x_1,\ldots,x_k)</math> yield at most <math>k!</math> many <math>k</math>-compositions (''ordered'' sums of <math>k</math> positive integers). Every <math>k</math>-composition of <math>n</math> is obtained in this way from some <math>k</math>-partition of <math>n</math>, and there are <math>{n-1\choose k-1}</math> many <math>k</math>-compositions of <math>n</math>. Hence
:<math>k!p_k(n)\ge{n-1\choose k-1}</math>.

Let <math>y_i=x_i+k-i</math>. That is, <math>y_k=x_k, y_{k-1}=x_{k-1}+1, y_{k-2}=x_{k-2}+2,\ldots, y_{1}=x_1+k-1</math>. Then, it holds that
* <math>y_1>y_2>\cdots>y_k\ge 1</math>; and
* <math>y_1+y_2+\cdots+y_k=n+\frac{k(k-1)}{2}</math>.
Each permutation of <math>(y_1,y_2,\ldots,y_k)</math> yields a '''distinct''' <math>k</math>-composition of <math>n+\frac{k(k-1)}{2}</math>, because all <math>y_i</math> are distinct.
Thus,
:<math>k!p_k(n)\le {n+\frac{k(k-1)}{2}-1\choose k-1}</math>.

Combining the two inequalities, we have
:<math>\frac{{n-1\choose k-1}}{k!}\le p_k(n)\le \frac{{n+\frac{k(k-1)}{2}-1\choose k-1}}{k!}</math>.
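The theorem follows, since for fixed <math>k</math> both the lower and the upper bound are asymptotically <math>\frac{n^{k-1}}{k!(k-1)!}</math> as <math>n\rightarrow\infty</math>.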
}}
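The two inequalities used in the proof can be checked numerically. In the following Python sketch, <code>p</code> is a hypothetical helper that computes <math>p_k(n)</math> by the recurrence proved above, and <code>bounds</code> returns the three quantities <math>{n-1\choose k-1}</math>, <math>k!\,p_k(n)</math> and <math>{n+\frac{k(k-1)}{2}-1\choose k-1}</math>.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def p(k, n):
    """Number of partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0
    if n < k:
        return 0
    return p(k - 1, n - 1) + p(k, n - k)


def bounds(k, n):
    """Return (lower, k! * p_k(n), upper) from the proof of the theorem."""
    lower = comb(n - 1, k - 1)
    upper = comb(n + k * (k - 1) // 2 - 1, k - 1)
    return lower, factorial(k) * p(k, n), upper


if __name__ == "__main__":
    k = 3
    for n in (10, 50, 200):
        lower, middle, upper = bounds(k, n)
        # Ratio of p_k(n) to the asymptotic estimate n^(k-1) / (k! (k-1)!).
        ratio = middle * factorial(k - 1) / n ** (k - 1)
        print(n, lower, middle, upper, round(ratio, 4))
```

For fixed <math>k</math> the printed ratio should approach 1 as <math>n</math> grows, although the convergence is slow for larger <math>k</math>.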

=== Ferrers diagram ===
:{|border="0"
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]||[[File:Chess xot45.svg|22px]]
|-
|[[File:Chess xot45.svg|22px]]
|}
:{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}

<div class="center">
{|border="0"
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|
[[File:Chess t45.svg|120px]]
|align=center|
{|border="2" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]||[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|-
|[[File:Chess t45.svg|22px]]
|}
|-
|align=center|<math>(6,4,4,2,1)</math>||
|align=center|conjugate: <math>(5,4,3,3,1,1)</math>
|}
</div>

{{Theorem|Proposition|
# The number of partitions of <math>n</math> which have largest summand <math>k</math>, is <math>p_k(n)</math>.
# The number of partitions of <math>n</math> into <math>k</math> parts equals the number of partitions of <math>n-k</math> into at most <math>k</math> parts. Formally,
::<math>p_k(n)=\sum_{j=1}^k p_j(n-k)</math>.
}}
{{Proof|
# For every <math>k</math>-partition, the conjugate partition has largest part <math>k</math>. And vice versa.
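# Subtracting 1 from every part of a <math>k</math>-partition of <math>n</math> and discarding the parts that become 0 gives a partition of <math>n-k</math> into at most <math>k</math> parts. Conversely, padding a partition of <math>n-k</math> into <math>j\le k</math> parts with <math>k-j</math> zeros and adding 1 to every part gives a <math>k</math>-partition of <math>n</math>. The two maps are inverse to each other.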
}}
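Conjugation corresponds to reading the Ferrers diagram by columns instead of rows. A minimal Python sketch follows; the function name <code>conjugate</code> and the tuple representation of a partition are assumptions for illustration.

```python
def conjugate(partition):
    """Conjugate of a partition given as a non-increasing tuple of positive ints.

    The j-th part of the conjugate is the number of rows of the Ferrers
    diagram that have more than j cells, i.e. the length of column j.
    """
    if not partition:
        return ()
    return tuple(
        sum(1 for part in partition if part > j)
        for j in range(partition[0])
    )


if __name__ == "__main__":
    lam = (6, 4, 4, 2, 1)
    mu = conjugate(lam)
    print(mu)               # the conjugate shown in the figure above
    print(conjugate(mu))    # conjugating twice gives back the original
    print(len(lam), mu[0])  # number of parts equals largest part of conjugate
```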

We now consider partitions of <math>n</math> into '''distinct''' parts, that is, tuples <math>(x_1,x_2,\ldots,x_k)</math> with <math>x_1>x_2>\cdots>x_k\ge 1</math> and <math>x_1+\cdots+x_k=n</math>.

Define <math>q_e(n)</math> to be the number of partitions of <math>n</math> into an ''even'' number of ''distinct'' parts, and <math>q_o(n)</math> to be the number of partitions of <math>n</math> into an ''odd'' number of ''distinct'' parts.

{{Theorem|Theorem (Euler's Pentagonal Number Theorem)|
:<math>q_e(n)-q_o(n)=\begin{cases}
(-1)^k & \mbox{if }n=\frac{k(3k\pm 1)}{2},\\
0 & \mbox{otherwise.}
\end{cases}</math>
}}
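The statement can be checked for small <math>n</math> by enumerating all partitions into distinct parts. The Python sketch below is only a numerical check, not a proof; the names <code>distinct_partitions</code>, <code>q_difference</code> and <code>pentagonal_value</code> are hypothetical.

```python
def distinct_partitions(n, max_part=None):
    """Yield the partitions of n into distinct parts, as decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        # The remaining parts must be strictly smaller than `first`.
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest


def q_difference(n):
    """q_e(n) - q_o(n): each partition contributes (-1)^(number of parts)."""
    return sum((-1) ** len(lam) for lam in distinct_partitions(n))


def pentagonal_value(n):
    """Right-hand side of the theorem for n >= 1."""
    k = 1
    while k * (3 * k - 1) // 2 <= n:
        if n in (k * (3 * k - 1) // 2, k * (3 * k + 1) // 2):
            return (-1) ** k
        k += 1
    return 0


if __name__ == "__main__":
    for n in range(1, 16):
        print(n, q_difference(n), pentagonal_value(n))
```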

== Principle of Inclusion-Exclusion ==
Let <math>A</math> and <math>B</math> be two finite sets. The cardinality of their union is
:<math>|A\cup B|=|A|+|B|-{\color{Blue}|A\cap B|}</math>.
For three sets <math>A</math>, <math>B</math>, and <math>C</math>, the cardinality of the union of these three sets is computed as
:<math>|A\cup B\cup C|=|A|+|B|+|C|-{\color{Blue}|A\cap B|}-{\color{Blue}|A\cap C|}-{\color{Blue}|B\cap C|}+{\color{Red}|A\cap B\cap C|}</math>.
This is illustrated by the following figure.
::[[Image:Inclusion-exclusion.png|200px|border|center]]

Generally, the '''Principle of Inclusion-Exclusion''' gives the rule for computing the cardinality of the union of <math>n</math> finite sets <math>A_1,A_2,\ldots,A_n</math>:
{{Equation|
<math>
\begin{align}
\left|\bigcup_{i=1}^nA_i\right|
&=
\sum_{I\subseteq\{1,\ldots,n\},\,I\neq\emptyset}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
-----

In combinatorial enumeration, the Principle of Inclusion-Exclusion is usually applied in its complement form.

Let <math>A_1,A_2,\ldots,A_n\subseteq U</math> be subsets of some finite set <math>U</math>. Here <math>U</math> is some universe of combinatorial objects, whose cardinality is easy to calculate (e.g. all strings, tuples, permutations), and each <math>A_i</math> contains the objects with some specific property (e.g. a "pattern") which we want to avoid. The problem is to count the number of objects without any of the <math>n</math> properties. We write <math>\bar{A_i}=U-A_i</math>. The number of objects without any of the properties <math>A_1,A_2,\ldots,A_n</math> is
{{Equation|
<math>
\begin{align}
\left|\bar{A_1}\cap\bar{A_2}\cap\cdots\cap\bar{A_n}\right|=\left|U-\bigcup_{i=1}^nA_i\right|
&=
|U|-\sum_{I\subseteq\{1,\ldots,n\},\,I\neq\emptyset}(-1)^{|I|-1}\left|\bigcap_{i\in I}A_i\right|.
\end{align}
</math>
}}
For an <math>I\subseteq\{1,2,\ldots,n\}</math>, we denote
:<math>A_I=\bigcap_{i\in I}A_i</math>
with the convention that <math>A_\emptyset=U</math>. The above equation is stated as:
{{Theorem|Principle of Inclusion-Exclusion|
:Let <math>A_1,A_2,\ldots,A_n</math> be a family of subsets of <math>U</math>. Then the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> is
::<math>\sum_{I\subseteq\{1,\ldots, n\}}(-1)^{|I|}|A_I|</math>.
}}

Let <math>S_k=\sum_{|I|=k}|A_I|\,</math>. Conventionally, <math>S_0=|A_\emptyset|=|U|</math>. By the principle of inclusion-exclusion, the number of elements of <math>U</math> which lie in none of the subsets <math>A_i</math> can be expressed as
{{Equation|<math>
S_0-S_1+S_2-\cdots+(-1)^nS_n.
</math>
}}
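The complement form of the principle can be evaluated mechanically for explicitly given sets. The following Python sketch is an illustration under the convention <math>A_\emptyset=U</math>; the function name <code>count_avoiding_all</code> is hypothetical.

```python
from itertools import combinations


def count_avoiding_all(universe, subsets):
    """Evaluate sum over I of (-1)^|I| * |A_I|, with A_emptyset = universe."""
    total = 0
    n = len(subsets)
    for k in range(n + 1):
        for index_set in combinations(range(n), k):
            common = set(universe)          # A_I starts as U when I is empty
            for i in index_set:
                common &= subsets[i]
            total += (-1) ** k * len(common)
    return total


if __name__ == "__main__":
    universe = set(range(1, 31))
    # A_i: multiples of 2, 3 and 5 within the universe.
    subsets = [{x for x in universe if x % d == 0} for d in (2, 3, 5)]
    direct = len(universe - set().union(*subsets))
    print(count_avoiding_all(universe, subsets), direct)
```

The sum has <math>2^n</math> terms, so evaluating it term by term is only practical for small <math>n</math>; the applications below work because <math>|A_I|</math> depends only on <math>|I|</math> or has some other simple form.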

=== Surjections ===
In the twelvefold way, we discussed the counting problems arising from mappings <math>f:N\rightarrow M</math>. The basic case is that the elements of both <math>N</math> and <math>M</math> are distinguishable. In this case, it is easy to count the number of arbitrary mappings (which is <math>m^n</math>) and the number of injective (one-to-one) mappings (which is <math>(m)_n</math>), but counting the surjective mappings is more difficult. Here we apply the principle of inclusion-exclusion to count the number of surjective (onto) mappings.
{{Theorem|Theorem|
:The number of surjective mappings from an <math>n</math>-set to an <math>m</math>-set is given by
::<math>\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
{{Proof|
Let <math>U=\{f:[n]\rightarrow[m]\}</math> be the set of mappings from <math>[n]</math> to <math>[m]</math>. Then <math>|U|=m^n</math>.

For <math>i\in[m]</math>, let <math>A_i</math> be the set of mappings <math>f:[n]\rightarrow[m]</math> under which no <math>j\in[n]</math> is mapped to <math>i</math>, i.e. <math>A_i=\{f:[n]\rightarrow[m]\setminus\{i\}\}</math>, thus <math>|A_i|=(m-1)^n</math>.

More generally, for <math>I\subseteq [m]</math>, <math>A_I=\bigcap_{i\in I}A_i</math> contains the mappings <math>f:[n]\rightarrow[m]\setminus I</math>. And <math>|A_I|=(m-|I|)^n\,</math>.

A mapping <math>f:[n]\rightarrow[m]</math> is surjective if and only if <math>f</math> lies in none of the sets <math>A_i</math>. By the principle of inclusion-exclusion, the number of surjective <math>f:[n]\rightarrow[m]</math> is
:<math>\sum_{I\subseteq[m]}(-1)^{|I|}\left|A_I\right|=\sum_{I\subseteq[m]}(-1)^{|I|}(m-|I|)^n=\sum_{j=0}^m(-1)^j{m\choose j}(m-j)^n</math>.
Let <math>k=m-j</math>. The theorem is proved.
}}
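For small <math>n</math> and <math>m</math> the formula can be compared with a direct enumeration of all <math>m^n</math> mappings. This Python sketch is illustrative; the function names are hypothetical, and a mapping is represented by the tuple of its values.

```python
from itertools import product
from math import comb


def surjections_formula(n, m):
    """Sum over k = 1..m of (-1)^(m-k) * C(m, k) * k^n."""
    return sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))


def surjections_brute(n, m):
    """Enumerate all mappings [n] -> [m] and keep those hitting every value."""
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == m)


if __name__ == "__main__":
    for n, m in [(3, 2), (4, 3), (5, 5), (3, 4)]:
        print(n, m, surjections_formula(n, m), surjections_brute(n, m))
```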

Recall that, in the twelvefold way, we establish a relation between surjections and partitions.

* Surjection to ordered partition:
:For a surjective <math>f:[n]\rightarrow[m]</math>, <math>(f^{-1}(1),f^{-1}(2),\ldots,f^{-1}(m))</math> is an '''ordered partition''' of <math>[n]</math>.
* Ordered partition to surjection:
:For an ordered <math>m</math>-partition <math>(B_1,B_2,\ldots, B_{m})</math> of <math>[n]</math>, we can define a function <math>f:[n]\rightarrow[m]</math> by letting <math>f(i)=j</math> if and only if <math>i\in B_j</math>. The function <math>f</math> is surjective because, in a partition, none of the <math>B_j</math> is empty.

Therefore, we have a one-to-one correspondence between surjective mappings from an <math>n</math>-set to an <math>m</math>-set and the ordered <math>m</math>-partitions of an <math>n</math>-set.

The Stirling number of the second kind <math>S(n,m)</math> is the number of <math>m</math>-partitions of an <math>n</math>-set. There are <math>m!</math> ways to order an <math>m</math>-partition, thus the number of surjective mappings <math>f:[n]\rightarrow[m]</math> is <math>m! S(n,m)</math>. Combining with what we have proved for surjections, we give the following result for the Stirling number of the second kind.

{{Theorem|Proposition|
:<math>S(n,m)=\frac{1}{m!}\sum_{k=1}^m(-1)^{m-k}{m\choose k}k^n</math>.
}}
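The closed formula can be cross-checked against the standard recurrence <math>S(n,m)=mS(n-1,m)+S(n-1,m-1)</math>, which is not derived in this section and is used here only as an independent check. The function names in this Python sketch are hypothetical.

```python
from functools import lru_cache
from math import comb, factorial


def stirling2_formula(n, m):
    """S(n, m) via the inclusion-exclusion formula, for n, m >= 1."""
    total = sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(1, m + 1))
    return total // factorial(m)   # the sum counts surjections, m! * S(n, m)


@lru_cache(maxsize=None)
def stirling2_recurrence(n, m):
    """S(n, m) via S(n, m) = m * S(n-1, m) + S(n-1, m-1)."""
    if n == 0 and m == 0:
        return 1
    if n == 0 or m == 0:
        return 0
    return m * stirling2_recurrence(n - 1, m) + stirling2_recurrence(n - 1, m - 1)


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, stirling2_formula(5, m), stirling2_recurrence(5, m))
```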

=== Derangements ===
We now count the number of bijections from a set to itself with no fixed points. This is the '''derangement problem'''.

For a permutation <math>\pi</math> of <math>\{1,2,\ldots,n\}</math>, a '''fixed point''' is an <math>i\in\{1,2,\ldots,n\}</math> such that <math>\pi(i)=i</math>.
A [http://en.wikipedia.org/wiki/Derangement '''derangement'''] of <math>\{1,2,\ldots,n\}</math> is a permutation of <math>\{1,2,\ldots,n\}</math> that has no fixed points.

{{Theorem|Theorem|
:The number of derangements of <math>\{1,2,\ldots,n\}</math> is given by
::<math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}\approx \frac{n!}{\mathrm{e}}</math>.
}}
{{Proof|
Let <math>U</math> be the set of all permutations of <math>\{1,2,\ldots,n\}</math>. So <math>|U|=n!</math>.

Let <math>A_i</math> be the set of permutations with fixed point <math>i</math>; so <math>|A_i|=(n-1)!</math>. More generally, for any <math>I\subseteq \{1,2,\ldots,n\}</math>, <math>A_I=\bigcap_{i\in I}A_i</math>, and <math>|A_I|=(n-|I|)!</math>, since permutations in <math>A_I</math> fix every point in <math>I</math> and permute the remaining points arbitrarily. A permutation is a derangement if and only if it lies in none of the sets <math>A_i</math>. So the number of derangements is
:<math>\sum_{I\subseteq\{1,2,\ldots,n\}}(-1)^{|I|}(n-|I|)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=n!\sum_{k=0}^n\frac{(-1)^k}{k!}.</math>
By Taylor's series,
:<math>\frac{1}{\mathrm{e}}=\sum_{k=0}^\infty\frac{(-1)^k}{k!}=\sum_{k=0}^n\frac{(-1)^k}{k!}\pm o\left(\frac{1}{n!}\right)</math>.
It is not hard to see that <math>n!\sum_{k=0}^n\frac{(-1)^k}{k!}</math> is the closest integer to <math>\frac{n!}{\mathrm{e}}</math>.
}}
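Both the formula and the nearest-integer claim can be checked for small <math>n</math>. In this Python sketch the function names are hypothetical, and the comparison with <math>n!/\mathrm{e}</math> uses floating-point arithmetic, so it is only reliable for small <math>n</math>.

```python
from itertools import permutations
from math import e, factorial


def derangements_formula(n):
    """n! * sum_{k=0}^{n} (-1)^k / k!, computed in exact integer arithmetic."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))


def derangements_brute(n):
    """Count the permutations of {0,...,n-1} without fixed points."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, derangements_formula(n), derangements_brute(n),
              round(factorial(n) / e))
```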

Therefore, about a <math>\frac{1}{\mathrm{e}}</math> fraction of all permutations have no fixed points.

=== Permutations with restricted positions ===
We introduce a general theory of counting permutations with restricted positions. In the derangement problem, we count the permutations <math>\pi</math> such that <math>\pi(i)\neq i</math> for all <math>i</math>. We now generalize to the problem of counting permutations which avoid a set of arbitrarily specified positions.

It is traditionally described using terminology from the game of chess. Let <math>B\subseteq \{1,\ldots,n\}\times \{1,\ldots,n\}</math>, called a '''board'''. As illustrated below, we can think of <math>B</math> as a chess board, with the positions in <math>B</math> marked by "<math>\times</math>".
{{Chess diagram small
|
|
|=
8 |__|xx|xx|__|xx|__|__|xx|=
7 |xx|__|__|xx|__|__|xx|__|=
6 |xx|__|xx|xx|__|xx|xx|__|=
5 |__|xx|__|__|xx|__|xx|__|=
4 |xx|__|__|__|xx|xx|xx|__|=
3 |__|xx|__|xx|__|__|__|xx|=
2 |__|__|xx|__|xx|__|__|xx|=
1 |xx|__|__|xx|__|xx|__|__|=
a b c d e f g h
|
}}
For a permutation <math>\pi</math> of <math>\{1,\ldots,n\}</math>, define the '''graph''' <math>G_\pi</math> as
:<math>
\begin{align}
G_\pi &= \{(i,\pi(i))\mid i\in \{1,2,\ldots,n\}\}.
\end{align}
</math>
This can also be viewed as a set of marked positions on a chess board. Each row and each column has exactly one marked position, because <math>\pi</math> is a permutation. Thus, we can identify each <math>G_\pi</math> with a placement of <math>n</math> non-attacking rooks (the "castle", which moves like the chariot in Chinese chess).

For example, the following is the <math>G_\pi</math> of the identity permutation <math>\pi(i)=i</math>.
{{Chess diagram small
|
|
|=
8 |rl|__|__|__|__|__|__|__|=
7 |__|rl|__|__|__|__|__|__|=
6 |__|__|rl|__|__|__|__|__|=
5 |__|__|__|rl|__|__|__|__|=
4 |__|__|__|__|rl|__|__|__|=
3 |__|__|__|__|__|rl|__|__|=
2 |__|__|__|__|__|__|rl|__|=
1 |__|__|__|__|__|__|__|rl|=
a b c d e f g h
|
}}
Now define
:<math>\begin{align}
N_0 &= \left|\left\{\pi\mid B\cap G_\pi=\emptyset\right\}\right|\\
r_k &= \mbox{number of }k\mbox{-subsets of }B\mbox{ such that no two elements have a common coordinate}\\
&=\left|\left\{S\in{B\choose k} \,\bigg|\, \forall (i_1,j_1),(i_2,j_2)\in S, i_1\neq i_2, j_1\neq j_2 \right\}\right|
\end{align}
</math>
Interpreted in terms of chess:
* <math>B</math>: a set of marked positions in an <math>[n]\times [n]</math> chess board.
* <math>N_0</math>: the number of ways of placing <math>n</math> non-attacking rooks on the chess board such that none of these rooks lie in <math>B</math>.
* <math>r_k</math>: number of ways of placing <math>k</math> non-attacking rooks on <math>B</math>.

Our goal is to express <math>N_0</math> in terms of the <math>r_k</math>. This gives the number of permutations that avoid all positions in <math>B</math>.

{{Theorem|Theorem|
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
{{Proof|
For each <math>i\in[n]</math>, let <math>A_i=\{\pi\mid (i,\pi(i))\in B\}</math> be the set of permutations <math>\pi</math> whose <math>i</math>-th position is in <math>B</math>.

<math>N_0</math> is the number of permutations that avoid all positions in <math>B</math>. Thus, our goal is to count the permutations <math>\pi</math> that lie in none of the sets <math>A_i</math> for <math>i\in [n]</math>.

For each <math>I\subseteq [n]</math>, let <math>A_I=\bigcap_{i\in I}A_i</math>, which is the set of permutations <math>\pi</math> such that <math>(i,\pi(i))\in B</math> for all <math>i\in I</math>. Due to the principle of inclusion-exclusion,
:<math>N_0=\sum_{I\subseteq [n]} (-1)^{|I|}|A_I|=\sum_{k=0}^n(-1)^k\sum_{I\in{[n]\choose k}}|A_I|</math>.

The next observation is that
:<math>\sum_{I\in{[n]\choose k}}|A_I|=r_k(n-k)!</math>,
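because both sides count the pairs <math>(S,\pi)</math> in which <math>S</math> is a placement of <math>k</math> non-attacking rooks on <math>B</math> and <math>\pi</math> is a permutation with <math>S\subseteq G_\pi</math>: there are <math>r_k</math> choices of <math>S</math>, and each of them extends to a full placement of <math>n</math> non-attacking rooks on <math>[n]\times [n]</math> in <math>(n-k)!</math> ways.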

Therefore,
:<math>N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!</math>.
}}
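The theorem can be tested on small boards by computing the rook numbers <math>r_k</math> by brute force and comparing with a direct enumeration of permutations. The Python sketch below is only an illustration; the function names and the sample board are hypothetical, and a board is represented as a set of pairs <math>(i,j)</math> with <math>1\le i,j\le n</math>.

```python
from itertools import combinations, permutations
from math import factorial


def rook_numbers(board, n):
    """Return [r_0, ..., r_n]: r_k counts k-subsets of the board whose cells
    lie in pairwise distinct rows and pairwise distinct columns."""
    cells = sorted(board)
    numbers = []
    for k in range(n + 1):
        count = 0
        for s in combinations(cells, k):
            rows = {i for i, _ in s}
            cols = {j for _, j in s}
            if len(rows) == k and len(cols) == k:
                count += 1
        numbers.append(count)
    return numbers


def avoiding_formula(board, n):
    """N_0 = sum over k of (-1)^k * r_k * (n-k)!."""
    r = rook_numbers(board, n)
    return sum((-1) ** k * r[k] * factorial(n - k) for k in range(n + 1))


def avoiding_brute(board, n):
    """Count permutations pi of {1,...,n} with (i, pi(i)) outside the board."""
    return sum(
        1
        for perm in permutations(range(1, n + 1))
        if all((i, perm[i - 1]) not in board for i in range(1, n + 1))
    )


if __name__ == "__main__":
    n = 4
    board = {(1, 1), (1, 2), (2, 2), (3, 4), (4, 1), (4, 3)}  # arbitrary example
    print(rook_numbers(board, n))
    print(avoiding_formula(board, n), avoiding_brute(board, n))
```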

====Derangement problem====
We use the above general method to solve the derangement problem again.

Take <math>B=\{(1,1),(2,2),\ldots,(n,n)\}</math> as the chess board. A derangement <math>\pi</math> is a placement of <math>n</math> non-attacking rooks such that none of them is in <math>B</math>.
{{Chess diagram small
|
|
|=
8 |xx|__|__|__|__|__|__|__|=
7 |__|xx|__|__|__|__|__|__|=
6 |__|__|xx|__|__|__|__|__|=
5 |__|__|__|xx|__|__|__|__|=
4 |__|__|__|__|xx|__|__|__|=
3 |__|__|__|__|__|xx|__|__|=
2 |__|__|__|__|__|__|xx|__|=
1 |__|__|__|__|__|__|__|xx|=
a b c d e f g h
|
}}
Clearly, the number of ways of placing <math>k</math> non-attacking rooks on <math>B</math> is <math>r_k={n\choose k}</math>. We want to count <math>N_0</math>, which gives the number of ways of placing <math>n</math> non-attacking rooks such that none of these rooks lie in <math>B</math>.

By the above theorem
:<math>
N_0=\sum_{k=0}^n(-1)^kr_k(n-k)!=\sum_{k=0}^n(-1)^k{n\choose k}(n-k)!=\sum_{k=0}^n(-1)^k\frac{n!}{k!}=n!\sum_{k=0}^n(-1)^k\frac{1}{k!}\approx\frac{n!}{e}.
</math>

====Problème des ménages====

{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''line'', is <math>{m-k+1\choose k}</math>.
}}
{{Proof|
We draw a line of <math>m-k</math> black points, and then insert <math>k</math> red points into the <math>m-k+1</math> spaces between the black points (including the beginning and end).
::<math>
\begin{align}
&\sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \, \sqcup \\
&\qquad\qquad\qquad\quad\Downarrow\\
&\sqcup \, \bullet \,\, {\color{Red}\bullet} \, \bullet \,\, {\color{Red}\bullet} \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}\, \, \bullet \, \sqcup \, \bullet \, \sqcup \, \bullet \,\, {\color{Red}\bullet}
\end{align}
</math>
This gives us a line of <math>m</math> points in which the red points specify the chosen objects, which are non-consecutive because at most one red point is inserted into each space. The mapping is a 1-1 correspondence.
There are <math>{m-k+1\choose k}</math> ways of placing <math>k</math> red points into <math>m-k+1</math> spaces.
}}

{{Theorem|Lemma|
:The number of ways of choosing <math>k</math> ''non-consecutive'' objects from a collection of <math>m</math> objects arranged in a ''circle'', is <math>\frac{m}{m-k}{m-k\choose k}</math>.
}}
{{Proof|
Let <math>f(m,k)</math> be the desired number, and let <math>g(m,k)</math> be the number of ways of choosing <math>k</math> non-consecutive points from <math>m</math> points arranged in a circle, coloring these <math>k</math> points red, and then coloring one of the uncolored points blue.

Clearly, <math>g(m,k)=(m-k)f(m,k)</math>.

But we can also compute <math>g(m,k)</math> as follows:
* Choose one of the <math>m</math> points and color it blue. This gives us <math>m</math> ways.
* Cut the circle to make a line of <math>m-1</math> points by removing the blue point.
* Choose <math>k</math> non-consecutive points from the line of <math>m-1</math> points and color them red. This gives <math>{m-k\choose k}</math> ways due to the previous lemma.

Thus, <math>g(m,k)=m{m-k\choose k}</math>. Therefore we have the desired number <math>f(m,k)=\frac{m}{m-k}{m-k\choose k}</math>.
}}

=== The Euler totient function ===
Two integers <math>m, n</math> are said to be '''relatively prime''' if their greatest common divisor <math>\mathrm{gcd}(m,n)=1</math>. For a positive integer <math>n</math>, let <math>\phi(n)</math> be the number of positive integers from <math>\{1,2,\ldots,n\}</math> that are relatively prime to <math>n</math>. This function, called the Euler <math>\phi</math> function or '''the Euler totient function''', is fundamental in number theory.

We now derive a formula for this function by using the principle of inclusion-exclusion.
{{Theorem|Theorem (The Euler totient function)|
Suppose <math>n</math> is divisible by precisely <math>r</math> different primes, denoted <math>p_1,\ldots,p_r</math>. Then
:<math>\phi(n)=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right)</math>.
}}
{{Proof|
Let <math>U=\{1,2,\ldots,n\}</math> be the universe. For any distinct primes <math>p_{i_1},p_{i_2},\ldots,p_{i_s}\in\{p_1,\ldots,p_r\}</math>, the number of integers in <math>U</math> that are divisible by all of <math>p_{i_1},p_{i_2},\ldots,p_{i_s}</math> is <math>\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_s}}</math>, because each of these primes divides <math>n</math>.

<math>\phi(n)</math> is the number of integers in <math>U</math> that are not divisible by any of <math>p_1,\ldots,p_r</math>.
By the principle of inclusion-exclusion,
:<math>
\begin{align}
\phi(n)
&=n+\sum_{k=1}^r(-1)^k\sum_{1\le i_1<i_2<\cdots <i_k\le r}\frac{n}{p_{i_1}p_{i_2}\cdots p_{i_k}}\\
&=n-\sum_{1\le i\le r}\frac{n}{p_i}+\sum_{1\le i<j\le r}\frac{n}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{n}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{n}{p_{1}p_{2}\cdots p_{r}}\\
&=n\left(1-\sum_{1\le i\le r}\frac{1}{p_i}+\sum_{1\le i<j\le r}\frac{1}{p_i p_j}-\sum_{1\le i<j<k\le r}\frac{1}{p_{i} p_{j} p_{k}}+\cdots + (-1)^r\frac{1}{p_{1}p_{2}\cdots p_{r}}\right)\\
&=n\prod_{i=1}^r\left(1-\frac{1}{p_i}\right).
\end{align}
</math>
}}

Combinatorics (Fall 2010)/Basic enumeration

2010-07-10T08:28:34Z

210.28.131.82: /* Multinomial coefficients */

== Counting Problems ==

== Sets and Multisets ==
=== Subsets ===
We want to count the number of subsets of a set.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set, or '''<math>n</math>-set''' for short.
Let <math>2^S=\{T\mid T\subseteq S\}</math> denote the set of all subsets of <math>S</math>. <math>2^S</math> is called the '''power set''' of <math>S</math>.

We give a combinatorial proof that <math>|2^S|=2^n</math>. We observe that every subset <math>T\in 2^S</math> corresponds to a unique bit-vector <math>v\in\{0,1\}^n</math>, such that each bit <math>v_i</math> indicates whether <math>x_i\in T</math>. Formally, define a map <math>\phi:2^S\rightarrow\{0,1\}^n</math> by <math>\phi(T)=(v_1,v_2,\ldots,v_n)</math>, where
:<math>
v_i=\begin{cases}
1 & \mbox{if }x_i\in T\\
0 & \mbox{if }x_i\not\in T.
\end{cases}
</math>
The map <math>\phi</math> is a '''bijection''' (a 1-1 correspondence). The proof that <math>\phi</math> is a bijection is left as an exercise.

Since there is a bijection between <math>2^S</math> and <math>\{0,1\}^n</math>, it holds that <math>|2^S|=|\{0,1\}^n|=2^n\,</math>.

Here we apply ''"the rule of bijection"''.
*'''The rule of bijection''': if there exists a bijection between finite sets <math>P</math> and <math>Q</math>, then <math>|P|=|Q|</math>.

How do we know that <math>|\{0,1\}^n|=2^n\,</math>? We use ''"the rule of product"''.
*'''The rule of product''': for any finite sets <math>P</math> and <math>Q</math>, the cardinality of the Cartesian product <math>|P\times Q|=|P|\cdot|Q|</math>.

To count the size of <math>\{0,1\}^n\,</math>, we write <math>\{0,1\}^n=\{0,1\}\times\{0,1\}^{n-1}</math>, thus <math>|\{0,1\}^n|=2|\{0,1\}^{n-1}|\,</math>. Solving the recursion, we have that <math>|\{0,1\}^n|=2^n\,</math>.

There are two elements of the proof:
* Find a 1-1 correspondence between subsets of an <math>n</math>-set and <math>n</math>-bit vectors.
: An application of this in Computer Science is that we can use bit-array as a data structure for sets: any set defined over a '''universe''' <math>U</math> can be represented by an array of <math>|U|</math> bits.
* The rule of bijection: if there is a 1-1 correspondence between two sets, then their cardinalities are the same.

Many counting problems are solved by establishing a bijection between the set to be counted and some easy-to-count set. Proofs of this kind are usually called (non-rigorously) '''combinatorial proofs'''.
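The correspondence between subsets and bit-vectors can be written out explicitly. In the following Python sketch the function names are hypothetical, and the set <math>S</math> is given as a tuple so that its elements have a fixed order <math>x_1,\ldots,x_n</math>.

```python
from itertools import product


def subset_to_bits(T, S):
    """The map phi: bit i is 1 exactly when the i-th element of S lies in T."""
    return tuple(1 if x in T else 0 for x in S)


def bits_to_subset(v, S):
    """Inverse of phi: collect the elements of S whose bit is 1."""
    return frozenset(x for x, bit in zip(S, v) if bit)


def power_set(S):
    """Enumerate 2^S by running through all bit-vectors in {0,1}^n."""
    return [bits_to_subset(v, S) for v in product((0, 1), repeat=len(S))]


if __name__ == "__main__":
    S = ("a", "b", "c")
    subsets = power_set(S)
    print(len(subsets))                       # number of subsets of a 3-set
    print(subset_to_bits({"a", "c"}, S))      # bit-vector of {a, c}
    # phi is a bijection: applying phi and then its inverse is the identity.
    print(all(bits_to_subset(subset_to_bits(T, S), S) == T for T in subsets))
```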

----
We give an alternative proof that <math>|2^S|=2^n</math>. The proof needs another basic counting rule: ''"the rule of sum"''.
*'''The rule of sum''': for any '''''disjoint''''' finite sets <math>P</math> and <math>Q</math>, the cardinality of the union <math>|P\cup Q|=|P|+|Q|</math>.

Define the function <math>f(n)=|2^{S_n}|</math>, where <math>S_n=\{x_1,x_2,\ldots,x_n\}</math> is an <math>n</math>-set. Our goal is to compute <math>f(n)</math>. We prove the following recursion for <math>f(n)</math>.

{{Theorem|Lemma|
:<math>f(n)=2f(n-1)\,</math>.
}}
{{Proof|
Fix the element <math>x_n</math>. Let <math>U</math> be the set of subsets of <math>S_n</math> that contain <math>x_n</math> and let <math>V</math> be the set of subsets of <math>S_n</math> that do not contain <math>x_n</math>. Clearly <math>U</math> and <math>V</math> are disjoint (i.e. <math>U\cap V=\emptyset</math>) and <math>2^{S_n}=U\cup V</math>, because every subset of <math>S_n</math> either contains <math>x_n</math> or does not contain <math>x_n</math>, but not both.

Applying the rule of sum,
:<math>f(n)=|U\cup V|=|U|+|V|</math>.

The next observation is that <math>|U|=|V|=f(n-1)</math>, because <math>V</math> is exactly <math>2^{S_{n-1}}</math>, and <math>U</math> is the set resulting from adding <math>x_n</math> to every member of <math>2^{S_{n-1}}</math>. Therefore,
:<math>f(n)=|U|+|V|=f(n-1)+f(n-1)=2f(n-1)\,</math>.
}}
The base case is <math>f(0)=1</math>, because <math>\emptyset</math> has only one subset, namely <math>\emptyset</math> itself. Solving the recursion, we have that <math>|2^S|=f(n)=2^n</math>.
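
The recursion can be mirrored directly in code. This is a minimal sketch, assuming the set is given as a Python list; <code>power_set</code> is a hypothetical helper name, and the variables <code>U</code> and <code>V</code> follow the proof of the lemma.

```python
def power_set(elements):
    """Return all subsets of the list `elements`, split into V and U as in the lemma."""
    if not elements:
        return [frozenset()]          # f(0) = 1: the empty set is the only subset
    *rest, last = elements
    V = power_set(rest)               # subsets that do not contain the last element
    U = [T | {last} for T in V]       # subsets that do contain the last element
    return V + U                      # disjoint union, so the sizes add up

sizes = [len(power_set(list(range(n)))) for n in range(6)]
```

The list <code>sizes</code> doubles at every step, as the recursion <math>f(n)=2f(n-1)</math> predicts.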

=== Subsets of fixed size ===
Next we count the subsets of a fixed size. Again, let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-set. We define <math>{S\choose k}</math> to be the set of all <math>k</math>-element subsets (or '''<math>k</math>-subsets''') of <math>S</math>. Formally, <math>{S\choose k}=\{T\subseteq S\mid |T|=k\}</math>. The set <math>{S\choose k}</math> is sometimes called the '''<math>k</math>-uniform''' of <math>S</math>.

We write <math>{n\choose k}=\left|{S\choose k}\right|</math>. The notation <math>{n\choose k}</math> is read "<math>n</math> choose <math>k</math>".

{{Theorem|Theorem|
:<math>{n\choose k}=\frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1}=\frac{n!}{k!(n-k)!}</math>.
}}
{{Proof|
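The number of '''ordered''' <math>k</math>-subsets of an <math>n</math>-set is <math>n(n-1)\cdots(n-k+1)</math>: there are <math>n</math> choices for the first element, <math>n-1</math> for the second, and so on. Every <math>k</math>-subset can be ordered in <math>k!=k(k-1)\cdots1</math> ways, so each <math>k</math>-subset is counted exactly <math>k!</math> times among the ordered ones. Dividing by <math>k!</math> gives <math>{n\choose k}=\frac{n(n-1)\cdots(n-k+1)}{k!}</math>.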
}}
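
The counting argument in the proof can be compared against direct enumeration. The sketch below uses <code>itertools.permutations</code> for ordered <math>k</math>-subsets and <code>itertools.combinations</code> for unordered ones; the helper names <code>falling_factorial</code> and <code>binomial</code> are assumptions made for illustration.

```python
from itertools import combinations, permutations
from math import factorial

def falling_factorial(n, k):
    """n(n-1)...(n-k+1): the number of ordered k-subsets of an n-set."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

def binomial(n, k):
    """Each k-subset is counted k! times among the ordered ones, so divide by k!."""
    return falling_factorial(n, k) // factorial(k)

n, k = 5, 2
ordered = list(permutations(range(n), k))    # ordered k-subsets
unordered = list(combinations(range(n), k))  # plain k-subsets
```

For <math>n=5</math> and <math>k=2</math> the two lists have 20 and 10 elements, matching <code>falling_factorial(5, 2)</code> and <code>binomial(5, 2)</code>.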

;Some notations
* <math>n!</math>, read "<math>n</math> factorial", is defined as <math>n!=n(n-1)(n-2)\cdots 1</math>, with the convention that <math>0!=1</math>.
* <math>n(n-1)\cdots(n-k+1)=\frac{n!}{(n-k)!}</math> is usually denoted by <math>(n)_k\,</math>, read "<math>n</math> lower factorial <math>k</math>" (also known as the falling factorial).

The quantity <math>{n\choose k}</math> is called a '''binomial coefficient'''.

{{Theorem|Proposition|
# <math>{n\choose k}={n\choose n-k}</math>;
# <math>\sum_{k=0}^n {n\choose k}=2^n</math>.
}}
{{Proof|
1. We give two proofs for the first equation:
:(1) (numerical proof)
::<math>{n\choose k}=\frac{n!}{k!(n-k)!}={n\choose n-k}</math>.
:(2) (combinatorial proof)
::Choosing <math>k</math> elements from an <math>n</math>-set is equivalent to choosing the <math>n-k</math> elements to leave out. Formally, every <math>k</math>-subset <math>T\in{S\choose k}</math> is uniquely specified by its complement <math>S\setminus T\in {S\choose n-k}</math>, and the same holds for <math>(n-k)</math>-subsets, thus we have a bijection between <math>{S\choose k}</math> and <math>{S\choose n-k}</math>.
2. The second equation can also be proved in different ways, but the combinatorial proof is much easier. For an <math>n</math>-element set <math>S</math>, it is obvious that we can enumerate all subsets of <math>S</math> by enumerating <math>k</math>-subsets for every possible size <math>k</math>, i.e. it holds that
:<math>
2^S=\bigcup_{k=0}^n{S\choose k}.
</math>
For different <math>k</math>, the sets <math>{S\choose k}</math> are disjoint. By the rule of sum,
:<math>2^n=|2^S|=\left|\bigcup_{k=0}^n{S\choose k}\right|=\sum_{k=0}^n\left|{S\choose k}\right|=\sum_{k=0}^n {n\choose k}</math>.
}}

<math>{n\choose k}</math> is called a binomial coefficient for a reason. A binomial is a polynomial with two terms ("poly-" means many, and "bi-" means two, as in "binary", "bipartite", etc.). The following celebrated '''Binomial Theorem''' states that if a power of a binomial is expanded, the coefficients in the resulting polynomial are the binomial coefficients.

{{Theorem|Theorem (Binomial theorem)|
:<math>(1+x)^n=\sum_{k=0}^n{n\choose k}x^k</math>.
}}
{{Proof|
Write <math>(1+x)^n</math> as the product of <math>n</math> factors
:<math>(1+x)(1+x)\cdots (1+x)</math>.
The term <math>x^k</math> is obtained by choosing <math>x</math> from <math>k</math> factors and 1 from the remaining <math>(n-k)</math> factors. There are <math>{n\choose k}</math> ways of choosing these <math>k</math> factors, so the coefficient of <math>x^k</math> is <math>{n\choose k}</math>.
}}
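
The theorem can be checked numerically by multiplying out the <math>n</math> factors one at a time. This is a sketch, not part of the proof; <code>expand_one_plus_x</code> is a hypothetical helper that stores a polynomial as a list of coefficients, lowest degree first.

```python
from math import comb

def expand_one_plus_x(n):
    """Coefficients of (1+x)^n, lowest degree first, by repeated multiplication."""
    coeffs = [1]                      # the constant polynomial 1
    for _ in range(n):
        # Multiplying by (1+x): new[k] = old[k] + old[k-1].
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

n = 6
expanded = expand_one_plus_x(n)
expected = [comb(n, k) for k in range(n + 1)]
```

The two lists <code>expanded</code> and <code>expected</code> agree, so the coefficient of <math>x^k</math> is <math>{n\choose k}</math> in this instance.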

The following proposition has an easy proof due to the binomial theorem.
{{Theorem| Proposition|
:For <math>n>0</math>, the numbers of subsets of an <math>n</math>-set of even and of odd cardinality are equal.
}}
{{Proof|
Set <math>x=-1</math> in the binomial theorem.
:<math>
0=(1-1)^n=\sum_{k=0}^n{n\choose k}(-1)^k=\sum_{\overset{0\le k\le n}{k \text{ even}}}{n\choose k}-\sum_{\overset{0\le k\le n}{k \text{ odd}}}{n\choose k},
</math>
therefore
:<math>\sum_{\overset{0\le k\le n}{k \text{ even}}}{n\choose k}=\sum_{\overset{0\le k\le n}{k \text{ odd}}}{n\choose k}.</math>
}}
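
A quick numerical check of the proposition, including the excluded case <math>n=0</math>, can be written as follows. The function name <code>even_odd_counts</code> is an illustrative assumption.

```python
from math import comb

def even_odd_counts(n):
    """Return (number of even-size subsets, number of odd-size subsets) of an n-set."""
    even = sum(comb(n, k) for k in range(0, n + 1, 2))
    odd = sum(comb(n, k) for k in range(1, n + 1, 2))
    return even, odd

counts = {n: even_odd_counts(n) for n in range(0, 7)}
```

For every <math>n>0</math> the two counts coincide and equal <math>2^{n-1}</math>. For <math>n=0</math> they are 1 and 0, which shows why the hypothesis <math>n>0</math> is needed.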

For counting problems, what we care about are ''numbers''. In the binomial theorem, a formal ''variable'' <math>x</math> is introduced. It appears to have nothing to do with our problem, but it turns out to be very useful. Introducing a formal variable is the basic idea behind some advanced counting techniques, which will be discussed in future classes.

=== Compositions of an integer ===
A '''composition''' of <math>n</math> is an expression of <math>n</math> as an <font color="red">''ordered''</font> sum of <font color="red">''positive''</font> integers. A '''<math>k</math>-composition''' of <math>n</math> is a composition of <math>n</math> with exactly <math>k</math> positive summands.

Formally, a <math>k</math>-composition of <math>n</math> is a <math>k</math>-tuple <math>(a_1,a_2,\ldots,a_k)\in\{1,2,\ldots,n\}^k</math> such that <math>a_1+a_2+\cdots+a_k=n</math>.

Suppose we have <math>n</math> identical balls in a line. A <math>k</math>-composition partitions these <math>n</math> balls into <math>k</math> ''nonempty'' sets, illustrated as follows.
:<math>
\begin{array}{c|cc|c|c|ccc|cc}
\bigcirc \,&\, \bigcirc \,& \bigcirc \,&\, \bigcirc \,&\, \bigcirc \,&\, \bigcirc &\, \bigcirc &\, \bigcirc \,&\, \bigcirc \,& \bigcirc
\end{array}
</math>
So the number of <math>k</math>-compositions of <math>n</math> equals the number of ways we put <math>k-1</math> bars "<math>|</math>" into <math>n-1</math> slots "<math>\sqcup</math>", where each slot has at most one bar (because all the summands <math>a_i>0</math>):
:<math>
\bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc
</math>
which is equal to the number of ways of choosing <math>k-1</math> slots out of <math>n-1</math> slots, which is <math>{n-1\choose k-1}</math>.

This graphic argument can be expressed as a formal proof. We construct a bijection between the set of <math>k</math>-compositions of <math>n</math> and <math>{\{1,2,\ldots,n-1\}\choose k-1}</math> as follows.

Let <math>\phi</math> be the mapping defined, for a <math>k</math>-composition <math>(a_1,a_2,\ldots,a_k)</math> of <math>n</math>, by
:<math>
\begin{align}
\phi((a_1,a_2,\ldots,a_k))
&=\{a_1,\,\,a_1+a_2,\,\,a_1+a_2+a_3,\,\,\ldots,\,\,a_1+a_2+\cdots+a_{k-1}\}\\
&=\left\{\sum_{i=1}^ja_i\,\,\bigg|\,\, 1\le j<k\right\}.
\end{align}
</math>
<math>\phi</math> maps every <math>k</math>-composition to a <math>(k-1)</math>-subset of <math>\{1,2,\ldots,n-1\}</math>. It is easy to verify that <math>\phi</math> is a bijection, thus the number of <math>k</math>-compositions of <math>n</math> is <math>{n-1\choose k-1}</math>.
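
The bijection can be verified by brute force for small parameters. In this sketch, <code>compositions</code> enumerates <math>k</math>-compositions naively, <code>phi</code> takes partial sums as above, and <code>phi_inverse</code> (a name introduced here for illustration) recovers the summands as gaps between consecutive cut points.

```python
from itertools import combinations, product

def compositions(n, k):
    """Brute force: all k-tuples of positive integers summing to n."""
    return [a for a in product(range(1, n + 1), repeat=k) if sum(a) == n]

def phi(a):
    """Partial sums a_1, a_1+a_2, ..., a_1+...+a_{k-1} as a set."""
    total, sums = 0, []
    for part in a[:-1]:
        total += part
        sums.append(total)
    return frozenset(sums)

def phi_inverse(subset, n):
    """Summands are the gaps between cut points 0 < s_1 < ... < s_{k-1} < n."""
    cuts = [0] + sorted(subset) + [n]
    return tuple(b - a for a, b in zip(cuts, cuts[1:]))

n, k = 6, 3
comps = compositions(n, k)
images = {phi(a) for a in comps}
targets = {frozenset(c) for c in combinations(range(1, n), k - 1)}
```

For <math>n=6</math> and <math>k=3</math> there are <math>{5\choose 2}=10</math> compositions, and <code>images</code> equals <code>targets</code>, the set of all 2-subsets of <math>\{1,\ldots,5\}</math>.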
----
The number of <math>k</math>-compositions of <math>n</math> is equal to the number of ''positive'' integer solutions to <math>x_1+x_2+\cdots+x_k=n</math>. This suggests relaxing the constraint and counting the number of ''nonnegative'' integer solutions to <math>x_1+x_2+\cdots+x_k=n</math>. We call such a solution a '''weak <math>k</math>-composition''' of <math>n</math>.

Formally, a weak <math>k</math>-composition of <math>n</math> is a tuple <math>(x_1,x_2,\ldots,x_k)\in[n+1]^k</math> such that <math>x_1+x_2+\cdots+x_k=n</math>.

Given a weak <math>k</math>-composition <math>(x_1,x_2,\ldots,x_k)</math> of <math>n</math>, if we set <math>y_i=x_i+1</math> for every <math>1\le i\le k</math>, then <math>y_i>0</math> and
:<math>
\begin{align}
y_1+y_2+\cdots +y_k
&=(x_1+1)+(x_2+1)+\cdots+(x_k+1)\\
&=n+k,
\end{align}
</math>
i.e., <math>(y_1,y_2,\ldots,y_k)</math> is a <math>k</math>-composition of <math>n+k</math>. It is easy to see that it defines a bijection between weak <math>k</math>-compositions of <math>n</math> and <math>k</math>-compositions of <math>n+k</math>. Therefore, the number of weak <math>k</math>-compositions of <math>n</math> is <math>{n+k-1\choose k-1}</math>.
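
The shift <math>y_i=x_i+1</math> can be checked on a small example. This is a sketch under the assumption that brute-force enumeration is acceptable for small <math>n</math> and <math>k</math>; the helper names are illustrative.

```python
from itertools import product
from math import comb

def weak_compositions(n, k):
    """All k-tuples of nonnegative integers summing to n."""
    return [x for x in product(range(n + 1), repeat=k) if sum(x) == n]

def shift_up(x):
    """The map y_i = x_i + 1, turning a weak composition into a composition."""
    return tuple(xi + 1 for xi in x)

n, k = 4, 3
weak = weak_compositions(n, k)
shifted = {shift_up(x) for x in weak}
# k-compositions of n+k, enumerated independently for comparison.
strict = {y for y in product(range(1, n + k + 1), repeat=k) if sum(y) == n + k}
```

Here <code>shifted</code> coincides with <code>strict</code>, and both have <math>{n+k-1\choose k-1}={6\choose 2}=15</math> elements.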
----
We now count the number of nonnegative integer solutions to <math>x_1+x_2+\cdots+x_k\le n</math>.

Let <math>x_{k+1}=n-(x_1+x_2+\cdots+x_k)</math>. Then <math>x_{k+1}\ge 0</math> and <math>x_1+x_2+\ldots+x_k+x_{k+1}=n</math>.
The problem is thus transformed into counting the nonnegative integer solutions to the above equation, that is, the weak <math>(k+1)</math>-compositions of <math>n</math>. The answer is <math>{n+k\choose k}</math>.
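
The slack-variable trick can also be checked numerically. In the sketch below, <code>add_slack</code> is a hypothetical helper that appends <math>x_{k+1}=n-(x_1+\cdots+x_k)</math> to a solution of the inequality.

```python
from itertools import product
from math import comb

def solutions_at_most(n, k):
    """All nonnegative integer k-tuples with x_1 + ... + x_k <= n."""
    return [x for x in product(range(n + 1), repeat=k) if sum(x) <= n]

def add_slack(x, n):
    """Append the slack variable x_{k+1} = n - (x_1 + ... + x_k)."""
    return x + (n - sum(x),)

n, k = 4, 3
solutions = solutions_at_most(n, k)
padded = {add_slack(x, n) for x in solutions}
```

For <math>n=4</math> and <math>k=3</math> there are <math>{7\choose 3}=35</math> solutions, and every padded tuple is a weak 4-composition of 4.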

=== Multisets ===
A <math>k</math>-subset of an <math>n</math>-set <math>S</math> is sometimes called a '''<math>k</math>-combination of <math>S</math> without repetitions'''. This suggests the problem of counting the number of <math>k</math>-combinations of <math>S</math> '''''with repetitions'''''; that is, we choose <math>k</math> elements of <math>S</math>, disregarding order and allowing repeated elements.

;Example
:<math>S=\{1,2,3,4\}</math>. All <math>3</math>-combinations without repetitions are
::<math>\{1,2,3\},\{1,2,4\},\{1,3,4\},\{2,3,4\}\,</math>.
:Allowing repetitions, we also include the following 3-combinations:
::<math>
\begin{align}
&\{1,1,1\},\{1,1,2\},\{1,1,3\},\{1,1,4\},\{1,2,2\},\{1,3,3\},\{1,4,4\},\\
&\{2,2,2\},\{2,2,3\},\{2,2,4\},\{2,3,3\},\{2,4,4\},\\
&\{3,3,3\},\{3,3,4\},\{3,4,4\},\\
&\{4,4,4\}.
\end{align}
</math>

Combinations with repetitions can be formally defined as '''multisets'''. A multiset is a set with repeated elements. Formally, a multiset <math>M</math> on a set <math>S</math> is a function <math>m:S\rightarrow \mathbb{N}</math>. For any element <math>x\in S</math>, the integer <math>m(x)\ge 0</math> is the number of repetitions of <math>x</math> in <math>M</math>, called the '''multiplicity''' of <math>x</math>. The sum of multiplicities <math>\sum_{x\in S}m(x)</math> is called the '''cardinality''' of <math>M</math> and is denoted as <math>|M|</math>.

A <math>k</math>-multiset on a set <math>S</math> is a multiset <math>M</math> on <math>S</math> with <math>|M|=k</math>. It is obvious that a <math>k</math>-combination of <math>S</math> with repetition is simply a <math>k</math>-multiset on <math>S</math>.

The set of all <math>k</math>-multisets on <math>S</math> is denoted <math>\left({S\choose k}\right)</math>. Assuming that <math>n=|S|</math>, denote <math>\left({n\choose k}\right)=\left|\left({S\choose k}\right)\right|</math>, which is the number of <math>k</math>-combinations of an <math>n</math>-set with repetitions.

Believe it or not: we have already evaluated the number <math>\left({n\choose k}\right)</math>. If <math>S=\{x_1,x_2,\ldots,x_n\}</math>, let <math>z_i=m(x_i)</math>, then <math>\left({n\choose k}\right)</math> is the number of nonnegative integer solutions to <math>z_1+z_2+\cdots+z_n=k</math>, which is the number of weak <math>n</math>-compositions of <math>k</math>, which we have seen is <math>{n+k-1\choose n-1}={n+k-1\choose k}</math>.

----

There is a direct combinatorial proof that <math>\left({n\choose k}\right)={n+k-1\choose k}</math>.

Given a <math>k</math>-multiset <math>0\le a_0\le a_1\le\cdots\le a_{k-1}\le n-1</math> on <math>[n]</math>, define <math>b_i=a_i+i</math>. Then <math>0\le b_0<b_1<\cdots<b_{k-1}\le n+k-2</math>, so <math>\{b_0,b_1,\ldots,b_{k-1}\}</math> is a <math>k</math>-subset of <math>[n+k-1]</math>. Conversely, given a <math>k</math>-subset <math>0\le b_0< b_1<\cdots< b_{k-1}\le n+k-2</math> of <math>[n+k-1]</math>, define <math>a_i=b_i-i</math>. Then <math>0\le a_0\le a_1\le\cdots\le a_{k-1}\le n-1</math>, so <math>\{a_0,a_1,\ldots,a_{k-1}\}</math> is a <math>k</math>-multiset on <math>[n]</math>. The two maps are inverse to each other, so we have a bijection between <math>\left({[n]\choose k}\right)</math> and <math>{[n+k-1]\choose k}</math>.
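
This bijection is easy to test by enumeration. The sketch below represents a multiset as a sorted tuple and uses <code>itertools.combinations_with_replacement</code> to list all <math>k</math>-multisets on <math>\{0,1,\ldots,n-1\}</math>; the names <code>to_subset</code> and <code>to_multiset</code> are illustrative.

```python
from itertools import combinations, combinations_with_replacement

def to_subset(multiset):
    """b_i = a_i + i for a sorted multiset a_0 <= ... <= a_{k-1}."""
    return tuple(a + i for i, a in enumerate(sorted(multiset)))

def to_multiset(subset):
    """a_i = b_i - i for a sorted subset b_0 < ... < b_{k-1}."""
    return tuple(b - i for i, b in enumerate(sorted(subset)))

n, k = 4, 3
multisets = list(combinations_with_replacement(range(n), k))
subsets = set(combinations(range(n + k - 1), k))
images = {to_subset(m) for m in multisets}
```

For <math>n=4</math> and <math>k=3</math> there are 20 multisets, matching the example above (4 without repetitions plus 16 with repetitions), and <code>images</code> is exactly the set of 3-subsets of <math>\{0,1,\ldots,5\}</math>.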

=== Multinomial coefficients ===
The binomial coefficient <math>{n\choose k}</math> may be interpreted as follows. Each element of an <math>n</math>-set is placed into one of two groups, with <math>k</math> elements in Group 1 and <math>n-k</math> elements in Group 2. The binomial coefficient <math>{n\choose k}</math> counts the number of such placements.

This suggests a generalization allowing more than two groups. Let <math>(a_1,a_2,\ldots,a_m)</math> be a tuple of nonnegative integers summing to <math>n</math>. Let <math>{n\choose a_1,a_2,\ldots,a_m}</math> denote the number of ways of assigning each element of an <math>n</math>-set to one of <math>m</math> groups <math>G_1,G_2,\ldots,G_m</math> so that <math>|G_i|=a_i</math>.

The binomial coefficient is just the case with two groups, and <math>{n\choose k}={n\choose k,n-k}</math>. The number <math>{n\choose a_1,a_2,\ldots,a_m}</math> is called a '''multinomial coefficient'''. We can think of it as assigning <math>n</math> labeled balls to <math>m</math> labeled bins: <math>{n\choose a_1,a_2,\ldots,a_m}</math> is the number of assignments in which the <math>i</math>-th bin receives <math>a_i</math> balls.

:<math>{n\choose a_1,a_2,\ldots,a_m}=\frac{n!}{a_1!a_2!\cdots a_m!}.</math>
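
The formula can be compared with a direct count of ball-to-bin assignments. This is a sketch for small inputs only, since the brute-force count inspects all <math>m^n</math> assignments; both function names are assumptions made for illustration.

```python
from itertools import product
from math import factorial

def multinomial(*parts):
    """n! / (a_1! a_2! ... a_m!) where n is the sum of the parts."""
    result = factorial(sum(parts))
    for a in parts:
        result //= factorial(a)   # each intermediate quotient is an integer
    return result

def count_assignments(parts):
    """Count assignments of n labeled balls to m labeled bins with bin i holding parts[i]."""
    n, m = sum(parts), len(parts)
    count = 0
    for assignment in product(range(m), repeat=n):  # assignment[j] = bin of ball j
        if all(assignment.count(g) == parts[g] for g in range(m)):
            count += 1
    return count

formula = multinomial(2, 1, 1)
brute = count_assignments((2, 1, 1))
```

For the tuple <math>(2,1,1)</math> both values are <math>\frac{4!}{2!\,1!\,1!}=12</math>, and with two groups the formula reduces to the binomial coefficient.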

== Permutations and Partitions ==

== The twelvefold way ==
We consider a very fundamental counting framework: counting functions <math>f:N\rightarrow M</math>. We can define different counting problems according to the type of mapping (1-1, onto, arbitrary) and according to whether the elements of the domain and of the range are distinguishable or indistinguishable.
* Distinguishability of set:
* Types of mapping:

{|border="2" cellspacing="4" cellpadding="10" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|-
!bgcolor="#A7C1F2" | Elements of <math>N</math>
!bgcolor="#A7C1F2" | Elements of <math>M</math>
!bgcolor="#A7C1F2" | Any <math>f</math>
!bgcolor="#A7C1F2" | Injective (1-1) <math>f</math>
!bgcolor="#A7C1F2" | Surjective (onto) <math>f</math>
|-
|align="center"| ''distinguishable''
|align="center"| ''distinguishable''
|align="center"| <math>m^n\,</math>
|align="center"| <math>\left(m\right)_n</math>
|align="center"| <math>m!S(n, m)\,</math>
|-
|align="center"| ''indistinguishable''
|align="center"| ''distinguishable''
|align="center"| <math>\left({m\choose n}\right)</math>
|align="center"|<math>{m\choose n}</math>
|align="center"|<math>\left({m\choose n-m}\right)</math>
|-
|align="center"| ''distinguishable''
|align="center"| ''indistinguishable''
|align="center"| <math>\sum_{k=1}^m S(n,k)</math>
|align="center"| <math>\begin{cases}1 & \mbox{if }n\le m\\ 0& \mbox{if }n>m\end{cases}</math>
|align="center"| <math>S(n,m)\,</math>
|-
|align="center"| ''indistinguishable''
|align="center"| ''indistinguishable''
|align="center"| <math>\sum_{k=1}^m p_k(n)</math>
|align="center"| <math>\begin{cases}1 & \mbox{if }n\le m\\ 0& \mbox{if }n>m\end{cases}</math>
|align="center"| <math>p_m(n)\,</math>
|}
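
The first row of the table can be checked by brute force. The sketch below assumes, following the standard presentation in Stanley, that <math>S(n,m)</math> denotes the Stirling number of the second kind (the number of partitions of an <math>n</math>-set into <math>m</math> nonempty blocks); this notation is not defined in the text above, and the recurrence used in <code>stirling2</code> is the usual one for these numbers.

```python
from itertools import product
from math import factorial

def stirling2(n, m):
    """Stirling number of the second kind via S(n,m) = m*S(n-1,m) + S(n-1,m-1)."""
    if n == m:
        return 1
    if n == 0 or m == 0 or m > n:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

def falling_factorial(m, n):
    """(m)_n = m(m-1)...(m-n+1)."""
    result = 1
    for i in range(n):
        result *= m - i
    return result

def count_functions(n, m):
    """Return (all, injective, surjective) counts of functions from an n-set to an m-set."""
    all_f = list(product(range(m), repeat=n))   # f is stored as the tuple of its values
    injective = sum(1 for f in all_f if len(set(f)) == n)
    surjective = sum(1 for f in all_f if len(set(f)) == m)
    return len(all_f), injective, surjective

row_4_3 = count_functions(4, 3)
row_2_3 = count_functions(2, 3)
```

With <math>n=4</math>, <math>m=3</math> the counts are <math>3^4=81</math>, <math>(3)_4=0</math> and <math>3!\,S(4,3)=36</math>. With <math>n=2</math>, <math>m=3</math> they are 9, 6 and 0.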

{|border="2" cellspacing="4" cellpadding="10" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|-
!align="center" bgcolor="#A7C1F2" | balls per bin
!align="center" bgcolor="#A7C1F2" | unrestricted
!align="center" bgcolor="#A7C1F2" | ≤ 1
!align="center" bgcolor="#A7C1F2" | ≥ 1
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> labeled balls, <br><math>m</math> labeled bins
|align="center"| <math>n</math>-tuples <br>of <math>m</math> things
|align="center"| <math>n</math>-permutations <br>of <math>m</math> things
|align="center"| partition of <math>[n]</math> <br> into <math>m</math> ordered parts
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> unlabeled balls, <br><math>m</math> labeled bins
|align="center"| <math>n</math>-combinations of <math>[m]</math> <br>with repetitions
|align="center"| <math>n</math>-combinations of <math>[m]</math> <br> without repetitions
|align="center"| <math>m</math>-compositions <br>of <math>n</math>
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> labeled balls, <br><math>m</math> unlabeled bins
|align="center"| partitions of <math>[n]</math> <br>into <math>\le m</math> parts
|align="center"| <math>n</math> pigeons <br>into <math>m</math> holes
|align="center"| partitions of <math>[n]</math> <br>into <math>m</math> parts
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> unlabeled balls, <br><math>m</math> unlabeled bins
|align="center"| partitions of <math>n</math> <br>into <math>\le m</math> parts
|align="center"| <math>n</math> pigeons <br>into <math>m</math> holes
|align="center"| partitions of <math>n</math> <br>into <math>m</math> parts
|}

== Reference ==
* Stanley, ''Enumerative Combinatorics'', Volume 1, Chapter 1.

Combinatorics (Fall 2010)/Basic enumeration

2010-07-10T07:21:23Z

210.28.131.82: /* The twelvfold way */

== Counting Problems ==

== Sets and Multisets ==
=== Subsets ===
We want to count the number of subsets of a set.

Let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-element set, or '''<math>n</math>-set''' for short.
Let <math>2^S=\{T\mid T\subseteq S\}</math> denote the set of all subsets of <math>S</math>. <math>2^S</math> is called the '''power set''' of <math>S</math>.

We give a combinatorial proof that <math>|2^S|=2^n</math>. We observe that every subset <math>T\in 2^S</math> corresponds to a unique bit-vector <math>v\in\{0,1\}^n</math>, such that each bit <math>v_i</math> indicates whether <math>x_i\in T</math>. Formally, define a map <math>\phi:2^S\rightarrow\{0,1\}^n</math> by <math>\phi(T)=(v_1,v_2,\ldots,v_n)</math>, where
:<math>
v_i=\begin{cases}
1 & \mbox{if }x_i\in T\\
0 & \mbox{if }x_i\not\in T.
\end{cases}
</math>
The map <math>\phi</math> is a '''bijection''' (a 1-1 correspondence). The proof that <math>\phi</math> is a bijection is left as an exercise.

Since there is a bijection between <math>2^S</math> and <math>\{0,1\}^n</math>, it holds that <math>|2^S|=|\{0,1\}^n|=2^n\,</math>.

Here we apply ''"the rule of bijection"''.
*'''The rule of bijection''': if there exists a bijection between finite sets <math>P</math> and <math>Q</math>, then <math>|P|=|Q|</math>.

How do we know that <math>|\{0,1\}^n|=2^n\,</math>? We use ''"the rule of product"''.
*'''The rule of product''': for any finite sets <math>P</math> and <math>Q</math>, the cardinality of the Cartesian product <math>|P\times Q|=|P|\cdot|Q|</math>.

To count the size of <math>\{0,1\}^n\,</math>, we write <math>\{0,1\}^n=\{0,1\}\times\{0,1\}^{n-1}</math>, thus <math>|\{0,1\}^n|=2|\{0,1\}^{n-1}|\,</math>. Solving the recursion, we have that <math>|\{0,1\}^n|=2^n\,</math>.

There are two elements of the proof:
* Find a 1-1 correspondence between subsets of an <math>n</math>-set and <math>n</math>-bit vectors.
: An application of this in computer science is the bit array as a data structure for sets: any subset of a '''universe''' <math>U</math> can be represented by an array of <math>|U|</math> bits.
* The rule of bijection: if there is a 1-1 correspondence between two sets, then their cardinalities are the same.

Many counting problems are solved by establishing a bijection between the set to be counted and some easy-to-count set. Proofs of this kind are usually called (non-rigorously) '''combinatorial proofs'''.

----
We give an alternative proof that <math>|2^S|=2^n</math>. The proof needs another basic counting rule: ''"the rule of sum"''.
*'''The rule of sum''': for any '''''disjoint''''' finite sets <math>P</math> and <math>Q</math>, the cardinality of the union <math>|P\cup Q|=|P|+|Q|</math>.

Define the function <math>f(n)=|2^{S_n}|</math>, where <math>S_n=\{x_1,x_2,\ldots,x_n\}</math> is an <math>n</math>-set. Our goal is to compute <math>f(n)</math>. We prove the following recursion for <math>f(n)</math>.

{{Theorem|Lemma|
:<math>f(n)=2f(n-1)\,</math>.
}}
{{Proof|
Fix the element <math>x_n</math>. Let <math>U</math> be the set of subsets of <math>S_n</math> that contain <math>x_n</math> and let <math>V</math> be the set of subsets of <math>S_n</math> that do not contain <math>x_n</math>. Clearly <math>U</math> and <math>V</math> are disjoint (i.e. <math>U\cap V=\emptyset</math>) and <math>2^{S_n}=U\cup V</math>, because every subset of <math>S_n</math> either contains <math>x_n</math> or does not contain <math>x_n</math>, but not both.

Applying the rule of sum,
:<math>f(n)=|U\cup V|=|U|+|V|</math>.

The next observation is that <math>|U|=|V|=f(n-1)</math>, because <math>V</math> is exactly <math>2^{S_{n-1}}</math>, and <math>U</math> is the set resulting from adding <math>x_n</math> to every member of <math>2^{S_{n-1}}</math>. Therefore,
:<math>f(n)=|U|+|V|=f(n-1)+f(n-1)=2f(n-1)\,</math>.
}}
The base case is <math>f(0)=1</math>, because <math>\emptyset</math> has only one subset, namely <math>\emptyset</math> itself. Solving the recursion, we have that <math>|2^S|=f(n)=2^n</math>.

=== Subsets of fixed size ===
Next we count the subsets of a fixed size. Again, let <math>S=\{x_1,x_2,\ldots,x_n\}</math> be an <math>n</math>-set. We define <math>{S\choose k}</math> to be the set of all <math>k</math>-element subsets (or '''<math>k</math>-subsets''') of <math>S</math>. Formally, <math>{S\choose k}=\{T\subseteq S\mid |T|=k\}</math>. The set <math>{S\choose k}</math> is sometimes called the '''<math>k</math>-uniform''' of <math>S</math>.

We write <math>{n\choose k}=\left|{S\choose k}\right|</math>. The notation <math>{n\choose k}</math> is read "<math>n</math> choose <math>k</math>".

{{Theorem|Theorem|
:<math>{n\choose k}=\frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1}=\frac{n!}{k!(n-k)!}</math>.
}}
{{Proof|
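The number of '''ordered''' <math>k</math>-subsets of an <math>n</math>-set is <math>n(n-1)\cdots(n-k+1)</math>: there are <math>n</math> choices for the first element, <math>n-1</math> for the second, and so on. Every <math>k</math>-subset can be ordered in <math>k!=k(k-1)\cdots1</math> ways, so each <math>k</math>-subset is counted exactly <math>k!</math> times among the ordered ones. Dividing by <math>k!</math> gives <math>{n\choose k}=\frac{n(n-1)\cdots(n-k+1)}{k!}</math>.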
}}

;Some notations
* <math>n!</math>, read "<math>n</math> factorial", is defined as <math>n!=n(n-1)(n-2)\cdots 1</math>, with the convention that <math>0!=1</math>.
* <math>n(n-1)\cdots(n-k+1)=\frac{n!}{(n-k)!}</math> is usually denoted by <math>(n)_k\,</math>, read "<math>n</math> lower factorial <math>k</math>" (also known as the falling factorial).

The quantity <math>{n\choose k}</math> is called a '''binomial coefficient'''.

{{Theorem|Proposition|
# <math>{n\choose k}={n\choose n-k}</math>;
# <math>\sum_{k=0}^n {n\choose k}=2^n</math>.
}}
{{Proof|
1. We give two proofs for the first equation:
:(1) (numerical proof)
::<math>{n\choose k}=\frac{n!}{k!(n-k)!}={n\choose n-k}</math>.
:(2) (combinatorial proof)
::Choosing <math>k</math> elements from an <math>n</math>-set is equivalent to choosing the <math>n-k</math> elements to leave out. Formally, every <math>k</math>-subset <math>T\in{S\choose k}</math> is uniquely specified by its complement <math>S\setminus T\in {S\choose n-k}</math>, and the same holds for <math>(n-k)</math>-subsets, thus we have a bijection between <math>{S\choose k}</math> and <math>{S\choose n-k}</math>.
2. The second equation can also be proved in different ways, but the combinatorial proof is much easier. For an <math>n</math>-element set <math>S</math>, it is obvious that we can enumerate all subsets of <math>S</math> by enumerating <math>k</math>-subsets for every possible size <math>k</math>, i.e. it holds that
:<math>
2^S=\bigcup_{k=0}^n{S\choose k}.
</math>
For different <math>k</math>, the sets <math>{S\choose k}</math> are disjoint. By the rule of sum,
:<math>2^n=|2^S|=\left|\bigcup_{k=0}^n{S\choose k}\right|=\sum_{k=0}^n\left|{S\choose k}\right|=\sum_{k=0}^n {n\choose k}</math>.
}}

<math>{n\choose k}</math> is called a binomial coefficient for a reason. A binomial is a polynomial with two terms ("poly-" means many, and "bi-" means two, as in "binary", "bipartite", etc.). The following celebrated '''Binomial Theorem''' states that if a power of a binomial is expanded, the coefficients in the resulting polynomial are the binomial coefficients.

{{Theorem|Theorem (Binomial theorem)|
:<math>(1+x)^n=\sum_{k=0}^n{n\choose k}x^k</math>.
}}
{{Proof|
Write <math>(1+x)^n</math> as the product of <math>n</math> factors
:<math>(1+x)(1+x)\cdots (1+x)</math>.
The term <math>x^k</math> is obtained by choosing <math>x</math> from <math>k</math> factors and 1 from the remaining <math>(n-k)</math> factors. There are <math>{n\choose k}</math> ways of choosing these <math>k</math> factors, so the coefficient of <math>x^k</math> is <math>{n\choose k}</math>.
}}

The following proposition has an easy proof due to the binomial theorem.
{{Theorem| Proposition|
:For <math>n>0</math>, the numbers of subsets of an <math>n</math>-set of even and of odd cardinality are equal.
}}
{{Proof|
Set <math>x=-1</math> in the binomial theorem.
:<math>
0=(1-1)^n=\sum_{k=0}^n{n\choose k}(-1)^k=\sum_{\overset{0\le k\le n}{k \text{ even}}}{n\choose k}-\sum_{\overset{0\le k\le n}{k \text{ odd}}}{n\choose k},
</math>
therefore
:<math>\sum_{\overset{0\le k\le n}{k \text{ even}}}{n\choose k}=\sum_{\overset{0\le k\le n}{k \text{ odd}}}{n\choose k}.</math>
}}

For counting problems, what we care about are ''numbers''. In the binomial theorem, a formal ''variable'' <math>x</math> is introduced. It appears to have nothing to do with our problem, but it turns out to be very useful. Introducing a formal variable is the basic idea behind some advanced counting techniques, which will be discussed in future classes.

=== Compositions of an integer ===
A '''composition''' of <math>n</math> is an expression of <math>n</math> as an <font color="red">''ordered''</font> sum of <font color="red">''positive''</font> integers. A '''<math>k</math>-composition''' of <math>n</math> is a composition of <math>n</math> with exactly <math>k</math> positive summands.

Formally, a <math>k</math>-composition of <math>n</math> is a <math>k</math>-tuple <math>(a_1,a_2,\ldots,a_k)\in\{1,2,\ldots,n\}^k</math> such that <math>a_1+a_2+\cdots+a_k=n</math>.

Suppose we have <math>n</math> identical balls in a line. A <math>k</math>-composition partitions these <math>n</math> balls into <math>k</math> ''nonempty'' sets, illustrated as follows.
:<math>
\begin{array}{c|cc|c|c|ccc|cc}
\bigcirc \,&\, \bigcirc \,& \bigcirc \,&\, \bigcirc \,&\, \bigcirc \,&\, \bigcirc &\, \bigcirc &\, \bigcirc \,&\, \bigcirc \,& \bigcirc
\end{array}
</math>
So the number of <math>k</math>-compositions of <math>n</math> equals the number of ways we put <math>k-1</math> bars "<math>|</math>" into <math>n-1</math> slots "<math>\sqcup</math>", where each slot has at most one bar (because all the summands <math>a_i>0</math>):
:<math>
\bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc \sqcup \bigcirc
</math>
which is equal to the number of ways of choosing <math>k-1</math> slots out of <math>n-1</math> slots, which is <math>{n-1\choose k-1}</math>.

This graphic argument can be expressed as a formal proof. We construct a bijection between the set of <math>k</math>-compositions of <math>n</math> and <math>{\{1,2,\ldots,n-1\}\choose k-1}</math> as follows.

Let <math>\phi</math> be the mapping defined, for a <math>k</math>-composition <math>(a_1,a_2,\ldots,a_k)</math> of <math>n</math>, by
:<math>
\begin{align}
\phi((a_1,a_2,\ldots,a_k))
&=\{a_1,\,\,a_1+a_2,\,\,a_1+a_2+a_3,\,\,\ldots,\,\,a_1+a_2+\cdots+a_{k-1}\}\\
&=\left\{\sum_{i=1}^ja_i\,\,\bigg|\,\, 1\le j<k\right\}.
\end{align}
</math>
<math>\phi</math> maps every <math>k</math>-composition to a <math>(k-1)</math>-subset of <math>\{1,2,\ldots,n-1\}</math>. It is easy to verify that <math>\phi</math> is a bijection, thus the number of <math>k</math>-compositions of <math>n</math> is <math>{n-1\choose k-1}</math>.
----
The number of <math>k</math>-compositions of <math>n</math> is equal to the number of ''positive'' integer solutions to <math>x_1+x_2+\cdots+x_k=n</math>. This suggests relaxing the constraint and counting the number of ''nonnegative'' integer solutions to <math>x_1+x_2+\cdots+x_k=n</math>. We call such a solution a '''weak <math>k</math>-composition''' of <math>n</math>.

Formally, a weak <math>k</math>-composition of <math>n</math> is a tuple <math>(x_1,x_2,\ldots,x_k)\in[n+1]^k</math> such that <math>x_1+x_2+\cdots+x_k=n</math>.

Given a weak <math>k</math>-composition <math>(x_1,x_2,\ldots,x_k)</math> of <math>n</math>, if we set <math>y_i=x_i+1</math> for every <math>1\le i\le k</math>, then <math>y_i>0</math> and
:<math>
\begin{align}
y_1+y_2+\cdots +y_k
&=(x_1+1)+(x_2+1)+\cdots+(x_k+1)\\
&=n+k,
\end{align}
</math>
i.e., <math>(y_1,y_2,\ldots,y_k)</math> is a <math>k</math>-composition of <math>n+k</math>. It is easy to see that it defines a bijection between weak <math>k</math>-compositions of <math>n</math> and <math>k</math>-compositions of <math>n+k</math>. Therefore, the number of weak <math>k</math>-compositions of <math>n</math> is <math>{n+k-1\choose k-1}</math>.
----
We now count the number of nonnegative integer solutions to <math>x_1+x_2+\cdots+x_k\le n</math>.

Let <math>x_{k+1}=n-(x_1+x_2+\cdots+x_k)</math>. Then <math>x_{k+1}\ge 0</math> and <math>x_1+x_2+\ldots+x_k+x_{k+1}=n</math>.
The problem is thus transformed into counting the nonnegative integer solutions to the above equation, that is, the weak <math>(k+1)</math>-compositions of <math>n</math>. The answer is <math>{n+k\choose k}</math>.

=== Multisets ===
A <math>k</math>-subset of an <math>n</math>-set <math>S</math> is sometimes called a '''<math>k</math>-combination of <math>S</math> without repetitions'''. This suggests the problem of counting the number of <math>k</math>-combinations of <math>S</math> '''''with repetitions'''''; that is, we choose <math>k</math> elements of <math>S</math>, disregarding order and allowing repeated elements.

;Example
:<math>S=\{1,2,3,4\}</math>. All <math>3</math>-combinations without repetitions are
::<math>\{1,2,3\},\{1,2,4\},\{1,3,4\},\{2,3,4\}\,</math>.
:Allowing repetitions, we also include the following 3-combinations:
::<math>
\begin{align}
&\{1,1,1\},\{1,1,2\},\{1,1,3\},\{1,1,4\},\{1,2,2\},\{1,3,3\},\{1,4,4\},\\
&\{2,2,2\},\{2,2,3\},\{2,2,4\},\{2,3,3\},\{2,4,4\},\\
&\{3,3,3\},\{3,3,4\},\{3,4,4\},\\
&\{4,4,4\}.
\end{align}
</math>

Combinations with repetitions can be formally defined as '''multisets'''. A multiset is a set with repeated elements. Formally, a multiset <math>M</math> on a set <math>S</math> is a function <math>m:S\rightarrow \mathbb{N}</math>. For any element <math>x\in S</math>, the integer <math>m(x)\ge 0</math> is the number of repetitions of <math>x</math> in <math>M</math>, called the '''multiplicity''' of <math>x</math>. The sum of multiplicities <math>\sum_{x\in S}m(x)</math> is called the '''cardinality''' of <math>M</math> and is denoted as <math>|M|</math>.

A <math>k</math>-multiset on a set <math>S</math> is a multiset <math>M</math> on <math>S</math> with <math>|M|=k</math>. It is obvious that a <math>k</math>-combination of <math>S</math> with repetition is simply a <math>k</math>-multiset on <math>S</math>.

The set of all <math>k</math>-multisets on <math>S</math> is denoted <math>\left({S\choose k}\right)</math>. Assuming that <math>n=|S|</math>, denote <math>\left({n\choose k}\right)=\left|\left({S\choose k}\right)\right|</math>, which is the number of <math>k</math>-combinations of an <math>n</math>-set with repetitions.

Believe it or not: we have already evaluated the number <math>\left({n\choose k}\right)</math>. If <math>S=\{x_1,x_2,\ldots,x_n\}</math>, let <math>z_i=m(x_i)</math>, then <math>\left({n\choose k}\right)</math> is the number of nonnegative integer solutions to <math>z_1+z_2+\cdots+z_n=k</math>, which is the number of weak <math>n</math>-compositions of <math>k</math>, which we have seen is <math>{n+k-1\choose n-1}={n+k-1\choose k}</math>.

----

There is a direct combinatorial proof that <math>\left({n\choose k}\right)={n+k-1\choose k}</math>.

Given a <math>k</math>-multiset <math>0\le a_0\le a_1\le\cdots\le a_{k-1}\le n-1</math> on <math>[n]</math>, define <math>b_i=a_i+i</math>. Then <math>0\le b_0<b_1<\cdots<b_{k-1}\le n+k-2</math>, so <math>\{b_0,b_1,\ldots,b_{k-1}\}</math> is a <math>k</math>-subset of <math>[n+k-1]</math>. Conversely, given a <math>k</math>-subset <math>0\le b_0<b_1<\cdots<b_{k-1}\le n+k-2</math> of <math>[n+k-1]</math>, define <math>a_i=b_i-i</math>. Then <math>0\le a_0\le a_1\le\cdots\le a_{k-1}\le n-1</math>, so <math>\{a_0,a_1,\ldots,a_{k-1}\}</math> is a <math>k</math>-multiset on <math>[n]</math>. The two maps are inverse to each other, so we have a bijection between <math>\left({[n]\choose k}\right)</math> and <math>{[n+k-1]\choose k}</math>.
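
The shift map <math>b_i=a_i+i</math> can be checked mechanically on a small case. This is only a sketch: it takes <math>[n]=\{0,1,\ldots,n-1\}</math> as in the proof, and the helper names <code>multiset_to_subset</code> and <code>subset_to_multiset</code> are chosen for the illustration.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def multiset_to_subset(a):
    """Map a sorted k-multiset a_0 <= ... <= a_{k-1} to b_i = a_i + i."""
    return tuple(x + i for i, x in enumerate(a))

def subset_to_multiset(b):
    """Inverse map: a sorted k-subset b_0 < ... < b_{k-1} goes to a_i = b_i - i."""
    return tuple(x - i for i, x in enumerate(b))

n, k = 4, 3
# All k-multisets on {0,...,n-1}, each given as a nondecreasing tuple.
multisets = list(combinations_with_replacement(range(n), k))
# All k-subsets of {0,...,n+k-2}, each given as an increasing tuple.
subsets = set(combinations(range(n + k - 1), k))

images = {multiset_to_subset(a) for a in multisets}
assert images == subsets                       # the map is onto the k-subsets
assert len(images) == len(multisets)           # and it is one-to-one
assert all(subset_to_multiset(multiset_to_subset(a)) == a for a in multisets)
assert len(multisets) == comb(n + k - 1, k)    # 20 for n = 4, k = 3
```

For <math>n=4</math> and <math>k=3</math> this matches the example above: 4 combinations without repetitions plus 16 with repetitions.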

=== Multinomial coefficients ===

== Permutations and Partitions ==

== The twelvefold way ==
We consider a very fundamental counting framework: counting functions <math>f:N\rightarrow M</math>. We can define different counting problems according to the types of mapping (1-1, onto, arbitrary), and the types of the domain and the range (distinguishable, indistinguishable).
* Distinguishability of set:
* Types of mapping:

{|border="2" cellspacing="4" cellpadding="10" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|-
!bgcolor="#A7C1F2" | Elements of <math>N</math>
!bgcolor="#A7C1F2" | Elements of <math>M</math>
!bgcolor="#A7C1F2" | Any <math>f</math>
!bgcolor="#A7C1F2" | Injective (1-1) <math>f</math>
!bgcolor="#A7C1F2" | Surjective (onto) <math>f</math>
|-
|align="center"| ''distinguishable''
|align="center"| ''distinguishable''
|align="center"| <math>m^n\,</math>
|align="center"| <math>\left(m\right)_n</math>
|align="center"| <math>m!S(n, m)\,</math>
|-
|align="center"| ''indistinguishable''
|align="center"| ''distinguishable''
|align="center"| <math>\left({m\choose n}\right)</math>
|align="center"|<math>{m\choose n}</math>
|align="center"|<math>\left({m\choose n-m}\right)</math>
|-
|align="center"| ''distinguishable''
|align="center"| ''indistinguishable''
|align="center"| <math>\sum_{k=1}^m S(n,k)</math>
|align="center"| <math>\begin{cases}1 & \mbox{if }n\le m\\ 0& \mbox{if }n>m\end{cases}</math>
|align="center"| <math>S(n,m)\,</math>
|-
|align="center"| ''indistinguishable''
|align="center"| ''indistinguishable''
|align="center"| <math>\sum_{k=1}^m p_k(n)</math>
|align="center"| <math>\begin{cases}1 & \mbox{if }n\le m\\ 0& \mbox{if }n>m\end{cases}</math>
|align="center"| <math>p_m(n)\,</math>
|}
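
The first row of the table can be checked by enumerating all functions for small <math>n</math> and <math>m</math>. The sketch below assumes <math>n=|N|</math> and <math>m=|M|</math>, and computes <math>S(n,m)</math> by the usual recurrence <math>S(n,m)=mS(n-1,m)+S(n-1,m-1)</math>; the helper names are made up for the illustration.

```python
from itertools import product
from math import factorial, perm

def stirling2(n, m):
    """Stirling number of the second kind via S(n,m) = m*S(n-1,m) + S(n-1,m-1)."""
    if n == m:
        return 1
    if n == 0 or m == 0:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

def count_functions(n, m):
    """Return (#all, #injective, #surjective) functions from an n-set to an m-set."""
    # A function f is encoded as the tuple (f(0), ..., f(n-1)) of its values.
    fs = list(product(range(m), repeat=n))
    injective = sum(1 for f in fs if len(set(f)) == n)
    surjective = sum(1 for f in fs if len(set(f)) == m)
    return len(fs), injective, surjective

for n, m in [(4, 3), (2, 3), (3, 3)]:
    # perm(m, n) is the falling factorial (m)_n and is 0 when n > m.
    expected = (m ** n, perm(m, n), factorial(m) * stirling2(n, m))
    assert count_functions(n, m) == expected
```

The other rows could be checked in the same way by counting functions up to relabeling of the domain or the range, which is omitted here.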

{|border="2" cellspacing="4" cellpadding="10" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
|-
!align="center" bgcolor="#A7C1F2" | balls per bin
!align="center" bgcolor="#A7C1F2" | unrestricted
!align="center" bgcolor="#A7C1F2" | ≤ 1
!align="center" bgcolor="#A7C1F2" | ≥ 1
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> labeled balls, <br><math>m</math> labeled bins
|align="center"| <math>n</math>-tuples <br>of <math>m</math> things
|align="center"| <math>n</math>-permutations <br>of <math>m</math> things
|align="center"| partition of <math>[n]</math> <br> into <math>m</math> ordered parts
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> unlabeled balls, <br><math>m</math> labeled bins
|align="center"| <math>n</math>-combinations of <math>[m]</math> <br>with repetitions
|align="center"| <math>n</math>-combinations of <math>[m]</math> <br> without repetitions
|align="center"| <math>m</math>-compositions <br>of <math>n</math>
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> labeled balls, <br><math>m</math> unlabeled bins
|align="center"| partitions of <math>[n]</math> <br>into <math>\le m</math> parts
|align="center"| <math>n</math> pigeons <br>into <math>m</math> holes
|align="center"| partitions of <math>[n]</math> <br>into <math>m</math> parts
|-
!align="center" bgcolor="#A7C1F2" | <math>n</math> unlabeled balls, <br><math>m</math> unlabeled bins
|align="center"| partitions of <math>n</math> <br>into <math>\le m</math> parts
|align="center"| <math>n</math> pigeons <br>into <math>m</math> holes
|align="center"| partitions of <math>n</math> <br>into <math>m</math> parts
|}

== Reference ==
* ''Stanley,'' Enumerative Combinatorics, Volume 1, Chapter 1.

Randomized Algorithms (Spring 2010)/Randomized approximation algorithms

2010-05-27T00:45:34Z

210.28.131.82: /* Rounding LPs */

== Approximation Algorithms ==

=== Coping with the NP-hardness ===

=== Combinatorial approximation algorithms ===

=== LP-based approximation algorithms ===

== Randomized Rounding ==

=== The integrality gap ===

=== Randomized rounding ===

=== Max-SAT ===

=== Covering and Packing ===

Randomized Algorithms (Spring 2010)/Approximate counting, linear programming

2010-05-21T17:08:11Z

210.28.131.82: /* Volume estimation */

== Counting Problems ==

=== Complexity model ===

=== FPRAS ===

== Approximate Counting ==
Let us consider the following formal problem.

Let <math>U</math> be a finite set of known size, and let <math>G\subseteq U</math>. We want to compute the size of <math>G</math>, <math>|G|</math>.

We assume two devices:
* A '''uniform sampler''' <math>\mathcal{U}</math>, which uniformly and independently samples a member of <math>U</math> upon each call.
* A '''membership oracle''' of <math>G</math>, denoted <math>\mathcal{O}</math>. Given an input <math>x\in U</math>, <math>\mathcal{O}(x)</math> indicates whether or not <math>x</math> is a member of <math>G</math>.

Equipped with <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, we have the following Monte Carlo algorithm:
* Choose <math>N</math> independent samples from <math>U</math> by the uniform sampler <math>\mathcal{U}</math>, represented by the random variables <math>X_1,X_2,\ldots, X_N</math>.
* Let <math>Y_i</math> be the indicator random variable defined as <math>Y_i=\mathcal{O}(X_i)</math>, namely, <math>Y_i</math> indicates whether <math>X_i\in G</math>.
* Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>

It is easy to see that <math>\mathbf{E}[Z]=|G|</math> and we might hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an <math>\epsilon</math>-approximation of <math>|G|</math> if
:<math>
(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.
</math>
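
The estimator <math>Z</math> can be sketched in a few lines. In this illustration the sampler and the oracle are passed in as plain callables (<code>sample</code> and <code>oracle</code> are hypothetical names), and the toy universe <math>U=\{0,1,\ldots,999\}</math> with <math>G</math> the multiples of 4 is an assumption made only for the demonstration.

```python
import random

def monte_carlo_estimate(universe_size, sample, oracle, num_samples):
    """Return Z = (|U| / N) * sum_i Y_i, where Y_i = 1 iff the i-th sample lies in G."""
    hits = sum(1 for _ in range(num_samples) if oracle(sample()))
    return universe_size * hits / num_samples

rng = random.Random(0)          # fixed seed so the run is reproducible
U_size = 1000                   # toy universe {0, ..., 999}
estimate = monte_carlo_estimate(
    U_size,
    sample=lambda: rng.randrange(U_size),   # uniform sampler for U
    oracle=lambda x: x % 4 == 0,            # membership oracle for G (|G| = 250)
    num_samples=20_000,
)
print(estimate)  # should be close to |G| = 250
```

Here <math>|G|/|U|=1/4</math> is large, so a moderate number of samples already gives a good estimate.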

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and on the ratio between <math>|G|</math> and <math>|U|</math>.

{|border="1"
|'''Theorem (estimator theorem)'''
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math> provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
|}
'''Proof''': Use the Chernoff bound.

<math>\square</math>
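
The Chernoff step can be spelled out as follows. This is a sketch assuming <math>0<\epsilon\le 1</math> and the standard multiplicative Chernoff bounds; the constants are one common choice and are not taken from the lecture. Let <math>Y=\sum_{i=1}^N Y_i</math>, so that <math>\mu=\mathbf{E}[Y]=N\alpha</math> and <math>Z=\frac{|U|}{N}Y</math>. Since <math>|G|=\alpha|U|</math>, <math>Z</math> is an <math>\epsilon</math>-approximation of <math>|G|</math> exactly when <math>(1-\epsilon)\mu\le Y\le(1+\epsilon)\mu</math>, and
:<math>
\begin{align}
\Pr[Y<(1-\epsilon)\mu]+\Pr[Y>(1+\epsilon)\mu]
&\le e^{-\epsilon^2\mu/2}+e^{-\epsilon^2\mu/3}\\
&\le 2e^{-\epsilon^2 N\alpha/4},
\end{align}
</math>
which is at most <math>\delta</math> once <math>N\ge\frac{4}{\epsilon^2\alpha}\ln\frac{2}{\delta}</math>.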

A counting algorithm for the set <math>G</math> has to deal with the following three complications:
* Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
* Implement the uniform sampler <math>\mathcal{U}</math>. As we have seen, this is usually approximated by random walks. Designing the random walk and bounding its mixing rate is usually technically challenging, if possible at all.
* Deal with exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to cleverly choose the universe <math>U</math>. Sometimes this needs some nontrivial ideas.

=== Counting DNFs ===
A disjunctive normal form (DNF) formula is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:
:<math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.
Note the difference from the conjunctive normal forms (CNF).

Given a DNF formula <math>\phi</math> as the input, the problem is to count the number of satisfying assignments of <math>\phi</math>. This problem is '''#P-complete'''.

Naively applying the Monte Carlo method will not give a good answer. Suppose that there are <math>n</math> variables. Let <math>U=\{\mathrm{true},\mathrm{false}\}^n</math> be the set of all truth assignments of the <math>n</math> variables. Let <math>G=\{x\in U\mid \phi(x)=\mathrm{true}\}</math> be the set of satisfying assignments for <math>\phi</math>. The straightforward use of the Monte Carlo method samples <math>N</math> assignments from <math>U</math> and checks how many of them satisfy <math>\phi</math>. This algorithm fails when <math>|G|/|U|</math> is exponentially small, namely, when an exponentially small fraction of the assignments satisfy the input DNF formula.
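
A small experiment illustrates the failure. The sketch below uses a hypothetical worst case chosen for the illustration: the one-clause DNF <math>x_1\wedge x_2\wedge\cdots\wedge x_n</math>, for which <math>|G|=1</math> and <math>|U|=2^n</math>. Assignments are encoded as <math>n</math>-bit integers.

```python
import random

def naive_dnf_estimate(n, num_samples, rng):
    """Naive Monte Carlo for the one-clause DNF x_1 AND x_2 AND ... AND x_n.

    Only the all-true assignment (all n bits set) satisfies the formula,
    so |G| = 1 while |U| = 2**n.
    """
    all_true = (1 << n) - 1
    # rng.getrandbits(n) is a uniform n-bit integer, i.e. a uniform assignment.
    hits = sum(1 for _ in range(num_samples) if rng.getrandbits(n) == all_true)
    return (2 ** n) * hits / num_samples

rng = random.Random(0)
estimate = naive_dnf_estimate(n=40, num_samples=10_000, rng=rng)
print(estimate)
```

Whatever the random choices are, the output is either 0 (no sample hits <math>G</math>) or at least <math>2^{40}/10^4</math> (at least one hit), so it can never be close to the true value 1 unless the number of samples is exponential in <math>n</math>.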

;The union of sets problem
We reformulate the DNF counting problem in a more abstract framework, called the '''union of sets''' problem.

Let <math>V</math> be a finite universe. We are given <math>m</math> subsets <math>H_1,H_2,\ldots,H_m\subseteq V</math>. The following assumptions hold:
*For all <math>i</math>, <math>|H_i|</math> is computable in poly-time.
*It is possible to sample uniformly from each individual <math>H_i</math>.
*For any <math>x\in V</math>, it can be determined in poly-time whether <math>x\in H_i</math>.

The goal is to compute the size of <math>H=\bigcup_{i=1}^m H_i</math>.

DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once.
* <math>V</math> is the set of all assignments.
* Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formula <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>.
* Each clause <math>C_i</math> is a conjunction (AND) of literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable.
* Sampling from an <math>H_i</math> is simple: we just fix the assignments of the <math>k_i</math> literals of that clause, and sample the remaining <math>(n-k_i)</math> variable assignments uniformly and independently.
* For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>.
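
The three assumptions can be made concrete for DNF clauses as follows. This is a sketch under an encoding chosen for the illustration: variables are indexed from 0, a clause is a dict mapping a variable index to the truth value its literal requires, and the function names are hypothetical.

```python
import random

# Example DNF on n = 4 variables (0-indexed), encoding
# (x1 AND NOT x2 AND x3) OR (x2 AND x4) OR (NOT x1 AND x3 AND x4).
n = 4
clauses = [
    {0: True, 1: False, 2: True},
    {1: True, 3: True},
    {0: False, 2: True, 3: True},
]

def clause_size(clause, n):
    """|H_i| = 2^(n - k_i): the k_i variables of the clause are forced, the rest are free."""
    return 2 ** (n - len(clause))

def sample_from_clause(clause, n, rng):
    """Uniform sample from H_i: fix the clause's variables, flip a fair coin for the others."""
    return tuple(clause[v] if v in clause else rng.random() < 0.5 for v in range(n))

def satisfies(x, clause):
    """Membership test x in H_i: every literal of the clause agrees with x."""
    return all(x[v] == value for v, value in clause.items())

rng = random.Random(0)
x = sample_from_clause(clauses[1], n, rng)
assert satisfies(x, clauses[1])
print([clause_size(c, n) for c in clauses])  # [2, 4, 2]
```

Each operation runs in time linear in <math>n</math>, in line with the poly-time assumptions of the union of sets problem.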

;The coverage algorithm
We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset <math>U</math> defined by
:<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>,
where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set
:<math>U=\{(x,i)\mid x\in H_i\}</math>.
For each <math>x\in H</math>, there may be more than one instance of <math>(x,i)\in U</math>. We can choose a unique representative among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, by choosing the <math>(x,i)</math> with the minimum <math>i</math>, and form a set <math>G</math>.

Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, i\le j\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math>, where <math>i</math> is the smallest index such that <math>x\in H_i</math>.

It is obvious that <math>G\subseteq U</math> and
:<math>|G|=|H|</math>.

Therefore, estimating <math>|H|</math> reduces to estimating <math>|G|</math> with <math>G\subseteq U</math>. By the estimator theorem, <math>|G|</math> can be <math>\epsilon</math>-approximated with probability at least <math>1-\delta</math> in poly-time, provided that we can sample uniformly from <math>U</math> and that <math>|G|/|U|</math> is not too small.

A uniform sample from <math>U</math> can be implemented as follows:
* generate an <math>i\in\{1,2,\ldots,m\}</math> with probability <math>\frac{|H_i|}{\sum_{j=1}^m|H_j|}</math>;
* uniformly sample an <math>x\in H_i</math>, and return <math>(x,i)</math>.

It is easy to see that this gives a uniform member of <math>U</math>. The above sampling procedure is poly-time because each <math>|H_i|</math> can be computed in poly-time, and sampling uniformly from each <math>H_i</math> is poly-time.

We now only need to lower bound the ratio
:<math>\alpha=\frac{|G|}{|U|}</math>.

We claim that
:<math>\alpha\ge\frac{1}{m}</math>.
It is easy to see this, because each <math>x\in H</math> has at most <math>m</math> instances of <math>(x,i)</math> in <math>U</math>, and we already know that <math>|G|=|H|</math>.

By the estimator theorem, <math>\frac{4m}{\epsilon^2}\ln\frac{2}{\delta}</math> uniform random samples from <math>U</math> suffice.

This gives the coverage algorithm for the abstract problem of the union of sets. The DNF counting is a special case.
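
Putting the pieces together for DNF counting gives the following sketch. It is an illustration under stated assumptions rather than a reference implementation: clauses are encoded as dicts from 0-indexed variables to required truth values, the sample size uses the <math>\epsilon^2</math> form of the estimator bound, and all function names are hypothetical.

```python
import math
import random
from itertools import product

def satisfies(x, clause):
    """Membership test x in H_i."""
    return all(x[v] == value for v, value in clause.items())

def coverage_estimate(clauses, n, eps, delta, rng):
    """Estimate |H| = |H_1 u ... u H_m| by sampling (x, i) uniformly from U."""
    m = len(clauses)
    sizes = [2 ** (n - len(c)) for c in clauses]   # |H_i| = 2^(n - k_i)
    total = sum(sizes)                             # |U| = sum_i |H_i|
    num_samples = math.ceil(4 * m / eps ** 2 * math.log(2 / delta))
    hits = 0
    for _ in range(num_samples):
        # Pick i with probability |H_i| / |U|, then x uniformly from H_i.
        i = rng.choices(range(m), weights=sizes)[0]
        x = tuple(clauses[i][v] if v in clauses[i] else rng.random() < 0.5
                  for v in range(n))
        # (x, i) lies in G iff no earlier clause is satisfied by x.
        if not any(satisfies(x, clauses[j]) for j in range(i)):
            hits += 1
    return total * hits / num_samples

def exact_count(clauses, n):
    """Brute-force count of satisfying assignments, for checking small cases only."""
    return sum(1 for x in product([False, True], repeat=n)
               if any(satisfies(x, c) for c in clauses))

# (x1 AND NOT x2 AND x3) OR (x2 AND x4) OR (NOT x1 AND x3 AND x4), 0-indexed.
n = 4
clauses = [{0: True, 1: False, 2: True}, {1: True, 3: True}, {0: False, 2: True, 3: True}]
rng = random.Random(1)
estimate = coverage_estimate(clauses, n, eps=0.1, delta=0.05, rng=rng)
print(exact_count(clauses, n), estimate)
```

On this example <math>|U|=8</math> and <math>|H|=|G|=7</math>, so <math>\alpha=7/8</math>, comfortably above the guaranteed <math>1/m=1/3</math>.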

=== Permanents and perfect matchings ===

=== Volume estimation ===

== Linear Programming ==

=== LP and convex polytopes ===

=== The simplex algorithms ===

=== An LP solver via random walks ===

Randomized Algorithms (Spring 2010)/Approximate counting, linear programming

2010-05-21T16:44:30Z

210.28.131.82: /* Counting DNFs */

== Counting Problems ==

=== Complexity model ===

=== FPRAS ===

== Approximate Counting ==
Let us consider the following formal problem.

Let <math>U</math> be a finite set of known size, and let <math>G\subseteq U</math>. We want to compute the size of <math>G</math>, <math>|G|</math>.

We assume two devices:
* A '''uniform sampler''' <math>\mathcal{U}</math>, which uniformly and independently samples a member of <math>U</math> upon each call.
* A '''membership oracle''' of <math>G</math>, denoted <math>\mathcal{O}</math>. Given an input <math>x\in U</math>, <math>\mathcal{O}(x)</math> indicates whether or not <math>x</math> is a member of <math>G</math>.

Equipped with <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, we have the following Monte Carlo algorithm:
* Choose <math>N</math> independent samples from <math>U</math> by the uniform sampler <math>\mathcal{U}</math>, represented by the random variables <math>X_1,X_2,\ldots, X_N</math>.
* Let <math>Y_i</math> be the indicator random variable defined as <math>Y_i=\mathcal{O}(X_i)</math>, namely, <math>Y_i</math> indicates whether <math>X_i\in G</math>.
* Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>

It is easy to see that <math>\mathbf{E}[Z]=|G|</math> and we might hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an <math>\epsilon</math>-approximation of <math>|G|</math> if
:<math>
(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.
</math>

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and on the ratio between <math>|G|</math> and <math>|U|</math>.

{|border="1"
|'''Theorem (estimator theorem)'''
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math> provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
|}
'''Proof''': Use the Chernoff bound.

<math>\square</math>

A counting algorithm for the set <math>G</math> has to deal with the following three complications:
* Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
* Implement the uniform sampler <math>\mathcal{U}</math>. As we have seen, this is usually approximated by random walks. Designing the random walk and bounding its mixing rate is usually technically challenging, if possible at all.
* Deal with exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to cleverly choose the universe <math>U</math>. Sometimes this needs some nontrivial ideas.

=== Counting DNFs ===
A disjunctive normal form (DNF) formula is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:
:<math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.
Note the difference from the conjunctive normal forms (CNF).

Given a DNF formula <math>\phi</math> as the input, the problem is to count the number of satisfying assignments of <math>\phi</math>. This problem is '''#P-complete'''.

Naively applying the Monte Carlo method will not give a good answer. Suppose that there are <math>n</math> variables. Let <math>U=\{\mathrm{true},\mathrm{false}\}^n</math> be the set of all truth assignments of the <math>n</math> variables. Let <math>G=\{x\in U\mid \phi(x)=\mathrm{true}\}</math> be the set of satisfying assignments for <math>\phi</math>. The straightforward use of the Monte Carlo method samples <math>N</math> assignments from <math>U</math> and checks how many of them satisfy <math>\phi</math>. This algorithm fails when <math>|G|/|U|</math> is exponentially small, namely, when an exponentially small fraction of the assignments satisfy the input DNF formula.

;The union of sets problem
We reformulate the DNF counting problem in a more abstract framework, called the '''union of sets''' problem.

Let <math>V</math> be a finite universe. We are given <math>m</math> subsets <math>H_1,H_2,\ldots,H_m\subseteq V</math>. The following assumptions hold:
*For all <math>i</math>, <math>|H_i|</math> is computable in poly-time.
*It is possible to sample uniformly from each individual <math>H_i</math>.
*For any <math>x\in V</math>, it can be determined in poly-time whether <math>x\in H_i</math>.

The goal is to compute the size of <math>H=\bigcup_{i=1}^m H_i</math>.

DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once.
* <math>V</math> is the set of all assignments.
* Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formula <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>.
* Each clause <math>C_i</math> is a conjunction (AND) of literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable.
* Sampling from an <math>H_i</math> is simple: we just fix the assignments of the <math>k_i</math> literals of that clause, and sample the remaining <math>(n-k_i)</math> variable assignments uniformly and independently.
* For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>.

;The coverage algorithm
We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset <math>U</math> defined by
:<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>,
where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set
:<math>U=\{(x,i)\mid x\in H_i\}</math>.
For each <math>x\in H</math>, there may be more than one instance of <math>(x,i)\in U</math>. We can choose a unique representative among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, by choosing the <math>(x,i)</math> with the minimum <math>i</math>, and form a set <math>G</math>.

Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, i\le j\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math>, where <math>i</math> is the smallest index such that <math>x\in H_i</math>.

It is obvious that <math>G\subseteq U</math> and
:<math>|G|=|H|</math>.

Therefore, estimating <math>|H|</math> reduces to estimating <math>|G|</math> with <math>G\subseteq U</math>. By the estimator theorem, <math>|G|</math> can be <math>\epsilon</math>-approximated with probability at least <math>1-\delta</math> in poly-time, provided that we can sample uniformly from <math>U</math> and that <math>|G|/|U|</math> is not too small.

A uniform sample from <math>U</math> can be implemented as follows:
* generate an <math>i\in\{1,2,\ldots,m\}</math> with probability <math>\frac{|H_i|}{\sum_{j=1}^m|H_j|}</math>;
* uniformly sample an <math>x\in H_i</math>, and return <math>(x,i)</math>.

It is easy to see that this gives a uniform member of <math>U</math>. The above sampling procedure is poly-time because each <math>|H_i|</math> can be computed in poly-time, and sampling uniformly from each <math>H_i</math> is poly-time.

We now only need to lower bound the ratio
:<math>\alpha=\frac{|G|}{|U|}</math>.

We claim that
:<math>\alpha\ge\frac{1}{m}</math>.
It is easy to see this, because each <math>x\in H</math> has at most <math>m</math> instances of <math>(x,i)</math> in <math>U</math>, and we already know that <math>|G|=|H|</math>.

By the estimator theorem, <math>\frac{4m}{\epsilon^2}\ln\frac{2}{\delta}</math> uniform random samples from <math>U</math> suffice.

This gives the coverage algorithm for the abstract problem of the union of sets. The DNF counting is a special case.

=== Permanents and perfect matchings ===

== Volume estimation ==

== Linear Programming ==

=== LP and convex polytopes ===

=== The simplex algorithms ===

=== An LP solver via random walks ===

Randomized Algorithms (Spring 2010)/Approximate counting, linear programming

2010-05-21T15:30:59Z

210.28.131.82: /* Approximate Counting */

== Counting Problems ==

=== Complexity model ===

=== FPRAS ===

== Approximate Counting ==
Let us consider the following formal problem.

Let <math>U</math> be a finite set of known size, and let <math>G\subseteq U</math>. We want to compute the size of <math>G</math>, <math>|G|</math>.

We assume two devices:
* A '''uniform sampler''' <math>\mathcal{U}</math>, which uniformly and independently samples a member of <math>U</math> upon each call.
* A '''membership oracle''' of <math>G</math>, denoted <math>\mathcal{O}</math>. Given an input <math>x\in U</math>, <math>\mathcal{O}(x)</math> indicates whether or not <math>x</math> is a member of <math>G</math>.

Equipped with <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, we have the following Monte Carlo algorithm:
* Choose <math>N</math> independent samples from <math>U</math> by the uniform sampler <math>\mathcal{U}</math>, represented by the random variables <math>X_1,X_2,\ldots, X_N</math>.
* Let <math>Y_i</math> be the indicator random variable defined as <math>Y_i=\mathcal{O}(X_i)</math>, namely, <math>Y_i</math> indicates whether <math>X_i\in G</math>.
* Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>

It is easy to see that <math>\mathbf{E}[Z]=|G|</math> and we might hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an <math>\epsilon</math>-approximation of <math>|G|</math> if
:<math>
(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.
</math>

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and on the ratio between <math>|G|</math> and <math>|U|</math>.

{|border="1"
|'''Theorem (estimator theorem)'''
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math> provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
|}
'''Proof''': Use the Chernoff bound.

<math>\square</math>

A counting algorithm for the set <math>G</math> has to deal with the following three complications:
* Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
* Implement the uniform sampler <math>\mathcal{U}</math>. As we have seen, this is usually approximated by random walks. Designing the random walk and bounding its mixing rate is usually technically challenging, if possible at all.
* Deal with exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to cleverly choose the universe <math>U</math>. Sometimes this needs some nontrivial ideas.

=== Counting DNFs ===
A disjunctive normal form (DNF) formula is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:
:<math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.
Note the difference from the conjunctive normal forms (CNF).

Given a DNF formula <math>\phi</math> as the input, the problem is to count the number of satisfying assignments of <math>\phi</math>. This problem is '''#P-complete'''.

Naively applying the Monte Carlo method will not give a good answer. Suppose that there are <math>n</math> variables. Let <math>U=\{\mathrm{true},\mathrm{false}\}^n</math> be the set of all truth assignments of the <math>n</math> variables. Let <math>G=\{x\in U\mid \phi(x)=\mathrm{true}\}</math> be the set of satisfying assignments for <math>\phi</math>. The straightforward use of the Monte Carlo method samples <math>N</math> assignments from <math>U</math> and checks how many of them satisfy <math>\phi</math>. This algorithm fails when <math>|G|/|U|</math> is exponentially small, namely, when an exponentially small fraction of the assignments satisfy the input DNF formula.

;The union of sets problem
We reformulate the DNF counting problem in a more abstract framework, called the '''union of sets''' problem.

Let <math>V</math> be a finite universe. We are given <math>m</math> subsets <math>H_1,H_2,\ldots,H_m\subseteq V</math>. The following assumptions hold:
*For all <math>i</math>, <math>|H_i|</math> is computable in poly-time.
*It is possible to sample uniformly from each individual <math>H_i</math>.
*For any <math>x\in V</math>, it can be determined in poly-time whether <math>x\in H_i</math>.

The goal is to compute the size of <math>H=\bigcup_{i=1}^m H_i</math>.

DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once.
* <math>V</math> is the set of all assignments.
* Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formula <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>.
* Each clause <math>C_i</math> is a conjunction (AND) of literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable.
* Sampling from an <math>H_i</math> is simple: we just fix the assignments of the <math>k_i</math> literals of that clause, and sample the remaining <math>(n-k_i)</math> variable assignments uniformly and independently.
* For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>.

;The coverage algorithm
We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset <math>U</math> defined by
:<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>,
where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set
:<math>U=\{(x,i)\mid x\in H_i\}</math>.
For each <math>x\in H</math>, there may be more than one instance of <math>(x,i)\in U</math>. We can choose a unique representative among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, by choosing the <math>(x,i)</math> with the minimum <math>i</math>, and form a set <math>G</math>.

Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, i\le j\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math>, where <math>i</math> is the smallest index such that <math>x\in H_i</math>.

It is obvious that <math>G\subseteq U</math> and
:<math>|G|=|H|</math>.

Therefore, estimating <math>|H|</math> reduces to estimating <math>|G|</math> with <math>G\subseteq U</math>. By the estimator theorem, <math>|G|</math> can be <math>\epsilon</math>-approximated with probability at least <math>1-\delta</math> in poly-time, provided that we can sample uniformly from <math>U</math> and that <math>|G|/|U|</math> is not too small.

A uniform sample from <math>U</math> can be implemented as follows:
* generate an <math>i\in\{1,2,\ldots,m\}</math> with probability <math>\frac{|H_i|}{\sum_{j=1}^m|H_j|}</math>;
* uniformly sample an <math>x\in H_i</math>, and return <math>(x,i)</math>.

It is easy to see that this gives a uniform member of <math>U</math>. The above sampling procedure is poly-time because each <math>|H_i|</math> can be computed in poly-time, and sampling uniformly from each <math>H_i</math> is poly-time.

We now only need to lower bound the ratio
:<math>\alpha=\frac{|G|}{|U|}</math>.

We claim that
:<math>\alpha\ge\frac{1}{m}</math>.
It is easy to see this, because each <math>x\in H</math> has at most <math>m</math> instances of <math>(x,i)</math> in <math>U</math>, and we already know that <math>|G|=|H|</math>.

=== Permanents and perfect matchings ===

== Volume estimation ==

== Linear Programming ==

=== LP and convex polytopes ===

=== The simplex algorithms ===

=== An LP solver via random walks ===

Randomized Algorithms (Spring 2010)/Approximate counting, linear programming

2010-05-21T15:27:32Z

210.28.131.82: /* Counting DNFs */

== Counting Problems ==

=== Complexity model ===

=== FPRAS ===

== Approximate Counting ==
Let us consider the following formal problem.

Let <math>U</math> be a finite set of known size, and let <math>G\subseteq U</math>. We want to compute the size of <math>G</math>, <math>|G|</math>.

We assume two devices:
* A '''uniform sampler''' <math>\mathcal{U}</math>, which uniformly and independently samples a member of <math>U</math> upon each call.
* A '''membership oracle''' of <math>G</math>, denoted <math>\mathcal{O}</math>. Given an input <math>x\in U</math>, <math>\mathcal{O}(x)</math> indicates whether or not <math>x</math> is a member of <math>G</math>.

Equipped with <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, we have the following Monte Carlo algorithm:
* Choose <math>N</math> independent samples from <math>U</math> by the uniform sampler <math>\mathcal{U}</math>, represented by the random variables <math>X_1,X_2,\ldots, X_N</math>.
* Let <math>Y_i</math> be the indicator random variable defined as <math>Y_i=\mathcal{O}(X_i)</math>, namely, <math>Y_i</math> indicates whether <math>X_i\in G</math>.
* Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>

It is easy to see that <math>\mathbf{E}[Z]=|G|</math> and we might hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an <math>\epsilon</math>-approximation of <math>|G|</math> if
:<math>
(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.
</math>
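
The estimator above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names <code>monte_carlo_estimate</code>, <code>sample</code> and <code>is_member</code> are hypothetical stand-ins for <math>Z</math>, <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, and the toy universe (integers below 1000, with <math>G</math> the multiples of 4) is not from the notes.

```python
import random

def monte_carlo_estimate(universe_size, sample, is_member, n_samples, rng):
    """Return Z = (|U| / N) * sum_i Y_i, where Y_i = O(X_i)."""
    hits = sum(1 for _ in range(n_samples) if is_member(sample(rng)))
    return universe_size * hits / n_samples

# Toy instance (an assumption for illustration):
# U = {0, ..., 999}, G = multiples of 4, so |G| = 250.
rng = random.Random(0)
z = monte_carlo_estimate(
    universe_size=1000,
    sample=lambda r: r.randrange(1000),   # uniform sampler for U
    is_member=lambda x: x % 4 == 0,       # membership oracle for G
    n_samples=20000,
    rng=rng,
)
print(z)  # a random value concentrated around |G| = 250
```

Here <math>\alpha=1/4</math> is a constant, so a moderate number of samples already gives a tight estimate.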

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and on the ratio between <math>|G|</math> and <math>|U|</math>.

{|border="1"
|'''Theorem (estimator theorem)'''
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math> provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
|}
'''Proof''': Use the Chernoff bound.

<math>\square</math>
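
To get a feeling for the sample complexity, the following sketch evaluates the bound numerically. It assumes the usual Chernoff-based form <math>N\ge\frac{4}{\epsilon^2\alpha}\ln\frac{2}{\delta}</math>; the function name <code>samples_needed</code> and the parameter values are hypothetical.

```python
import math

def samples_needed(eps, delta, alpha):
    """Smallest integer N with N >= 4 / (eps^2 * alpha) * ln(2 / delta)."""
    return math.ceil(4.0 / (eps * eps * alpha) * math.log(2.0 / delta))

# A constant ratio alpha needs only a few thousand samples ...
n_easy = samples_needed(eps=0.1, delta=0.05, alpha=0.25)

# ... while an exponentially small alpha (here 2^-50) makes N astronomically large.
n_hard = samples_needed(eps=0.1, delta=0.05, alpha=2.0 ** -50)

print(n_easy, n_hard)
```

The bound grows linearly in <math>1/\alpha</math>, which is why an exponentially small <math>\alpha</math> is fatal for the naive method.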

A counting algorithm for the set <math>G</math> has to deal with the following three complications:
* Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
* Implement the uniform sampler <math>\mathcal{U}</math>. As we have seen, this is usually approximated by random walks.
* Deal with exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to cleverly choose the universe <math>U</math>. Sometimes this needs some nontrivial ideas.

=== Counting DNFs ===
A disjunctive normal form (DNF) formula is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:
:<math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.
Note the difference from the conjunctive normal forms (CNF).

Given a DNF formula <math>\phi</math> as the input, the problem is to count the number of satisfying assignments of <math>\phi</math>. This problem is '''#P-complete'''.

Naively applying the Monte Carlo method will not give a good answer. Suppose that there are <math>n</math> variables. Let <math>U=\{\mathrm{true},\mathrm{false}\}^n</math> be the set of all truth assignments of the <math>n</math> variables. Let <math>G=\{x\in U\mid \phi(x)=\mathrm{true}\}</math> be the set of satisfying assignments for <math>\phi</math>. The straightforward use of the Monte Carlo method samples <math>N</math> assignments from <math>U</math> and checks how many of them satisfy <math>\phi</math>. This algorithm fails when <math>|G|/|U|</math> is exponentially small, namely, when only an exponentially small fraction of the assignments satisfy the input DNF formula.

;The union of sets problem
We reformulate the DNF counting problem in a more abstract framework, called the '''union of sets''' problem.

Let <math>V</math> be a finite universe. We are given <math>m</math> subsets <math>H_1,H_2,\ldots,H_m\subseteq V</math>. The following assumptions hold:
*For all <math>i</math>, <math>|H_i|</math> is computable in poly-time.
*It is possible to sample uniformly from each individual <math>H_i</math>.
*For any <math>x\in V</math>, it can be determined in poly-time whether <math>x\in H_i</math>.

The goal is to compute the size of <math>H=\bigcup_{i=1}^m H_i</math>.

DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once.
* <math>V</math> is the set of all assignments.
* Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formula <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>.
* Each clause <math>C_i</math> is a conjunction (AND) of literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable.
* Sampling from an <math>H_i</math> is simple: we just fix the values of the <math>k_i</math> variables appearing in that clause so that every literal is satisfied, and sample the remaining <math>n-k_i</math> variables uniformly and independently.
* For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>.
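
The three primitives for DNF clauses can be sketched as follows. The representation is an assumption made for illustration: a clause is a dictionary mapping a 0-based variable index to the truth value required by its literal, and the function names are hypothetical. The example formula is the one given earlier, <math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.

```python
import random

def clause_size(n, clause):
    """|H_i| = 2^(n - k_i), where k_i is the number of literals in the clause."""
    return 2 ** (n - len(clause))

def sample_from_clause(n, clause, rng):
    """Uniform sample from H_i: fix the clause's variables, randomize the rest."""
    x = [rng.random() < 0.5 for _ in range(n)]
    for var, value in clause.items():
        x[var] = value
    return tuple(x)

def satisfies(x, clause):
    """Membership test: does assignment x satisfy every literal of the clause?"""
    return all(x[var] == value for var, value in clause.items())

n = 4
clauses = [
    {0: True, 1: False, 2: True},   # x1 AND (NOT x2) AND x3
    {1: True, 3: True},             # x2 AND x4
    {0: False, 2: True, 3: True},   # (NOT x1) AND x3 AND x4
]
sizes = [clause_size(n, c) for c in clauses]
print(sizes)  # [2, 4, 2]
```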

;The coverage algorithm
We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset <math>U</math> defined by
:<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>,
where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set
:<math>U=\{(x,i)\mid x\in H_i\}</math>.
For each <math>x\in H</math>, there may be more than one instance of <math>(x,i)\in U</math>. We can choose a unique representative among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, by choosing the <math>(x,i)</math> with the minimum <math>i</math>, and form a set <math>G</math>.

Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, i\le j\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math>, where <math>i</math> is the smallest index such that <math>x\in H_i</math>.

It is obvious that <math>G\subseteq U</math> and
:<math>|G|=|H|</math>.

Therefore, estimation of <math>|H|</math> is reduced to estimation of <math>|G|</math> with <math>G\subseteq U</math>. By the estimator theorem, an <math>\epsilon</math>-approximation of <math>|G|</math> can be obtained with probability at least <math>1-\delta</math> in poly-time, provided that we can sample uniformly from <math>U</math> and the ratio <math>|G|/|U|</math> is not too small.

A uniform sample from <math>U</math> can be implemented as follows:
* generate an <math>i\in\{1,2,\ldots,m\}</math> with probability <math>\frac{|H_i|}{\sum_{j=1}^m|H_j|}</math>;
* uniformly sample an <math>x\in H_i</math>, and return <math>(x,i)</math>.

It is easy to see that this gives a uniform member of <math>U</math>. The above sampling procedure is poly-time because each <math>|H_i|</math> can be computed in poly-time, and sampling uniformly from each <math>H_i</math> is poly-time.
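
Putting the pieces together for DNF formulas, a minimal sketch of the coverage algorithm might look as follows. It assumes the same hypothetical clause representation as a dictionary from 0-based variable index to required truth value; <code>coverage_estimate</code> is an illustrative name, and membership of <math>(x,i)</math> in <math>G</math> is tested by checking that <math>i</math> is the first clause satisfied by <math>x</math>.

```python
import random

def satisfies(x, clause):
    return all(x[v] == val for v, val in clause.items())

def coverage_estimate(n, clauses, n_samples, rng):
    sizes = [2 ** (n - len(c)) for c in clauses]   # |H_i|
    total = sum(sizes)                             # |U| = sum_i |H_i|
    hits = 0
    for _ in range(n_samples):
        # Pick clause i with probability |H_i| / |U| ...
        i = rng.choices(range(len(clauses)), weights=sizes)[0]
        # ... then pick x uniformly from H_i.
        x = [rng.random() < 0.5 for _ in range(n)]
        for v, val in clauses[i].items():
            x[v] = val
        # (x, i) is in G iff i is the smallest index with x in H_i.
        first = next(j for j, c in enumerate(clauses) if satisfies(x, c))
        if first == i:
            hits += 1
    return total * hits / n_samples                # Z = |U| * (hits / N)

clauses = [
    {0: True, 1: False, 2: True},   # x1 AND (NOT x2) AND x3
    {1: True, 3: True},             # x2 AND x4
    {0: False, 2: True, 3: True},   # (NOT x1) AND x3 AND x4
]
estimate = coverage_estimate(4, clauses, 20000, random.Random(0))
print(estimate)  # concentrated around the true count, which is 7
```

For this small formula the exact count is 7 (the clause sizes sum to 8 and exactly one assignment is covered twice), so the estimate can be checked by hand.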

We now only need to lower bound the ratio
:<math>\alpha=\frac{|G|}{|U|}</math>.

We claim that
:<math>\alpha\ge\frac{1}{m}</math>.
This holds because each <math>x\in H</math> has at most <math>m</math> instances of <math>(x,i)</math> in <math>U</math>, so <math>|U|\le m|H|</math>, and we already know that <math>|G|=|H|</math>.

=== Permanents and perfect matchings ===

== Volume estimation ==

== Linear Programming ==

=== LP and convex polytopes ===

=== The simplex algorithms ===

=== An LP solver via random walks ===

Randomized Algorithms (Spring 2010)/Approximate counting, linear programming

2010-05-21T14:17:14Z

210.28.131.82: /* Linear Programming */

== Counting Problems ==

=== Complexity model ===

=== FPRAS ===

== Approximate Counting ==
Let us consider the following formal problem.

Let <math>U</math> be a finite set of known size, and let <math>G\subseteq U</math>. We want to compute the size of <math>G</math>, <math>|G|</math>.

We assume two devices:
* A '''uniform sampler''' <math>\mathcal{U}</math>, which uniformly and independently samples a member of <math>U</math> upon each calling.
* A '''membership oracle''' of <math>G</math>, denoted <math>\mathcal{O}</math>. Given as the input an <math>x\in U</math>, <math>\mathcal{O}(x)</math> indicates whether or not <math>x</math> is a member of <math>G</math>.

Equipped with <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, we can have the following Monte Carlo algorithm:
* Choose <math>N</math> independent samples from <math>U</math> by the uniform sampler <math>\mathcal{U}</math>, represented by the random variables <math>X_1,X_2,\ldots, X_N</math>.
* Let <math>Y_i</math> be the indicator random variable defined as <math>Y_i=\mathcal{O}(X_i)</math>, namely, <math>Y_i</math> indicates whether <math>X_i\in G</math>.
* Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>

It is easy to see that <math>\mathbf{E}[Z]=|G|</math> and we might hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an <math>\epsilon</math>-approximation of <math>|G|</math> if
:<math>
(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.
</math>

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and on the ratio between <math>|G|</math> and <math>|U|</math>.

{|border="1"
|'''Theorem (estimator theorem)'''
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math> provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
|}
'''Proof''': Use the Chernoff bound.

<math>\square</math>

A counting algorithm for the set <math>G</math> has to deal with the following three complications:
* Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
* Implement the uniform sampler <math>\mathcal{U}</math>. As we have seen, this is usually approximated by random walks.
* Deal with exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to cleverly choose the universe <math>U</math>. Sometimes this needs some nontrivial ideas.

=== Counting DNFs ===
A disjunctive normal form (DNF) formula is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:
:<math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.
Note the difference from the conjunctive normal forms (CNF).

Given a DNF formula <math>\phi</math> as the input, the problem is to count the number of satisfying assignments of <math>\phi</math>. This problem is '''#P-complete'''.

Naively applying the Monte Carlo method will not give a good answer. Suppose that there are <math>n</math> variables. Let <math>U=\{\mathrm{true},\mathrm{false}\}^n</math> be the set of all truth assignments of the <math>n</math> variables. Let <math>G=\{x\in U\mid \phi(x)=\mathrm{true}\}</math> be the set of satisfying assignments for <math>\phi</math>. The straightforward use of the Monte Carlo method samples <math>N</math> assignments from <math>U</math> and checks how many of them satisfy <math>\phi</math>. This algorithm fails when <math>|G|/|U|</math> is exponentially small, namely, when only an exponentially small fraction of the assignments satisfy the input DNF formula.

;The union of sets problem
We reformulate the DNF counting problem in a more abstract framework, called the '''union of sets''' problem.

Let <math>V</math> be a finite universe. We are given <math>m</math> subsets <math>H_1,H_2,\ldots,H_m\subseteq V</math>. The following assumptions hold:
*For all <math>i</math>, <math>|H_i|</math> is computable in poly-time.
*It is possible to sample uniformly from each individual <math>H_i</math>.
*For any <math>x\in V</math>, it can be determined in poly-time whether <math>x\in H_i</math>.

The goal is to compute the size of <math>H=\bigcup_{i=1}^m H_i</math>.

DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once.
* <math>V</math> is the set of all assignments.
* Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formula <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>.
* Each clause <math>C_i</math> is a conjunction (AND) of literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable.
* Sampling from an <math>H_i</math> is simple: we just fix the values of the <math>k_i</math> variables appearing in that clause so that every literal is satisfied, and sample the remaining <math>n-k_i</math> variables uniformly and independently.
* For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>.

;The coverage algorithm
We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset <math>U</math> defined by
:<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>,
where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set
:<math>U=\{(x,i)\mid x\in H_i\}</math>.
For each <math>x\in H</math>, there may be more than one instance of <math>(x,i)\in U</math>. We can choose a unique representative among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, by choosing the <math>(x,i)</math> with the minimum <math>i</math>, and form a set <math>G</math>.

Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, i\le j\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math>, where <math>i</math> is the smallest index such that <math>x\in H_i</math>.

It is obvious that
:<math>|G|=|H|</math>,
because there is a one-to-one correspondence between them.
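
For a small instance the sets <math>U</math>, <math>G</math> and <math>H</math> can be enumerated explicitly to check the correspondence. The sketch below does this by brute force for the example formula given earlier; the clause representation (a dictionary from 0-based variable index to required truth value) and all variable names are assumptions made for illustration, and the enumeration is only feasible for tiny <math>n</math>.

```python
from itertools import product

n = 4
clauses = [
    {0: True, 1: False, 2: True},   # x1 AND (NOT x2) AND x3
    {1: True, 3: True},             # x2 AND x4
    {0: False, 2: True, 3: True},   # (NOT x1) AND x3 AND x4
]

def satisfies(x, clause):
    return all(x[v] == val for v, val in clause.items())

V = list(product([False, True], repeat=n))                       # all assignments
H_sets = [{x for x in V if satisfies(x, c)} for c in clauses]    # H_1, ..., H_m
H = set().union(*H_sets)                                         # the union H

# U = {(x, i) : x in H_i}; G keeps, for each x, only the pair with minimum i.
U = [(x, i) for i, H_i in enumerate(H_sets) for x in H_i]
G = [(x, i) for (x, i) in U if all(x not in H_sets[j] for j in range(i))]

print(len(U), len(G), len(H))  # 8 7 7
```

Exactly one assignment lies in two of the clause sets, which accounts for the single pair of <math>U</math> that is missing from <math>G</math>.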

=== Permanents and perfect matchings ===

== Volume estimation ==

== Linear Programming ==

=== LP and convex polytopes ===

=== The simplex algorithms ===

=== An LP solver via random walks ===

Randomized Algorithms (Spring 2010)/Problem Set 1

2010-03-28T12:55:37Z

210.28.131.82:

==Problem 1 (10 points)==
[MR]-1.1

* Suppose that you are given a coin for which the probability of HEADS, say <math>p</math>, is ''unknown''. How can you use this coin to generate unbiased (i.e., <math>\Pr[\mathrm{HEADS}]=\Pr[\mathrm{TAILS}]=1/2</math>) coin-flips? Give a scheme for which the expected number of flips of the biased coin for extracting one unbiased coin-flip is no more than <math>\frac{1}{p(1-p)}</math>.

:('''Hint''': Consider two consecutive flips of the biased coin.)

* (Bonus problem) Devise an extension of the scheme that extracts the largest possible number of independent, unbiased coin-flips from a given number of flips of the biased coin.

==Problem 2 (10 points)==
The original Karger's algorithm returns a min-cut with probability <math>\ge\frac{2}{n(n-1)}</math> after <math>n-2</math> contractions.
We have seen that by running the original Karger's algorithm multiple times, the probability of success can be improved. Consider the following variation. Starting with a graph with <math>n</math> vertices, first contract the graph down to <math>k</math> vertices using Karger's algorithm. Make <math>\ell</math> copies of the graph with <math>k</math> vertices, and now run Karger's algorithm independently on these <math>\ell</math> copies. Return the smallest of the cuts returned by these <math>\ell</math> instances.
* What is the total number of contractions?
* What is the probability of finding a min-cut?
* Try to optimize the probability of success subject to the constraint of using no more than <math>2n</math> contractions.

==Problem 3 (10 points)==
Recall the following definitions:
{|border="1"
|'''Definition 2:''' The class '''NP''' consists of all decision problems <math>f</math> that have a polynomial time deterministic algorithm <math>V</math> such that for any input <math>x</math>,
:<math>f(x)=1</math> if and only if <math>\exists y</math>, <math>V(x,y)=1</math>, where the size of <math>y</math> is within polynomial of the size of <math>x</math>.
|}

{|border="1"
|'''Definition 4:''' The class '''ZPP''' consists of all decision problems <math>f</math> that have a randomized algorithm <math>A</math> running in expected polynomial time for any input such that for any input <math>x</math>, <math>A(x)=f(x)</math>.
|}

{|border="1"
|'''Definition 5:''' The class '''RP''' consists of all decision problems <math>f</math> that have a randomized algorithm <math>A</math> running in worst-case polynomial time such that for any input <math>x</math>,
*if <math>f(x)=1</math>, then <math>\Pr[A(x)=1]\ge 1-1/2</math>;
*if <math>f(x)=0</math>, then <math>\Pr[A(x)=0]=1</math>.
|}

Prove that '''ZPP'''<math>\subseteq</math>'''NP''' and '''RP'''<math>\subseteq</math>'''NP'''.

('''Hint''': Notice that a randomized algorithm <math>A</math> on input <math>x</math> can be represented as a deterministic algorithm <math>D</math> with two inputs: <math>x</math> and a sequence <math>s</math> of random bits.)

==Problem 4 (10 points)==
A parallel computer consists of <math>n</math> processors and <math>n</math> memory modules. During a step, each processor sends a memory request to one of the memory modules, and each memory module answers the request if it receives exactly one request. Note that a memory module may receive more than one request. There are two schemes for dealing with conflicting memory requests:
# Upon receiving more than one request, a memory module does not answer any request.
# Upon receiving more than one request, a memory module answers one of the received requests.
Assuming that each processor sends a request to a memory module chosen uniformly and independently at random:
:(a) with the first scheme, what is the expected number of processors whose requests are answered?
:(b) with the second scheme, what is the expected number of processors whose requests are answered?
:(c) We upgrade the memory of the machine, so that a memory module that receives either one or two requests can answer its request(s); modules that receive more than two requests will answer two requests and discard the rest. What is the expected number of processors whose requests are answered?

(You may assume the approximation <math>\left(1-\frac{1}{n}\right)^n\approx\frac{1}{e}</math>.)