组合数学 (Spring 2014)/Extremal set theory: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
imported>Etone
 
(2 intermediate revisions by the same user not shown)
Line 405: Line 405:


<math>\square</math>
<math>\square</math>
== The Kruskal–Katona Theorem ==
The '''shadow''' of a set system <math>\mathcal{F}</math>, denoted <math>\Delta\mathcal{F}</math>, consists of all sets  which can be obtained by removing an element from a set in <math>\mathcal{F}</math>.
{{Theorem|Definition|
:Let <math>\mathcal{F}\subseteq {X\choose k}</math>. The '''shadow''' of <math>\mathcal{F}</math> is defined to be
::<math>\Delta\mathcal{F}=\left\{T\in {X\choose k-1}\,\,\bigg|\,\, \exists S\in\mathcal{F}\mbox{ such that } T\subset S\right\}</math>.
}}
The shadow contains rich information about the set system. An extremal question is: for a system of fixed number of <math>k</math>-sets, how small can its shadow be? The Kruskal–Katona theorem gives an answer to this question.
To state the result of the Kruskal–Katona theorem, we need to introduce the concepts of the <math>k</math>-cascade representation of numbers and the colex order of sets.
=== <math>k</math>-cascade representation of a number ===
{{Theorem|Theorem|
:Given positive integers <math>m</math> and <math>k</math>, there exists a unique representation of <math>m</math> in the form
::<math>m={m_k\choose k}+{m_{k-1}\choose k-1}+\cdots+{m_t\choose t}=\sum_{\ell=t}^k{m_\ell\choose \ell}</math>,
:where <math>m_k>m_{k-1}>\cdots>m_t\ge t\ge 1</math>.
:This representation of <math>m</math> is called a '''<math>k</math>-cascade''' (or a '''<math>k</math>-binomial''') representation of <math>m</math>.
}}
{{Proof|
In fact, the <math>k</math>-cascade representation <math>(m_k,m_{k-1},\ldots,m_t)</math> of an <math>m</math> can be found by the following simple greedy algorithm:
<math>\ell=k;</math>
while (<math>m>0</math>) do
    let <math>m_\ell</math> be the largest integer for which <math>{m_\ell\choose \ell}\le m;</math>
    <math>m=m-m_\ell;</math>
    <math>\ell=\ell-1;</math>
end
We then show that the above algorithm constructs a sequence <math>(m_k,m_{k-1},\ldots,m_t)</math> with <math>m_k>m_{k-1}>\cdots>m_t\ge t\ge 1</math>.
Suppose the current <math>\ell</math> and the current <math>m</math>. To see that <math>m_{\ell-1}< m_\ell</math>, we suppose otherwise <math>m_{\ell-1}\ge m_\ell</math>. Then
:<math>m\ge {m_\ell\choose \ell}+{m_{\ell-1}\choose \ell-1}\ge{m_\ell\choose \ell}+{m_{\ell}\choose \ell-1}={1+m_{\ell}\choose \ell}</math>
contradicting the maximality of <math>m_\ell</math>. Therefore, <math>m_\ell>m_{\ell-1}</math>.
The algorithm continues reducing <math>m</math> to smaller nonnegative values, and eventually reaches a stage where the choice of <math>m_t</math> for some <math>t\ge 2</math> where <math>{m_t\choose t}</math> equals the current <math>m</math>; or gets right down to choosing <math>m_1</math> as the integer such that <math>m_1={m_1\choose 1}</math> equals the current <math>m</math>.
Therefore,  <math>m={m_k\choose k}+{m_{k-1}\choose k-1}+\cdots+{m_t\choose t}</math> and <math>m_k>m_{k-1}>\cdots>m_t\ge t\ge 1</math>.
The uniqueness of the <math>k</math>-cascade representation follows by the induction on <math>k</math>.
When <math>k=1</math>, <math>m</math> has a unique <math>k</math>-cascade representation <math>m={m\choose 1}</math>.
For general <math>k>1</math>, suppose that every nonnegative integer <math>m</math> has a unique <math>(k-1)</math>-cascade representation.
Suppose then that <math>m</math> has two <math>k</math>-cascade representations:
:<math>m={a_k\choose k}+{a_{k-1}\choose k-1}+\cdots+{a_t\choose t}={b_k\choose k}+{b_{k-1}\choose k-1}+\cdots+{b_r\choose r}</math>.
We then show that it must hold that <math>a_k=b_k</math>.
If <math>a_k\neq b_k</math>, WLOG, suppose that <math>a_k<b_k</math>. We obtain
:<math>
\begin{align}
m&
={a_k\choose k}+{a_{k-1}\choose k-1}+\cdots+{a_t\choose t}\\
&\le {a_k\choose k}+{a_{k}-1\choose k-1}+\cdots+{a_k-(k-t)\choose t}+\cdots+{a_k-k+1\choose 1}\\
&={a_k+1\choose k}-1,
\end{align}</math>
where the last equation is got by repeatedly applying the identity
:<math>{n\choose k}+{n\choose k-1}={n+1\choose k}</math>.
We then obtain
:<math>m<{a_k+1\choose k}\le{b_k\choose k}\le m</math>,
which is a contradiction. Therefore, <math>a_k=b_k</math>, and by the induction hypothesis, the remaining value <math>m-a_k=m-b_k</math> has a unique <math>(k-1)</math>-cascade representation, so <math>a_i=b_i</math> for all <math>i</math>.
}}
=== Co-lexicographic order of subsets ===
The co-lexicographic order of sets plays a particularly important role in the investigation of the size of the shadow of a system of <math>k</math>-sets.
{{Theorem|Definition|
: The '''co-lexicographic (colex) order''' (also called the '''reverse lexicographic order''') of sets is defined as follows: for any <math>A,B\subseteq [n]</math>, <math>A\neq B</math>,
::<math>A<B</math> if <math>\max A\setminus B < \max B\setminus A</math>.
}}
We can sort sets in colex order by first writing each set as a tuple, whose elements are in decreasing order, and then sorting the tuples in the lexicographic order of tuples.
For example, <math>{[5]\choose 3}</math> in colex order is
{3,2,1}
{4,2,1}
{4,3,1}
{4,3,2}
{5,2,1}
{5,3,1}
{5,3,2}
{5,4,1}
{5,4,2}
{5,4,3}
We find that the first <math>{n\choose 3}</math> sets in this order for <math>n=3,4,5</math>, form precisely <math>{[n]\choose 3}</math>. And if we write the colex order of <math>{[6]\choose 3}</math>, the above colex order of <math>{[5]\choose 3}</math> appears as a prefix of that order. Elaborating on this, we have:
{{Theorem|Proposition|
:Let <math>\mathcal{R}(m,k)</math> be the first <math>m</math> sets in the colex order of <math>{\mathbb{N}\choose k}</math>. Then
::<math>\mathcal{R}\left({n\choose k},k\right)={[n]\choose k}</math>,
:that is, the first <math>{n\choose k}</math> sets in the colex order of all <math>k</math>-sets of natural numbers is precisely <math>{[n]\choose k}</math>.
}}
This proposition says that the sets in <math>\mathcal{R}(m,k)</math> is highly overlapped, which suggests that <math>\mathcal{R}(m,k)</math> may have small shadow. The size of the shadow of <math>\mathcal{R}(m,k)</math> is closely related to the <math>k</math>-cascade representation of <math>m</math>.
{{Theorem|Theorem|
:Suppose the <math>k</math>-cascade representation of <math>m</math> is
::<math>m={m_k\choose k}+{m_{k-1}\choose k-1}+\cdots+{m_t\choose t}</math>.
:Then
::<math>|\Delta\mathcal{R}(m,k)|={m_k\choose k-1}+{m_{k-1}\choose k-2}+\cdots+{m_t\choose t-1}</math>.
}}
{{Proof|
Given <math>m</math> and its <math>k</math>-cascade representation <math>m={m_k\choose k}+{m_{k-1}\choose k-1}+\cdots+{m_t\choose t}</math>, the <math>\mathcal{R}(m,k)</math> is constructed as:
* all sets in <math>{[m_k]\choose k}</math>;
* all sets in <math>{[m_{k-1}]\choose k-1}</math>, unioned with <math>\{1+m_k\}\,</math>;
::<math>\vdots</math>
* all sets in <math>{[m_{t}]\choose t}</math>, unioned with <math>\{1+m_k,1+m_{k-1},\ldots,1+m_{t+1}\}</math>.
The shadow <math>\Delta\mathcal{R}(m,k)</math> is the collection of all <math>(k-1)</math>-sets contained by the above sets, which are
* all sets in <math>{[m_k]\choose k-1}</math>;
* all sets in <math>{[m_{k-1}]\choose k-2}</math>, unioned with <math>\{1+m_k\}\,</math>;
::<math>\vdots</math>
* all sets in <math>{[m_{t}]\choose t-1}</math>, unioned with <math>\{1+m_k,1+m_{k-1},\ldots,1+m_{t+1}\}</math>.
Thus,
:<math>|\Delta\mathcal{R}(m,k)|={m_k\choose k-1}+{m_{k-1}\choose k-2}+\cdots+{m_t\choose t-1}</math>.
}}
= The Kruskal–Katona theorem =
The Kruskal–Katona theorem states that among all systems of <math>m</math> <math>k</math>-sets, <math>\mathcal{R}(m,k)</math>, i.e., the first <math>m</math> <math>k</math>-sets in the colex order, has the smallest shadow.
The theorem is proved independently by Joseph Kruskal in 1963 and G.O.H. Katona in 1966, and is a fundamental result in finite set theory and combinatorial topology.
{{Theorem|Theorem (Kruskal 1963, Katona 1966)|
:Let <math>\mathcal{F}\subseteq {X\choose k}</math> with <math>|\mathcal{F}|=m</math>, and suppose that
::<math>m={m_k\choose k}+{m_{k-1}\choose k-1}+\cdots+{m_t\choose t}</math>
:where <math>m_k>m_{k-1}>\cdots>m_t\ge t\ge 1</math>. Then
::<math>|\Delta\mathcal{F}|\ge {m_k\choose k-1}+{m_{k-1}\choose k-2}+\cdots+{m_t\choose t-1}</math>.
}}
The <math>f</math>-vector of a set system <math>\mathcal{F}\subseteq 2^X</math> is a vector <math>(|\mathcal{F}_0|,|\mathcal{F}_1|,\ldots,|\mathcal{F}_n|)</math> where
:<math>\mathcal{F}_k=\{S\mid S\in\mathcal{F}, |S|=k\}</math>,
i.e., the <math>f</math>-vector gives the number of member sets of each size.
In a hereditary family (also called an [http://en.wikipedia.org/wiki/Abstract_simplicial_complex abstract simplicial complex]), <math>\mathcal{F}_k</math> is formed by the shadow of <math>\mathcal{F}_{k+1}</math> as well as some additional <math>k</math>-sets introduced in this level. The Kruskal-Katona theorem gives a lower bound on <math>|\mathcal{F}_i|</math>, given the <math>|\mathcal{F}_{i-1}|</math>.
The original proof of the theorem is rather complicated. In the following years, several different proofs were discovered. Here we present a proof dueto Frankl by the shifting technique.
;Frankl's shifting proof of Kruskal-Katonal theorem (Frankl 1984)
We take the classic <math>(i,j)</math>-shift operator <math>S_{ij}</math> defined in the original proof of the Erdős-Ko-Rado theorem.
{{Theorem|Definition (<math>(i,j)</math>-shift)|
: Assume <math>\mathcal{F}\subseteq 2^{[n]}</math>, and <math>0\le i<j\le n-1</math>. Define the '''<math>(i,j)</math>-shift''' <math>S_{ij}</math> as an operator on <math>\mathcal{F}</math> as follows:
:*for each <math>T\in\mathcal{F}</math>, write <math>T_{ij}=(T\setminus\{j\})\cup\{i\} </math>, and let
::<math>S_{ij}(T)=
\begin{cases}
T_{ij} & \mbox{if }j\in T, i\not\in T, \mbox{ and }T_{ij} \not\in\mathcal{F},\\
T & \mbox{otherwise;}
\end{cases}</math>
:* let <math>S_{ij}(\mathcal{F})=\{S_{ij}(T)\mid T\in \mathcal{F}\}</math>.
}}
It is immediate that shifting does not change the size of the set or the size of the system, i.e., <math>|S_{ij}(T)|=|T|\,</math> and  <math>|S_{ij}(\mathcal{F})|=\mathcal{F}</math>. And due to the finiteness, repeatedly applying <math>(i,j)</math>-shifts for any <math>1\le i<j\le n</math>, eventually <math>\mathcal{F}</math> does not changing any more. We called such an <math>\mathcal{F}</math> with  <math>\mathcal{F}=S_{ij}(\mathcal{F})</math> for any <math>1\le i<j\le n</math> '''shifted'''.
In order to make the shifting technique work for shadows, we have to prove that shifting does not increase the size of the shadow.
{{Theorem|Proposition|
:<math>\Delta S_{ij}(\mathcal{F})\subseteq S_{ij}(\Delta\mathcal{F})</math>.
}}
{{Proof|
We abuse the notation <math>\Delta</math> and let <math>\Delta A=\Delta\{A\}</math> if <math>A</math> is a set instead of a set system.
It is sufficient to show that for any <math>A\in\mathcal{F}</math>, <math>\Delta S_{ij}(A)\subseteq S_{ij}(\Delta\mathcal{F})</math>.
[[file:Property-shift-shadow.png|600px]]
}}
An immediate corollary of the above proposition is that the <math>(i,j)</math>-shifts <math>S_{ij}</math> for any <math>1\le i<j\le n</math> do not increase the size of the shadow.
{{Theorem|Corollary|
:<math>|\Delta S_{ij}(\mathcal{F})|\le |\Delta\mathcal{F}|</math>.
}}
{{Proof|
By the above proposition, <math>|\Delta S_{ij}(\mathcal{F})|\le|S_{ij}(\Delta\mathcal{F})|</math>, and we know that <math>S_{ij}</math> does not change the cardinality of a set family, that is, <math>|S_{ij}(\Delta\mathcal{F})|=|\Delta\mathcal{F}|</math>, therefore <math>\Delta S_{ij}(\mathcal{F})|\le|\Delta\mathcal{F}|</math>.
}}
;Proof of Kruskal-Katona theorem
We know that shifts never enlarge the shadow, thus it is sufficient to prove the theorem for shifted <math>\mathcal{F}</math>. We then assume <math>\mathcal{F}</math> is shifted.
Apply induction on <math>m</math> and for given <math>m</math> on <math>k</math>. The theorem holds trivially for the case that <math>k=1</math> and <math>m</math> is arbitrary.
Define
:<math>\mathcal{F}_0=\{A\in\mathcal{F}\mid 1\not\in A\}</math>,
:<math>\mathcal{F}_1=\{A\in\mathcal{F}\mid 1\in A\}</math>.
And let <math>\mathcal{F}_1'=\{A\setminus\{1\}\mid A\in\mathcal{F}_1\}</math>.
Clearly <math>\mathcal{F}_0,\mathcal{F}_1\subseteq{X\choose k}</math>, <math>\mathcal{F}_1'\subseteq{X\choose k-1}</math>, and
:<math>|\mathcal{F}|=|\mathcal{F}_0|+|\mathcal{F}_1|=|\mathcal{F}_0|+|\mathcal{F}_1'|</math>.
Our induction is based on the following observation regarding the size of the shadow.
{{Theorem|Lemma 1|
:<math>|\Delta\mathcal{F}|\ge|\Delta\mathcal{F}_1'|+|\mathcal{F}_1'|</math>.
}}
{{Proof|
Obviously <math>\mathcal{F}\supseteq\mathcal{F}_1</math> and <math>\Delta\mathcal{F}\supseteq\Delta\mathcal{F}_1</math>.
:<math>\begin{align}
\Delta\mathcal{F}_1
&=\left\{A\in{X\choose k-1}\,\,\bigg|\,\, \exists B\in\mathcal{F}_1, A\subset B\right\}\\
&=\left\{A\in{X\choose k-1}\,\,\bigg|\,\, 1\in A, \exists B\in\mathcal{F}_1, A\subset B\right\}\\
&\quad\, \cup \left\{A\in{X\choose k-1}\,\,\bigg|\,\, 1\not\in A, \exists B\in\mathcal{F}_1, A\subset B\right\}\\
&=\{S\cup\{1\}\mid S\in\Delta\mathcal{F}_1'\}\cup\mathcal{F}_1'.
\end{align}</math>
The union is taken over two disjoint families. Therefore,
:<math>|\Delta\mathcal{F}|\ge|\Delta\mathcal{F}_1|=|\Delta\mathcal{F}_1'|+|\mathcal{F}_1'|</math>.
}}
The following property of shifted <math>\mathcal{F}</math> is essential for our proof.
{{Theorem|Lemma 2|
:For shifted <math>\mathcal{F}</math>, it holds that <math>\Delta\mathcal{F}_0\subseteq \mathcal{F}_1'</math>.
}}
{{Proof|
If <math>A\in\Delta\mathcal{F}_0</math> then <math>A\cup\{j\}\in\mathcal{F}_0\subseteq\mathcal{F}</math> for some <math>j>1</math> so that, since <math>\mathcal{F}</math> is shifted, applying the <math>(1,j)</math>-shift <math>S_{1j}</math>, <math>A\cup\{1\}\in\mathcal{F}</math>, thus, <math>A\in\mathcal{F}_1'</math>.
}}
We then bound the size of <math>\mathcal{F}_1'</math> as:
{{Theorem|Lemma 3|
:If <math>\mathcal{F}</math> is shifted, then
::<math>|\mathcal{F}_1'|\ge{m_k-1\choose k-1}+{m_{k-1}-1\choose k-2}+\cdots+{m_t-1\choose t-1}</math>.
}}
{{Proof|
By contradiction, assume that
:<math>|\mathcal{F}_1'|<{m_k-1\choose k-1}+{m_{k-1}-1\choose k-2}+\cdots+{m_t-1\choose t-1}</math>.
Then by <math>|\mathcal{F}|=|\mathcal{F}_0|+|\mathcal{F}_1'|</math>, it holds that
:<math>
\begin{align}
|\mathcal{F}_0| &=m-|\mathcal{F}_1'|\\
&>\left\{{m_k\choose k}- {m_k-1\choose k-1}\right\}+\cdots+\left\{{m_t\choose t}- {m_t-1\choose t-1}\right\}\\
&={m_k-1\choose k}+\cdots+{m_t-1\choose t},
\end{align}</math>
so that, by the induction hypothesis,
:<math>|\Delta\mathcal{F}_0|\ge{m_k-1\choose k-1}+{m_{k-1}-1\choose k-1}+\cdots+{m_t-1\choose t-1}>|\mathcal{F}_1'|</math>.
On the other hand, by Lemma 2, <math>|\mathcal{F}_1'|\ge|\Delta\mathcal{F}_0|</math>. Thus <math>|\mathcal{F}_1'|\ge|\Delta\mathcal{F}_0|>|\mathcal{F}_1'|</math>, a contradiction.
}}
Now we officially apply the induction. By Lemma 1,
:<math>|\Delta\mathcal{F}|\ge|\Delta\mathcal{F}_1'|+|\mathcal{F}_1'|</math>.
Note that <math>\mathcal{F}_1'\subseteq{X\choose k-1}</math>, and due to Lemma 3,
:<math>|\mathcal{F}_1'|\ge{m_k-1\choose k-1}+{m_{k-1}-1\choose k-2}+\cdots+{m_t-1\choose t-1}</math>,
thus by the induction hypothesis,
:<math>|\Delta\mathcal{F}_1'|\ge{m_k-1\choose k-2}+{m_{k-1}-1\choose k-3}+\cdots+{m_t-1\choose t-2}</math>.
Combining them together, we have
:<math>
\begin{align}
|\Delta\mathcal{F}|
&\ge |\Delta\mathcal{F}_1'|+|\mathcal{F}_1'|\\
&\ge {m_k-1\choose k-2}+\cdots+{m_t-1\choose t-2}+{m_k-1\choose k-1}+\cdots+{m_t-1\choose t-1}\\
&= {m_k\choose k-1}+{m_{k-1}\choose k-2}+\cdots+{m_t\choose t-1}.
\end{align}</math>
:<math>\square</math>
=== Shadows of specific sizes ===
The definition of shadow can be generalized to the subsets of any size.
{{Theorem|Definition|
:The '''<math>r</math>-shadow''' of <math>\mathcal{F}</math> is defined as
::<math>\Delta_r\mathcal{F}=\left\{S\mid |S|=r\text{ and }\exists T\in\mathcal{F}, S\subseteq T\right\}</math>.
}}
And the general version of the Kruskal-Katona theorem can be deduced.
{{Theorem|Kruskal-Katona Theorem (general version)|
:Let <math>\mathcal{F}\subseteq {X\choose k}</math> with <math>|\mathcal{F}|=m</math>, and suppose that
::<math>m={m_k\choose k}+{m_{k-1}\choose k-1}+\cdots+{m_t\choose t}</math>
:where <math>m_k>m_{k-1}>\cdots>m_t\ge t\ge 1</math>. Then for all <math>r</math>, <math>1\le r\le k</math>,
::<math>\left|\Delta_r\mathcal{F}\right|\ge {m_k\choose r}+{m_{k-1}\choose r-1}+\cdots+{m_t\choose t-k+r}</math>.
}}
{{Proof|
Note that for <math>\mathcal{F}\subseteq {X\choose k}</math>,
:<math>\Delta_r\mathcal{F}=\underbrace{\Delta\cdots\Delta}_{k-r}\mathcal{F}</math>.
The theorem follows by repeatedly applying the Kruskal-Katona theorem for <math>\Delta\mathcal{F}</math>.
}}
=== The Erdős–Ko–Rado theorem, revisited===
To demonstrate the power of the Krulskal-Katona theorem, we show that it actually includes the Erdős–Ko–Rado theorem as a special case. The following elegant proof of the Erdős–Ko–Rado theorem is due to Daykin and Clements independently.
{{Theorem|Erdős–Ko–Rado Theorem|
:Let <math>\mathcal{F}\subseteq {X\choose k}</math> where <math>|X|=n</math> and <math>n\ge 2k</math>. If <math>S\cap T\neq\emptyset</math> for any <math>S,T\in\mathcal{F}</math>, then
::<math>|\mathcal{F}|\le{n-1\choose k-1}</math>.
}}
{{Prooftitle|Proof by the Kruskal-Katona theorem|(Daykin 1974, Clements 1976)
By contradiction, suppose that <math>|\mathcal{F}|>{n-1\choose k-1}</math>.
We define the dual system
:<math>\mathcal{G}=\{\bar{S}\mid S\in\mathcal{F}\}</math>, where <math>\bar{S}=X\setminus S</math>.
For any <math>S,T\in\mathcal{F}</math>, the condition <math>S\cap T\neq \emptyset</math> is equivalent to <math>S\not\subseteq \bar{T}</math>, so <math>\mathcal{F}</math> and <math>\Delta_k\mathcal{G}</math> are disjoint, thus
:<math>|\mathcal{F}|+|\Delta_k\mathcal{G}|\le{n\choose k}</math>.
Clearly, <math>\mathcal{G}\subseteq {X\choose n-k}</math> and
:<math>|\mathcal{G}|=|\mathcal{F}|>{n-1\choose k-1}={n-1\choose n-k}</math>.
By the Kruskal-Katona theorem, <math>|\Delta_k\mathcal{G}|\ge{n-1\choose k}</math>.
:<math>|\mathcal{F}|+|\Delta_k\mathcal{G}|>{n-1\choose k-1}+{n-1\choose k}={n\choose k}</math>,
a contradiction.
}}

Latest revision as of 07:57, 21 May 2014

Sperner system

A set family [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] with the relation [math]\displaystyle{ \subseteq }[/math] define a poset. Thus, a chain is a sequence [math]\displaystyle{ S_1\subseteq S_2\subseteq\cdots\subseteq S_k }[/math].

A set family [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] is an antichain (also called a Sperner system) if for all [math]\displaystyle{ S,T\in\mathcal{F} }[/math] that [math]\displaystyle{ S\neq T }[/math], we have [math]\displaystyle{ S\not\subseteq T }[/math].

The [math]\displaystyle{ k }[/math]-uniform [math]\displaystyle{ {X\choose k} }[/math] is an antichain. Let [math]\displaystyle{ n=|X| }[/math]. The size of [math]\displaystyle{ {X\choose k} }[/math] is maximized when [math]\displaystyle{ k=\lfloor n/2\rfloor }[/math]. We wonder whether this is also the largest possible size of any antichain [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math].

In 1928, Emanuel Sperner proved a theorem saying that it is indeed the largest possible antichain. This result, called Sperner's theorem today, initiated the studies of extremal set theory.

Theorem (Sperner 1928)
Let [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] where [math]\displaystyle{ |X|=n }[/math]. If [math]\displaystyle{ \mathcal{F} }[/math] is an antichain, then
[math]\displaystyle{ |\mathcal{F}|\le{n\choose \lfloor n/2\rfloor} }[/math].

First proof (shadows)

We first introduce the original proof by Sperner, which uses concepts called shadows and shades of set systems.

Definition
Let [math]\displaystyle{ |X|=n\, }[/math] and [math]\displaystyle{ \mathcal{F}\subseteq {X\choose k} }[/math], [math]\displaystyle{ k\lt n\, }[/math].
The shade of [math]\displaystyle{ \mathcal{F} }[/math] is defined to be
[math]\displaystyle{ \nabla\mathcal{F}=\left\{T\in {X\choose k+1}\,\,\bigg|\,\, \exists S\in\mathcal{F}\mbox{ such that } S\subset T\right\} }[/math].
Thus the shade [math]\displaystyle{ \nabla\mathcal{F} }[/math] of [math]\displaystyle{ \mathcal{F} }[/math] consists of all subsets of [math]\displaystyle{ X }[/math] which can be obtained by adding an element to a set in [math]\displaystyle{ \mathcal{F} }[/math].
Similarly, the shadow of [math]\displaystyle{ \mathcal{F} }[/math] is defined to be
[math]\displaystyle{ \Delta\mathcal{F}=\left\{T\in {X\choose k-1}\,\,\bigg|\,\, \exists S\in\mathcal{F}\mbox{ such that } T\subset S\right\} }[/math].
Thus the shadow [math]\displaystyle{ \Delta\mathcal{F} }[/math] of [math]\displaystyle{ \mathcal{F} }[/math] consists of all subsets of [math]\displaystyle{ X }[/math] which can be obtained by removing an element from a set in [math]\displaystyle{ \mathcal{F} }[/math].

Next lemma bounds the effects of shadows and shades on the sizes of set systems.

Lemma (Sperner)
Let [math]\displaystyle{ |X|=n\, }[/math] and [math]\displaystyle{ \mathcal{F}\subseteq {X\choose k} }[/math]. Then
[math]\displaystyle{ \begin{align} &|\nabla\mathcal{F}|\ge\frac{n-k}{k+1}|\mathcal{F}| &\text{ if } k\lt n\\ &|\Delta\mathcal{F}|\ge\frac{k}{n-k+1}|\mathcal{F}| &\text{ if } k\gt 0. \end{align} }[/math]
Proof.

The lemma is proved by double counting. We prove the inequality of [math]\displaystyle{ |\nabla\mathcal{F}| }[/math]. Assume that [math]\displaystyle{ 0\le k\lt n }[/math].

Define

[math]\displaystyle{ \mathcal{R}=\{(S,T)\mid S\in\mathcal{F}, T\in\nabla\mathcal{F}, S\subset T\} }[/math].

We estimate [math]\displaystyle{ |\mathcal{R}| }[/math] in two ways.

For each [math]\displaystyle{ S\in\mathcal{F} }[/math], there are [math]\displaystyle{ n-k }[/math] different [math]\displaystyle{ T\in\nabla\mathcal{F} }[/math] that [math]\displaystyle{ S\subset T }[/math].

[math]\displaystyle{ |\mathcal{R}|=(n-k)|\mathcal{F}| }[/math].

For each [math]\displaystyle{ T\in\nabla\mathcal{F} }[/math], there are [math]\displaystyle{ k+1 }[/math] ways to choose an [math]\displaystyle{ S\subset T }[/math] with [math]\displaystyle{ |S|=k }[/math], some of which may not be in [math]\displaystyle{ \mathcal{F} }[/math].

[math]\displaystyle{ |\mathcal{R}|\le (k+1)|\nabla\mathcal{F}| }[/math].

Altogether, we show that [math]\displaystyle{ |\nabla\mathcal{F}|\ge\frac{n-k}{k+1}|\mathcal{F}| }[/math].

The inequality of [math]\displaystyle{ |\Delta\mathcal{F}| }[/math] can be proved in the same way.

[math]\displaystyle{ \square }[/math]

An immediate corollary of the previous lemma is as follows.

Proposition 1
If [math]\displaystyle{ k\le \frac{n-1}{2} }[/math], then [math]\displaystyle{ |\nabla\mathcal{F}|\ge|\mathcal{F}| }[/math].
If [math]\displaystyle{ k\ge \frac{n-1}{2} }[/math], then [math]\displaystyle{ |\Delta\mathcal{F}|\ge|\mathcal{F}| }[/math].

The idea of Sperner's proof is pretty clear:

  • we "push up" all the sets in [math]\displaystyle{ \mathcal{F} }[/math] of size [math]\displaystyle{ \lt \frac{n-1}{2} }[/math] replacing them by their shades;
  • and also "push down" all the sets in [math]\displaystyle{ \mathcal{F} }[/math] of size [math]\displaystyle{ \ge\frac{n+1}{2} }[/math] replacing them by their shadows.

Repeat this process we end up with a set system [math]\displaystyle{ \mathcal{F}\subseteq{X\choose \lfloor n/2\rfloor} }[/math]. We need to show that this process does not decrease the size of [math]\displaystyle{ \mathcal{F} }[/math].

Proposition 2
Suppose that [math]\displaystyle{ \mathcal{F}\subseteq2^X }[/math] where [math]\displaystyle{ |X|=n }[/math]. Let [math]\displaystyle{ \mathcal{F}_k=\mathcal{F}\cap{X\choose k} }[/math]. Let [math]\displaystyle{ k_\min }[/math] be the smallest [math]\displaystyle{ k }[/math] that [math]\displaystyle{ |\mathcal{F}_k|\gt 0 }[/math], and let
[math]\displaystyle{ \mathcal{F}'=\begin{cases} \mathcal{F}\setminus\mathcal{F}_{k_\min}\cup \nabla\mathcal{F}_{k_\min} & \mbox{if }k_\min\lt \frac{n-1}{2},\\ \mathcal{F} & \mbox{otherwise.} \end{cases} }[/math]
Similarly, let [math]\displaystyle{ k_\max }[/math] be the largest [math]\displaystyle{ k }[/math] that [math]\displaystyle{ |\mathcal{F}_k|\gt 0 }[/math], and let
[math]\displaystyle{ \mathcal{F}''=\begin{cases} \mathcal{F}\setminus\mathcal{F}_{k_\max}\cup \Delta\mathcal{F}_{k_\max} & \mbox{if }k_\max\ge\frac{n+1}{2},\\ \mathcal{F} & \mbox{otherwise.} \end{cases} }[/math]
If [math]\displaystyle{ \mathcal{F} }[/math] is an antichain, [math]\displaystyle{ \mathcal{F}' }[/math] and [math]\displaystyle{ \mathcal{F}'' }[/math] are antichains, and we have [math]\displaystyle{ |\mathcal{F}'|\ge|\mathcal{F}| }[/math] and [math]\displaystyle{ |\mathcal{F}''|\ge|\mathcal{F}| }[/math].
Proof.

We show that [math]\displaystyle{ \mathcal{F}' }[/math] is an antichain and [math]\displaystyle{ |\mathcal{F}'|\ge|\mathcal{F}| }[/math].

First, observe that [math]\displaystyle{ \nabla\mathcal{F}_k\cap\mathcal{F}=\emptyset }[/math], otherwise [math]\displaystyle{ \mathcal{F} }[/math] cannot be an antichain, and due to Proposition 1, [math]\displaystyle{ |\nabla\mathcal{F}_k|\ge|\mathcal{F}_k| }[/math] when [math]\displaystyle{ k\le \frac{n-1}{2} }[/math], so [math]\displaystyle{ |\mathcal{F}'|=|\mathcal{F}|-|\mathcal{F}_k|+|\nabla\mathcal{F}_k|\ge |\mathcal{F}| }[/math].

Now we prove that [math]\displaystyle{ \mathcal{F}' }[/math] is an antichain . By contradiction, assume that there are [math]\displaystyle{ S, T\in \mathcal{F}' }[/math], such that [math]\displaystyle{ S\subset T }[/math]. One of the [math]\displaystyle{ S,T }[/math] must be in [math]\displaystyle{ \nabla\mathcal{F}_{k_\min} }[/math], or otherwise [math]\displaystyle{ \mathcal{F} }[/math] cannot be an antichain. Recall that [math]\displaystyle{ k_\min }[/math] is the smallest [math]\displaystyle{ k }[/math] that [math]\displaystyle{ |\mathcal{F}_k|\gt 0 }[/math], thus it must be [math]\displaystyle{ S\in \nabla\mathcal{F}_{k_\min} }[/math], and [math]\displaystyle{ T\in\mathcal{F} }[/math]. This implies that there is an [math]\displaystyle{ R\in \mathcal{F}_{k_\min}\subseteq \mathcal{F} }[/math] such that [math]\displaystyle{ R\subset S\subset T }[/math], which contradicts that [math]\displaystyle{ \mathcal{F} }[/math] is an antichain.

The statement for [math]\displaystyle{ \mathcal{F}'' }[/math] can be proved in the same way.

[math]\displaystyle{ \square }[/math]

Applying the above process, we prove the Sperner's theorem.

Proof of Sperner's theorem
(original proof of Sperner)

Let [math]\displaystyle{ \mathcal{F}_k=\{S\in\mathcal{F}\mid |S|=k\} }[/math], where [math]\displaystyle{ 0\le k\le n }[/math].

We change [math]\displaystyle{ \mathcal{F} }[/math] as follows:

  • for the smallest [math]\displaystyle{ k }[/math] that [math]\displaystyle{ |\mathcal{F}_k|\gt 0 }[/math], if [math]\displaystyle{ k\lt \frac{n-1}{2} }[/math], replace [math]\displaystyle{ \mathcal{F}_k }[/math] by [math]\displaystyle{ \nabla\mathcal{F}_k }[/math].

Due to Proposition 2, this procedure preserves [math]\displaystyle{ \mathcal{F} }[/math] as an antichain and does not decrease [math]\displaystyle{ |\mathcal{F}| }[/math]. Repeat this procedure, until [math]\displaystyle{ |\mathcal{F}_k|=0 }[/math] for all [math]\displaystyle{ k\lt \frac{n-1}{2} }[/math], that is, there is no member set of [math]\displaystyle{ \mathcal{F} }[/math] has size less than [math]\displaystyle{ \frac{n-1}{2} }[/math].

We then define another symmetric procedure:

  • for the largest [math]\displaystyle{ k }[/math] that [math]\displaystyle{ |\mathcal{F}_k|\gt 0 }[/math], if [math]\displaystyle{ k\ge\frac{n+1}{2} }[/math], replace [math]\displaystyle{ \mathcal{F}_k }[/math] by [math]\displaystyle{ \Delta\mathcal{F}_k }[/math].

Also due to Proposition 2, this procedure preserves [math]\displaystyle{ \mathcal{F} }[/math] as an antichain and does not decrease [math]\displaystyle{ |\mathcal{F}| }[/math]. After repeatedly applying this procedure, [math]\displaystyle{ |\mathcal{F}_k|=0 }[/math] for all [math]\displaystyle{ k\ge\frac{n+1}{2} }[/math].

The resulting [math]\displaystyle{ \mathcal{F} }[/math] has [math]\displaystyle{ \mathcal{F}\subseteq{X\choose \lfloor n/2\rfloor} }[/math], and since [math]\displaystyle{ |\mathcal{F}| }[/math] is never decreased, for the original [math]\displaystyle{ \mathcal{F} }[/math], we have

[math]\displaystyle{ |\mathcal{F}|\le {n\choose \lfloor n/2\rfloor} }[/math].
[math]\displaystyle{ \square }[/math]

Second proof (counting)

We now introduce an elegant proof due to Lubell. The proof uses a counting argument, and tells more information than just the size of the set system.

Proof of Sperner's theorem
(Lubell 1966)

Let [math]\displaystyle{ \pi }[/math] be a permutation of [math]\displaystyle{ X }[/math]. We say that an [math]\displaystyle{ S\subseteq X }[/math] prefixes [math]\displaystyle{ \pi }[/math], if [math]\displaystyle{ S=\{\pi_1,\pi_2,\ldots, \pi_{|S|}\} }[/math], that is, [math]\displaystyle{ S }[/math] is precisely the set of the first [math]\displaystyle{ |S| }[/math] elements in the permutation [math]\displaystyle{ \pi }[/math].

Fix an [math]\displaystyle{ S\subseteq X }[/math]. It is easy to see that the number of permutations [math]\displaystyle{ \pi }[/math] of [math]\displaystyle{ X }[/math] prefixed by [math]\displaystyle{ S }[/math] is [math]\displaystyle{ |S|!(n-|S|)! }[/math]. Also, since [math]\displaystyle{ \mathcal{F} }[/math] is an antichain, no permutation [math]\displaystyle{ \pi }[/math] of [math]\displaystyle{ X }[/math] can be prefixed by more than one members of [math]\displaystyle{ \mathcal{F} }[/math], otherwise one of the member sets must contain the other, which contradicts that [math]\displaystyle{ \mathcal{F} }[/math] is an antichain. Thus, the number of permutations [math]\displaystyle{ \pi }[/math] prefixed by some [math]\displaystyle{ S\in\mathcal{F} }[/math] is

[math]\displaystyle{ \sum_{S\in\mathcal{F}}|S|!(n-|S|)! }[/math],

which cannot be larger than the total number of permutations, [math]\displaystyle{ n! }[/math], therefore,

[math]\displaystyle{ \sum_{S\in\mathcal{F}}|S|!(n-|S|)!\le n! }[/math].

Dividing both sides by [math]\displaystyle{ n! }[/math], we have

[math]\displaystyle{ \sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}}=\sum_{S\in\mathcal{F}}\frac{|S|!(n-|S|)!}{n!}\le 1 }[/math],

where [math]\displaystyle{ {n\choose |S|}\le {n\choose \lfloor n/2\rfloor} }[/math], so

[math]\displaystyle{ \sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}}\ge \frac{|\mathcal{F}|}{{n\choose \lfloor n/2\rfloor}} }[/math].

Combining this with the above inequality, we prove the Sperner's theorem.

[math]\displaystyle{ \square }[/math]

The LYM inequality

Lubell's proof proves the following inequality:

[math]\displaystyle{ \sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}}\le 1 }[/math]

which is actually stronger than Sperner's original statement that [math]\displaystyle{ |\mathcal{F}|\le{n\choose \lfloor n/2\rfloor} }[/math].

This inequality is independently discovered by Lubell-Yamamoto, Meschalkin, and Bollobás, and is called the LYM inequality today.

Theorem (Lubell, Yamamoto 1954; Meschalkin 1963)
Let [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] where [math]\displaystyle{ |X|=n }[/math]. If [math]\displaystyle{ \mathcal{F} }[/math] is an antichain, then
[math]\displaystyle{ \sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}}\le 1 }[/math].

In Lubell's counting argument proves the LYM inequality, which implies the Sperner's theorem. Here we give another proof of the LYM inequality by the probabilistic method, due to Noga Alon.

Third proof (the probabilistic method)
(Due to Alon.)

Let [math]\displaystyle{ \pi }[/math] be a uniformly random permutation of [math]\displaystyle{ X }[/math]. Define a random maximal chain by

[math]\displaystyle{ \mathcal{C}_\pi=\{\{\pi_i\mid 1\le i\le k\}\mid 0\le k\le n\} }[/math].

For any [math]\displaystyle{ S\in\mathcal{F} }[/math], let [math]\displaystyle{ X_S }[/math] be the 0-1 random variable which indicates whether [math]\displaystyle{ S\in\mathcal{C}_\pi }[/math], that is

[math]\displaystyle{ X_S=\begin{cases} 1 & \mbox{if }S\in\mathcal{C}_\pi,\\ 0 & \mbox{otherwise.} \end{cases} }[/math]

Note that for a uniformly random [math]\displaystyle{ \pi }[/math], [math]\displaystyle{ \mathcal{C}_\pi }[/math] has exact one member set of size [math]\displaystyle{ |S| }[/math], uniformly distributed over [math]\displaystyle{ {X\choose |S|} }[/math], thus

[math]\displaystyle{ \mathbf{E}[X_S]=\Pr[S\in\mathcal{C}_\pi]=\frac{1}{{n\choose |S|}} }[/math].

Let [math]\displaystyle{ X=\sum_{S\in\mathcal{F}}X_S }[/math]. Note that [math]\displaystyle{ X=|\mathcal{F}\cap\mathcal{C}_\pi| }[/math]. By the linearity of expectation,

[math]\displaystyle{ \mathbf{E}[X]=\sum_{S\in\mathcal{F}}\mathbf{E}[X_S]=\sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}} }[/math].

On the other hand, since [math]\displaystyle{ \mathcal{F} }[/math] is an antichain, it can never intersect a chain at more than one elements, thus we always have [math]\displaystyle{ X=|\mathcal{F}\cap\mathcal{C}_\pi|\le 1 }[/math]. Therefore,

[math]\displaystyle{ \sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}}\le \mathbf{E}[X] \le 1 }[/math].
[math]\displaystyle{ \square }[/math]

The Sperner's theorem is an immediate consequence of the LYM inequality.

Proposition
[math]\displaystyle{ \sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}}\le 1 }[/math] implies that [math]\displaystyle{ |\mathcal{F}|\le{n\choose \lfloor n/2\rfloor} }[/math].
Proof.

It holds that [math]\displaystyle{ {n\choose k}\le {n\choose \lfloor n/2\rfloor} }[/math] for any [math]\displaystyle{ k }[/math]. Thus,

[math]\displaystyle{ 1\ge \sum_{S\in\mathcal{F}}\frac{1}{{n\choose |S|}}\ge \frac{|\mathcal{F}|}{{n\choose \lfloor n/2\rfloor}} }[/math],

which implies that [math]\displaystyle{ |\mathcal{F}|\le {n\choose \lfloor n/2\rfloor} }[/math].

[math]\displaystyle{ \square }[/math]

Sunflowers

An set system is a sunflower if all its member sets intersect at the same set of elements.

Definition (sunflower)
A set family [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] is a sunflower of size [math]\displaystyle{ r }[/math] with a core [math]\displaystyle{ C\subseteq X }[/math] if
[math]\displaystyle{ \forall S,T\in\mathcal{F} }[/math] that [math]\displaystyle{ S\neq T }[/math], [math]\displaystyle{ S\cap T=C }[/math].

Note that we do not require the core to be nonempty, thus a family of disjoint sets is also a sunflower (with the core [math]\displaystyle{ \emptyset }[/math]).

The next result due to Erdős and Rado, called the sunflower lemma, is a famous result in extremal set theory, and has some important applications in Boolean circuit complexity.

Sunflower Lemma (Erdős-Rado)
Let [math]\displaystyle{ \mathcal{F}\subseteq {X\choose k} }[/math]. If [math]\displaystyle{ |\mathcal{F}|\gt k!(r-1)^k }[/math], then [math]\displaystyle{ \mathcal{F} }[/math] contains a sunflower of size [math]\displaystyle{ r }[/math].
Proof.

We proceed by induction on [math]\displaystyle{ k }[/math]. For [math]\displaystyle{ k=1 }[/math], [math]\displaystyle{ \mathcal{F}\subseteq{X\choose 1} }[/math], thus all sets in [math]\displaystyle{ \mathcal{F} }[/math] are disjoint. And since [math]\displaystyle{ |\mathcal{F}|\gt r-1 }[/math], we can choose [math]\displaystyle{ r }[/math] of these sets and form a sunflower.

Now let [math]\displaystyle{ k\ge 2 }[/math] and assume the lemma holds for all smaller [math]\displaystyle{ k }[/math]. Take a maximal family [math]\displaystyle{ \mathcal{G}\subseteq \mathcal{F} }[/math] whose members are disjoint, i.e. for any [math]\displaystyle{ S,T\in \mathcal{G} }[/math] that [math]\displaystyle{ S\neq T }[/math], [math]\displaystyle{ S\cap T=\emptyset }[/math].

If [math]\displaystyle{ |\mathcal{G}|\ge r }[/math], then [math]\displaystyle{ \mathcal{G} }[/math] is a sunflower of size at least [math]\displaystyle{ r }[/math] and we are done.

Assume that [math]\displaystyle{ |\mathcal{G}|\le r-1 }[/math], and let [math]\displaystyle{ Y=\bigcup_{S\in\mathcal{G}}S }[/math]. Then [math]\displaystyle{ |Y|=k|\mathcal{G}|\le k(r-1) }[/math] (since all members of [math]\displaystyle{ \mathcal{G} }[/math]) are disjoint). We claim that [math]\displaystyle{ Y }[/math] intersets all members of [math]\displaystyle{ \mathcal{F} }[/math], since if otherwise, there exists an [math]\displaystyle{ S\in\mathcal{F} }[/math] such that [math]\displaystyle{ S\cap Y=\emptyset }[/math], then we can enlarge [math]\displaystyle{ \mathcal{G} }[/math] by adding [math]\displaystyle{ S }[/math] into [math]\displaystyle{ \mathcal{G} }[/math] and still have all members of [math]\displaystyle{ \mathcal{G} }[/math] disjoint, which contradicts the assumption that [math]\displaystyle{ \mathcal{G} }[/math] is the maximum of such families.

By the pigeonhole principle, some elements [math]\displaystyle{ y\in Y }[/math] must contained in at least

[math]\displaystyle{ \frac{|\mathcal{F}|}{|Y|}\gt \frac{k!(r-1)^k}{k(r-1)}=(k-1)!(r-1)^{k-1} }[/math]

members of [math]\displaystyle{ \mathcal{F} }[/math]. We delete this [math]\displaystyle{ y }[/math] from these sets and consider the family

[math]\displaystyle{ \mathcal{H}=\{S\setminus\{y\}\mid S\in\mathcal{F}\wedge y\in S\} }[/math].

We have [math]\displaystyle{ \mathcal{H}\subseteq {X\choose k-1} }[/math] and [math]\displaystyle{ |\mathcal{H}|\gt (k-1)!(r-1)^{k-1} }[/math], thus by the induction hypothesis, [math]\displaystyle{ \mathcal{H} }[/math]contains a sunflower of size [math]\displaystyle{ r }[/math]. Adding [math]\displaystyle{ y }[/math] to the members of this sunflower, we get the desired sunflower in the original family [math]\displaystyle{ \mathcal{F} }[/math].

[math]\displaystyle{ \square }[/math]

The Erdős–Ko–Rado Theorem

A set family [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] is called intersecting, if for any [math]\displaystyle{ S,T\in\mathcal{F} }[/math], [math]\displaystyle{ S\cap T\neq\emptyset }[/math]. A natural question of extremal favor is: "how large can an intersecting family be?"

Assume [math]\displaystyle{ |X|=n }[/math]. When [math]\displaystyle{ n\lt 2k }[/math], every pair of [math]\displaystyle{ k }[/math]-subsets of [math]\displaystyle{ X }[/math] intersects. So the non-trivial case is when [math]\displaystyle{ n\le 2k }[/math]. The famous Erdős–Ko–Rado theorem gives the largest possible cardinality of a nontrivially intersecting family.

According to Erdős, the theorem itself was proved in 1938, but was not published until 23 years later.

Erdős–Ko–Rado theorem (proved in 1938, published in 1961)
Let [math]\displaystyle{ \mathcal{F}\subseteq {X\choose k} }[/math] where [math]\displaystyle{ |X|=n }[/math] and [math]\displaystyle{ n\ge 2k }[/math]. If [math]\displaystyle{ \mathcal{F} }[/math] is intersecting, then
[math]\displaystyle{ |\mathcal{F}|\le{n-1\choose k-1} }[/math].

Katona's proof

We first introduce a proof discovered by Katona in 1972. The proof uses double counting.

Let [math]\displaystyle{ \pi }[/math] be a cyclic permutation of [math]\displaystyle{ X }[/math], that is, we think of assigning [math]\displaystyle{ X }[/math] in a circle and ignore the rotations of the circle. It is easy to see that there are [math]\displaystyle{ (n-1)! }[/math] cyclic permutations of an [math]\displaystyle{ n }[/math]-set (each cyclic permutation corresponds to [math]\displaystyle{ n }[/math] permutations). Let

[math]\displaystyle{ \mathcal{G}_\pi=\{\{\pi_{(i+j)\bmod n}\mid j\in[k]\}\mid i\in [n]\} }[/math].

The next lemma states the following observation: in a circle of [math]\displaystyle{ n }[/math] points, supposed [math]\displaystyle{ n\ge 2k }[/math], there can be at most [math]\displaystyle{ k }[/math] arcs, each consisting of [math]\displaystyle{ k }[/math] points, such that every pair of arcs share at least one point.

Lemma
Let [math]\displaystyle{ \mathcal{F}\subseteq {X\choose k} }[/math] where [math]\displaystyle{ |X|=n }[/math] and [math]\displaystyle{ n\ge 2k }[/math]. If [math]\displaystyle{ \mathcal{F} }[/math] is intersecting, then for any cyclic permutation [math]\displaystyle{ \pi }[/math] of [math]\displaystyle{ X }[/math], it holds that [math]\displaystyle{ |\mathcal{G}_\pi\cap\mathcal{F}|\le k }[/math].
Proof.

Fix a cyclic permutation [math]\displaystyle{ \pi }[/math] of [math]\displaystyle{ X }[/math]. Let [math]\displaystyle{ A_i=\{\pi_{(i+j+n)\bmod n}\mid j\in[k]\} }[/math]. Then [math]\displaystyle{ \mathcal{G}_\pi }[/math] can be written as [math]\displaystyle{ \mathcal{G}_\pi=\{A_i\mid i\in [n]\} }[/math].

Suppose that [math]\displaystyle{ A_t\in\mathcal{F} }[/math]. Since [math]\displaystyle{ \mathcal{F} }[/math] is intersecting, the only sets [math]\displaystyle{ A_i }[/math] that can be in [math]\displaystyle{ \mathcal{F} }[/math] other than [math]\displaystyle{ A_t }[/math] itself are the [math]\displaystyle{ 2k-2 }[/math] sets [math]\displaystyle{ A_i }[/math] with [math]\displaystyle{ t-(k-1)\le i\le t+k-1, i\neq t }[/math]. We partition these sets into [math]\displaystyle{ k-1 }[/math] pairs [math]\displaystyle{ \{A_i,A_{i+k}\} }[/math], where [math]\displaystyle{ t-(k-1)\le i\le t-1 }[/math].

Note that for [math]\displaystyle{ n\ge 2k }[/math], it holds that [math]\displaystyle{ A_i\cap C_{i+k}=\emptyset }[/math]. Since [math]\displaystyle{ \mathcal{F} }[/math] is intersecting, [math]\displaystyle{ \mathcal{F} }[/math] can contain at most one set of each such pair. The lemma follows.

[math]\displaystyle{ \square }[/math]

The Katona's proof of Erdős–Ko–Rado theorem is done by counting in two ways the pairs of member [math]\displaystyle{ S }[/math] of [math]\displaystyle{ \mathcal{F} }[/math] and cyclic permutation [math]\displaystyle{ \pi }[/math] which contain [math]\displaystyle{ S }[/math] as a continuous path on the circle (i.e., an arc).

Katona's proof of Erdős–Ko–Rado theorem
(double counting)

Let

[math]\displaystyle{ \mathcal{R}=\{(S,\pi)\mid \pi \text{ is a cyclic permutation of }X, \text{and }S\in\mathcal{F}\cap\mathcal{G}_\pi\} }[/math].

We count [math]\displaystyle{ \mathcal{R} }[/math] in two ways.

First, due to the lemma, [math]\displaystyle{ |\mathcal{F}\cap\mathcal{G}_\pi|\le k }[/math] for any cyclic permutation [math]\displaystyle{ \pi }[/math]. There are [math]\displaystyle{ (n-1)! }[/math] cyclic permutations in total. Thus,

[math]\displaystyle{ |\mathcal{R}|=\sum_{\text{cyclic }\pi}|\mathcal{F}\cap\mathcal{G}_\pi|\le k(n-1)! }[/math].

Next, for each [math]\displaystyle{ S\in\mathcal{F} }[/math], the number of cyclic permutations [math]\displaystyle{ \pi }[/math] in which [math]\displaystyle{ S }[/math] is continuous is [math]\displaystyle{ |S|!(n-|S|)!=k!(n-k)! }[/math]. Thus,

[math]\displaystyle{ |\mathcal{R}|=\sum_{S\in\mathcal{F}}k!(n-k)!=|\mathcal{F}|k!(n-k)! }[/math].

Altogether, we have

[math]\displaystyle{ |\mathcal{F}|\le\frac{k(n-1)!}{k!(n-k)!}=\frac{(n-1)!}{(k-1)!(n-k)!}={n-1\choose k-1} }[/math].
[math]\displaystyle{ \square }[/math]

Erdős' shifting technique

We now introduce the original proof of the Erdős–Ko–Rado theorem, which uses a technique called shifting (originally called compression).

Without loss of generality, we assume [math]\displaystyle{ X=[n] }[/math], and restate the Erdős–Ko–Rado theorem as follows.

Erdős–Ko–Rado theorem
Let [math]\displaystyle{ \mathcal{F}\subseteq {[n]\choose k} }[/math] and [math]\displaystyle{ n\ge 2k }[/math]. If [math]\displaystyle{ \mathcal{F} }[/math] is intersecting, then [math]\displaystyle{ |\mathcal{F}|\le{n-1\choose k-1} }[/math].

We define a shift operator for the set family.

Definition (shift operator)
Assume [math]\displaystyle{ \mathcal{F}\subseteq 2^{[n]} }[/math], and [math]\displaystyle{ 0\le i\lt j\le n-1 }[/math]. Define the [math]\displaystyle{ (i,j) }[/math]-shift [math]\displaystyle{ S_{ij} }[/math] as an operator on [math]\displaystyle{ \mathcal{F} }[/math] as follows:
  • for each [math]\displaystyle{ T\in\mathcal{F} }[/math], write [math]\displaystyle{ T_{ij}=(T\setminus\{j\})\cup\{i\} }[/math], and let
[math]\displaystyle{ S_{ij}(T)= \begin{cases} T_{ij} & \mbox{if }j\in T, i\not\in T, \mbox{ and }T_{ij} \not\in\mathcal{F},\\ T & \mbox{otherwise;} \end{cases} }[/math]
  • let [math]\displaystyle{ S_{ij}(\mathcal{F})=\{S_{ij}(T)\mid T\in \mathcal{F}\} }[/math].

It is easy to verify the following propositions of shifts.

Proposition
  1. [math]\displaystyle{ |S_{ij}(T)|=|T|\, }[/math] and [math]\displaystyle{ |S_{ij}(\mathcal{F})|=\mathcal{F} }[/math];
  2. if [math]\displaystyle{ \mathcal{F} }[/math] is intersecting, then so is [math]\displaystyle{ S_{ij}(\mathcal{F}) }[/math].
Proof.

(1) is quite obvious. Now we prove (2).

Consider any [math]\displaystyle{ A,B\in\mathcal{F} }[/math]. If [math]\displaystyle{ A\cap B }[/math] includes any element other than [math]\displaystyle{ j }[/math] then [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] are still intersecting after [math]\displaystyle{ (i,j) }[/math]-shift. Thus without loss of generality, we may consider the only unsafe case where [math]\displaystyle{ A\cap B=\{j\} }[/math], and [math]\displaystyle{ B }[/math] is successfully shifted to [math]\displaystyle{ B_{ij}=(B\setminus\{j\})\cup\{i\} }[/math] but [math]\displaystyle{ A }[/math] fails to shift to [math]\displaystyle{ A_{ij}=(A\setminus\{j\})\cup\{i\} }[/math].

Since [math]\displaystyle{ B }[/math] is successfully shifted to [math]\displaystyle{ B_{ij} }[/math], we know that it must hold that [math]\displaystyle{ i\not\in B }[/math] and [math]\displaystyle{ B_{ij}\not\in\mathcal{F} }[/math]. And the only two reasons for which [math]\displaystyle{ A }[/math] may fail to shift to [math]\displaystyle{ A_{ij} }[/math] are: (1) [math]\displaystyle{ i\in A }[/math] and (2) [math]\displaystyle{ i\not\in A }[/math] but [math]\displaystyle{ A_{ij}\in\mathcal{F} }[/math].

Case.1: [math]\displaystyle{ i\in A }[/math]. In this case, since [math]\displaystyle{ i\in B_{ij} }[/math], we have [math]\displaystyle{ S_{ij}(A)\cap S_{ij}(B)=A\cap B_{ij}=\{i\} }[/math], i.e. [math]\displaystyle{ B }[/math] is still intersecting with [math]\displaystyle{ A }[/math] after shifting.

Case.2: [math]\displaystyle{ i\not\in A }[/math] but [math]\displaystyle{ A_{ij}\in\mathcal{F} }[/math]. In this case, it is easy to verify that [math]\displaystyle{ A_{ij}\cap B=(A\cap B)\setminus\{j\}=\emptyset }[/math]. Recall that we assume [math]\displaystyle{ A_{ij}\in\mathcal{F} }[/math]. This contradicts to that [math]\displaystyle{ \mathcal{F} }[/math] is intersecting.

In conclusion, in all cases, [math]\displaystyle{ S_{ij}(\mathcal{F}) }[/math] remains intersecting.

[math]\displaystyle{ \square }[/math]

Repeatedly applying [math]\displaystyle{ S_{ij}(\mathcal{F}) }[/math] for any [math]\displaystyle{ 0\le i\lt j\le n-1 }[/math], since we only replace elements by smaller elements, eventually [math]\displaystyle{ \mathcal{F} }[/math] will stop changing, that is, [math]\displaystyle{ S_{ij}(\mathcal{F})=\mathcal{F} }[/math] for all [math]\displaystyle{ 0\le i\lt j\le n-1 }[/math]. We call such an [math]\displaystyle{ \mathcal{F} }[/math] shifted.

The idea behind the shifting technique is very natural: by applying shifting, all intersecting families are transformed to some special forms, and we only need to prove the theorem for these special form of intersecting families.

Proof of Erdős-Ko-Rado theorem
(The original proof of Erdős-Ko-Rado by shifting)

By the above lemma, it is sufficient to prove the Erdős-Ko-Rado theorem holds for shifted [math]\displaystyle{ \mathcal{F} }[/math]. We assume that [math]\displaystyle{ \mathcal{F} }[/math] is shifted.

First, it is trivial to see that the theorem holds for [math]\displaystyle{ k=1 }[/math] (no matter whether shifted).

Next, we show that the theorem holds when [math]\displaystyle{ n=2k }[/math] (no matter whether shifted). For any [math]\displaystyle{ S\in{X\choose k} }[/math], both [math]\displaystyle{ S }[/math] and [math]\displaystyle{ X\setminus S }[/math] are in [math]\displaystyle{ {X\choose k} }[/math], but at most one of them can be in [math]\displaystyle{ \mathcal{F} }[/math]. Thus,

[math]\displaystyle{ |\mathcal{F}|\le\frac{1}{2}{n\choose k}=\frac{n!}{2k!(n-k)!}=\frac{(n-1)!}{(k-1)!(n-k)!}={n-1\choose k-1} }[/math].

We then apply the induction on [math]\displaystyle{ n }[/math]. For [math]\displaystyle{ n\gt 2k }[/math], the induction hypothesis is stated as:

  • the Erdős-Ko-Rado theorem holds for any smaller [math]\displaystyle{ n }[/math].

Define

[math]\displaystyle{ \mathcal{F}_0=\{S\in\mathcal{F}\mid n\not\in S\} }[/math] and [math]\displaystyle{ \mathcal{F}_1=\{S\in\mathcal{F}\mid n\in S\} }[/math].

Clearly, [math]\displaystyle{ \mathcal{F}_0\subseteq{[n-1]\choose k} }[/math] and [math]\displaystyle{ \mathcal{F}_0 }[/math] is intersecting. Due to the induction hypothesis, [math]\displaystyle{ |\mathcal{F}_0|\le{n-2\choose k-1} }[/math].

In order to apply the induction, we let

[math]\displaystyle{ \mathcal{F}_1'=\{S\setminus\{n\}\mid S\in\mathcal{F}_1\} }[/math].

Clearly, [math]\displaystyle{ \mathcal{F}_1'\subseteq{[n-1]\choose k-1} }[/math]. If only it is also intersecting, we can apply the induction hypothesis, and indeed it is. To see this, by contradiction we assume that [math]\displaystyle{ \mathcal{F}_1' }[/math] is not intersecting. Then there must exist [math]\displaystyle{ A,B\in\mathcal{F} }[/math] such that [math]\displaystyle{ A\cap B=\{n\} }[/math], which means that [math]\displaystyle{ |A\cup B|\le 2k-1\lt n-1 }[/math]. Thus, there is some [math]\displaystyle{ 0\le i\le n-1 }[/math] such that [math]\displaystyle{ i\not\in A\cup B }[/math]. Since [math]\displaystyle{ \mathcal{F} }[/math] is shifted, [math]\displaystyle{ A_{in}=A\setminus\{n\}\cup\{i\}\in\mathcal{F} }[/math]. On the other hand it can be verified that [math]\displaystyle{ A_{in}\cap B=\emptyset }[/math], which contradicts that [math]\displaystyle{ \mathcal{F} }[/math] is intersecting.

Thus, [math]\displaystyle{ \mathcal{F}_1'\subseteq{[n-1]\choose k-1} }[/math] and [math]\displaystyle{ \mathcal{F}_1' }[/math] is intersecting. Due to the induction hypothesis, [math]\displaystyle{ |\mathcal{F}_1'|\le{n-2\choose k-2} }[/math].

Combining these together,

[math]\displaystyle{ |\mathcal{F}|=|\mathcal{F}_0|+|\mathcal{F}_1|=|\mathcal{F}_0|+|\mathcal{F}_1'|\le {n-2\choose k-1}+{n-2\choose k-2}={n-1\choose k-1} }[/math].
[math]\displaystyle{ \square }[/math]

Sauer's lemma and VC-dimension

Shattering and the VC-dimension

Definition (shatter)
Let [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] be set family and let [math]\displaystyle{ R\subseteq X }[/math] be a subset. The trace of [math]\displaystyle{ \mathcal{F} }[/math] on [math]\displaystyle{ R }[/math], denoted [math]\displaystyle{ \mathcal{F}|_R }[/math] is defined as
[math]\displaystyle{ \mathcal{F}|_R=\{S\cap R\mid S\in\mathcal{F}\} }[/math].
We say that [math]\displaystyle{ \mathcal{F} }[/math] shatters [math]\displaystyle{ R }[/math] if [math]\displaystyle{ \mathcal{F}|_R=2^R }[/math], i.e. for all [math]\displaystyle{ T\subseteq R }[/math], there exists an [math]\displaystyle{ S\in\mathcal{F} }[/math] such that [math]\displaystyle{ T=S\cap R }[/math].

The VC dimension is defined by the power of a family to shatter a set.

Definition (VC-dimension)
The Vapnik–Chervonenkis dimension (VC-dimension) of a set family [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math], denoted [math]\displaystyle{ \text{VC-dim}(\mathcal{F}) }[/math], is the size of the largest [math]\displaystyle{ R\subseteq X }[/math] shattered by [math]\displaystyle{ \mathcal{F} }[/math].

It is a core concept in computational learning theory.

Each subset [math]\displaystyle{ S\subseteq X }[/math] can be equivalently represented by its characteristic function [math]\displaystyle{ f_S:X\rightarrow\{0,1\} }[/math], such that for each [math]\displaystyle{ x\in X }[/math],

[math]\displaystyle{ f_S(x)=\begin{cases} 1 & x\in S\\ 0 & x\not\in S. \end{cases} }[/math]

Then a set family [math]\displaystyle{ \mathcal{F}\subseteq2^X }[/math] corresponds to a collection of boolean functions [math]\displaystyle{ \{f_S\mid S\in\mathcal{F}\} }[/math], which is a subset of all Boolean functions in the form [math]\displaystyle{ f:X\rightarrow\{0,1\} }[/math]. We wonder on how large a subdomain [math]\displaystyle{ Y\subseteq X }[/math], [math]\displaystyle{ \mathcal{F} }[/math] includes all the [math]\displaystyle{ 2^{|Y|} }[/math] mappings [math]\displaystyle{ Y\rightarrow\{0,1\} }[/math]. The largest size of such subdomain is the VC-dimension. It measures how complicated a collection of boolean functions (or equivalently a set family) is.

Sauer's Lemma

The definition of the VC-dimension involves enumerating all subsets, thus is difficult to analyze in general. The following famous result state a very simple sufficient condition to lower bound the VC-dimension, regarding only the size of the family. The lemma is due to Sauer, and independently due to Shelah and Perles. A slightly weaker version is found by Vapnik and Chervonenkis, who use the framework to develop a theory of classifications.

Sauer's Lemma (Sauer; Shelah-Perles; Vapnik-Chervonenkis)
Let [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] where [math]\displaystyle{ |X|=n }[/math]. If [math]\displaystyle{ |\mathcal{F}|\gt \sum_{1\le i\lt k}{n\choose i} }[/math], then there exists an [math]\displaystyle{ R\in{X\choose k} }[/math] such that [math]\displaystyle{ \mathcal{F} }[/math] shatters [math]\displaystyle{ R }[/math].

In other words, for any set family [math]\displaystyle{ \mathcal{F} }[/math] with [math]\displaystyle{ |\mathcal{F}|\gt \sum_{1\le i\lt k}{n\choose i} }[/math], its VC-dimension [math]\displaystyle{ \text{VC-dim}(\mathcal{F})\ge k }[/math].

Hereditary family

We note the Sauer's lemma is especially easy to prove for a special type of set families, called the hereditary families.

Definition (hereditary family)
A set system [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math] is said to be hereditary (also called an ideal or an abstract simplicial complex), if
[math]\displaystyle{ S\subseteq T\in\mathcal{F} }[/math] implies [math]\displaystyle{ S\in\mathcal{F} }[/math].

In other words, for a hereditary family [math]\displaystyle{ \mathcal{F} }[/math], if [math]\displaystyle{ R\in\mathcal{F} }[/math], then all subsets of [math]\displaystyle{ R }[/math] are also in [math]\displaystyle{ \mathcal{F} }[/math]. An immediate consequence is the following proposition.

Proposition
Let [math]\displaystyle{ \mathcal{F} }[/math] be a hereditary family. If [math]\displaystyle{ R\in\mathcal{F} }[/math] then [math]\displaystyle{ \mathcal{F} }[/math] shatters [math]\displaystyle{ R }[/math].

Therefore, it is very easy to prove the Sauer's lemma for hereditary families:

Lemma
For [math]\displaystyle{ \mathcal{F}\subseteq 2^X }[/math], [math]\displaystyle{ |X|=n }[/math], if [math]\displaystyle{ \mathcal{F} }[/math] is hereditary and [math]\displaystyle{ |\mathcal{F}|\gt \sum_{1\le i\lt k}{n\choose i} }[/math] then there exists an [math]\displaystyle{ R\in{X\choose k} }[/math] such that [math]\displaystyle{ \mathcal{F} }[/math] shatters [math]\displaystyle{ R }[/math].
Proof.

Since [math]\displaystyle{ \mathcal{F} }[/math] is hereditary, we only need to show that there exists an [math]\displaystyle{ R\in\mathcal{F} }[/math] of size [math]\displaystyle{ |R|\ge k }[/math], which must be true, because if all members of [math]\displaystyle{ \mathcal{F} }[/math] are of sizes [math]\displaystyle{ \lt k }[/math], then [math]\displaystyle{ |\mathcal{F}|\le\left|\bigcup_{1\le i\lt k}{X\choose k}\right|=\sum_{1\le i\lt k}{n\choose i} }[/math], a contradiction.

[math]\displaystyle{ \square }[/math]

To prove the Sauer's lemma for general non-hereditary families, we can use some way to reduce arbitrary families to hereditary families. Here we apply the shifting technique to achieve this.

Down-shifts

Note that we work on [math]\displaystyle{ \mathcal{F}\subseteq2^X }[/math], instead of [math]\displaystyle{ \mathcal{F}\subseteq{X\choose k} }[/math] like in the Erdős–Ko–Rado theorem, so we do not need to preserve the size of member sets. Instead, we need to reduce an arbitrary family to a hereditary one, thus we use a shift operator which replaces a member set by a subset of it.

Definition (down-shifts)
Assume [math]\displaystyle{ \mathcal{F}\subseteq 2^{[n]} }[/math], and [math]\displaystyle{ i\in[n] }[/math]. Define the down-shift operator [math]\displaystyle{ S_{i} }[/math] as follows:
  • for each [math]\displaystyle{ T\in\mathcal{F} }[/math], let
[math]\displaystyle{ S_{i}(T)= \begin{cases} T\setminus\{i\} & \mbox{if }i\in T \mbox{ and }T\setminus\{i\} \not\in\mathcal{F},\\ T & \mbox{otherwise;} \end{cases} }[/math]
  • let [math]\displaystyle{ S_{i}(\mathcal{F})=\{S_{i}(T)\mid T\in \mathcal{F}\} }[/math].

Repeatedly applying [math]\displaystyle{ S_i }[/math] to [math]\displaystyle{ \mathcal{F} }[/math] for all [math]\displaystyle{ i\in[n] }[/math], due to the finiteness, eventually [math]\displaystyle{ \mathcal{F} }[/math] is not changed by any [math]\displaystyle{ S_i }[/math]. We call such a family down-shifted. A family [math]\displaystyle{ \mathcal{F} }[/math] is down-shifted if and only if [math]\displaystyle{ S_i(\mathcal{F})=\mathcal{F} }[/math] for all [math]\displaystyle{ i\in[n] }[/math]. It is then easy to see that a down-shifted [math]\displaystyle{ \mathcal{F} }[/math] must be hereditary.

Theorem
If [math]\displaystyle{ \mathcal{F}\subseteq2^X }[/math] is down-shifted, then [math]\displaystyle{ \mathcal{F} }[/math] is hereditary.

In order to use down-shift to prove the Sauer's lemma, we need to make sure that down-shift does not decrease [math]\displaystyle{ |\mathcal{F}| }[/math] and does not increase the VC-dimension [math]\displaystyle{ \text{VC-dim}(\mathcal{F}) }[/math].

Proposition
  1. [math]\displaystyle{ |S_{i}(\mathcal{F})|=\mathcal{F} }[/math];
  2. [math]\displaystyle{ |S_i(\mathcal{F})|_R|\le |\mathcal{F}|_R| }[/math], thus if [math]\displaystyle{ S_{i}(\mathcal{F}) }[/math] shatters an [math]\displaystyle{ R }[/math], so does [math]\displaystyle{ \mathcal{F} }[/math].
Proof.

(1) is immediate. To prove (2), it is sufficient to prove that [math]\displaystyle{ S_i(\mathcal{F})|_R\subseteq S_i(\mathcal{F}|_R) }[/math] and then apply (1).

File:Down-shift-property.png

[math]\displaystyle{ \square }[/math]
Proof of Sauer's lemma

Now we can prove the Sauer's lemma for arbitrary [math]\displaystyle{ \mathcal{F}\subseteq 2^{[n]} }[/math].

For any [math]\displaystyle{ \mathcal{F}\subseteq 2^{[n]} }[/math], repeatedly apply [math]\displaystyle{ S_i }[/math] for all [math]\displaystyle{ i\in }[/math] till the family is down-shifted, which is denoted by [math]\displaystyle{ \mathcal{F}' }[/math]. We have proved that [math]\displaystyle{ |\mathcal{F}'|=|\mathcal{F}|\gt \sum_{1\le i\lt k}{n\choose i} }[/math] and [math]\displaystyle{ \mathcal{F}' }[/math] is hereditary, thus as argued before, here exists an [math]\displaystyle{ R }[/math] of size [math]\displaystyle{ k }[/math] shattered by [math]\displaystyle{ \mathcal{F}' }[/math]. By the above proposition, [math]\displaystyle{ |\mathcal{F}'|_R|\le |\mathcal{F}|_R| }[/math], thus [math]\displaystyle{ \mathcal{F} }[/math] also shatters [math]\displaystyle{ R }[/math]. The lemma is proved.

[math]\displaystyle{ \square }[/math]