# 随机算法 (Fall 2011)/Coupon Collector

Suppose that a chocolate company releases ${\displaystyle n}$ different types of coupons. Each box of chocolates contains one coupon with a uniformly random type. Once you have collected all ${\displaystyle n}$ types of coupons, you will get a prize. So how many boxes of chocolates you are expected to buy to win the prize?

The coupon collector problem can be described in the balls-into-bins model as follows. We keep throwing balls one-by-one into ${\displaystyle n}$ bins (coupons), such that each ball is thrown into a bin uniformly and independently at random. Each ball corresponds to a box of chocolate, and each bin corresponds to a type of coupon. Thus, the number of boxes bought to collect ${\displaystyle n}$ coupons is just the number of balls thrown until none of the ${\displaystyle n}$ bins is empty.

 Theorem Let ${\displaystyle X}$ be the number of balls thrown uniformly and independently to ${\displaystyle n}$ bins until no bin is empty. Then ${\displaystyle \mathbf {E} [X]=nH(n)}$, where ${\displaystyle H(n)}$ is the ${\displaystyle n}$th harmonic number.
Proof.
 Let ${\displaystyle X_{i}}$ be the number of balls thrown while there are exactly ${\displaystyle i-1}$ nonempty bins, then clearly ${\displaystyle X=\sum _{i=1}^{n}X_{i}}$. When there are exactly ${\displaystyle i-1}$ nonempty bins, throwing a ball, the probability that the number of nonempty bins increases (i.e. the ball is thrown to an empty bin) is ${\displaystyle p_{i}=1-{\frac {i-1}{n}}.}$ ${\displaystyle X_{i}}$ is the number of balls thrown to make the number of nonempty bins increases from ${\displaystyle i-1}$ to ${\displaystyle i}$, i.e. the number of balls thrown until a ball is thrown to a current empty bin. Thus, ${\displaystyle X_{i}}$ follows the geometric distribution, such that ${\displaystyle \Pr[X_{i}=k]=(1-p_{i})^{k-1}p_{i}}$ For a geometric random variable, ${\displaystyle \mathbf {E} [X_{i}]={\frac {1}{p_{i}}}={\frac {n}{n-i+1}}}$. Applying the linearity of expectations, {\displaystyle {\begin{aligned}\mathbf {E} [X]&=\mathbf {E} \left[\sum _{i=1}^{n}X_{i}\right]\\&=\sum _{i=1}^{n}\mathbf {E} \left[X_{i}\right]\\&=\sum _{i=1}^{n}{\frac {n}{n-i+1}}\\&=n\sum _{i=1}^{n}{\frac {1}{i}}\\&=nH(n),\end{aligned}}} where ${\displaystyle H(n)}$ is the ${\displaystyle n}$th Harmonic number, and ${\displaystyle H(n)=\ln n+O(1)}$. Thus, for the coupon collectors problem, the expected number of coupons required to obtain all ${\displaystyle n}$ types of coupons is ${\displaystyle n\ln n+O(n)}$.
${\displaystyle \square }$

Only knowing the expectation is not good enough. We would like to know how fast the probability decrease as a random variable deviates from its mean value.

 Theorem Let ${\displaystyle X}$ be the number of balls thrown uniformly and independently to ${\displaystyle n}$ bins until no bin is empty. Then ${\displaystyle \Pr[X\geq n\ln n+cn] for any ${\displaystyle c>0}$.
Proof.
 For any particular bin ${\displaystyle i}$, the probability that bin ${\displaystyle i}$ is empty after throwing ${\displaystyle n\ln n+cn}$ balls is ${\displaystyle \left(1-{\frac {1}{n}}\right)^{n\ln n+cn} By the union bound, the probability that there exists an empty bin after throwing ${\displaystyle n\ln n+cn}$ balls is ${\displaystyle \Pr[X\geq n\ln n+cn]
${\displaystyle \square }$