随机算法 (Fall 2011)/Coupon Collector
Suppose that a chocolate company releases different types of coupons. Each box of chocolates contains one coupon with a uniformly random type. Once you have collected all types of coupons, you will get a prize. So how many boxes of chocolates you are expected to buy to win the prize?
The coupon collector problem can be described in the balls-into-bins model as follows. We keep throwing balls one-by-one into bins (coupons), such that each ball is thrown into a bin uniformly and independently at random. Each ball corresponds to a box of chocolate, and each bin corresponds to a type of coupon. Thus, the number of boxes bought to collect coupons is just the number of balls thrown until none of the bins is empty.
- Let be the number of balls thrown uniformly and independently to bins until no bin is empty. Then , where is the th harmonic number.
Proof. Let be the number of balls thrown while there are exactly nonempty bins, then clearly .
When there are exactly nonempty bins, throwing a ball, the probability that the number of nonempty bins increases (i.e. the ball is thrown to an empty bin) is
is the number of balls thrown to make the number of nonempty bins increases from to , i.e. the number of balls thrown until a ball is thrown to a current empty bin. Thus, follows the geometric distribution, such that
For a geometric random variable, .
Applying the linearity of expectations,
where is the th Harmonic number, and . Thus, for the coupon collectors problem, the expected number of coupons required to obtain all types of coupons is .
Only knowing the expectation is not good enough. We would like to know how fast the probability decrease as a random variable deviates from its mean value.
- Let be the number of balls thrown uniformly and independently to bins until no bin is empty. Then for any .
Proof. For any particular bin , the probability that bin is empty after throwing balls is
By the union bound, the probability that there exists an empty bin after throwing balls is