随机算法 (Fall 2011)/Problem set 1

From TCS Wiki
Revision as of 08:15, 18 September 2011 by imported>Etone (→‎Problem 1)
Jump to navigation Jump to search

Problem 0

你的姓名、学号。

Problem 1

(Interviewing problem of Google Inc.)

Give a streaming algorithm maintaining a uniform sample from a data stream. The meaning of this sentence is explained as follows:

Suppose that the input is a sequence of items [math]\displaystyle{ A[1], A[2], A[3], \ldots, A[n] }[/math], which is passed to your algorithm by one item at a time in the sequential order. (Equivalently, you can imagine that your algorithm scans over a large array [math]\displaystyle{ A }[/math] in one direction from left to right.)

You algorithm should return an [math]\displaystyle{ A[r] }[/math], where [math]\displaystyle{ r }[/math] is uniformly distributed over [math]\displaystyle{ \{1,2,\ldots, n\} }[/math].

Usually the input "data stream" is from a massive data set (e.g. search engine data), so you cannot afford storing the entire input. Make your algorithm use as small space as possible. We hope for a small space storing only constant number of items.

  • Develop an algorithm for the above problem. Give rigorous analysis for the algorithm to justify its correctness and efficiency.
  • Develop an algorithm which works even if [math]\displaystyle{ n }[/math] is not known in advance, and also give your analysis for the algorithm. (If your algorithm already satisfies this requirement, it's OK to have one algorithm answer both questions.)

Problem 2

Problem 3