随机算法 (Fall 2011)/Problem set 1: Difference between revisions
imported>Etone Created page with "== Problem 0== 你的姓名、学号。 == Problem 1 == == Problem 2 == == Problem 3 ==" |
imported>Etone |
||
Line 3: | Line 3: | ||
== Problem 1 == | == Problem 1 == | ||
(Interviewing problem of Google Inc.) | |||
Give an [http://en.wikipedia.org/wiki/Streaming_algorithm streaming algorithm] maintaining a uniform sample from a data stream. The meaning of this sentence is explained as follows: | |||
Suppose that the input is a sequence of items <math>A[1], A[2], A[3], \ldots, A[n]</math>, which is passed to your algorithm by one item at a time in the sequential order. (Equivalently, you can imagine that your algorithm ''scans'' over a large array <math>A</math> in one direction from left to right.) | |||
You algorithm should return an <math>A[r]</math>, where <math>r</math> is uniformly distributed over <math>{1,2,\ldots, n}</math>. | |||
Usually the input "data stream" is from a massive data set (e.g. search engine data), so you cannot afford storing the entire input. Make your algorithm use as small space as possible. We hope for a small space storing only constant number of items. | |||
* Develop an algorithm for the above problem. Give rigorous analysis for the algorithm to justify its correctness and efficiency. | |||
* Develop an algorithm which works even if <math>n</math> is not known in advance, and also give your analysis for the algorithm. (If your algorithm already satisfies this requirement, it's OK to use one algorithm answering both questions.) | |||
== Problem 2 == | == Problem 2 == | ||
== Problem 3 == | == Problem 3 == |
Revision as of 08:09, 18 September 2011
Problem 0
你的姓名、学号。
Problem 1
(Interviewing problem of Google Inc.)
Give an streaming algorithm maintaining a uniform sample from a data stream. The meaning of this sentence is explained as follows:
Suppose that the input is a sequence of items [math]\displaystyle{ A[1], A[2], A[3], \ldots, A[n] }[/math], which is passed to your algorithm by one item at a time in the sequential order. (Equivalently, you can imagine that your algorithm scans over a large array [math]\displaystyle{ A }[/math] in one direction from left to right.)
You algorithm should return an [math]\displaystyle{ A[r] }[/math], where [math]\displaystyle{ r }[/math] is uniformly distributed over [math]\displaystyle{ {1,2,\ldots, n} }[/math].
Usually the input "data stream" is from a massive data set (e.g. search engine data), so you cannot afford storing the entire input. Make your algorithm use as small space as possible. We hope for a small space storing only constant number of items.
- Develop an algorithm for the above problem. Give rigorous analysis for the algorithm to justify its correctness and efficiency.
- Develop an algorithm which works even if [math]\displaystyle{ n }[/math] is not known in advance, and also give your analysis for the algorithm. (If your algorithm already satisfies this requirement, it's OK to use one algorithm answering both questions.)