随机算法 (Fall 2011)/Problem set 1: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
imported>Etone
 
(13 intermediate revisions by the same user not shown)
Line 5: Line 5:
(Interviewing problem of Google Inc.)
(Interviewing problem of Google Inc.)


Give an [http://en.wikipedia.org/wiki/Streaming_algorithm streaming algorithm] maintaining a uniform sample from a data stream. The meaning of this sentence is explained as follows:
Give a [http://en.wikipedia.org/wiki/Streaming_algorithm streaming algorithm] to maintain a uniform sample from a data stream. The meaning of this sentence is explained as follows:


Suppose that the input is a sequence of items <math>A[1], A[2], A[3], \ldots, A[n]</math>, which is passed to your algorithm by one item at a time in the sequential order. (Equivalently, you can imagine that your algorithm ''scans'' over a large array <math>A</math> in one direction from left to right.)
Suppose that the input is a sequence of items <math>A[1], A[2], A[3], \ldots, A[n]</math>, which is passed to your algorithm by one item at a time in the sequential order. (Equivalently, you can imagine that your algorithm ''scans'' over a large array <math>A</math> in one direction from left to right.)


You algorithm should return an <math>A[r]</math>, where <math>r</math> is uniformly distributed over <math>{1,2,\ldots, n}</math>.
You algorithm should return an <math>A[r]</math>, where <math>r</math> is uniformly distributed over <math>\{1,2,\ldots, n\}</math>.


Usually the input "data stream" is from a massive data set (e.g. search engine data), so you cannot afford storing the entire input. Make your algorithm use as small space as possible. We hope for a small space storing only constant number of items.  
Usually the input "data stream" is from a massive data set (e.g. search engine data), so you cannot afford storing the entire input. Make your algorithm use as small space as possible. We hope for a small space storing only constant number of items.  


* Develop an algorithm for the above problem. Give rigorous analysis for the algorithm to justify its correctness and efficiency.
* Develop an algorithm for the above problem. Give rigorous analysis for the algorithm to justify its correctness and efficiency.
* Develop an algorithm which works even if <math>n</math> is not known in advance, and also give your analysis for the algorithm. (If your algorithm already satisfies this requirement, it's OK to use one algorithm answering both questions.)
* Develop an algorithm which works even if <math>n</math> is not known in advance, and also give your analysis for the algorithm. (If your algorithm already satisfies this requirement, it's OK to have one algorithm answer both questions.)


== Problem 2 ==
== Problem 2 ==
(Interviewing problem of Theory Group in Microsoft Research Asia for student interns.)
We start with <math>n</math> people, each with 2 hands. None of these hands hold each other.
At each round, we uniformly pick 2 free hands and let them hold together.
* After how many rounds, there are no free hands left?
* What is the expected number of cycles made by people holding hands with each other (one person with left hand holding right hand is also counted as a cycle), when there are no free hands left?


== Problem 3 ==
== Problem 3 ==
We use the Bloom filters to store playlists of songs, and find people with same taste in music.
Suppose Li Lei has a set <math>X</math> and Han Meimei has a set <math>Y</math> (their playlists), both with <math>n</math> elements. Denote <math>\ell=|X\cap Y|</math>.
We create Bloom filters for both <math>X</math> and <math>Y</math>, using the same number of bits <math>m</math> and the same <math>k</math> hash functions (with the uniform hash assumption). Let <math>t</math> be the expected number of bits where the two Bloom filters differ.
* Represent <math>t</math> as a function of <math>m,n,k,\ell</math>.
* How can this be used to find people with same taste in music. Is it more efficient than directly comparing playlists? Why?

Latest revision as of 01:40, 19 September 2011

Problem 0

你的姓名、学号。

Problem 1

(Interviewing problem of Google Inc.)

Give a streaming algorithm to maintain a uniform sample from a data stream. The meaning of this sentence is explained as follows:

Suppose that the input is a sequence of items [math]\displaystyle{ A[1], A[2], A[3], \ldots, A[n] }[/math], which is passed to your algorithm by one item at a time in the sequential order. (Equivalently, you can imagine that your algorithm scans over a large array [math]\displaystyle{ A }[/math] in one direction from left to right.)

You algorithm should return an [math]\displaystyle{ A[r] }[/math], where [math]\displaystyle{ r }[/math] is uniformly distributed over [math]\displaystyle{ \{1,2,\ldots, n\} }[/math].

Usually the input "data stream" is from a massive data set (e.g. search engine data), so you cannot afford storing the entire input. Make your algorithm use as small space as possible. We hope for a small space storing only constant number of items.

  • Develop an algorithm for the above problem. Give rigorous analysis for the algorithm to justify its correctness and efficiency.
  • Develop an algorithm which works even if [math]\displaystyle{ n }[/math] is not known in advance, and also give your analysis for the algorithm. (If your algorithm already satisfies this requirement, it's OK to have one algorithm answer both questions.)

Problem 2

(Interviewing problem of Theory Group in Microsoft Research Asia for student interns.)

We start with [math]\displaystyle{ n }[/math] people, each with 2 hands. None of these hands hold each other.

At each round, we uniformly pick 2 free hands and let them hold together.

  • After how many rounds, there are no free hands left?
  • What is the expected number of cycles made by people holding hands with each other (one person with left hand holding right hand is also counted as a cycle), when there are no free hands left?

Problem 3

We use the Bloom filters to store playlists of songs, and find people with same taste in music.

Suppose Li Lei has a set [math]\displaystyle{ X }[/math] and Han Meimei has a set [math]\displaystyle{ Y }[/math] (their playlists), both with [math]\displaystyle{ n }[/math] elements. Denote [math]\displaystyle{ \ell=|X\cap Y| }[/math].

We create Bloom filters for both [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math], using the same number of bits [math]\displaystyle{ m }[/math] and the same [math]\displaystyle{ k }[/math] hash functions (with the uniform hash assumption). Let [math]\displaystyle{ t }[/math] be the expected number of bits where the two Bloom filters differ.

  • Represent [math]\displaystyle{ t }[/math] as a function of [math]\displaystyle{ m,n,k,\ell }[/math].
  • How can this be used to find people with same taste in music. Is it more efficient than directly comparing playlists? Why?