随机算法 (Spring 2014)/Problem Set 2: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
No edit summary
imported>Etone
No edit summary
(3 intermediate revisions by the same user not shown)
Line 6: Line 6:


== Problem 2 ==
== Problem 2 ==
(Lower bound for balls and bins.)
Suppose we throw <math>n</math> balls uniformly and independently into <math>n</math> bins. Show that with high probability the maximum load is <math>\Omega\left(\frac{\ln n}{\ln\ln n}\right)</math>. (Hint: Use the second moment method.)
==  Problem 3==
# Generalize the LazySelect algorithm for the <math>k</math>-selection problem: Given as input an array of <math>n</math> distinct numbers and an integer <math>k</math>, find the <math>k</math>th smallest number in the array.
# Use the Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect Algorithm and try to use as few random samples as possible.
==Problem 4==
A '''boolean code''' is a mapping <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math>. Each <math>x\in\{0,1\}^k</math> is called a '''message''' and <math>y=C(x)</math> is called a '''codeword'''. The '''code rate''' <math>r</math> of a code <math>C</math> is <math>r=\frac{k}{n}</math>. A boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> is a '''linear code''' if it is a linear transformation, i.e. there is a matrix <math>A\in\{0,1\}^{n\times k}</math> such that <math>C(x)=Ax</math> for any <math>x\in\{0,1\}^k</math>, where the additions and multiplications are defined over the finite field of order two, <math>(\{0,1\},+_{\bmod 2},\times_{\bmod 2})</math>.
The '''distance''' between two codeword <math>y_1</math> and <math>y_2</math>, denoted by <math>d(y_1,y_2)</math>, is defined as the Hamming distance between them. Formally, <math>d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|</math>. The distance of a code <math>C</math> is the minimum distance between any two codewords. Formally, <math>d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))</math>.
Usually we want to make both the code rate <math>r</math> and the code distance <math>d</math> as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.
* Use the probabilistic method to prove that there exists a boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> of code rate <math>r</math> and distance <math>\left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n</math>. Try to optimize the constant in <math>\Theta(\cdot)</math>.
* Prove a similar result for linear boolean codes.
== Problem 5 ==


Let <math>X_1,X_2,\dots,X_n</math> be independent geometrically distributed random variables each having expectation 2 (each of the <math>X_i</math> is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let <math>X=\sum_{i=1}^nX_i</math> and <math>\delta</math> be a positive real constant. Derive the best upper bound you can on <math>\Pr[X>(1+\delta)(2n)]</math>.
Let <math>X_1,X_2,\dots,X_n</math> be independent geometrically distributed random variables each having expectation 2 (each of the <math>X_i</math> is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let <math>X=\sum_{i=1}^nX_i</math> and <math>\delta</math> be a positive real constant. Derive the best upper bound you can on <math>\Pr[X>(1+\delta)(2n)]</math>.

Revision as of 08:51, 24 March 2014

Problem 1

(Due to D.R. Karger and R. Motwani.)

  1. Let [math]\displaystyle{ S,T }[/math] be two disjoint subsets of a universe [math]\displaystyle{ U }[/math] such that [math]\displaystyle{ |S|=|T|=n }[/math]. Suppose we select a random set [math]\displaystyle{ R\subseteq U }[/math] by independently sampling each element of [math]\displaystyle{ U }[/math] with probability [math]\displaystyle{ p }[/math]. We say that the random sample [math]\displaystyle{ R }[/math] is good if the following two conditions hold: [math]\displaystyle{ R\cap S=\emptyset }[/math] and [math]\displaystyle{ R\cap T\ne\emptyset }[/math]. Show that for [math]\displaystyle{ p=1/n }[/math], the probability that [math]\displaystyle{ R }[/math] is good is larger than some positive constant.
  2. Suppose now that the random set [math]\displaystyle{ R }[/math] is chosen by sampling the elements of [math]\displaystyle{ U }[/math] with only pairwise independence. Show that for a suitable choice of the value of [math]\displaystyle{ p }[/math], the probability that [math]\displaystyle{ R }[/math] is good is larger than some positive constant.

Problem 2

(Lower bound for balls and bins.)

Suppose we throw [math]\displaystyle{ n }[/math] balls uniformly and independently into [math]\displaystyle{ n }[/math] bins. Show that with high probability the maximum load is [math]\displaystyle{ \Omega\left(\frac{\ln n}{\ln\ln n}\right) }[/math]. (Hint: Use the second moment method.)

Problem 3

  1. Generalize the LazySelect algorithm for the [math]\displaystyle{ k }[/math]-selection problem: Given as input an array of [math]\displaystyle{ n }[/math] distinct numbers and an integer [math]\displaystyle{ k }[/math], find the [math]\displaystyle{ k }[/math]th smallest number in the array.
  2. Use the Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect Algorithm and try to use as few random samples as possible.

Problem 4

A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].

The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].

Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

  • Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
  • Prove a similar result for linear boolean codes.

Problem 5

Let [math]\displaystyle{ X_1,X_2,\dots,X_n }[/math] be independent geometrically distributed random variables each having expectation 2 (each of the [math]\displaystyle{ X_i }[/math] is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] and [math]\displaystyle{ \delta }[/math] be a positive real constant. Derive the best upper bound you can on [math]\displaystyle{ \Pr[X\gt (1+\delta)(2n)] }[/math].