随机算法 (Fall 2015)/Problem Set 3: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
imported>Etone
Line 26: Line 26:
#'''Bernoulli random variables.''' Let <math>X\in\{0,1\}</math> be a single Bernoulli trial with probability of success <math>p</math>, that is, <math>\Pr[X=1]=1-\Pr[X=0]=p</math>. Show that for any <math>t\in(p,1)</math>, we have <math>\Psi_X^*(t)=D(Y \| X)</math> where <math>Y\in\{0,1\}</math> is a Bernoulli random variable with parameter <math>t</math> and <math>D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p}</math> is the ''Kullback-Leibler divergence'' between <math>Y</math> and <math>X</math>.
#'''Bernoulli random variables.''' Let <math>X\in\{0,1\}</math> be a single Bernoulli trial with probability of success <math>p</math>, that is, <math>\Pr[X=1]=1-\Pr[X=0]=p</math>. Show that for any <math>t\in(p,1)</math>, we have <math>\Psi_X^*(t)=D(Y \| X)</math> where <math>Y\in\{0,1\}</math> is a Bernoulli random variable with parameter <math>t</math> and <math>D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p}</math> is the ''Kullback-Leibler divergence'' between <math>Y</math> and <math>X</math>.
#'''Sum of independent random variables.''' Let <math>X=\sum_{i=1}^nX_i</math> be the sum of <math>n</math> independently and identically distributed random variables <math>X_1,X_2,\ldots, X_n</math>. Show that <math>\Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda)</math> and <math>\Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n})</math>. Also for binomial random variable <math>X\sim \mathrm{Bin}(n,p)</math>, give an upper bound to the tail inequality <math>\Pr[X\ge t]</math> in terms of KL-divergence.  
#'''Sum of independent random variables.''' Let <math>X=\sum_{i=1}^nX_i</math> be the sum of <math>n</math> independently and identically distributed random variables <math>X_1,X_2,\ldots, X_n</math>. Show that <math>\Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda)</math> and <math>\Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n})</math>. Also for binomial random variable <math>X\sim \mathrm{Bin}(n,p)</math>, give an upper bound to the tail inequality <math>\Pr[X\ge t]</math> in terms of KL-divergence.  
#Give an upper bound to <math>\Pr[X\ge t]</math> when every <math>X_i</math> follows the geometric distribution with a probability <math>p</math> of success.
:Give an upper bound to <math>\Pr[X\ge t]</math> when every <math>X_i</math> follows the geometric distribution with a probability <math>p</math> of success.


== Problem 4 ==
== Problem 4 ==

Revision as of 05:44, 4 December 2015

Problem 1

Use the Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect Algorithm and try to use as few random samples as possible.

Problem 2

A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].

The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].

Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

  • Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
  • Prove a similar result for linear boolean codes.

Problem 3

Let [math]\displaystyle{ X }[/math] be a real-valued random variable with finite [math]\displaystyle{ \mathbb{E}[X] }[/math] and finite [math]\displaystyle{ \mathbb{E}\left[\mathrm{e}^{\lambda X}\right] }[/math] for all [math]\displaystyle{ \lambda\ge 0 }[/math]. We define the log-moment-generating function as

[math]\displaystyle{ \Psi_X(\lambda)=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0 }[/math],

and its dual function:

[math]\displaystyle{ \Psi_X^*(t)=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda)) }[/math].

Assume that [math]\displaystyle{ X }[/math] is NOT almost surely constant. Then due to the convexity of [math]\displaystyle{ \mathrm{e}^{\lambda X} }[/math] with respect to [math]\displaystyle{ \lambda }[/math], the function [math]\displaystyle{ \Psi_X(\lambda) }[/math] is strictly convex over [math]\displaystyle{ \lambda\ge 0 }[/math].

  1. Prove the following Chernoff bound:
    [math]\displaystyle{ \Pr[X\ge t]\le\exp(-\Psi_X^*(t)) }[/math].
    In particular if [math]\displaystyle{ \Psi_X(\lambda) }[/math] is continuously differentiable, prove that the supreme in [math]\displaystyle{ \Psi_X^*(t) }[/math] is achieved at the unique [math]\displaystyle{ \lambda\ge 0 }[/math] satisfying [math]\displaystyle{ \Psi_X'(\lambda)=t\, }[/math], where [math]\displaystyle{ \Psi_X'(\lambda)\, }[/math] denotes the derivative of [math]\displaystyle{ \Psi_X(\lambda)\, }[/math] with respect to [math]\displaystyle{ \lambda }[/math].
  2. Normal random variables. Let [math]\displaystyle{ X\sim \mathrm{N}(\mu,\sigma) }[/math] be a Gaussian random variable with mean [math]\displaystyle{ \mu }[/math] and standard deviation [math]\displaystyle{ \sigma }[/math]. What are the [math]\displaystyle{ \Psi_X(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t) }[/math]? And give a tail inequality to upper bound the probability [math]\displaystyle{ \Pr[X\ge t] }[/math].
  3. Poisson random variables. Let [math]\displaystyle{ X\sim \mathrm{Pois}(\nu) }[/math] be a Poisson random variable with parameter [math]\displaystyle{ \nu }[/math], that is, [math]\displaystyle{ \Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k! }[/math] for all [math]\displaystyle{ k=0,1,2,\ldots }[/math]. What are the [math]\displaystyle{ \Psi_X(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t) }[/math]? And give a tail inequality to upper bound the probability [math]\displaystyle{ \Pr[X\ge t] }[/math].
  4. Bernoulli random variables. Let [math]\displaystyle{ X\in\{0,1\} }[/math] be a single Bernoulli trial with probability of success [math]\displaystyle{ p }[/math], that is, [math]\displaystyle{ \Pr[X=1]=1-\Pr[X=0]=p }[/math]. Show that for any [math]\displaystyle{ t\in(p,1) }[/math], we have [math]\displaystyle{ \Psi_X^*(t)=D(Y \| X) }[/math] where [math]\displaystyle{ Y\in\{0,1\} }[/math] is a Bernoulli random variable with parameter [math]\displaystyle{ t }[/math] and [math]\displaystyle{ D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p} }[/math] is the Kullback-Leibler divergence between [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ X }[/math].
  5. Sum of independent random variables. Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] be the sum of [math]\displaystyle{ n }[/math] independently and identically distributed random variables [math]\displaystyle{ X_1,X_2,\ldots, X_n }[/math]. Show that [math]\displaystyle{ \Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n}) }[/math]. Also for binomial random variable [math]\displaystyle{ X\sim \mathrm{Bin}(n,p) }[/math], give an upper bound to the tail inequality [math]\displaystyle{ \Pr[X\ge t] }[/math] in terms of KL-divergence.
Give an upper bound to [math]\displaystyle{ \Pr[X\ge t] }[/math] when every [math]\displaystyle{ X_i }[/math] follows the geometric distribution with a probability [math]\displaystyle{ p }[/math] of success.

Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

[math]\displaystyle{ \underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2} }[/math]

contains 5 runs, of length 3, 2, 6, 1, and 2.

Let [math]\displaystyle{ S }[/math] be a binary string of length [math]\displaystyle{ n }[/math], generated uniformly at random. Let [math]\displaystyle{ X_k }[/math] be the number of runs in [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ k }[/math] or more.

  • Compute the exact value of [math]\displaystyle{ \mathbb{E}[X_k] }[/math] as a function of [math]\displaystyle{ n }[/math] and [math]\displaystyle{ k }[/math].
  • Give the best concentration bound you can for [math]\displaystyle{ |X_k -\mathbb{E}[X_k]| }[/math].


Problem 5

(Due to J. Naor.)

The Chernoff bound is an exponentially decreasing bound on tail distributions. Let [math]\displaystyle{ X_1,\dots,X_n }[/math] be independent random variables and [math]\displaystyle{ \mathbf{E}[X_i]=0 }[/math] for all [math]\displaystyle{ 1\le i\le n }[/math]. Define [math]\displaystyle{ X=X_1+X_2+\dots+X_n }[/math]. We can use the following two kinds of tail inequalities for [math]\displaystyle{ X }[/math].

  • Chernoff Bounds:
[math]\displaystyle{ \Pr[|X|\ge\delta]\le\min_{t\ge 0}\frac{\mathbf{E}[e^{t|X|}]}{e^{t\delta}} }[/math].


  • [math]\displaystyle{ k }[/math]th-Moment Bound:
[math]\displaystyle{ \Pr[|X|\ge\delta]\le\frac{\mathbf{E}[|X|^k]}{\delta^k} }[/math].
  1. Show that for each [math]\displaystyle{ \delta }[/math], there exists a choice of [math]\displaystyle{ k }[/math] such that the [math]\displaystyle{ k }[/math]th-moment bound is stronger than the Chernoff bound. (Hint: You may use the probabilistic method.)
  2. Why would we still prefer the Chernoff bound to the seemingly stronger [math]\displaystyle{ k }[/math]th-moment bound?