随机算法 (Fall 2015)/Problem Set 3


Problem 1

Use Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect algorithm, and try to use as few random samples as possible.
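
For reference, the following Python sketch follows one common textbook presentation of LazySelect (sample size about n^{3/4}, pivots taken about \sqrt{s} ranks away from the scaled target rank, and a |P|\le 4s+2 success test); the concrete constants are illustrative assumptions, and the exercise asks how small the sample can be made when the analysis uses Chernoff bounds instead of Chebyshev's inequality.

    import math
    import random

    def lazy_select(S, k, trials=10):
        """Return the k-th smallest element (1-indexed) of the list S.

        Sketch of the classical LazySelect procedure: sample about n^(3/4)
        elements with replacement, bracket the answer between two sampled
        pivots, and sort only the (hopefully small) bracketed slice.
        """
        n = len(S)
        s = int(math.ceil(n ** 0.75))            # sample size; the exercise asks how small this can be
        for _ in range(trials):
            R = sorted(random.choices(S, k=s))   # random sample with replacement, sorted
            x = k * s / n                        # expected rank of the answer within the sample
            lo = max(int(x - math.sqrt(s)), 0)
            hi = min(int(x + math.sqrt(s)), s - 1)
            a, b = R[lo], R[hi]
            rank_a = sum(1 for y in S if y < a)  # elements strictly below the lower pivot
            rank_b = sum(1 for y in S if y <= b)
            P = [y for y in S if a <= y <= b]
            # Success: the k-th smallest lies in [a, b] and P is small enough to sort cheaply.
            if rank_a < k <= rank_b and len(P) <= 4 * s + 2:
                P.sort()
                return P[k - rank_a - 1]
        return sorted(S)[k - 1]                  # fall back to a full sort if all trials fail

    if __name__ == "__main__":
        data = random.sample(range(10 ** 6), 10 ** 5)
        assert lazy_select(data, 1234) == sorted(data)[1233]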

Problem 2

Let X be a real-valued random variable with finite \mathbb{E}[X] and finite \mathbb{E}\left[\mathrm{e}^{\lambda X}\right] for all \lambda\ge 0. We define the log-moment-generating function as

\Psi_X(\lambda)=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0,

and its dual function:

\Psi_X^*(t)=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda)).

Assume that X is NOT almost surely constant. Then, due to the convexity of \mathrm{e}^{\lambda X} with respect to λ, the function \Psi_X(\lambda) is strictly convex over \lambda\ge 0.

  1. Prove the following Chernoff bound:
    \Pr[X\ge t]\le\exp(-\Psi_X^*(t)).
    In particular, if \Psi_X(\lambda) is continuously differentiable, prove that the supremum in \Psi_X^*(t) is achieved at the unique \lambda\ge 0 satisfying \Psi_X'(\lambda)=t, where \Psi_X'(\lambda) denotes the derivative of \Psi_X(\lambda) with respect to λ.
  2. Normal random variables. Let X\sim N(\mu,\sigma) be a Gaussian random variable with mean μ and standard deviation σ. What are \Psi_X(\lambda) and \Psi_X^*(t)? Give a tail inequality upper bounding the probability \Pr[X\ge t].
  3. Poisson random variables. Let X\sim\mathrm{Pois}(\nu) be a Poisson random variable with parameter ν, that is, \Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k! for all k=0,1,2,\ldots. What are \Psi_X(\lambda) and \Psi_X^*(t)? Give a tail inequality upper bounding the probability \Pr[X\ge t].
  4. Bernoulli random variables. Let X\in\{0,1\} be a single Bernoulli trial with probability of success p, that is, \Pr[X=1]=1-\Pr[X=0]=p. Show that for any t\in(p,1), we have \Psi_X^*(t)=D(Y \| X), where Y\in\{0,1\} is a Bernoulli random variable with parameter t and D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p} is the Kullback-Leibler divergence between Y and X. (A numerical sanity check for this identity is sketched after this problem.)
  5. Sum of independent random variables. Let X=\sum_{i=1}^nX_i be the sum of n independent and identically distributed random variables X_1,X_2,\ldots, X_n. Show that \Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda) and \Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n}). Also, for a binomial random variable X\sim\mathrm{Bin}(n,p), give an upper bound on the tail probability \Pr[X\ge t] in terms of the KL divergence.
    Give an upper bound on \Pr[X\ge t] when every X_i follows the geometric distribution with success probability p.
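
The following is a small numerical sanity check for parts 1 and 4, written under the assumption that SciPy is available: it evaluates \Psi_X(\lambda) for a Bernoulli(p) variable, maximizes \lambda t-\Psi_X(\lambda) over \lambda\ge 0 numerically, and compares the result with the Kullback-Leibler expression. It is only a sketch for checking closed forms, not a substitute for the proofs requested above.

    import math
    from scipy.optimize import minimize_scalar

    def psi_bernoulli(lam, p):
        """Log-moment-generating function Psi_X(lambda) = ln E[e^(lambda X)] for X ~ Bernoulli(p)."""
        return math.log(1 - p + p * math.exp(lam))

    def psi_star(t, p, lam_max=50.0):
        """Numerically evaluate Psi_X^*(t) = sup_{lambda >= 0} (lambda*t - Psi_X(lambda))."""
        res = minimize_scalar(lambda lam: -(lam * t - psi_bernoulli(lam, p)),
                              bounds=(0.0, lam_max), method="bounded")
        return -res.fun

    def kl_bernoulli(t, p):
        """D(Bernoulli(t) || Bernoulli(p)); part 4 claims this equals Psi_X^*(t) for t in (p, 1)."""
        return (1 - t) * math.log((1 - t) / (1 - p)) + t * math.log(t / p)

    if __name__ == "__main__":
        p = 0.3
        for t in (0.4, 0.5, 0.7, 0.9):
            # The first two columns should agree up to numerical error, and
            # exp(-Psi_X^*(t)) is the Chernoff bound on Pr[X >= t] from part 1.
            print(t, psi_star(t, p), kl_bernoulli(t, p), math.exp(-psi_star(t, p)))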


Problem 3

A boolean code is a mapping C:\{0,1\}^k\rightarrow\{0,1\}^n. Each x\in\{0,1\}^k is called a message and y = C(x) is called a codeword. The code rate r of a code C is r=\frac{k}{n}. A boolean code C:\{0,1\}^k\rightarrow\{0,1\}^n is a linear code if it is a linear transformation, i.e. there is a matrix A\in\{0,1\}^{n\times k} such that C(x) = Ax for any x\in\{0,1\}^k, where the additions and multiplications are defined over the finite field of order two, (\{0,1\},+_{\bmod 2},\times_{\bmod 2}).

The distance between two codewords y_1 and y_2, denoted by d(y_1,y_2), is defined as the Hamming distance between them. Formally, d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|. The distance of a code C is the minimum distance between any two codewords. Formally, d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)).

Usually we want both the code rate r and the code distance d to be as large as possible: a larger rate means that more actual message is carried per transmitted bit, and a larger distance allows more errors to be detected and corrected.

  • Use the probabilistic method to prove that there exists a boolean code C:\{0,1\}^k\rightarrow\{0,1\}^n of code rate r and distance \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n. Try to optimize the constant in \Theta(\cdot).
  • Prove a similar result for linear boolean codes.
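
To make the probabilistic-method setup concrete, here is a NumPy sketch (the parameters k = 8, n = 32 are illustrative assumptions) that draws a uniformly random generator matrix A over the two-element field, enumerates the codewords C(x)=Ax, and measures the resulting code distance; for a linear code this distance equals the minimum Hamming weight of a nonzero codeword.

    import itertools
    import numpy as np

    def random_generator_matrix(n, k, rng):
        """Uniformly random A in {0,1}^{n x k}; the linear code is C(x) = Ax over GF(2)."""
        return rng.integers(0, 2, size=(n, k))

    def code_distance(A):
        """Minimum distance of the linear code generated by A.

        For a linear code, d(C(x1), C(x2)) = weight(A(x1 - x2) mod 2), so the
        distance is the minimum Hamming weight over nonzero messages x.
        """
        n, k = A.shape
        best = n
        for bits in itertools.product((0, 1), repeat=k):
            x = np.array(bits, dtype=int)
            if not x.any():
                continue                    # skip the zero message
            weight = int((A.dot(x) % 2).sum())
            best = min(best, weight)
        return best

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        k, n = 8, 32                        # code rate r = k/n = 1/4 (illustrative)
        distances = [code_distance(random_generator_matrix(n, k, rng)) for _ in range(20)]
        print("observed distances:", distances, "  n/2 =", n // 2)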


Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

\underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2}

contains 5 runs, of lengths 3, 2, 6, 1, and 2.

Let S be a binary string of length n, generated uniformly at random. Let X_k be the number of runs in S of length k or more.

  • Compute the exact value of \mathbb{E}[X_k] as a function of n and k.
  • Give the best concentration bound you can obtain for |X_k -\mathbb{E}[X_k]|.
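
The following Monte Carlo sketch (the trial count and the pair n = 20, k = 3 are illustrative choices) can be used to check a candidate formula for \mathbb{E}[X_k]: it generates uniform random bit strings, counts the maximal runs of 1s of length at least k, and averages.

    import random
    import re

    def count_runs_at_least(s, k):
        """Number of maximal runs of 1s in the bit string s whose length is at least k."""
        return sum(1 for run in re.findall(r"1+", s) if len(run) >= k)

    def estimate_E_Xk(n, k, trials=50_000, seed=0):
        """Monte Carlo estimate of E[X_k] for a uniformly random bit string of length n."""
        rng = random.Random(seed)
        total = 0
        for _ in range(trials):
            s = "".join(rng.choice("01") for _ in range(n))
            total += count_runs_at_least(s, k)
        return total / trials

    if __name__ == "__main__":
        # The example string from the problem statement has runs of lengths 3, 2, 6, 1, 2.
        example = "111" + "00" + "11" + "00" + "111111" + "0" + "1" + "0" + "11"
        assert count_runs_at_least(example, 2) == 4
        print("estimated E[X_k] for n = 20, k = 3:", estimate_E_Xk(20, 3))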

Problem 5

(Due to J. Naor.)

The Chernoff bound is an exponentially decreasing bound on tail distributions. Let X_1,\dots,X_n be independent random variables and \mathbf{E}[X_i]=0 for all 1\le i\le n. Define X=X_1+X_2+\dots+X_n. We can use the following two kinds of tail inequalities for X.

  • Chernoff Bounds:
\Pr[|X|\ge\delta]\le\min_{t\ge 0}\frac{\mathbf{E}[e^{t|X|}]}{e^{t\delta}}.


  • kth-Moment Bound:
\Pr[|X|\ge\delta]\le\frac{\mathbf{E}[|X|^k]}{\delta^k}.
  1. Show that for each δ, there exists a choice of k such that the kth-moment bound is strictly stronger than the Chernoff bound. (Hint: You may consider using the probabilistic method.)
  2. Why would we still prefer the Chernoff bound to the seemingly stronger kth-moment bound?
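
As a concrete, purely illustrative comparison of the two bounds, the sketch below takes X to be a sum of n independent uniform ±1 variables, computes the exact distribution of X from the binomial law, and evaluates the Chernoff bound over a grid of t together with several k-th moment bounds; for the parameters chosen here, some of the moment bounds come out smaller, in line with part 1.

    import math

    def rademacher_sum_pmf(n):
        """Exact pmf of X = X_1 + ... + X_n with i.i.d. X_i uniform on {-1, +1}."""
        return {2 * b - n: math.comb(n, b) / 2 ** n for b in range(n + 1)}

    def chernoff_bound(pmf, delta, t_grid):
        """min over the grid of t >= 0 of E[e^(t|X|)] / e^(t*delta)."""
        return min(sum(p * math.exp(t * abs(x)) for x, p in pmf.items()) / math.exp(t * delta)
                   for t in t_grid)

    def moment_bound(pmf, delta, k):
        """E[|X|^k] / delta^k."""
        return sum(p * abs(x) ** k for x, p in pmf.items()) / delta ** k

    if __name__ == "__main__":
        n, delta = 100, 30
        pmf = rademacher_sum_pmf(n)
        print("true tail Pr[|X| >= delta]:", sum(p for x, p in pmf.items() if abs(x) >= delta))
        print("Chernoff bound            :", chernoff_bound(pmf, delta, [i / 100 for i in range(1, 301)]))
        for k in (2, 4, 8, 16):
            print(f"moment bound, k = {k:2d}      :", moment_bound(pmf, delta, k))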