Randomized Algorithms (Fall 2015)/Problem Set 3

Problem 1

Use the Chernoff bound instead of Chebyshev's inequality in the analysis of the LazySelect algorithm, and try to use as few random samples as possible.

Problem 2

Let $\displaystyle X$ be a real-valued random variable with finite $\displaystyle \mathbb{E}[X]$ and finite $\displaystyle \mathbb{E}\left[\mathrm{e}^{\lambda X}\right]$ for all $\displaystyle \lambda\ge 0$ . We define the log-moment-generating function as

$\displaystyle \Psi_X(\lambda)=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0$ ,

and its dual function:

$\displaystyle \Psi_X^*(t)=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda))$ .

Assume that $\displaystyle X$ is NOT almost surely constant. Then due to the convexity of $\displaystyle \mathrm{e}^{\lambda X}$ with respect to $\displaystyle \lambda$ , the function $\displaystyle \Psi_X(\lambda)$ is strictly convex over $\displaystyle \lambda\ge 0$ .
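These definitions can be explored numerically for a random variable with finite support. The sketch below is illustrative only: the helper names `log_mgf` and `dual_on_grid` are made up for this example, and the supremum in $\Psi_X^*(t)$ is approximated by a maximum over a finite grid of nonnegative $\lambda$ values.

```python
import math

def log_mgf(support, probs, lam):
    # Psi_X(lambda) = ln E[e^{lambda X}] for a random variable with finite support.
    return math.log(sum(p * math.exp(lam * x) for x, p in zip(support, probs)))

def dual_on_grid(support, probs, t, lam_grid):
    # Approximates Psi_X^*(t) = sup_{lambda >= 0} (lambda*t - Psi_X(lambda))
    # by taking the maximum over a finite grid of nonnegative lambdas.
    return max(lam * t - log_mgf(support, probs, lam) for lam in lam_grid)
```

For a fair coin (support $\{0,1\}$, probabilities $\tfrac{1}{2}$ each) and $t=0.7$, refining the grid drives the maximum toward the Kullback-Leibler divergence $D(0.7\|0.5)$, matching the identity stated in part 4 below.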

1. Prove the following Chernoff bound:
$\displaystyle \Pr[X\ge t]\le\exp(-\Psi_X^*(t))$ .
In particular, if $\displaystyle \Psi_X(\lambda)$ is continuously differentiable, prove that the supremum in $\displaystyle \Psi_X^*(t)$ is achieved at the unique $\displaystyle \lambda\ge 0$ satisfying $\displaystyle \Psi_X'(\lambda)=t\,$ , where $\displaystyle \Psi_X'(\lambda)\,$ denotes the derivative of $\displaystyle \Psi_X(\lambda)\,$ with respect to $\displaystyle \lambda$ .
2. Normal random variables. Let $\displaystyle X\sim \mathrm{N}(\mu,\sigma)$ be a Gaussian random variable with mean $\displaystyle \mu$ and standard deviation $\displaystyle \sigma$ . What are $\displaystyle \Psi_X(\lambda)$ and $\displaystyle \Psi_X^*(t)$ ? Give a tail inequality that upper-bounds the probability $\displaystyle \Pr[X\ge t]$ .
3. Poisson random variables. Let $\displaystyle X\sim \mathrm{Pois}(\nu)$ be a Poisson random variable with parameter $\displaystyle \nu$ , that is, $\displaystyle \Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k!$ for all $\displaystyle k=0,1,2,\ldots$ . What are $\displaystyle \Psi_X(\lambda)$ and $\displaystyle \Psi_X^*(t)$ ? Give a tail inequality that upper-bounds the probability $\displaystyle \Pr[X\ge t]$ .
4. Bernoulli random variables. Let $\displaystyle X\in\{0,1\}$ be a single Bernoulli trial with probability of success $\displaystyle p$ , that is, $\displaystyle \Pr[X=1]=1-\Pr[X=0]=p$ . Show that for any $\displaystyle t\in(p,1)$ , we have $\displaystyle \Psi_X^*(t)=D(Y \| X)$ where $\displaystyle Y\in\{0,1\}$ is a Bernoulli random variable with parameter $\displaystyle t$ and $\displaystyle D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p}$ is the Kullback-Leibler divergence between $\displaystyle Y$ and $\displaystyle X$ .
5. Sum of independent random variables. Let $\displaystyle X=\sum_{i=1}^nX_i$ be the sum of $\displaystyle n$ independent and identically distributed random variables $\displaystyle X_1,X_2,\ldots, X_n$ . Show that $\displaystyle \Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda)$ and $\displaystyle \Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n})$ . Also, for a binomial random variable $\displaystyle X\sim \mathrm{Bin}(n,p)$ , give an upper bound on the tail probability $\displaystyle \Pr[X\ge t]$ in terms of the KL-divergence. Finally, give an upper bound on $\displaystyle \Pr[X\ge t]$ when every $\displaystyle X_i$ follows the geometric distribution with success probability $\displaystyle p$ .

Problem 3

A boolean code is a mapping $\displaystyle C:\{0,1\}^k\rightarrow\{0,1\}^n$ . Each $\displaystyle x\in\{0,1\}^k$ is called a message and $\displaystyle y=C(x)$ is called a codeword. The code rate $\displaystyle r$ of a code $\displaystyle C$ is $\displaystyle r=\frac{k}{n}$ . A boolean code $\displaystyle C:\{0,1\}^k\rightarrow\{0,1\}^n$ is a linear code if it is a linear transformation, i.e. there is a matrix $\displaystyle A\in\{0,1\}^{n\times k}$ such that $\displaystyle C(x)=Ax$ for any $\displaystyle x\in\{0,1\}^k$ , where the additions and multiplications are defined over the finite field of order two, $\displaystyle (\{0,1\},+_{\bmod 2},\times_{\bmod 2})$ .

The distance between two codewords $\displaystyle y_1$ and $\displaystyle y_2$ , denoted by $\displaystyle d(y_1,y_2)$ , is defined as the Hamming distance between them. Formally, $\displaystyle d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|$ . The distance of a code $\displaystyle C$ is the minimum distance between any two distinct codewords. Formally, $\displaystyle d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))$ .
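Both quantities can be computed directly from the definitions. The Python sketch below uses illustrative function names; the code-distance computation enumerates all $2^k$ messages, so it is only practical for tiny codes.

```python
from itertools import product

def hamming(y1, y2):
    # d(y1, y2): number of coordinates where the two codewords differ.
    return sum(b1 != b2 for b1, b2 in zip(y1, y2))

def code_distance(C, k):
    # Minimum Hamming distance over all pairs of distinct messages in {0,1}^k.
    messages = list(product((0, 1), repeat=k))
    return min(hamming(C(x1), C(x2))
               for i, x1 in enumerate(messages)
               for x2 in messages[i + 1:])
```

For example, the 3-fold repetition code $C(x)=xxx$ (so $k=1$ , $n=3$ ) has distance 3.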

Usually we want to make both the code rate $\displaystyle r$ and the code distance $\displaystyle d$ as large as possible, because a larger rate means that more actual message is carried per transmitted bit, and a larger distance allows for more error detection and correction.

• Use the probabilistic method to prove that there exists a boolean code $\displaystyle C:\{0,1\}^k\rightarrow\{0,1\}^n$ of code rate $\displaystyle r$ and distance $\displaystyle \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n$ . Try to optimize the constant in $\displaystyle \Theta(\cdot)$ .
• Prove a similar result for linear boolean codes.

Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

$\displaystyle \underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2}$

contains 5 runs, of lengths 3, 2, 6, 1, and 2.

Let $\displaystyle S$ be a binary string of length $\displaystyle n$ , generated uniformly at random. Let $\displaystyle X_k$ be the number of runs in $\displaystyle S$ of length $\displaystyle k$ or more.
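The quantity $X_k$ can be computed with a short Python helper (the function name is illustrative), which counts the maximal blocks of consecutive 1s of length at least $k$ :

```python
import re

def count_long_runs(s, k):
    # A run is a maximal block of contiguous 1s; count those of length >= k.
    return sum(1 for run in re.findall(r"1+", s) if len(run) >= k)
```

On the example string above, it counts 5 runs of length at least 1, two of length at least 3, and one of length at least 6. Such a helper is convenient for checking a formula for $\mathbb{E}[X_k]$ empirically on random strings.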

• Compute the exact value of $\displaystyle \mathbb{E}[X_k]$ as a function of $\displaystyle n$ and $\displaystyle k$ .
• Give the best concentration bound you can obtain for $\displaystyle |X_k -\mathbb{E}[X_k]|$ .

Problem 5

(Due to J. Naor.)

The Chernoff bound is an exponentially decreasing bound on tail distributions. Let $\displaystyle X_1,\dots,X_n$ be independent random variables with $\displaystyle \mathbf{E}[X_i]=0$ for all $\displaystyle 1\le i\le n$ . Define $\displaystyle X=X_1+X_2+\dots+X_n$ . We can use the following two kinds of tail inequalities for $\displaystyle X$ .

• Chernoff Bounds:
$\displaystyle \Pr[|X|\ge\delta]\le\min_{t\ge 0}\frac{\mathbf{E}[e^{t|X|}]}{e^{t\delta}}$ .

• $\displaystyle k$ th-Moment Bound:
$\displaystyle \Pr[|X|\ge\delta]\le\frac{\mathbf{E}[|X|^k]}{\delta^k}$ .
1. Show that for each $\displaystyle \delta$ , there exists a choice of $\displaystyle k$ such that the $\displaystyle k$ th-moment bound is strictly stronger than the Chernoff bound. (Hint: You may consider using the probabilistic method.)
2. Why would we still prefer the Chernoff bound to the seemingly stronger $\displaystyle k$ th-moment bound?
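To get a feel for the two bounds, one can evaluate both exactly for a sum of independent Rademacher variables ( $\pm 1$ fair coins), whose distribution has finite support. In the Python sketch below the helper names are illustrative, and the minimization in the Chernoff bound is approximated over a finite grid of $t$ values.

```python
from math import comb, exp

def rademacher_sum_pmf(n):
    # X = X_1 + ... + X_n with i.i.d. Pr[X_i = 1] = Pr[X_i = -1] = 1/2,
    # so Pr[X = 2j - n] = C(n, j) / 2^n for j = 0, ..., n.
    return {2 * j - n: comb(n, j) / 2 ** n for j in range(n + 1)}

def chernoff_bound(pmf, delta, t_grid):
    # min over the grid of E[e^{t|X|}] / e^{t*delta}.
    return min(sum(p * exp(t * abs(x)) for x, p in pmf.items()) / exp(t * delta)
               for t in t_grid)

def moment_bound(pmf, delta, k):
    # E[|X|^k] / delta^k.
    return sum(p * abs(x) ** k for x, p in pmf.items()) / delta ** k
```

Minimizing the moment bound over $k$ never gives a worse value than any grid point of the Chernoff bound, which is consistent with part 1.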