# Randomized Algorithms (Fall 2015)/Problem Set 3

## Problem 1

Use Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect algorithm, and try to use as few random samples as possible.
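Below is a minimal Python sketch of LazySelect with the textbook parameters from Motwani and Raghavan (a sample of size $n^{3/4}$ drawn with replacement, and a buffer of $\sqrt{n}$ around the estimated sample rank); the exercise is to redo the failure-probability analysis with Chernoff bounds and see how far the sample size can be reduced. The function name and the retry structure here are illustrative choices, not part of the problem statement.

```python
import math
import random

def lazy_select(S, k):
    """Return the k-th smallest element of S (1-indexed) via LazySelect.

    Textbook parameters: |R| = n^{3/4} samples with replacement; bracket the
    target's estimated sample rank x = k * n^{-1/4} by +- sqrt(n). The checks
    before returning guarantee correctness; a failed round is simply retried.
    """
    n = len(S)
    while True:
        R = sorted(random.choice(S) for _ in range(int(n ** 0.75)))
        x = k * n ** (-0.25)
        lo = max(math.floor(x - math.sqrt(n)), 0)
        hi = min(math.ceil(x + math.sqrt(n)), len(R) - 1)
        a, b = R[lo], R[hi]
        rank_a = sum(1 for y in S if y < a)    # elements strictly below a
        P = [y for y in S if a <= y <= b]      # candidates kept for sorting
        # Succeed only if the answer is bracketed and P is small enough to sort.
        if rank_a < k <= rank_a + len(P) and len(P) <= 4 * n ** 0.75 + 2:
            return sorted(P)[k - rank_a - 1]
```

For example, `lazy_select(list(range(1000)), 500)` returns 499.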

## Problem 2

Let X be a real-valued random variable with finite $\mathbb{E}[X]$ and finite $\mathbb{E}\left[\mathrm{e}^{\lambda X}\right]$ for all $\lambda\ge 0$. We define the log-moment-generating function as

$\Psi_X(\lambda)=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0$,

and its dual function:

$\Psi_X^*(t)=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda))$.

Assume that $X$ is NOT almost surely constant. Then due to the convexity of $\mathrm{e}^{\lambda X}$ with respect to $\lambda$, the function $\Psi_X(\lambda)$ is strictly convex over $\lambda\ge 0$.

1. Prove the following Chernoff bound:
$\Pr[X\ge t]\le\exp(-\Psi_X^*(t))$.
In particular, if $\Psi_X(\lambda)$ is continuously differentiable, prove that the supremum in $\Psi_X^*(t)$ is achieved at the unique $\lambda\ge 0$ satisfying $\Psi_X'(\lambda)=t$, where $\Psi_X'(\lambda)$ denotes the derivative of $\Psi_X(\lambda)$ with respect to $\lambda$. (A numerical sanity check of this bound is sketched after the subproblems below.)
2. Normal random variables. Let $X\sim N(\mu,\sigma)$ be a Gaussian random variable with mean $\mu$ and standard deviation $\sigma$. What are $\Psi_X(\lambda)$ and $\Psi_X^*(t)$? Give a tail inequality that upper bounds the probability $\Pr[X\ge t]$.
3. Poisson random variables. Let $X\sim\mathrm{Pois}(\nu)$ be a Poisson random variable with parameter $\nu$, that is, $\Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k!$ for all $k=0,1,2,\ldots$. What are $\Psi_X(\lambda)$ and $\Psi_X^*(t)$? Give a tail inequality that upper bounds the probability $\Pr[X\ge t]$.
4. Bernoulli random variables. Let $X\in\{0,1\}$ be a single Bernoulli trial with probability of success $p$, that is, $\Pr[X=1]=1-\Pr[X=0]=p$. Show that for any $t\in(p,1)$, we have $\Psi_X^*(t)=D(Y \| X)$, where $Y\in\{0,1\}$ is a Bernoulli random variable with parameter $t$ and $D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p}$ is the Kullback-Leibler divergence between $Y$ and $X$.
5. Sum of independent random variables. Let $X=\sum_{i=1}^nX_i$ be the sum of $n$ independent and identically distributed random variables $X_1,X_2,\ldots, X_n$. Show that $\Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda)$ and $\Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n})$. Also, for a binomial random variable $X\sim\mathrm{Bin}(n,p)$, give an upper bound on the tail probability $\Pr[X\ge t]$ in terms of the KL-divergence. Finally, give an upper bound on $\Pr[X\ge t]$ when each $X_i$ follows the geometric distribution with success probability $p$.
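As a sanity check for the generic bound in part 1, the following hedged sketch estimates both sides of $\Pr[X\ge t]\le\exp(-\Psi_X^*(t))$ for a standard normal by Monte Carlo, computing the supremum over a grid of $\lambda$. The sample count and grid range are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of Pr[X >= t] <= exp(-Psi*(t)) for X ~ N(0, 1).
samples = rng.normal(0.0, 1.0, size=200_000)
t = 2.0

tail = np.mean(samples >= t)  # empirical tail probability

# Psi(lambda) = ln E[e^{lambda X}], estimated from the samples, and
# Psi*(t) = sup_{lambda >= 0} (lambda * t - Psi(lambda)) over a grid.
lambdas = np.linspace(0.0, 3.0, 301)
psi = np.array([np.log(np.mean(np.exp(lam * samples))) for lam in lambdas])
psi_star = np.max(lambdas * t - psi)

print(f"empirical Pr[X >= {t}]    : {tail:.4f}")
print(f"Chernoff bound exp(-Psi*) : {np.exp(-psi_star):.4f}")
```

The grid maximizer also lets you check part 1's characterization $\Psi_X'(\lambda)=t$ numerically.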

## Problem 3

A boolean code is a mapping $C:\{0,1\}^k\rightarrow\{0,1\}^n$. Each $x\in\{0,1\}^k$ is called a message and $y=C(x)$ is called a codeword. The code rate $r$ of a code $C$ is $r=\frac{k}{n}$. A boolean code $C:\{0,1\}^k\rightarrow\{0,1\}^n$ is a linear code if it is a linear transformation, i.e. there is a matrix $A\in\{0,1\}^{n\times k}$ such that $C(x)=Ax$ for any $x\in\{0,1\}^k$, where the additions and multiplications are defined over the finite field of order two, $(\{0,1\},+_{\bmod 2},\times_{\bmod 2})$.

The distance between two codewords $y_1$ and $y_2$, denoted by $d(y_1,y_2)$, is defined as the Hamming distance between them. Formally, $d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|$. The distance of a code $C$ is the minimum distance between any two distinct codewords. Formally, $d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))$.

Usually we want to make both the code rate $r$ and the code distance $d$ as large as possible, because a larger rate means more actual message per transmitted bit, and a larger distance allows for more error correction and detection.
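As a concrete illustration of these definitions, the sketch below draws a random generator matrix over GF(2) and computes the distance of the resulting linear code by brute force, using the fact (worth verifying) that for a linear code the distance equals the minimum Hamming weight of a nonzero codeword. The dimensions are toy values chosen so the enumeration stays fast.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

k, n = 4, 12
A = rng.integers(0, 2, size=(n, k))  # random generator matrix over GF(2)

# Since C(x1) - C(x2) = C(x1 - x2) over GF(2), the code distance is the
# minimum Hamming weight over codewords of nonzero messages x.
d = min(
    int(((A @ np.array(x)) % 2).sum())
    for x in itertools.product([0, 1], repeat=k)
    if any(x)
)
print(f"code rate r = {k}/{n}, code distance d = {d}")
```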

• Use the probabilistic method to prove that there exists a boolean code $C:\{0,1\}^k\rightarrow\{0,1\}^n$ of code rate r and distance $\left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n$. Try to optimize the constant in $\Theta(\cdot)$.
• Prove a similar result for linear boolean codes.

## Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

$\underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2}$

contains 5 runs, of lengths 3, 2, 6, 1, and 2.

Let $S$ be a binary string of length $n$, generated uniformly at random. Let $X_k$ be the number of runs in $S$ of length $k$ or more.

• Compute the exact value of $\mathbb{E}[X_k]$ as a function of n and k.
• Give the best concentration bound you can obtain for $|X_k -\mathbb{E}[X_k]|$.
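A quick Monte Carlo estimate is useful for sanity-checking whatever closed form you derive for $\mathbb{E}[X_k]$. The sketch below counts qualifying runs in uniformly random strings; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def count_runs_at_least(bits, k):
    """Number of maximal runs of 1s in `bits` with length >= k."""
    count, run = 0, 0
    for b in bits:
        if b:
            run += 1
        else:
            if run >= k:
                count += 1
            run = 0
    if run >= k:  # a run may end at the last position
        count += 1
    return count

n, k, trials = 50, 3, 100_000
mean = np.mean([count_runs_at_least(rng.integers(0, 2, size=n), k)
                for _ in range(trials)])
print(f"empirical E[X_{k}] for n = {n}: {mean:.4f}")
```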

## Problem 5

(Due to J. Naor.)

The Chernoff bound is an exponentially decreasing bound on tail distributions. Let $X_1,\dots,X_n$ be independent random variables with $\mathbb{E}[X_i]=0$ for all $1\le i\le n$. Define $X=X_1+X_2+\dots+X_n$. We can use the following two kinds of tail inequalities for $X$.

• Chernoff Bounds:
$\Pr[|X|\ge\delta]\le\min_{t\ge 0}\frac{\mathbb{E}[e^{t|X|}]}{e^{t\delta}}$.

• $k$th-Moment Bound:
$\Pr[|X|\ge\delta]\le\frac{\mathbb{E}[|X|^k]}{\delta^k}$.

1. Show that for each $\delta$, there exists a choice of $k$ such that the $k$th-moment bound is strictly stronger than the Chernoff bound. (Hint: you may consider using the probabilistic method.) A numerical comparison is sketched below.
2. Why would we still prefer the Chernoff bound to the seemingly stronger $k$th-moment bound?
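For part 1, a numerical experiment already suggests the phenomenon. The hedged sketch below takes $X$ to be a sum of $n$ independent Rademacher ($\pm 1$) variables, so that $X$ has the same distribution as $2\,\mathrm{Bin}(n,1/2)-n$, estimates $\mathbb{E}[e^{t|X|}]$ and $\mathbb{E}[|X|^k]$ by Monte Carlo, and prints the best bound from each family; the searched ranges of $t$ and $k$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

n, delta = 100, 25
# A sum of n Rademacher variables is distributed as 2 * Binomial(n, 1/2) - n.
X = 2 * rng.binomial(n, 0.5, size=500_000) - n
absX = np.abs(X).astype(float)

# Best Chernoff-style bound over a grid of t >= 0.
ts = np.linspace(0.01, 1.0, 100)
chernoff = min(np.mean(np.exp(t * absX)) * np.exp(-t * delta) for t in ts)

# Best k-th moment bound over small k.
moment = min(np.mean(absX ** k) / delta ** k for k in range(1, 13))

print(f"best Chernoff bound : {chernoff:.4g}")
print(f"best moment bound   : {moment:.4g}")  # typically smaller than Chernoff
print(f"empirical tail      : {np.mean(absX >= delta):.4g}")
```

On this instance the best moment bound comes out below the best Chernoff bound, matching what part 1 asks you to prove in general.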