# 随机算法 (Fall 2015)/Problem Set 3

## Problem 1

Use Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect algorithm, and try to use as few random samples as possible.

## Problem 2

Let ${\displaystyle X}$ be a real-valued random variable with finite ${\displaystyle \mathbb {E} [X]}$ and finite ${\displaystyle \mathbb {E} \left[\mathrm {e} ^{\lambda X}\right]}$ for all ${\displaystyle \lambda \geq 0}$. We define the log-moment-generating function as

${\displaystyle \Psi _{X}(\lambda )=\ln \mathbb {E} [\mathrm {e} ^{\lambda X}]\quad {\text{ for all }}\lambda \geq 0}$,

and its dual function:

${\displaystyle \Psi _{X}^{*}(t)=\sup _{\lambda \geq 0}(\lambda t-\Psi _{X}(\lambda ))}$.

Assume that ${\displaystyle X}$ is NOT almost surely constant. Then due to the convexity of ${\displaystyle \mathrm {e} ^{\lambda X}}$ with respect to ${\displaystyle \lambda }$, the function ${\displaystyle \Psi _{X}(\lambda )}$ is strictly convex over ${\displaystyle \lambda \geq 0}$.

1. Prove the following Chernoff bound:
${\displaystyle \Pr[X\geq t]\leq \exp(-\Psi _{X}^{*}(t))}$.
In particular, if ${\displaystyle \Psi _{X}(\lambda )}$ is continuously differentiable, prove that the supremum in ${\displaystyle \Psi _{X}^{*}(t)}$ is achieved at the unique ${\displaystyle \lambda \geq 0}$ satisfying ${\displaystyle \Psi _{X}'(\lambda )=t\,}$, where ${\displaystyle \Psi _{X}'(\lambda )\,}$ denotes the derivative of ${\displaystyle \Psi _{X}(\lambda )\,}$ with respect to ${\displaystyle \lambda }$.
2. Normal random variables. Let ${\displaystyle X\sim \mathrm {N} (\mu ,\sigma )}$ be a Gaussian random variable with mean ${\displaystyle \mu }$ and standard deviation ${\displaystyle \sigma }$. What are ${\displaystyle \Psi _{X}(\lambda )}$ and ${\displaystyle \Psi _{X}^{*}(t)}$? Give a tail inequality upper bounding the probability ${\displaystyle \Pr[X\geq t]}$.
3. Poisson random variables. Let ${\displaystyle X\sim \mathrm {Pois} (\nu )}$ be a Poisson random variable with parameter ${\displaystyle \nu }$, that is, ${\displaystyle \Pr[X=k]=\mathrm {e} ^{-\nu }\nu ^{k}/k!}$ for all ${\displaystyle k=0,1,2,\ldots }$. What are ${\displaystyle \Psi _{X}(\lambda )}$ and ${\displaystyle \Psi _{X}^{*}(t)}$? Give a tail inequality upper bounding the probability ${\displaystyle \Pr[X\geq t]}$.
4. Bernoulli random variables. Let ${\displaystyle X\in \{0,1\}}$ be a single Bernoulli trial with probability of success ${\displaystyle p}$, that is, ${\displaystyle \Pr[X=1]=1-\Pr[X=0]=p}$. Show that for any ${\displaystyle t\in (p,1)}$, we have ${\displaystyle \Psi _{X}^{*}(t)=D(Y\|X)}$ where ${\displaystyle Y\in \{0,1\}}$ is a Bernoulli random variable with parameter ${\displaystyle t}$ and ${\displaystyle D(Y\|X)=(1-t)\ln {\frac {1-t}{1-p}}+t\ln {\frac {t}{p}}}$ is the Kullback-Leibler divergence between ${\displaystyle Y}$ and ${\displaystyle X}$.
5. Sum of independent random variables. Let ${\displaystyle X=\sum _{i=1}^{n}X_{i}}$ be the sum of ${\displaystyle n}$ independent and identically distributed random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$. Show that ${\displaystyle \Psi _{X}(\lambda )=\sum _{i=1}^{n}\Psi _{X_{i}}(\lambda )}$ and ${\displaystyle \Psi _{X}^{*}(t)=n\Psi _{X_{i}}^{*}({\frac {t}{n}})}$. Also, for a binomial random variable ${\displaystyle X\sim \mathrm {Bin} (n,p)}$, give an upper bound on the tail probability ${\displaystyle \Pr[X\geq t]}$ in terms of the KL divergence.
Give an upper bound on ${\displaystyle \Pr[X\geq t]}$ when every ${\displaystyle X_{i}}$ follows the geometric distribution with success probability ${\displaystyle p}$.
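As a sanity check on parts 1 and 4, the dual ${\displaystyle \Psi _{X}^{*}(t)}$ can be evaluated numerically for a single Bernoulli trial and compared against the KL divergence. The sketch below is illustrative only: the parameters ${\displaystyle p=0.3}$, ${\displaystyle t=0.6}$ and the grid search over ${\displaystyle \lambda }$ are assumptions, not part of the problem.

```python
import math

def psi(lam, p):
    # log-MGF of X ~ Bernoulli(p): Psi_X(lambda) = ln E[e^{lambda X}]
    return math.log((1 - p) + p * math.exp(lam))

def psi_star(t, p, lam_max=20.0, steps=200000):
    # dual: sup_{lambda >= 0} (lambda t - Psi_X(lambda)), by grid search
    best = 0.0  # lambda = 0 contributes the value 0 to the supremum
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        best = max(best, lam * t - psi(lam, p))
    return best

def kl(t, p):
    # D(Bernoulli(t) || Bernoulli(p)) as in part 4
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

p, t = 0.3, 0.6
dual, div = psi_star(t, p), kl(t, p)
print(dual, div)  # the two values should agree up to grid error
```

If part 4 is proved correctly, the two printed values coincide (up to the grid resolution), matching ${\displaystyle \Psi _{X}^{*}(t)=D(Y\|X)}$.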

## Problem 3

A boolean code is a mapping ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$. Each ${\displaystyle x\in \{0,1\}^{k}}$ is called a message and ${\displaystyle y=C(x)}$ is called a codeword. The code rate ${\displaystyle r}$ of a code ${\displaystyle C}$ is ${\displaystyle r={\frac {k}{n}}}$. A boolean code ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$ is a linear code if it is a linear transformation, i.e. there is a matrix ${\displaystyle A\in \{0,1\}^{n\times k}}$ such that ${\displaystyle C(x)=Ax}$ for any ${\displaystyle x\in \{0,1\}^{k}}$, where the additions and multiplications are defined over the finite field of order two, ${\displaystyle (\{0,1\},+_{\bmod {2}},\times _{\bmod {2}})}$.

The distance between two codewords ${\displaystyle y_{1}}$ and ${\displaystyle y_{2}}$, denoted by ${\displaystyle d(y_{1},y_{2})}$, is defined as the Hamming distance between them. Formally, ${\displaystyle d(y_{1},y_{2})=\|y_{1}-y_{2}\|_{1}=\sum _{i=1}^{n}|y_{1}(i)-y_{2}(i)|}$. The distance of a code ${\displaystyle C}$ is the minimum distance between any two distinct codewords. Formally, ${\displaystyle d=\min _{x_{1},x_{2}\in \{0,1\}^{k} \atop x_{1}\neq x_{2}}d(C(x_{1}),C(x_{2}))}$.

Usually we want to make both the code rate ${\displaystyle r}$ and the code distance ${\displaystyle d}$ as large as possible, because a larger rate means more actual message per transmitted bit, and a larger distance allows for more error correction and detection.

• Use the probabilistic method to prove that there exists a boolean code ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$ of code rate ${\displaystyle r}$ and distance ${\displaystyle \left({\frac {1}{2}}-\Theta \left({\sqrt {r}}\right)\right)n}$. Try to optimize the constant in ${\displaystyle \Theta (\cdot )}$.
• Prove a similar result for linear boolean codes.
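A useful observation for the linear case: since ${\displaystyle C(x_{1})+C(x_{2})=C(x_{1}+x_{2})}$ over GF(2), the distance of a linear code equals the minimum Hamming weight over nonzero codewords. The brute-force sketch below illustrates this on a random linear code; the small parameters ${\displaystyle k=4}$, ${\displaystyle n=16}$ and the random seed are hypothetical choices for demonstration.

```python
import itertools
import random

random.seed(1)
k, n = 4, 16
# random generator matrix A over GF(2); the code is C(x) = Ax mod 2
A = [[random.randint(0, 1) for _ in range(k)] for _ in range(n)]

def encode(x):
    # matrix-vector product over GF(2)
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)

# distance = minimum weight of a nonzero codeword, by linearity
d = min(sum(encode(x)) for x in itertools.product([0, 1], repeat=k) if any(x))
print("distance of this random linear code:", d)
```

Enumerating ${\displaystyle 2^{k}}$ codewords this way is only feasible for tiny ${\displaystyle k}$; the point of the probabilistic argument is to certify a good distance without any enumeration.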

## Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

${\displaystyle \underbrace {111} _{3}00\underbrace {11} _{2}00\underbrace {111111} _{6}0\underbrace {1} _{1}0\underbrace {11} _{2}}$

contains 5 runs, of lengths 3, 2, 6, 1, and 2.

Let ${\displaystyle S}$ be a binary string of length ${\displaystyle n}$, generated uniformly at random. Let ${\displaystyle X_{k}}$ be the number of runs in ${\displaystyle S}$ of length ${\displaystyle k}$ or more.

• Compute the exact value of ${\displaystyle \mathbb {E} [X_{k}]}$ as a function of ${\displaystyle n}$ and ${\displaystyle k}$.
• Give the best concentration bound you can obtain for ${\displaystyle |X_{k}-\mathbb {E} [X_{k}]|}$.
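An empirical baseline can help check the closed-form answer for ${\displaystyle \mathbb {E} [X_{k}]}$. The Monte Carlo sketch below counts runs of length at least ${\displaystyle k}$ in uniform random strings; the parameters ${\displaystyle n=30}$, ${\displaystyle k=3}$ and the number of trials are assumed values for illustration.

```python
import random

def count_runs_at_least(s, k):
    # number of maximal blocks of consecutive 1s of length >= k
    count, run = 0, 0
    for bit in s:
        if bit:
            run += 1
        else:
            count += run >= k  # close the block on a 0
            run = 0
    return count + (run >= k)  # a run may end at the last position

random.seed(0)
n, k, trials = 30, 3, 20000
est = sum(count_runs_at_least([random.getrandbits(1) for _ in range(n)], k)
          for _ in range(trials)) / trials
print("empirical E[X_k]:", est)
```

The exact ${\displaystyle \mathbb {E} [X_{k}]}$ you derive should agree with this estimate to within Monte Carlo error.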

## Problem 5

(Due to J. Naor.)

The Chernoff bound is an exponentially decreasing bound on tail distributions. Let ${\displaystyle X_{1},\dots ,X_{n}}$ be independent random variables with ${\displaystyle \mathbf {E} [X_{i}]=0}$ for all ${\displaystyle 1\leq i\leq n}$. Define ${\displaystyle X=X_{1}+X_{2}+\dots +X_{n}}$. We can use the following two kinds of tail inequalities for ${\displaystyle X}$.

• Chernoff Bounds:
${\displaystyle \Pr[|X|\geq \delta ]\leq \min _{t\geq 0}{\frac {\mathbf {E} [e^{t|X|}]}{e^{t\delta }}}}$.

• ${\displaystyle k}$th-Moment Bound:
${\displaystyle \Pr[|X|\geq \delta ]\leq {\frac {\mathbf {E} [|X|^{k}]}{\delta ^{k}}}}$.
1. Show that for each ${\displaystyle \delta }$, there exists a choice of ${\displaystyle k}$ such that the ${\displaystyle k}$th-moment bound is strictly stronger than the Chernoff bound. (Hint: You may consider using the probabilistic method.)
2. Why would we still prefer the Chernoff bound to the seemingly stronger ${\displaystyle k}$th-moment bound?
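To see the two bounds side by side, the sketch below evaluates both exactly for ${\displaystyle X}$ a sum of independent Rademacher (${\displaystyle \pm 1}$, mean-zero) variables, using the binomial pmf. The parameters ${\displaystyle n=20}$, ${\displaystyle \delta =10}$, the grid over ${\displaystyle t}$, and the range of ${\displaystyle k}$ are assumed choices for illustration.

```python
import math

n, delta = 20, 10
# X = 2B - n with B ~ Bin(n, 1/2), i.e. a sum of n Rademacher variables
pmf = {2 * b - n: math.comb(n, b) / 2 ** n for b in range(n + 1)}

def chernoff(delta, t_max=5.0, grid=2000):
    # min_{t >= 0} E[e^{t|X|}] / e^{t delta}, minimized over a grid of t
    return min(sum(p * math.exp(t * abs(x)) for x, p in pmf.items())
               / math.exp(t * delta)
               for t in (t_max * i / grid for i in range(grid + 1)))

def moment(delta, k):
    # k-th moment bound E[|X|^k] / delta^k
    return sum(p * abs(x) ** k for x, p in pmf.items()) / delta ** k

c_val = chernoff(delta)
m_val = min(moment(delta, k) for k in range(1, 41))
tail = sum(p for x, p in pmf.items() if abs(x) >= delta)
print("Chernoff bound:   ", c_val)
print("best moment bound:", m_val)
print("true tail:        ", tail)
```

In this example the best moment bound comes out strictly below the Chernoff bound, consistent with part 1; part 2 asks why the Chernoff bound is nonetheless the tool of choice.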