随机算法 (Fall 2015)/Problem Set 3

From EtoneWiki
Jump to: navigation, search

Problem 1

Use the Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect Algorithm and try to use as few random samples as possible.

Problem 2

Let be a real-valued random variable with finite and finite for all . We define the log-moment-generating function as

,

and its dual function:

.

Assume that is NOT almost surely constant. Then due to the convexity of with respect to , the function is strictly convex over .

  1. Prove the following Chernoff bound:
    .
    In particular if is continuously differentiable, prove that the supreme in is achieved at the unique satisfying , where denotes the derivative of with respect to .
  2. Normal random variables. Let be a Gaussian random variable with mean and standard deviation . What are the and ? And give a tail inequality to upper bound the probability .
  3. Poisson random variables. Let be a Poisson random variable with parameter , that is, for all . What are the and ? And give a tail inequality to upper bound the probability .
  4. Bernoulli random variables. Let be a single Bernoulli trial with probability of success , that is, . Show that for any , we have where is a Bernoulli random variable with parameter and is the Kullback-Leibler divergence between and .
  5. Sum of independent random variables. Let be the sum of independently and identically distributed random variables . Show that and . Also for binomial random variable , give an upper bound to the tail inequality in terms of KL-divergence.
Give an upper bound to when every follows the geometric distribution with a probability of success.


Problem 3

A boolean code is a mapping . Each is called a message and is called a codeword. The code rate of a code is . A boolean code is a linear code if it is a linear transformation, i.e. there is a matrix such that for any , where the additions and multiplications are defined over the finite field of order two, .

The distance between two codeword and , denoted by , is defined as the Hamming distance between them. Formally, . The distance of a code is the minimum distance between any two codewords. Formally, .

Usually we want to make both the code rate and the code distance as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

  • Use the probabilistic method to prove that there exists a boolean code of code rate and distance . Try to optimize the constant in .
  • Prove a similar result for linear boolean codes.


Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

contains 5 runs, of length 3, 2, 6, 1, and 2.

Let be a binary string of length , generated uniformly at random. Let be the number of runs in of length or more.

  • Compute the exact value of as a function of and .
  • Give the best concentration bound you can obtain for .

Problem 5

(Due to J. Naor.)

The Chernoff bound is an exponentially decreasing bound on tail distributions. Let be independent random variables and for all . Define . We can use the following two kinds of tail inequalities for .

  • Chernoff Bounds:
.


  • th-Moment Bound:
.
  1. Show that for each , there exists a choice of such that the th-moment bound is strictly stronger than the Chernoff bound. (Hint: You may consider using the probabilistic method.)
  2. Why would we still prefer the Chernoff bound to the seemingly stronger th-moment bound?