# 高级算法 (Fall 2018)/Problem Set 2

每道题目的解答都要有完整的解题过程。中英文不限。

## Contents

## Problem 1

Starting with a graph with vertices and no edge, we consider the following process to build a random undirected simple graph. At each step, we choose an edge uniformly at random from the set of all remaining unchosen edges, and add it to the graph. I.e., in the first step, we choose an edge from the set of edges; in the second step, we choose from the remaining edges; and so on. Let denote number of edges added until the graph is connected, give an upper bound for .

## Problem 2

In Balls-and-Bins model, we throw balls independently and uniformly at random into bins. We know that the maximum load is with high probability when .
The two-choice paradigm is another way to throw balls into bins: each ball is thrown into the least loaded of two bins chosen independently and uniformly at random(it could be the case that the two chosen bins are exactly the same, and then the ball will be thrown into that bin), and breaks the tie arbitrarily. When , the maximum load of two-choice paradigm is known to be with high probability, which is exponentially less than the maxim load when there is only one random choice. This phenomenon is called * the power of two choices*.

Here are the questions:

- Consider the following paradigm: we throw balls into bins. The first balls are thrown into bins independently and uniformly at random. The remaining balls are thrown into bins using the two-choice paradigm. What is the maximum load with high probability? You need to give an asymptotically tight bound (in the form of ).

- Replace the above paradigm to the following: the first balls are thrown into bins using the two-choice paradigm while the remaining balls are thrown into bins independently and uniformly at random. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.

- Replace the above paradigm to the following: assume all balls are thrown in a sequence. For every , if is odd, we throw -th ball into bins independently and uniformly at random, otherwise, we throw it into bins using the two-choice paradigm. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.

## Problem 3

Let be a real-valued random variable with finite and finite for all . We define the **log-moment-generating function** as

- ,

and its *dual function*:

- .

Assume that is NOT almost surely constant. Then due to the convexity of with respect to , the function is *strictly* convex over .

- Prove the following Chernoff bound:

- .

- In particular if is continuously differentiable, prove that the supreme in is achieved at the unique satisfying
- where denotes the derivative of with respect to .

**Normal random variables.**Let be a Gaussian random variable with mean and standard deviation . What are the and ? And give a tail inequality to upper bound the probability .

**Poisson random variables.**Let be a Poisson random variable with parameter , that is, for all . What are the and ? And give a tail inequality to upper bound the probability .

**Bernoulli random variables.**Let be a single Bernoulli trial with probability of success , that is, . Show that for any , we have where is a Bernoulli random variable with parameter and is the**Kullback-Leibler divergence**between and .

**Sum of independent random variables.**Let be the sum of independently and identically distributed random variables . Show that and . Also for binomial random variable , give an upper bound to the tail inequality in terms of KL-divergence.

- Give an upper bound to when every follows the geometric distribution with a probability of success.

## Problem 4

A **boolean code** is a mapping . Each is called a **message** and is called a **codeword**. The **code rate** of a code is . A boolean code is a **linear code** if it is a linear transformation, i.e. there is a matrix such that for any , where the additions and multiplications are defined over the finite field of order two, .

The **distance** between two codeword and , denoted by , is defined as the Hamming distance between them. Formally, . The distance of a code is the minimum distance between any two codewords. Formally, .

Usually we want to make both the code rate and the code distance as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

- Use the probabilistic method to prove that there exists a boolean code of code rate and distance . Try to optimize the constant in .
- Prove a similar result for linear boolean codes.