Advanced Algorithms (Fall 2017)/Problem Set 1

The solution to every problem must include the complete derivation. Solutions may be written in either Chinese or English.

Problem 1

Recall that in class we showed, by the probabilistic method, how to deduce a [math]\displaystyle{ \frac{n(n-1)}{2} }[/math] upper bound on the number of distinct min-cuts in any multigraph [math]\displaystyle{ G }[/math] with [math]\displaystyle{ n }[/math] vertices from the [math]\displaystyle{ \frac{2}{n(n-1)} }[/math] lower bound on the success probability of Karger's min-cut algorithm.

Also recall that the [math]\displaystyle{ FastCut }[/math] algorithm taught in class is guaranteed to return a min-cut with probability at least [math]\displaystyle{ \Omega(1/\log n) }[/math]. Does this imply a much tighter [math]\displaystyle{ O(\log n) }[/math] upper bound on the number of distinct min-cuts in any multigraph [math]\displaystyle{ G }[/math] with [math]\displaystyle{ n }[/math] vertices? Prove your improved upper bound if your answer is "yes", and give a satisfactory explanation if your answer is "no".

Problem 2

Two rooted trees [math]\displaystyle{ T_1 }[/math] and [math]\displaystyle{ T_2 }[/math] are said to be isomorphic if there exists a bijection [math]\displaystyle{ \phi }[/math] that maps vertices of [math]\displaystyle{ T_1 }[/math] to those of [math]\displaystyle{ T_2 }[/math] satisfying the following condition: for each internal vertex [math]\displaystyle{ v }[/math] of [math]\displaystyle{ T_1 }[/math] with children [math]\displaystyle{ u_1,u_2,\ldots, u_k }[/math], the set of children of vertex [math]\displaystyle{ \phi(v) }[/math] in [math]\displaystyle{ T_2 }[/math] is precisely [math]\displaystyle{ \{\phi(u_1), \phi(u_2),\ldots,\phi(u_k)\} }[/math], no ordering among children assumed.

Give an efficient randomized algorithm with bounded one-sided error (false positive), for testing isomorphism between rooted trees with [math]\displaystyle{ n }[/math] vertices. Analyze your algorithm.
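To make the isomorphism definition concrete, and to have a ground truth when testing a randomized tester, the following is a minimal deterministic sketch based on canonical forms. It is not the randomized algorithm the problem asks for. The tree representation (a dict mapping each vertex to a list of its children) and the function names `canonical` and `isomorphic` are assumptions made for this illustration.

```python
def canonical(children, v):
    """Return a canonical form of the subtree rooted at v.

    `children` is assumed to map a vertex to a list of its children;
    leaves may be absent from the dict. Sorting the children's forms
    removes any dependence on the order of children.
    Note: plain recursion, so very deep trees may hit Python's
    recursion limit.
    """
    return tuple(sorted(canonical(children, u) for u in children.get(v, ())))


def isomorphic(children1, root1, children2, root2):
    """Two rooted trees are isomorphic iff their canonical forms agree."""
    return canonical(children1, root1) == canonical(children2, root2)


if __name__ == "__main__":
    # Same shape, different vertex names and child order.
    t1 = {0: [1, 2], 1: [3]}
    t2 = {"a": ["b", "c"], "c": ["d"]}
    # A path on four vertices has a different shape.
    t3 = {0: [1], 1: [2], 2: [3]}
    print(isomorphic(t1, 0, t2, "a"))
    print(isomorphic(t1, 0, t3, 0))
```

A checker of this kind can serve as an oracle when estimating the empirical false-positive rate of a randomized test on small trees.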

Problem 3

Suppose [math]\displaystyle{ x_1,x_2,\ldots,x_n }[/math] is an unsorted list of [math]\displaystyle{ n }[/math] distinct numbers. We sample (with replacement) [math]\displaystyle{ t }[/math] items uniformly at random from the list, and denote them as [math]\displaystyle{ Y_1,Y_2,\ldots,Y_t }[/math]. Obviously [math]\displaystyle{ \{Y_1,Y_2,\ldots,Y_t\}\subseteq \{x_1,x_2,\ldots,x_n\} }[/math].

Describe a strategy of choosing an [math]\displaystyle{ x }[/math] from the sampled set [math]\displaystyle{ \{Y_1,Y_2,\ldots,Y_t\} }[/math] such that [math]\displaystyle{ \mathrm{rank}(x) }[/math] is approximately [math]\displaystyle{ k }[/math]. Here [math]\displaystyle{ \mathrm{rank}(x) }[/math] denotes the rank of [math]\displaystyle{ x }[/math] in the original list [math]\displaystyle{ \{x_1,x_2,\ldots,x_n\} }[/math]: The rank of the largest number among [math]\displaystyle{ x_1,x_2,\ldots,x_n }[/math] is 1; the rank of the second largest number among [math]\displaystyle{ x_1,x_2,\ldots,x_n }[/math] is 2, and so on. Choose your [math]\displaystyle{ t }[/math] as small as possible (in big-O notation) so that with probability at least [math]\displaystyle{ 1-\delta }[/math], your strategy returns an [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ (1-\epsilon)k\le \mathrm{rank}(x)\le (1+\epsilon)k }[/math].
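The rank convention and the sampling step above can be made concrete with a small sketch. It deliberately does not choose any selection strategy, which is the point of the problem. The helper names `rank` and `sample_with_replacement` are hypothetical.

```python
import random


def rank(x, items):
    """Rank of x in items: the largest element has rank 1."""
    return 1 + sum(1 for y in items if y > x)


def sample_with_replacement(items, t, rng):
    """Draw t items uniformly at random, with replacement."""
    return [rng.choice(items) for _ in range(t)]


if __name__ == "__main__":
    rng = random.Random(1)
    xs = rng.sample(range(1_000_000), 1000)  # 1000 distinct numbers
    ys = sample_with_replacement(xs, 20, rng)
    # Ranks (in the original list) of the sampled items.
    print(sorted(rank(y, xs) for y in ys))
```

A candidate strategy can then be evaluated empirically by checking how often the rank of the returned item lands within the target interval.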

Problem 4

In the balls-and-bins model, we throw [math]\displaystyle{ m }[/math] balls independently and uniformly at random into [math]\displaystyle{ n }[/math] bins. We know that the maximum load is [math]\displaystyle{ \Theta\left(\frac{\log n}{\log\log n}\right) }[/math] with high probability when [math]\displaystyle{ m=\Theta(n) }[/math]. The two-choice paradigm is another way to throw [math]\displaystyle{ m }[/math] balls into [math]\displaystyle{ n }[/math] bins: each ball is thrown into the less loaded of two bins chosen independently and uniformly at random, with ties broken arbitrarily (the two chosen bins may coincide, in which case the ball is thrown into that bin). When [math]\displaystyle{ m=\Theta(n) }[/math], the maximum load of the two-choice paradigm is known to be [math]\displaystyle{ \Theta(\log\log n) }[/math] with high probability, which is exponentially smaller than the maximum load when there is only one random choice. This phenomenon is called the power of two choices.
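The two paradigms described above can be compared empirically with a short simulation. This is only a sketch for building intuition, not a substitute for the analysis; the function names and the choice to break ties toward the first sampled bin are assumptions of this example.

```python
import random


def max_load_one_choice(m, n, rng):
    """Throw m balls into n bins uniformly at random; return the maximum load."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return max(loads)


def max_load_two_choice(m, n, rng):
    """Throw m balls using the two-choice paradigm; return the maximum load."""
    loads = [0] * n
    for _ in range(m):
        i = rng.randrange(n)
        j = rng.randrange(n)  # may coincide with i
        # Ties are broken toward i, which is one arbitrary tie-breaking rule.
        target = i if loads[i] <= loads[j] else j
        loads[target] += 1
    return max(loads)


if __name__ == "__main__":
    rng = random.Random(2017)
    for n in (1_000, 10_000, 100_000):
        print(n, max_load_one_choice(n, n, rng), max_load_two_choice(n, n, rng))
```

The mixed paradigms in the questions below can be simulated the same way by switching the placement rule between balls.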

Here are the questions:

Consider the following paradigm: we throw [math]\displaystyle{ n }[/math] balls into [math]\displaystyle{ n }[/math] bins. The first [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins independently and uniformly at random. The remaining [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins using the two-choice paradigm. What is the maximum load with high probability? You need to give an asymptotically tight bound (in the form of [math]\displaystyle{ \Theta(\cdot) }[/math]).

Replace the above paradigm with the following: the first [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins using the two-choice paradigm while the remaining [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins independently and uniformly at random. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.

Replace the above paradigm with the following: assume all [math]\displaystyle{ n }[/math] balls are thrown in sequence. For every [math]\displaystyle{ 1\le i\le n }[/math], if [math]\displaystyle{ i }[/math] is odd, we throw the [math]\displaystyle{ i }[/math]-th ball into a bin chosen independently and uniformly at random; otherwise, we throw it using the two-choice paradigm. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.

Problem 5

Let [math]\displaystyle{ X }[/math] be a real-valued random variable with finite [math]\displaystyle{ \mathbb{E}[X] }[/math] and finite [math]\displaystyle{ \mathbb{E}\left[\mathrm{e}^{\lambda X}\right] }[/math] for all [math]\displaystyle{ \lambda\ge 0 }[/math]. We define the log-moment-generating function as

[math]\displaystyle{ \Psi_X(\lambda):=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0 }[/math],

and its dual function:

[math]\displaystyle{ \Psi_X^*(t):=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda)) }[/math].

Assume that [math]\displaystyle{ X }[/math] is NOT almost surely constant. Then due to the convexity of [math]\displaystyle{ \mathrm{e}^{\lambda X} }[/math] with respect to [math]\displaystyle{ \lambda }[/math], the function [math]\displaystyle{ \Psi_X(\lambda) }[/math] is strictly convex over [math]\displaystyle{ \lambda\ge 0 }[/math].
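For intuition, the two functions defined above can be evaluated numerically for a distribution with finite support. The sketch below approximates the supremum by a grid search over [math]\displaystyle{ \lambda }[/math]; the function names, the grid range `lam_max`, and the resolution `steps` are assumptions of this example, and the grid value can only underestimate the true supremum.

```python
import math


def log_mgf(values, probs, lam):
    """Psi_X(lambda) = ln E[exp(lambda X)] for a finite distribution."""
    return math.log(sum(p * math.exp(lam * v) for v, p in zip(values, probs)))


def dual_grid(values, probs, t, lam_max=20.0, steps=20000):
    """Grid approximation of Psi_X^*(t) = sup_{lambda >= 0} (lambda t - Psi_X(lambda)).

    Only lambda in [0, lam_max] is searched, so the result is a lower
    bound on the true supremum.
    """
    best = 0.0  # the value at lambda = 0, since Psi_X(0) = 0
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        best = max(best, lam * t - log_mgf(values, probs, lam))
    return best


if __name__ == "__main__":
    # A fair six-sided die, used purely as an illustrative distribution.
    values = [1, 2, 3, 4, 5, 6]
    probs = [1 / 6] * 6
    for t in (3.0, 4.0, 5.0, 5.5):
        print(t, dual_grid(values, probs, t))
```

For values of [math]\displaystyle{ t }[/math] at or below the mean, the grid search returns 0, attained at [math]\displaystyle{ \lambda=0 }[/math].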

Prove the following Chernoff bound:

[math]\displaystyle{ \Pr[X\ge t]\le\exp(-\Psi_X^*(t)) }[/math].

In particular, if [math]\displaystyle{ \Psi_X(\lambda) }[/math] is continuously differentiable, prove that the supremum in [math]\displaystyle{ \Psi_X^*(t) }[/math] is achieved at the unique [math]\displaystyle{ \lambda\ge 0 }[/math] satisfying

[math]\displaystyle{ \Psi_X'(\lambda)=t }[/math]

where [math]\displaystyle{ \Psi_X'(\lambda) }[/math] denotes the derivative of [math]\displaystyle{ \Psi_X(\lambda) }[/math] with respect to [math]\displaystyle{ \lambda }[/math].

Normal random variables. Let [math]\displaystyle{ X\sim \mathrm{N}(\mu,\sigma) }[/math] be a Gaussian random variable with mean [math]\displaystyle{ \mu }[/math] and standard deviation [math]\displaystyle{ \sigma }[/math]. What are the [math]\displaystyle{ \Psi_X(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t) }[/math]? And give a tail inequality to upper bound the probability [math]\displaystyle{ \Pr[X\ge t] }[/math].

Poisson random variables. Let [math]\displaystyle{ X\sim \mathrm{Pois}(\nu) }[/math] be a Poisson random variable with parameter [math]\displaystyle{ \nu }[/math], that is, [math]\displaystyle{ \Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k! }[/math] for all [math]\displaystyle{ k=0,1,2,\ldots }[/math]. What are the [math]\displaystyle{ \Psi_X(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t) }[/math]? And give a tail inequality to upper bound the probability [math]\displaystyle{ \Pr[X\ge t] }[/math].

Bernoulli random variables. Let [math]\displaystyle{ X\in\{0,1\} }[/math] be a single Bernoulli trial with probability of success [math]\displaystyle{ p }[/math], that is, [math]\displaystyle{ \Pr[X=1]=1-\Pr[X=0]=p }[/math]. Show that for any [math]\displaystyle{ t\in(p,1) }[/math], we have [math]\displaystyle{ \Psi_X^*(t)=D(Y \| X) }[/math] where [math]\displaystyle{ Y\in\{0,1\} }[/math] is a Bernoulli random variable with parameter [math]\displaystyle{ t }[/math] and [math]\displaystyle{ D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p} }[/math] is the Kullback-Leibler divergence between [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ X }[/math].

Sum of independent random variables. Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] be the sum of [math]\displaystyle{ n }[/math] independent and identically distributed random variables [math]\displaystyle{ X_1,X_2,\ldots, X_n }[/math]. Show that [math]\displaystyle{ \Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n}) }[/math]. Also, for a binomial random variable [math]\displaystyle{ X\sim \mathrm{Bin}(n,p) }[/math], give an upper bound on the tail probability [math]\displaystyle{ \Pr[X\ge t] }[/math] in terms of the KL-divergence.

Give an upper bound on [math]\displaystyle{ \Pr[X\ge t] }[/math] when every [math]\displaystyle{ X_i }[/math] follows the geometric distribution with success probability [math]\displaystyle{ p }[/math].

Problem 6

A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].

The distance between two codewords [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].
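The definitions of a linear code, Hamming distance, and code distance can be illustrated by a brute-force sketch for small parameters. The function names and the example matrix are assumptions chosen for illustration; the enumeration takes time exponential in [math]\displaystyle{ k }[/math], so it is only meant for tiny codes.

```python
from itertools import product


def encode(A, x):
    """C(x) = A x over the field of order two; A is a list of n rows of length k."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)


def hamming(y1, y2):
    """Hamming distance: the number of positions where y1 and y2 differ."""
    return sum(b1 != b2 for b1, b2 in zip(y1, y2))


def code_distance(A, k):
    """Minimum distance between codewords of distinct messages (brute force)."""
    words = [encode(A, x) for x in product((0, 1), repeat=k)]
    return min(
        hamming(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


if __name__ == "__main__":
    # Single parity-check code with k = 2, n = 3: the third bit is x1 + x2 mod 2.
    A = [[1, 0],
         [0, 1],
         [1, 1]]
    k, n = 2, len(A)
    print("rate:", k / n)
    print("distance:", code_distance(A, k))  # prints "distance: 2"
```

If the matrix is not injective, two distinct messages share a codeword and the computed distance is 0, in line with the definition above.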

Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
Prove a similar result for linear boolean codes.

Bonus problem

Let [math]\displaystyle{ X }[/math] be a centered random variable ([math]\displaystyle{ \mathbb{E}[X]=0 }[/math]) with finite [math]\displaystyle{ \mathbb{E}\left[\mathrm{e}^{\lambda |X|}\right] }[/math] for [math]\displaystyle{ \lambda\gt 0 }[/math]. We have the following two kinds of tail inequalities.

Chernoff bound: [math]\displaystyle{ \Pr[|X|\ge t]\le\inf_{\lambda\ge 0}\frac{\mathbb{E}\left[\mathrm{e}^{\lambda |X|}\right]}{\mathrm{e}^{\lambda t}} }[/math].
[math]\displaystyle{ k }[/math]-th moment bound: [math]\displaystyle{ \Pr[|X|\ge t]\le \frac{\mathbb{E}\left[|X|^k\right]}{t^k} }[/math].

Use the probabilistic method to show that for any [math]\displaystyle{ t\gt 0 }[/math], there exists a choice of [math]\displaystyle{ k }[/math] such that the [math]\displaystyle{ k }[/math]-th moment bound is strictly stronger than the Chernoff bound.
Why would we still prefer the Chernoff bound to the seemingly stronger [math]\displaystyle{ k }[/math]-th moment method?
