高级算法 (Fall 2020)/Problem Set 1

From TCS Wiki
Jump to navigation Jump to search
  • 每道题目的解答都要有完整的解题过程。中英文不限。

Problem 1

Modify the Karger's Contraction algorithm so that it works for the weighted min-cut problem. Prove that the modified algorithm returns a weighted minimum cut with probability at least [math]\displaystyle{ \frac{2}{n(n-1)} }[/math]. The weighted min-cut problem is defined as follows.

  • Input: an undirected weighted graph [math]\displaystyle{ G(V, E) }[/math], where every edge [math]\displaystyle{ e \in E }[/math] is associated with a positive real weight [math]\displaystyle{ w_e }[/math];
  • Output: a cut [math]\displaystyle{ C }[/math] in [math]\displaystyle{ G }[/math] such that [math]\displaystyle{ \sum_{e \in C} w_e }[/math] is minimized.

Problem 2

Let [math]\displaystyle{ X }[/math] be a real-valued random variable with finite [math]\displaystyle{ \mathbb{E}[X] }[/math] and finite [math]\displaystyle{ \mathbb{E}\left[\mathrm{e}^{\lambda X}\right] }[/math] for all [math]\displaystyle{ \lambda\ge 0 }[/math]. We define the log-moment-generating function as

[math]\displaystyle{ \Psi_X(\lambda):=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0 }[/math],

and its dual function:

[math]\displaystyle{ \Psi_X^*(t):=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda)) }[/math].

Assume that [math]\displaystyle{ X }[/math] is NOT almost surely constant. Then due to the convexity of [math]\displaystyle{ \mathrm{e}^{\lambda X} }[/math] with respect to [math]\displaystyle{ \lambda }[/math], the function [math]\displaystyle{ \Psi_X(\lambda) }[/math] is strictly convex over [math]\displaystyle{ \lambda\ge 0 }[/math].

  • Prove the following Chernoff bound:
[math]\displaystyle{ \Pr[X\ge t]\le\exp(-\Psi_X^*(t)) }[/math].
In particular if [math]\displaystyle{ \Psi_X(\lambda) }[/math] is continuously differentiable, prove that the supreme in [math]\displaystyle{ \Psi_X^*(t) }[/math] is achieved at the unique [math]\displaystyle{ \lambda\ge 0 }[/math] satisfying
[math]\displaystyle{ \Psi_X'(\lambda)=t }[/math]
where [math]\displaystyle{ \Psi_X'(\lambda) }[/math] denotes the derivative of [math]\displaystyle{ \Psi_X(\lambda) }[/math] with respect to [math]\displaystyle{ \lambda }[/math].
  • Normal random variables. Let [math]\displaystyle{ X\sim \mathrm{N}(\mu,\sigma) }[/math] be a Gaussian random variable with mean [math]\displaystyle{ \mu }[/math] and standard deviation [math]\displaystyle{ \sigma }[/math]. What are the [math]\displaystyle{ \Psi_X(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t) }[/math]? And give a tail inequality to upper bound the probability [math]\displaystyle{ \Pr[X\ge t] }[/math].
  • Poisson random variables. Let [math]\displaystyle{ X\sim \mathrm{Pois}(\nu) }[/math] be a Poisson random variable with parameter [math]\displaystyle{ \nu }[/math], that is, [math]\displaystyle{ \Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k! }[/math] for all [math]\displaystyle{ k=0,1,2,\ldots }[/math]. What are the [math]\displaystyle{ \Psi_X(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t) }[/math]? And give a tail inequality to upper bound the probability [math]\displaystyle{ \Pr[X\ge t] }[/math].
  • Bernoulli random variables. Let [math]\displaystyle{ X\in\{0,1\} }[/math] be a single Bernoulli trial with probability of success [math]\displaystyle{ p }[/math], that is, [math]\displaystyle{ \Pr[X=1]=1-\Pr[X=0]=p }[/math]. Show that for any [math]\displaystyle{ t\in(p,1) }[/math], we have [math]\displaystyle{ \Psi_X^*(t)=D(Y \| X) }[/math] where [math]\displaystyle{ Y\in\{0,1\} }[/math] is a Bernoulli random variable with parameter [math]\displaystyle{ t }[/math] and [math]\displaystyle{ D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p} }[/math] is the Kullback-Leibler divergence between [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ X }[/math].
  • Sum of independent random variables. Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] be the sum of [math]\displaystyle{ n }[/math] independently and identically distributed random variables [math]\displaystyle{ X_1,X_2,\ldots, X_n }[/math]. Show that [math]\displaystyle{ \Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda) }[/math] and [math]\displaystyle{ \Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n}) }[/math]. Also for binomial random variable [math]\displaystyle{ X\sim \mathrm{Bin}(n,p) }[/math], give an upper bound to the tail inequality [math]\displaystyle{ \Pr[X\ge t] }[/math] in terms of KL-divergence.
Give an upper bound to [math]\displaystyle{ \Pr[X\ge t] }[/math] when every [math]\displaystyle{ X_i }[/math] follows the geometric distribution with a probability [math]\displaystyle{ p }[/math] of success.

Problem 3

Let [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] be [math]\displaystyle{ n }[/math] random variables, where each [math]\displaystyle{ X_i \in \{0, 1\} }[/math] follows the distribution [math]\displaystyle{ \mu_i }[/math]. For each [math]\displaystyle{ 1\leq i \leq n }[/math], let [math]\displaystyle{ \rho_i = \mathbb{E}[X_i] }[/math] and assume [math]\displaystyle{ \rho_i \geq \frac{1}{2} }[/math]. Consider the problem of estimating the value of

[math]\displaystyle{ Z = \prod_{i = 1}^n \rho_i }[/math].

For each [math]\displaystyle{ 1\leq i \leq n }[/math], the algorithm draws [math]\displaystyle{ s }[/math] random samples [math]\displaystyle{ X_i^{(1)},X_i^{(2)},\ldots,X_i^{(s)} }[/math] independently from the distribution [math]\displaystyle{ \mu_i }[/math], and computes

[math]\displaystyle{ \widehat{\rho}_{i}=\frac{1}{s}\sum_{j=1}^s X_i^{(j)} }[/math].

Finally, the algorithm outputs the product of all [math]\displaystyle{ \widehat{Z}_{i} }[/math]:

[math]\displaystyle{ \widehat{Z}=\prod_{i= 1}^n\widehat{\rho}_i }[/math].

Express [math]\displaystyle{ s }[/math] as a function of [math]\displaystyle{ n,\varepsilon,\delta }[/math] so that the output [math]\displaystyle{ \widehat{Z} }[/math] satisfies

[math]\displaystyle{ \Pr\left[\mathrm{e}^{-\varepsilon}Z \leq \widehat{Z} \leq \mathrm{e}^{\varepsilon}Z\right] \geq 1- \delta }[/math].

Try to make [math]\displaystyle{ s }[/math] as small as possible.

Problem 4

In Balls-and-Bins model, we throw [math]\displaystyle{ m }[/math] balls independently and uniformly at random into [math]\displaystyle{ n }[/math] bins. We know that the maximum load is [math]\displaystyle{ \Theta\left(\frac{\log n}{\log\log n}\right) }[/math] with high probability when [math]\displaystyle{ m=\Theta(n) }[/math]. The two-choice paradigm is another way to throw [math]\displaystyle{ m }[/math] balls into [math]\displaystyle{ n }[/math] bins: each ball is thrown into the least loaded of two bins chosen independently and uniformly at random(it could be the case that the two chosen bins are exactly the same, and then the ball will be thrown into that bin), and breaks the tie arbitrarily. When [math]\displaystyle{ m=\Theta(n) }[/math], the maximum load of two-choice paradigm is known to be [math]\displaystyle{ \Theta(\log\log n) }[/math] with high probability, which is exponentially less than the maxim load when there is only one random choice. This phenomenon is called the power of two choices.

Here are the questions:

  • Consider the following paradigm: we throw [math]\displaystyle{ n }[/math] balls into [math]\displaystyle{ n }[/math] bins. The first [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins independently and uniformly at random. The remaining [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins using the two-choice paradigm. What is the maximum load with high probability? You need to give an asymptotically tight bound (in the form of [math]\displaystyle{ \Theta(\cdot) }[/math]).
  • Replace the above paradigm to the following: the first [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins using the two-choice paradigm while the remaining [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins independently and uniformly at random. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.
  • Replace the above paradigm to the following: assume all [math]\displaystyle{ n }[/math] balls are thrown in a sequence. For every [math]\displaystyle{ 1\le i\le n }[/math], if [math]\displaystyle{ i }[/math] is odd, we throw [math]\displaystyle{ i }[/math]-th ball into bins independently and uniformly at random, otherwise, we throw it into bins using the two-choice paradigm. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.

Problem 5

Consider the function [math]\displaystyle{ f:\mathbb{R}^n\to\mathbb{R} }[/math] defined as

[math]\displaystyle{ f(\vec x)=f(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}(a_ix_i-b_i) }[/math],

where [math]\displaystyle{ \{a_i\}_{1\le i\le n} }[/math] and [math]\displaystyle{ \{b_i\}_{1\le i\le n} }[/math] are unknown coefficients satisfy that [math]\displaystyle{ a_i, b_i\in \mathbb{Z} }[/math] and [math]\displaystyle{ 0\le a_i, b_i \le n }[/math] for all [math]\displaystyle{ 1\le i\le n }[/math].

Let [math]\displaystyle{ p\gt n }[/math] be the smallest prime strictly greater than [math]\displaystyle{ n }[/math]. The function [math]\displaystyle{ g:\mathbb{Z}_p^n\to\mathbb{Z}_p }[/math] is defined as

[math]\displaystyle{ g(\vec x)=g(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}(a_ix_i-b_i) }[/math],

where [math]\displaystyle{ + }[/math] and [math]\displaystyle{ \cdot }[/math] are defined over the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math].

By the properties of finite field, for any value [math]\displaystyle{ \vec r\in\mathbb{Z}_p^n }[/math], it holds that [math]\displaystyle{ g(\vec r)=f(\vec r)\bmod p }[/math].

Since the coefficients [math]\displaystyle{ \{a_i\}_{1\le i\le n} }[/math] and [math]\displaystyle{ \{b_i\}_{1\le i\le n} }[/math] are unknown, you can't calculate [math]\displaystyle{ f(\vec x) }[/math] directly. However, there exists an oracle [math]\displaystyle{ O }[/math], each time [math]\displaystyle{ O }[/math] gets an input [math]\displaystyle{ \vec x }[/math], it immediately outputs the value of [math]\displaystyle{ g(\vec x) }[/math].

1. Prove that [math]\displaystyle{ f\not\equiv 0 \Rightarrow g\not\equiv 0 }[/math].

2. Use the oracle [math]\displaystyle{ O }[/math] to design an algorithm to determine whether [math]\displaystyle{ f\equiv 0 }[/math], with error probability at most [math]\displaystyle{ \epsilon }[/math], where [math]\displaystyle{ \epsilon\in (0,1) }[/math] is a constant.

Problem 6

Recall that a function [math]\displaystyle{ f }[/math] is said to be convex if, for any [math]\displaystyle{ x_1, x_2 }[/math] and for [math]\displaystyle{ 0 \leq \lambda \leq 1 }[/math], [math]\displaystyle{ f(\lambda x_1 + (1 − \lambda)x_2 ) \leq \lambda f (x_1) + (1 − \lambda)f (x_2) }[/math].

(a) Prove Jensen's Inequality: If [math]\displaystyle{ f }[/math] is a convex function, and [math]\displaystyle{ X }[/math] is a random variable which takes on a set of finite values, then [math]\displaystyle{ \mathbb{E}[f(X)] \geq f (\mathbb{E}[X]) }[/math].

(b) Use Jensen's inequality to prove log-sum inequality: For nonnegative numbers, [math]\displaystyle{ a_1, a_2, \dots , a_n }[/math] and [math]\displaystyle{ b_1, b_2, \dots , b_n }[/math],

$$\sum\limits_{i=1}^{n}a_i\log{\frac{a_i}{b_i}}\geq\left(\sum\limits_{i=1}^na_i\right)\log{\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n{b_i}}}.$$

(c) Let [math]\displaystyle{ X_1,X_2,\dots,X_n }[/math] be [math]\displaystyle{ n }[/math] independent random variable which takes on a set of finite values in the range [math]\displaystyle{ [a,b] }[/math]. Let [math]\displaystyle{ X=\sum\limits_{i=1}^{n}X_i }[/math] and [math]\displaystyle{ \mu=\mathbb{E}[X] }[/math]. Use the fact that [math]\displaystyle{ f(x) = e^{tx} }[/math] is convex for any [math]\displaystyle{ t \geq 0 }[/math] to show [math]\displaystyle{ \text{Pr}[\vert X-\mu\vert \geq t]\leq 2exp(-\frac{t^2}{2n(b-a)^2}) }[/math], based on a Chernoff bound for independent Poisson trials.

(Hint: Use the bound for Rademacher random variables: For a random variable [math]\displaystyle{ X }[/math] with [math]\displaystyle{ \text{Pr}[X=1]=\text{Pr}[X=-1]=\frac{1}{2} }[/math], it holds that [math]\displaystyle{ \mathbb{E}[e^{tX}]\leq e^{\frac{t^2}{2}} }[/math].)