高级算法 (Fall 2020)/Problem Set 1
*Every problem must be answered with a complete solution process. Solutions may be written in either Chinese or English.
== Problem 1 ==
Let <math>X</math> be a real-valued random variable with finite <math>\mathbb{E}[X]</math> and finite <math>\mathbb{E}\left[\mathrm{e}^{\lambda X}\right]</math> for all <math>\lambda\ge 0</math>. We define the log-moment-generating function as
:<math>\Psi_X(\lambda):=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0</math>,
and its dual function:
:<math>\Psi_X^*(t):=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda))</math>.
Assume that <math>X</math> is NOT almost surely constant. Then, due to the convexity of <math>\mathrm{e}^{\lambda X}</math> with respect to <math>\lambda</math>, the function <math>\Psi_X(\lambda)</math> is strictly convex over <math>\lambda\ge 0</math>.
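As a quick numerical illustration of these definitions (not part of the problem), the sketch below evaluates <math>\Psi_X(\lambda)</math> and <math>\Psi_X^*(t)</math> for a Bernoulli variable with an arbitrarily chosen parameter. It assumes NumPy and SciPy are available; the names <code>psi</code> and <code>psi_star</code> are ours, not from the problem.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

P = 0.3  # hypothetical Bernoulli parameter, chosen only for illustration

def psi(lam, p=P):
    # Log-MGF of Bernoulli(p): ln E[e^{lam*X}] = ln(1 - p + p*e^{lam})
    return np.log(1 - p + p * np.exp(lam))

def psi_star(t, p=P):
    # Dual function: sup_{lam >= 0} (lam*t - psi(lam)),
    # found here by bounded scalar minimization of the negation.
    res = minimize_scalar(lambda lam: -(lam * t - psi(lam, p)),
                          bounds=(0.0, 50.0), method="bounded")
    return -res.fun

# The Chernoff bound from this problem: Pr[X >= t] <= exp(-psi_star(t)).
for t in (0.4, 0.6, 0.8):
    print(f"t = {t}: exp(-psi_star(t)) = {np.exp(-psi_star(t)):.4f}")
</syntaxhighlight>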
*Prove the following Chernoff bound:
*:<math>\Pr[X\ge t]\le\exp(-\Psi_X^*(t))</math>.
*:In particular, if <math>\Psi_X(\lambda)</math> is continuously differentiable, prove that the supremum in <math>\Psi_X^*(t)</math> is achieved at the unique <math>\lambda\ge 0</math> satisfying
*:<math>\Psi_X'(\lambda)=t</math>,
*:where <math>\Psi_X'(\lambda)</math> denotes the derivative of <math>\Psi_X(\lambda)</math> with respect to <math>\lambda</math>. (A standard first step is sketched after this list.)
*'''Normal random variables.''' Let <math>X\sim \mathrm{N}(\mu,\sigma)</math> be a Gaussian random variable with mean <math>\mu</math> and standard deviation <math>\sigma</math>. What are <math>\Psi_X(\lambda)</math> and <math>\Psi_X^*(t)</math>? Give a tail inequality that upper bounds the probability <math>\Pr[X\ge t]</math>.
*'''Poisson random variables.''' Let <math>X\sim \mathrm{Pois}(\nu)</math> be a Poisson random variable with parameter <math>\nu</math>, that is, <math>\Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k!</math> for all <math>k=0,1,2,\ldots</math>. What are <math>\Psi_X(\lambda)</math> and <math>\Psi_X^*(t)</math>? Give a tail inequality that upper bounds the probability <math>\Pr[X\ge t]</math>.
*'''Bernoulli random variables.''' Let <math>X\in\{0,1\}</math> be a single Bernoulli trial with probability of success <math>p</math>, that is, <math>\Pr[X=1]=1-\Pr[X=0]=p</math>. Show that for any <math>t\in(p,1)</math>, we have <math>\Psi_X^*(t)=D(Y \| X)</math>, where <math>Y\in\{0,1\}</math> is a Bernoulli random variable with parameter <math>t</math> and <math>D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p}</math> is the Kullback-Leibler divergence between <math>Y</math> and <math>X</math>.
*'''Sum of independent random variables.''' Let <math>X=\sum_{i=1}^nX_i</math> be the sum of <math>n</math> independent and identically distributed random variables <math>X_1,X_2,\ldots, X_n</math>. Show that <math>\Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda)</math> and <math>\Psi_X^*(t)=n\Psi^*_{X_i}\left(\frac{t}{n}\right)</math>. Also, for a binomial random variable <math>X\sim \mathrm{Bin}(n,p)</math>, give an upper bound on the tail probability <math>\Pr[X\ge t]</math> in terms of the KL divergence.
*Give an upper bound on <math>\Pr[X\ge t]</math> when every <math>X_i</math> follows the geometric distribution with success probability <math>p</math>.
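As a hint for the first part (a sketch of the standard starting point, not a full solution): for any fixed <math>\lambda> 0</math>, applying Markov's inequality to the nonnegative variable <math>\mathrm{e}^{\lambda X}</math> gives
:<math>\Pr[X\ge t]=\Pr\left[\mathrm{e}^{\lambda X}\ge \mathrm{e}^{\lambda t}\right]\le \mathrm{e}^{-\lambda t}\,\mathbb{E}\left[\mathrm{e}^{\lambda X}\right]=\exp\left(-\left(\lambda t-\Psi_X(\lambda)\right)\right)</math>,
while <math>\lambda=0</math> gives the trivial bound <math>1</math>; it remains to optimize the exponent over <math>\lambda\ge 0</math>.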
== Problem 2 ==
Let <math>X_1,X_2,\ldots,X_n</math> be <math>n</math> random variables, where each <math>X_i \in \{0, 1\}</math> follows the distribution <math>\mu_i</math>. For each <math>1\leq i \leq n</math>, let <math>\rho_i = \mathbb{E}[X_i]</math> and assume <math>\rho_i \geq \frac{1}{2}</math>. Consider the problem of estimating the value of
:<math>Z = \prod_{i = 1}^n \rho_i</math>.
For each <math>1\leq i \leq n</math>, the algorithm draws <math>s</math> random samples <math>X_i^{(1)},X_i^{(2)},\ldots,X_i^{(s)}</math> independently from the distribution <math>\mu_i</math>, and computes the empirical mean
:<math>\widehat{\rho}_{i}=\frac{1}{s}\sum_{j=1}^s X_i^{(j)}</math>.
Finally, the algorithm outputs the product of all <math>\widehat{\rho}_{i}</math>:
:<math>\widehat{Z}=\prod_{i= 1}^n\widehat{\rho}_i</math>.
Express <math>s</math> as a function of <math>n,\varepsilon,\delta</math> so that the output <math>\widehat{Z}</math> satisfies
:<math>\Pr\left[\mathrm{e}^{-\varepsilon}Z \leq \widehat{Z} \leq \mathrm{e}^{\varepsilon}Z\right] \geq 1- \delta</math>.
Try to make <math>s</math> as small as possible.
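For intuition (not a solution), here is a minimal simulation sketch of the estimator described above; the values of <math>n</math>, <math>\rho_i</math>, and <math>s</math> are hypothetical, since choosing <math>s</math> is exactly what the problem asks for.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def estimate_Z(rhos, s):
    # For each i, draw s Bernoulli(rho_i) samples, average them to get
    # rho_hat_i, and output the product of the rho_hat_i, as described above.
    rhos = np.asarray(rhos)
    samples = rng.random((len(rhos), s)) < rhos[:, None]  # X_i^{(j)} ~ Bernoulli(rho_i)
    rho_hat = samples.mean(axis=1)                        # empirical means rho_hat_i
    return rho_hat.prod()

# Hypothetical instance: n = 20 and every rho_i = 0.9 (all >= 1/2, as assumed).
rhos = [0.9] * 20
print("Z     =", np.prod(rhos))
print("Z_hat =", estimate_Z(rhos, s=10_000))
</syntaxhighlight>

Rerunning with different seeds and values of <math>s</math> informally suggests how the per-factor relative errors accumulate across the <math>n</math> factors, which is why <math>s</math> must grow with <math>n</math>.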