高级算法 (Fall 2024)/Problem Set 1: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
Kvrmnks (talk | contribs)
Line 10: Line 10:


== Problem 3 (Hashing) ==
== Problem 3 (Hashing) ==
Let <math>X_1,X_2,\ldots,X_n</math> be <math>n</math> random variables, where each <math>X_i \in \{0, 1\}</math> follows the distribution <math>\mu_i</math>. For each <math>1\leq i \leq n</math>, let <math>\rho_i = \mathbb{E}[X_i]</math> and assume <math>\rho_i \geq \frac{1}{2}</math>. Consider the problem of estimating the value of
:<math>Z = \prod_{i = 1}^n \rho_i</math>.
For each <math>1\leq  i \leq n</math>, the algorithm draws <math>s</math> random samples <math>X_i^{(1)},X_i^{(2)},\ldots,X_i^{(s)}</math> independently from the distribution <math>\mu_i</math>, and computes
:<math>\widehat{\rho}_{i}=\frac{1}{s}\sum_{j=1}^s X_i^{(j)}</math>.
Finally, the algorithm outputs the product of all <math>\widehat{Z}_{i}</math>:
:<math>\widehat{Z}=\prod_{i= 1}^n\widehat{\rho}_i</math>.
Express <math>s</math> as a function of <math>n,\varepsilon,\delta</math> so that the output <math>\widehat{Z}</math> satisfies
:<math>\Pr\left[(1 - \varepsilon) Z \leq \widehat{Z} \leq (1 + \varepsilon)Z\right] \geq 1- \delta</math>.
Try to make <math>s</math> as small as possible.


== Problem 4 (Concentration of measure) ==
== Problem 4 (Concentration of measure) ==

Revision as of 03:15, 28 September 2024

  • 每道题目的解答都要有完整的解题过程,中英文不限。
  • 我们推荐大家使用LaTeX, markdown等对作业进行排版。

Problem 1 (Min-cut/Max-cut)

For any [math]\displaystyle{ \alpha \ge 1 }[/math], a cut is called an [math]\displaystyle{ \alpha }[/math]-approximate min-cut in a multigraph [math]\displaystyle{ G }[/math] if the number of edges in it is at most [math]\displaystyle{ \alpha }[/math] times that of the min-cut. Prove that the number of [math]\displaystyle{ \alpha }[/math]-approximate min-cuts in a multigraph [math]\displaystyle{ G }[/math] is at most [math]\displaystyle{ n^{2\alpha} / 2 }[/math]. (Hint: Run Karger's algorithm until it has [math]\displaystyle{ \lceil 2\alpha \rceil }[/math] supernodes. What is the chance that a particular [math]\displaystyle{ \alpha }[/math]-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)

Problem 2 (Fingerprinting)

Two rooted trees [math]\displaystyle{ T_1 }[/math] and [math]\displaystyle{ T_2 }[/math] are said to be isomorphic if there exists a one to one mapping [math]\displaystyle{ f }[/math] from the nodes of [math]\displaystyle{ T_1 }[/math] to those of [math]\displaystyle{ T_2 }[/math] satisfying the following condition: [math]\displaystyle{ v }[/math] is a child of [math]\displaystyle{ w }[/math] in [math]\displaystyle{ T_1 }[/math] if and only if [math]\displaystyle{ f(v) }[/math] is a child of [math]\displaystyle{ f(w) }[/math] in [math]\displaystyle{ T_2 }[/math]. Observe that no ordering is assumed on the children of any vertex. Devise an efficient randomized algorithm for testing the isomorphism of rooted trees and analyze its performance. Hint: Recursively associate a polynomial [math]\displaystyle{ P_v }[/math] with each vertex [math]\displaystyle{ v }[/math] in a tree [math]\displaystyle{ T }[/math].

Problem 3 (Hashing)

Let [math]\displaystyle{ X_1,X_2,\ldots,X_n }[/math] be [math]\displaystyle{ n }[/math] random variables, where each [math]\displaystyle{ X_i \in \{0, 1\} }[/math] follows the distribution [math]\displaystyle{ \mu_i }[/math]. For each [math]\displaystyle{ 1\leq i \leq n }[/math], let [math]\displaystyle{ \rho_i = \mathbb{E}[X_i] }[/math] and assume [math]\displaystyle{ \rho_i \geq \frac{1}{2} }[/math]. Consider the problem of estimating the value of

[math]\displaystyle{ Z = \prod_{i = 1}^n \rho_i }[/math].

For each [math]\displaystyle{ 1\leq i \leq n }[/math], the algorithm draws [math]\displaystyle{ s }[/math] random samples [math]\displaystyle{ X_i^{(1)},X_i^{(2)},\ldots,X_i^{(s)} }[/math] independently from the distribution [math]\displaystyle{ \mu_i }[/math], and computes

[math]\displaystyle{ \widehat{\rho}_{i}=\frac{1}{s}\sum_{j=1}^s X_i^{(j)} }[/math].

Finally, the algorithm outputs the product of all [math]\displaystyle{ \widehat{Z}_{i} }[/math]:

[math]\displaystyle{ \widehat{Z}=\prod_{i= 1}^n\widehat{\rho}_i }[/math].

Express [math]\displaystyle{ s }[/math] as a function of [math]\displaystyle{ n,\varepsilon,\delta }[/math] so that the output [math]\displaystyle{ \widehat{Z} }[/math] satisfies

[math]\displaystyle{ \Pr\left[(1 - \varepsilon) Z \leq \widehat{Z} \leq (1 + \varepsilon)Z\right] \geq 1- \delta }[/math].

Try to make [math]\displaystyle{ s }[/math] as small as possible.

Problem 4 (Concentration of measure)

Consider the Erdős–Rényi random graph [math]\displaystyle{ G(n, p) }[/math] where every two vertices in the graph are connected randomly and independently with probability [math]\displaystyle{ p }[/math]. We denote [math]\displaystyle{ G \sim G(n, p) }[/math] if [math]\displaystyle{ G }[/math] is generated in this way. Recall that [math]\displaystyle{ \chi(G) }[/math] is the chromatic number of the graph [math]\displaystyle{ G }[/math].

(a.) For [math]\displaystyle{ 0 \lt p_1 \lt p_2 \lt 1 }[/math], let [math]\displaystyle{ G_1 \sim G(n, p_1) }[/math] and let [math]\displaystyle{ G_2 \sim G(n, p_2) }[/math]. Compare [math]\displaystyle{ \mathbf{E}[\chi(G_1)] }[/math] and [math]\displaystyle{ \mathbf{E}[\chi(G_2)] }[/math] and prove it.

(b.) For [math]\displaystyle{ G \sim G(n, n^{-\alpha}) }[/math] with [math]\displaystyle{ \alpha \gt 5/6 }[/math], prove that there exists a constant [math]\displaystyle{ C }[/math] such that every subgraph of [math]\displaystyle{ G }[/math] on [math]\displaystyle{ C\sqrt{n} }[/math] vertices is [math]\displaystyle{ 3 }[/math]-colorable with probability [math]\displaystyle{ 1 - o(1) }[/math].

(c.) For [math]\displaystyle{ G \sim G(n, n^{-\alpha}) }[/math] with [math]\displaystyle{ \alpha \gt 5/6 }[/math], show that [math]\displaystyle{ \chi(G) }[/math] is concentrated on four values with probability at least [math]\displaystyle{ 1 - o(1) }[/math].

Problem 5 (Dimension reduction)

In machine learning, the goal of many classification methods is to seperate data into classes using a hyperplane. A hyperplane in [math]\displaystyle{ \mathbb{R}^d }[/math] is characterized by a unit vector [math]\displaystyle{ a\in \mathbb{R}^d (\|a\|_2 = 1) }[/math] and [math]\displaystyle{ c\in \mathbb{R} }[/math]. It contains all [math]\displaystyle{ z\in \mathbb{R}^d }[/math] such that [math]\displaystyle{ a^\top z = c }[/math]. Suppose our dataset consists of [math]\displaystyle{ n }[/math] unit vectors in [math]\displaystyle{ \mathbb{R}^d }[/math]. These points can be separated into two linearly separable sets [math]\displaystyle{ X,Y }[/math] where [math]\displaystyle{ |X|+|Y| = n }[/math]. That is, for all [math]\displaystyle{ x\in X }[/math], [math]\displaystyle{ a^\top x\gt c }[/math] and for all [math]\displaystyle{ y\in Y }[/math], [math]\displaystyle{ a^\top y\lt c }[/math] (or vice versa). Furthermore, suppose that the [math]\displaystyle{ \ell_2 }[/math] distance of each point in [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] to this separating hyperplane is at least [math]\displaystyle{ \epsilon }[/math]. When this is the case, the hyperplane is said to have margin [math]\displaystyle{ \epsilon }[/math].

  1. Show that [math]\displaystyle{ X,Y }[/math] can be separated by the hyperplane characterized by [math]\displaystyle{ a\in \mathbb{R}^d (\|a\|_2 = 1) }[/math] and [math]\displaystyle{ c\in \mathbb{R} }[/math] with margin [math]\displaystyle{ \epsilon }[/math] is equivalent to the following condition: for all [math]\displaystyle{ x\in X }[/math], [math]\displaystyle{ a^\top x \gt c+\epsilon }[/math] and for all [math]\displaystyle{ y\in Y }[/math], [math]\displaystyle{ a^\top y \lt c-\epsilon }[/math] (or vice versa).
  2. Show that if we use a Johnson-Lindenstrauss map [math]\displaystyle{ A\in \mathbb{R}^{k\times d} }[/math] (the scaled Gaussian matrix given in the lecture) to reduce our data points to [math]\displaystyle{ O(\log n/\epsilon^2) }[/math] dimensions, then with probability at least [math]\displaystyle{ 9/10 }[/math], the dimension reduced data can still be separated by a hyperplane with margin [math]\displaystyle{ \epsilon/4 }[/math]. (Hint: use the fact that JLT preserves inner product.)