高级算法 (Fall 2025)/Problem Set 1
- Every problem must be answered with a complete solution process; you may write in either Chinese or English.
- We recommend typesetting your homework with LaTeX, Markdown, or similar tools.
Problem 1 (s–t Min-Cut)
Consider adapting Karger's min-cut algorithm to the problem of finding an [math]\displaystyle{ s }[/math]–[math]\displaystyle{ t }[/math] min-cut in an undirected graph. In this problem, we are given an undirected graph [math]\displaystyle{ G }[/math] together with two distinguished vertices [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math]. An [math]\displaystyle{ s }[/math]–[math]\displaystyle{ t }[/math] min-cut is a set of edges whose removal disconnects [math]\displaystyle{ s }[/math] from [math]\displaystyle{ t }[/math]; we seek an edge set of minimum cardinality.
As the algorithm proceeds, the vertex [math]\displaystyle{ s }[/math] may get amalgamated into a new vertex as the result of an edge being contracted; we call this vertex the [math]\displaystyle{ s }[/math]-vertex (initially [math]\displaystyle{ s }[/math] itself). Similarly, we have a [math]\displaystyle{ t }[/math]-vertex. As we run the contraction algorithm, we enforce that we never contract an edge between the [math]\displaystyle{ s }[/math]-vertex and the [math]\displaystyle{ t }[/math]-vertex.
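For concreteness, here is a minimal Python sketch of the restricted contraction procedure described above; the multigraph is stored as a plain edge list, and the function name and representation are illustrative assumptions rather than part of the problem.

<syntaxhighlight lang="python">
import random

def st_contract(edges, s, t):
    """Karger-style contraction that never merges the s-vertex with the t-vertex.

    edges: list of (u, v) pairs of an undirected (multi)graph.
    Returns the edges crossing the cut between the final s-vertex and t-vertex.
    """
    vertices = {u for e in edges for u in e} | {s, t}
    leader = {v: v for v in vertices}   # leader[v] tracks v's current super-vertex

    def find(v):
        while leader[v] != v:
            leader[v] = leader[leader[v]]   # path halving
            v = leader[v]
        return v

    alive = set(vertices)
    while len(alive) > 2:
        # surviving multigraph edges, excluding any edge between the s-vertex and t-vertex
        candidates = [(u, v) for (u, v) in edges
                      if find(u) != find(v)
                      and {find(u), find(v)} != {find(s), find(t)}]
        if not candidates:
            break
        u, v = random.choice(candidates)    # uniform over the allowed multigraph edges
        ru, rv = find(u), find(v)
        leader[rv] = ru                     # contract the chosen edge
        alive.discard(rv)

    # edges whose endpoints lie in different surviving super-vertices form the output cut
    return [(u, v) for (u, v) in edges if find(u) != find(v)]
</syntaxhighlight>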
(a) Show that there are graphs (not multi-graphs) in which the probability that this algorithm finds an [math]\displaystyle{ s }[/math]–[math]\displaystyle{ t }[/math] min-cut is exponentially small.
(b) How large (asymptotically) can the number of [math]\displaystyle{ s }[/math]–[math]\displaystyle{ t }[/math] min-cuts in a graph be?
Problem 2 (Count Sketch)
In class we learned about the Count-Min sketch. We saw that it could be used to estimate the frequency of any item in our stream up to an additive error [math]\displaystyle{ \epsilon \lVert \mathbf{x} \rVert_1 }[/math], where [math]\displaystyle{ \lVert \mathbf{x} \rVert_1 = n }[/math] is the total number of elements streamed in.
In this problem, we'll analyze an alternative algorithm that requires a bit more space, but can estimate the value of [math]\displaystyle{ x_i }[/math] to within error [math]\displaystyle{ \epsilon \lVert \mathbf{x} \rVert_2 }[/math], which is often much better in practice.
We'll analyze the following procedure:
- For a small value [math]\displaystyle{ q }[/math] to be set later, choose a random hash function [math]\displaystyle{ h(\cdot) }[/math] that maps every [math]\displaystyle{ i \in \{1, \dots, N\} }[/math] to [math]\displaystyle{ \{1, \dots, q\} }[/math]. Choose another random hash function [math]\displaystyle{ g(\cdot) }[/math] that maps every [math]\displaystyle{ i \in \{1, \dots, N\} }[/math] to [math]\displaystyle{ \{-1, 1\} }[/math]. Allocate space for [math]\displaystyle{ q }[/math] counters [math]\displaystyle{ C_1, \dots, C_q }[/math] (all initialized to 0).
- When [math]\displaystyle{ \mathsf{Increment}(x_i) }[/math] is called, set [math]\displaystyle{ C_{h(i)} \leftarrow C_{h(i)} + g(i) }[/math].
- When [math]\displaystyle{ \mathsf{Estimate}(x_i) }[/math] is called, return [math]\displaystyle{ y_i = g(i) C_{h(i)} }[/math].
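For concreteness, here is a minimal Python sketch of this procedure; the class name and the dictionary-based simulation of the random hash functions [math]\displaystyle{ h }[/math] and [math]\displaystyle{ g }[/math] are illustrative assumptions (a streaming implementation would use compact hash families instead).

<syntaxhighlight lang="python">
import random

class CountSketchRow:
    """One row of the sketch: q counters plus simulated hash functions h and g."""

    def __init__(self, q):
        self.q = q
        self.C = [0] * q        # counters C_1, ..., C_q
        self._h = {}            # h: item -> bucket in {0, ..., q-1}
        self._g = {}            # g: item -> sign in {-1, +1}

    def _hash(self, i):
        # lazily draw and cache independent random values to simulate h(i), g(i)
        if i not in self._h:
            self._h[i] = random.randrange(self.q)
            self._g[i] = random.choice((-1, 1))
        return self._h[i], self._g[i]

    def increment(self, i):
        """Process one occurrence of item i: C_{h(i)} += g(i)."""
        b, s = self._hash(i)
        self.C[b] += s

    def estimate(self, i):
        """Return y_i = g(i) * C_{h(i)}."""
        b, s = self._hash(i)
        return s * self.C[b]
</syntaxhighlight>

For example, after three calls to increment(7) and one call to increment(2), estimate(7) returns 3 plus a zero-mean term of magnitude at most 1 coming from a possible collision of item 2 with item 7 under [math]\displaystyle{ h }[/math].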
Now answer the following:
(a) Show that [math]\displaystyle{ \mathbb{E}[y_i] = x_i }[/math]. In other words, show that our estimate for [math]\displaystyle{ x_i }[/math] is correct in expectation.
(b) Show that [math]\displaystyle{ \mathrm{Var}[y_i] \leq \frac{\lVert \mathbf{x} \rVert_2^2}{q} }[/math].
(c) What value of [math]\displaystyle{ q }[/math] would we need to ensure that we obtain [math]\displaystyle{ \epsilon }[/math] error with probability 9/10? How many counters do we need to store in comparison to Count-Min?
Problem 3 (Concentration of Measure I)
Let [math]\displaystyle{ P, Q }[/math] be two probability distributions on a finite set [math]\displaystyle{ \mathcal{X} }[/math], with probability mass functions [math]\displaystyle{ p(x) }[/math] and [math]\displaystyle{ q(x) }[/math]. Let [math]\displaystyle{ U_1, \dots, U_n }[/math] be i.i.d. samples from [math]\displaystyle{ P }[/math], and [math]\displaystyle{ V_1, \dots, V_n }[/math] be i.i.d. samples from [math]\displaystyle{ Q }[/math]. Define the log-likelihood ratio transforms
[math]\displaystyle{ X_i = \log \frac{p(U_i)}{q(U_i)}, \qquad Y_i = \log \frac{p(V_i)}{q(V_i)}. }[/math]
Also define
[math]\displaystyle{ D_{1/2}(P\|Q) = -2 \log \sum_{x\in \mathcal{X}} \sqrt{p(x)q(x)}. }[/math]
Show that
[math]\displaystyle{ \Pr\!\left(\sum_{i=1}^n X_i \le \sum_{i=1}^n Y_i\right) \;\le\; \exp\!\left(-n D_{1/2}(P\|Q)\right). }[/math]
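For intuition only, the following sketch (not part of the problem) picks two small distributions, computes [math]\displaystyle{ D_{1/2}(P\|Q) }[/math], and Monte Carlo estimates the probability on the left-hand side to compare it against the claimed bound; all numerical choices are assumptions.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# two toy pmfs on a three-element alphabet (illustrative choices)
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.3, 0.5])

# D_{1/2}(P || Q) = -2 log sum_x sqrt(p(x) q(x))
d_half = -2.0 * np.log(np.sum(np.sqrt(p * q)))

n, trials = 20, 200_000
log_ratio = np.log(p) - np.log(q)             # x -> log(p(x) / q(x))

U = rng.choice(3, size=(trials, n), p=p)      # i.i.d. samples from P
V = rng.choice(3, size=(trials, n), p=q)      # i.i.d. samples from Q
empirical = np.mean(log_ratio[U].sum(axis=1) <= log_ratio[V].sum(axis=1))

print(f"empirical probability = {empirical:.2e}, "
      f"bound exp(-n * D_1/2) = {np.exp(-n * d_half):.2e}")
</syntaxhighlight>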
Problem 4 (Concentration of Measure II)
Consider the Erdős–Rényi random graph [math]\displaystyle{ G(n, p) }[/math], in which every pair of vertices is connected randomly and independently with probability [math]\displaystyle{ p }[/math]. We denote [math]\displaystyle{ G \sim G(n, p) }[/math] if [math]\displaystyle{ G }[/math] is generated in this way. We define [math]\displaystyle{ d := (n - 1) p }[/math] as the expected degree of the random graph.
(a) For [math]\displaystyle{ 0 \lt p_1 \lt p_2 \lt 1 }[/math], let [math]\displaystyle{ G_1 \sim G(n, p_1) }[/math] and let [math]\displaystyle{ G_2 \sim G(n, p_2) }[/math]. Let [math]\displaystyle{ \Delta(G) }[/math] be the maximum degree of the graph [math]\displaystyle{ G }[/math]. Compare [math]\displaystyle{ \mathbf{E}[\Delta(G_1)] }[/math] and [math]\displaystyle{ \mathbf{E}[\Delta(G_2)] }[/math], and prove your comparison.
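For intuition, here is a minimal simulation sketch of the model just described; it samples [math]\displaystyle{ G(n, p) }[/math] and reports the maximum degree that parts (a)–(c) reason about. The parameter values are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def max_degree_gnp(n, p):
    """Sample G ~ G(n, p) and return its maximum degree."""
    # one independent coin flip per unordered vertex pair (strict upper triangle)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    adj = upper | upper.T                     # symmetric adjacency matrix
    return int(adj.sum(axis=1).max())

n, d = 2000, 5                                # expected degree d = (n - 1) p
p = d / (n - 1)
samples = [max_degree_gnp(n, p) for _ in range(20)]
print(f"d = {d}, log n / log log n = {np.log(n) / np.log(np.log(n)):.1f}, "
      f"observed max degrees: {samples}")
</syntaxhighlight>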
(b) Consider a random graph [math]\displaystyle{ G \sim G(n, p) }[/math] with expected degrees [math]\displaystyle{ d = O(1) }[/math]. Show that for sufficiently large [math]\displaystyle{ n }[/math], with probability at least 0.9, all vertices of [math]\displaystyle{ G }[/math] have degrees [math]\displaystyle{ O\!\big(\tfrac{\log n}{\log \log n}\big) }[/math].
(c) Consider a random graph [math]\displaystyle{ G \sim G(n, p) }[/math] with expected degrees [math]\displaystyle{ d = o(\log n) }[/math]. Show that for sufficiently large [math]\displaystyle{ n }[/math], with probability at least 0.9, there exists a vertex in [math]\displaystyle{ G }[/math] with degree at least [math]\displaystyle{ 10 d }[/math].
Problem 5 (Dimension Reduction)
[Inner product] Fix parameters [math]\displaystyle{ d \gt 0 }[/math] and [math]\displaystyle{ \delta, \epsilon \in (0,1) }[/math]. Let [math]\displaystyle{ A \in \mathbb{R}^{k \times d} }[/math] be a random matrix with [math]\displaystyle{ k = O(\log(1/\delta) / \epsilon^2) }[/math] rows, and the entries of [math]\displaystyle{ A }[/math] are chosen i.i.d. from a Gaussian distribution with mean 0 and variance [math]\displaystyle{ 1/k }[/math]. Prove that for any [math]\displaystyle{ x, y \in \mathbb{R}^d }[/math], [math]\displaystyle{ \big|x^\top y - (Ax)^\top(Ay)\big| \leq \epsilon(\lVert x \rVert_2^2 + \lVert y \rVert_2^2) }[/math] holds with probability at least [math]\displaystyle{ 1 - \delta }[/math].
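As a numerical illustration of the statement (not a proof), the following sketch draws such a random matrix [math]\displaystyle{ A }[/math] and checks the inequality for one pair of vectors; the dimensions, the choice of [math]\displaystyle{ \delta }[/math], and the constant in [math]\displaystyle{ k }[/math] are assumptions.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

d, eps, delta = 500, 0.2, 0.01
k = int(np.ceil(8 * np.log(1 / delta) / eps**2))   # k = O(log(1/delta) / eps^2); constant 8 is an assumption

x, y = rng.standard_normal(d), rng.standard_normal(d)

# entries of A are i.i.d. N(0, 1/k), i.e. standard Gaussians scaled by 1/sqrt(k)
A = rng.standard_normal((k, d)) / np.sqrt(k)

error = abs(x @ y - (A @ x) @ (A @ y))
budget = eps * (np.linalg.norm(x)**2 + np.linalg.norm(y)**2)
print(f"|x^T y - (Ax)^T (Ay)| = {error:.2f}  vs  eps(||x||^2 + ||y||^2) = {budget:.2f}")
</syntaxhighlight>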
[Linear Separability]
In machine learning, the goal of many classification methods is to separate data into classes using a hyperplane. A hyperplane in [math]\displaystyle{ \mathbb{R}^d }[/math] is characterized by a unit vector [math]\displaystyle{ a \in \mathbb{R}^d }[/math] ([math]\displaystyle{ \lVert a \rVert_2 = 1 }[/math]) and [math]\displaystyle{ c \in \mathbb{R} }[/math]. It contains all [math]\displaystyle{ z \in \mathbb{R}^d }[/math] such that [math]\displaystyle{ a^\top z = c }[/math].
Suppose our dataset consists of [math]\displaystyle{ n }[/math] unit vectors in [math]\displaystyle{ \mathbb{R}^d }[/math]. These points can be separated into two linearly separable sets [math]\displaystyle{ X, Y }[/math], where [math]\displaystyle{ |X| + |Y| = n }[/math]. That is, for all [math]\displaystyle{ x \in X }[/math], [math]\displaystyle{ a^\top x \gt c }[/math] and for all [math]\displaystyle{ y \in Y }[/math], [math]\displaystyle{ a^\top y \lt c }[/math] (or vice versa). Furthermore, suppose that the [math]\displaystyle{ \ell_2 }[/math] distance of each point in [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] to this separating hyperplane is at least [math]\displaystyle{ \epsilon }[/math]. When this is the case, the hyperplane is said to have margin [math]\displaystyle{ \epsilon }[/math].
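To make the margin condition concrete, here is a small checker for it; the function name and the row-matrix representation of the point sets are assumptions.

<syntaxhighlight lang="python">
import numpy as np

def has_margin(X, Y, a, c, eps):
    """Check the condition a^T x >= c + eps for all rows x of X and
    a^T y <= c - eps for all rows y of Y (or the same with X and Y swapped),
    which the first part below relates to margin-eps separation.

    X, Y: arrays whose rows are the data points; a: unit vector; c, eps: scalars.
    """
    sx, sy = X @ a - c, Y @ a - c
    one_way = np.all(sx >= eps) and np.all(sy <= -eps)
    other_way = np.all(sx <= -eps) and np.all(sy >= eps)
    return bool(one_way or other_way)
</syntaxhighlight>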
- Show that [math]\displaystyle{ X, Y }[/math] can be separated with margin [math]\displaystyle{ \epsilon }[/math] by the hyperplane characterized by [math]\displaystyle{ a \in \mathbb{R}^d }[/math] ([math]\displaystyle{ \lVert a \rVert_2 = 1 }[/math]) and [math]\displaystyle{ c \in \mathbb{R} }[/math] if and only if the following condition holds: for all [math]\displaystyle{ x \in X }[/math], [math]\displaystyle{ a^\top x \geq c + \epsilon }[/math], and for all [math]\displaystyle{ y \in Y }[/math], [math]\displaystyle{ a^\top y \leq c - \epsilon }[/math] (or vice versa).
- Show that if we use a Johnson–Lindenstrauss map [math]\displaystyle{ A \in \mathbb{R}^{k \times d} }[/math] (the scaled Gaussian matrix given in the lecture) to reduce our data points to [math]\displaystyle{ O(\log n / \epsilon^2) }[/math] dimensions, then with probability at least [math]\displaystyle{ 9/10 }[/math], the dimension-reduced data can still be separated by a hyperplane with margin [math]\displaystyle{ \epsilon / 4 }[/math].
Problem 6 (Lovász Local Lemma)
Let [math]\displaystyle{ G = (V, E) }[/math] be a simple undirected graph with [math]\displaystyle{ |V| = n }[/math], [math]\displaystyle{ |E| = m }[/math], and maximum degree [math]\displaystyle{ \Delta }[/math]. For any [math]\displaystyle{ v \in V }[/math], we write [math]\displaystyle{ \Gamma_v }[/math] for the set of neighbors of [math]\displaystyle{ v }[/math] in [math]\displaystyle{ G }[/math]. Let [math]\displaystyle{ \zeta \gt 0 }[/math], and let [math]\displaystyle{ k \ge 1 }[/math] be an integer. We say the graph [math]\displaystyle{ G }[/math] has a [math]\displaystyle{ (\zeta, k) }[/math]-partition if there exists a partition [math]\displaystyle{ V = U_1 \uplus U_2 \uplus \ldots \uplus U_k }[/math] such that for any [math]\displaystyle{ i \in [k] }[/math] and [math]\displaystyle{ v \in V }[/math], it holds that
[math]\displaystyle{ |\Gamma_v \cap U_i| \le \tfrac{(1 + \zeta)\Delta}{k}. }[/math]
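To make the definition concrete, here is a small checker for the [math]\displaystyle{ (\zeta, k) }[/math]-partition condition; the adjacency-list representation and the function name are assumptions.

<syntaxhighlight lang="python">
def is_zeta_k_partition(adj, parts, zeta, k):
    """Check whether parts is a (zeta, k)-partition of the graph given by adj.

    adj:   dict mapping each vertex v to the set of its neighbors Gamma_v.
    parts: list of k disjoint vertex sets U_1, ..., U_k covering all of V.
    """
    delta = max(len(nbrs) for nbrs in adj.values())      # maximum degree
    bound = (1 + zeta) * delta / k
    return all(len(adj[v] & U) <= bound for v in adj for U in parts)
</syntaxhighlight>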
(a) Prove that any graph [math]\displaystyle{ G = (V, E) }[/math] with maximum degree [math]\displaystyle{ \Delta \ge \Delta_0(\zeta, k) = \Omega\!\big(\tfrac{k^2}{\zeta^2} \log k\big) }[/math] has a [math]\displaystyle{ (\zeta, k) }[/math]-partition.
(b) Show that there exists a randomized algorithm that, given any graph [math]\displaystyle{ G = (V, E) }[/math] satisfying the conditions above, returns a [math]\displaystyle{ (\zeta, k) }[/math]-partition in time [math]\displaystyle{ O((n + m)\log \tfrac{1}{\epsilon}) }[/math] with success probability at least [math]\displaystyle{ 1 - \epsilon }[/math].