高级算法 (Fall 2023)/Problem Set 1: Difference between revisions
Zouzongrui (talk | contribs) |
Zouzongrui (talk | contribs) |
||
Line 91: | Line 91: | ||
== Problem 5 (Dimension reduction) == | == Problem 5 (Dimension reduction) == | ||
* ['''inner product'''] Fix parameters <math>d>0, \delta,\epsilon\in(0,1)</math>. Let <math>A\in \mathbb{R}^{k\times d}</math> be a random matrix with <math>k = O(\log(1/\delta)/\epsilon)</math> rows, and entries of <math>A</math> are chosen i.i.d. from Gaussian distribution with mean <math>0</math> and variance <math>1/k</math>. Prove that for any <math>x,y\in \mathbb{R}^d</math>: <math>|x^\top y - (Ax)^\top(Ay)|\leq \epsilon(\|x\|_2^2 + \|y\|_2^2)</math> | * ['''inner product'''] Fix parameters <math>d>0, \delta,\epsilon\in(0,1)</math>. Let <math>A\in \mathbb{R}^{k\times d}</math> be a random matrix with <math>k = O(\log(1/\delta)/\epsilon)</math> rows, and entries of <math>A</math> are chosen i.i.d. from Gaussian distribution with mean <math>0</math> and variance <math>1/k</math>. Prove that for any <math>x,y\in \mathbb{R}^d</math>: <math>|x^\top y - (Ax)^\top(Ay)|\leq \epsilon(\|x\|_2^2 + \|y\|_2^2)</math> with probability <math>\geq 1-\delta</math>. | ||
with probability <math>\geq 1-\delta</math>. | |||
* ['''linear separability'''] | * ['''linear separability'''] | ||
* ['''sparse vector'''] | * ['''sparse vector'''] |
Revision as of 06:54, 23 October 2023
- 作业目前正在更新中,不是最终版
- 每道题目的解答都要有完整的解题过程,中英文不限。
- 我们推荐大家使用LaTeX, markdown等对作业进行排版。
Problem 1 (Min-cut/Max-cut)
- [Counting [math]\displaystyle{ \alpha }[/math]-approximate min-cut] For any [math]\displaystyle{ \alpha \ge 1 }[/math], a cut is called an [math]\displaystyle{ \alpha }[/math]-approximate min-cut in a multigraph [math]\displaystyle{ G }[/math] if the number of edges in it is at most [math]\displaystyle{ \alpha }[/math] times that of the min-cut. Prove that the number of [math]\displaystyle{ \alpha }[/math]-approximate min-cuts in a multigraph [math]\displaystyle{ G }[/math] is at most [math]\displaystyle{ n^{2\alpha} / 2 }[/math]. Hint: Run Karger's algorithm until it has [math]\displaystyle{ \lceil 2\alpha \rceil }[/math] supernodes. What is the chance that a particular [math]\displaystyle{ \alpha }[/math]-approximate min-cut is still available? How many possible cuts does this collapsed graph have?
- [Weighted min-cut problem] Modify the Karger's Contraction algorithm so that it works for the weighted min-cut problem. Prove that the modified algorithm returns a weighted minimum cut with probability at least [math]\displaystyle{ {2}/{n(n-1)} }[/math]. The weighted min-cut problem is defined as follows.
- Input: an undirected weighted graph [math]\displaystyle{ G(V, E) }[/math], where every edge [math]\displaystyle{ e \in E }[/math] is associated with a positive real weight [math]\displaystyle{ w_e }[/math]
- Output: a cut [math]\displaystyle{ C }[/math] in [math]\displaystyle{ G }[/math] such that [math]\displaystyle{ \sum_{e \in C} w_e }[/math] is minimized.
- [Max directed-cut] In the maximum directed cut (MAX-DICUT) problem, we are given as input a directed graph [math]\displaystyle{ G(V,E) }[/math]. Each directed arc [math]\displaystyle{ (i, j) \in E }[/math] has nonnegative weight [math]\displaystyle{ w_{ij} \ge 0 }[/math]. The goal is to partition [math]\displaystyle{ V }[/math] into disjoint sets [math]\displaystyle{ U }[/math] and [math]\displaystyle{ W=V\setminus U }[/math] so as to maximize the total weight of the arcs going from [math]\displaystyle{ U }[/math] to [math]\displaystyle{ W }[/math]. Give a randomized [math]\displaystyle{ 1/4 }[/math]-approximation algorithm for this problem.
Problem 2 (Fingerprinting)
- [Polynomial Identity Testing] Consider the function [math]\displaystyle{ f:\mathbb{R}^n\to\mathbb{R} }[/math] defined as
- [math]\displaystyle{ f(\vec x)=f(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}(a_ix_i-b_i) }[/math],
where [math]\displaystyle{ \{a_i\}_{1\le i\le n} }[/math] and [math]\displaystyle{ \{b_i\}_{1\le i\le n} }[/math] are unknown coefficients satisfy that [math]\displaystyle{ a_i, b_i\in \mathbb{Z} }[/math] and [math]\displaystyle{ 0\le a_i, b_i \le n }[/math] for all [math]\displaystyle{ 1\le i\le n }[/math].
Let [math]\displaystyle{ p\gt n }[/math] be the smallest prime strictly greater than [math]\displaystyle{ n }[/math]. The function [math]\displaystyle{ g:\mathbb{Z}_p^n\to\mathbb{Z}_p }[/math] is defined as
- [math]\displaystyle{ g(\vec x)=g(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}(a_ix_i-b_i) }[/math],
where [math]\displaystyle{ + }[/math] and [math]\displaystyle{ \cdot }[/math] are defined over the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math].
By the properties of finite field, for any value [math]\displaystyle{ \vec r\in\mathbb{Z}_p^n }[/math], it holds that [math]\displaystyle{ g(\vec r)=f(\vec r)\bmod p }[/math].
Since the coefficients [math]\displaystyle{ \{a_i\}_{1\le i\le n} }[/math] and [math]\displaystyle{ \{b_i\}_{1\le i\le n} }[/math] are unknown, you can't calculate [math]\displaystyle{ f(\vec x) }[/math] directly. However, there exists an oracle [math]\displaystyle{ O }[/math], each time [math]\displaystyle{ O }[/math] gets an input [math]\displaystyle{ \vec x }[/math], it immediately outputs the value of [math]\displaystyle{ g(\vec x) }[/math].
- Prove that [math]\displaystyle{ f\not\equiv 0 \Rightarrow g\not\equiv 0 }[/math].
- Use the oracle [math]\displaystyle{ O }[/math] to design an algorithm to determine whether [math]\displaystyle{ f\equiv 0 }[/math], with error probability at most [math]\displaystyle{ \epsilon }[/math], where [math]\displaystyle{ \epsilon\in (0,1) }[/math] is a constant.
- [Test isomorphism of rooted tree] Two rooted trees [math]\displaystyle{ T_1 }[/math] and [math]\displaystyle{ T_2 }[/math] are said to be isomorphic if there exists a one to one mapping [math]\displaystyle{ f }[/math] from the nodes of [math]\displaystyle{ T_1 }[/math] to those of [math]\displaystyle{ T_2 }[/math] satisfying the following condition: [math]\displaystyle{ v }[/math] is a child of [math]\displaystyle{ w }[/math] in [math]\displaystyle{ T_1 }[/math] if and only if [math]\displaystyle{ f(v) }[/math] is a child of [math]\displaystyle{ f(w) }[/math] in [math]\displaystyle{ T_2 }[/math]. Observe that no ordering is assumed on the children of any vertex. Devise an efficient randomized algorithm for testing the isomorphism of rooted trees and analyze its performance. Hint: Recursively associate a polynomial [math]\displaystyle{ P_v }[/math] with each vertex [math]\displaystyle{ v }[/math] in a tree [math]\displaystyle{ T }[/math].
- [2D pattern matching] Consider the problem of image matching. You are given an [math]\displaystyle{ n\times n }[/math]-bit matrix as "text" and an [math]\displaystyle{ m\times m }[/math]-bit “pattern” you want to find in the text. Devise an efficient (expected time [math]\displaystyle{ O(n^2) }[/math] ) algorithm for the problem. Hint: The key is rapidly updating a fingerprint as you shift the “window” over the matrix. You may first want to consider the case of an [math]\displaystyle{ n \times m }[/math]-bit “text.” Can you transform this into a standard string-matching problem?
Problem 3 (Hashing)
- [Set differences] Fix a universe [math]\displaystyle{ U }[/math] and two subset [math]\displaystyle{ A,B \subseteq U }[/math], both with size [math]\displaystyle{ n }[/math]. we create both Bloom filters [math]\displaystyle{ F_A }[/math]([math]\displaystyle{ F_B }[/math]) for [math]\displaystyle{ A }[/math] ([math]\displaystyle{ B }[/math]), using the same number of bits [math]\displaystyle{ m }[/math] and the same [math]\displaystyle{ k }[/math] hash functions.
- Let [math]\displaystyle{ F_C = F_A \land F_B }[/math] be the Bloom filter formed by computing the bitwise AND of [math]\displaystyle{ F_A }[/math] and [math]\displaystyle{ F_B }[/math]. Argue that [math]\displaystyle{ F_C }[/math] may not always be the same as the Bloom filter that are created for [math]\displaystyle{ A\cap B }[/math].
- Bloom filters can be used to estimate set differences. Express the expected number of bits where [math]\displaystyle{ F_A }[/math] and [math]\displaystyle{ F_B }[/math] differ as a function of [math]\displaystyle{ m, n, k }[/math] and [math]\displaystyle{ |A\cap B| }[/math].
- [Count Distinct Element] In class, we saw how to estimate the number of distinct elements in a data stream using the Flajolet-Martin algorithm. Consider the following alternative formulation of the distinct elements problem: given an [math]\displaystyle{ N }[/math] dimensional vector [math]\displaystyle{ x }[/math], we want to process a stream of arbitrary increments to entries in [math]\displaystyle{ x }[/math]. In other words, if we see a number [math]\displaystyle{ i\in 1,\dots,N }[/math] in the stream, we update entry [math]\displaystyle{ x_i\gets x_i + 1 }[/math]. Our goal is to estimate [math]\displaystyle{ \left \|x\right \|_0 }[/math], which measures the number of non-zero entries in [math]\displaystyle{ x }[/math]. With [math]\displaystyle{ x }[/math] viewed as a histogram that maintains counts for [math]\displaystyle{ N }[/math] potential elements, [math]\displaystyle{ \left \|x\right \|_0 }[/math] is exactly the number of distinct elements processed. In this problem we will develop an alternative algorithm for estimating [math]\displaystyle{ \left \|x\right \|_0 }[/math] that can also handle decrements to entries in x. Specifically, instead of the stream containing just indices [math]\displaystyle{ i }[/math], it contains pairs [math]\displaystyle{ (i, +) }[/math] and [math]\displaystyle{ (i, −) }[/math]. On receiving [math]\displaystyle{ (i, +) }[/math], [math]\displaystyle{ x }[/math] should update so that [math]\displaystyle{ x_i\gets x_i + 1 }[/math] and on receiving [math]\displaystyle{ (i, −) }[/math], [math]\displaystyle{ x }[/math] should update so that [math]\displaystyle{ x_i\gets x_i - 1 }[/math]. For this problem we will assume that, at the end of our stream, each [math]\displaystyle{ x_i \ge 0 }[/math] (i.e. for a specific index we can’t receive more decrements than increments).
- Consider a simpler problem. For a given value [math]\displaystyle{ T }[/math], let’s design an algorithm that succeeds with probability [math]\displaystyle{ (1 − \delta) }[/math], outputing LOW if [math]\displaystyle{ T \lt \frac{1}{2}\left \|x\right \|_0 }[/math] and HIGH if [math]\displaystyle{ T \gt 2\left \|x\right \|_0 }[/math]:
- Assume we have access to a completely random hash function [math]\displaystyle{ h(\cdot) }[/math] that maps each [math]\displaystyle{ i }[/math] to a random point in [math]\displaystyle{ [0, 1] }[/math]. We maintain the estimator [math]\displaystyle{ s=\sum_{i:h(i)\lt \frac{1}{2T}}x_i }[/math] as we receive increment and decrement updates. Show that, at the end of our stream, (i) If [math]\displaystyle{ T \lt \frac{1}{2}\left \|x\right \|_0 }[/math], [math]\displaystyle{ \Pr_h[s=0]\lt 1/e\approx 0.37 }[/math] and (ii) If [math]\displaystyle{ T \gt 2\left \|x\right \|_0 }[/math], [math]\displaystyle{ \Pr_h[s=0]\gt 0.5 }[/math].
- Using this fact, show how to use [math]\displaystyle{ k=O(\log 1/\delta) }[/math] independent random hash functions, and corresponding individual estimators [math]\displaystyle{ s_1, s_2, . . . , s_k }[/math], to output LOW if [math]\displaystyle{ T \lt \frac{1}{2}\left \|x\right \|_0 }[/math] and HIGH if [math]\displaystyle{ T \gt 2\left \|x\right \|_0 }[/math]. If neither event occurs you can output either LOW or HIGH. Your algorithm should succeed with probability [math]\displaystyle{ (1 − \delta) }[/math].
- Using [math]\displaystyle{ O(\log N) }[/math] repetitions of your algorithm for the above decision problem (with [math]\displaystyle{ \delta }[/math] set appropriately), show how to obtain an estimate [math]\displaystyle{ F }[/math] for [math]\displaystyle{ \left \|x\right \|_0 }[/math] such that [math]\displaystyle{ \frac{1}{4}\left \|x\right \|_0\le F\le 4\left \|x\right \|_0 }[/math] w.h.p.(with probability [math]\displaystyle{ 1-O(1/N) }[/math]).
- [Rethinking bloom filter] In class, we argued that with the right setting of parameters, the probability of a false positive for an [math]\displaystyle{ m }[/math]-bit bloom filter on [math]\displaystyle{ n }[/math] items was roughly [math]\displaystyle{ (.6185)^{m/n} }[/math]. This problem will consider another approach. Divide the [math]\displaystyle{ m }[/math] bits into [math]\displaystyle{ r }[/math] "blocks" of [math]\displaystyle{ m/r }[/math] bits. Associate a (random) [math]\displaystyle{ 2 }[/math]-universal hash function with each block. For each item, hash it to one bit in each block using the block’s hash function, and set those bits to [math]\displaystyle{ 1 }[/math]. Now consider an item [math]\displaystyle{ x }[/math] not in the inserted set, and suppose we hash it into the blocks in the same way. We wish to bound the probability we get a false positive in this data structure (i.e. all bits [math]\displaystyle{ 1 }[/math]).
- Bound the probability that in a particular block, [math]\displaystyle{ x }[/math] hashes to a set bit, and from that bound derive the probability that [math]\displaystyle{ x }[/math] is a false positive.
- Determine the best choice of [math]\displaystyle{ r }[/math] to minimize the above probability as a function of [math]\displaystyle{ m }[/math] and [math]\displaystyle{ n }[/math].
Problem 4 (Concentration of measure)
- [[math]\displaystyle{ k }[/math]-th moment bound] Let [math]\displaystyle{ X }[/math] be a random variable with expectation [math]\displaystyle{ 0 }[/math] such that moment generating function [math]\displaystyle{ \mathbf{E}[\exp(t|X|)] }[/math] is finite for some [math]\displaystyle{ t \gt 0 }[/math]. We can use the following two kinds of tail inequalities for [math]\displaystyle{ X }[/math].
Chernoff Bound
- [math]\displaystyle{ \begin{align} \mathbf{Pr}[|X| \geq \delta] \leq \min_{t \geq 0} \frac{\mathbf{E}[e^{t|X|}]}{e^{t\delta}} \end{align} }[/math]
[math]\displaystyle{ k }[/math]th-Moment Bound
- [math]\displaystyle{ \begin{align} \mathbf{Pr}[|X| \geq \delta] \leq \frac{\mathbf{E}[|X|^k]}{\delta^k} \end{align} }[/math]
- Show that for each [math]\displaystyle{ \delta }[/math], there exists a choice of [math]\displaystyle{ k }[/math] such that the [math]\displaystyle{ k }[/math]th-moment bound is stronger than the Chernoff bound. Hint: Consider the Taylor expansion of the moment generating function and apply the probabilistic method.
- Why would we still prefer the Chernoff bound to the (seemingly) stronger [math]\displaystyle{ k }[/math]-th moment bound?
- [the median trick] Suppose we want to estimate the value of [math]\displaystyle{ Z }[/math]. Let [math]\displaystyle{ \mathcal{A} }[/math] be an algorithm that outputs [math]\displaystyle{ \widehat{Z} }[/math] satisfying
- [math]\displaystyle{ \Pr[ (1-\epsilon)Z \leq \widehat{Z} \leq (1+\epsilon )Z] \geq \frac{3}{4} . }[/math]
We run [math]\displaystyle{ \mathcal{A} }[/math] independently for [math]\displaystyle{ s }[/math] times, and obtain the outputs [math]\displaystyle{ \widehat{Z}_1,\widehat{Z}_2,\ldots,\widehat{Z}_s }[/math]. Let [math]\displaystyle{ X }[/math] be the median (中位数) of [math]\displaystyle{ \widehat{Z}_1,\widehat{Z}_2,\ldots,\widehat{Z}_s }[/math]. Find the number [math]\displaystyle{ s }[/math] such that
- [math]\displaystyle{ \Pr[ (1-\epsilon)Z \leq X \leq (1+\epsilon )Z] \geq 1 - \delta . }[/math]
Use Chernoff bound to express [math]\displaystyle{ s }[/math] as a function of [math]\displaystyle{ \delta }[/math]. Try to make [math]\displaystyle{ s }[/math] as small as possible.
- [cut size in random graph] Show that with probability at least [math]\displaystyle{ 2/3 }[/math], the size of the max-cut in Erdős–Rényi random graph [math]\displaystyle{ G(n,1/2) }[/math] is at most [math]\displaystyle{ n^2/8 + O(n^{1.5}) }[/math]. In the [math]\displaystyle{ G(n,1/2) }[/math] model, each edge is included in the graph with probability [math]\displaystyle{ 1/2 }[/math], independently of every other edge.
- [code rate of boolean code] A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math]. The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math]. Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection. Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
- [balls into bins with the "power of two choices"] In Balls-and-Bins model, we throw [math]\displaystyle{ m }[/math] balls independently and uniformly at random into [math]\displaystyle{ n }[/math] bins. We know that the maximum load is [math]\displaystyle{ \Theta\left(\frac{\log n}{\log\log n}\right) }[/math] with high probability when [math]\displaystyle{ m=\Theta(n) }[/math]. The two-choice paradigm is another way to throw [math]\displaystyle{ m }[/math] balls into [math]\displaystyle{ n }[/math] bins: each ball is thrown into the least loaded of two bins chosen independently and uniformly at random(it could be the case that the two chosen bins are exactly the same, and then the ball will be thrown into that bin), and breaks the tie arbitrarily. When [math]\displaystyle{ m=\Theta(n) }[/math], the maximum load of two-choice paradigm is known to be [math]\displaystyle{ \Theta(\log\log n) }[/math] with high probability, which is exponentially less than the maxim load when there is only one random choice. This phenomenon is called the power of two choices. Here are the questions:
- Consider the following paradigm: we throw [math]\displaystyle{ n }[/math] balls into [math]\displaystyle{ n }[/math] bins. The first [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins independently and uniformly at random. The remaining [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins using the two-choice paradigm. What is the maximum load with high probability? You need to give an asymptotically tight bound (in the form of [math]\displaystyle{ \Theta(\cdot) }[/math]).
- Replace the above paradigm to the following: the first [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins using the two-choice paradigm while the remaining [math]\displaystyle{ \frac{n}{2} }[/math] balls are thrown into bins independently and uniformly at random. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.
- Replace the above paradigm to the following: assume all [math]\displaystyle{ n }[/math] balls are thrown in a sequence. For every [math]\displaystyle{ 1\le i\le n }[/math], if [math]\displaystyle{ i }[/math] is odd, we throw [math]\displaystyle{ i }[/math]-th ball into bins independently and uniformly at random, otherwise, we throw it into bins using the two-choice paradigm. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.
Problem 5 (Dimension reduction)
- [inner product] Fix parameters [math]\displaystyle{ d\gt 0, \delta,\epsilon\in(0,1) }[/math]. Let [math]\displaystyle{ A\in \mathbb{R}^{k\times d} }[/math] be a random matrix with [math]\displaystyle{ k = O(\log(1/\delta)/\epsilon) }[/math] rows, and entries of [math]\displaystyle{ A }[/math] are chosen i.i.d. from Gaussian distribution with mean [math]\displaystyle{ 0 }[/math] and variance [math]\displaystyle{ 1/k }[/math]. Prove that for any [math]\displaystyle{ x,y\in \mathbb{R}^d }[/math]: [math]\displaystyle{ |x^\top y - (Ax)^\top(Ay)|\leq \epsilon(\|x\|_2^2 + \|y\|_2^2) }[/math] with probability [math]\displaystyle{ \geq 1-\delta }[/math].
- [linear separability]
- [sparse vector]
Problem 1 (Lovász Local Lemma)
- [colorable hypergrap]
- [directed cycle]
- [algorithmic LLL]