高级算法 (Fall 2023)/Problem Set 1: Difference between revisions
Zouzongrui (talk | contribs) |
Zouzongrui (talk | contribs) |
||
Line 82: | Line 82: | ||
Use Chernoff bound to express <math>s</math> as a function of <math>\delta</math>. Try to make <math>s</math> as small as possible. | Use Chernoff bound to express <math>s</math> as a function of <math>\delta</math>. Try to make <math>s</math> as small as possible. | ||
* ['''cut size in random graph'''] Show that with probability at least <math>2/3</math>, the size of the max-cut in [[wikipedia:Erdős–Rényi_model#Definition|Erdős–Rényi random graph]] <math>G(n,1/2)</math> is at most <math>n^2/8 + O(n^{1.5})</math>. In the <math>G(n,1/2)</math> model, each edge is included in the graph with probability <math>1/2</math>, independently of every other edge. | * ['''cut size in random graph'''] Show that with probability at least <math>2/3</math>, the size of the max-cut in [[wikipedia:Erdős–Rényi_model#Definition|Erdős–Rényi random graph]] <math>G(n,1/2)</math> is at most <math>n^2/8 + O(n^{1.5})</math>. In the <math>G(n,1/2)</math> model, each edge is included in the graph with probability <math>1/2</math>, independently of every other edge. | ||
* ['''code rate of boolean code'''] A '''boolean code''' is a mapping <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math>. Each <math>x\in\{0,1\}^k</math> is called a '''message''' and <math>y=C(x)</math> is called a '''codeword'''. The '''code rate''' <math>r</math> of a code <math>C</math> is <math>r=\frac{k}{n}</math>. A boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> is a '''linear code''' if it is a linear transformation, i.e. there is a matrix <math>A\in\{0,1\}^{n\times k}</math> such that <math>C(x)=Ax</math> for any <math>x\in\{0,1\}^k</math>, where the additions and multiplications are defined over the finite field of order two, <math>(\{0,1\},+_{\bmod 2},\times_{\bmod 2})</math>. The '''distance''' between two codeword <math>y_1</math> and <math>y_2</math>, denoted by <math>d(y_1,y_2)</math>, is defined as the Hamming distance between them. Formally, <math>d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|</math>. The distance of a code <math>C</math> is the minimum distance between any two codewords. Formally, <math>d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))</math>. | * ['''code rate of boolean code'''] A '''boolean code''' is a mapping <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math>. Each <math>x\in\{0,1\}^k</math> is called a '''message''' and <math>y=C(x)</math> is called a '''codeword'''. The '''code rate''' <math>r</math> of a code <math>C</math> is <math>r=\frac{k}{n}</math>. A boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> is a '''linear code''' if it is a linear transformation, i.e. there is a matrix <math>A\in\{0,1\}^{n\times k}</math> such that <math>C(x)=Ax</math> for any <math>x\in\{0,1\}^k</math>, where the additions and multiplications are defined over the finite field of order two, <math>(\{0,1\},+_{\bmod 2},\times_{\bmod 2})</math>. The '''distance''' between two codeword <math>y_1</math> and <math>y_2</math>, denoted by <math>d(y_1,y_2)</math>, is defined as the Hamming distance between them. Formally, <math>d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|</math>. The distance of a code <math>C</math> is the minimum distance between any two codewords. Formally, <math>d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))</math>. Usually we want to make both the code rate <math>r</math> and the code distance <math>d</math> as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection. Use the probabilistic method to prove that there exists a boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> of code rate <math>r</math> and distance <math>\left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n</math>. Try to optimize the constant in <math>\Theta(\cdot)</math>. | ||
Usually we want to make both the code rate <math>r</math> and the code distance <math>d</math> as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection. Use the probabilistic method to prove that there exists a boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> of code rate <math>r</math> and distance <math>\left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n</math>. Try to optimize the constant in <math>\Theta(\cdot)</math>. | |||
* ['''balls into bins with the "power of two choices"'''] | * ['''balls into bins with the "power of two choices"'''] | ||
Revision as of 06:49, 23 October 2023
- 作业目前正在更新中,不是最终版
- 每道题目的解答都要有完整的解题过程,中英文不限。
- 我们推荐大家使用LaTeX, markdown等对作业进行排版。
Problem 1 (Min-cut/Max-cut)
- [Counting [math]\displaystyle{ \alpha }[/math]-approximate min-cut] For any [math]\displaystyle{ \alpha \ge 1 }[/math], a cut is called an [math]\displaystyle{ \alpha }[/math]-approximate min-cut in a multigraph [math]\displaystyle{ G }[/math] if the number of edges in it is at most [math]\displaystyle{ \alpha }[/math] times that of the min-cut. Prove that the number of [math]\displaystyle{ \alpha }[/math]-approximate min-cuts in a multigraph [math]\displaystyle{ G }[/math] is at most [math]\displaystyle{ n^{2\alpha} / 2 }[/math]. Hint: Run Karger's algorithm until it has [math]\displaystyle{ \lceil 2\alpha \rceil }[/math] supernodes. What is the chance that a particular [math]\displaystyle{ \alpha }[/math]-approximate min-cut is still available? How many possible cuts does this collapsed graph have?
- [Weighted min-cut problem] Modify the Karger's Contraction algorithm so that it works for the weighted min-cut problem. Prove that the modified algorithm returns a weighted minimum cut with probability at least [math]\displaystyle{ {2}/{n(n-1)} }[/math]. The weighted min-cut problem is defined as follows.
- Input: an undirected weighted graph [math]\displaystyle{ G(V, E) }[/math], where every edge [math]\displaystyle{ e \in E }[/math] is associated with a positive real weight [math]\displaystyle{ w_e }[/math]
- Output: a cut [math]\displaystyle{ C }[/math] in [math]\displaystyle{ G }[/math] such that [math]\displaystyle{ \sum_{e \in C} w_e }[/math] is minimized.
- [Max directed-cut] In the maximum directed cut (MAX-DICUT) problem, we are given as input a directed graph [math]\displaystyle{ G(V,E) }[/math]. Each directed arc [math]\displaystyle{ (i, j) \in E }[/math] has nonnegative weight [math]\displaystyle{ w_{ij} \ge 0 }[/math]. The goal is to partition [math]\displaystyle{ V }[/math] into disjoint sets [math]\displaystyle{ U }[/math] and [math]\displaystyle{ W=V\setminus U }[/math] so as to maximize the total weight of the arcs going from [math]\displaystyle{ U }[/math] to [math]\displaystyle{ W }[/math]. Give a randomized [math]\displaystyle{ 1/4 }[/math]-approximation algorithm for this problem.
Problem 2 (Fingerprinting)
- [Polynomial Identity Testing] Consider the function [math]\displaystyle{ f:\mathbb{R}^n\to\mathbb{R} }[/math] defined as
- [math]\displaystyle{ f(\vec x)=f(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}(a_ix_i-b_i) }[/math],
where [math]\displaystyle{ \{a_i\}_{1\le i\le n} }[/math] and [math]\displaystyle{ \{b_i\}_{1\le i\le n} }[/math] are unknown coefficients satisfy that [math]\displaystyle{ a_i, b_i\in \mathbb{Z} }[/math] and [math]\displaystyle{ 0\le a_i, b_i \le n }[/math] for all [math]\displaystyle{ 1\le i\le n }[/math].
Let [math]\displaystyle{ p\gt n }[/math] be the smallest prime strictly greater than [math]\displaystyle{ n }[/math]. The function [math]\displaystyle{ g:\mathbb{Z}_p^n\to\mathbb{Z}_p }[/math] is defined as
- [math]\displaystyle{ g(\vec x)=g(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}(a_ix_i-b_i) }[/math],
where [math]\displaystyle{ + }[/math] and [math]\displaystyle{ \cdot }[/math] are defined over the finite field [math]\displaystyle{ \mathbb{Z}_p }[/math].
By the properties of finite field, for any value [math]\displaystyle{ \vec r\in\mathbb{Z}_p^n }[/math], it holds that [math]\displaystyle{ g(\vec r)=f(\vec r)\bmod p }[/math].
Since the coefficients [math]\displaystyle{ \{a_i\}_{1\le i\le n} }[/math] and [math]\displaystyle{ \{b_i\}_{1\le i\le n} }[/math] are unknown, you can't calculate [math]\displaystyle{ f(\vec x) }[/math] directly. However, there exists an oracle [math]\displaystyle{ O }[/math], each time [math]\displaystyle{ O }[/math] gets an input [math]\displaystyle{ \vec x }[/math], it immediately outputs the value of [math]\displaystyle{ g(\vec x) }[/math].
- Prove that [math]\displaystyle{ f\not\equiv 0 \Rightarrow g\not\equiv 0 }[/math].
- Use the oracle [math]\displaystyle{ O }[/math] to design an algorithm to determine whether [math]\displaystyle{ f\equiv 0 }[/math], with error probability at most [math]\displaystyle{ \epsilon }[/math], where [math]\displaystyle{ \epsilon\in (0,1) }[/math] is a constant.
- [Test isomorphism of rooted tree] Two rooted trees [math]\displaystyle{ T_1 }[/math] and [math]\displaystyle{ T_2 }[/math] are said to be isomorphic if there exists a one to one mapping [math]\displaystyle{ f }[/math] from the nodes of [math]\displaystyle{ T_1 }[/math] to those of [math]\displaystyle{ T_2 }[/math] satisfying the following condition: [math]\displaystyle{ v }[/math] is a child of [math]\displaystyle{ w }[/math] in [math]\displaystyle{ T_1 }[/math] if and only if [math]\displaystyle{ f(v) }[/math] is a child of [math]\displaystyle{ f(w) }[/math] in [math]\displaystyle{ T_2 }[/math]. Observe that no ordering is assumed on the children of any vertex. Devise an efficient randomized algorithm for testing the isomorphism of rooted trees and analyze its performance. Hint: Recursively associate a polynomial [math]\displaystyle{ P_v }[/math] with each vertex [math]\displaystyle{ v }[/math] in a tree [math]\displaystyle{ T }[/math].
- [2D pattern matching] Consider the problem of image matching. You are given an [math]\displaystyle{ n\times n }[/math]-bit matrix as "text" and an [math]\displaystyle{ m\times m }[/math]-bit “pattern” you want to find in the text. Devise an efficient (expected time [math]\displaystyle{ O(n^2) }[/math] ) algorithm for the problem. Hint: The key is rapidly updating a fingerprint as you shift the “window” over the matrix. You may first want to consider the case of an [math]\displaystyle{ n \times m }[/math]-bit “text.” Can you transform this into a standard string-matching problem?
Problem 3 (Hashing)
- [Set differences] Fix a universe [math]\displaystyle{ U }[/math] and two subset [math]\displaystyle{ A,B \subseteq U }[/math], both with size [math]\displaystyle{ n }[/math]. we create both Bloom filters [math]\displaystyle{ F_A }[/math]([math]\displaystyle{ F_B }[/math]) for [math]\displaystyle{ A }[/math] ([math]\displaystyle{ B }[/math]), using the same number of bits [math]\displaystyle{ m }[/math] and the same [math]\displaystyle{ k }[/math] hash functions.
- Let [math]\displaystyle{ F_C = F_A \land F_B }[/math] be the Bloom filter formed by computing the bitwise AND of [math]\displaystyle{ F_A }[/math] and [math]\displaystyle{ F_B }[/math]. Argue that [math]\displaystyle{ F_C }[/math] may not always be the same as the Bloom filter that are created for [math]\displaystyle{ A\cap B }[/math].
- Bloom filters can be used to estimate set differences. Express the expected number of bits where [math]\displaystyle{ F_A }[/math] and [math]\displaystyle{ F_B }[/math] differ as a function of [math]\displaystyle{ m, n, k }[/math] and [math]\displaystyle{ |A\cap B| }[/math].
- [Count Distinct Element] In class, we saw how to estimate the number of distinct elements in a data stream using the Flajolet-Martin algorithm. Consider the following alternative formulation of the distinct elements problem: given an [math]\displaystyle{ N }[/math] dimensional vector [math]\displaystyle{ x }[/math], we want to process a stream of arbitrary increments to entries in [math]\displaystyle{ x }[/math]. In other words, if we see a number [math]\displaystyle{ i\in 1,\dots,N }[/math] in the stream, we update entry [math]\displaystyle{ x_i\gets x_i + 1 }[/math]. Our goal is to estimate [math]\displaystyle{ \left \|x\right \|_0 }[/math], which measures the number of non-zero entries in [math]\displaystyle{ x }[/math]. With [math]\displaystyle{ x }[/math] viewed as a histogram that maintains counts for [math]\displaystyle{ N }[/math] potential elements, [math]\displaystyle{ \left \|x\right \|_0 }[/math] is exactly the number of distinct elements processed. In this problem we will develop an alternative algorithm for estimating [math]\displaystyle{ \left \|x\right \|_0 }[/math] that can also handle decrements to entries in x. Specifically, instead of the stream containing just indices [math]\displaystyle{ i }[/math], it contains pairs [math]\displaystyle{ (i, +) }[/math] and [math]\displaystyle{ (i, −) }[/math]. On receiving [math]\displaystyle{ (i, +) }[/math], [math]\displaystyle{ x }[/math] should update so that [math]\displaystyle{ x_i\gets x_i + 1 }[/math] and on receiving [math]\displaystyle{ (i, −) }[/math], [math]\displaystyle{ x }[/math] should update so that [math]\displaystyle{ x_i\gets x_i - 1 }[/math]. For this problem we will assume that, at the end of our stream, each [math]\displaystyle{ x_i \ge 0 }[/math] (i.e. for a specific index we can’t receive more decrements than increments).
- Consider a simpler problem. For a given value [math]\displaystyle{ T }[/math], let’s design an algorithm that succeeds with probability [math]\displaystyle{ (1 − \delta) }[/math], outputing LOW if [math]\displaystyle{ T \lt \frac{1}{2}\left \|x\right \|_0 }[/math] and HIGH if [math]\displaystyle{ T \gt 2\left \|x\right \|_0 }[/math]:
- Assume we have access to a completely random hash function [math]\displaystyle{ h(\cdot) }[/math] that maps each [math]\displaystyle{ i }[/math] to a random point in [math]\displaystyle{ [0, 1] }[/math]. We maintain the estimator [math]\displaystyle{ s=\sum_{i:h(i)\lt \frac{1}{2T}}x_i }[/math] as we receive increment and decrement updates. Show that, at the end of our stream, (i) If [math]\displaystyle{ T \lt \frac{1}{2}\left \|x\right \|_0 }[/math], [math]\displaystyle{ \Pr_h[s=0]\lt 1/e\approx 0.37 }[/math] and (ii) If [math]\displaystyle{ T \gt 2\left \|x\right \|_0 }[/math], [math]\displaystyle{ \Pr_h[s=0]\gt 0.5 }[/math].
- Using this fact, show how to use [math]\displaystyle{ k=O(\log 1/\delta) }[/math] independent random hash functions, and corresponding individual estimators [math]\displaystyle{ s_1, s_2, . . . , s_k }[/math], to output LOW if [math]\displaystyle{ T \lt \frac{1}{2}\left \|x\right \|_0 }[/math] and HIGH if [math]\displaystyle{ T \gt 2\left \|x\right \|_0 }[/math]. If neither event occurs you can output either LOW or HIGH. Your algorithm should succeed with probability [math]\displaystyle{ (1 − \delta) }[/math].
- Using [math]\displaystyle{ O(\log N) }[/math] repetitions of your algorithm for the above decision problem (with [math]\displaystyle{ \delta }[/math] set appropriately), show how to obtain an estimate [math]\displaystyle{ F }[/math] for [math]\displaystyle{ \left \|x\right \|_0 }[/math] such that [math]\displaystyle{ \frac{1}{4}\left \|x\right \|_0\le F\le 4\left \|x\right \|_0 }[/math] w.h.p.(with probability [math]\displaystyle{ 1-O(1/N) }[/math]).
- [Rethinking bloom filter] In class, we argued that with the right setting of parameters, the probability of a false positive for an [math]\displaystyle{ m }[/math]-bit bloom filter on [math]\displaystyle{ n }[/math] items was roughly [math]\displaystyle{ (.6185)^{m/n} }[/math]. This problem will consider another approach. Divide the [math]\displaystyle{ m }[/math] bits into [math]\displaystyle{ r }[/math] "blocks" of [math]\displaystyle{ m/r }[/math] bits. Associate a (random) [math]\displaystyle{ 2 }[/math]-universal hash function with each block. For each item, hash it to one bit in each block using the block’s hash function, and set those bits to [math]\displaystyle{ 1 }[/math]. Now consider an item [math]\displaystyle{ x }[/math] not in the inserted set, and suppose we hash it into the blocks in the same way. We wish to bound the probability we get a false positive in this data structure (i.e. all bits [math]\displaystyle{ 1 }[/math]).
- Bound the probability that in a particular block, [math]\displaystyle{ x }[/math] hashes to a set bit, and from that bound derive the probability that [math]\displaystyle{ x }[/math] is a false positive.
- Determine the best choice of [math]\displaystyle{ r }[/math] to minimize the above probability as a function of [math]\displaystyle{ m }[/math] and [math]\displaystyle{ n }[/math].
Problem 4 (Concentration of measure)
- [[math]\displaystyle{ k }[/math]-th moment bound] Let [math]\displaystyle{ X }[/math] be a random variable with expectation [math]\displaystyle{ 0 }[/math] such that moment generating function [math]\displaystyle{ \mathbf{E}[\exp(t|X|)] }[/math] is finite for some [math]\displaystyle{ t \gt 0 }[/math]. We can use the following two kinds of tail inequalities for [math]\displaystyle{ X }[/math].
Chernoff Bound
- [math]\displaystyle{ \begin{align} \mathbf{Pr}[|X| \geq \delta] \leq \min_{t \geq 0} \frac{\mathbf{E}[e^{t|X|}]}{e^{t\delta}} \end{align} }[/math]
[math]\displaystyle{ k }[/math]th-Moment Bound
- [math]\displaystyle{ \begin{align} \mathbf{Pr}[|X| \geq \delta] \leq \frac{\mathbf{E}[|X|^k]}{\delta^k} \end{align} }[/math]
- Show that for each [math]\displaystyle{ \delta }[/math], there exists a choice of [math]\displaystyle{ k }[/math] such that the [math]\displaystyle{ k }[/math]th-moment bound is stronger than the Chernoff bound. Hint: Consider the Taylor expansion of the moment generating function and apply the probabilistic method.
- Why would we still prefer the Chernoff bound to the (seemingly) stronger [math]\displaystyle{ k }[/math]-th moment bound?
- [the median trick] Suppose we want to estimate the value of [math]\displaystyle{ Z }[/math]. Let [math]\displaystyle{ \mathcal{A} }[/math] be an algorithm that outputs [math]\displaystyle{ \widehat{Z} }[/math] satisfying
- [math]\displaystyle{ \Pr[ (1-\epsilon)Z \leq \widehat{Z} \leq (1+\epsilon )Z] \geq \frac{3}{4} . }[/math]
We run [math]\displaystyle{ \mathcal{A} }[/math] independently for [math]\displaystyle{ s }[/math] times, and obtain the outputs [math]\displaystyle{ \widehat{Z}_1,\widehat{Z}_2,\ldots,\widehat{Z}_s }[/math]. Let [math]\displaystyle{ X }[/math] be the median (中位数) of [math]\displaystyle{ \widehat{Z}_1,\widehat{Z}_2,\ldots,\widehat{Z}_s }[/math]. Find the number [math]\displaystyle{ s }[/math] such that
- [math]\displaystyle{ \Pr[ (1-\epsilon)Z \leq X \leq (1+\epsilon )Z] \geq 1 - \delta . }[/math]
Use Chernoff bound to express [math]\displaystyle{ s }[/math] as a function of [math]\displaystyle{ \delta }[/math]. Try to make [math]\displaystyle{ s }[/math] as small as possible.
- [cut size in random graph] Show that with probability at least [math]\displaystyle{ 2/3 }[/math], the size of the max-cut in Erdős–Rényi random graph [math]\displaystyle{ G(n,1/2) }[/math] is at most [math]\displaystyle{ n^2/8 + O(n^{1.5}) }[/math]. In the [math]\displaystyle{ G(n,1/2) }[/math] model, each edge is included in the graph with probability [math]\displaystyle{ 1/2 }[/math], independently of every other edge.
- [code rate of boolean code] A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math]. The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math]. Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection. Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
- [balls into bins with the "power of two choices"]
Problem 5 (Dimension reduction)
- [inner product]
- [linear separability]
- [sparse vector]
Problem 1 (Lovász Local Lemma)
- [colorable hypergrap]
- [directed cycle]
- [algorithmic LLL]