高级算法 (Fall 2021)/Problem Set 3
- Each problem must be answered with a complete solution process. You may write in either Chinese or English.
Problem 1
Let [math]\displaystyle{ N }[/math] be a set (the "universe") and [math]\displaystyle{ F }[/math] a function mapping subsets of [math]\displaystyle{ N }[/math] to real numbers. There are two different standard definitions of what it means for [math]\displaystyle{ F }[/math] to be submodular:
(i) For all [math]\displaystyle{ A }[/math], [math]\displaystyle{ B }[/math],
- [math]\displaystyle{ F(A\cap B)+F(A\cup B)\leq F(A)+F(B) }[/math].
(ii) For all [math]\displaystyle{ A\subseteq B }[/math], [math]\displaystyle{ j\notin B }[/math],
- [math]\displaystyle{ F(A\cup\{j\})-F(A)\geq F(B\cup\{j\})-F(B) }[/math].
Your task is to prove these definitions are equivalent:
(a) Show that (i) implies (ii).
(b) Show that (ii) implies (i).
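For intuition only (not part of the required proof), here is a small Python sanity check: it takes a coverage function, i.e. the size of the union of the sets indexed by a subset, which is a standard example of a submodular function, and verifies definitions (i) and (ii) by brute force over all subsets of a tiny universe. The particular sets are arbitrary illustrative choices.

```python
from itertools import combinations

# A small coverage function F(A) = |union of the sets indexed by A|,
# a standard example of a submodular set function (illustrative data).
SETS = {1: {1, 2}, 2: {2, 3}, 3: {3, 4, 5}, 4: {1, 5}}
N = set(SETS)

def F(A):
    covered = set()
    for i in A:
        covered |= SETS[i]
    return len(covered)

def subsets(ground):
    ground = list(ground)
    for r in range(len(ground) + 1):
        for c in combinations(ground, r):
            yield frozenset(c)

# Definition (i): F(A ∩ B) + F(A ∪ B) <= F(A) + F(B) for all A, B.
def_i = all(F(A & B) + F(A | B) <= F(A) + F(B)
            for A in subsets(N) for B in subsets(N))

# Definition (ii): diminishing returns, F(A ∪ {j}) - F(A) >= F(B ∪ {j}) - F(B)
# for all A ⊆ B and j ∉ B.
def_ii = all(F(A | {j}) - F(A) >= F(B | {j}) - F(B)
             for B in subsets(N) for A in subsets(B)
             for j in N - B)

print(def_i, def_ii)  # both should print True for this coverage function
```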
Problem 2
A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].
The distance between two codewords [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].
Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means more actual message is carried per transmitted bit, and a larger distance allows for more error correction and detection.
- Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
- Prove a similar result for linear boolean codes.
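For concreteness (this is not a solution to either part), the Python sketch below samples a uniformly random boolean code and a random linear code over GF(2) with small, arbitrarily chosen parameters, and computes their distances by brute force; it is only meant to make the quantities [math]\displaystyle{ r }[/math] and [math]\displaystyle{ d }[/math] tangible.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
k, n = 4, 12                       # small illustrative parameters; code rate r = k/n

messages = np.array(list(itertools.product([0, 1], repeat=k)))   # all 2^k messages

def code_distance(codewords):
    """Minimum Hamming distance between distinct codewords (brute force)."""
    return min(int(np.sum(y1 != y2))
               for y1, y2 in itertools.combinations(codewords, 2))

# A uniformly random (not necessarily linear) code: each codeword i.i.d. uniform.
random_code = rng.integers(0, 2, size=(2 ** k, n))
print("random code distance:", code_distance(random_code))

# A random linear code: C(x) = Ax over GF(2) with a uniformly random A.
A = rng.integers(0, 2, size=(n, k))
linear_code = (messages @ A.T) % 2
print("linear code distance:", code_distance(linear_code))
```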
Problem 3
In this problem, we will demonstrate a common algorithmic application of the Johnson-Lindenstrauss Theorem (JLT). Specifically, we will consider the [math]\displaystyle{ k }[/math]-means clustering problem, which is widely used in areas like machine learning and data mining.
In the [math]\displaystyle{ k }[/math]-means clustering problem, we are given [math]\displaystyle{ n }[/math] points [math]\displaystyle{ \mathbf{x}_1,\cdots,\mathbf x_n\in\mathbb{R}^d }[/math], and the goal is to partition these points into [math]\displaystyle{ k }[/math] disjoint sets (or clusters) [math]\displaystyle{ C_1,\cdots,C_k }[/math] so as to minimize the cost:
- [math]\displaystyle{ \texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k)=\sum_{i=1}^{k}\sum_{j\in C_i}\lVert\mathbf{x}_j-\mathbf{\mu}_i\rVert^2_2 }[/math]
Here, [math]\displaystyle{ \mathbf{\mu}_i=(1/|C_i|)\cdot\sum_{j\in C_i}\mathbf{x}_j }[/math] is the mean of the points in cluster [math]\displaystyle{ C_i }[/math]. In other words, we want to find [math]\displaystyle{ k }[/math] clusters that have as little internal variance as possible.
(a) Prove the following equality:
- [math]\displaystyle{ \texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k)=\sum_{i=1}^{k}\frac{1}{|C_i|}\sum_{j,l\in C_i,j\lt l}\lVert\mathbf{x}_j-\mathbf{x}_l\rVert^2_2 }[/math]
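As a quick numerical sanity check for (a) (not a proof), the following Python snippet evaluates both sides of the claimed identity on random points and a random partition; the sizes are arbitrary illustrative choices, and the two values should agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 10, 4                       # illustrative sizes
X = rng.normal(size=(n, d))               # points x_1, ..., x_n as rows
labels = rng.integers(0, k, size=n)       # a random partition into C_1, ..., C_k

def cost_via_means(X, labels, k):
    """Sum over clusters of squared distances to the cluster mean."""
    total = 0.0
    for i in range(k):
        P = X[labels == i]
        if len(P):
            total += np.sum((P - P.mean(axis=0)) ** 2)
    return total

def cost_via_pairs(X, labels, k):
    """Sum over clusters of (1/|C_i|) times the sum of pairwise squared distances."""
    total = 0.0
    for i in range(k):
        P = X[labels == i]
        m = len(P)
        if m:
            sq = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=2)  # ordered pairs
            total += sq.sum() / (2 * m)   # each unordered pair {j, l} counted twice
    return total

print(cost_via_means(X, labels, k), cost_via_pairs(X, labels, k))  # should match
```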
(b) Suppose we use the Johnson-Lindenstrauss method taught in class to embed [math]\displaystyle{ \mathbf{x}_1,\cdots,\mathbf{x}_n }[/math] into [math]\displaystyle{ O(\log{n}/\epsilon^2) }[/math]-dimensional points [math]\displaystyle{ \mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n }[/math]. Show that with high probability in [math]\displaystyle{ n }[/math], the following inequality holds for all clusters:
- [math]\displaystyle{ (1-\epsilon)\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k) \leq\texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,C_1,\cdots,C_k) \leq(1+\epsilon)\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k) }[/math]
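To see this effect numerically (a sketch only; it assumes a dense Gaussian random projection scaled by [math]\displaystyle{ 1/\sqrt{m} }[/math], one common instantiation of the JL embedding, and the constant in the target dimension is an arbitrary choice), the snippet below embeds random points and compares the clustering cost of a fixed partition before and after the projection.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k, eps = 50, 1000, 4, 0.3            # illustrative sizes
X = rng.normal(size=(n, d))                # original high-dimensional points
labels = rng.integers(0, k, size=n)        # an arbitrary fixed partition

# Gaussian random projection, one common JL construction (an assumption here).
m = int(np.ceil(4 * np.log(n) / eps ** 2)) # target dimension O(log n / eps^2)
Pi = rng.normal(size=(m, d)) / np.sqrt(m)  # scaling preserves squared norms in expectation
X_tilde = X @ Pi.T                         # embedded points, one per row

def kmeans_cost(points, labels, k):
    """Sum over clusters of squared distances to the cluster mean."""
    total = 0.0
    for i in range(k):
        P = points[labels == i]
        if len(P):
            total += np.sum((P - P.mean(axis=0)) ** 2)
    return total

ratio = kmeans_cost(X_tilde, labels, k) / kmeans_cost(X, labels, k)
print(ratio)   # should typically lie within roughly [1 - eps, 1 + eps]
```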
(c) Suppose we know a set of clusters (i.e., a partition of the [math]\displaystyle{ n }[/math] points) [math]\displaystyle{ \tilde{C}_1,\cdots,\tilde{C}_k }[/math] such that:
- [math]\displaystyle{ \texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,\tilde{C}_1,\cdots,\tilde{C}_k)\leq\gamma\cdot\texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,\tilde{C}^*_1,\cdots,\tilde{C}^*_k) }[/math]
where [math]\displaystyle{ \tilde{C}^*_1,\cdots,\tilde{C}^*_k }[/math] is the optimal partition for points [math]\displaystyle{ \mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n }[/math]. That is, assume we know a [math]\displaystyle{ \gamma }[/math]-approximation to the optimal clustering for the dimensionality-reduced points. Show that, when [math]\displaystyle{ \epsilon\leq 1/2 }[/math],
- [math]\displaystyle{ \texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,\tilde{C}_1,\cdots,\tilde{C}_k)\leq(1+O(\epsilon))\cdot\gamma\cdot\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C^*_1,\cdots,C^*_k) }[/math]
where [math]\displaystyle{ C^*_1,\cdots,C^*_k }[/math] is the optimal partition for [math]\displaystyle{ \mathbf{x}_1,\cdots,\mathbf{x}_n }[/math].
In other words, we can compute an approximately optimal clustering for the high-dimensional original data using just the dimensionality-reduced data. In many cases, this approach will speed up algorithms for solving [math]\displaystyle{ k }[/math]-means clustering.
Problem 4
The following is the weighted version of the set cover problem:
Given [math]\displaystyle{ m }[/math] subsets [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math], where [math]\displaystyle{ U }[/math] is a universe of size [math]\displaystyle{ n=|U| }[/math], and each subset [math]\displaystyle{ S_i }[/math] is assigned a positive weight [math]\displaystyle{ w_i\gt 0 }[/math], the goal is to find a [math]\displaystyle{ C\subseteq\{1,2,\ldots,m\} }[/math] such that [math]\displaystyle{ U=\bigcup_{i\in C}S_i }[/math] and the total weight [math]\displaystyle{ \sum_{i\in C}w_i }[/math] is minimized.
- Give an integer program for the problem and its LP relaxation.
- Consider the following idea of randomized rounding: independently round each fractional variable to 1 with probability equal to its fractional value (and to 0 otherwise); then repeatedly apply this rounding to the variables rounded to 0 in previous iterations, until [math]\displaystyle{ U }[/math] is fully covered (a sketch of this procedure is given below). Show that this returns a set cover with [math]\displaystyle{ O(\log n) }[/math] approximation ratio with probability at least [math]\displaystyle{ 0.99 }[/math].
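A minimal Python sketch of this rounding procedure follows, assuming SciPy's linprog for the LP relaxation; the instance (universe, subsets, and weights) is an arbitrary toy example, not data from the problem.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

# An arbitrary toy instance: universe U, subsets S_1..S_m, and their weights.
U = set(range(6))
SETS = [{0, 1}, {1, 2, 3}, {3, 4}, {4, 5}, {0, 5}, {2, 4, 5}]
w = np.array([2.0, 3.0, 1.0, 1.5, 2.5, 2.0])
m = len(SETS)

# LP relaxation: minimize w^T x  s.t.  sum_{i: u in S_i} x_i >= 1 for every u,
# and 0 <= x_i <= 1.  (linprog uses <= constraints, hence the sign flip.)
A_ub = np.array([[-1.0 if u in SETS[i] else 0.0 for i in range(m)] for u in sorted(U)])
res = linprog(c=w, A_ub=A_ub, b_ub=-np.ones(len(U)), bounds=[(0, 1)] * m)
x_star = res.x

def is_covered(chosen):
    covered = set()
    for i in chosen:
        covered |= SETS[i]
    return U <= covered

# Randomized rounding: pick each not-yet-picked set with probability x_i^*,
# and repeat on the remaining sets until U is fully covered.
chosen = set()
while not is_covered(chosen):
    for i in range(m):
        if i not in chosen and rng.random() < x_star[i]:
            chosen.add(i)

print("LP optimum:", res.fun, " rounded cover weight:", sum(w[i] for i in chosen))
```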
Problem 5
In the maximum directed cut (MAX-DICUT) problem, we are given as input a directed graph [math]\displaystyle{ G(V,E) }[/math]. The goal is to partition [math]\displaystyle{ V }[/math] into two disjoint sets [math]\displaystyle{ S }[/math] and [math]\displaystyle{ T }[/math] so that the number of edges in [math]\displaystyle{ E(S,T)=\{(u,v)\in E\mid u\in S, v\in T\} }[/math] is maximized. The following is the integer program for MAX-DICUT:
- [math]\displaystyle{ \begin{align} \text{maximize} &&& \sum_{(u,v)\in E}y_{u,v}\\ \text{subject to} && y_{u,v} &\le x_u, & \forall (u,v)&\in E,\\ && y_{u,v} &\le 1-x_v, & \forall (u,v)&\in E,\\ && x_v &\in\{0,1\}, & \forall v&\in V,\\ && y_{u,v} &\in\{0,1\}, & \forall (u,v)&\in E. \end{align} }[/math]
Let [math]\displaystyle{ x_v^*,y_{u,v}^* }[/math] denote the optimal solution to the LP-relaxation of the above integer program.
- Apply the randomized rounding such that for every [math]\displaystyle{ v\in V }[/math], [math]\displaystyle{ \hat{x}_v=1 }[/math] independently with probability [math]\displaystyle{ x_v^* }[/math]. Analyze the approximation ratio (between the expected size of the random cut and OPT).
- Apply another randomized rounding such that for every [math]\displaystyle{ v\in V }[/math], [math]\displaystyle{ \hat{x}_v=1 }[/math] independently with probability [math]\displaystyle{ 1/4+x_v^*/2 }[/math]. Analyze the approximation ratio for this algorithm (a sketch comparing the two rounding schemes is given below).
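A small Python sketch comparing the two rounding schemes is given below. The digraph and the fractional values [math]\displaystyle{ x_v^* }[/math] are arbitrary illustrative choices, and [math]\displaystyle{ y_{u,v}^* }[/math] is taken as [math]\displaystyle{ \min(x_u^*,1-x_v^*) }[/math], which is optimal for the LP once the [math]\displaystyle{ x_v^* }[/math] are fixed. The snippet computes, edge by edge, the LP contribution and the probability that the edge is cut under each rounding, which is exactly the per-edge comparison used in the analysis.

```python
# An arbitrary toy digraph and a fractional assignment x* in [0,1]^V (illustrative).
edges = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 2)]
x_star = {0: 0.7, 1: 0.2, 2: 0.9, 3: 0.4}

lp_value = 0.0     # sum of y*_{u,v} = min(x*_u, 1 - x*_v)
exp_cut_1 = 0.0    # rounding 1: Pr[x_hat_v = 1] = x*_v
exp_cut_2 = 0.0    # rounding 2: Pr[x_hat_v = 1] = 1/4 + x*_v / 2
for (u, v) in edges:
    lp_value += min(x_star[u], 1 - x_star[v])
    # Edge (u, v) is cut iff u is rounded into S and v into T, independently.
    exp_cut_1 += x_star[u] * (1 - x_star[v])
    exp_cut_2 += (0.25 + x_star[u] / 2) * (0.75 - x_star[v] / 2)

print("LP value:", lp_value)
print("rounding 1:", exp_cut_1, "ratio:", exp_cut_1 / lp_value)
print("rounding 2:", exp_cut_2, "ratio:", exp_cut_2 / lp_value)
```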