高级算法 (Fall 2021)/Greedy and Local Search

= Set cover =
Given <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math> of a universe <math>U</math> of size <math>n=|U|</math>, a <math>C\subseteq\{1,2,\ldots,m\}</math> forms a '''set cover''' if <math>U=\bigcup_{i\in C}S_i</math>, that is, <math>C</math> is a sub-collection of sets whose union "covers" all elements in the universe.


Without loss of generality, we always assume that the universe is <math>U=\bigcup_{i=1}^mS_i</math>.


This defines an important optimization problem:
{{Theorem|Set Cover Problem|
*'''Input''': <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math> of a universe <math>U</math> of size <math>n</math>;
*'''Output''': the smallest <math>C\subseteq\{1,2,\ldots,m\}</math> such that <math>U=\bigcup_{i\in C}S_i</math>.
}}


We can think of each instance as a bipartite graph <math>G(U,\{S_1,S_2,\ldots,S_m\}, E)</math> with <math>n</math> vertices on the left side, each corresponding to an element <math>x\in U</math>, and <math>m</math> vertices on the right side, each corresponding to one of the <math>m</math> subsets <math>S_1,S_2,\ldots,S_m</math>; there is a bipartite edge connecting <math>x</math> with <math>S_i</math> if and only if <math>x\in S_i</math>. Under this translation, the set cover problem is precisely the following problem: given a bipartite graph <math>G(U,V,E)</math>, find the smallest subset <math>C\subseteq V</math> of vertices on the right side that "covers" all vertices on the left side, i.e. every vertex <math>x\in U</math> on the left side is adjacent to some vertex in <math>C</math>.


By alternating the roles of sets and elements in the above interpretation of set cover instances as bipartite graphs, the set cover problem can be translated to the following equivalent hitting set problem:
{{Theorem|Hitting Set Problem|
*'''Input''': <math>n</math> subsets <math>S_1,S_2,\ldots,S_n\subseteq U</math> of a universe <math>U</math> of size <math>m</math>;
*'''Output''': the smallest subset <math>C\subseteq U</math> of elements such that <math>C</math> intersects with every set <math>S_i</math> for <math>1\le i\le n</math>.
}}


== Frequency and Vertex Cover==
Given an instance of the set cover problem <math>S_1,S_2,\ldots,S_m\subseteq U</math>, for every element <math>x\in U</math>, its '''frequency''', denoted as <math>frequency(x)</math>, is defined as the number of sets containing <math>x</math>. Formally,
:<math>frequency(x)=|\{i\mid x\in S_i\}|</math>.


In the hitting set version, the frequency should be defined for each set: for a set <math>S_i</math> its frequency <math>frequency(S_i)=|S_i|</math> is just the size of the set <math>S_i</math>.


The set cover problem restricted to the instances with frequency 2 becomes the vertex cover problem.
 
Given an undirected graph <math>G(V,E)</math>, a '''vertex cover''' is a subset <math>C\subseteq V</math> of vertices such that every edge <math>uv\in E</math> has at least one endpoint in <math>C</math>.
{{Theorem|Vertex Cover Problem|
*'''Input''': an undirected graph <math>G(V,E)</math>
*'''Output''': the smallest <math>C\subseteq V</math> such that every edge <math>e\in E</math> is incident to at least one vertex in <math>C</math>.
}}
It is easy to see the correspondence with the hitting set problem:
*For a graph <math>G(V,E)</math>, its edges <math>e_1,e_2,\ldots,e_n\subseteq V</math> are vertex-sets of size 2.
*A subset <math>C\subseteq V</math> of vertices is a vertex cover if and only if it is a hitting set for <math>e_1,e_2,\ldots,e_n</math>, i.e. every <math>e_i</math> intersects with <math>C</math>.
Therefore vertex cover is just set cover with frequency 2.
 
The vertex cover problem is '''NP-hard'''. Its decision version is among [https://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems Karp's 21 '''NP-complete''' problems]. Since vertex cover is a special case of set cover, the set cover problem is also '''NP-hard'''.
 
== Greedy Algorithm==
We present our algorithms in the original set cover setting (instead of the hitting set version).
 
A natural algorithm is the greedy algorithm: sequentially add to the cover <math>C</math> the index <math>i</math> whose set <math>S_i</math> covers the largest number of ''currently uncovered'' elements, until no element is left uncovered.
 
{{Theorem|''GreedyCover''|
:'''Input:''' sets <math>S_1,S_2,\ldots,S_m</math>;
----
:initially, <math>U=\bigcup_{i=1}^mS_i</math>, and <math>C=\emptyset</math>;
:while <math>U\neq\emptyset</math> do
::find <math>i\in\{1,2,\ldots, m\}</math> with the largest <math>|S_i\cap U|</math>;
::let <math>C=C\cup\{i\}</math> and <math>U=U\setminus S_i</math>;
:return <math>C</math>;
}}
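To make the procedure concrete, the following is a minimal Python sketch of ''GreedyCover''; the encoding of an instance as a dictionary mapping indices to Python sets is our own illustrative choice, not part of the original formulation.

<pre>
def greedy_cover(sets):
    """Greedy set cover.  sets: dict mapping index i to the set S_i.
    Returns a list C of chosen indices whose union covers the universe."""
    uncovered = set().union(*sets.values())  # U = union of all S_i
    cover = []
    while uncovered:
        # choose the set covering the most currently uncovered elements
        i = max(sets, key=lambda j: len(sets[j] & uncovered))
        cover.append(i)
        uncovered -= sets[i]
    return cover

# tiny example: returns [1, 3], since S_1 covers three elements first
print(greedy_cover({1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5}}))
</pre>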
 
Obviously the algorithm runs in polynomial time and always returns a set cover. We will now show how good the returned set cover is compared to the optimal solution, by analyzing the approximation ratio.
 
We define the following notations:
* We enumerate all elements of the universe <math>U</math> as <math>x_1,x_2,\ldots,x_n</math>, in the order in which they are covered in the algorithm.
* For <math>t=1,2,\ldots</math>, let <math>U_t</math> denote the set of uncovered elements in the beginning of the <math>t</math>-th iteration of the algorithm.
* For the <math>k</math>-th element <math>x_k</math> covered, suppose that it is covered by <math>S_i</math> in the <math>t</math>-th iteration; define
::<math>price(x_k)=\frac{1}{|S_i\cap U_t|}</math>
: to be the average "price" to cover element <math>x_k</math> in the algorithm.
 
Observe that if <math>x_k</math> is covered by <math>S_i</math> in the <math>t</math>-th iteration, then precisely <math>|S_i\cap U_t|</math> elements, including <math>x_k</math>, become covered in that iteration, and all these elements have price <math>1/|S_i\cap U_t|</math>. This gives us the following lemma:
{{Theorem|Lemma 1|
:For the set cover <math>C</math> returned by the algorithm, <math>|C|=\sum_{k=1}^nprice(x_k)</math>.
}}
This lemma connects the size of the returned set cover to the prices of elements. The next lemma connects the price of each element to the optimal solution.
 
{{Theorem|Lemma 2|
:For each <math>x_k</math>, <math>price(x_k)\le \frac{OPT}{n-k+1}</math>, where <math>OPT</math> is the size of the optimal set cover.
}}
{{Proof|
For an instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with a universe of size <math>n=|U|</math>, let <math>C^*\subseteq\{1,2,\ldots,m\}</math> denote an optimal set cover. Then
:<math>U=\bigcup_{i\in C^*}S_i</math>.
By the averaging principle, there must be an <math>S_i</math> of size
:<math>|S_i|\ge\frac{n}{|C^*|}=\frac{n}{OPT}</math>.
By the greediness of the algorithm, in the first iteration the algorithm must choose a set <math>S_i</math> of at least this size to add to the set cover <math>C</math>, which means that the first covered element <math>x_1</math>, along with all elements covered in the first iteration, is priced as
:<math>price(x_1)=\frac{1}{\max_{i}|S_i|}\le \frac{OPT}{n}</math>.
For the <math>k</math>-th element <math>x_k</math> covered by the algorithm, suppose that it is covered by <math>S_i</math> in the <math>t</math>-th iteration, and the current universe of uncovered elements is <math>U_t</math>. It holds that
:<math>|U_t|\le n-k+1</math>,
since there are at most <math>k-1</math> elements covered before <math>x_k</math>.
 
The uncovered elements constitute a set cover instance <math>S_1\cap U_t, S_2\cap U_t, \ldots, S_m\cap U_t</math> (some of which may be empty), with universe <math>U_t</math>. Note that this smaller instance has an optimal set cover of size at most <math>OPT</math> (since the optimal set cover for the original instance also covers this smaller instance), and <math>x_k</math> is among the elements covered in the first iteration of the algorithm running on this smaller instance. By the above argument applied to the smaller instance, it holds for <math>price(x_k)=\frac{1}{|S_i\cap U_t|}</math> (note that this value is the same whether computed in the <math>t</math>-th iteration of the algorithm on the original instance or in the first iteration of the algorithm on the smaller instance) that
<math>
price(x_k)=\frac{1}{|S_i\cap U_t|}\le\frac{OPT}{|U_t|}=\frac{OPT}{n-k+1}.
</math>
The lemma is proved.
}}
 
 
Combining Lemma 1 and Lemma 2, we have
:<math>
|C|=\sum_{k=1}^nprice(x_k) \le \sum_{k=1}^n  \frac{OPT}{n-k+1}=\sum_{k=1}^n\frac{OPT}{k}=H_n\cdot OPT,
</math>
where <math>H_n=\sum_{k=1}^n\frac{1}{k}\approx\ln n+O(1)</math> is the <math>n</math>-th [http://en.wikipedia.org/wiki/Harmonic_number Harmonic number].


This shows that the ''GreedyCover'' algorithm has approximation ratio <math>H_n\approx\ln n</math>.
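As a quick numeric illustration of the growth rate of <math>H_n</math> (a sanity check, not part of the analysis):

<pre>
import math

n = 10**6
H_n = sum(1.0 / k for k in range(1, n + 1))
# H_n - ln(n) approaches the Euler-Mascheroni constant, about 0.5772
print(H_n, math.log(n), H_n - math.log(n))
</pre>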
{{Theorem|Theorem|
:For any set cover instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with optimal set cover of size <math>OPT</math>, the ''GreedyCover'' returns a set cover of size
::<math>|C|\le H_n\cdot {OPT}</math>,
:where <math>n=|U|</math> is the size of the universe and <math>H_n\approx\ln n</math> represents the <math>n</math>-th Harmonic number.
}}


Ignoring lower order terms, <math>\ln n</math> is also the best approximation ratio achievable by polynomial time algorithms, assuming that '''NP'''<math>\neq</math>'''P'''.
* [http://www.cs.mun.ca/~yzchen/papers/papers/hardness-approx-lund-yannakakis.pdf Lund and Yannakakis (1994)] showed that there is no poly-time algorithm with approximation ratio <math><\frac{1}{2}\log_2 n</math>, unless all '''NP''' problems have quasi-polynomial time algorithms (which runs in time <math>n^{\mathrm{polylog}(n)}</math>).
* [http://www.cs.duke.edu/courses/spring07/cps296.2/papers/p634-feige.pdf Feige (1998)] showed that there is no poly-time algorithm with approximation ratio better than <math>(1-o(1))\ln n</math> with the same complexity assumption.
* [http://courses.cs.tau.ac.il/368-3168/03a/ACT2/articles/raz97subconstant.pdf Raz and Safra (1997)] showed that there is no poly-time algorithm with approximation ratio better than <math>c\ln n</math> for a constant <math>c</math>, assuming that '''NP'''<math>\neq</math>'''P'''.
* [http://arxiv.org/pdf/1305.1979.pdf Dinur and Steurer (2014)] showed that there is no poly-time algorithm with approximation ratio better than <math>(1-o(1))\ln n</math> assuming that '''NP'''<math>\neq</math>'''P'''.


== Primal-Dual Algorithm==
Given an instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> for set cover, the problem asks to minimize <math>|C|</math> subject to the constraints that <math>C\subseteq\{1,2,\ldots, m\}</math> and <math>U=\bigcup_{i\in C}S_i</math>, i.e. <math>C</math> is a set cover. We can define a '''dual''' problem on the same instance. The original problem, the set cover problem, is called the '''primal''' problem. Formally, the primal and dual problems are defined as follows:
:'''''Primal''''':
:: '''minimize''' <math>|C|</math>
:: '''subject to''' <math>C\subseteq \{1,2,\ldots,m\}</math>
:::::<math>U=\bigcup_{i\in C}S_i</math>


:'''''Dual''''':
:: '''maximize''' <math>|M|</math>
:: '''subject to''' <math>M\subseteq U</math>
:::::<math>|S_i\cap M|\le 1</math>, <math>\forall 1\le i\le m</math>


The dual problem is a "maximum matching" problem, where the matching is defined for the set system instead of graph. Given an instance <math>S_1,S_2,\ldots,S_m\subseteq U</math>, an <math>M\subseteq U</math> is called a '''matching''' for <math>S_1,S_2,\ldots,S_m</math> if <math>|S_i\cap M|\le 1</math> for all <math>i=1,2,\ldots, m</math>.


It is easier to see these two optimization problems are dual to each other if we write them as mathematical programs.  


For the primal problem (set cover), for each <math>1\le i\le m</math>, let <math>x_i\in\{0,1\}</math> indicate whether <math>i\in C</math>. The set cover problem can be written as the following integer '''linear programming (ILP)'''.
:'''''Primal''''':
:: '''minimize''' <math>\sum_{i=1}^m x_i</math>
:: '''subject to''' <math>\sum_{i:v\in S_i}x_i\ge 1</math>, <math>\forall v\in U</math>
:::::<math>x_i\in\{0,1\}</math>, <math>\forall 1\le i\le m</math>


For the dual problem (maximum matching), for each <math>v\in U</math>, let <math>y_v\in\{0,1\}</math> indicate whether <math>v\in M</math>. Then the dual problem can be written as the following ILP.
:'''''Dual''''':
:: '''maximize''' <math>\sum_{v\in U}y_v</math>
:: '''subject to''' <math>\sum_{v\in S_i}y_v\le 1</math>, <math>\forall 1\le i\le m</math>
:::::<math>y_v\in\{0,1\}</math>, <math>\forall v\in U</math>


It is a fundamental fact that for a minimization problem, every feasible solution to the dual problem (which is a maximization problem) is a lower bound for the optimal solution to the primal problem. This is called the '''weak duality''' phenomenon. The easy direction of the famous "max-flow min-cut" theorem (that the value of every flow is at most the capacity of every cut) is an instance of weak duality.
 
{{Theorem|Theorem (Weak Duality)|
:For every matching <math>M</math> and every set cover <math>C</math> for <math>S_1,S_2,\ldots,S_m\subseteq U</math>, it holds that <math>|M|\le |C|</math>.
}}
{{Proof|
If <math>M\subseteq U</math> is a matching for <math>S_1,S_2,\ldots,S_m\subseteq U</math>, then every <math>S_i</math> intersects with <math>M</math> on at most one element, which means no two elements in <math>M</math> can be covered by the same <math>S_i</math>, and hence each element in <math>M</math> consumes at least one distinct <math>S_i</math> to cover it. Therefore, any set cover must use at least <math>|M|</math> sets to cover all elements in <math>M</math>.

More formally, for any matching <math>M</math> and set cover <math>C</math>, it holds that
:<math>
|M|=\left|\bigcup_{i\in C}(S_i\cap M)\right|\le \sum_{i\in C}|S_i\cap M|\le |C|,
</math>
where the first equality is because <math>C</math> is a set cover and the last inequality is because <math>M</math> is a matching.
}}
 
As a corollary, the size of every matching <math>M</math> for <math>S_1,S_2,\ldots,S_m\subseteq U</math> is a lower bound for the size of the optimal set cover:
{{Theorem|Corollary|
:Let <math>S_1,S_2,\ldots,S_m\subseteq U</math> be an instance for set cover, and <math>OPT</math> the size of the optimal set cover.
:For every matching <math>M</math> for <math>S_1,S_2,\ldots,S_m</math>, it holds that <math>|M|\le OPT</math>.
}}
 
Now we are ready to present our algorithm. It is a greedy algorithm in the dual world, and a maximal (locally optimal) solution to the dual problem helps us find a good enough solution to the primal problem.
{{Theorem|''DualCover''|
:'''Input:''' sets <math>S_1,S_2,\ldots,S_m\subseteq U</math>;
----
:construct a ''maximal'' matching <math>M\subseteq U</math> such that <math>|S_i\cap M|\le 1</math> for all <math>i=1,2,\ldots, m</math>
::by sequentially adding elements to <math>M</math> until nothing can be added;
:return <math>C=\{i\mid S_i\cap M\neq\emptyset\}</math>
}}
The algorithm constructs the maximal matching <math>M</math> by sequentially adding elements into <math>M</math> until maximality is reached. This obviously takes polynomial time.
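The following is a minimal Python sketch of ''DualCover'', under the same illustrative instance encoding as before:

<pre>
def dual_cover(sets):
    """DualCover: greedily grow a maximal matching M, then return the
    indices of all sets intersecting M.  sets: dict index -> set."""
    matching = set()
    for x in set().union(*sets.values()):
        # add x only if every S_i still intersects M in at most one element
        if all(len(s & (matching | {x})) <= 1 for s in sets.values()):
            matching.add(x)
    return [i for i, s in sets.items() if s & matching]

# example with maximum frequency f = 2: the returned cover has size
# at most 2 * OPT (here it returns [1, 2, 3] while OPT = 2)
print(dual_cover({1: {1, 2}, 2: {2, 3}, 3: {3, 4}}))
</pre>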
 
It is not so obvious to see that the returned <math>C</math> is always a set cover. This is due to the maximality of the matching <math>M</math>:
* By contradiction, assume that <math>C</math> is not a set cover, which means there is an element <math>x\in U</math> such that for all <math>i\in C</math>, <math>x\not\in S_i</math>. This implies that <math>x\not\in M</math> and that <math>M'=M\cup\{x\}</math> is still a matching, contradicting the maximality of <math>M</math>.
Therefore the <math>C</math> constructed as in the DualCover algorithm must be a set cover.
 
For the maximal matching <math>M</math> constructed by the ''DualCover'' algorithm, the output set cover is the collection of all sets which contain at least one element of <math>M</math>. Recall that the frequency <math>frequency(x)</math> of an element is defined as the number of sets <math>S_i</math> containing <math>x</math>. Each element <math>x\in M</math> may contribute at most <math>frequency(x)</math> sets to the set cover <math>C</math>. It holds that
:<math>
|C|\le \sum_{x\in M}frequency(x)\le f\cdot |M|\le f\cdot OPT,
</math>
where <math>f=\max_{x\in U}frequency(x)</math> denotes the maximum frequency of all elements.
 
We proved the following <math>f</math>-approximation bound for the ''DualCover'' algorithm on set cover instances with maximum frequency <math>f</math>.
{{Theorem|Theorem|
:For any set cover instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with optimal set cover of size <math>OPT</math>, the ''DualCover'' returns a set cover of size
::<math>|C|\le f\cdot {OPT}</math>,
:where <math>f=\max_{x\in U}frequency(x)=\max_{x\in U}|\{i\mid x\in S_i\}|</math> is the maximum frequency.
}}
 
When the frequency <math>f=2</math>, the set cover problem becomes the vertex cover problem. And the DualCover algorithm works simply as follows:
{{Theorem|''DualCover'' for vertex cover problem|
:'''Input:''' an undirected graph <math>G(V,E)</math>;
----
: initially <math>C=\emptyset</math>;
: while <math>E\neq\emptyset</math>
:: pick an arbitrary edge <math>uv\in E</math> and add both <math>u</math> and <math>v</math> to <math>C</math>;
:: remove all edges in <math>E</math> incident to <math>u</math> or <math>v</math>;
: return <math>C</math>;
}}
 
Since this algorithm is just an implementation of the ''DualCover'' algorithm on the vertex cover instances (set cover instances with frequency <math>f=2</math>), by the analysis of the ''DualCover'' algorithm, it is a 2-approximation algorithm for the vertex cover problem.
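For vertex cover, the same algorithm admits an even simpler Python sketch; on a 4-cycle it returns all four vertices while the optimum has size 2, so the factor of 2 is attained:

<pre>
def vertex_cover_2approx(edges):
    """2-approximate vertex cover: take both endpoints of an arbitrary
    remaining edge, then discard all edges covered by them."""
    edges = set(edges)
    cover = set()
    while edges:
        u, v = next(iter(edges))  # pick an arbitrary edge uv
        cover |= {u, v}
        edges = {e for e in edges if u not in e and v not in e}
    return cover

# on the 4-cycle the algorithm returns all 4 vertices, while OPT = 2
print(vertex_cover_2approx({(1, 2), (2, 3), (3, 4), (4, 1)}))
</pre>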


Ignoring lower order terms, <math>2</math> is also the best approximation ratio achievable by polynomial time algorithms, assuming certain complexity assumptions.
* [http://www.wisdom.weizmann.ac.il/~dinuri/mypapers/vc.pdf Dinur and Safra (2005)] showed that there is no poly-time algorithm with approximation ratio <math><1.3606</math>, assuming that '''NP'''<math>\neq</math>'''P'''.
* [http://www.cs.nyu.edu/~khot/papers/vc_hard.pdf Khot and Regev (2008)] showed that there is no poly-time algorithm with approximation ratio <math>2-\epsilon</math> for any constant <math>\epsilon</math> assuming the [https://en.wikipedia.org/wiki/Unique_games_conjecture unique games conjecture].

高级算法 (Fall 2021)/Problem Set 3

  • The solution to each problem must include the complete solving process. Either Chinese or English may be used.

Problem 1

In this problem, we will demonstrate a common algorithmic application of the Johnson-Lindenstrauss Theorem (JLT). Specifically, we will consider the <math>k</math>-means clustering problem, which is widely used in areas like machine learning and data mining.

In the <math>k</math>-means clustering problem, we are given <math>n</math> points <math>\mathbf{x}_1,\cdots,\mathbf x_n\in\mathbb{R}^d</math>, and the goal is to partition these points into <math>k</math> disjoint sets (or clusters) <math>C_1,\cdots,C_k</math> so as to minimize the cost:

<math>\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k)=\sum_{i=1}^{k}\sum_{j\in C_i}\lVert\mathbf{x}_j-\mathbf{\mu}_i\rVert^2_2</math>

Here, <math>\mathbf{\mu}_i=(1/|C_i|)\cdot\sum_{j\in C_i}\mathbf{x}_j</math> is the mean of the points in cluster <math>C_i</math>. In other words, we want to find <math>k</math> clusters that have as little internal variance as possible.

(a) Prove the following equality:

<math>\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k)=\sum_{i=1}^{k}\frac{1}{|C_i|}\sum_{j,l\in C_i,j<l}\lVert\mathbf{x}_j-\mathbf{x}_l\rVert^2_2</math>
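(A quick numerical sanity check of this identity on random data, assuming numpy is available; this of course does not replace the proof.)

<pre>
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))      # one cluster of 10 points in R^3
mu = X.mean(axis=0)
lhs = np.sum(np.linalg.norm(X - mu, axis=1) ** 2)
rhs = sum(np.linalg.norm(X[j] - X[l]) ** 2
          for j in range(10) for l in range(j + 1, 10)) / 10
print(np.isclose(lhs, rhs))       # True: both sides agree
</pre>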

(b) Suppose we use the Johnson-Lindenstrauss method taught in class to embed <math>\mathbf{x}_1,\cdots,\mathbf{x}_n</math> into <math>O(\log{n}/\epsilon^2)</math>-dimensional points <math>\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n</math>. Show that with high probability in <math>n</math>, the following inequality holds for all clusters:

<math>(1-\epsilon)\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k) \leq\texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,C_1,\cdots,C_k) \leq(1+\epsilon)\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k)</math>
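(For concreteness, one standard JL construction projects with a scaled Gaussian random matrix, as sketched below; whether this matches the exact variant taught in class is an assumption, and the constant 8 is illustrative.)

<pre>
import numpy as np

def jl_embed(X, eps, rng):
    """Embed n points (rows of X) into k = O(log n / eps^2) dimensions
    via a scaled Gaussian random matrix (one standard JL construction)."""
    n, d = X.shape
    k = int(np.ceil(8 * np.log(n) / eps ** 2))  # illustrative constant
    A = rng.normal(size=(k, d)) / np.sqrt(k)
    return X @ A.T
</pre>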

(c) Suppose we know a set of clusters (i.e., a partition of the <math>n</math> points) <math>\tilde{C}_1,\cdots,\tilde{C}_k</math> such that:

<math>\texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,\tilde{C}_1,\cdots,\tilde{C}_k)\leq\gamma\cdot\texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,\tilde{C}^*_1,\cdots,\tilde{C}^*_k)</math>

where <math>\tilde{C}^*_1,\cdots,\tilde{C}^*_k</math> is the optimal partition for points <math>\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n</math>. That is, assume we know a <math>\gamma</math> approximation to the optimal clustering for the dimensionality reduced points. Show that, when <math>\epsilon\leq 1/2</math>,

<math>\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,\tilde{C}_1,\cdots,\tilde{C}_k)\leq(1+O(\epsilon))\cdot\gamma\cdot\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C^*_1,\cdots,C^*_k)</math>

where <math>C^*_1,\cdots,C^*_k</math> is the optimal partition for <math>\mathbf{x}_1,\cdots,\mathbf{x}_n</math>.

In other words, we can compute an approximately optimal clustering for the high-dimensional original data using just the dimensionality reduced data. In many cases, this approach will speed up algorithms for solving <math>k</math>-means clustering.

Problem 2

The following is the weighted version of the set cover problem:

Given <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math>, where <math>U</math> is a universe of size <math>n=|U|</math>, and each subset <math>S_i</math> is assigned a positive weight <math>w_i>0</math>, the goal is to find a <math>C\subseteq\{1,2,\ldots,m\}</math> such that <math>U=\bigcup_{i\in C}S_i</math> and the total weight <math>\sum_{i\in C}w_i</math> is minimized.

  • Give an integer program for the problem and its LP relaxation.
  • Consider the following idea of randomized rounding: independently round each fractional value to <math>\{0,1\}</math> with the probability of the fractional value itself; and repeatedly apply this process to the variables rounded to 0 in previous iterations until <math>U</math> is fully covered. Show that this can return a set cover with <math>O(\log n)</math> approximation ratio with probability at least <math>0.99</math>.
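(As a concrete illustration of the rounding procedure only, here is a Python sketch assuming scipy's linprog is available; analyzing the approximation ratio is the actual exercise.)

<pre>
import numpy as np
from scipy.optimize import linprog

def round_weighted_cover(sets, weights, rng):
    """Solve the LP relaxation of weighted set cover, then repeatedly round
    each not-yet-chosen x_i to 1 with probability x_i^* until U is covered.
    sets: list of Python sets S_i; weights: list of w_i > 0."""
    universe = set().union(*sets)
    # coverage constraint for each v: sum_{i: v in S_i} x_i >= 1,
    # written as A_ub @ x <= b_ub with both sides negated
    A = np.array([[-1.0 if v in s else 0.0 for s in sets] for v in universe])
    frac = linprog(weights, A_ub=A, b_ub=-np.ones(len(universe)),
                   bounds=[(0.0, 1.0)] * len(sets)).x
    chosen = set()
    while not universe <= set().union(*(sets[i] for i in chosen)):
        chosen |= {i for i in range(len(sets)) if rng.random() < frac[i]}
    return chosen

# usage: round_weighted_cover([{1,2},{2,3},{3}], [1.0, 2.0, 1.0],
#                             np.random.default_rng(0))
</pre>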

Problem 3

In the maximum directed cut (MAX-DICUT) problem, we are given as input a directed graph <math>G(V,E)</math>. The goal is to partition <math>V</math> into disjoint <math>S</math> and <math>T</math> so that the number of edges in <math>E(S,T)=\{(u,v)\in E\mid u\in S, v\in T\}</math> is maximized. The following is the integer program for MAX-DICUT:

<math>\begin{align}
\text{maximize} &&& \sum_{(u,v)\in E}y_{u,v}\\
\text{subject to} && y_{u,v} &\le x_u, & \forall (u,v)&\in E,\\
&& y_{u,v} &\le 1-x_v, & \forall (u,v)&\in E,\\
&& x_v &\in\{0,1\}, & \forall v&\in V,\\
&& y_{u,v} &\in\{0,1\}, & \forall (u,v)&\in E.
\end{align}</math>

Let <math>x_v^*,y_{u,v}^*</math> denote the optimal solution to the LP-relaxation of the above integer program.

  • Apply the randomized rounding such that for every <math>v\in V</math>, <math>\hat{x}_v=1</math> independently with probability <math>x_v^*</math>. Analyze the approximation ratio (between the expected size of the random cut and OPT).
  • Apply another randomized rounding such that for every <math>v\in V</math>, <math>\hat{x}_v=1</math> independently with probability <math>1/4+x_v^*/2</math>. Analyze the approximation ratio for this algorithm.
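(A Python sketch of both rounding schemes, under our own illustrative encoding of the graph; computing the expected cut size analytically is the exercise.)

<pre>
import random

def dicut_round(V, E, x_star, biased=False):
    """Round the LP solution: put v in S when hat{x}_v = 1.
    biased=False uses Pr[hat{x}_v = 1] = x_v^*;
    biased=True  uses Pr[hat{x}_v = 1] = 1/4 + x_v^*/2."""
    p = {v: 0.25 + x_star[v] / 2 if biased else x_star[v] for v in V}
    S = {v for v in V if random.random() < p[v]}
    # a directed edge (u, v) is cut iff u in S and v in T = V \ S
    return sum(1 for (u, v) in E if u in S and v not in S)
</pre>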

Problem 4