高级算法 (Fall 2016)/Greedy and Local Search
Under construction.
Set cover
Given [math]\displaystyle{ m }[/math] subsets [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math] of a universe [math]\displaystyle{ U }[/math] of size [math]\displaystyle{ n=|U| }[/math], a [math]\displaystyle{ C\subseteq\{1,2,\ldots,m\} }[/math] forms a set cover if [math]\displaystyle{ U=\bigcup_{i\in\mathcal{C}}S_i }[/math], that is, [math]\displaystyle{ C }[/math] is a sub-collection of sets whose union "covers" all elements in the universe.
Without loss of generality, we always assume that the universe is [math]\displaystyle{ U==\bigcup_{i=1}^mS_i }[/math].
This defines an important optimization problem:
Set Cover Problem - Input: [math]\displaystyle{ m }[/math] subsets [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math] of a universe [math]\displaystyle{ U }[/math] of size [math]\displaystyle{ n }[/math];
- Output: the smallest [math]\displaystyle{ C\subseteq\{1,2,\ldots,m\} }[/math] such that [math]\displaystyle{ U=\bigcup_{i\in C}S_i }[/math].
We can think of each instance as a bipartite graph [math]\displaystyle{ G(U,\{S_1,S_2,\ldots,S_n\}, E) }[/math] with [math]\displaystyle{ n }[/math] vertices on the right side, each corresponding to an element [math]\displaystyle{ x\in U }[/math], [math]\displaystyle{ m }[/math] vertices on the left side, each corresponding to one of the [math]\displaystyle{ m }[/math] subsets [math]\displaystyle{ S_1,S_2,\ldots,S_m }[/math], and there is a bipartite edge connecting [math]\displaystyle{ x }[/math] with [math]\displaystyle{ S_i }[/math] if and only if [math]\displaystyle{ x\in S_i }[/math]. By this translation the set cover problem is precisely the problem of given as input a bipartite graph [math]\displaystyle{ G(U,V,E) }[/math], to find the smallest subset [math]\displaystyle{ C\subseteq V }[/math] of vertices on the right side to "cover" all vertices on the left side, such that every vertex on the left side [math]\displaystyle{ x\in U }[/math] is incident to some vertex in [math]\displaystyle{ C }[/math].
By alternating the roles of sets and elements in the above interpretation of set cover instances as bipartite graphs, the set cover problem can be translated to the following equivalent hitting set problem:
Hitting Set Problem - Input: [math]\displaystyle{ n }[/math] subsets [math]\displaystyle{ S_1,S_2,\ldots,S_n\subseteq U }[/math] of a universe [math]\displaystyle{ U }[/math] of size [math]\displaystyle{ m }[/math];
- Output: the smallest subset [math]\displaystyle{ C\subseteq U }[/math] of elements such that [math]\displaystyle{ C }[/math] intersects with every set [math]\displaystyle{ S_i }[/math] for [math]\displaystyle{ 1\le i\le n }[/math].
Frequency and Vertex Cover
Given an instance of set cover problem [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math], for every element [math]\displaystyle{ x\in U }[/math], its frequency, denoted as [math]\displaystyle{ frequency(x) }[/math], is defined as the number of sets containing [math]\displaystyle{ X }[/math]. Formally,
- [math]\displaystyle{ frequency(x)=|\{i\mid x\in S_i\}| }[/math].
In the hitting set version, the frequency should be defined for each set: for a set [math]\displaystyle{ S_i }[/math] its frequency [math]\displaystyle{ frequency(S_i)=|S_i| }[/math] is just the size of the set [math]\displaystyle{ S_i }[/math].
The set cover problem restricted to the instances with frequency 2 becomes the vertex cover problem.
Given an undirected graph [math]\displaystyle{ G(U,V) }[/math], a vertex cover is a subset [math]\displaystyle{ C\subseteq V }[/math] of vertices such that every edge [math]\displaystyle{ uv\in E }[/math] has at least one endpoint in [math]\displaystyle{ C }[/math].
Vertex Cover Problem - Input: an undirected graph [math]\displaystyle{ G(V,E) }[/math]
- Output: the smallest [math]\displaystyle{ C\subseteq V }[/math] such that every edge [math]\displaystyle{ e\in E }[/math] is incident to at least one vertex in [math]\displaystyle{ C }[/math].
It is easy to compare with the hitting set problem:
- For graph [math]\displaystyle{ G(V,E) }[/math], its edges [math]\displaystyle{ e_1,e_2,\ldots,e_n\subseteq V }[/math] are vertex-sets of size 2.
- A subset [math]\displaystyle{ C\subseteq V }[/math] of vertices is a vertex cover if and only if it is a hitting sets for [math]\displaystyle{ e_1,e_2,\ldots,e_n }[/math], i.e. every [math]\displaystyle{ e_i }[/math] intersects with [math]\displaystyle{ C }[/math].
Therefore vertex cover is just set cover with frequency 2.
The vertex cover problem is NP-hard. Its decision version is among Karp's 21 NP-complete problems. Since vertex cover is a special case of set cover, the set cover problem is also NP-hard.
Greedy Algorithm for Set Cover
We present our algorithms in the original set cover setting (instead of the hitting set version).
A natural algorithm is the greedy algorithm: sequentially add such [math]\displaystyle{ i }[/math] to the cover [math]\displaystyle{ C }[/math], where each [math]\displaystyle{ S_i }[/math] covers the largest number of currently uncovered elements, until no element is left uncovered.
GreedyCover - Input: sets [math]\displaystyle{ S_1,S_2,\ldots,S_m }[/math];
- initially, [math]\displaystyle{ U=\bigcup_{i=1}^mS_i }[/math], and [math]\displaystyle{ C=\emptyset }[/math];
- while [math]\displaystyle{ U\neq\emptyset }[/math] do
- find [math]\displaystyle{ i\in\{1,2,\ldots, m\} }[/math] with the largest [math]\displaystyle{ |S_i\cap U| }[/math];
- let [math]\displaystyle{ C=C\cup\{i\} }[/math] and [math]\displaystyle{ U=U\setminus S_i }[/math];
- return [math]\displaystyle{ C }[/math];
Obviously the algorithm runs in polynomial time and always returns a set cover.
We will show that the GreedyCover algorithm has approximation ratio [math]\displaystyle{ O(\ln n) }[/math].
Theorem - For any set cover instance [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math] with optimal set cover of size [math]\displaystyle{ OPT }[/math], the GreedyCover returns a set cover of size
- [math]\displaystyle{ C\le H_n\cdot {OPT} }[/math],
- where [math]\displaystyle{ n=|U| }[/math] is the size of the universe and [math]\displaystyle{ H_n\approx\ln n }[/math] represents the [math]\displaystyle{ n }[/math]-th Harmonic number.
- For any set cover instance [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math] with optimal set cover of size [math]\displaystyle{ OPT }[/math], the GreedyCover returns a set cover of size
We define the following notations:
- We enumerate all elements of the universe [math]\displaystyle{ U }[/math] as [math]\displaystyle{ x_1,x_2,\ldots,x_n }[/math], in the order in which they are covered in the algorithm.
- For [math]\displaystyle{ t=1,2,\ldots }[/math], let [math]\displaystyle{ U_t }[/math] denote the set of uncovered elements in the beginning of the [math]\displaystyle{ t }[/math]-th iteration of the algorithm.
- For the [math]\displaystyle{ k }[/math]-th element [math]\displaystyle{ x_k }[/math] covered, supposed that it is covered by [math]\displaystyle{ S_i }[/math] in the [math]\displaystyle{ t }[/math]-th iteration, define
- [math]\displaystyle{ price(x_k)=\frac{1}{|S_i\cap U_t|} }[/math]
- to be the average "price" to cover element [math]\displaystyle{ x_k }[/math] in the algorithm.
Observe that if [math]\displaystyle{ x_k }[/math] is covered by [math]\displaystyle{ S_i }[/math] in the [math]\displaystyle{ t }[/math]-th iteration, then there are precisely [math]\displaystyle{ |S_i\cap U_t| }[/math] elements, including [math]\displaystyle{ x_k }[/math], become covered in that iteration, and all these elements have price [math]\displaystyle{ 1/|S_i\cap U_t| }[/math]. Then it is easy to have the following lemma:
Lemma 1 - For the set cover [math]\displaystyle{ C }[/math] returned by the algorithm, [math]\displaystyle{ |C|=\sum_{k=1}^nprice(x_k) }[/math].
This lemma connect the size of the returned set cover to the prices of elements. The next lemme connects the price of each element to the optimal solution.
Lemma 2 - For each [math]\displaystyle{ x_k }[/math], [math]\displaystyle{ price(x_k)\le \frac{OPT}{n-k+1} }[/math], where [math]\displaystyle{ OPT }[/math] is the size of the optimal set cover.
Proof. For an instance [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math] with a universe of size [math]\displaystyle{ n=|U| }[/math], let [math]\displaystyle{ C^*\subseteq\{1,2,\ldots,m\} }[/math] denote an optimal set cover. Then
- [math]\displaystyle{ U=\bigcup_{i\in C^*}S_i }[/math].
By averaging principle, there must be an [math]\displaystyle{ S_i }[/math] of size
- [math]\displaystyle{ |S_i|\ge\frac{n}{|C^*|}=\frac{n}{OPT} }[/math].
By the greediness of the algorithm, in the first iteration the algorithm must choose a set [math]\displaystyle{ S_i }[/math] of at least this size to add to the set cover [math]\displaystyle{ C }[/math], which means the price the element covered at first, [math]\displaystyle{ x_1 }[/math], along with all elements covered in the first iteration, are priced as
- [math]\displaystyle{ price(x_1)=\frac{1}{\max_{i}|S_i|}\le \frac{OPT}{n} }[/math].
For the [math]\displaystyle{ k }[/math]-th element covered by the algorithm, supposed that it is covered by in the [math]\displaystyle{ t }[/math]-th iteration, and the current universe for the uncovered elements is [math]\displaystyle{ U_t }[/math], it holds that
- [math]\displaystyle{ |U_t|\le n-k+1 }[/math],
since there are at most [math]\displaystyle{ k-1 }[/math] elements covered before [math]\displaystyle{ x_k }[/math].
The uncovered elements constitutes a set cover instance [math]\displaystyle{ S_1\cap U_t, S_2\cap U_t, \ldots, S_m\cap U_t }[/math] (some of which may be empty), with universe [math]\displaystyle{ U_t }[/math]. Note that this smaller instance has an optimal set cover of size at most OPT (since the optimal solution for the original instance must also be an optimal solution for this smaller instance), and [math]\displaystyle{ x_k }[/math] is among the elements covered in the first iteration of the algorithm running on this smaller instance. By the above argument, it holds for the [math]\displaystyle{ price(x_k)=\frac{1}{|S_i\cap U_t|} }[/math] (also note that this value is not changed no matter as in the [math]\displaystyle{ t }[/math]-th integration of the algorithm running on the original instance or as in the first iteration of the algorithm on the smaller instance) that [math]\displaystyle{ price(x_k)=\frac{1}{|S_i\cap U_t|}\le\frac{OPT}{|U_t|}=\frac{OPT}{n-k+1}. }[/math] The lemma is proved.
- [math]\displaystyle{ \square }[/math]
Combining Lemma 1 and Lemma 2, we have
- [math]\displaystyle{ |C|=\sum_{k=1}^nprice(x_k) \le \sum_{k=1}^n \frac{OPT}{n-k+1}=\sum_{k=1}^n\frac{OPT}{k}=H_n\cdot OPT. }[/math]
Primal-Dual Greedy Algorithm for Set Cover
DualCover - Input: sets [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math];
- construct a maximal [math]\displaystyle{ M\subseteq U }[/math] such that [math]\displaystyle{ |S_i\cap M|\le 1 }[/math] for all [math]\displaystyle{ i=1,2,\ldots, m }[/math];
- return [math]\displaystyle{ C=\{i\mid S_i\cap M\neq\} }[/math]
Theorem - For any set cover instance [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math] with optimal set cover of size [math]\displaystyle{ OPT }[/math], the DualCover returns a set cover of size
- [math]\displaystyle{ C\le f\cdot {OPT} }[/math],
- where [math]\displaystyle{ f=\max_{x\in U}frequency(x)=\max_{x\in U}|\{i\mid x\in S_i\}| }[/math] is the maximum frequency.
- For any set cover instance [math]\displaystyle{ S_1,S_2,\ldots,S_m\subseteq U }[/math] with optimal set cover of size [math]\displaystyle{ OPT }[/math], the DualCover returns a set cover of size