Advanced Algorithms (Fall 2016) / Greedy and Local Search
Under construction.
Set cover
Given <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math> of a universe <math>U</math> of <math>n=|U|</math> elements, a <math>C\subseteq\{1,2,\ldots,m\}</math> is a set cover if <math>U=\bigcup_{i\in C}S_i</math>, i.e. the union of the chosen sets covers all elements of the universe.

Without loss of generality, we always assume that the universe is <math>U=\bigcup_{i=1}^m S_i</math>.

This defines an important optimization problem:

Set Cover Problem
- Input: <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math> of a universe <math>U</math> of size <math>n=|U|</math>;
- Output: the smallest <math>C\subseteq\{1,2,\ldots,m\}</math> such that <math>U=\bigcup_{i\in C}S_i</math>.
We can think of each instance as a bipartite graph: the <math>n</math> elements of the universe <math>U</math> are vertices on one side, the <math>m</math> sets <math>S_1,S_2,\ldots,S_m</math> are vertices on the other side, and an element <math>x</math> is adjacent to a set <math>S_i</math> if and only if <math>x\in S_i</math>. The set cover problem then asks for a smallest subset of vertices on the set side that is adjacent to ("covers") every vertex on the element side.
By alternating the roles of sets and elements in the above interpretation of set cover instances as bipartite graphs, the set cover problem can be translated to the following equivalent hitting set problem:
Hitting Set Problem
- Input: <math>m</math> subsets <math>S_1,S_2,\ldots,S_m\subseteq U</math> of a universe <math>U</math> of size <math>n=|U|</math>;
- Output: the smallest subset <math>C\subseteq U</math> of elements such that <math>C</math> intersects with every set <math>S_i</math> for <math>1\le i\le m</math>.
Frequency and Vertex Cover
Given an instance of set cover problem <math>S_1,S_2,\ldots,S_m\subseteq U</math>, for every element <math>x\in U</math>, its frequency, denoted <math>\mathrm{frequency}(x)</math>, is defined as the number of sets containing <math>x</math>, formally <math>\mathrm{frequency}(x)=|\{i\mid x\in S_i\}|</math>.

In the hitting set version, the frequency should be defined for each set: for a set <math>S_i</math>, its frequency <math>\mathrm{frequency}(S_i)=|S_i|</math> is simply the size of the set <math>S_i</math>.
The set cover problem restricted to the instances with frequency 2 becomes the vertex cover problem.
Given an undirected graph <math>G(V,E)</math>, a vertex set <math>C\subseteq V</math> is a vertex cover if every edge <math>e\in E</math> is incident to at least one vertex in <math>C</math>.
Vertex Cover Problem
- Input: an undirected graph <math>G(V,E)</math>;
- Output: the smallest <math>C\subseteq V</math> such that every edge <math>e\in E</math> is incident to at least one vertex in <math>C</math>.
It is easy to compare it with the hitting set problem:
- For a graph <math>G(V,E)</math>, its edges <math>e\in E</math> are vertex-sets of size 2.
- A subset <math>C\subseteq V</math> of vertices is a vertex cover if and only if it is a hitting set for the instance <math>E</math>, i.e. every <math>e\in E</math> intersects with <math>C</math>.
Therefore vertex cover is just set cover with frequency 2.
The vertex cover problem is NP-hard. Its decision version is among Karp's 21 NP-complete problems. Since vertex cover is a special case of set cover, the set cover problem is also NP-hard.
Greedy Algorithm
We present our algorithms in the original set cover setting (instead of the hitting set version).
A natural algorithm is the greedy algorithm: sequentially add to the cover <math>C</math> such an <math>i</math> that <math>S_i</math> covers the largest number of currently uncovered elements, until all elements are covered.
GreedyCover
- Input: sets <math>S_1,S_2,\ldots,S_m\subseteq U</math>;
- initially, <math>U'=U</math> and <math>C=\emptyset</math>;
- while <math>U'\neq\emptyset</math> do
  - find <math>i</math> with the largest <math>|S_i\cap U'|</math>;
  - let <math>C=C\cup\{i\}</math> and <math>U'=U'\setminus S_i</math>;
- return <math>C</math>;
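For concreteness, here is a short Python sketch of the GreedyCover procedure above; the function name <code>greedy_cover</code> and the representation of an instance as a dict of Python sets are illustrative choices, not part of the notes.

<syntaxhighlight lang="python">
def greedy_cover(universe, sets):
    """Greedy set cover: repeatedly pick the set covering the most
    uncovered elements, until the whole universe is covered.

    universe: a set of elements; sets: dict mapping set index -> set of elements.
    Returns the list of chosen set indices (a set cover).
    """
    uncovered = set(universe)
    cover = []
    while uncovered:
        # find the index i maximizing |S_i ∩ U'|
        best = max(sets, key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        cover.append(best)
        uncovered -= sets[best]
    return cover

# tiny example instance
U = {1, 2, 3, 4, 5}
S = {1: {1, 2, 3}, 2: {2, 4}, 3: {3, 4}, 4: {4, 5}}
print(greedy_cover(U, S))  # -> [1, 4]
</syntaxhighlight>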
Obviously the algorithm runs in polynomial time and always returns a set cover. We will then show how good the set cover returned by the algorithm is, compared to the optimal solution, by analyzing its approximation ratio.
We define the following notations:
- We enumerate all elements of the universe <math>U</math> as <math>x_1,x_2,\ldots,x_n</math>, in the order in which they are covered in the algorithm.
- For <math>t=1,2,\ldots</math>, let <math>U_t</math> denote the set of uncovered elements in the beginning of the <math>t</math>-th iteration of the algorithm.
- For the <math>k</math>-th element <math>x_k</math> covered, supposing that it is covered by <math>S_i</math> in the <math>t</math>-th iteration, define
  <math>\mathrm{price}(x_k)=\frac{1}{|S_i\cap U_t|}</math>
  to be the average "price" to cover element <math>x_k</math> in the algorithm.

Observe that if <math>x_k</math> is covered by <math>S_i</math> in the <math>t</math>-th iteration, then exactly <math>|S_i\cap U_t|</math> elements are covered in that iteration, each at price <math>\frac{1}{|S_i\cap U_t|}</math>, so the prices of the elements covered in one iteration sum to exactly 1, the cost of adding one set to the cover <math>C</math>. This gives the following lemma.
Lemma 1
- For the set cover <math>C</math> returned by the algorithm, <math>|C|=\sum_{k=1}^n\mathrm{price}(x_k)</math>.
This lemma connects the size of the returned set cover to the prices of elements. The next lemma connects the price of each element to the optimal solution.
Lemma 2
- For each <math>x_k</math>, <math>\mathrm{price}(x_k)\le\frac{OPT}{n-k+1}</math>, where <math>OPT</math> is the size of the optimal set cover.
Proof. For an instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with a universe <math>U</math> of size <math>n=|U|</math>, let <math>C^*</math> denote an optimal set cover, so <math>|C^*|=OPT</math> and <math>U=\bigcup_{i\in C^*}S_i</math>.

By averaging principle, there must be an <math>S_i</math> with <math>i\in C^*</math> of size <math>|S_i|\ge\frac{n}{OPT}</math>.

By the greediness of the algorithm, in the first iteration the algorithm must choose a set of at least this size to add to the set cover <math>C</math>, which means the element covered first, <math>x_1</math>, along with all elements covered in the first iteration, are priced at most <math>\frac{OPT}{n}</math>.

For the <math>k</math>-th element <math>x_k</math> covered by the algorithm, suppose that it is covered by <math>S_i</math> in the <math>t</math>-th iteration, and the current universe of uncovered elements is <math>U_t</math>. It holds that <math>|U_t|\ge n-k+1</math>, since at most <math>k-1</math> elements are covered before <math>x_k</math>.

The uncovered elements constitute a set cover instance <math>S_1\cap U_t,S_2\cap U_t,\ldots,S_m\cap U_t</math> (some of which may be empty), with universe <math>U_t</math>. Note that this smaller instance has an optimal set cover of size at most <math>OPT</math> (since the optimal set cover for the original instance, restricted to <math>U_t</math>, is also a set cover for this smaller instance), and <math>x_k</math> is among the elements covered in the first iteration of the algorithm running on this smaller instance. By the above argument, it holds for <math>\mathrm{price}(x_k)</math> (also note that this value is the same whether computed in the <math>t</math>-th iteration of the algorithm running on the original instance or in the first iteration of the algorithm on the smaller instance) that
<math>\mathrm{price}(x_k)\le\frac{OPT}{|U_t|}\le\frac{OPT}{n-k+1}.</math>
The lemma is proved.
Combining Lemma 1 and Lemma 2, we have
<math>|C|=\sum_{k=1}^n\mathrm{price}(x_k)\le\sum_{k=1}^n\frac{OPT}{n-k+1}=\sum_{k=1}^n\frac{OPT}{k}=H_n\cdot OPT,</math>
where <math>H_n=\sum_{k=1}^n\frac{1}{k}\approx\ln n+O(1)</math> is the <math>n</math>-th Harmonic number.

This shows that the GreedyCover algorithm has approximation ratio <math>H_n\approx\ln n</math>.
Theorem
- For any set cover instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with optimal set cover of size <math>OPT</math>, the GreedyCover returns a set cover of size
  <math>|C|\le H_n\cdot OPT,</math>
- where <math>n=|U|</math> is the size of the universe and <math>H_n\approx\ln n</math> represents the <math>n</math>-th Harmonic number.
Ignoring lower order terms, <math>\ln n</math> is essentially the best approximation ratio achievable by polynomial time algorithms, under standard complexity assumptions:
- Lund and Yannakakis (1994) showed that there is a constant <math>c>0</math> such that no poly-time algorithm has approximation ratio <math>c\log n</math>, unless all NP problems have quasi-polynomial time algorithms (which run in time <math>n^{\mathrm{poly}\log n}</math>).
- Feige (1998) showed that there is no poly-time algorithm with approximation ratio better than <math>(1-o(1))\ln n</math> with the same complexity assumption.
- Raz and Safra (1997) showed that there is no poly-time algorithm with approximation ratio better than <math>c\ln n</math> for a constant <math>c>0</math> assuming that NP<math>\neq</math>P.
- Dinur and Steurer (2014) showed that there is no poly-time algorithm with approximation ratio better than <math>(1-o(1))\ln n</math> assuming that NP<math>\neq</math>P.
Primal-Dual Algorithm
Given an instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> for the set cover problem, consider the following pair of optimization problems defined on the same instance:
- Primal:
  - minimize <math>|C|</math>
  - subject to <math>C\subseteq\{1,2,\ldots,m\}</math>, <math>U=\bigcup_{i\in C}S_i</math>.
- Dual:
  - maximize <math>|M|</math>
  - subject to <math>M\subseteq U</math>, <math>|S_i\cap M|\le 1</math> for all <math>1\le i\le m</math>.

The dual problem is a "maximum matching" problem, where the matching is defined for the set system instead of a graph: given an instance <math>S_1,S_2,\ldots,S_m\subseteq U</math>, a subset <math>M\subseteq U</math> of elements is called a matching for <math>S_1,S_2,\ldots,S_m</math> if <math>|S_i\cap M|\le 1</math> for all <math>1\le i\le m</math>.
It is easier to see these two optimization problems are dual to each other if we write them as mathematical programs.
For the primal problem (set cover), for each set <math>S_i</math>, introduce a variable <math>x_i\in\{0,1\}</math> indicating whether <math>S_i</math> is selected into the cover. The set cover problem can be written as the following integer program:
- Primal:
  - minimize <math>\sum_{i=1}^m x_i</math>
  - subject to <math>\sum_{i:v\in S_i}x_i\ge 1</math> for all <math>v\in U</math>, <math>x_i\in\{0,1\}</math> for all <math>1\le i\le m</math>.

For the dual problem (maximum matching), for each element <math>v\in U</math>, introduce a variable <math>y_v\in\{0,1\}</math> indicating whether <math>v</math> is selected into the matching. The maximum matching problem can be written as the following integer program:
- Dual:
  - maximize <math>\sum_{v\in U}y_v</math>
  - subject to <math>\sum_{v\in S_i}y_v\le 1</math> for all <math>1\le i\le m</math>, <math>y_v\in\{0,1\}</math> for all <math>v\in U</math>.
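To see the two programs side by side on a concrete input, the following sketch solves their LP relaxations (dropping the integrality constraints) on a tiny made-up instance with <code>scipy.optimize.linprog</code>. The instance, variable names, and the use of LP relaxations are illustrative assumptions; the point is only that the dual optimum never exceeds the primal optimum.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import linprog

# toy instance: universe U = {0,1,2,3}, sets S_1..S_3
sets = [{0, 1}, {1, 2}, {2, 3}]
n, m = 4, len(sets)

# incidence matrix A[v][i] = 1 iff element v is in set S_i
A = np.array([[1 if v in S else 0 for S in sets] for v in range(n)])

# Primal LP relaxation: min sum_i x_i  s.t.  A x >= 1,  0 <= x <= 1
primal = linprog(c=np.ones(m), A_ub=-A, b_ub=-np.ones(n), bounds=(0, 1))

# Dual LP relaxation: max sum_v y_v  s.t.  A^T y <= 1,  y >= 0
dual = linprog(c=-np.ones(n), A_ub=A.T, b_ub=np.ones(m), bounds=(0, None))

print("primal optimum:", primal.fun)   # fractional set cover value
print("dual optimum:  ", -dual.fun)    # fractional matching value
# weak duality: -dual.fun <= primal.fun (for LP relaxations they coincide)
</syntaxhighlight>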
It is a fundamental fact that for a minimization problem, every feasible solution to the dual problem (which is a maximization problem) gives a lower bound for the optimal value of the primal problem. This is called the weak duality phenomenon. The easy direction of the famous "max-flow min-cut" theorem (that the value of every flow is a lower bound for the capacity of every cut) is an instance of weak duality.
Theorem (Weak Duality)
- For every matching <math>M</math> and every set cover <math>C</math> for <math>S_1,S_2,\ldots,S_m\subseteq U</math>, it holds that <math>|M|\le|C|</math>.
Proof. If <math>M\subseteq U</math> is a matching for <math>S_1,S_2,\ldots,S_m</math>, then every <math>S_i</math> intersects with <math>M</math> on at most one element, which means no two elements in <math>M</math> can be covered by the same <math>S_i</math>, and hence each element in <math>M</math> consumes at least one distinct <math>S_i</math> to cover. Therefore, any set cover must use at least <math>|M|</math> sets in order to cover all elements in <math>M</math>.

More formally, for any matching <math>M</math> and set cover <math>C</math>, it holds that
<math>|M|=\left|\bigcup_{i\in C}(S_i\cap M)\right|\le\sum_{i\in C}|S_i\cap M|\le|C|,</math>
where the first equality is because <math>C</math> is a set cover (every element of <math>M</math> belongs to some <math>S_i</math> with <math>i\in C</math>) and the last inequality is because <math>M</math> is a matching (<math>|S_i\cap M|\le 1</math> for every <math>i</math>).
As a corollary, every matching gives a lower bound for the size of the optimal set cover:

Corollary
- Let <math>S_1,S_2,\ldots,S_m\subseteq U</math> be an instance for set cover, and <math>OPT</math> the size of the optimal set cover.
- For every matching <math>M\subseteq U</math> for <math>S_1,S_2,\ldots,S_m</math>, it holds that <math>|M|\le OPT</math>.
Now we are ready to present our algorithm. It is a greedy algorithm in the dual world, and the maximal (locally optimal) solution to the dual problem helps us find a good enough solution to the primal problem.
DualCover
- Input: sets <math>S_1,S_2,\ldots,S_m\subseteq U</math>;
- construct a maximal matching <math>M\subseteq U</math> such that <math>|S_i\cap M|\le 1</math> for all <math>1\le i\le m</math>
  - by sequentially adding elements to <math>M</math> until nothing can be added;
- return <math>C=\{i\mid S_i\cap M\neq\emptyset\}</math>;
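A possible Python transcription of DualCover, under the same illustrative instance representation as before (all identifiers are assumptions, not from the notes):

<syntaxhighlight lang="python">
def dual_cover(universe, sets):
    """DualCover: greedily build a maximal matching M (a set of elements
    hitting each S_i at most once), then return all sets that intersect M.

    sets: dict mapping set index -> set of elements.
    Returns (cover, matching).
    """
    matching = set()
    blocked = set()            # indices i with S_i ∩ M already nonempty
    for x in universe:
        hit = {i for i, S in sets.items() if x in S}
        if hit & blocked:
            continue           # adding x would give some S_i two matched elements
        matching.add(x)
        blocked |= hit
    cover = [i for i, S in sets.items() if S & matching]
    return cover, matching

U = {1, 2, 3, 4, 5}
S = {1: {1, 2, 3}, 2: {2, 4}, 3: {3, 4}, 4: {4, 5}}
cover, M = dual_cover(U, S)
print("matching:", M, "cover:", cover)  # a maximal matching and the cover it induces
</syntaxhighlight>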
The algorithm constructs the maximal matching <math>M</math> by sequentially adding elements into <math>M</math> as long as the matching constraint is not violated, until no more elements can be added, and returns the collection <math>C</math> of all sets that intersect <math>M</math>.

It is not so obvious to see that the returned <math>C</math> is always a set cover.
- By contradiction, assume that <math>C</math> is not a set cover, which means there is an element <math>x\in U</math> such that for all <math>i</math> with <math>x\in S_i</math>, <math>S_i\cap M=\emptyset</math>. This implies that <math>x\notin M</math> and <math>M\cup\{x\}</math> is still a matching, contradicting the maximality of <math>M</math>.

Therefore the returned <math>C</math> is always a set cover.

For the maximal matching <math>M</math> constructed by the DualCover algorithm, the returned set cover satisfies
<math>|C|=|\{i\mid S_i\cap M\neq\emptyset\}|\le f\cdot|M|\le f\cdot OPT,</math>
where <math>f=\max_{x\in U}\mathrm{frequency}(x)</math> is the maximum frequency: each element of <math>M</math> belongs to at most <math>f</math> sets, and <math>|M|\le OPT</math> by the corollary above.
We proved the following theorem.

Theorem
- For any set cover instance <math>S_1,S_2,\ldots,S_m\subseteq U</math> with optimal set cover of size <math>OPT</math>, the DualCover returns a set cover of size
  <math>|C|\le f\cdot OPT,</math>
- where <math>f=\max_{x\in U}\mathrm{frequency}(x)</math> is the maximum frequency.
When the maximum frequency is <math>f=2</math>, the set cover problem becomes the vertex cover problem, and the DualCover algorithm can be implemented on graphs as follows:
DualCover for vertex cover problem
- Input: an undirected graph <math>G(V,E)</math>;
- initially <math>C=\emptyset</math>;
- while <math>E\neq\emptyset</math>
  - pick an arbitrary edge <math>uv\in E</math> and add both <math>u</math> and <math>v</math> to <math>C</math>;
  - remove all edges in <math>E</math> incident to <math>u</math> or <math>v</math>;
- return <math>C</math>;
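On graphs this gives the familiar 2-approximation for vertex cover; a minimal Python sketch, assuming edges are given as a list of vertex pairs (the function name is illustrative):

<syntaxhighlight lang="python">
def vertex_cover_2approx(edges):
    """2-approximation for vertex cover: repeatedly pick an uncovered
    edge and add both of its endpoints to the cover.

    edges: iterable of pairs (u, v). Returns a set of vertices.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.add(u)
            cover.add(v)
    return cover

print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4)]))  # -> {1, 2, 3, 4}
</syntaxhighlight>

The picked edges form a maximal matching of the graph, so the returned cover has size at most twice the optimum, exactly as in the DualCover analysis.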
Since this algorithm is just an implementation of the DualCover algorithm on the vertex cover instances (set cover instances with frequency <math>f=2</math>), it is a 2-approximation algorithm for the vertex cover problem.

Ignoring lower order terms, the ratio <math>2</math> is also believed to be the best achievable by polynomial time algorithms:
- Dinur and Safra (2005) showed that there is no poly-time algorithm with approximation ratio better than <math>1.3606</math>, assuming that NP<math>\neq</math>P.
- Khot and Regev (2008) showed that there is no poly-time algorithm with approximation ratio <math>2-\epsilon</math> for any constant <math>\epsilon>0</math> assuming the unique games conjecture.
Scheduling
We consider the following scheduling problem:
- There are <math>n</math> jobs to be processed.
- There are <math>m</math> identical parallel machines. Each machine can start processing jobs at time 0 and can process at most one job at a time.
- Each job <math>j=1,2,\ldots,n</math> must be processed on one of these machines for <math>p_j</math> time units without interruption. <math>p_j</math> is called the processing time of job <math>j</math>.
In a schedule, each job is assigned to a machine and starts being processed at some time, respecting the above rules. The goal is to complete all jobs as soon as possible.
Suppose each job <math>j</math> is completed at time <math>C_j</math> in a schedule. The objective is to minimize the makespan <math>C_{\max}=\max_{1\le j\le n}C_j</math>, the time at which the last job is completed.

This problem is called minimum makespan on identical parallel machines. It can also be stated in the following simpler form as a load balancing problem:
Minimum Makespan on Identical Parallel Machines (load balancing version)
- Input: <math>n</math> positive integers <math>p_1,p_2,\ldots,p_n</math> and a positive integer <math>m</math>;
- Output: an assignment <math>\sigma:[n]\to[m]</math> which minimizes <math>\max_{i\in[m]}\sum_{j:\sigma(j)=i}p_j</math>.

With the notation <math>[n]=\{1,2,\ldots,n\}</math> and <math>[m]=\{1,2,\ldots,m\}</math>, an assignment <math>\sigma:[n]\to[m]</math> specifies on which machine each job is processed. The quantity <math>\sum_{j:\sigma(j)=i}p_j</math> is the load of machine <math>i</math>, and the makespan of the assignment is the maximum load over all machines.
The problem is NP-hard. In particular, when <math>m=2</math>, deciding whether the optimal makespan equals <math>\frac{1}{2}\sum_{j=1}^n p_j</math> is precisely the following partition problem, which is among Karp's 21 NP-complete problems.

Partition Problem
- Input: a set of <math>n</math> positive integers <math>S=\{x_1,x_2,\ldots,x_n\}</math>;
- Determine whether there is a partition of <math>S</math> into <math>A</math> and <math>B</math> such that <math>\sum_{x\in A}x=\sum_{x\in B}x</math>.
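To make the correspondence concrete, here is a tiny brute-force sketch (exponential time, purely illustrative; the function name and instance are made up): a partition instance has a perfect partition exactly when the 2-machine scheduling instance with the same numbers admits a schedule of makespan half the total processing time.

<syntaxhighlight lang="python">
from itertools import combinations

def has_perfect_partition(xs):
    """Brute-force check (for tiny instances only): can xs be split into two
    parts of equal sum?  Equivalently: does the 2-machine scheduling instance
    with processing times xs admit a schedule with makespan sum(xs)/2?"""
    total = sum(xs)
    if total % 2:
        return False
    target = total // 2
    items = list(xs)
    for r in range(len(items) + 1):
        for part in combinations(items, r):
            if sum(part) == target:
                return True
    return False

print(has_perfect_partition([3, 1, 1, 2, 2, 1]))  # True: {3, 2} vs {1, 1, 2, 1}
</syntaxhighlight>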
Graham's List algorithm
In a technical report at Bell Labs in 1966, Graham described a natural greedy procedure for scheduling jobs on parallel identical machines and gave an elegant analysis of its performance. It was probably the first approximation algorithm in the modern sense with a provable approximation ratio. Interestingly, it appeared even before the notion of NP-hardness was discovered.
Graham's List algorithm takes a list of jobs as input. The load of a machine is defined as the total processing time of the jobs currently assigned to the machine.
The List algorithm (Graham 1966)
- Input: a list of jobs <math>j=1,2,\ldots,n</math> with processing times <math>p_1,p_2,\ldots,p_n</math>;
- for <math>j=1,2,\ldots,n</math>
  - assign job <math>j</math> to the machine that currently has the smallest load;
In a scheduling language, the List algorithm can be more simply described as:
- Whenever a machine becomes idle, it starts processing the next job on the list.
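The List algorithm has a natural one-pass implementation using a priority queue keyed by the current machine loads. The following Python sketch (function name and sample instance are illustrative) returns the makespan together with the job-to-machine assignment:

<syntaxhighlight lang="python">
import heapq

def list_schedule(processing_times, m):
    """Graham's List algorithm: assign each job, in the given order,
    to the machine that currently has the smallest load.

    Returns (makespan, assignment) where assignment[j] is job j's machine.
    """
    # heap of (current load, machine index)
    loads = [(0, i) for i in range(m)]
    heapq.heapify(loads)
    assignment = []
    for p in processing_times:
        load, i = heapq.heappop(loads)       # machine with the smallest load
        assignment.append(i)
        heapq.heappush(loads, (load + p, i))
    makespan = max(load for load, _ in loads)
    return makespan, assignment

print(list_schedule([2, 3, 4, 6, 2, 2], m=3))   # -> (8, [0, 1, 2, 0, 1, 2])
</syntaxhighlight>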
It is well known that the List algorithm has approximation ratio <math>\left(2-\frac{1}{m}\right)</math>.

Theorem
- For every instance of scheduling <math>n</math> jobs with processing times <math>p_1,p_2,\ldots,p_n</math> on <math>m</math> parallel identical machines, the List algorithm finds a schedule with makespan <math>C_{\max}\le\left(2-\frac{1}{m}\right)\cdot OPT</math>, where <math>OPT</math> is the makespan of optimal schedules.
Proof. Obviously for any schedule the makespan is at least the maximum processing time:
<math>OPT\ge\max_{1\le j\le n}p_j,</math>
and by averaging principle, the makespan (maximum load) is at least the average load:
<math>OPT\ge\frac{1}{m}\sum_{j=1}^n p_j.</math>

Suppose that in the schedule given by the List algorithm, job <math>\ell</math> finished at last, so the makespan is <math>C_{\max}=C_\ell</math>, where <math>C_\ell</math> is the completion time of job <math>\ell</math>.

By the greediness of the List algorithm, at the moment job <math>\ell</math> is scheduled, the machine receiving job <math>\ell</math> has the smallest load among all machines, and this load is the starting time <math>C_\ell-p_\ell</math> of job <math>\ell</math>. By averaging principle:
<math>C_\ell-p_\ell\le\frac{1}{m}\sum_{j\neq\ell}p_j=\frac{1}{m}\sum_{j=1}^n p_j-\frac{p_\ell}{m}.</math>

On the other hand,
<math>p_\ell\le\max_{1\le j\le n}p_j\le OPT.</math>

Together, we have
<math>C_{\max}=C_\ell\le\frac{1}{m}\sum_{j=1}^n p_j+\left(1-\frac{1}{m}\right)p_\ell\le OPT+\left(1-\frac{1}{m}\right)OPT=\left(2-\frac{1}{m}\right)OPT.</math>

The analysis is tight: you can try to construct a family of instances on which the List algorithm returns schedules with makespan at least <math>\left(2-\frac{1}{m}\right)\cdot OPT</math>.
Local Search
Local Search for Scheduling
- start with an arbitrary schedule of <math>n</math> jobs to <math>m</math> machines;
- while (true)
  - let <math>\ell</math> denote the job finished at last in the current schedule;
  - if there is a machine <math>i</math> such that job <math>\ell</math> can finish earlier by moving to machine <math>i</math>
    - job <math>\ell</math> moves to machine <math>i</math>;
  - else break;
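A minimal Python sketch of this local search, assuming the load-balancing view in which a job's completion time is the load of its machine (all names, the initial schedule, and the tie-breaking choices are illustrative):

<syntaxhighlight lang="python">
def local_search_schedule(processing_times, m):
    """Local search for scheduling: start from an arbitrary assignment and,
    as long as the job finished last can complete earlier on another machine,
    move it there.

    Returns (makespan, assignment) where assignment[j] is job j's machine.
    """
    n = len(processing_times)
    assignment = [j % m for j in range(n)]          # arbitrary initial schedule
    load = [0] * m
    for j, p in enumerate(processing_times):
        load[assignment[j]] += p

    while True:
        # the job finished last is the last job on a busiest machine
        busiest = max(range(m), key=lambda i: load[i])
        last = max(j for j in range(n) if assignment[j] == busiest)
        p = processing_times[last]
        # the best candidate machine is the one with the smallest load
        target = min(range(m), key=lambda i: load[i])
        if load[target] + p < load[busiest]:        # job `last` finishes earlier there
            load[busiest] -= p
            load[target] += p
            assignment[last] = target
        else:
            break
    return max(load), assignment

print(local_search_schedule([2, 3, 4, 6, 2, 2], m=3))
</syntaxhighlight>

Each move strictly decreases the sum of squared machine loads, so the loop terminates after finitely many moves at a locally optimal schedule.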