# Set cover

Given ${\displaystyle m}$ subsets ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ of a universe ${\displaystyle U}$ of size ${\displaystyle n=|U|}$, an index set ${\displaystyle C\subseteq \{1,2,\ldots ,m\}}$ forms a set cover if ${\displaystyle U=\bigcup _{i\in C}S_{i}}$, that is, ${\displaystyle C}$ selects a sub-collection of sets whose union "covers" all elements in the universe.

Without loss of generality, we always assume that the universe is ${\displaystyle U=\bigcup _{i=1}^{m}S_{i}}$.

This defines an important optimization problem:

 Set Cover Problem Input: ${\displaystyle m}$ subsets ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ of a universe ${\displaystyle U}$ of size ${\displaystyle n}$; Output: the smallest ${\displaystyle C\subseteq \{1,2,\ldots ,m\}}$ such that ${\displaystyle U=\bigcup _{i\in C}S_{i}}$.

We can think of each instance as a bipartite graph ${\displaystyle G(U,\{S_{1},S_{2},\ldots ,S_{m}\},E)}$ with ${\displaystyle n}$ vertices on the left side, each corresponding to an element ${\displaystyle x\in U}$, and ${\displaystyle m}$ vertices on the right side, each corresponding to one of the ${\displaystyle m}$ subsets ${\displaystyle S_{1},S_{2},\ldots ,S_{m}}$, where there is an edge connecting ${\displaystyle x}$ with ${\displaystyle S_{i}}$ if and only if ${\displaystyle x\in S_{i}}$. Under this translation, the set cover problem is precisely the following problem: given as input a bipartite graph ${\displaystyle G(U,V,E)}$, find the smallest subset ${\displaystyle C\subseteq V}$ of vertices on the right side that "covers" all vertices on the left side, in the sense that every vertex ${\displaystyle x\in U}$ on the left side is adjacent to some vertex in ${\displaystyle C}$.

By exchanging the roles of sets and elements in the above interpretation of set cover instances as bipartite graphs, the set cover problem can be translated to the following equivalent hitting set problem:

 Hitting Set Problem Input: ${\displaystyle n}$ subsets ${\displaystyle S_{1},S_{2},\ldots ,S_{n}\subseteq U}$ of a universe ${\displaystyle U}$ of size ${\displaystyle m}$; Output: the smallest subset ${\displaystyle C\subseteq U}$ of elements such that ${\displaystyle C}$ intersects with every set ${\displaystyle S_{i}}$ for ${\displaystyle 1\leq i\leq n}$.

## Frequency and Vertex Cover

Given an instance of the set cover problem ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$, for every element ${\displaystyle x\in U}$, its frequency, denoted as ${\displaystyle frequency(x)}$, is defined as the number of sets containing ${\displaystyle x}$. Formally,

${\displaystyle frequency(x)=|\{i\mid x\in S_{i}\}|}$.

In the hitting set version, the frequency should be defined for each set: for a set ${\displaystyle S_{i}}$ its frequency ${\displaystyle frequency(S_{i})=|S_{i}|}$ is just the size of the set ${\displaystyle S_{i}}$.
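The definition of frequency translates directly into a counting loop. The following is a minimal sketch, assuming the instance is given as a list of Python sets; the function name `frequencies` and the sample instance are illustrative choices, not part of the original notes.

```python
from collections import Counter


def frequencies(sets):
    """Map each element x to frequency(x) = |{i : x in S_i}|."""
    freq = Counter()
    for s in sets:
        # Each S_i contributes exactly one to every element it contains.
        freq.update(set(s))
    return freq


# Illustrative instance: four sets over the universe {1, ..., 5}.
sets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
freq = frequencies(sets)
max_frequency = max(freq.values())
print(dict(freq), max_frequency)
```

Here element 4 lies in three of the sets, so the maximum frequency of this instance is 3.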

The set cover problem restricted to the instances with frequency 2 becomes the vertex cover problem.

Given an undirected graph ${\displaystyle G(V,E)}$, a vertex cover is a subset ${\displaystyle C\subseteq V}$ of vertices such that every edge ${\displaystyle uv\in E}$ has at least one endpoint in ${\displaystyle C}$.

 Vertex Cover Problem Input: an undirected graph ${\displaystyle G(V,E)}$ Output: the smallest ${\displaystyle C\subseteq V}$ such that every edge ${\displaystyle e\in E}$ is incident to at least one vertex in ${\displaystyle C}$.

It is easy to compare with the hitting set problem:

• For graph ${\displaystyle G(V,E)}$, its edges ${\displaystyle e_{1},e_{2},\ldots ,e_{n}\subseteq V}$ are vertex-sets of size 2.
• A subset ${\displaystyle C\subseteq V}$ of vertices is a vertex cover if and only if it is a hitting set for ${\displaystyle e_{1},e_{2},\ldots ,e_{n}}$, i.e. every ${\displaystyle e_{i}}$ intersects with ${\displaystyle C}$.

Therefore vertex cover is just set cover with frequency 2.

The vertex cover problem is NP-hard. Its decision version is among Karp's 21 NP-complete problems. Since vertex cover is a special case of set cover, the set cover problem is also NP-hard.

## Greedy Algorithm

We present our algorithms in the original set cover setting (instead of the hitting set version).

A natural algorithm is the greedy algorithm: repeatedly add to the cover ${\displaystyle C}$ an index ${\displaystyle i}$ whose set ${\displaystyle S_{i}}$ covers the largest number of currently uncovered elements, until no element is left uncovered.

 GreedyCover Input: sets ${\displaystyle S_{1},S_{2},\ldots ,S_{m}}$; initially, ${\displaystyle U=\bigcup _{i=1}^{m}S_{i}}$, and ${\displaystyle C=\emptyset }$; while ${\displaystyle U\neq \emptyset }$ do find ${\displaystyle i\in \{1,2,\ldots ,m\}}$ with the largest ${\displaystyle |S_{i}\cap U|}$; let ${\displaystyle C=C\cup \{i\}}$ and ${\displaystyle U=U\setminus S_{i}}$; return ${\displaystyle C}$;
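The pseudocode above can be sketched in Python as follows. This is an illustrative implementation, assuming the sets are given as a list of Python sets with 0-based indices; ties are broken in favour of the smallest index, which is one arbitrary choice among many.

```python
def greedy_cover(sets):
    """Greedy set cover: return the list of chosen (0-based) set indices."""
    uncovered = set().union(*sets)  # U is the union of all S_i
    cover = []
    while uncovered:
        # Pick the set covering the most currently uncovered elements.
        i = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        cover.append(i)
        uncovered -= sets[i]
    return cover


# Illustrative instance over the universe {1, ..., 6}.
sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
cover = greedy_cover(sets)
print(cover)
```

On this instance the first iteration picks the set of size 4, after which only the last set covers both remaining elements.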

Obviously the algorithm runs in polynomial time and always returns a set cover. We will then show how good the returned set cover is compared to the optimal solution by analyzing its approximation ratio.

We define the following notations:

• We enumerate all elements of the universe ${\displaystyle U}$ as ${\displaystyle x_{1},x_{2},\ldots ,x_{n}}$, in the order in which they are covered in the algorithm.
• For ${\displaystyle t=1,2,\ldots }$, let ${\displaystyle U_{t}}$ denote the set of uncovered elements in the beginning of the ${\displaystyle t}$-th iteration of the algorithm.
• For the ${\displaystyle k}$-th element ${\displaystyle x_{k}}$ covered, suppose that it is covered by ${\displaystyle S_{i}}$ in the ${\displaystyle t}$-th iteration, and define
${\displaystyle price(x_{k})={\frac {1}{|S_{i}\cap U_{t}|}}}$
to be the average "price" to cover element ${\displaystyle x_{k}}$ in the algorithm.

Observe that if ${\displaystyle x_{k}}$ is covered by ${\displaystyle S_{i}}$ in the ${\displaystyle t}$-th iteration, then precisely ${\displaystyle |S_{i}\cap U_{t}|}$ elements, including ${\displaystyle x_{k}}$, become covered in that iteration, and all these elements have price ${\displaystyle 1/|S_{i}\cap U_{t}|}$. The prices of the elements covered in one iteration therefore sum to 1, which gives the following lemma:

 Lemma 1 For the set cover ${\displaystyle C}$ returned by the algorithm, ${\displaystyle |C|=\sum _{k=1}^{n}price(x_{k})}$.

This lemma connects the size of the returned set cover to the prices of elements. The next lemma connects the price of each element to the optimal solution.

 Lemma 2 For each ${\displaystyle x_{k}}$, ${\displaystyle price(x_{k})\leq {\frac {OPT}{n-k+1}}}$, where ${\displaystyle OPT}$ is the size of the optimal set cover.
Proof.
 For an instance ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ with a universe of size ${\displaystyle n=|U|}$, let ${\displaystyle C^{*}\subseteq \{1,2,\ldots ,m\}}$ denote an optimal set cover. Then ${\displaystyle U=\bigcup _{i\in C^{*}}S_{i}}$. By the averaging principle, there must be an ${\displaystyle S_{i}}$ with ${\displaystyle i\in C^{*}}$ of size ${\displaystyle |S_{i}|\geq {\frac {n}{|C^{*}|}}={\frac {n}{OPT}}}$. By the greediness of the algorithm, in the first iteration the algorithm must choose a set ${\displaystyle S_{i}}$ of at least this size to add to the set cover ${\displaystyle C}$, which means that the first element covered, ${\displaystyle x_{1}}$, along with all other elements covered in the first iteration, has price ${\displaystyle price(x_{1})={\frac {1}{\max _{i}|S_{i}|}}\leq {\frac {OPT}{n}}}$. For the ${\displaystyle k}$-th element ${\displaystyle x_{k}}$ covered by the algorithm, suppose that it is covered by ${\displaystyle S_{i}}$ in the ${\displaystyle t}$-th iteration, when the set of uncovered elements is ${\displaystyle U_{t}}$. It holds that ${\displaystyle |U_{t}|\geq n-k+1}$, since at most ${\displaystyle k-1}$ elements are covered before ${\displaystyle x_{k}}$. The uncovered elements constitute a set cover instance ${\displaystyle S_{1}\cap U_{t},S_{2}\cap U_{t},\ldots ,S_{m}\cap U_{t}}$ (some of which may be empty), with universe ${\displaystyle U_{t}}$. Note that this smaller instance has an optimal set cover of size at most OPT (since an optimal solution for the original instance is also a set cover for this smaller instance), and ${\displaystyle x_{k}}$ is among the elements covered in the first iteration of the algorithm running on this smaller instance. 
By the above argument applied to the smaller instance (note that the value ${\displaystyle price(x_{k})={\frac {1}{|S_{i}\cap U_{t}|}}}$ is the same whether it is viewed as arising in the ${\displaystyle t}$-th iteration of the algorithm on the original instance or in the first iteration of the algorithm on the smaller instance), it holds that ${\displaystyle price(x_{k})={\frac {1}{|S_{i}\cap U_{t}|}}\leq {\frac {OPT}{|U_{t}|}}\leq {\frac {OPT}{n-k+1}}.}$ The lemma is proved.
${\displaystyle \square }$

Combining Lemma 1 and Lemma 2, we have

${\displaystyle |C|=\sum _{k=1}^{n}price(x_{k})\leq \sum _{k=1}^{n}{\frac {OPT}{n-k+1}}=\sum _{k=1}^{n}{\frac {OPT}{k}}=H_{n}\cdot OPT,}$

where ${\displaystyle H_{n}=\sum _{k=1}^{n}{\frac {1}{k}}=\ln n+O(1)}$ is the ${\displaystyle n}$-th Harmonic number.

This shows that the GreedyCover algorithm has approximation ratio ${\displaystyle H_{n}\approx \ln n}$.
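The price accounting can be checked numerically on a small instance. The sketch below is only a sanity check under stated assumptions: the instance is an arbitrary example, `greedy_with_prices` and `brute_force_opt` are illustrative helper names, and the optimum is found by exhaustive search, which is feasible only for tiny inputs.

```python
from fractions import Fraction
from itertools import combinations


def greedy_with_prices(sets):
    """Run greedy set cover; return (cover, prices in covering order)."""
    uncovered = set().union(*sets)
    cover, prices = [], []
    while uncovered:
        i = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        newly = sets[i] & uncovered
        # Every element covered in this iteration gets price 1/|S_i ∩ U_t|.
        prices.extend([Fraction(1, len(newly))] * len(newly))
        cover.append(i)
        uncovered -= newly
    return cover, prices


def brute_force_opt(sets):
    """Size of a smallest set cover, by exhaustive search."""
    universe = set().union(*sets)
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            if set().union(*combo) == universe:
                return r
    return 0


sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}, {1, 3, 5}]
cover, prices = greedy_with_prices(sets)
n, opt = len(prices), brute_force_opt(sets)
harmonic = sum(Fraction(1, k) for k in range(1, n + 1))

# Lemma 1: the prices sum to |C|.
assert sum(prices) == len(cover)
# Lemma 2: price(x_k) <= OPT / (n - k + 1); enumerate is 0-based, so n - k here.
assert all(p <= Fraction(opt, n - k) for k, p in enumerate(prices))
# Combined bound: |C| <= H_n * OPT.
assert len(cover) <= harmonic * opt
```

Exact rational arithmetic via `Fraction` avoids floating-point noise when comparing prices with the bound.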

 Theorem For any set cover instance ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ with optimal set cover of size ${\displaystyle OPT}$, the GreedyCover returns a set cover ${\displaystyle C}$ of size ${\displaystyle |C|\leq H_{n}\cdot {OPT}}$, where ${\displaystyle n=|U|}$ is the size of the universe and ${\displaystyle H_{n}\approx \ln n}$ represents the ${\displaystyle n}$-th Harmonic number.

Ignoring lower order terms, ${\displaystyle \ln n}$ is also the best approximation ratio achievable by polynomial time algorithms, assuming that NP${\displaystyle \neq }$P.

• Lund and Yannakakis (1994) showed that there is no poly-time algorithm with approximation ratio ${\displaystyle <{\frac {1}{2}}\log _{2}n}$, unless all NP problems have quasi-polynomial time algorithms (which run in time ${\displaystyle n^{\mathrm {polylog} (n)}}$).
• Feige (1998) showed that there is no poly-time algorithm with approximation ratio better than ${\displaystyle (1-o(1))\ln n}$ with the same complexity assumption.
• Raz and Safra (1997) showed that there is no poly-time algorithm with approximation ratio better than ${\displaystyle c\ln n}$ for a constant ${\displaystyle c}$ assuming that NP${\displaystyle \neq }$P.
• Dinur and Steurer (2014) showed that there is no poly-time algorithm with approximation ratio better than ${\displaystyle (1-o(1))\ln n}$ assuming that NP${\displaystyle \neq }$P.

## Primal-Dual Algorithm

Given an instance ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ for set cover, the set cover problem asks for minimizing ${\displaystyle |C|}$ subject to the constraints that ${\displaystyle C\subseteq \{1,2,\ldots ,m\}}$ and ${\displaystyle U=\bigcup _{i\in C}S_{i}}$, i.e. ${\displaystyle C}$ is a set cover. We can define a dual problem on the same instance. The original problem, the set cover problem, is called the primal problem. Formally, the primal and dual problems are defined as follows:

Primal:
minimize ${\displaystyle |C|}$
subject to ${\displaystyle C\subseteq \{1,2,\ldots ,m\}}$
${\displaystyle U=\bigcup _{i\in C}S_{i}}$
Dual:
maximize ${\displaystyle |M|}$
subject to ${\displaystyle M\subseteq U}$
${\displaystyle |S_{i}\cap M|\leq 1}$, ${\displaystyle \forall 1\leq i\leq m}$

The dual problem is a "maximum matching" problem, where the matching is defined for the set system instead of graph. Given an instance ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$, an ${\displaystyle M\subseteq U}$ is called a matching for ${\displaystyle S_{1},S_{2},\ldots ,S_{m}}$ if ${\displaystyle |S_{i}\cap M|\leq 1}$ for all ${\displaystyle i=1,2,\ldots ,m}$.

It is easier to see these two optimization problems are dual to each other if we write them as mathematical programs.

For the primal problem (set cover), for each ${\displaystyle 1\leq i\leq m}$, let ${\displaystyle x_{i}\in \{0,1\}}$ indicate whether ${\displaystyle i\in C}$. The set cover problem can be written as the following integer linear programming (ILP).

Primal:
minimize ${\displaystyle \sum _{i=1}^{m}x_{i}}$
subject to ${\displaystyle \sum _{i:v\in S_{i}}x_{i}\geq 1}$, ${\displaystyle \forall v\in U}$
${\displaystyle x_{i}\in \{0,1\}}$, ${\displaystyle \forall 1\leq i\leq m}$

For the dual problem (maximum matching), for each ${\displaystyle v\in U}$, let ${\displaystyle y_{v}\in \{0,1\}}$ indicate whether ${\displaystyle v\in M}$. Then the dual problem can be written as the following ILP.

Dual:
maximize ${\displaystyle \sum _{v\in U}y_{v}}$
subject to ${\displaystyle \sum _{v\in S_{i}}y_{v}\leq 1}$, ${\displaystyle \forall 1\leq i\leq m}$
${\displaystyle y_{v}\in \{0,1\}}$, ${\displaystyle \forall v\in U}$

It is a fundamental fact that for a minimization problem, the value of every feasible solution to the dual problem (which is a maximization problem) is a lower bound for the optimal value of the primal problem. This is called weak duality. The easy direction of the famous "max-flow min-cut" theorem (that the value of every flow is at most the capacity of every cut) is an instance of weak duality.

 Theorem (Weak Duality) For every matching ${\displaystyle M}$ and every set cover ${\displaystyle C}$ for ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$, it holds that ${\displaystyle |M|\leq |C|}$.
Proof.
 If ${\displaystyle M\subseteq U}$ is a matching for ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$, then every ${\displaystyle S_{i}}$ intersects with ${\displaystyle M}$ on at most one element, which means no two elements in ${\displaystyle M}$ can be covered by one ${\displaystyle S_{i}}$, and hence each element in ${\displaystyle M}$ needs its own distinct ${\displaystyle S_{i}}$ to be covered. Therefore, any set cover must use at least ${\displaystyle |M|}$ sets to cover all elements in ${\displaystyle M}$. More formally, for any matching ${\displaystyle M}$ and set cover ${\displaystyle C}$, it holds that ${\displaystyle |M|=\left|\bigcup _{i\in C}(S_{i}\cap M)\right|\leq \sum _{i\in C}|S_{i}\cap M|\leq |C|,}$ where the first equality is because ${\displaystyle C}$ is a set cover and the last inequality is because ${\displaystyle M}$ is a matching.
${\displaystyle \square }$

As a corollary, the size of every matching ${\displaystyle M}$ for ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ is a lower bound for the size ${\displaystyle OPT}$ of the optimal set cover:

 Corollary Let ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ be an instance for set cover, and ${\displaystyle OPT}$ the size of the optimal set cover. For every matching ${\displaystyle M}$ for ${\displaystyle S_{1},S_{2},\ldots ,S_{m}}$, it holds that ${\displaystyle |M|\leq OPT}$.
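Weak duality can be observed by brute force on a tiny instance. The following sketch enumerates all set covers and all matchings of an illustrative instance (five two-element sets arranged in a cycle, chosen here only as an example) and compares their sizes; the helper names are assumptions, and exhaustive enumeration is practical only for very small inputs.

```python
from itertools import combinations


def is_cover(sets, chosen, universe):
    """True if the sets with indices in `chosen` cover the universe."""
    return set().union(*(sets[i] for i in chosen)) == universe


def is_matching(sets, M):
    """True if every S_i meets M in at most one element."""
    return all(len(s & M) <= 1 for s in sets)


sets = [{1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 1}]
universe = set().union(*sets)

covers = [c for r in range(len(sets) + 1)
          for c in combinations(range(len(sets)), r)
          if is_cover(sets, c, universe)]
matchings = [set(M) for r in range(len(universe) + 1)
             for M in combinations(sorted(universe), r)
             if is_matching(sets, set(M))]

opt_cover = min(len(c) for c in covers)
max_matching = max(len(M) for M in matchings)
print(max_matching, opt_cover)
```

On this instance the largest matching has size 2 while the smallest set cover has size 3, so the inequality of the corollary can be strict.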

Now we are ready to present our algorithm. It is a greedy algorithm in the dual world. And the maximal (local optimal) solution to the dual problem helps us to find a good enough solution to the primal problem.

 DualCover Input: sets ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$; construct a maximal matching ${\displaystyle M\subseteq U}$ such that ${\displaystyle |S_{i}\cap M|\leq 1}$ for all ${\displaystyle i=1,2,\ldots ,m}$ by sequentially adding elements to ${\displaystyle M}$ until nothing can be added; return ${\displaystyle C=\{i\mid S_{i}\cap M\neq \emptyset \}}$

The algorithm constructs the maximal matching ${\displaystyle M}$ by sequentially adding elements into ${\displaystyle M}$ until reaching the maximality. This obviously takes polynomial time.

It is less obvious that the returned ${\displaystyle C}$ is always a set cover. This is due to the maximality of the matching ${\displaystyle M}$:

• By contradiction, assume that ${\displaystyle C}$ is not a set cover. Then there is an element ${\displaystyle x\in U}$ such that ${\displaystyle x\not \in S_{i}}$ for all ${\displaystyle i\in C}$. Every set containing ${\displaystyle x}$ therefore has an empty intersection with ${\displaystyle M}$, which implies that ${\displaystyle x\not \in M}$ and that ${\displaystyle M'=M\cup \{x\}}$ is still a matching, contradicting the maximality of ${\displaystyle M}$.

Therefore the ${\displaystyle C}$ constructed as in the DualCover algorithm must be a set cover.

For the maximal matching ${\displaystyle M}$ constructed by the DualCover algorithm, the output set cover is the collection of all sets which contain at least one element in ${\displaystyle M}$. Recall that the frequency of an element ${\displaystyle frequency(x)}$ is defined as the number of sets ${\displaystyle S_{i}}$ containing ${\displaystyle x}$. Then each element ${\displaystyle x\in M}$ may contribute at most ${\displaystyle frequency(x)}$ many sets into the set cover ${\displaystyle C}$. Then it holds that

${\displaystyle |C|\leq \sum _{x\in M}frequency(x)\leq f\cdot |M|\leq f\cdot OPT,}$

where ${\displaystyle f=\max _{x\in U}frequency(x)}$ denotes the maximum frequency of all elements.

We proved the following ${\displaystyle f}$-approximation bound for the DualCover algorithm on set cover instances with maximum frequency ${\displaystyle f}$.

 Theorem For any set cover instance ${\displaystyle S_{1},S_{2},\ldots ,S_{m}\subseteq U}$ with optimal set cover of size ${\displaystyle OPT}$, the DualCover returns a set cover ${\displaystyle C}$ of size ${\displaystyle |C|\leq f\cdot {OPT}}$, where ${\displaystyle f=\max _{x\in U}frequency(x)=\max _{x\in U}|\{i\mid x\in S_{i}\}|}$ is the maximum frequency.
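A possible Python rendering of DualCover is sketched below. It assumes the sets are Python sets of mutually comparable elements (they are scanned in sorted order only to make the run reproducible; any order yields a maximal matching), and the name `dual_cover` is illustrative. A single pass suffices because an element rejected once can never become addable later.

```python
def dual_cover(sets):
    """Return (cover, matching) for a maximal matching built in one pass."""
    universe = set().union(*sets)
    matching = set()
    hit = [False] * len(sets)  # hit[i]: S_i already contains a matched element
    for x in sorted(universe):
        containing = [i for i, s in enumerate(sets) if x in s]
        # x may join M only if no set containing x is already hit.
        if not any(hit[i] for i in containing):
            matching.add(x)
            for i in containing:
                hit[i] = True
    cover = [i for i in range(len(sets)) if hit[i]]
    return cover, matching


# Illustrative instance in which every element has frequency 2.
sets = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]
cover, matching = dual_cover(sets)
f = max(sum(x in s for s in sets) for x in set().union(*sets))
print(cover, matching, f)
```

Here the matching is {1, 3} and all four sets are returned, while two sets already suffice, so the bound ${\displaystyle |C|\leq f\cdot |M|}$ holds with equality on this instance.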

When the frequency ${\displaystyle f=2}$, the set cover problem becomes the vertex cover problem. And the DualCover algorithm works simply as follows:

 DualCover for vertex cover problem Input: an undirected graph ${\displaystyle G(V,E)}$; initially ${\displaystyle C=\emptyset }$; while ${\displaystyle E\neq \emptyset }$ pick an arbitrary edge ${\displaystyle uv\in E}$ and add both ${\displaystyle u}$ and ${\displaystyle v}$ to ${\displaystyle C}$; remove all edges in ${\displaystyle E}$ incident to ${\displaystyle u}$ or ${\displaystyle v}$; return ${\displaystyle C}$;

Since this algorithm is just an implementation of the DualCover algorithm on the vertex cover instances (set cover instances with frequency ${\displaystyle f=2}$), by the analysis of the DualCover algorithm, it is a 2-approximation algorithm for the vertex cover problem.
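For vertex cover the procedure is short enough to sketch directly. The version below assumes the graph is given as a list of vertex pairs and scans the edges once: skipping an edge that already has an endpoint in the cover plays the role of removing the edges incident to chosen vertices. The function name and the sample graph are illustrative.

```python
def vertex_cover_2approx(edges):
    """Add both endpoints of every edge that is still uncovered when scanned."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Illustrative graph: the path 1 - 2 - 3 - 4.
edges = [(1, 2), (2, 3), (3, 4)]
cover = vertex_cover_2approx(edges)
print(cover)
```

On this path the sketch returns all four vertices, whereas {2, 3} is an optimal vertex cover, so the factor 2 is attained.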

Ignoring lower order terms, ${\displaystyle 2}$ is also the best approximation ratio achievable by polynomial time algorithms, under certain complexity assumptions.

• Dinur and Safra (2005) showed that there is no poly-time algorithm with approximation ratio ${\displaystyle <1.3606}$, assuming that NP${\displaystyle \neq }$P.
• Khot and Regev (2008) showed that there is no poly-time algorithm with approximation ratio ${\displaystyle 2-\epsilon }$ for any constant ${\displaystyle \epsilon >0}$ assuming the unique games conjecture.

# Scheduling

We consider the following scheduling problem:

• There are ${\displaystyle n}$ jobs to be processed.
• There are ${\displaystyle m}$ identical parallel machines. Each machine can start processing jobs at time 0 and can process at most one job at a time.
• Each job ${\displaystyle j=1,2,\ldots ,n}$ must be processed on one of these machines for ${\displaystyle p_{j}}$ time units without interruption. ${\displaystyle p_{j}}$ is called the processing time of job ${\displaystyle j}$.

In a schedule each job is assigned to a machine to process starting at some time, respecting the above rules. The goal is to complete all jobs as soon as possible.

Suppose each job ${\displaystyle j}$ is completed at time ${\displaystyle C_{j}}$. The objective is to minimize the makespan ${\displaystyle C_{\max }=\max _{j}C_{j}}$.

This problem is called minimum makespan on identical parallel machines. It can be described as the following simpler version as a load balancing problem:

 Minimum Makespan on Identical Parallel Machines (load balancing version) Input: ${\displaystyle n}$ positive integers ${\displaystyle p_{1},p_{2},\ldots ,p_{n}}$ and a positive integer ${\displaystyle m}$; Output: an assignment ${\displaystyle \sigma :[n]\to [m]}$ which minimizes ${\displaystyle C_{\max }=\max _{i\in [m]}\sum _{j:i=\sigma (j)}p_{j}}$.

With the ${\displaystyle \alpha |\beta |\gamma }$ notation for scheduling, this problem is the scheduling problem ${\displaystyle P||C_{\max }}$.

The ${\displaystyle \alpha |\beta |\gamma }$ notation was introduced by Ron Graham et al. to model scheduling problems. See this note for more details.

The problem is NP-hard. In particular, for ${\displaystyle m=2}$, any algorithm for this problem can be used to solve the partition problem, which is among Karp's 21 NP-complete problems.

 Partition Problem Input: a set of ${\displaystyle n}$ positive integers ${\displaystyle S=\{x_{1},x_{2},\ldots ,x_{n}\}}$; Determine whether there is a partition of ${\displaystyle S}$ into ${\displaystyle A}$ and ${\displaystyle B}$ such that ${\displaystyle \sum _{x\in A}x=\sum _{x\in B}x}$.
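The connection between the two problems can be made concrete: a partition exists exactly when the optimal makespan on two machines equals half of the total processing time. The sketch below illustrates this with an exhaustive (exponential-time) search for the optimal makespan; the function names are illustrative, and the input is treated as a list so that repeated values are allowed.

```python
from itertools import product


def min_makespan_two_machines(p):
    """Optimal makespan on 2 machines by trying all 2^n assignments."""
    total = sum(p)
    best = total
    for mask in product((0, 1), repeat=len(p)):
        load0 = sum(pj for pj, b in zip(p, mask) if b == 0)
        best = min(best, max(load0, total - load0))
    return best


def has_partition(xs):
    """Decide the partition problem via the 2-machine makespan."""
    total = sum(xs)
    return total % 2 == 0 and min_makespan_two_machines(xs) == total // 2


print(has_partition([3, 1, 1, 2, 2, 1]))  # 3 + 2 versus 1 + 1 + 2 + 1
print(has_partition([1, 1, 4]))
```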

## Graham's List algorithm

In a technical report at Bell Labs in 1966, Graham described a natural greedy procedure for scheduling jobs on parallel identical machines and gave an elegant analysis of the performance of the procedure. It was probably the first approximation algorithm of the modern era with a provable approximation ratio. Interestingly, it predates the discovery of the notion of NP-hardness.

Graham's List algorithm takes a list of jobs as input. The load of a machine is defined as the total processing time of the jobs currently assigned to the machine.

 The List algorithm (Graham 1966) Input: a list of jobs ${\displaystyle j=1,2,\ldots ,n}$ with processing times ${\displaystyle p_{1},p_{2},\ldots ,p_{n}}$; for ${\displaystyle j=1,2,\ldots ,n}$ assign job ${\displaystyle j}$ to the machine that currently has the smallest load;

In a scheduling language, the List algorithm can be more simply described as:

• Whenever a machine becomes idle, it starts processing the next job on the list.
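The List algorithm can be sketched with a priority queue of machine loads. This is a minimal illustration, assuming processing times are given as a Python list in list order and machines are numbered from 0; ties between equally loaded machines are broken by the smallest machine index, which is an arbitrary choice.

```python
import heapq


def list_schedule(p, m):
    """Return (makespan, assignment), where assignment[j] is job j's machine."""
    heap = [(0, i) for i in range(m)]  # entries are (current load, machine)
    assignment = []
    for pj in p:
        load, i = heapq.heappop(heap)  # machine with the smallest load
        assignment.append(i)
        heapq.heappush(heap, (load + pj, i))
    return max(load for load, _ in heap), assignment


makespan, assignment = list_schedule([2, 3, 4, 6, 2, 2], 3)
print(makespan, assignment)
```

With a heap, each job is placed in ${\displaystyle O(\log m)}$ time, so the whole list is processed in ${\displaystyle O(n\log m)}$ time.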

It is well known that the List algorithm has approximation ratio ${\displaystyle \left(2-{\frac {1}{m}}\right)}$.

 Theorem For every instance of scheduling ${\displaystyle n}$ jobs with processing times ${\displaystyle p_{1},p_{2},\ldots ,p_{n}}$ on ${\displaystyle m}$ parallel identical machines, the List algorithm finds a schedule with makespan ${\displaystyle C_{\max }\leq \left(2-{\frac {1}{m}}\right)\cdot OPT}$, where ${\displaystyle OPT}$ is the makespan of optimal schedules.
Proof.
 Obviously for any schedule the makespan is at least the maximum processing time: ${\displaystyle OPT\geq \max _{1\leq j\leq n}p_{j}}$ and by the averaging principle, the makespan (maximum load) is at least the average load: ${\displaystyle OPT\geq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}}$. Suppose that in the schedule given by the List algorithm, job ${\displaystyle \ell }$ finishes last, so the makespan ${\displaystyle C_{\max }=C_{\ell }}$ where ${\displaystyle C_{\ell }}$ is the completion time of job ${\displaystyle \ell }$. By the greediness of the List algorithm, before job ${\displaystyle \ell }$ is scheduled, the machine to which job ${\displaystyle \ell }$ is going to be assigned has the smallest load. By the averaging principle: ${\displaystyle C_{\ell }-p_{\ell }\leq {\frac {1}{m}}\sum _{j\neq \ell }p_{j}}$. On the other hand, ${\displaystyle p_{\ell }\leq \max _{1\leq j\leq n}p_{j}}$. Together, we have ${\displaystyle C_{\max }=C_{\ell }\leq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}+\left(1-{\frac {1}{m}}\right)p_{\ell }\leq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}+\left(1-{\frac {1}{m}}\right)\max _{1\leq j\leq n}p_{j}\leq \left(2-{\frac {1}{m}}\right)\cdot OPT.}$
${\displaystyle \square }$

The analysis is tight: you can try to construct a family of instances on which the List algorithm returns schedules with makespan exactly ${\displaystyle \left(2-{\frac {1}{m}}\right)\cdot OPT}$.
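One standard family of tight instances consists of ${\displaystyle m(m-1)}$ unit jobs followed by a single job of length ${\displaystyle m}$. The List algorithm spreads the unit jobs evenly and then places the long job on top of a load of ${\displaystyle m-1}$, giving makespan ${\displaystyle 2m-1}$, while an optimal schedule gives the long job its own machine and achieves makespan ${\displaystyle m}$. The sketch below checks the List side of this claim; the function names are illustrative.

```python
import heapq


def list_makespan(p, m):
    """Makespan of the List algorithm on jobs p (in list order), m machines."""
    loads = [0] * m  # a list of equal values is already a valid heap
    for pj in p:
        # Replace the smallest load by that load plus the new job.
        heapq.heapreplace(loads, loads[0] + pj)
    return max(loads)


def tight_instance(m):
    """m(m-1) unit jobs followed by one job of length m."""
    return [1] * (m * (m - 1)) + [m]


for m in (2, 3, 5):
    print(m, list_makespan(tight_instance(m), m), "versus optimal", m)
```

The ratio on this family is ${\displaystyle (2m-1)/m=2-1/m}$, matching the bound of the theorem.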

## Local Search

Another natural idea for solving optimization problems is the local search. Given an instance of optimization problem, the principle of local search is as follows:

• At each step, we try to improve the solution by modifying it locally (changing the assignment of a constant number of variables), until nothing can be further changed (thus a local optimum is reached).

For the scheduling problem, we have the following local search algorithm.

 LocalSearch start with an arbitrary schedule of ${\displaystyle n}$ jobs to ${\displaystyle m}$ machines; while (true) let ${\displaystyle \ell }$ denote the job that finishes last in the current schedule; if there is a machine ${\displaystyle i}$ such that job ${\displaystyle \ell }$ can finish earlier if transferred to machine ${\displaystyle i}$, then transfer job ${\displaystyle \ell }$ to machine ${\displaystyle i}$; else break;

By a similar analysis to that of the List algorithm, we can give the same bound to the approximation ratio of the LocalSearch algorithm.

Suppose that when the algorithm stops and a local optimum is reached, job ${\displaystyle \ell }$ finishes at last. Then the makespan ${\displaystyle C_{\max }}$ is achieved by job ${\displaystyle \ell }$'s completion time ${\displaystyle C_{\max }=C_{\ell }}$.

In a local optimum, job ${\displaystyle \ell }$ cannot be transferred to any other machine to improve its completion time ${\displaystyle C_{\ell }}$, so every machine is busy at least until the starting time of job ${\displaystyle \ell }$. Then by the averaging principle, the starting time of job ${\displaystyle \ell }$, ${\displaystyle C_{\ell }-p_{\ell }}$, must be no greater than the average load per machine of all jobs except ${\displaystyle \ell }$:

${\displaystyle C_{\ell }-p_{\ell }\leq {\frac {1}{m}}\sum _{j\neq \ell }p_{j}}$.

The rest is precisely the same as the analysis of the List algorithm. We have

${\displaystyle C_{\max }=C_{\ell }\leq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}+\left(1-{\frac {1}{m}}\right)p_{\ell }\leq OPT+\left(1-{\frac {1}{m}}\right)OPT=\left(2-{\frac {1}{m}}\right)\cdot OPT}$.

So the approximation ratio of the LocalSearch algorithm is ${\displaystyle \left(2-{\frac {1}{m}}\right)}$.

For local search, a bigger issue is to bound its running time. In the LocalSearch algorithm for scheduling, we observe that the makespan ${\displaystyle C_{\max }}$ never increases. Furthermore, in each iteration, either ${\displaystyle C_{\max }}$ decreases or the number of jobs completing at time ${\displaystyle C_{\max }}$ decreases. Therefore the algorithm must terminate within a finite number of steps.

In order to improve the running time, we consider the following variant of the local search algorithm.

 GreedyLocalSearch start with an arbitrary schedule of ${\displaystyle n}$ jobs to ${\displaystyle m}$ machines; while (true) let ${\displaystyle \ell }$ denote the job that finishes last in the current schedule; let ${\displaystyle i}$ denote the machine which completes all its jobs earliest; if job ${\displaystyle \ell }$ can finish earlier by transferring to machine ${\displaystyle i}$, then transfer job ${\displaystyle \ell }$ to machine ${\displaystyle i}$; else break;

The GreedyLocalSearch algorithm has the same approximation ratio as the LocalSearch algorithm since it is a special case of the LocalSearch algorithm.

We let ${\displaystyle C_{\min }}$ denote the completion time of the machine that finishes all its jobs earliest. It is easy to see that ${\displaystyle C_{\min }}$ never decreases. More precisely, in each iteration either ${\displaystyle C_{\min }}$ increases or the number of machines that complete all their jobs at time ${\displaystyle C_{\min }}$ decreases. In each iteration, if the algorithm does not terminate, the job that finishes last is transferred to the machine that stops at time ${\displaystyle C_{\min }}$. Since a job will not be transferred to a machine which stops no earlier than its starting time and ${\displaystyle C_{\min }}$ never decreases, a job will never be transferred twice. Therefore, the GreedyLocalSearch algorithm must terminate within ${\displaystyle n}$ iterations.

 Theorem For every instance of scheduling ${\displaystyle n}$ jobs with processing times ${\displaystyle p_{1},p_{2},\ldots ,p_{n}}$ on ${\displaystyle m}$ parallel identical machines, starting from an arbitrary schedule, the GreedyLocalSearch algorithm terminates in at most ${\displaystyle n}$ iterations and always reaches a schedule with makespan ${\displaystyle C_{\max }\leq \left(2-{\frac {1}{m}}\right)\cdot OPT}$, where ${\displaystyle OPT}$ is the makespan of optimal schedules.
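The GreedyLocalSearch procedure can be sketched as follows. This illustration assumes positive processing times and models each machine as an ordered list of jobs processed back to back from time 0, so the job finishing last is the final job on a machine of maximum load; the function name, the representation of the initial schedule as a list `assignment`, and the tie-breaking by smallest machine index are all assumptions of this sketch.

```python
def greedy_local_search(p, m, assignment):
    """Return (makespan, machines, moves) after running GreedyLocalSearch."""
    machines = [[] for _ in range(m)]
    for j, i in enumerate(assignment):
        machines[i].append(j)
    loads = [sum(p[j] for j in jobs) for jobs in machines]
    moves = 0
    while True:
        hi = max(range(m), key=loads.__getitem__)  # machine finishing last
        lo = min(range(m), key=loads.__getitem__)  # machine finishing earliest
        if not machines[hi]:
            break  # no jobs at all
        job = machines[hi][-1]  # the job that finishes last
        # Moving helps only if the job would then complete strictly earlier.
        if loads[lo] + p[job] >= loads[hi]:
            break
        machines[hi].pop()
        machines[lo].append(job)
        loads[hi] -= p[job]
        loads[lo] += p[job]
        moves += 1
    return max(loads), machines, moves


# Start from the poor schedule that puts every job on machine 0.
p = [3, 3, 2, 2, 2]
makespan, machines, moves = greedy_local_search(p, 2, [0] * len(p))
print(makespan, machines, moves)
```

Starting from all jobs on one machine, the three short jobs move across one by one and the search stops with both machines at load 6.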

The local search algorithm can start with an arbitrary solution. If it starts with a schedule returned by the List algorithm, it will terminate without making any change to the schedule, yet the approximation ratio ${\displaystyle (2-1/m)}$ still holds. This provides another proof that the List algorithm is a ${\displaystyle (2-1/m)}$-approximation.

In the analysis of the local search algorithm, we actually show that any local optimum has approximation ratio ${\displaystyle \leq (2-1/m)}$ to the global optimum. And the List algorithm returns a local optimum.

## Longest Processing Time (LPT)

In the List algorithm, the jobs are presented in an arbitrary order. Intuitively, the List algorithm seems to perform better if we process the heavier jobs earlier. This gives us the following Longest Processing Time (LPT) algorithm.

 LongestProcessingTime(LPT) Input: a list of jobs ${\displaystyle j=1,2,\ldots ,n}$ with processing times ${\displaystyle p_{1}\geq p_{2}\geq \cdots \geq p_{n}}$ in non-increasing order; for ${\displaystyle j=1,2,\ldots ,n}$ assign job ${\displaystyle j}$ to the machine that currently has the smallest load;
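LPT is the List algorithm applied to the jobs sorted by non-increasing processing time, which the following sketch makes explicit. The function name is illustrative, and the example instance is a small case chosen to show that LPT need not be optimal.

```python
import heapq


def lpt_makespan(p, m):
    """Makespan of LPT: sort jobs in non-increasing order, then run List."""
    loads = [0] * m  # a list of equal values is already a valid heap
    for pj in sorted(p, reverse=True):
        # Assign the job to the machine with the smallest current load.
        heapq.heapreplace(loads, loads[0] + pj)
    return max(loads)


print(lpt_makespan([3, 3, 2, 2, 2], 2))
```

On this instance LPT produces makespan 7, while placing the two jobs of length 3 together gives the optimal makespan 6.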

The analysis of approximation is similar as before, with an extra observation that the job that finishes at last is smaller since the jobs are scheduled in the non-increasing order in processing times.

Suppose that job ${\displaystyle \ell }$ finishes last, so that the makespan is job ${\displaystyle \ell }$'s completion time ${\displaystyle C_{\max }=C_{\ell }}$. Due to the greediness of the algorithm, we still have that

${\displaystyle C_{\ell }-p_{\ell }\leq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}}$.

Therefore,

${\displaystyle C_{\max }=C_{\ell }\leq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}+p_{\ell }}$.

We still have the lower bound ${\displaystyle OPT\geq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}}$. Next we find a better lower bound of ${\displaystyle OPT}$ in terms of ${\displaystyle p_{\ell }}$.

Note that the first ${\displaystyle m}$ jobs are assigned to all ${\displaystyle m}$ machines, one job per machine. Without loss of generality we can assume:

• the number of jobs is greater than the number of machines, ${\displaystyle n>m}$;
• the makespan is achieved by a job other than the first ${\displaystyle m}$ jobs, ${\displaystyle \ell >m}$;

since otherwise the LPT algorithm returns an optimal solution.

Therefore ${\displaystyle p_{\ell }\leq p_{m+1}}$. And observe that even for the first ${\displaystyle m+1}$ jobs with processing times ${\displaystyle p_{1}\geq p_{2}\geq \cdots \geq p_{m+1}}$, the optimal makespan is at least ${\displaystyle p_{m}+p_{m+1}}$ by pigeon hole principle. We have the lower bound for the optimal makespan

${\displaystyle OPT\geq p_{m}+p_{m+1}\geq 2p_{m+1}\geq 2p_{\ell }}$.

Altogether, we have

${\displaystyle C_{\max }=C_{\ell }\leq {\frac {1}{m}}\sum _{j=1}^{n}p_{j}+p_{\ell }\leq OPT+{\frac {1}{2}}OPT={\frac {3}{2}}OPT}$.

This shows that the LPT algorithm is a ${\displaystyle {\frac {3}{2}}}$-approximation.

Both the analysis of the LPT algorithm and approximability of the scheduling problem can be further improved:

• With a more careful analysis, one can show that the LPT algorithm is ${\displaystyle {\frac {4}{3}}}$-approximation. The trick is to show by case analysis that ${\displaystyle OPT\geq 3p_{\ell }}$ or otherwise the LPT algorithm can find an optimal solution.
• By scaling and rounding a dynamic program, one can obtain a PTAS (Polynomial Time Approximation Scheme) for minimum makespan scheduling on parallel identical machines.

## Online Algorithms and Competitive Ratio

An advantage of the List algorithm is that it is an online algorithm. For an online algorithm, the input arrives one piece at a time, and the algorithm must make a decision when a piece of the input arrives without seeing the future pieces. In contrast, an algorithm that can make decisions after seeing the entire input is called an offline algorithm.

For the scheduling problem, the online setting is that the ${\displaystyle n}$ jobs arrive one-by-one in a series, and the online scheduling algorithm needs to assign each job to a machine to start processing right after it arrives (or having a buffer of bounded size to temporarily store the incoming jobs). The List algorithm is an online scheduling algorithm, while the LPT algorithm is not because it needs to sort all jobs according to the processing time.

For online algorithms, a notion similar to the approximation ratio, called the competitive ratio, was introduced to measure performance. For a minimization problem, we say an online algorithm has competitive ratio ${\displaystyle \alpha }$, or is ${\displaystyle \alpha }$-competitive, if for any input sequence ${\displaystyle I}$,

${\displaystyle SOL_{I}\leq \alpha \cdot OPT_{I}}$

where ${\displaystyle SOL_{I}}$ is the value of the solution returned by the online algorithm with the input ${\displaystyle I}$ and ${\displaystyle OPT_{I}}$ is the value of the solution returned by the optimal offline algorithm on the same input. The competitive ratio for maximization problem can be similarly defined.

For online scheduling, it is immediate to see that the List algorithm is ${\displaystyle (2-1/m)}$-competitive.

• This competitive ratio is optimal for all deterministic online algorithms on ${\displaystyle m=2}$ or ${\displaystyle 3}$ machines.
• For large number of machines, better competitive ratios (1.986, 1.945, 1.923, 1.9201, 1.916) were achieved.
• On the lower bound side, it is known that no deterministic online scheduling algorithm can achieve a competitive ratio better than 1.88 and no randomized online scheduling algorithm can achieve a competitive ratio better than ${\displaystyle 1/(1-1/e)}$.

See the survey by Albers for more details.