# MAX-SAT

Suppose that we have a number of boolean variables ${\displaystyle x_{1},x_{2},\ldots \in \{\mathrm {true} ,\mathrm {false} \}}$. A literal is either a variable ${\displaystyle x_{i}}$ itself or its negation ${\displaystyle \neg x_{i}}$. A logic expression is in conjunctive normal form (CNF) if it is written as the conjunction (AND) of a set of clauses, where each clause is a disjunction (OR) of literals. For example:

${\displaystyle (x_{1}\vee \neg x_{2}\vee \neg x_{3})\wedge (\neg x_{1}\vee \neg x_{3})\wedge (x_{1}\vee x_{2}\vee x_{4})\wedge (x_{4}\vee \neg x_{3})\wedge (x_{4}\vee \neg x_{1}).}$
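To make the definitions concrete, here is a minimal Python sketch that encodes the example formula and counts satisfied clauses. The encoding is an assumption borrowed from the DIMACS convention: a clause is a tuple of nonzero integers, where `i` stands for ${\displaystyle x_{i}}$ and `-i` for ${\displaystyle \neg x_{i}}$. The names `EXAMPLE`, `literal_is_true` and `count_satisfied` are hypothetical helpers, not part of any library.

```python
from typing import Sequence

# Assumed DIMACS-style encoding: literal i means x_i, literal -i means "not x_i".
Clause = tuple[int, ...]

# The example formula above, one tuple per clause.
EXAMPLE: list[Clause] = [(1, -2, -3), (-1, -3), (1, 2, 4), (4, -3), (4, -1)]


def literal_is_true(lit: int, assignment: Sequence[bool]) -> bool:
    """Value of a literal; assignment[i - 1] holds the value of x_i."""
    value = assignment[abs(lit) - 1]
    return value if lit > 0 else not value


def count_satisfied(cnf: list[Clause], assignment: Sequence[bool]) -> int:
    """Number of clauses (disjunctions) with at least one true literal."""
    return sum(any(literal_is_true(lit, assignment) for lit in clause) for clause in cnf)


print(count_satisfied(EXAMPLE, [True, False, False, True]))
```

With ${\displaystyle x_{1}=x_{4}=\mathrm {true} }$ and ${\displaystyle x_{2}=x_{3}=\mathrm {false} }$ every clause has a true literal, so the call reports all 5 clauses satisfied.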

The satisfiability (SAT) problem asks, given a CNF formula as input, whether the formula is satisfiable, i.e. whether there exists an assignment of the values true and false to the variables under which all clauses are true. SAT was the first problem shown to be NP-complete (the Cook-Levin theorem).

We consider the optimization version of SAT, which asks for an assignment that maximizes the number of satisfied clauses.

 Problem (MAX-SAT) Given a conjunctive normal form (CNF) formula of ${\displaystyle m}$ clauses defined on ${\displaystyle n}$ boolean variables ${\displaystyle x_{1},x_{2},\ldots ,x_{n}}$, find a truth assignment to the boolean variables that maximizes the number of satisfied clauses.
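As a reference point for the approximation algorithms below, MAX-SAT can be solved exactly on tiny instances by trying all ${\displaystyle 2^{n}}$ assignments. The following is a minimal sketch, assuming the same hypothetical clause encoding (`i` for ${\displaystyle x_{i}}$, `-i` for ${\displaystyle \neg x_{i}}$); `max_sat_brute_force` is an illustrative name, and the exponential running time makes it useless beyond a handful of variables.

```python
from itertools import product


def max_sat_brute_force(cnf, n):
    """Exact MAX-SAT by exhaustive search over all 2**n assignments.

    Clauses use the assumed encoding: i means x_i, -i means "not x_i".
    Exponential time, so only for tiny instances.
    """
    best_count, best_assignment = -1, None
    for assignment in product((False, True), repeat=n):
        # A literal is true exactly when its variable's value matches its sign.
        count = sum(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in cnf
        )
        if count > best_count:
            best_count, best_assignment = count, assignment
    return best_count, best_assignment


example = [(1, -2, -3), (-1, -3), (1, 2, 4), (4, -3), (4, -1)]
print(max_sat_brute_force(example, 4))
```

The example formula is satisfiable (for instance with ${\displaystyle x_{3}}$ false and ${\displaystyle x_{4}}$ true), so the reported optimum is 5.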

# The Probabilistic Method

A straightforward approach to Max-SAT is to assign each variable a truth value uniformly at random, independently of the other variables. The following theorem is proved by the probabilistic method.

 Theorem For any set of ${\displaystyle m}$ clauses, there is a truth assignment that satisfies at least ${\displaystyle {\frac {m}{2}}}$ clauses.
Proof.
 For each variable, independently assign a random value in ${\displaystyle \{\mathrm {true} ,\mathrm {false} \}}$ with equal probability. For the ${\displaystyle i}$th clause, let ${\displaystyle X_{i}}$ be the indicator random variable of the event that the ${\displaystyle i}$th clause is satisfied. Suppose that the clause contains ${\displaystyle k\geq 1}$ literals on distinct variables. The clause is unsatisfied only if all ${\displaystyle k}$ literals are false, which happens with probability ${\displaystyle 2^{-k}}$, so ${\displaystyle \Pr[X_{i}=1]\geq 1-2^{-k}\geq {\frac {1}{2}}}$. Let ${\displaystyle X=\sum _{i=1}^{m}X_{i}}$ be the number of satisfied clauses. By the linearity of expectation, ${\displaystyle \mathbf {E} [X]=\sum _{i=1}^{m}\mathbf {E} [X_{i}]\geq {\frac {m}{2}}.}$ Since a random variable takes a value at least its expectation with positive probability, there exists an assignment such that at least ${\displaystyle {\frac {m}{2}}}$ clauses are satisfied.
${\displaystyle \square }$
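For the example formula above, whose five clauses have 3, 2, 3, 2 and 2 literals on distinct variables, the expectation in the proof can be computed term by term:

${\displaystyle \mathbf {E} [X]=\left(1-2^{-3}\right)+\left(1-2^{-2}\right)+\left(1-2^{-3}\right)+\left(1-2^{-2}\right)+\left(1-2^{-2}\right)={\frac {7}{8}}+{\frac {3}{4}}+{\frac {7}{8}}+{\frac {3}{4}}+{\frac {3}{4}}=4\geq {\frac {5}{2}}.}$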

Note that this gives a randomized algorithm which returns a truth assignment satisfying at least ${\displaystyle {\frac {m}{2}}}$ clauses in expectation. Since there are ${\displaystyle m}$ clauses in total, the optimal solution satisfies at most ${\displaystyle m}$ of them, which means that this simple randomized algorithm is a ${\displaystyle {\frac {1}{2}}}$-approximation algorithm (in expectation) for the MAX-SAT problem.
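A minimal sketch of this randomized algorithm in Python, assuming the hypothetical clause encoding used above (`i` for ${\displaystyle x_{i}}$, `-i` for ${\displaystyle \neg x_{i}}$). The function `random_assignment` is an illustrative name; it performs one run, and averaging many runs on the example formula should come out close to the exact expectation of 4.

```python
import random


def random_assignment(cnf, n, rng=None):
    """One run of the randomized algorithm: each x_i is True with probability 1/2.

    Returns the assignment and how many clauses it satisfies (assumed
    encoding: i means x_i, -i means "not x_i").
    """
    if rng is None:
        rng = random.Random()
    assignment = [rng.random() < 0.5 for _ in range(n)]
    satisfied = sum(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in cnf
    )
    return assignment, satisfied


example = [(1, -2, -3), (-1, -3), (1, 2, 4), (4, -3), (4, -1)]
rng = random.Random(2024)
runs = [random_assignment(example, 4, rng)[1] for _ in range(10_000)]
print(sum(runs) / len(runs))  # empirical mean; the exact E[X] is 4 for this formula
```

A single run only guarantees ${\displaystyle {\frac {m}{2}}}$ in expectation; the seeded generator is there only to make the demonstration reproducible.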

# LP Relaxation + Randomized Rounding

For a clause ${\displaystyle C_{j}}$, let ${\displaystyle C_{j}^{+}}$ be the set of indices of the variables that appear uncomplemented in clause ${\displaystyle C_{j}}$, and let ${\displaystyle C_{j}^{-}}$ be the set of indices of the variables that appear complemented in clause ${\displaystyle C_{j}}$. The Max-SAT problem can be formulated as the following integer linear program.

${\displaystyle {\begin{aligned}{\mbox{maximize}}&\quad \sum _{j=1}^{m}z_{j}\\{\mbox{subject to}}&\quad \sum _{i\in C_{j}^{+}}y_{i}+\sum _{i\in C_{j}^{-}}(1-y_{i})\geq z_{j},&&\forall 1\leq j\leq m\\&\qquad \qquad y_{i}\in \{0,1\},&&\forall 1\leq i\leq n\\&\qquad \qquad z_{j}\in \{0,1\},&&\forall 1\leq j\leq m\end{aligned}}}$

Each ${\displaystyle y_{i}}$ in the program indicates the truth assignment to the variable ${\displaystyle x_{i}}$, and each ${\displaystyle z_{j}}$ indicates whether the clause ${\displaystyle C_{j}}$ is satisfied. The inequalities ensure that a clause can be counted as satisfied (${\displaystyle z_{j}=1}$) only if at least one of its literals is assigned the value 1.
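For instance, the five clauses of the example formula above give the following constraints, one per clause:

${\displaystyle {\begin{aligned}y_{1}+(1-y_{2})+(1-y_{3})&\geq z_{1}\\(1-y_{1})+(1-y_{3})&\geq z_{2}\\y_{1}+y_{2}+y_{4}&\geq z_{3}\\y_{4}+(1-y_{3})&\geq z_{4}\\y_{4}+(1-y_{1})&\geq z_{5}\end{aligned}}}$

The integral point ${\displaystyle y=(1,0,0,1)}$ makes every left-hand side at least 1, so ${\displaystyle z_{1}=\cdots =z_{5}=1}$ is feasible and the objective reaches 5, matching the assignment ${\displaystyle x_{1}=x_{4}=\mathrm {true} }$, ${\displaystyle x_{2}=x_{3}=\mathrm {false} }$.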

The integer linear program is relaxed to the following linear program:

${\displaystyle {\begin{aligned}{\mbox{maximize}}&\quad \sum _{j=1}^{m}z_{j}\\{\mbox{subject to}}&\quad \sum _{i\in C_{j}^{+}}y_{i}+\sum _{i\in C_{j}^{-}}(1-y_{i})\geq z_{j},&&\forall 1\leq j\leq m\\&\qquad \qquad 0\leq y_{i}\leq 1,&&\forall 1\leq i\leq n\\&\qquad \qquad 0\leq z_{j}\leq 1,&&\forall 1\leq j\leq m\end{aligned}}}$

Let ${\displaystyle y_{i}^{*}}$ and ${\displaystyle z_{j}^{*}}$ be an optimal fractional solution to the above linear program. Every feasible solution of the integer program is also feasible for the linear program, so ${\displaystyle \sum _{j=1}^{m}z_{j}^{*}}$ is an upper bound on the optimal number of satisfied clauses, i.e. we have

${\displaystyle \mathrm {OPT} \leq \sum _{j=1}^{m}z_{j}^{*}}$.
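The upper bound can also be checked numerically. Once ${\displaystyle y}$ is fixed, the best choice of each ${\displaystyle z_{j}}$ is ${\displaystyle \min(1,\,\mathrm {LHS} _{j})}$, where ${\displaystyle \mathrm {LHS} _{j}}$ is the left-hand side of the ${\displaystyle j}$th constraint. For integral ${\displaystyle y}$ this objective equals the number of satisfied clauses, so the LP optimum is at least OPT; fractional ${\displaystyle y}$ can do strictly better. The sketch below assumes the hypothetical clause encoding used earlier; `clause_lhs` and `lp_objective_for` are illustrative names, and no LP solver is involved.

```python
from itertools import product


def clause_lhs(clause, y):
    """sum_{i in C+} y_i + sum_{i in C-} (1 - y_i), with y[i - 1] holding y_i."""
    return sum(y[abs(lit) - 1] if lit > 0 else 1 - y[abs(lit) - 1] for lit in clause)


def lp_objective_for(cnf, y):
    """Largest LP objective attainable with y held fixed.

    Each z_j is capped by both 1 and its clause's left-hand side, so its
    best value is min(1, LHS_j); summing these gives the objective.
    """
    return sum(min(1.0, clause_lhs(clause, y)) for clause in cnf)


# All four 2-clauses on x_1, x_2: every assignment falsifies exactly one.
four = [(1, 2), (1, -2), (-1, 2), (-1, -2)]
for y in product((0, 1), repeat=2):
    print(y, lp_objective_for(four, y))  # integral y: equals #satisfied = 3
print(lp_objective_for(four, [0.5, 0.5]))  # fractional y reaches m = 4
```

On this four-clause instance the LP optimum is 4 while OPT is 3, so the inequality above can be strict.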

We apply a natural randomized rounding scheme: for each ${\displaystyle 1\leq i\leq n}$, independently set

${\displaystyle y_{i}={\begin{cases}1&{\mbox{with probability }}y_{i}^{*}.\\0&{\mbox{with probability }}1-y_{i}^{*}.\end{cases}}}$

Correspondingly, each ${\displaystyle x_{i}}$ is assigned to TRUE independently with probability ${\displaystyle y_{i}^{*}}$.
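A minimal sketch of the rounding step in Python. It assumes the optimal values ${\displaystyle y_{i}^{*}}$ have already been obtained from some LP solver (not shown); `randomized_rounding` is an illustrative name and `y_star[i - 1]` holds ${\displaystyle y_{i}^{*}}$.

```python
import random


def randomized_rounding(y_star, rng=None):
    """Set x_i = True independently with probability y_star[i - 1].

    y_star is assumed to be an optimal fractional LP solution obtained
    elsewhere (this sketch does not solve the LP).
    """
    if rng is None:
        rng = random.Random()
    # rng.random() is uniform on [0, 1), so it falls below p with probability p.
    return [rng.random() < p for p in y_star]


print(randomized_rounding([1.0, 0.0, 0.5], random.Random(7)))
```

Values ${\displaystyle y_{i}^{*}=1}$ and ${\displaystyle y_{i}^{*}=0}$ are always rounded to TRUE and FALSE respectively; only fractional values introduce randomness.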

 Lemma Let ${\displaystyle C_{j}}$ be a clause with ${\displaystyle k}$ literals. The probability that it is satisfied by randomized rounding is at least ${\displaystyle (1-(1-1/k)^{k})z_{j}^{*}}$.
Proof.
 Without loss of generality, we assume that all ${\displaystyle k}$ variables appear in ${\displaystyle C_{j}}$ in the uncomplemented form, i.e. ${\displaystyle C_{j}=x_{1}\vee x_{2}\vee \cdots \vee x_{k}}$; a complemented literal ${\displaystyle \neg x_{i}}$ is handled symmetrically by replacing ${\displaystyle y_{i}^{*}}$ with ${\displaystyle 1-y_{i}^{*}}$. Clause ${\displaystyle C_{j}}$ remains unsatisfied by randomized rounding only if every ${\displaystyle x_{i}}$, ${\displaystyle 1\leq i\leq k}$, is set to FALSE, i.e. every ${\displaystyle y_{i}}$, ${\displaystyle 1\leq i\leq k}$, is rounded to 0. This event occurs with probability ${\displaystyle \prod _{i=1}^{k}(1-y_{i}^{*})}$, so the clause ${\displaystyle C_{j}}$ is satisfied by the randomized rounding with probability ${\displaystyle 1-\prod _{i=1}^{k}(1-y_{i}^{*})}$. By the arithmetic-geometric mean inequality and the linear programming constraint ${\displaystyle y_{1}^{*}+y_{2}^{*}+\cdots +y_{k}^{*}\geq z_{j}^{*}}$, we have ${\displaystyle \prod _{i=1}^{k}(1-y_{i}^{*})\leq \left(1-{\frac {1}{k}}\sum _{i=1}^{k}y_{i}^{*}\right)^{k}\leq \left(1-{\frac {z_{j}^{*}}{k}}\right)^{k},}$ with the worst case attained when all ${\displaystyle y_{i}^{*}={\frac {z_{j}^{*}}{k}}}$. Thus, the probability that ${\displaystyle C_{j}}$ is satisfied is ${\displaystyle 1-\prod _{i=1}^{k}(1-y_{i}^{*})\geq 1-(1-z_{j}^{*}/k)^{k}\geq (1-(1-1/k)^{k})z_{j}^{*}}$, where the last inequality holds because the function ${\displaystyle f(z)=1-(1-z/k)^{k}}$ is concave on ${\displaystyle [0,1]}$ with ${\displaystyle f(0)=0}$, so ${\displaystyle f(z)\geq f(1)\cdot z}$ for ${\displaystyle 0\leq z\leq 1}$.
${\displaystyle \square }$
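The lemma can be sanity-checked numerically: compute the exact satisfaction probability of a clause under rounding and compare it with ${\displaystyle (1-(1-1/k)^{k})z_{j}^{*}}$. In the sketch below, `clause_sat_probability` and `lemma_bound` are illustrative names, the clause encoding is the hypothetical one used earlier, literals are assumed to be on distinct variables, and ${\displaystyle z_{j}^{*}}$ is taken as ${\displaystyle \min(1,\mathrm {LHS} _{j})}$ for the given ${\displaystyle y^{*}}$ (the largest value the constraint allows).

```python
import random


def clause_sat_probability(clause, y_star):
    """Exact Pr[clause satisfied] under independent rounding.

    Assumes the clause's literals are on distinct variables; literal i means
    x_i and -i means "not x_i".
    """
    prob_all_false = 1.0
    for lit in clause:
        p_true = y_star[abs(lit) - 1]
        # x_i is false w.p. 1 - y_i*; "not x_i" is false w.p. y_i*.
        prob_all_false *= (1 - p_true) if lit > 0 else p_true
    return 1 - prob_all_false


def lemma_bound(clause, y_star):
    """(1 - (1 - 1/k)^k) * z_j^*, taking z_j^* = min(1, LHS_j) for this y*."""
    k = len(clause)
    lhs = sum(y_star[abs(l) - 1] if l > 0 else 1 - y_star[abs(l) - 1] for l in clause)
    return (1 - (1 - 1 / k) ** k) * min(1.0, lhs)


clause, y = (1, -2, 3), [0.2, 0.7, 0.1]
print(clause_sat_probability(clause, y), lemma_bound(clause, y))  # about 0.496 vs 0.422
```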

For any ${\displaystyle k\geq 1}$, we have ${\displaystyle (1-1/k)^{k}<1/e}$, so ${\displaystyle 1-(1-1/k)^{k}>1-1/e}$. Therefore, by the linearity of expectation, the expected number of clauses satisfied by the randomized rounding is at least

${\displaystyle (1-1/e)\sum _{j=1}^{m}z_{j}^{*}\geq (1-1/e)\cdot \mathrm {OPT} }$.

The inequality holds because the ${\displaystyle z_{j}^{*}}$ form an optimal solution to the relaxed LP, whose value is no smaller than that of an optimal integral solution.
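A quick numerical look at the factor ${\displaystyle 1-(1-1/k)^{k}}$: it equals 1 for ${\displaystyle k=1}$, ${\displaystyle {\frac {3}{4}}}$ for ${\displaystyle k=2}$ and ${\displaystyle {\frac {19}{27}}}$ for ${\displaystyle k=3}$, and then decreases towards ${\displaystyle 1-1/e\approx 0.632}$ without reaching it. The function name `rounding_factor` is an illustrative choice.

```python
import math


def rounding_factor(k):
    """1 - (1 - 1/k)^k: the lemma's guarantee for a clause with k literals."""
    return 1 - (1 - 1 / k) ** k


limit = 1 - 1 / math.e  # about 0.632
for k in (1, 2, 3, 5, 10, 100):
    print(k, rounding_factor(k))

# The factor decreases with k but stays strictly above 1 - 1/e.
assert all(rounding_factor(k) > limit for k in range(1, 10_000))
```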

# Choose a better solution

For any instance of Max-SAT, let ${\displaystyle m_{1}}$ be the expected number of satisfied clauses when each variable is independently set to TRUE with probability ${\displaystyle {\frac {1}{2}}}$; and let ${\displaystyle m_{2}}$ be the expected number of satisfied clauses when we use the linear programming followed by randomized rounding.
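Both expectations can be computed exactly, clause by clause, by linearity of expectation. The sketch below assumes the hypothetical clause encoding used earlier, literals on distinct variables, and that `y_star` holds LP values obtained elsewhere (the theorem below needs an optimal LP solution; the functions themselves accept any ${\displaystyle y^{*}\in [0,1]^{n}}$). `m1_uniform`, `m2_rounding` and `better_expectation` are illustrative names.

```python
def m1_uniform(cnf):
    """m_1 = sum over clauses of 1 - 2^{-k_j} (literals on distinct variables)."""
    return sum(1 - 2.0 ** -len(clause) for clause in cnf)


def m2_rounding(cnf, y_star):
    """m_2 = sum over clauses of 1 - prod Pr[literal false] under rounding with y_star."""
    total = 0.0
    for clause in cnf:
        prob_all_false = 1.0
        for lit in clause:
            p = y_star[abs(lit) - 1]
            prob_all_false *= (1 - p) if lit > 0 else p
        total += 1 - prob_all_false
    return total


def better_expectation(cnf, y_star):
    """max{m_1, m_2}: the larger of the two algorithms' expected values."""
    return max(m1_uniform(cnf), m2_rounding(cnf, y_star))


example = [(1, -2, -3), (-1, -3), (1, 2, 4), (4, -3), (4, -1)]
print(m1_uniform(example), m2_rounding(example, [0, 0, 0, 1]))
```

For the example formula ${\displaystyle m_{1}=4}$, while ${\displaystyle y^{*}=(0,0,0,1)}$ (an optimal LP solution here, since it attains the LP value 5) gives ${\displaystyle m_{2}=5}$.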

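We will show that on any instance of Max-SAT, the better of the two algorithms satisfies, in expectation, at least ${\displaystyle {\frac {3}{4}}\cdot \mathrm {OPT} }$ clauses, so choosing the one with the larger expectation gives a ${\displaystyle {\frac {3}{4}}}$-approximation algorithm.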

 Theorem ${\displaystyle \max\{m_{1},m_{2}\}\geq {\frac {3}{4}}\cdot \mathrm {OPT} .}$
Proof.
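 Since ${\displaystyle \max\{m_{1},m_{2}\}\geq {\frac {m_{1}+m_{2}}{2}}}$ and ${\displaystyle \sum _{j=1}^{m}z_{j}^{*}\geq \mathrm {OPT} }$, it suffices to show that ${\displaystyle {\frac {m_{1}+m_{2}}{2}}\geq {\frac {3}{4}}\sum _{j=1}^{m}z_{j}^{*}}$. Let ${\displaystyle S_{k}}$ denote the set of clauses that contain ${\displaystyle k}$ literals. Since ${\displaystyle z_{j}^{*}\leq 1}$, we know that ${\displaystyle m_{1}=\sum _{k=1}^{n}\sum _{C_{j}\in S_{k}}(1-2^{-k})\geq \sum _{k=1}^{n}\sum _{C_{j}\in S_{k}}(1-2^{-k})z_{j}^{*}.}$ By the analysis of randomized rounding, ${\displaystyle m_{2}\geq \sum _{k=1}^{n}\sum _{C_{j}\in S_{k}}(1-(1-1/k)^{k})z_{j}^{*}.}$ Thus ${\displaystyle {\frac {m_{1}+m_{2}}{2}}\geq \sum _{k=1}^{n}\sum _{C_{j}\in S_{k}}{\frac {1-2^{-k}+1-(1-1/k)^{k}}{2}}z_{j}^{*}.}$ For ${\displaystyle k=1}$ and ${\displaystyle k=2}$ the coefficient ${\displaystyle {\frac {1-2^{-k}+1-(1-1/k)^{k}}{2}}}$ equals exactly ${\displaystyle {\frac {3}{4}}}$; for ${\displaystyle k\geq 3}$ we have ${\displaystyle 1-2^{-k}\geq {\frac {7}{8}}}$ and ${\displaystyle 1-(1-1/k)^{k}>1-1/e}$, and ${\displaystyle {\frac {7}{8}}+1-{\frac {1}{e}}>{\frac {3}{2}}}$. Hence ${\displaystyle {\frac {1-2^{-k}+1-(1-1/k)^{k}}{2}}\geq {\frac {3}{4}}}$ for every ${\displaystyle k\geq 1}$, so that we have ${\displaystyle {\frac {m_{1}+m_{2}}{2}}\geq {\frac {3}{4}}\sum _{k=1}^{n}\sum _{C_{j}\in S_{k}}z_{j}^{*}={\frac {3}{4}}\sum _{j=1}^{m}z_{j}^{*}\geq {\frac {3}{4}}\cdot \mathrm {OPT} .}$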
${\displaystyle \square }$
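The per-clause coefficient used in the proof can be evaluated exactly with rational arithmetic, which confirms the case analysis: it is exactly ${\displaystyle {\frac {3}{4}}}$ for ${\displaystyle k=1,2}$ and larger afterwards (for example ${\displaystyle {\frac {341}{432}}}$ at ${\displaystyle k=3}$). This is a check over a finite range of ${\displaystyle k}$ only, not a replacement for the argument above; `combined_coefficient` is an illustrative name.

```python
from fractions import Fraction


def combined_coefficient(k):
    """Exact value of ((1 - 2^{-k}) + (1 - (1 - 1/k)^k)) / 2 for k >= 1."""
    uniform = 1 - Fraction(1, 2 ** k)          # guarantee of the uniform assignment
    rounding = 1 - (1 - Fraction(1, k)) ** k   # guarantee of randomized rounding
    return (uniform + rounding) / 2


for k in range(1, 6):
    print(k, combined_coefficient(k), float(combined_coefficient(k)))

# Never below 3/4 on the checked range.
assert all(combined_coefficient(k) >= Fraction(3, 4) for k in range(1, 60))
```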