# 随机算法 (Fall 2011)/DNF Counting

# Approximate Counting

Let us consider the following abstract problem.

Let be a finite set of known size, and let . We want to compute the size of , namely .

We assume two devices:

- A
**uniform sampler**, which uniformly and independently samples a member of upon each calling. - A
**membership oracle**of , denoted . Given as the input an , indicates whether or not is a member of .

Equipped by and , we can have the following Monte Carlo algorithm:

- Choose independent samples from by the uniform sampler , represented by the random variables .
- Let be the indicator random variable defined as , namely, indicates whether .
- Define the estimator random variable

It is easy to see that and we might hope that with high probability the value of is close to . Formally, is called an -approximation of if

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and the ratio between and

**Theorem (estimator theorem)**- Let . Then the Monte Carlo method yields an -approximation to with probability at least provided
- .

- Let . Then the Monte Carlo method yields an -approximation to with probability at least provided

**Proof.**Use the Chernoff bound.

A counting algorithm for the set has to deal with the following three issues:

- Implement the membership oracle . This is usually straightforward, or assumed by the model.
- Implement the uniform sampler . As we have seen, this is usually approximated by random walks. How to design the random walk and bound its mixing rate is usually technical challenging, if possible at all.
- Deal with exponentially small . This requires us to cleverly choose the universe . Sometimes this needs some nontrivial ideas.

# Counting DNFs

A disjunctive normal form (DNF) formular is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:

- .

Note the difference from the conjunctive normal forms (CNF).

Given a DNF formular as the input, the problem is to count the number of satisfying assignments of . This problem is **#P-complete**.

Naively applying the Monte Carlo method will not give a good answer. Suppose that there are variables. Let be the set of all truth assignments of the variables. Let be the set of satisfying assignments for . The straightforward use of Monte Carlo method samples assignments from and check how many of them satisfy . This algorithm fails when is exponentially small, namely, when exponentially small fraction of the assignments satisfy the input DNF formula.

- The union of sets problem

We reformulate the DNF counting problem in a more abstract framework, called the **union of sets** problem.

Let be a finite universe. We are given subsets . The following assumptions hold:

- For all , is computable in poly-time.
- It is possible to sample uniformly from each individual .
- For any , it can be determined in poly-time whether .

The goal is to compute the size of .

DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula is defined on variables, and contains clauses , where clause has literals. Without loss of generality, we assume that in each clause, each variable appears at most once.

- is the set of all assignments.
- Each is the set of satisfying assignments for the -th clause of the DNF formular . Then the union of sets gives the set of satisfying assignments for .
- Each clause is a conjunction (AND) of literals. It is not hard to see that , which is efficiently computable.
- Sampling from an is simple: we just fix the assignments of the literals of that clause, and sample uniformly and independently the rest variable assignments.
- For each assignment , it is easy to check whether it satisfies a clause , thus it is easy to determine whether .

# The coverage algorithm

We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset defined by

- ,

where denotes the multiset union. It is more convenient to define as the set

- .

For each , there may be more than one instances of . We can choose a unique representative among the multiple instances for the same , by choosing the with the minimum , and form a set .

Formally, . Every corresponds to a unique where is the smallest among .

It is obvious that and

- .

Therefore, estimation of is reduced to estimation of with . Then can have an -approximation with probability in poly-time, if we can uniformly sample from and is suitably small.

An uniform sample from can be implemented as follows:

- generate an with probability ;
- uniformly sample an , and return .

It is easy to see that this gives a uniform member of . The above sampling procedure is poly-time because each can be computed in poly-time, and sampling uniformly from each is poly-time.

We now only need to lower bound the ratio

- .

We claim that

- .

It is easy to see this, because each has at most instances of in , and we already know that .

Due to the estimator theorem, this needs uniform random samples from .

This gives the coverage algorithm for the abstract problem of the union of sets. The DNF counting is a special case of it.