Randomized Algorithms (Spring 2014)/Random Recurrence and Randomized Algorithms (Spring 2014)/The Monte Carlo Method

=Random Quicksort=
Given as input a set <math>S</math> of <math>n</math> numbers, we want to sort the numbers in <math>S</math> in increasing order. One of the most famous algorithms for this problem is the [http://en.wikipedia.org/wiki/Quicksort Quicksort] algorithm.
* if <math>|S|>1</math> do:
** pick an <math>x\in S</math> as the ''pivot'';
** partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all numbers in <math>S_1</math> are smaller than <math>x</math> and all numbers in <math>S_2</math> are larger than <math>x</math>;
** recursively sort <math>S_1</math> and <math>S_2</math>;


The time complexity of this sorting algorithm is measured by the '''number of comparisons'''.

For the '''deterministic''' quicksort algorithm, the pivot is picked from a fixed position (e.g. the first number in the array). The worst-case time complexity in terms of the number of comparisons is <math>\Theta(n^2)</math>.


We consider the following randomized version of quicksort, called '''RandQSort''' (a code sketch follows the description).
* if <math>|S|>1</math> do:
** ''uniformly'' pick a ''random'' <math>x\in S</math> as the pivot;
** partition <math>S</math> into <math>S_1</math>, <math>\{x\}</math>, and <math>S_2</math>, where all numbers in <math>S_1</math> are smaller than <math>x</math> and all numbers in <math>S_2</math> are larger than <math>x</math>;
** recursively sort <math>S_1</math> and <math>S_2</math>;
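
The following is a minimal Python sketch of RandQSort, instrumented to count comparisons (the cost measure used in the analysis below); the counting convention, charging each non-pivot element of a call one comparison with the pivot, is our own implementation choice:
<pre>
import random

def rand_qsort(S):
    """Sort a list of distinct numbers; return (sorted list, number of comparisons)."""
    if len(S) <= 1:
        return list(S), 0
    x = random.choice(S)              # uniformly random pivot
    S1 = [y for y in S if y < x]      # numbers smaller than the pivot
    S2 = [y for y in S if y > x]      # numbers larger than the pivot
    comparisons = len(S) - 1          # every non-pivot element is compared with x
    sorted1, c1 = rand_qsort(S1)
    sorted2, c2 = rand_qsort(S2)
    return sorted1 + [x] + sorted2, comparisons + c1 + c2
</pre>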
 
== Analysis of Random Quicksort==
Our goal is to analyze the expected number of comparisons during an execution of RandQSort on an arbitrary input <math>S</math>. We achieve this by computing the probability that each pair of elements is compared, and summing over all pairs by [http://en.wikipedia.org/wiki/Expected_value#Linearity Linearity of Expectation].
 
Let <math>a_i</math> denote the <math>i</math>th smallest element in <math>S</math>.
Let <math>X_{ij}\in\{0,1\}</math> be the random variable which indicates whether <math>a_i</math> and <math>a_j</math> are compared during the execution of RandQSort. That is:


:<math>
\begin{align}
X_{ij} &=
\begin{cases}
1 & a_i\mbox{ and }a_j\mbox{ are compared}\\
0 & \mbox{otherwise}
\end{cases}.
\end{align}
</math>


Elements <math>a_i</math> and <math>a_j</math> are compared only if one of them is chosen as the pivot. After the comparison they are separated (and are never compared again). So we have the following observations:


'''Observation 1: Every pair of <math>a_i</math> and <math>a_j</math> is compared at most once.'''

Therefore the sum of <math>X_{ij}</math> over all pairs <math>\{i, j\}</math> gives the total number of comparisons. The expected number of comparisons is <math>\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right]</math>. Due to [http://en.wikipedia.org/wiki/Expected_value#Linearity Linearity of Expectation], <math>\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right] = \sum_{i=1}^n\sum_{j>i}\mathbf{E}\left[X_{ij}\right]</math>.

Our next step is to analyze <math>\mathbf{E}\left[X_{ij}\right]</math> for each pair <math>\{i, j\}</math>.

By the definition of expectation and <math>X_{ij}</math>,
:<math>\begin{align}
\mathbf{E}\left[X_{ij}\right]
&= 1\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are compared}] + 0\cdot \Pr[a_i\mbox{ and }a_j\mbox{ are not compared}]\\
&= \Pr[a_i\mbox{ and }a_j\mbox{ are compared}].
\end{align}</math>
 
We are going to bound this probability.
 
'''Observation 2: <math>a_i</math> and <math>a_j</math> are compared if and only if one of them is chosen as pivot when they are still in the same subset.'''
 
This is easy to verify: just check the algorithm. The next one is a bit complicated.
 
'''Observation 3: If <math>a_i</math> and <math>a_j</math> are still in the same subset then all <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> are in the same subset.'''
 
We can verify this by induction. Initially, <math>S</math> itself has the property described above; and partitioning any <math>S</math> with the property into <math>S_1</math> and <math>S_2</math> preserves the property for both <math>S_1</math> and <math>S_2</math>. Therefore Observation 3 holds.
 
Combining Observation 2 and 3, we have:
 
'''Observation 4: <math>a_i</math> and <math>a_j</math> are compared only if one of <math>\{a_i, a_j\}</math> is chosen from <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>.'''
 
And,


'''Observation 5: Every one of <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is chosen with equal probability.'''


This is because the Random Quicksort chooses the pivot ''uniformly at random''.

Observations 4 and 5 together imply:


:<math>\begin{align}
\Pr[a_i\mbox{ and }a_j\mbox{ are compared}]
&\le \frac{2}{j-i+1}.
\end{align}</math>


{|border="1"
|'''Remark:''' Perhaps you feel confused about the above argument. You may ask: "''The algorithm chooses pivots for many times during the execution. Why in the above argument, it looks like the pivot is chosen only once?''" Good question! Let's see what really happens by looking closely.


For any pair <math>a_i</math> and <math>a_j</math>, initially <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> are all in the same set <math>S</math> (obviously!). During the execution of the algorithm, the set containing <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> keeps shrinking (due to the pivoting), until one of <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is chosen and the set is partitioned into different subsets. We ask for the probability that the chosen one is among <math>\{a_i, a_j\}</math>. So we really care about "the last" pivoting before <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> is split.

Formally, let <math>Y</math> be the random variable denoting the pivot element. We know that for each <math>a_k\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>, <math>Y=a_k</math> with the same probability, and <math>Y\not\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math> with an unknown probability (remember that there might be other elements in the same subset with <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>). The probability we are looking for is actually <math>\Pr[Y\in \{a_i, a_j\}\mid Y\in\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}]</math>, which is always <math>\frac{2}{j-i+1}</math>, provided that <math>Y</math> is uniform over <math>\{a_i, a_{i+1}, \ldots, a_{j-1}, a_{j}\}</math>.

The '''conditional probability''' rules out the ''irrelevant'' events in a probabilistic argument.
|}


Summing all up:


:<math>\begin{align}
\mathbf{E}\left[\sum_{i=1}^n\sum_{j>i}X_{ij}\right]
&= \sum_{i=1}^n\sum_{j>i}\mathbf{E}\left[X_{ij}\right]\\
&\le \sum_{i=1}^n\sum_{j>i}\frac{2}{j-i+1}\\
&= \sum_{i=1}^n\sum_{k=2}^{n-i+1}\frac{2}{k} & & (\mbox{Let }k=j-i+1)\\
&\le \sum_{i=1}^n\sum_{k=1}^{n}\frac{2}{k}\\
&= 2n\sum_{k=1}^{n}\frac{1}{k}\\
&= 2n H(n).
\end{align}</math>


<math>H(n)</math> is the <math>n</math>th [http://en.wikipedia.org/wiki/Harmonic_number Harmonic number]. It holds that
:<math>\begin{align}H(n) = \ln n+O(1)\end{align}</math>.

Therefore, for an arbitrary input <math>S</math> of <math>n</math> numbers, the expected number of comparisons taken by RandQSort to sort <math>S</math> is <math>\mathrm{O}(n\log n)</math>.
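
As a quick, optional sanity check of the <math>2nH(n)</math> bound, the following sketch (reusing the hypothetical rand_qsort above) averages the comparison count over repeated runs and compares it with <math>2nH(n)</math>:
<pre>
import random

def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

n, trials = 1000, 100
total = 0
for _ in range(trials):
    S = random.sample(range(10 * n), n)   # n distinct numbers
    _, c = rand_qsort(S)
    total += c
print("average comparisons:", total / trials)
print("2 n H(n) upper bound:", 2 * n * harmonic(n))
</pre>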


=Markov's Inequality=

One of the most natural pieces of information about a random variable is its expectation, the first moment of the random variable. Markov's inequality draws a tail bound for a random variable from its expectation.
{{Theorem
|Theorem (Markov's Inequality)|
:Let <math>X</math> be a random variable assuming only nonnegative values. Then, for all <math>t>0</math>,
::<math>\begin{align}
\Pr[X\ge t]\le \frac{\mathbf{E}[X]}{t}.
\end{align}</math>
}}
{{Proof| Let <math>Y</math> be the indicator such that
:<math>\begin{align}
Y &=
\begin{cases}
1 & \mbox{if }X\ge t,\\
0 & \mbox{otherwise.}
\end{cases}
\end{align}</math>


It holds that <math>Y\le\frac{X}{t}</math>. Since <math>Y</math> is 0-1 valued, <math>\mathbf{E}[Y]=\Pr[Y=1]=\Pr[X\ge t]</math>. Therefore,
:<math>
\Pr[X\ge t]
=
\mathbf{E}[Y]
\le
\mathbf{E}\left[\frac{X}{t}\right]
=\frac{\mathbf{E}[X]}{t}.
</math>
}}


==Example (from Las Vegas to Monte Carlo)==
Let <math>A</math> be a Las Vegas randomized algorithm for a decision problem <math>f</math>, whose expected running time is within <math>T(n)</math> on any input of size <math>n</math>. We transform <math>A</math> into a Monte Carlo randomized algorithm <math>B</math> with bounded one-sided error as follows (a code sketch follows the description):
:<math>B(x)</math>:
:*Run <math>A(x)</math> for at most <math>2T(n)</math> time, where <math>n</math> is the size of <math>x</math>.
:*If <math>A(x)</math> returned within <math>2T(n)</math> time, then return what <math>A(x)</math> just returned, else return 1.
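
Here is a minimal sketch of this transformation, assuming a hypothetical Las Vegas function A (it must be a top-level, picklable function for multiprocessing) and a known time bound T_n; enforcing the budget via a worker process is our own implementation choice:
<pre>
import multiprocessing

def monte_carlo_from_las_vegas(A, x, T_n):
    """Run A(x) for at most 2*T(n) seconds; return its (always correct) answer,
    or 1 on timeout, so any error is one-sided."""
    pool = multiprocessing.Pool(processes=1)
    try:
        return pool.apply_async(A, (x,)).get(timeout=2 * T_n)
    except multiprocessing.TimeoutError:
        return 1                      # may err, but only on this side
    finally:
        pool.terminate()
</pre>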


Since <math>A</math> is Las Vegas, its output is always correct; thus <math>B(x)</math> errs only when it returns 1, so the error is one-sided. The error probability is bounded by the probability that <math>A(x)</math> runs longer than <math>2T(n)</math>. Since the expected running time of <math>A(x)</math> is at most <math>T(n)</math>, due to Markov's inequality,
:<math>
\Pr[\mbox{the running time of }A(x)\ge2T(n)]\le\frac{\mathbf{E}[\mbox{running time of }A(x)]}{2T(n)}\le\frac{1}{2},
</math>
thus the error probability is bounded.


==Generalization==

For any random variable <math>X</math> and any non-negative real function <math>h</math>, <math>h(X)</math> is a non-negative random variable. Applying Markov's inequality, we directly have that
:<math>
\Pr[h(X)\ge t]\le\frac{\mathbf{E}[h(X)]}{t}.
</math>


This trivial application of Markov's inequality gives us a powerful tool for proving tail inequalities. With a function <math>h</math> that extracts more information about the random variable, we can prove sharper tail inequalities. For example, applying it with <math>h(x)=x^2</math> to the deviation <math>X-\mathbf{E}[X]</math> yields Chebyshev's inequality.


= Parameter Estimation =

Consider the following abstract problem of parameter estimation.

Let <math>U</math> be a finite set of known size, and let <math>G\subseteq U</math>. We want to estimate the ''parameter'' <math>|G|</math>, i.e. the size of <math>G</math>.

We assume two devices:

* A '''uniform sampler''' <math>\mathcal{U}</math>, which uniformly and independently samples a member of <math>U</math> upon each calling.
* A '''membership oracle''' of <math>G</math>, denoted <math>\mathcal{O}</math>. Given as the input an <math>x\in U</math>, <math>\mathcal{O}(x)</math> indicates whether or not <math>x</math> is a member of <math>G</math>.

Equipped with <math>\mathcal{U}</math> and <math>\mathcal{O}</math>, we can have the following Monte Carlo algorithm (a code sketch follows the description):

*Choose <math>N</math> independent samples from <math>U</math> by the uniform sampler <math>\mathcal{U}</math>, represented by the random variables <math>X_1,X_2,\ldots, X_N</math>.
*Let <math>Y_i</math> be the indicator random variable defined as <math>Y_i=\mathcal{O}(X_i)</math>, namely, <math>Y_i</math> indicates whether <math>X_i\in G</math>.
*Define the estimator random variable
::<math>Z=\frac{|U|}{N}\sum_{i=1}^N Y_i.</math>
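
As a sketch, the algorithm is only a few lines of Python; here uniform_sampler and oracle stand for the abstract devices <math>\mathcal{U}</math> and <math>\mathcal{O}</math> and must be supplied by the concrete problem:
<pre>
def monte_carlo_estimate(uniform_sampler, oracle, size_U, N):
    """Estimator Z = (|U|/N) * sum of the indicators Y_i."""
    hits = sum(1 for _ in range(N) if oracle(uniform_sampler()))  # sum of Y_i
    return size_U * hits / N
</pre>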

It is easy to see that <math>\mathbf{E}[Z]=|G|</math> and we might hope that with high probability the value of <math>Z</math> is close to <math>|G|</math>. Formally, <math>Z</math> is called an '''<math>\epsilon</math>-approximation''' of <math>|G|</math> if
:<math>(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|.</math>

The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and the ratio between <math>|G|</math> and <math>|U|</math>.

{{Theorem
|Theorem (estimator theorem)|
:Let <math>\alpha=\frac{|G|}{|U|}</math>. Then the Monte Carlo method yields an <math>\epsilon</math>-approximation to <math>|G|</math> with probability at least <math>1-\delta</math> provided
::<math>N\ge\frac{4}{\epsilon^2 \alpha}\ln\frac{2}{\delta}</math>.
}}
{{Proof|
Recall that <math>Y_i</math> indicates whether the <math>i</math>-th sample is in <math>G</math>. Let <math>Y=\sum_{i=1}^NY_i</math>. Then we have <math>Z=\frac{|U|}{N}Y</math>, and hence the event <math>(1-\epsilon)|G|\le Z\le (1+\epsilon)|G|</math> is equivalent to <math>(1-\epsilon)\frac{|G|}{|U|}N\le Y\le (1+\epsilon)\frac{|G|}{|U|}N</math>. Note that each <math>Y_i</math> is a Bernoulli trial that succeeds with probability <math>\frac{|G|}{|U|}</math>, thus <math>\mathbb{E}[Y]=\frac{|G|}{|U|}N</math>. The rest follows from the Chernoff bound.
}}
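
The bound in the estimator theorem translates directly into a sample-size computation; the helper below (our own naming) takes a lower bound on <math>\alpha</math>:
<pre>
import math

def samples_needed(eps, delta, alpha):
    """Smallest integer N with N >= (4 / (eps^2 * alpha)) * ln(2 / delta)."""
    return math.ceil(4.0 / (eps * eps * alpha) * math.log(2.0 / delta))

# e.g. a 0.1-approximation with probability 0.99 when alpha >= 1/2:
# samples_needed(0.1, 0.01, 0.5) == 4239
</pre>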

A counting algorithm for the set <math>G</math> has to deal with the following three issues:
# Implement the membership oracle <math>\mathcal{O}</math>. This is usually straightforward, or assumed by the model.
# Implement the uniform sampler <math>\mathcal{U}</math>. This can be straightforward or highly nontrivial, depending on the problem.
# Deal with an exponentially small <math>\alpha=\frac{|G|}{|U|}</math>. This requires us to cleverly choose the universe <math>U</math>. Sometimes this needs some nontrivial ideas.

= Counting DNFs =

A disjunctive normal form (DNF) formula is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals. For example:
:<math>(x_1\wedge \overline{x_2}\wedge x_3)\vee(x_2\wedge x_4)\vee(\overline{x_1}\wedge x_3\wedge x_4)</math>.
Note the difference from conjunctive normal form (CNF).

Given a DNF formula <math>\phi</math> as the input, the problem is to count the number of satisfying assignments of <math>\phi</math>. This problem is [http://en.wikipedia.org/wiki/Sharp-P-complete '''#P-complete'''].

Naively applying the Monte Carlo method will not give a good answer. Suppose that there are <math>n</math> variables. Let <math>U=\{\mathrm{true},\mathrm{false}\}^n</math> be the set of all truth assignments of the <math>n</math> variables. Let <math>G=\{x\in U\mid \phi(x)=\mathrm{true}\}</math> be the set of satisfying assignments for <math>\phi</math>. The straightforward use of the Monte Carlo method samples <math>N</math> assignments from <math>U</math> and checks how many of them satisfy <math>\phi</math>. This algorithm fails when <math>|G|/|U|</math> is exponentially small, namely, when an exponentially small fraction of the assignments satisfy the input DNF formula.


;The union of sets problem
We reformulate the DNF counting problem in a more abstract framework, called the '''union of sets''' problem.

Let <math>V</math> be a finite universe. We are given <math>m</math> subsets <math>H_1,H_2,\ldots,H_m\subseteq V</math>. The following assumptions hold:

*For all <math>i</math>, <math>|H_i|</math> is computable in poly-time.
*It is possible to sample uniformly from each individual <math>H_i</math>.
*For any <math>x\in V</math>, it can be determined in poly-time whether <math>x\in H_i</math>.

The goal is to compute the size of <math>H=\bigcup_{i=1}^m H_i</math>.

DNF counting can be interpreted in this general framework as follows. Suppose that the DNF formula <math>\phi</math> is defined on <math>n</math> variables, and <math>\phi</math> contains <math>m</math> clauses <math>C_1,C_2,\ldots,C_m</math>, where clause <math>C_i</math> has <math>k_i</math> literals. Without loss of generality, we assume that in each clause, each variable appears at most once. (A code sketch of the per-clause primitives follows the list below.)

* <math>V</math> is the set of all assignments.
*Each <math>H_i</math> is the set of satisfying assignments for the <math>i</math>-th clause <math>C_i</math> of the DNF formula <math>\phi</math>. Then the union of sets <math>H=\bigcup_i H_i</math> gives the set of satisfying assignments for <math>\phi</math>.
*Each clause <math>C_i</math> is a conjunction (AND) of literals. It is not hard to see that <math>|H_i|=2^{n-k_i}</math>, which is efficiently computable.
*Sampling from an <math>H_i</math> is simple: we just fix the assignments of the <math>k_i</math> literals of that clause, and sample uniformly and independently the remaining <math>(n-k_i)</math> variable assignments.
*For each assignment <math>x</math>, it is easy to check whether it satisfies a clause <math>C_i</math>, thus it is easy to determine whether <math>x\in H_i</math>.
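
A minimal sketch of these per-clause primitives, under the (hypothetical) encoding of a clause as a dict mapping a variable index to the truth value its literal requires:
<pre>
import random

# e.g. the clause (x1 AND (NOT x2) AND x3) is encoded as {0: True, 1: False, 2: True}

def clause_size(clause, n):
    """|H_i| = 2^(n - k_i): the free variables may take any value."""
    return 2 ** (n - len(clause))

def sample_from_clause(clause, n):
    """A uniform satisfying assignment: fix the k_i literals, randomize the rest."""
    return tuple(clause.get(v, random.random() < 0.5) for v in range(n))

def in_clause(x, clause):
    """Membership oracle: does assignment x satisfy this clause?"""
    return all(x[v] == val for v, val in clause.items())
</pre>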

==The coverage algorithm==

We now introduce the coverage algorithm for the union of sets problem.

Consider the multiset <math>U</math> defined by
:<math>U=H_1\uplus H_2\uplus\cdots \uplus H_m</math>,
where <math>\uplus</math> denotes the multiset union. It is more convenient to define <math>U</math> as the set
:<math>U=\{(x,i)\mid x\in H_i\}</math>.

For each <math>x\in H</math>, there may be more than one instance of <math>(x,i)\in U</math>. We can choose a unique representative among the multiple instances <math>(x,i)\in U</math> for the same <math>x\in H</math>, by choosing the <math>(x,i)</math> with the minimum <math>i</math>, and form a set <math>G</math>.

Formally, <math>G=\{(x,i)\in U\mid \forall (x,j)\in U, j\ge i\}</math>. Every <math>x\in H</math> corresponds to a unique <math>(x,i)\in G</math> where <math>i</math> is the smallest index with <math>x\in H_i</math>.

It is obvious that <math>G\subseteq U</math> and
:<math>|G|=|H|</math>.

Therefore, estimation of <math>|H|</math> is reduced to estimation of <math>|G|</math> with <math>G\subseteq U</math>. Then <math>|G|</math> can have an <math>\epsilon</math>-approximation with probability <math>(1-\delta)</math> in poly-time, if we can uniformly sample from <math>U</math> and <math>|G|/|U|</math> is suitably small.

A uniform sample from <math>U</math> can be implemented as follows:

* generate an <math>i\in\{1,2,\ldots,m\}</math> with probability <math>\frac{|H_i|}{\sum_{j=1}^m|H_j|}</math>;
* uniformly sample an <math>x\in H_i</math>, and return <math>(x,i)</math>.

It is easy to see that this gives a uniform member of <math>U</math>. The above sampling procedure is poly-time because each <math>|H_i|</math> can be computed in poly-time, and sampling uniformly from each <math>H_i</math> is poly-time.

We now only need to lower bound the ratio
:<math>\alpha=\frac{|G|}{|U|}</math>.

We claim that
:<math>\alpha\ge\frac{1}{m}</math>.

It is easy to see this, because each <math>x\in H</math> has at most <math>m</math> instances of <math>(x,i)</math> in <math>U</math>, and we already know that <math>|G|=|H|</math>.

Due to the estimator theorem, this needs <math>\frac{4m}{\epsilon^2}\ln\frac{2}{\delta}</math> uniform random samples from <math>U</math>.

This gives the coverage algorithm for the abstract problem of the union of sets. DNF counting is a special case of it.
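
Putting everything together, here is a minimal end-to-end sketch of the coverage algorithm for DNF counting, reusing the hypothetical clause encoding and per-clause primitives above; the sample size comes from the estimator theorem with <math>\alpha\ge 1/m</math>:
<pre>
import math
import random

def coverage_count(clauses, n, eps=0.1, delta=0.01):
    """Estimate the number of satisfying assignments of a DNF formula."""
    sizes = [clause_size(c, n) for c in clauses]     # |H_i| = 2^(n - k_i)
    size_U = sum(sizes)                              # |U| = sum over i of |H_i|
    m = len(clauses)
    # alpha >= 1/m, so this N suffices by the estimator theorem
    N = math.ceil(4.0 * m / (eps * eps) * math.log(2.0 / delta))
    hits = 0
    for _ in range(N):
        # uniform sample (x, i) from U: choose clause i with probability |H_i|/|U|,
        # then sample x uniformly from H_i
        i = random.choices(range(m), weights=sizes)[0]
        x = sample_from_clause(clauses[i], n)
        # (x, i) is in G iff no clause with a smaller index is satisfied by x
        if not any(in_clause(x, clauses[j]) for j in range(i)):
            hits += 1
    return size_U * hits / N                         # Z = |U| * (estimated |G|/|U|)

# e.g. (x1 AND NOT x2) OR (x2 AND x4) over n = 4 variables has exactly 8
# satisfying assignments; the estimate should be close to 8:
# coverage_count([{0: True, 1: False}, {1: True, 3: True}], 4)
</pre>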