Randomized Algorithms (Spring 2010)/Approximate counting, linear programming
Counting Problems
Complexity model
FPRAS
Approximate Counting
Let [math]\displaystyle{ U }[/math] be a finite set of known size, and let [math]\displaystyle{ G\subseteq U }[/math]. We want to compute the size of [math]\displaystyle{ G }[/math], [math]\displaystyle{ |G| }[/math].
We assume two devices:
- A uniform sampler [math]\displaystyle{ \mathcal{U} }[/math], which uniformly and independently samples a member of [math]\displaystyle{ U }[/math] upon each calling.
- A membership oracle of [math]\displaystyle{ G }[/math], denoted [math]\displaystyle{ \mathcal{O} }[/math]. Given as the input an [math]\displaystyle{ x\in U }[/math], [math]\displaystyle{ \mathcal{O}(x) }[/math] indicates whether or not [math]\displaystyle{ x }[/math] is a member of [math]\displaystyle{ G }[/math].
Equipped by [math]\displaystyle{ \mathcal{U} }[/math] and [math]\displaystyle{ \mathcal{O} }[/math], we can have the following Monte Carlo algorithm:
- Choose [math]\displaystyle{ N }[/math] independent samples from [math]\displaystyle{ U }[/math]by the uniform sampler [math]\displaystyle{ \mathcal{U} }[/math], represented by the random variables [math]\displaystyle{ X_1,X_2,\ldots, X_N }[/math].
- Let [math]\displaystyle{ Y_i }[/math] be the indicator random variable defined as [math]\displaystyle{ Y_i=\mathcal{O}(X_i) }[/math], namely, [math]\displaystyle{ Y_i }[/math] indicates whether [math]\displaystyle{ X_i\in G }[/math].
- Define the estimator random variable
- [math]\displaystyle{ Z=\frac{|U|}{N}\sum_{i=1}^N Y_i. }[/math]
It is easy to see that [math]\displaystyle{ \mathbf{E}[Z]=|G| }[/math] and we might hope that with high probability the value of [math]\displaystyle{ Z }[/math] is close to [math]\displaystyle{ |G| }[/math]. Formally, [math]\displaystyle{ Z }[/math] is called an [math]\displaystyle{ \epsilon }[/math]-approximation of [math]\displaystyle{ |G| }[/math] if
- [math]\displaystyle{ (1-\epsilon)|G|\le Z\le (1+\epsilon)|G|. }[/math]
The following theorem states that the probabilistic accuracy of the estimation depends on the number of samples and the ratio between [math]\displaystyle{ |G| }[/math] and [math]\displaystyle{ |U| }[/math]
Theorem (estimator theorem)
|
Proof: Use the Chernoff bound.
[math]\displaystyle{ \square }[/math]
A counting algorithm for the set [math]\displaystyle{ G }[/math] has to deal with the following three complications:
- Implement the membership oracle [math]\displaystyle{ \mathcal{O} }[/math]. This is usually straightforward, or assumed by the model.
- Implement the uniform sampler [math]\displaystyle{ \mathcal{U} }[/math]. As we have seen, this can be solved by relaxing to near-uniform sampler which can be simulated by random walks.
- Deal with exponentially small [math]\displaystyle{ \alpha=\frac{|G|}{|U|} }[/math]. This requires us to cleverly choose the universe [math]\displaystyle{ U }[/math]. Sometimes this needs some nontrivial ideas.