Randomized Algorithms (Spring 2010)/Complexity classes and lower bounds
Last time we talked about algorithms. Today's topic is complexity.
Upper bounds, lower bounds
Bounds are just inequalities (in a general sense, e.g. asymptotic inequalities). An inequality
is read " is a lower bound of " or equivalently " is an upper bound of ".
In Computer Science, when talking about upper or lower bounds, people really mean the upper or lower bounds of complexities.
In this lecture, we are focused on the time complexity, although there are other complexity measures in various computational models (e.g. space complexity, communication complexity, query complexity).
The complexity is represented as a function of , where is the length of the input.
There are two fundamental ways of measuring complexities
- Complexity of algorithms
- For an algorithm , the (worst-case) time complexity of is the maximum running time over all inputs of length .
- Complexity of problems
- For a computational problem, its time complexity is the time complexity of the optimal algorithm which solves the problem.
The complexity of an algorithm tells how good the algorithm is, yet the complexity of a problems tells how hard the problem is. While the former is what we care mostly about in practice, the later is more about the fundamental truths of computation.
In Theoretical Computer Science, when talking about upper or lower bounds, people usually refer to the bounds of the complexities of problems, rather than those of algorithms. Therefore, an upper bound means an algorithm; and a lower bound means bad news such as impossibility results.
Today's lecture is devoted to lower bounds, i.e. the necessary prices which have to paid by any algorithms which solve the given problems. Speaking of necessary prices, we have to be specific about the model, the mathematical rules which stipulate what is problem and what is algorithm.
Decision problems
Computational problems are functions mapping inputs to outputs. Sometimes, an input is called an instance of the problem and an output is called a solution to that instance. The theory of complexity deals almost exclusively with decision problems, the computational problems with yes-or-no answers.
For a decision problem , its positive instances are the inputs with "yes" answers. A decision problem can be equivalently represented as the set of all positive instances, denoted . We call a formal language (not entirely the same thing as the c language). The task of computing is equivalent to that of determining whether . Therefore the two formulations of "decision problems" and "formal languages" can be interchangeably used.
Turing Machine
In order to study complexities, which deal with the limits of computations, we have to be clear about what computation is, that is, modeling computations.
This work was done by Alan Turing in 1937. The model is now referred by the name Turing machine.
Complexity Classes
Problems are organized into classes according to their complexity in respective computational models. There are nearly 500 classes collected by the Complexity Zoo.
P, NP
A deterministic algorithm is polynomial time if its running time is within a polynomial of on any input of length .
Definition (P) - The class P is the class of decision problems that can be computed by polynomial time algorithms.
We now introduce the infamous NP class.
Definition (NP) - The class NP consists of all decision problems that have a polynomial time algorithm such that for any input ,
- if and only if , , where the size of is within polynomial of the size of
- The class NP consists of all decision problems that have a polynomial time algorithm such that for any input ,
Informally, NP is the class of decision problems that the "yes" instances can be verified in polynomial time. The string in the definition is called a certificate or a witness. Provided a polynomial size certificate, the algorithm verifies (instead of computing) the for any positive instance in polynomial time.
- Example
- Both the problems of deciding whether an input array is sorted and deciding whether an input graph has Hamiltonian cycles are in NP. For the former one, the input array itself is a certificate. And for the later one, a Hamiltonian cycle in the graph is a certificate (given a cycle, it is easy to verify whether it is Hamiltonian).
This definition is one of the equivalent definitions of the NP class. Another definition (also a classic one) is that NP is the class of decision problems that can be computed by polynomial time nondeterministic algorithms.
Common misuses of the terminology:
- "This algorithm is NP." --- NP is a class of decision problems, not algorithms.
- "This problem is a NP problem, so it must be very hard." --- By definition, a problem is in NP if its positive instances are poly-time verifiable, which implies nothing about the hardness. You probably means the problem is NP-hard.
- "NP problems are the hardest problems." --- There are infinitely many harder problems outside NP. Actually, according to a widely believed conjecture, there are infinitely many classes of problems which are harder than NP (see [1]).
Note that unlike P, the definition of NP is asymmetric. It only requires the positive instances (the that ) to be poly-time verifiable, but does not say anything about the negative instances. The class for that case is co-NP.
Definition (co-NP) - The class co-NP consists of all decision problems that have a polynomial time algorithm such that for any input ,
- if and only if , , where the size of is within polynomial of the size of
- The class co-NP consists of all decision problems that have a polynomial time algorithm such that for any input ,
Clearly, P NP co-NP. Does P = NP co-NP? It is an important open problem in the complexity theory which is closely related to our understanding of the relation between NP and P.
ZPP, RP, BPP
Now we proceeds to define complexity classes of the problems that is efficiently computable by the randomized algorithms, i.e. the randomized analogs of P.
In the last class we learned that there are two types of randomized algorithms: Monte Carlo algorithms (randomized algorithms with erros) and Las Vegas algorithms (randomized algorithms with random running time but with no erros). For Monte Carlo algorithms, there are two types of errors: one-sided errors where the algorithm errs only for positive instances, and two-sided errors where there are both false positives and false negatives. Therefore, there are three cases to deal with:
- Las Vegas algorithms, the corresponding class is ZPP (for Zero-error Probabilistic Polynomial time).
- Monte Carlo algorithms with one-sided error, the corresponding class is RP (for Randomized Polynomial time).
- Monte Carlo algorithms with two-sided error, the corresponding class is BPP (for Bounded-error Probabilistic Polynomial time).
We first introduce the class ZPP of the problems which can be solved by polynomial time Las Vegas algorithms. For Las Vegas algorithms, the running time is a random variable, therefore we actually refer to the Las Vegas algorithms whose expected running time is within a polynomial of for any input of size , where the expectation is taken over the internal randomness (coin flippings) of the algorithm.
Definition (ZPP) - The class ZPP consists of all decision problems that have a randomized algorithm running in expected polynomial time for any input such that for any input , .
Next we define the class RP of the problems which can be solved by polynomial time Monte Carlo algorithms with one-sided error.
Definition (RP) - The class RP consists of all decision problems that have a randomized algorithm running in worst-case polynomial time such that for any input ,
- if , then ;
- if , then .
- The class RP consists of all decision problems that have a randomized algorithm running in worst-case polynomial time such that for any input ,
- Remark
- The choice of the error probability is arbitrary. In fact, replacing the 1/2 with any constant will not change the definition of RP.
- Example
- Define the decision version of the minimum cut problem as follows. For a graph , if and only if there exists any cut of size smaller than , where is an arbitrary parameter. The problem can be solved probabilistically by Karger's min-cut algorithm in polynomial time. The error is one-sided, because if there does not exists any cut of size smaller than , then obviously the algorithm cannot find any. Therefore RP.
Like NP, the class RP is also asymmetrically defined, which hints us to define the co-RP class.
Definition (co-RP) - The class co-RP consists of all decision problems that have a randomized algorithm running in worst-case polynomial time such that for any input ,
- if , then ;
- if , then .
- The class co-RP consists of all decision problems that have a randomized algorithm running in worst-case polynomial time such that for any input ,
We then define the class BPP of the problems which can be solved by polynomial time Monte Carlo algorithms with two-sided error.
Definition (BPP) - The class BPP consists of all decision problems that have a randomized algorithm running in worst-case polynomial time such that for any input ,
- if , then ;
- if , then .
- The class BPP consists of all decision problems that have a randomized algorithm running in worst-case polynomial time such that for any input ,
- Remark
- Replacing the error probability from to , or any constant will not change the definition of the BPP class.
What is known (you can even prove by yourself):
- RP, BPP, ZPP all contain P
- RP NP
- ZPP = RPco-RP;
- RP BPP;
- co-RP BPP.
Open problem:
- BPP vs P (the second most important open problem in the complexity theory to the NP vs P).
Yao's minimax principle
Given a problem , how to prove the lower bound of for randomized algorithms? This sounds very hard, because we have to show that for all randomized algorithms which solve the problem , the expected performance is bad. How can we do that? The following principle is due to Andrew Chi-Chih Yao (姚期智):
Yao's Minimax Principle - For any problem , for any probability distribution over the input domain,
- the expected running time of the optimal (Las Vegas) randomized algorithm the expected running time of the optimal deterministic algorithm on the input generated by .
In other words, the performance of the optimal randomized algorithm is lower bounded by the average (with respect to an arbitrary distribution of the inputs) performance of the optimal deterministic algorithm. Therefore, the task of proving lower bounds of randomized algorithms is reduced to the task of proving lower bounds of the average-case performances of deterministic algorithms, which sounds like a much more doable task (not to mention that the input distribution is chosen by us).
More than 30 years has past since its first publication. Yao's principle remains the only general technique for proving lower bounds for randomized algorithms.
Proof of the principle
First, we have to realize what randomized algorithms really are. We make the following observation:
- A (Las Vegas) randomized algorithm for a problem is a probability distribution over all deterministic algorithms for .
Then we can formally state the Yao's principle.
Yao's Minimax Principle (formally stated) - Let be a problem with a finite set of input instances of a fixed size , and a finite set of deterministic algorithms . For input and algorithm , let denote the running time of algorithm on input . Then for all distributions over and over ,
- .
- Let be a problem with a finite set of input instances of a fixed size , and a finite set of deterministic algorithms . For input and algorithm , let denote the running time of algorithm on input . Then for all distributions over and over ,
We exhibit two proofs of the principle. One is due to the probabilistic method, the other is due to game theory.
Proof by probabilistic arguments
Let . Then , it holds that . Let be generated from the distribution over . Due to the law of total expectation, it holds that
Actually, the above inequalities can be implied by an almost naive observation: "if all cases are at least , then the weighted average over these cases cannot be smaller than ."
Then we apply this argument again to the inputs, this time in another direction: "If the weighted average is no smaller than , then there must be a case which is at least ."
Formally, expanding by ,
Due to Equation , it holds that . Then it must hold that
because if to the contrary , then due to Equation , it holds that , which contradicts .
Finally, implies that
The Yao's principle is proved.
Proof by game-theoretical arguments
- von Neumann's Minimax Theorem
- For any two-player zero-sum game with a payoff matrix ,
- .
- Loomis' Theorem
- For any two-player zero-sum game with a payoff matrix ,
- where denotes the unit vector with 1 in the th position and 0s elsewhere.
- Yao's Minimax Principle
- Let be a problem with a finite set of input instances of a fixed size , and a finite set of deterministic algorithms . For input and algorithm , let denote the running time of algorithm on input . Then for all distributions over and over ,
- .
Proving lower bounds with Yao's principle
Suppose that is an arbitrary permutation of a set of elements, and we want to find the position of an arbitrary element in , with as few accesses to as possible.
For deterministic algorithms, with a simple adversary argument, we know that any deterministic algorithm has to access positions of in the worst case. A very natural randomized algorithm for this problem is:
repeat for (n-1) times: uniformly pick a random position r from all unaccessed positions; if π(r) = x, then return r; return the only unaccessed position;
The algorithm can be equivalently viewed as first randomly reordering the , and then scanning from the 1st position to the th position of the reordered permutation to find . If is in the last position of the reordered permutation, then after accessing the first positions we automatically know that the last one must be .
For any and , is in every position of the randomly reordered with equal probability . The expected number of accesses made by our randomized searching algorithm is:
How good is it? Is there any randomized algorithm that can do better than this? With the help of Yao's principle, we can prove that this bound is the best we can get.
We analyze this problem first for a special class of algorithms, then generalize to all algorithms.
Lower bound for oblivious algorithms
We call a search algorithm oblivious if the algorithm follows the same predetermined order to access the positions of any input permutation. For example, the following code of linearly scanning all positions is a typical oblivious algorithm.
for(i=1 to n-1){ if π(i) = x return i; } return n;
Without loss of generality, we can rule out those stupid algorithms that access the same position twice. (Since we are proving lower bound, we are safe to do this.) Therefore, a deterministic oblivious algorithm can be represented by a permutation of , where the th position to access is specified by . By a simple adversary argument, we know that except the last element (which can be figured out by knowing the rest elements), the algorithm has to access the exact position which stores . Therefore, the following lemma holds:
Lemma 1 - For an input permutation and element , an oblivious algorithm with access sequence has to access many positions, where either or .
A randomized oblivious algorithm is a probability distribution over all oblivious algorithms. Due to Yao's principle, to prove a lower bound for randomized oblivious algorithms, it is sufficient to lower bound the expected number of accesses required by any deterministic oblivious algorithm (a permutation of ) for the random permutation of and the random .
Let be a uniformly random permutation of and be uniformly chosen from . Because of the symmetry, we can assume that the access sequence is just . This is because changing to any other access sequence will not change the distribution of the th accessed element for any (verify this by yourself).
Because both and are uniform, appears in each position of with equal probability. Due to Lemma 1, assuming that is in the th position, the number of accesses needed by the algorithm is if and is if . Taking the average, the expected number of accesses is
Due to Yao's principle, this gives the lower bound of the expected running time of randomized oblivious algorithms for the problem.
Lower bound for adaptive algorithms
For an adaptive algorithm, the next position to access is determined by the results of all previous accesses.
Formally, let be a decision tree such that for an input permutation and an input element , the th accessed position is given by
The following figure illustrates the structure of the decision tree :
This decision tree is an -ary labeled tree. The label of each edge is the input to , and the label of each node is the access position returned by for the sequence specified by the labels along the path from the root to the node.
This model seems having substantially more power than the oblivious algorithms, because it allows the algorithm be adaptive to the input. It actually covers all possible algorithms (no matter realistic or unrealistic) for the problem.
By an adversary argument, the algorithm still has to either access the position which stores or access positions, because if otherwise the adversary can always let be stored in the position which is not the one returned by the algorithm.
Apply Yao's principle. Let be a uniformly random permutation of and be uniformly chosen from . Due to the uniformity of , conditioning on any and any , for any , it holds that is uniformly distributed over the set of the unaccessed elements, and the distribution of is independent of and .
The above probability distribution is the same for both the oblivious which are fixed independent of and the adaptive which depends on and . Therefore the expected number of accesses needed by an adaptive algorithms is the same as that of an oblivious algorithm. The same lower bound of holds.