Randomized Algorithms (Spring 2010)/Complexity classes and lower bounds


Last time we talked about algorithms. Today's topic is complexity.

Upper bounds, lower bounds

Bounds are just inequalities (in a general sense, e.g. asymptotic inequalities). An inequality

[math]\displaystyle{ A\le B }[/math]

is read "[math]\displaystyle{ A }[/math] is a lower bound of [math]\displaystyle{ B }[/math]" or equivalently "[math]\displaystyle{ B }[/math] is an upper bound of [math]\displaystyle{ A }[/math]".

In Computer Science, when talking about upper or lower bounds, people really mean the upper or lower bounds of complexities.

In this lecture we focus on time complexity, although there are other complexity measures in various computational models (e.g. space complexity, communication complexity, query complexity).

The complexity is represented as a function of [math]\displaystyle{ n }[/math], where [math]\displaystyle{ n }[/math] is the length of the input.

There are two fundamental ways of measuring complexity:

Complexity of algorithms
For an algorithm [math]\displaystyle{ A }[/math], the (worst-case) time complexity of [math]\displaystyle{ A }[/math] is the maximum running time over all inputs [math]\displaystyle{ x }[/math] of length [math]\displaystyle{ n }[/math].
Complexity of problems
For a computational problem, its time complexity is the time complexity of the optimal algorithm which solves the problem.

The complexity of an algorithm tells how good the algorithm is, while the complexity of a problem tells how hard the problem is. The former is what we mostly care about in practice; the latter is more about the fundamental truths of computation.

In Theoretical Computer Science, when talking about upper or lower bounds, people usually refer to the bounds of the complexities of problems, rather than those of algorithms. Therefore, an upper bound means an algorithm; and a lower bound means bad news such as impossibility results.

Today's lecture is devoted to lower bounds, i.e. the necessary prices that have to be paid by any algorithm which solves the given problem. Speaking of necessary prices, we have to be specific about the model: the mathematical rules which stipulate what a problem is and what an algorithm is.

Decision problems

Computational problems are functions mapping inputs to outputs. Sometimes, an input is called an instance of the problem and an output is called a solution to that instance. The theory of complexity deals almost exclusively with decision problems, the computational problems with yes-or-no answers.

For a decision problem [math]\displaystyle{ f }[/math], its positive instances are the inputs with "yes" answers. A decision problem can be equivalently represented as the set of all positive instances, denoted [math]\displaystyle{ L }[/math]. We call [math]\displaystyle{ L }[/math] a formal language (not entirely the same thing as the C language). The task of computing [math]\displaystyle{ f(x) }[/math] is equivalent to that of determining whether [math]\displaystyle{ x\in L }[/math]. Therefore the two formulations, "decision problems" and "formal languages", can be used interchangeably.

Turing Machine

In order to study complexities, which deal with the limits of computations, we have to be clear about what computation is; that is, we have to model computation.

This work was done by Alan Turing in 1937. The model is now referred to by the name Turing machine.

Complexity Classes

Problems are organized into classes according to their complexity in respective computational models. There are nearly 500 classes collected by the Complexity Zoo.

P, NP

A deterministic algorithm [math]\displaystyle{ A }[/math] is polynomial time if its running time is within a polynomial of [math]\displaystyle{ n }[/math] on any input of length [math]\displaystyle{ n }[/math].

Definition (P)
The class P is the class of decision problems that can be computed by polynomial time algorithms.

We now introduce the infamous NP class.

Definition (NP)
The class NP consists of all decision problems [math]\displaystyle{ f }[/math] that have a polynomial time algorithm [math]\displaystyle{ A }[/math] such that for any input [math]\displaystyle{ x }[/math],
[math]\displaystyle{ f(x)=1 }[/math] if and only if [math]\displaystyle{ \exists y }[/math], [math]\displaystyle{ A(x,y)=1 }[/math], where the size of [math]\displaystyle{ y }[/math] is within a polynomial of the size of [math]\displaystyle{ x }[/math].

Informally, NP is the class of decision problems whose "yes" instances can be verified in polynomial time. The string [math]\displaystyle{ y }[/math] in the definition is called a certificate or a witness. Provided a polynomial-size certificate, the algorithm [math]\displaystyle{ A }[/math] verifies (instead of computes) [math]\displaystyle{ f(x) }[/math] for any positive instance [math]\displaystyle{ x }[/math] in polynomial time.

Example
Both the problem of deciding whether an input array is sorted and the problem of deciding whether an input graph has Hamiltonian cycles are in NP. For the former, the input array itself is a certificate. And for the latter, a Hamiltonian cycle in the graph is a certificate (given a cycle, it is easy to verify whether it is Hamiltonian).
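To make this concrete, here is a minimal Python sketch of a polynomial-time verifier for Hamiltonian cycles (the input conventions, a vertex list and a set of undirected edges, are our own choices for illustration):

def verify_hamiltonian_cycle(vertices, edges, cycle):
    # the certificate must visit every vertex exactly once ...
    if sorted(cycle) != sorted(vertices):
        return False
    # ... and consecutive vertices (cyclically) must be adjacent
    n = len(cycle)
    for i in range(n):
        u, v = cycle[i], cycle[(i + 1) % n]
        if (u, v) not in edges and (v, u) not in edges:
            return False
    return True

# the 4-cycle graph, with the obvious Hamiltonian cycle as certificate:
print(verify_hamiltonian_cycle([1, 2, 3, 4], {(1, 2), (2, 3), (3, 4), (4, 1)}, [1, 2, 3, 4]))   # True

Both checks take time polynomial in the input size, which is all the definition of NP requires of the verifier [math]\displaystyle{ A }[/math].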

This definition is one of the equivalent definitions of the NP class. Another definition (also a classic one) is that NP is the class of decision problems that can be computed by polynomial time nondeterministic algorithms.

Common misuses of the terminology:

  • "This algorithm is NP." --- NP is a class of decision problems, not algorithms.
  • "This problem is a NP problem, so it must be very hard." --- By definition, a problem is in NP if its positive instances are poly-time verifiable, which implies nothing about the hardness. You probably means the problem is NP-hard.
  • "NP problems are the hardest problems." --- There are infinitely many harder problems outside NP. Actually, according to a widely believed conjecture, there are infinitely many classes of problems which are harder than NP (see [1]).

Note that unlike P, the definition of NP is asymmetric. It only requires the positive instances (the inputs [math]\displaystyle{ x }[/math] with [math]\displaystyle{ f(x)=1 }[/math]) to be poly-time verifiable, but does not say anything about the negative instances. The class for that case is co-NP.

Definition (co-NP)
The class co-NP consists of all decision problems [math]\displaystyle{ f }[/math] that have a polynomial time algorithm [math]\displaystyle{ A }[/math] such that for any input [math]\displaystyle{ x }[/math],
[math]\displaystyle{ f(x)=0 }[/math] if and only if [math]\displaystyle{ \exists y }[/math], [math]\displaystyle{ A(x,y)=0 }[/math], where the size of [math]\displaystyle{ y }[/math] is within a polynomial of the size of [math]\displaystyle{ x }[/math].

Clearly, P [math]\displaystyle{ \subseteq }[/math] NP [math]\displaystyle{ \cap }[/math] co-NP. Does P = NP [math]\displaystyle{ \cap }[/math] co-NP? This is an important open problem in complexity theory, closely related to our understanding of the relation between NP and P.

ZPP, RP, BPP

Now we proceed to define the complexity classes of problems that are efficiently computable by randomized algorithms, i.e. the randomized analogs of P.

In the last class we learned that there are two types of randomized algorithms: Monte Carlo algorithms (randomized algorithms with errors) and Las Vegas algorithms (randomized algorithms with random running time but no errors). For Monte Carlo algorithms, there are two types of errors: one-sided errors, where the algorithm errs only for positive instances, and two-sided errors, where there are both false positives and false negatives. Therefore, there are three cases to deal with:

  1. Las Vegas algorithms, the corresponding class is ZPP (for Zero-error Probabilistic Polynomial time).
  2. Monte Carlo algorithms with one-sided error, the corresponding class is RP (for Randomized Polynomial time).
  3. Monte Carlo algorithms with two-sided error, the corresponding class is BPP (for Bounded-error Probabilistic Polynomial time).

We first introduce the class ZPP of the problems which can be solved by polynomial time Las Vegas algorithms. For Las Vegas algorithms, the running time is a random variable, therefore we actually refer to the Las Vegas algorithms whose expected running time is within a polynomial of [math]\displaystyle{ n }[/math] for any input of size [math]\displaystyle{ n }[/math], where the expectation is taken over the internal randomness (coin flippings) of the algorithm.

Definition (ZPP)
The class ZPP consists of all decision problems [math]\displaystyle{ f }[/math] that have a randomized algorithm [math]\displaystyle{ A }[/math] running in expected polynomial time such that for any input [math]\displaystyle{ x }[/math], [math]\displaystyle{ A(x)=f(x) }[/math].

Next we define the class RP of the problems which can be solved by polynomial time Monte Carlo algorithms with one-sided error.

Definition (RP)
The class RP consists of all decision problems [math]\displaystyle{ f }[/math] that have a randomized algorithm [math]\displaystyle{ A }[/math] running in worst-case polynomial time such that for any input [math]\displaystyle{ x }[/math],
  • if [math]\displaystyle{ f(x)=1 }[/math], then [math]\displaystyle{ \Pr[A(x)=1]\ge 1-1/2 }[/math];
  • if [math]\displaystyle{ f(x)=0 }[/math], then [math]\displaystyle{ \Pr[A(x)=0]=1 }[/math].
Remark
The choice of the error probability is arbitrary. In fact, replacing the 1/2 with any constant [math]\displaystyle{ 0\lt p\lt 1 }[/math] will not change the definition of RP.
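To see why, run the algorithm [math]\displaystyle{ k }[/math] times independently and accept if and only if at least one run accepts. A minimal Python sketch (the wrapper is ours; A stands for any one-sided-error algorithm with error probability [math]\displaystyle{ p }[/math] on positive instances):

def amplify_rp(A, x, k):
    # if f(x) = 0, A never accepts, so the disjunction correctly returns False;
    # if f(x) = 1, each independent run rejects with probability at most p,
    # so all k runs reject with probability at most p**k
    return any(A(x) for _ in range(k))

Since [math]\displaystyle{ p^k\lt\frac{1}{2} }[/math] already for a constant [math]\displaystyle{ k }[/math], the particular constant in the definition is immaterial.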
Example
Define the decision version of the minimum cut problem as follows. For a graph [math]\displaystyle{ G }[/math], [math]\displaystyle{ f(G)=1 }[/math] if and only if there exists a cut of size smaller than [math]\displaystyle{ k }[/math], where [math]\displaystyle{ k }[/math] is an arbitrary parameter. The problem [math]\displaystyle{ f }[/math] can be solved probabilistically by Karger's min-cut algorithm in polynomial time. The error is one-sided, because if there does not exist any cut of size smaller than [math]\displaystyle{ k }[/math], then obviously the algorithm cannot find one. Therefore [math]\displaystyle{ f\in }[/math]RP.
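For concreteness, here is a sketch of one run of the random-contraction step (our own rendering, assuming a connected graph given as a list of undirected edges over vertices 0, ..., n-1 with no self-loops):

import random

def karger_contract_once(edges, n):
    # contract a uniformly random remaining edge until only two
    # super-vertices remain; the surviving edges form a cut
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    remaining, pool = n, list(edges)
    while remaining > 2:
        u, v = random.choice(pool)
        parent[find(u)] = find(v)
        remaining -= 1
        # discard edges now inside a super-vertex; keep parallel edges
        pool = [(a, b) for (a, b) in pool if find(a) != find(b)]
    return len(pool)   # the size of the produced cut

Reporting 1 as soon as some run returns a value smaller than [math]\displaystyle{ k }[/math], and 0 after sufficiently many failed runs, is exactly an RP-style algorithm: a "1" answer is always backed by an actual cut.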

Like NP, the class RP is also asymmetrically defined, which naturally suggests the co-RP class.

Definition (co-RP)
The class co-RP consists of all decision problems [math]\displaystyle{ f }[/math] that have a randomized algorithm [math]\displaystyle{ A }[/math] running in worst-case polynomial time such that for any input [math]\displaystyle{ x }[/math],
  • if [math]\displaystyle{ f(x)=1 }[/math], then [math]\displaystyle{ \Pr[A(x)=1]= 1 }[/math];
  • if [math]\displaystyle{ f(x)=0 }[/math], then [math]\displaystyle{ \Pr[A(x)=0]\ge 1-1/2 }[/math].

We then define the class BPP of the problems which can be solved by polynomial time Monte Carlo algorithms with two-sided error.

Definition (BPP)
The class BPP consists of all decision problems [math]\displaystyle{ f }[/math] that have a randomized algorithm [math]\displaystyle{ A }[/math] running in worst-case polynomial time such that for any input [math]\displaystyle{ x }[/math],
  • if [math]\displaystyle{ f(x)=1 }[/math], then [math]\displaystyle{ \Pr[A(x)=1]\ge 1-1/4 }[/math];
  • if [math]\displaystyle{ f(x)=0 }[/math], then [math]\displaystyle{ \Pr[A(x)=0]\ge 1-1/4 }[/math].
Remark
Replacing the error probability [math]\displaystyle{ \frac{1}{4} }[/math] by [math]\displaystyle{ \frac{1}{3} }[/math], or by any constant [math]\displaystyle{ 0\lt p\lt \frac{1}{2} }[/math], will not change the definition of the BPP class.
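The reason is the standard majority-vote amplification (sketched here; it was not spelled out above): run the algorithm [math]\displaystyle{ t }[/math] times independently and output the majority answer. The majority errs only if at least [math]\displaystyle{ t/2 }[/math] of the runs err, and since each run errs independently with probability at most [math]\displaystyle{ p\lt\frac{1}{2} }[/math], the Hoeffding bound gives

[math]\displaystyle{ \Pr[\text{majority errs}] \le e^{-2t\left(\frac{1}{2}-p\right)^2}, }[/math]

which drops below [math]\displaystyle{ \frac{1}{4} }[/math] (or any other constant) once [math]\displaystyle{ t }[/math] is a sufficiently large constant.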



What is known (you can even prove these by yourself):

  • RP, BPP, ZPP all contain P
  • RP [math]\displaystyle{ \subseteq }[/math] NP
  • ZPP = RP [math]\displaystyle{ \cap }[/math] co-RP (one direction is sketched below this list);
  • RP [math]\displaystyle{ \subseteq }[/math] BPP;
  • co-RP [math]\displaystyle{ \subseteq }[/math] BPP.
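To illustrate the third item, here is a sketch (our own, following the standard argument) of the direction ZPP [math]\displaystyle{ \subseteq }[/math] RP: given a Las Vegas algorithm with expected running time [math]\displaystyle{ T(n) }[/math], run it for at most [math]\displaystyle{ 2T(n) }[/math] steps and answer 0 if it has not halted in time. By Markov's inequality,

[math]\displaystyle{ \Pr\left[\text{running time}\ge 2T(n)\right] \le \frac{T(n)}{2T(n)} = \frac{1}{2}, }[/math]

so negative instances are always answered correctly (the Las Vegas algorithm never errs, and a timeout also answers 0), while each positive instance is answered 1 with probability at least [math]\displaystyle{ 1/2 }[/math]. This is exactly a one-sided-error algorithm in the sense of RP. Swapping the roles of 0 and 1 gives ZPP [math]\displaystyle{ \subseteq }[/math] co-RP.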

Open problem:

  • BPP vs P (the second most important open problem in complexity theory, next to NP vs P).

Yao's minimax principle

Given a problem [math]\displaystyle{ f }[/math], how to prove the lower bound of [math]\displaystyle{ f }[/math] for randomized algorithms? This sounds very hard, because we have to show that for all randomized algorithms which solve the problem [math]\displaystyle{ f }[/math], the expected performance is bad. How can we do that? The following principle is due to Andrew Chi-Chih Yao (姚期智):

Yao's Minimax Principle
For any problem [math]\displaystyle{ f }[/math], for any probability distribution [math]\displaystyle{ \mathcal{P} }[/math] over the input domain,
the expected running time of the optimal (Las Vegas) randomized algorithm [math]\displaystyle{ \ge }[/math] the expected running time of the optimal deterministic algorithm on the input generated by [math]\displaystyle{ \mathcal{P} }[/math].

In other words, the performance of the optimal randomized algorithm is lower bounded by the average (with respect to an arbitrary distribution of the inputs) performance of the optimal deterministic algorithm. Therefore, the task of proving lower bounds of randomized algorithms is reduced to the task of proving lower bounds of the average-case performances of deterministic algorithms, which sounds like a much more doable task (not to mention that the input distribution is chosen by us).

More than 30 years have passed since its first publication, and Yao's principle remains the only general technique for proving lower bounds for randomized algorithms.

Proof of the principle

First, we have to realize what randomized algorithms really are. We make the following observation:

A (Las Vegas) randomized algorithm for a problem [math]\displaystyle{ f }[/math] is a probability distribution over all deterministic algorithms for [math]\displaystyle{ f }[/math].

Then we can formally state Yao's principle.

Yao's Minimax Principle (formally stated)
Let [math]\displaystyle{ f }[/math] be a problem with a finite set [math]\displaystyle{ \mathcal{X} }[/math] of input instances of a fixed size [math]\displaystyle{ n }[/math], and a finite set of deterministic algorithms [math]\displaystyle{ \mathcal{A} }[/math]. For input [math]\displaystyle{ x\in\mathcal{X} }[/math] and algorithm [math]\displaystyle{ A\in\mathcal{A} }[/math], let [math]\displaystyle{ T(A, x) }[/math] denote the running time of algorithm [math]\displaystyle{ A }[/math] on input [math]\displaystyle{ x }[/math]. Then for all distributions [math]\displaystyle{ \mathcal{P} }[/math] over [math]\displaystyle{ \mathcal{X} }[/math] and [math]\displaystyle{ \mathcal{Q} }[/math] over [math]\displaystyle{ \mathcal{A} }[/math],
[math]\displaystyle{ \begin{align} \min_{A \in\mathcal{A}} \mathbf{E}_{x:\mathcal{P}} \left[ T(A, x) \right] &\le \max_{x\in\mathcal{X}} \mathbf{E}_{A:\mathcal{Q}} \left[T(A, x) \right] \end{align} }[/math].
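Before proving the principle, one can sanity-check it numerically. The Python sketch below uses a toy running-time matrix (the numbers are invented for illustration) and verifies the inequality for a particular pair of distributions; any other pair works too:

# toy running-time matrix T[A][x]: rows are deterministic algorithms, columns are inputs
T = [[3, 1, 4],
     [2, 2, 2],
     [1, 5, 1]]

def yao_holds(P, Q):
    # left side: the best deterministic algorithm against input distribution P
    lhs = min(sum(P[x] * T[A][x] for x in range(3)) for A in range(3))
    # right side: the worst input against the randomized algorithm Q
    rhs = max(sum(Q[A] * T[A][x] for A in range(3)) for x in range(3))
    return lhs <= rhs

print(yao_holds([1/3, 1/3, 1/3], [1/2, 1/4, 1/4]))   # True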

We exhibit two proofs of the principle: one by the probabilistic method, the other by game theory.

Proof by probabilistic arguments

Let [math]\displaystyle{ t_0=\min_{A \in\mathcal{A}} \mathbf{E}_{x:\mathcal{P}} \left[ T(A, x) \right] }[/math]. Then [math]\displaystyle{ \forall A_0\in\mathcal{A} }[/math], it holds that [math]\displaystyle{ \mathbf{E}_{x:\mathcal{P}} \left[ T(A_0, x) \right] \ge t_0 }[/math]. Let [math]\displaystyle{ A\in\mathcal{A} }[/math] be generated from the distribution [math]\displaystyle{ \mathcal{Q} }[/math] over [math]\displaystyle{ \mathcal{A} }[/math]. Due to the law of total expectation, it holds that

[math]\displaystyle{ \begin{align} \mathbf{E}_{x:\mathcal{P},A:\mathcal{Q}} \left[ T(A, x) \right] & = \sum_{A_0\in \mathcal{A}} \mathbf{E}_{x:\mathcal{P},A:\mathcal{Q}} \left[ T(A, x) \mid A=A_0\right] \cdot \Pr_{A:\mathcal{Q}}[A=A_0]\\ &= \sum_{A_0\in \mathcal{A}} \mathbf{E}_{x:\mathcal{P}} \left[ T(A_0, x) \right] \cdot \Pr_{A:\mathcal{Q}}[A=A_0]\\ &\ge \sum_{A_0\in \mathcal{A}} t_0\cdot \Pr_{A:\mathcal{Q}}[A=A_0]\\ &=t_0. && (*) \end{align} }[/math]

Actually, the above inequality follows from an almost naive observation: "if all cases are at least [math]\displaystyle{ t_0 }[/math], then the weighted average over these cases cannot be smaller than [math]\displaystyle{ t_0 }[/math]."

Then we apply this argument again to the inputs, this time in another direction: "If the weighted average is no smaller than [math]\displaystyle{ t_0 }[/math], then there must be a case which is at least [math]\displaystyle{ t_0 }[/math]."

Formally, expanding [math]\displaystyle{ \mathbf{E}_{x:\mathcal{P},A:\mathcal{Q}} \left[ T(A, x) \right] }[/math] by [math]\displaystyle{ x }[/math],

[math]\displaystyle{ \begin{align} \mathbf{E}_{x:\mathcal{P},A:\mathcal{Q}} \left[ T(A, x) \right] &= \sum_{x_0\in \mathcal{X}} \mathbf{E}_{x:\mathcal{P},A:\mathcal{Q}} \left[ T(A, x) \mid x=x_0\right] \cdot \Pr_{x:\mathcal{P}}[x=x_0]\\ &= \sum_{x_0\in \mathcal{X}} \mathbf{E}_{A:\mathcal{Q}} \left[ T(A, x_0) \right] \cdot \Pr_{x:\mathcal{P}}[x=x_0]. & & (**) \end{align} }[/math]

Due to Equation [math]\displaystyle{ (*) }[/math], it holds that [math]\displaystyle{ \mathbf{E}_{x:\mathcal{P},A:\mathcal{Q}} \left[ T(A, x) \right] \ge t_0 }[/math]. Then it must hold that

[math]\displaystyle{ \begin{align} \exists x_0\in \mathcal{X}, \mbox{such that } \mathbf{E}_{A:\mathcal{Q}} \left[ T(A, x_0) \right] \ge t_0 = \min_{A \in\mathcal{A}} \mathbf{E}_{x:\mathcal{P}} \left[ T(A, x) \right], && (***) \end{align} }[/math]

because if to the contrary [math]\displaystyle{ \forall x_0\in \mathcal{X}, \mathbf{E}_{A:\mathcal{Q}} \left[ T(A, x_0) \right] \lt t_0 }[/math], then due to Equation [math]\displaystyle{ (**) }[/math], it holds that [math]\displaystyle{ \mathbf{E}_{x:\mathcal{P},A:\mathcal{Q}} \left[ T(A, x) \right] \lt t_0 }[/math], which contradicts [math]\displaystyle{ (*) }[/math].

Finally, [math]\displaystyle{ (***) }[/math] implies that

[math]\displaystyle{ \begin{align} \max_{x\in \mathcal{X}} \mathbf{E}_{A:\mathcal{Q}} \left[ T(A, x) \right] \ge \min_{A \in\mathcal{A}} \mathbf{E}_{x:\mathcal{P}} \left[ T(A, x) \right]. \end{align} }[/math]

This proves Yao's principle.

Proof by game-theoretical arguments

von Neumann's Minimax Theorem
For any two-player zero-sum game with a payoff matrix [math]\displaystyle{ M }[/math],
[math]\displaystyle{ \begin{align} \max_p\min_q p^TMq &= \min_q\max_p p^TMq \end{align} }[/math].
Loomis' Theorem
For any two-player zero-sum game with a payoff matrix [math]\displaystyle{ M }[/math],
[math]\displaystyle{ \begin{align} \max_p\min_j p^TMe_j &= \min_q\max_i e_i^TMq . \end{align} }[/math]
where [math]\displaystyle{ e_k }[/math] denotes the unit vector with 1 in the [math]\displaystyle{ k }[/math]th position and 0s elsewhere.
To derive Yao's principle from Loomis' theorem, consider the two-player zero-sum game in which one player chooses an input [math]\displaystyle{ x\in\mathcal{X} }[/math], the other player chooses a deterministic algorithm [math]\displaystyle{ A\in\mathcal{A} }[/math], and the payoff is the running time [math]\displaystyle{ T(A,x) }[/math]. A mixed strategy of the input player is a distribution [math]\displaystyle{ \mathcal{P} }[/math] over inputs, and a mixed strategy of the algorithm player is precisely a randomized algorithm: a distribution [math]\displaystyle{ \mathcal{Q} }[/math] over deterministic algorithms. Applied to this game, Loomis' theorem says that the best average-case cost forced by a hard input distribution against the best deterministic algorithm equals the worst-case expected cost of the best randomized algorithm, which implies the inequality below for arbitrary [math]\displaystyle{ \mathcal{P} }[/math] and [math]\displaystyle{ \mathcal{Q} }[/math].

Yao's Minimax Principle
Let [math]\displaystyle{ f }[/math] be a problem with a finite set [math]\displaystyle{ \mathcal{X} }[/math] of input instances of a fixed size [math]\displaystyle{ n }[/math], and a finite set of deterministic algorithms [math]\displaystyle{ \mathcal{A} }[/math]. For input [math]\displaystyle{ x\in\mathcal{X} }[/math] and algorithm [math]\displaystyle{ A\in\mathcal{A} }[/math], let [math]\displaystyle{ T(A, x) }[/math] denote the running time of algorithm [math]\displaystyle{ A }[/math] on input [math]\displaystyle{ x }[/math]. Then for all distributions [math]\displaystyle{ \mathcal{P} }[/math] over [math]\displaystyle{ \mathcal{X} }[/math] and [math]\displaystyle{ \mathcal{Q} }[/math] over [math]\displaystyle{ \mathcal{A} }[/math],
[math]\displaystyle{ \begin{align} \min_{A \in\mathcal{A}} \mathbf{E}_{x:\mathcal{P}} \left[ T(A, x) \right] &\le \max_{x\in\mathcal{X}} \mathbf{E}_{A:\mathcal{Q}} \left[T(A, x) \right] \end{align} }[/math].

Proving lower bounds with Yao's principle

Suppose that [math]\displaystyle{ \pi }[/math] is an arbitrary permutation of a set [math]\displaystyle{ S }[/math] of [math]\displaystyle{ n }[/math] elements, and we want to find the position of an arbitrary element [math]\displaystyle{ x\in S }[/math] in [math]\displaystyle{ \pi }[/math], with as few accesses to [math]\displaystyle{ \pi }[/math] as possible.

For deterministic algorithms, with a simple adversary argument, we know that any deterministic algorithm has to access [math]\displaystyle{ n-1 }[/math] positions of [math]\displaystyle{ \pi }[/math] in the worst case. A very natural randomized algorithm for this problem is:

repeat for (n-1) times:
  uniformly pick a random position r from all unaccessed positions;
  if π(r) = x, then return r;
return the only unaccessed position;

The algorithm can be equivalently viewed as first randomly reordering [math]\displaystyle{ \pi }[/math], and then scanning from the 1st position to the [math]\displaystyle{ (n-1) }[/math]th position of the reordered permutation to find [math]\displaystyle{ x }[/math]. If [math]\displaystyle{ x }[/math] is in the last position of the reordered permutation, then after accessing the first [math]\displaystyle{ (n-1) }[/math] positions we automatically know that the last one must be [math]\displaystyle{ x }[/math].

For any [math]\displaystyle{ \pi }[/math] and [math]\displaystyle{ x }[/math], [math]\displaystyle{ x }[/math] is in every position of the randomly reordered [math]\displaystyle{ \pi }[/math] with equal probability [math]\displaystyle{ \frac{1}{n} }[/math]. The expected number of accesses made by our randomized searching algorithm is:

[math]\displaystyle{ \begin{align} \frac{1}{n}\left((n-1)+\sum_{k=1}^{n-1}k\right) &= \frac{n+1}{2}-\frac{1}{n} \end{align} }[/math]
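As a sanity check, here is a direct Python rendering of the algorithm (our own implementation; n = 100 and the searched element are arbitrary choices), whose empirical mean access count indeed approaches [math]\displaystyle{ \frac{n+1}{2}-\frac{1}{n} }[/math]:

import random

def randomized_search(pi, x):
    # scan the positions in a uniformly random order, counting accesses
    n = len(pi)
    order = list(range(n))
    random.shuffle(order)
    for accesses, r in enumerate(order[:n - 1], start=1):
        if pi[r] == x:
            return r, accesses
    # after n-1 misses, the only unaccessed position must hold x
    return order[-1], n - 1

n = 100
pi = list(range(n))
random.shuffle(pi)
mean = sum(randomized_search(pi, 42)[1] for _ in range(100000)) / 100000
print(mean, (n + 1) / 2 - 1 / n)   # both close to 50.49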

How good is it? Is there any randomized algorithm that can do better than this? With the help of Yao's principle, we can prove that this bound is the best we can get.

We analyze this problem first for a special class of algorithms, then generalize to all algorithms.

Lower bound for oblivious algorithms

We call a search algorithm oblivious if the algorithm follows the same predetermined order to access the positions of any input permutation. For example, the following code of linearly scanning all positions is a typical oblivious algorithm.

for(i=1 to n-1){
    if π(i) = x
        return i;
}
return n;

Without loss of generality, we can rule out those stupid algorithms that access the same position twice. (Since we are proving a lower bound, we are safe to do this.) Therefore, a deterministic oblivious algorithm can be represented by a permutation [math]\displaystyle{ \rho }[/math] of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math], where the [math]\displaystyle{ i }[/math]th position to access is specified by [math]\displaystyle{ \rho(i) }[/math]. By a simple adversary argument, we know that except for the last element (which can be figured out by knowing the other [math]\displaystyle{ n-1 }[/math] elements), the algorithm has to access the exact position which stores [math]\displaystyle{ x }[/math]. Therefore, the following lemma holds:

Lemma 1
For an input permutation [math]\displaystyle{ \pi }[/math] and element [math]\displaystyle{ x }[/math], an oblivious algorithm with access sequence [math]\displaystyle{ \rho }[/math] has to access [math]\displaystyle{ k }[/math] positions, where either [math]\displaystyle{ \pi(\rho(k))=x }[/math] or [math]\displaystyle{ k=n-1 }[/math].

A randomized oblivious algorithm is a probability distribution [math]\displaystyle{ \mathcal{Q} }[/math] over all oblivious algorithms. Due to Yao's principle, to prove a lower bound for randomized oblivious algorithms, it is sufficient to lower bound the expected number of accesses required by any deterministic oblivious algorithm (a permutation [math]\displaystyle{ \rho }[/math] of [math]\displaystyle{ \{1,2,\ldots,n\} }[/math]) for the random permutation [math]\displaystyle{ \pi }[/math] of [math]\displaystyle{ S }[/math] and the random [math]\displaystyle{ x\in S }[/math].

Let [math]\displaystyle{ \pi }[/math] be a uniformly random permutation of [math]\displaystyle{ S }[/math] and [math]\displaystyle{ x }[/math] be uniformly chosen from [math]\displaystyle{ S }[/math]. By symmetry, we can assume that the access sequence [math]\displaystyle{ \rho }[/math] is just [math]\displaystyle{ 1,2,\ldots,n }[/math], because changing to any other access sequence does not change the distribution of the [math]\displaystyle{ i }[/math]th accessed element [math]\displaystyle{ \pi(\rho(i)) }[/math] for any [math]\displaystyle{ i }[/math] (verify this by yourself).
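The parenthesized claim is easy to check empirically: whatever fixed access sequence [math]\displaystyle{ \rho }[/math] is used, the element revealed by the [math]\displaystyle{ i }[/math]th access is uniform over [math]\displaystyle{ S }[/math]. A small sketch (the particular [math]\displaystyle{ \rho }[/math], [math]\displaystyle{ n }[/math] and [math]\displaystyle{ i }[/math] are arbitrary choices):

import random
from collections import Counter

n, i, trials = 5, 2, 100000
rho = [3, 0, 4, 1, 2]   # an arbitrary fixed access sequence
counts = Counter()
for _ in range(trials):
    pi = list(range(n))
    random.shuffle(pi)
    counts[pi[rho[i]]] += 1
print(sorted((e, round(c / trials, 3)) for e, c in counts.items()))
# every element appears with frequency close to 1/n = 0.2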

Because both [math]\displaystyle{ x }[/math] and [math]\displaystyle{ \pi }[/math] are uniform, [math]\displaystyle{ x }[/math] appears in each position of [math]\displaystyle{ \pi }[/math] with equal probability. Due to Lemma 1, assuming that [math]\displaystyle{ x }[/math] is in the [math]\displaystyle{ k }[/math]th position, the number of accesses needed by the algorithm is [math]\displaystyle{ k }[/math] if [math]\displaystyle{ k\lt n }[/math] and is [math]\displaystyle{ (n-1) }[/math] if [math]\displaystyle{ k=n }[/math]. Taking the average, the expected number of accesses is

[math]\displaystyle{ \begin{align} \frac{1}{n}\left((n-1)+\sum_{k=1}^{n-1}k\right) &= \frac{n+1}{2}-\frac{1}{n} \end{align} }[/math]

Due to Yao's principle, this gives a lower bound on the expected running time of randomized oblivious algorithms for the problem.

Lower bound for adaptive algorithms

For an adaptive algorithm, the next position to access is determined by the results of all previous accesses.

Formally, let [math]\displaystyle{ \tau }[/math] be a decision tree such that for an input permutation [math]\displaystyle{ \pi }[/math] and an input element [math]\displaystyle{ x }[/math], the [math]\displaystyle{ k }[/math]th accessed position [math]\displaystyle{ i_k }[/math] is given by

[math]\displaystyle{ \begin{align} i_{k} &= \begin{cases} \tau(x) & \mbox{if }k=1\\ \tau\left(x,\pi\left(i_1\right),\pi\left(i_2\right),\ldots,\pi\left(i_{k-1}\right)\right) & \mbox{if }k\gt 1 \end{cases} \end{align} }[/math]

The following figure illustrates the structure of the decision tree [math]\displaystyle{ \tau }[/math]:

[Figure: the decision tree [math]\displaystyle{ \tau }[/math]]

This decision tree is an [math]\displaystyle{ n }[/math]-ary labeled tree. The label of each edge is the input to [math]\displaystyle{ \tau }[/math], and the label of each node is the access position returned by [math]\displaystyle{ \tau }[/math] for the sequence specified by the labels along the path from the root to the node.
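In code, an adaptive algorithm is simply a function [math]\displaystyle{ \tau }[/math] from the observed history to the next position to probe. Here is a minimal sketch of running such a decision tree (the interface is our own formalization of the definition above; as in the oblivious case, we may assume tau never repeats a position):

def run_adaptive(tau, pi, x):
    # tau(x, history) returns the next position to access, where
    # history is the tuple of values revealed by earlier accesses
    n = len(pi)
    history, probed = (), set()
    while len(probed) < n - 1:
        i = tau(x, history)
        probed.add(i)
        if pi[i] == x:
            return i, len(probed)
        history = history + (pi[i],)
    # the single unaccessed position must hold x
    last = next(j for j in range(n) if j not in probed)
    return last, n - 1

For example, the oblivious linear scan is recovered by tau = lambda x, history: len(history), which ignores the revealed values.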

This model seems to have substantially more power than oblivious algorithms, because it allows the algorithm to adapt to the input. It actually covers all possible algorithms (realistic or not) for the problem.

By an adversary argument, the algorithm still has to either access the position which stores [math]\displaystyle{ x }[/math] or access [math]\displaystyle{ (n-1) }[/math] positions, because otherwise the adversary can always store [math]\displaystyle{ x }[/math] in a position that the algorithm never accesses.

Apply Yao's principle. Let [math]\displaystyle{ \pi }[/math] be a uniformly random permutation of [math]\displaystyle{ S }[/math] and [math]\displaystyle{ x }[/math] be uniformly chosen from [math]\displaystyle{ S }[/math]. Due to the uniformity of [math]\displaystyle{ \pi }[/math], conditioning on any [math]\displaystyle{ \pi\left(i_1\right),\pi\left(i_2\right),\ldots,\pi\left(i_{k-1}\right) }[/math] and any [math]\displaystyle{ x }[/math], for any [math]\displaystyle{ i_k\not\in\{i_1,i_2,\ldots,i_{k-1}\} }[/math], it holds that [math]\displaystyle{ \pi(i_k) }[/math] is uniformly distributed over the set [math]\displaystyle{ S-\{\pi\left(i_1\right),\pi\left(i_2\right),\ldots,\pi\left(i_{k-1}\right)\} }[/math] of the unaccessed elements, and the distribution of [math]\displaystyle{ \pi(i_k) }[/math] is independent of [math]\displaystyle{ \pi\left(i_1\right),\pi\left(i_2\right),\ldots,\pi\left(i_{k-1}\right) }[/math] and [math]\displaystyle{ x }[/math].

The above probability distribution is the same whether the accessed positions [math]\displaystyle{ \{i_1,i_2,\ldots,i_{k-1}\} }[/math] are oblivious (fixed independently of [math]\displaystyle{ \pi }[/math]) or adaptive (depending on [math]\displaystyle{ \pi }[/math] and [math]\displaystyle{ x }[/math]). Therefore the expected number of accesses needed by an adaptive algorithm is the same as that of an oblivious algorithm, and the same lower bound of [math]\displaystyle{ \frac{n+1}{2}-\frac{1}{n} }[/math] holds.