P versus NP

From TCS Wiki

P versus NP is the following question of interest to people working with computers and in mathematics: can every problem whose answer can be checked quickly by a computer also be solved quickly by a computer? P and NP are the two types of maths problems referred to. P problems are fast for computers to solve, and so are considered "easy". NP problems are fast (and so "easy") for a computer to check, but are not necessarily easy to solve.

In 1956, Kurt Gödel wrote a letter to John von Neumann. In this letter, Gödel asked whether a certain NP-complete problem could be solved in quadratic or linear time.[1] In 1971, Stephen Cook introduced the precise statement of the P versus NP problem in his article "The complexity of theorem proving procedures".[2]

Today, many people consider this problem to be the most important open problem in computer science.[3] It is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute to carry a US$1,000,000 prize for the first correct solution.

Clarifications

For instance, suppose someone says "The answer to your problem is the set of numbers 1, 2, 3, 4, 5". A computer may be able to check quickly whether that answer is right or wrong. Yet it may take a very long time for the computer to come up with "1, 2, 3, 4, 5" on its own. For some interesting, practical questions of this kind, we know no way to find an answer quickly, but if we are given an answer, we can check (verify) it quickly. In this way, NP problems may be thought of as being like riddles. It may be hard to come up with the answer to a riddle, but once one hears the answer, it seems obvious. In this analogy, the basic question is: are riddles really as hard as we think they are, or are we missing something?

Because these kinds of questions are so practically important, many mathematicians, scientists, and computer programmers want to settle the general question of whether every quickly-checked problem can also be solved quickly. The question is important enough that the Clay Mathematics Institute will give $1,000,000 to anyone who provides a correct proof either way.

Digging a little deeper, we see that all P problems are NP problems: it is easy to check that a solution is correct by solving the problem and comparing the two solutions. However, people want to know about the opposite: Are there any NP problems other than P problems, or are all NP problems just P problems? If NP problems are really not the same as P problems (P ≠ NP), it would mean that no general, fast ways to solve those NP problems can exist, no matter how hard we look. However if all NP problems are P problems (P = NP), it would mean that new, very fast problem-solving methods do exist. We just have not found them yet.

Since the best efforts of scientists and mathematicians have not found general, easy methods for solving NP problems yet, many people believe that there are NP problems other than P problems (that is, that P ≠ NP is true). Most mathematicians also believe this to be true, but currently no one has proven it by rigorous mathematical analysis. If it can be proven that NP and P are the same (P = NP is true), it would have a huge impact on many aspects of day-to-day life. For this reason, the question of P versus NP is an important and widely studied topic.

Example

Suppose someone wants to build two towers by stacking rocks of different masses, and wants both towers to have exactly the same mass. That means the rocks have to be divided into two piles with the same mass. If one guesses a division of the rocks that might work, it is easy to check whether the guess is right: divide the rocks into the two piles, then use a balance to see whether the piles have the same mass. Computer scientists call this problem 'Partition'. Checking an answer is easy, and as we will see it is much easier than solving the problem outright, so Partition is an NP problem.
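To make "easy to check" concrete, here is a minimal Python sketch of a checker for Partition. It assumes the rock masses are given as a list of integers. It also assumes a proposed answer is a list of 0s and 1s saying which pile each rock goes into. The function name `check_partition` is hypothetical. The check makes a single pass over the rocks, so its running time grows only linearly with the number of rocks.

```python
def check_partition(masses, piles):
    """Return True if `piles` (a 0/1 label per rock) splits `masses` into
    two non-empty piles of equal total mass."""
    if len(masses) != len(piles):
        return False
    totals = [0, 0]  # running mass of pile 0 and pile 1
    counts = [0, 0]  # how many rocks are in each pile
    for mass, pile in zip(masses, piles):
        if pile not in (0, 1):
            return False
        totals[pile] += mass
        counts[pile] += 1
    # Both towers need at least one rock and the same total mass.
    return counts[0] > 0 and counts[1] > 0 and totals[0] == totals[1]


rocks = [3, 1, 1, 2, 2, 1]
print(check_partition(rocks, [0, 1, 1, 0, 1, 1]))  # True: 3 + 2 = 1 + 1 + 2 + 1
print(check_partition(rocks, [0, 0, 1, 1, 1, 1]))  # False: 4 versus 6
```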

How hard is it to solve outright? If one starts with just 100 rocks, there are [math]\displaystyle{ 2^{100-1}-1 = 633,825,300,114,114,700,748,351,602,687 }[/math], or about [math]\displaystyle{ 6.3\times 10^{29} }[/math], possible ways (combinations) to divide these rocks into two piles. If one could check one unique combination of rocks every second, it would take about [math]\displaystyle{ 2.0\times 10^{22} }[/math] years of effort. For comparison, physicists believe that the universe is about [math]\displaystyle{ 1.4\times 10^{10} }[/math] years old. That is about [math]\displaystyle{ 450,000,000,000,000,000 }[/math], or [math]\displaystyle{ 4.5\times 10^{17} }[/math], seconds, which is less than one trillionth of the time this rock-piling effort would take. So even using all of the time that has passed since the beginning of the universe, one would need to check about 1.4 trillion [math]\displaystyle{ (1,400,000,000,000) }[/math] different ways of dividing the rocks every second in order to check them all.

If one programmed a powerful computer to test all of these ways to divide the rocks, one might be able to check [math]\displaystyle{ 1,000,000 }[/math] combinations per second using current systems. This means one would still need about [math]\displaystyle{ 1,400,000 }[/math] very powerful computers, working since the origin of the universe, to test all the ways of dividing the rocks.
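A few lines of Python reproduce the arithmetic for 100 rocks. This is a sketch under three stated assumptions. A year is taken as 365.25 days. The age of the universe is the rounded figure of [math]\displaystyle{ 4.5\times 10^{17} }[/math] seconds. Each computer is assumed to check 1,000,000 combinations per second.

```python
n = 100
# Each rock can go in either pile: 2**n labelings. Swapping the piles gives
# the same division, and both piles must be non-empty, so the number of
# distinct divisions is 2**(n - 1) - 1.
divisions = 2 ** (n - 1) - 1
print(divisions)  # 633825300114114700748351602687

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
years_at_one_per_second = divisions / SECONDS_PER_YEAR
print(f"{years_at_one_per_second:.1e} years at one division per second")

UNIVERSE_AGE_SECONDS = 4.5e17  # rounded age of the universe, in seconds
per_second_needed = divisions / UNIVERSE_AGE_SECONDS
print(f"{per_second_needed:.1e} divisions per second since the beginning")

CHECKS_PER_COMPUTER = 1_000_000  # checks per second, as assumed above
print(f"{per_second_needed / CHECKS_PER_COMPUTER:.1e} computers needed")
```

Python integers have arbitrary precision, so the exact 30-digit count is computed without rounding. Only the later divisions switch to floating point.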

However, it may be possible to find a method of dividing the rocks into two equal piles without checking all combinations. The question "Is P equal to NP?" is a shorthand for asking if any method like that can exist.
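For contrast, here is a sketch of the exhaustive approach that such a method would need to beat. The hypothetical `find_partition` below tries divisions one after another. It fixes the first rock in pile 0, so it examines at most [math]\displaystyle{ 2^{n-1} }[/math] labelings. Each extra rock therefore roughly doubles its worst-case running time, which makes it practical only for small numbers of rocks (a couple of dozen at most in plain Python).

```python
from itertools import product


def find_partition(masses):
    """Try every way of splitting `masses` into two non-empty piles of equal mass.

    Returns a list of 0/1 pile labels, or None if no equal split exists.
    Fixing the first rock in pile 0 avoids checking each division twice.
    """
    if len(masses) < 2:
        return None
    for rest in product((0, 1), repeat=len(masses) - 1):
        piles = (0,) + rest
        pile0 = sum(m for m, p in zip(masses, piles) if p == 0)
        pile1 = sum(m for m, p in zip(masses, piles) if p == 1)
        if 1 in rest and pile0 == pile1:  # pile 1 must not be empty
            return list(piles)
    return None


print(find_partition([3, 1, 1, 2, 2, 1]))  # [0, 0, 0, 1, 1, 1]: 3+1+1 = 2+2+1
print(find_partition([1, 2, 4]))           # None: the total, 7, is odd
```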

Why it matters

There are many important NP problems that people don't know how to solve in a way that is faster than testing every possible answer. Here are some examples:

  • A travelling salesman wants to visit 100 cities by driving, starting and ending his trip at home. He has a limited supply of gasoline, so he can only drive a total of 10,000 kilometers. He wants to know if he can visit all of the cities without running out of gasoline.
  • A school offers 100 different classes, and a teacher needs to choose one hour for each class's final exam. To prevent cheating, all of the students who take a class must take the exam for that class at the same time. If a student takes more than one class, then all of those exams must be at different times. The teacher wants to know if he can schedule all of the exams on the same day so that every student is able to take the exam for each of their classes.
  • A farmer wants to take 100 watermelons of different masses to the market. She needs to pack the watermelons into boxes. Each box can only hold 20 kilograms without breaking. The farmer needs to know if 10 boxes will be enough for her to carry all 100 watermelons to market. (Some special cases are easy. For example, if no watermelon weighs more than 2 kg, then any 10 of them fit in one box, so 10 boxes are always enough. Observations like this can quickly settle particular cases, but they do not give a fast method for every possible set of masses.)
  • A large art gallery has many rooms, and each wall is covered with many expensive paintings. The owner of the gallery wants to buy cameras to watch these paintings, in case a thief tries to steal any of them. He wants to know if 100 cameras will be enough for him to make sure that each painting can be seen by at least one camera.
  • The clique problem: The principal of a school has a list of which students are friends with each other. She wants to find a group of 10% of the students that are all friends with each other.
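All of these problems are in NP because a proposed answer is quick to check, even when it is hard to find. Here is a sketch of checkers for two of them: the salesman's route and the principal's group of friends. The function names are hypothetical. The input formats are assumptions made for illustration: road distances are keyed by `frozenset` pairs of city names, and friendships are a set of such pairs.

```python
from itertools import combinations


def route_is_ok(route, distance, home, limit_km):
    """Salesman check: the route starts and ends at `home`, visits every city
    exactly once, and its total length is at most `limit_km`.
    `distance` maps frozenset({city_a, city_b}) to kilometres."""
    cities = set().union(*distance)  # every city on the map
    if not route or route[0] != home or route[-1] != home:
        return False
    stops = route[:-1]  # leave out the final return home
    if len(stops) != len(cities) or set(stops) != cities:
        return False  # a city is missing or repeated
    legs = zip(route, route[1:])
    return sum(distance[frozenset(leg)] for leg in legs) <= limit_km


def is_friend_group(group, friends, min_size):
    """Principal check: at least `min_size` distinct students, every pair of
    whom appears in `friends` (a set of frozenset({x, y}) pairs)."""
    members = set(group)
    return len(members) >= min_size and all(
        frozenset(pair) in friends for pair in combinations(members, 2)
    )


roads = {frozenset(pair): km for pair, km in [
    (("Home", "A"), 3), (("Home", "B"), 4), (("A", "B"), 5)]}
print(route_is_ok(["Home", "A", "B", "Home"], roads, "Home", 12))  # True: 3 + 5 + 4 = 12

friends = {frozenset(p) for p in [("Ann", "Bo"), ("Bo", "Cy"), ("Ann", "Cy")]}
print(is_friend_group(["Ann", "Bo", "Cy"], friends, 3))            # True
```

Both checks run in polynomial time. The route check is linear in the length of the route, and the friendship check is quadratic in the size of the group.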

Exponential Time

In the example above, each of the [math]\displaystyle{ 100 }[/math] rocks can go into either pile, so there are [math]\displaystyle{ 2^{100} }[/math] ways to assign the rocks to the two piles. This count includes each of the [math]\displaystyle{ 2^{100-1}-1 }[/math] distinct divisions counted earlier twice (because the piles can be swapped), plus the two assignments that leave one pile empty. With [math]\displaystyle{ n }[/math] rocks, there are [math]\displaystyle{ 2^n }[/math] such assignments. The function [math]\displaystyle{ f(n) = 2^n }[/math] is an exponential function. It matters here because it describes the worst-case number of possibilities that a search through every answer must examine and, thus, the worst-case amount of time required.

So far, for the hard problems, the best known methods have required a number of computations that grows exponentially, like [math]\displaystyle{ 2^n }[/math]. For particular problems, people have found ways to reduce the number of computations needed. One might find a way to do just 1% of the worst-case number of computations. That saves a lot of computing, but it is still [math]\displaystyle{ 0.01 \times 2^n }[/math] computations, and every extra rock still doubles the number of computations needed to solve the problem. Other insights can produce methods that do even fewer computations, giving variations such as [math]\displaystyle{ 2^n / n^3 }[/math]. But the exponential part still dominates as [math]\displaystyle{ n }[/math] grows.
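The claim that these savings do not change the overall picture can be checked numerically. The Python sketch below compares [math]\displaystyle{ 2^n }[/math] with the two improved variants, [math]\displaystyle{ 0.01 \times 2^n }[/math] and [math]\displaystyle{ 2^n / n^3 }[/math]. For contrast, it adds the polynomial [math]\displaystyle{ n^3 }[/math]. It prints how many times more work one extra rock causes. The factor for [math]\displaystyle{ 2^n / n^3 }[/math] is about 1.94 at [math]\displaystyle{ n = 100 }[/math] and keeps creeping toward 2. The polynomial's factor shrinks toward 1.

```python
# Growth functions from the text, plus a polynomial for comparison.
variants = {
    "2^n": lambda n: 2 ** n,
    "0.01 * 2^n": lambda n: 0.01 * 2 ** n,
    "2^n / n^3": lambda n: 2 ** n / n ** 3,
    "n^3 (polynomial)": lambda n: n ** 3,
}

for n in (100, 1000):
    for name, f in variants.items():
        # How many times more work one extra rock causes at size n.
        print(f"n={n:>4}  {name:>16}: x{f(n + 1) / f(n):.3f}")
```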

Consider the exam-scheduling problem described above, but suppose now that there are 15000 students. A computer program takes the schedules of all 15000 students, runs for an hour, and outputs an exam schedule that lets every student finish their exams within one week. It satisfies many rules (no back-to-back exams, no more than 2 exams in any 28-hour period, ...) to limit the stress of exam week. The program runs for one hour at mid-term break, and everyone learns their exam schedule with plenty of time to prepare.

The next year, though, there are 10 more students. If the same program runs on the same computer, that one hour turns into [math]\displaystyle{ 2^{10} = 1,024 }[/math] hours, because every additional student doubles the computations. That is about [math]\displaystyle{ 6 }[/math] weeks! If there were 20 more students, then

[math]\displaystyle{ 2^{20} \text{ hours} = 1,048,576 \text{ hours} \approx 43,691 \text{ days} \approx 120 \text{ years} }[/math]

Thus, for [math]\displaystyle{ 15000 }[/math] students, it takes one hour. For [math]\displaystyle{ 15020 }[/math] students, it takes about [math]\displaystyle{ 120 }[/math] years.
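The arithmetic can be reproduced with a short Python sketch. It keeps the text's simplifying assumption that every additional student exactly doubles a one-hour baseline. It uses 7-day weeks and 365-day years. The function name `runtime_hours` is hypothetical.

```python
def runtime_hours(extra_students, base_hours=1):
    """Hypothetical running time if each extra student doubles the work."""
    return base_hours * 2 ** extra_students


for extra in (0, 10, 20):
    hours = runtime_hours(extra)
    print(f"{15000 + extra} students: {hours:,} hours"
          f" = {hours / (24 * 7):.1f} weeks = {hours / (24 * 365):.1f} years")
```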

As you can see, exponential functions grow really fast. Most mathematicians believe that the hardest NP problems require exponential time to solve.

NP-complete problems

Mathematicians can show that there are some NP problems that are NP-Complete. An NP-Complete problem is at least as difficult to solve as any other NP problem. This means that if someone found a method to solve any one NP-Complete problem quickly, they could use that same method to solve every NP problem quickly. All of the problems listed above are NP-Complete. So if the salesman found a way to plan his trip quickly, he could tell the teacher, and the teacher could use that same method to schedule the exams. The farmer could use the same method to determine whether 10 boxes are enough, and the same method could be used to split the rocks into two towers of equal mass.
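The phrase "use that same method" can be made concrete with a reduction. The Python sketch below is a hypothetical illustration, not a construction taken from the article. It turns the rock-splitting (Partition) problem into the farmer's box-packing question: can the rocks fit into two boxes that each hold at most half of the total mass? The stand-in `boxes_suffice` uses slow exhaustive search. The point is that the translation itself is quick, so any fast box-packing method plugged in there would immediately give a fast rock-splitting method.

```python
from itertools import product


def boxes_suffice(masses, boxes, capacity):
    """Box packing: can every item go into one of `boxes` boxes without any
    box holding more than `capacity`? This stand-in solver simply tries every
    assignment, so it is slow; a fast solver could be dropped in instead."""
    for assignment in product(range(boxes), repeat=len(masses)):
        loads = [0] * boxes
        for mass, box in zip(masses, assignment):
            loads[box] += mass
        if max(loads) <= capacity:
            return True
    return False


def equal_towers_possible(masses):
    """Partition via box packing (assumes positive integer masses).

    Two boxes that each hold at most half the total can carry all the rocks
    only if each holds exactly half, which is an equal split into two towers."""
    total = sum(masses)
    if total % 2 == 1:
        return False  # an odd integer total cannot be split evenly
    return boxes_suffice(masses, 2, total // 2)


print(equal_towers_possible([3, 1, 1, 2, 2, 1]))  # True
print(equal_towers_possible([1, 2, 4]))           # False
```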

Because a method that quickly solves one of these problems can solve them all, there are many people who want to find one. However, because there are so many different NP-Complete problems and nobody so far has found a way to solve even one of them quickly, most experts believe that solving NP-Complete problems quickly is not possible.

Basic Properties

In computational complexity theory, the complexity class NP-complete (abbreviated NP-C or NPC) is a class of problems having two properties:

  • It is in the set of NP (non-deterministic polynomial time) problems: Any given solution to the problem can be verified quickly (in polynomial time).
  • It is also in the set of NP-hard problems: Those which are at least as hard as the hardest problems in NP. Problems that are NP-hard do not have to be elements of NP; indeed, they may not even be decidable.

Formal overview

NP-complete is a subset of NP, the set of all decision problems whose solutions can be verified in polynomial time. NP may equivalently be defined as the set of decision problems that can be solved in polynomial time on a non-deterministic Turing machine. A problem p in NP is also in NPC if and only if every other problem in NP can be transformed (reduced) into p in polynomial time. "NP-complete" is also used as an adjective: problems in the class NP-complete are called NP-complete problems.

NP-complete problems are studied because the ability to quickly verify solutions to a problem (NP) seems to correlate with the ability to quickly solve that problem (P). It is not known whether every problem in NP can be quickly solved; this is the P versus NP problem. But if any single NP-complete problem can be solved quickly, then every problem in NP can also be solved quickly. This follows from the definition of an NP-complete problem, which states that every problem in NP must be quickly reducible to every NP-complete problem (that is, reducible in polynomial time).[1]

Examples

The Boolean satisfiability problem is known to be NP-complete. In 1972, Richard Karp showed that 21 problems are NP-complete.[4] These are known as Karp's 21 NP-complete problems. They include the integer programming problem (which applies linear programming techniques to the integers), the knapsack problem, and the vertex cover problem.
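As a sketch of why satisfiability is in NP, the Python function below checks a proposed truth assignment against a formula in conjunctive normal form (an AND of ORs). It assumes the DIMACS-style encoding, in which a clause is a list of nonzero integers: k stands for variable k and -k for its negation. The function name `satisfies` is hypothetical. The check makes one pass over the formula, so it runs in time proportional to the formula's size. Finding a satisfying assignment, by contrast, may require searching through up to [math]\displaystyle{ 2^n }[/math] assignments of [math]\displaystyle{ n }[/math] variables.

```python
def satisfies(clauses, assignment):
    """Return True if `assignment` (dict: variable number -> bool) makes
    every clause true. Each clause is a list of literals: k means variable k,
    -k means "not variable k"."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(satisfies(formula, {1: True, 2: True, 3: False}))  # True
print(satisfies(formula, {1: True, 2: False, 3: True}))  # False: last clause fails
```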

References


  1. Juris Hartmanis 1989, Gödel, von Neumann, and the P = NP problem, Bulletin of the European Association for Theoretical Computer Science, vol. 38, pp. 101–107
  2. Template:Cite book
  3. Lance Fortnow, The status of the P versus NP problem, Communications of the ACM 52 (2009), no. 9, pp. 78–86.
  4. Template:Cite book