随机算法 (Fall 2011)/Cuckoo hashing
Cuckoo hashing*
Cuckoo hashing also achieves constant search time on a linear space, and after acquiring the hash functions, the accesses to the table can be done in parallel. Cuckoo hashing is considered by some researchers as a more "modern" hashing technique than FKS.
The data structure is defined as follows.
For a set [math]\displaystyle{ S }[/math] of [math]\displaystyle{ n }[/math] items from the universe:
- A table [math]\displaystyle{ T }[/math] of [math]\displaystyle{ M }[/math] entries.
- Two hash functions [math]\displaystyle{ h_1 }[/math] and [math]\displaystyle{ h_2 }[/math] that map the universe [math]\displaystyle{ [N] }[/math] to [math]\displaystyle{ [M] }[/math] are stored separately in fixed places.
- For any item [math]\displaystyle{ x\in S }[/math], either [math]\displaystyle{ T[h_1(x)] }[/math] or [math]\displaystyle{ T[h_2(x)] }[/math] stores [math]\displaystyle{ x }[/math].
To search for an item [math]\displaystyle{ x }[/math]:
- Retrieve [math]\displaystyle{ h_1 }[/math] and [math]\displaystyle{ h_2 }[/math] from where they are stored.
- If [math]\displaystyle{ T[h_1(x)]=x }[/math], returns [math]\displaystyle{ h_1(x) }[/math]; If [math]\displaystyle{ T[h_2(x)]=x }[/math], returns [math]\displaystyle{ h_2(x) }[/math]; otherwise, return NOT_FOUND.
After retrieving the hash functions, it accesses the table twice, which can be done in parallel.
We only need to guarantee that it is possible to find a placement of the [math]\displaystyle{ n }[/math] items, such that
- each item is placed in one of the two entries it is hashed to;
- no two items are placed into one entry.
This problem can be formulated as a matching problem:
Suppose we have a bipartite graph, where there are [math]\displaystyle{ n }[/math] vertices on left and [math]\displaystyle{ M }[/math] vertices on right. Each vertex on left corresponds to an item and each vertex on right corresponds to a table entry. An edge connect an item vertex and an entry vertex if at least one of the two hash functions maps the item to the entry. With this reformulation, the problem of finding a feasible placement of items is reduced to finding a matching (edge independent set) of size [math]\displaystyle{ n }[/math].
It was shown by Pagh that such a matching exists with constant probability for linear table size [math]\displaystyle{ M=O(n) }[/math] when [math]\displaystyle{ h_1 }[/math] and [math]\displaystyle{ h_2 }[/math] are drawn uniformly and independently from a [math]\displaystyle{ O(\log n) }[/math]-universal family of hash functions. We will not describe the detailed proof here.