Advanced Algorithms (Fall 2019) / Dimension Reduction
Metric Embedding
A metric space is a pair $(X,d)$, where $X$ is a set of points and $d: X\times X\to\mathbb{R}_{\ge 0}$ is a distance function, such that for any $x,y,z\in X$:
- (identity of indiscernibles) $d(x,y)=0$ if and only if $x=y$;
- (symmetry) $d(x,y)=d(y,x)$;
- (triangle inequality) $d(x,z)\le d(x,y)+d(y,z)$.
Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces. A mapping $\phi: X\to Y$ is called an embedding of metric space $(X,d_X)$ into $(Y,d_Y)$. The embedding is said to have distortion $\alpha\ge 1$ if for any $x,y\in X$,
$\frac{1}{\alpha}\, d_X(x,y)\le d_Y(\phi(x),\phi(y))\le \alpha\, d_X(x,y)$.
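As a toy illustration of these definitions (a sketch only; the helper name `distortion` is ad hoc), the following computes the distortion of a concrete embedding of three points of $\mathbb{R}^2$ into $\mathbb{R}$:

```python
import math

def distortion(points, phi, d_X, d_Y):
    """Smallest alpha >= 1 such that d_X(x,y)/alpha <= d_Y(phi(x),phi(y)) <= alpha*d_X(x,y)
    holds for all pairs of distinct points."""
    alpha = 1.0
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            ratio = d_Y(phi(x), phi(y)) / d_X(x, y)
            alpha = max(alpha, ratio, 1.0 / ratio)
    return alpha

if __name__ == "__main__":
    euclid = lambda p, q: math.dist(p, q)     # Euclidean metric
    points = [(0.0, 0.0), (3.0, 4.0), (6.0, 1.0)]
    project = lambda p: (p[0],)               # embed R^2 into R^1 by dropping a coordinate
    # about 1.67: the pair (0,0),(3,4) is contracted the most by this embedding
    print(distortion(points, project, euclid, euclid))
```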
In Computer Science, a typical scenario for metric embedding is as follows. We want to solve some difficult computational problem on a metric space $(X,d)$. Instead of solving the problem directly on the original metric, we embed the metric space into a new metric space in which the problem is easier to solve, and we require the embedding to have small distortion so that a solution in the new space can be translated back to an (approximate) solution in the original space.
One particularly important case of metric embedding is to embed a high-dimensional metric space into a new metric space whose dimension is much lower. This is called dimension reduction. It can be very helpful because many common computational tasks are very hard to solve in high-dimensional spaces due to the curse of dimensionality.
The Johnson-Lindenstrauss Theorem
The Johnson-Lindenstrauss Theorem or Johnson-Lindenstrauss Transformation (both abbreviated as JLT) is a fundamental result for dimension reduction in Euclidean space.
Recall that in Euclidean space $\mathbb{R}^d$, for any two points $x,y\in\mathbb{R}^d$, the Euclidean distance between them is given by $\|x-y\|=\|x-y\|_2=\sqrt{\sum_{i=1}^d(x_i-y_i)^2}$, where $\|\cdot\|=\|\cdot\|_2$ denotes the Euclidean norm (also known as the $\ell_2$-norm).
The JLT says that in Euclidean space, it is always possible to embed a set of $n$ points in arbitrary dimension into a space of dimension only $O(\log n)$, while approximately preserving all pairwise distances.
Johnson-Lindenstrauss Theorem, 1984
- For any $0<\epsilon<1$ and any positive integer $n$, there is a positive integer $k=O(\epsilon^{-2}\log n)$ such that the following holds:
- For any set $S\subset\mathbb{R}^d$ with $|S|=n$, where $d$ is arbitrary, there is an embedding $\phi:\mathbb{R}^d\to\mathbb{R}^k$ such that
$\forall x,y\in S:\quad (1-\epsilon)\|x-y\|^2\le\|\phi(x)-\phi(y)\|^2\le(1+\epsilon)\|x-y\|^2$.
The Johnson-Lindenstrauss Theorem is usually stated, as above, for the squared Euclidean distances; a statement for the Euclidean distances themselves follows by adjusting $\epsilon$ by a constant factor, since $\sqrt{1\pm\epsilon}=1\pm\Theta(\epsilon)$.
In fact, the embedding $\phi:\mathbb{R}^d\to\mathbb{R}^k$ can be as simple as a linear transformation, i.e. a matrix $A\in\mathbb{R}^{k\times d}$ with $\phi(x)=Ax$ for every $x\in\mathbb{R}^d$, so the theorem can be stated more precisely as follows.
Johnson-Lindenstrauss Theorem (linear embedding)
- For any $0<\epsilon<1$ and any positive integer $n$, there is a positive integer $k=O(\epsilon^{-2}\log n)$ such that the following holds:
- For any set $S\subset\mathbb{R}^d$ with $|S|=n$, where $d$ is arbitrary, there is a linear transformation $A\in\mathbb{R}^{k\times d}$ such that
$\forall x,y\in S:\quad (1-\epsilon)\|x-y\|^2\le\|A(x-y)\|^2\le(1+\epsilon)\|x-y\|^2$.
The theorem is proved by the probabilistic method. Specifically, we construct a random matrix $A\in\mathbb{R}^{k\times d}$ and show that with high probability the map $x\mapsto Ax$ is a good embedding, i.e.
$\forall x,y\in S:\quad (1-\epsilon)\|x-y\|^2\le\|A(x-y)\|^2\le(1+\epsilon)\|x-y\|^2$.
Therefore, if such a random matrix $A$ is a good embedding with positive probability, then a good embedding must exist, which proves the Johnson-Lindenstrauss theorem.
There are several such constructions of the random matrix $A\in\mathbb{R}^{k\times d}$ (illustrated by the sketch after this list), including:
- projection onto a uniform random $k$-dimensional subspace of $\mathbb{R}^d$; (the original construction of Johnson and Lindenstrauss in 1984; a simplified analysis due to Dasgupta and Gupta in 1999)
- random matrix with i.i.d. Gaussian entries; (due to Indyk and Motwani in 1998)
- random matrix with i.i.d. -1/+1 entries; (due to Achlioptas in 2003)
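As an illustration only (a sketch; scaling conventions vary across papers, and the $1/\sqrt{k}$ factors below follow the $\mathcal{N}(0,1/k)$ convention used in the next section), the three constructions can be written as:

```python
import numpy as np

def gaussian_jl_matrix(k, d, rng):
    """Random matrix with i.i.d. N(0, 1/k) entries (Indyk-Motwani style)."""
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))

def sign_jl_matrix(k, d, rng):
    """Random matrix with i.i.d. +1/-1 entries, scaled by 1/sqrt(k) (Achlioptas style)."""
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

def random_subspace_projection(k, d, rng):
    """Projection onto a uniformly random k-dimensional subspace of R^d, rescaled
    by sqrt(d/k) (Johnson-Lindenstrauss style): take an orthonormal basis of the
    column span of a d x k Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))   # d x k, orthonormal columns
    return np.sqrt(d / k) * Q.T                        # k x d

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(10_000)
    for make in (gaussian_jl_matrix, sign_jl_matrix, random_subspace_projection):
        A = make(500, x.size, rng)
        # each ratio ||Ax||^2 / ||x||^2 should be close to 1
        print(make.__name__, np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)
```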
It was proved by Kasper Green Larsen and Jelani Nelson in 2016 that the JLT is optimal: there is a set of $n$ points in $\mathbb{R}^d$ such that any embedding into $\mathbb{R}^k$ achieving the distortion guaranteed by the JLT must have target dimension $k=\Omega\left(\frac{\log n}{\epsilon^2}\right)$.
JLT via Gaussian matrix
Here we prove the Johnson-Lindenstrauss theorem with the second construction, by the random matrix $A\in\mathbb{R}^{k\times d}$ with i.i.d. Gaussian entries:
- Each entry of $A\in\mathbb{R}^{k\times d}$ is drawn independently from the Gaussian distribution $\mathcal{N}(0,1/k)$.
Recall that a Gaussian distribution $\mathcal{N}(\mu,\sigma^2)$ is specified by its mean $\mu$ and variance $\sigma^2$: for $X\sim\mathcal{N}(\mu,\sigma^2)$ we have $\mathbb{E}[X]=\mu$ and $\mathbf{Var}[X]=\sigma^2$,
and the probability density function is given by
$p(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
Fix any two points $x,y$ out of the $n$ points in $S$. If we can show that
$\Pr\left[(1-\epsilon)\|x-y\|^2\le\|A(x-y)\|^2\le(1+\epsilon)\|x-y\|^2\right]\ge 1-\frac{1}{n^3}$,
then by the union bound over all $\binom{n}{2}<\frac{n^2}{2}$ pairs, the following event
$\forall x,y\in S:\quad (1-\epsilon)\|x-y\|^2\le\|A(x-y)\|^2\le(1+\epsilon)\|x-y\|^2$
holds with probability at least $1-\frac{1}{2n}=1-O(1/n)$.
Furthermore, dividing both sides of the inequalities $(1-\epsilon)\|x-y\|^2\le\|A(x-y)\|^2\le(1+\epsilon)\|x-y\|^2$ by $\|x-y\|^2$, and using the linearity $A(x-y)=\|x-y\|\cdot Au$, the event becomes
$(1-\epsilon)\le\|Au\|^2\le(1+\epsilon)$,
where $u=\frac{x-y}{\|x-y\|}$ is a unit vector.
Therefore, the Johnson-Lindenstrauss theorem is proved once the following theorem on the unit vector is proved.
Theorem (JLT on unit vector)
- For any positive $\epsilon,\delta<1/2$, there is a positive integer $k=O\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\right)$ such that for the random matrix $A\in\mathbb{R}^{k\times d}$ with each entry drawn independently from the Gaussian distribution $\mathcal{N}(0,1/k)$, for any unit vector $u\in\mathbb{R}^d$ with $\|u\|=1$,
$\Pr\left[\,\bigl|\|Au\|^2-1\bigr|>\epsilon\,\right]<\delta$.
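To see where the bound $k=O(\epsilon^{-2}\log n)$ in the $n$-point theorem comes from, one can apply the unit-vector theorem with $\delta=1/n^3$. The following worked bound is a sketch using the (unoptimized) constants of the Gaussian construction proved below.

```latex
% A sufficient choice of k, assuming the tail bound
% Pr[ | ||Au||^2 - 1 | > eps ] <= 2 exp(-eps^2 k / 8) proved in the next sections.
\[
  2e^{-\epsilon^2 k/8}\le\frac{1}{n^3}
  \quad\Longleftarrow\quad
  k\ge\frac{8}{\epsilon^2}\bigl(3\ln n+\ln 2\bigr)=O\!\left(\frac{\log n}{\epsilon^2}\right),
\]
\[
  \text{so the union bound over all }\binom{n}{2}<\frac{n^2}{2}\text{ pairs fails with probability at most }
  \frac{n^2}{2}\cdot\frac{1}{n^3}=\frac{1}{2n}.
\]
```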
For any $u\in\mathbb{R}^d$, consider the vector $Au\in\mathbb{R}^k$. Its $i$-th entry $(Au)_i=\sum_{j=1}^d A_{ij}u_j$ is determined by a distinct row vector $(A_{i1},\ldots,A_{id})$ of the matrix $A$, and therefore the entries $(Au)_1,\ldots,(Au)_k$ are mutually independent.
Moreover, since each entry $A_{ij}$ is drawn independently from $\mathcal{N}(0,1/k)$, by the additivity of independent Gaussian distributions (for independent $X_1\sim\mathcal{N}(\mu_1,\sigma_1^2)$, $X_2\sim\mathcal{N}(\mu_2,\sigma_2^2)$ and constants $a,b$, we have $aX_1+bX_2\sim\mathcal{N}(a\mu_1+b\mu_2,\,a^2\sigma_1^2+b^2\sigma_2^2)$), it holds that
$(Au)_i=\sum_{j=1}^d A_{ij}u_j\sim\mathcal{N}\left(0,\sum_{j=1}^d\frac{u_j^2}{k}\right)=\mathcal{N}\left(0,\frac{\|u\|^2}{k}\right)=\mathcal{N}\left(0,\frac{1}{k}\right)$,
which is precisely the distribution $\mathcal{N}(0,1/k)$ from which the entries of $A$ are drawn, because $u$ is a unit vector.
In summary, the random variable $\|Au\|^2$ that we are interested in can be written as
$\|Au\|^2=\sum_{i=1}^k(Au)_i^2=\sum_{i=1}^kY_i^2$,
where each $Y_i=(Au)_i$ independently follows the Gaussian distribution $\mathcal{N}(0,1/k)$.
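The following short simulation (an illustrative sketch, not part of the proof; the function name `empirical_tail` and all parameter values are ad hoc) checks numerically that $\|Au\|^2$ concentrates around $1$ for a random matrix with $\mathcal{N}(0,1/k)$ entries and a fixed unit vector $u$:

```python
import numpy as np

def empirical_tail(d=1000, k=200, eps=0.2, trials=500, seed=0):
    """Estimate Pr[ | ||Au||^2 - 1 | > eps ] by simulation for a fixed unit vector u."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                                   # fixed unit vector
    deviations = 0
    for _ in range(trials):
        A = rng.normal(0.0, np.sqrt(1.0 / k), size=(k, d))   # entries ~ N(0, 1/k)
        if abs(np.linalg.norm(A @ u) ** 2 - 1.0) > eps:
            deviations += 1
    return deviations / trials

if __name__ == "__main__":
    # The chi-square tail bound proved below predicts at most 2*exp(-eps^2*k/8) ~ 0.74 here;
    # the empirical frequency is typically far smaller, since the bound is not tight.
    print(empirical_tail())
```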
Concentration of $\chi^2$-distribution
By the above argument, the Johnson-Lindenstrauss theorem is proved by the following concentration inequality for the sum of the squares of $k$ Gaussian random variables.
Chernoff bound for sum-of-squares of Gaussian distributions
- For any positive $\epsilon,\delta<1/2$, there is a positive integer $k=O\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\right)$ such that for i.i.d. Gaussian random variables $Y_1,Y_2,\ldots,Y_k\sim\mathcal{N}(0,1/k)$,
$\Pr\left[\,\left|\sum_{i=1}^kY_i^2-1\right|>\epsilon\,\right]<\delta$.
First, we see that this is indeed a concentration inequality that bounds the deviation of a random variable from its mean.
For each $Y_i\sim\mathcal{N}(0,1/k)$, we have $\mathbb{E}[Y_i]=0$, and hence
$\mathbb{E}[Y_i^2]=\mathbf{Var}[Y_i]+\mathbb{E}[Y_i]^2=\frac{1}{k}$.
By linearity of expectation, it holds that
$\mathbb{E}\left[\sum_{i=1}^kY_i^2\right]=\sum_{i=1}^k\mathbb{E}[Y_i^2]=1$.
We prove an equivalent concentration bound stated for the $\chi^2$-distribution (the chi-squared distribution), i.e. the distribution of the sum of the squares of $k$ i.i.d. standard Gaussian random variables.
Chernoff bound for the $\chi^2$-distribution
- For i.i.d. standard Gaussian random variables $X_1,X_2,\ldots,X_k\sim\mathcal{N}(0,1)$ and any $0<\epsilon<1$,
$\Pr\left[\sum_{i=1}^kX_i^2>(1+\epsilon)k\right]<e^{-\epsilon^2k/8}$,
$\Pr\left[\sum_{i=1}^kX_i^2<(1-\epsilon)k\right]<e^{-\epsilon^2k/8}$.
Note that this indeed implies the above concentration bound by setting $X_i=\sqrt{k}\cdot Y_i$ and choosing $k\ge\frac{8}{\epsilon^2}\ln\frac{2}{\delta}=O\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\right)$, as spelled out below.
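For concreteness, here is the substitution written out (a routine verification, with $X_i=\sqrt{k}\,Y_i\sim\mathcal{N}(0,1)$; the constant $8$ comes from the tail bound above):

```latex
\[
  \Pr\left[\Bigl|\sum_{i=1}^k Y_i^2-1\Bigr|>\epsilon\right]
  =\Pr\left[\Bigl|\sum_{i=1}^k X_i^2-k\Bigr|>\epsilon k\right]
  <2e^{-\epsilon^2 k/8}\le\delta
  \qquad\text{whenever}\qquad
  k\ge\frac{8}{\epsilon^2}\ln\frac{2}{\delta}.
\]
```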
Proof. We first prove the upper tail. Let $\lambda>0$ be a parameter to be determined. By Markov's inequality applied to the moment generating function, and by the independence of the $X_i$'s,
$\Pr\left[\sum_{i=1}^kX_i^2>(1+\epsilon)k\right]=\Pr\left[e^{\lambda\sum_{i=1}^kX_i^2}>e^{(1+\epsilon)k\lambda}\right]\le\frac{\mathbb{E}\left[e^{\lambda\sum_{i=1}^kX_i^2}\right]}{e^{(1+\epsilon)k\lambda}}=\frac{\left(\mathbb{E}\left[e^{\lambda X_1^2}\right]\right)^k}{e^{(1+\epsilon)k\lambda}}.$
The moment generating function $\mathbb{E}\left[e^{\lambda X^2}\right]$ is computed by the following proposition.
Proposition
- For standard Gaussian $X\sim\mathcal{N}(0,1)$ and any $\lambda<1/2$, $\mathbb{E}\left[e^{\lambda X^2}\right]=\frac{1}{\sqrt{1-2\lambda}}$.
Proof.
$\mathbb{E}\left[e^{\lambda X^2}\right]=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{\lambda x^2}e^{-x^2/2}\,\mathrm{d}x=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{(1-2\lambda)x^2}{2}}\,\mathrm{d}x=\frac{1}{\sqrt{1-2\lambda}}\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{y^2}{2}}\,\mathrm{d}y=\frac{1}{\sqrt{1-2\lambda}},$
where the third equation is by the substitution $y=\sqrt{1-2\lambda}\,x$. The last equation is because $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-y^2/2}\,\mathrm{d}y$ is the total probability mass of a standard Gaussian distribution (which is obviously 1).
Then continue the above calculation: for any $0<\lambda<1/2$,
$\Pr\left[\sum_{i=1}^kX_i^2>(1+\epsilon)k\right]\le\frac{(1-2\lambda)^{-k/2}}{e^{(1+\epsilon)k\lambda}}=\left(\frac{e^{-2(1+\epsilon)\lambda}}{1-2\lambda}\right)^{k/2}.$
Choosing the optimal $\lambda=\frac{\epsilon}{2(1+\epsilon)}$, so that $1-2\lambda=\frac{1}{1+\epsilon}$, gives
$\Pr\left[\sum_{i=1}^kX_i^2>(1+\epsilon)k\right]\le\left((1+\epsilon)e^{-\epsilon}\right)^{k/2}\le e^{-\epsilon^2k/8},$
where the last inequality uses $\ln(1+\epsilon)\le\epsilon-\frac{\epsilon^2}{4}$ for $0<\epsilon<1$.
This proves the upper tail. The lower tail is proved symmetrically by applying the same argument with a negative parameter $\lambda<0$ and optimizing over $\lambda$; a sketch is given below.
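For completeness, here is a sketch of the symmetric lower-tail calculation (not spelled out above; the choice $\lambda=-\frac{\epsilon}{2(1-\epsilon)}$ is one convenient parameter, not claimed to be the one intended in the lecture):

```latex
\[
  \Pr\left[\sum_{i=1}^k X_i^2<(1-\epsilon)k\right]
  =\Pr\left[e^{\lambda\sum_{i=1}^k X_i^2}>e^{(1-\epsilon)k\lambda}\right]
  \le\frac{\left(\mathbb{E}\left[e^{\lambda X_1^2}\right]\right)^k}{e^{(1-\epsilon)k\lambda}}
  =\left(\frac{e^{-2(1-\epsilon)\lambda}}{1-2\lambda}\right)^{k/2}
  \qquad(\lambda<0),
\]
\[
  \text{and taking }\lambda=-\frac{\epsilon}{2(1-\epsilon)}\text{ gives }
  \left((1-\epsilon)e^{\epsilon}\right)^{k/2}\le e^{-\epsilon^2 k/4}\le e^{-\epsilon^2 k/8},
  \text{ using }\ln(1-\epsilon)\le-\epsilon-\frac{\epsilon^2}{2}.
\]
```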
As we argued above, this finishes the proof of the Johnson-Lindenstrauss Theorem.
Nearest Neighbor Search (NNS)
Nearest Neighbor Search (NNS)
- Data: a set $S$ of $n$ points from a metric space $(X,\mathrm{dist})$;
- Query: a point $x\in X$;
- Answer: a nearest neighbor of the query point $x$ among all data points, i.e. a data point $y^*\in S$ such that $\mathrm{dist}(x,y^*)=\min_{y\in S}\mathrm{dist}(x,y)$.
$c$-ANN (Approximate Nearest Neighbor): $c>1$ is an approximation ratio.
- Data: a set $S$ of $n$ points from a metric space $(X,\mathrm{dist})$;
- Query: a point $x\in X$;
- Answer: a $c$-approximate nearest neighbor of the query point $x$ among all data points, i.e. a data point $y^*\in S$ such that $\mathrm{dist}(x,y^*)\le c\cdot\min_{y\in S}\mathrm{dist}(x,y)$.
$(c,r)$-ANN (Approximate Near Neighbor): $c>1$ is an approximation ratio; $r>0$ is a radius.
- Data: a set $S$ of $n$ points from a metric space $(X,\mathrm{dist})$;
- Query: a point $x\in X$;
- Answer: a data point $y^*\in S$ with $\mathrm{dist}(x,y^*)\le c\cdot r$, if there exists a data point $y\in S$ with $\mathrm{dist}(x,y)\le r$; or "no" if there is no data point $y\in S$ with $\mathrm{dist}(x,y)\le c\cdot r$; in the remaining case, either answer is acceptable.
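As a point of reference, here is a naive linear-scan implementation of exact NNS and of the $(c,r)$-ANN decision version (an illustrative sketch; the function names `nearest_neighbor` and `near_neighbor` are ad hoc, and `dist` can be any metric):

```python
import math

def nearest_neighbor(S, x, dist):
    """Exact NNS by linear scan: O(n) distance evaluations per query."""
    return min(S, key=lambda y: dist(x, y))

def near_neighbor(S, x, r, c, dist):
    """(c,r)-ANN: return some y with dist(x,y) <= c*r if a point within distance r
    exists; return None if no point is within distance c*r; in between, either
    answer is acceptable."""
    for y in S:
        if dist(x, y) <= c * r:
            return y
    return None

if __name__ == "__main__":
    euclid = lambda p, q: math.dist(p, q)   # Euclidean metric on tuples
    S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    print(nearest_neighbor(S, (0.9, 1.2), euclid))                     # -> (1.0, 1.0)
    print(near_neighbor(S, (10.0, 10.0), r=1.0, c=2.0, dist=euclid))   # -> None
```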
Theorem
- For any instance $y_1,y_2,\ldots,y_n\in X$ of the nearest neighbor problem, let $D_{\min}=\min_{i\ne j}\mathrm{dist}(y_i,y_j)$ and $D_{\max}=\max_{i\ne j}\mathrm{dist}(y_i,y_j)$, and let $R=\frac{D_{\max}}{D_{\min}}$.
- Suppose that for any radius $r>0$ there always exists a data structure for the $(c,r)$-ANN problem, using space at most $s$, answering each query in time $t$, correctly with probability $1-\delta$.
- Then there exists a data structure for the $c(1+O(\epsilon))$-ANN problem, using space $O(s\log_{1+\epsilon}R)$, answering each query in time $O(t\log\log_{1+\epsilon}R)$, correctly with probability $1-O(\delta\log\log_{1+\epsilon}R)$.
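To make the reduction concrete, here is a minimal sketch of the binary-search idea behind it, assuming a hypothetical $(c,r)$-ANN oracle `ann_query(x, r)` (returns a data point within distance $c\cdot r$ of $x$, or `None` when no data point is within distance $r$) and assuming the query's nearest-neighbor distance lies in $[D_{\min},D_{\max}]$. It illustrates the scheme, not the exact data structure of the theorem.

```python
import math

def c_ann_query(x, ann_query, d_min, d_max, eps):
    """Sketch: answer a c(1+O(eps))-ANN query via binary search over the radii
    r_i = d_min * (1+eps)^i, i = 0..m, calling a (c,r)-ANN oracle at each step.

    Assumptions (hypothetical interface): ann_query(x, r) returns a data point
    within distance c*r of x, or None when no data point is within distance r;
    the true nearest-neighbor distance of x lies in [d_min, d_max].
    """
    m = math.ceil(math.log(d_max / d_min, 1 + eps))    # number of radius levels
    best = ann_query(x, d_min * (1 + eps) ** m)        # succeeds by assumption
    lo, hi = 0, m
    while lo < hi:                                     # O(log m) oracle calls
        mid = (lo + hi) // 2
        y = ann_query(x, d_min * (1 + eps) ** mid)
        if y is not None:
            best, hi = y, mid                          # success: try a smaller radius
        else:
            lo = mid + 1                               # failure: radius too small
    # The success pattern over radii is only approximately monotone (the oracle may
    # answer either way between r and c*r), which is where the extra (1+O(eps))
    # factor in the approximation ratio comes from.
    return best
```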