高级算法 (Fall 2019)/Dimension Reduction
Metric Embedding
A metric space is a pair [math]\displaystyle{ (X,d) }[/math], where [math]\displaystyle{ X }[/math] is a set and [math]\displaystyle{ d }[/math] is a metric (or distance) on [math]\displaystyle{ X }[/math], i.e., a function
- [math]\displaystyle{ d:X^2\to\mathbb{R}_{\ge 0} }[/math]
such that for any [math]\displaystyle{ x,y,z\in X }[/math], the following axioms hold:
- (identity of indiscernibles) [math]\displaystyle{ d(x,y)=0\Leftrightarrow x=y }[/math]
- (symmetry) [math]\displaystyle{ d(x,y)=d(y,x) }[/math]
- (triangle inequality) [math]\displaystyle{ d(x,z)\le d(x,y)+d(y,z) }[/math]
Let [math]\displaystyle{ (X,d_X) }[/math] and [math]\displaystyle{ (Y,d_Y) }[/math] be two metric spaces. A mapping
- [math]\displaystyle{ \phi:X\to Y }[/math]
is called an embedding of metric space [math]\displaystyle{ X }[/math] into [math]\displaystyle{ Y }[/math]. The embedding is said to with distortion [math]\displaystyle{ \alpha\ge1 }[/math] if for any [math]\displaystyle{ x,y\in X }[/math] it holds that
- [math]\displaystyle{ \frac{1}{\alpha}\cdot d(x,y)\le d(\phi(x),\phi(y))\le \alpha\cdot d(x,y) }[/math].
In particular, when an embedding reduces the dimension of the metric space, such metric embedding is usually called dimension reduction.
The Johnson-Lindenstrauss Theorem
The Johnson-Lindenstrauss Theorem (JLT) is a fundamental result for dimension reduction in Euclidian space.
Recall that in Euclidian space [math]\displaystyle{ \mathbf{R}^d }[/math], for any two points [math]\displaystyle{ x,y\in\mathbf{R}^d }[/math], the Euclidian distance between them is given by [math]\displaystyle{ \|x-y\|=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2+\cdots +(x_d-y_d)^2} }[/math], where [math]\displaystyle{ \|\cdot\|=\|\cdot\|_2 }[/math] denotes the Euclidian norm (a.k.a. the [math]\displaystyle{ \ell_2 }[/math]-norm).
The JLT says that in Euclidian space, it is always possible to embed a set of [math]\displaystyle{ n }[/math] points in arbitrary dimension to [math]\displaystyle{ O(\log n) }[/math] dimension with constant distortion. The theorem itself is stated formally as follows.
Johnson-Lindenstrauss Theorem, 1984 - For any [math]\displaystyle{ 0\lt \epsilon\lt 1 }[/math] and any positive integer [math]\displaystyle{ n }[/math], there is a positive integer [math]\displaystyle{ k=O(\epsilon^{-2}\log n) }[/math] such that the following holds:
- For any set [math]\displaystyle{ S\subset\mathbf{R}^d }[/math] with [math]\displaystyle{ |S|=n }[/math], where [math]\displaystyle{ d }[/math] is arbitrary, there is an embedding [math]\displaystyle{ \phi:\mathbf{R}^d\rightarrow\mathbf{R}^k }[/math] such that
- [math]\displaystyle{ \forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|\phi(x)-\phi(y)\|\le(1+\epsilon)\|x-y\| }[/math].
In fact, the embedding [math]\displaystyle{ \phi:\mathbf{R}^d\rightarrow\mathbf{R}^k }[/math] can be as simple as a linear transformation [math]\displaystyle{ A\in\mathbf{R}^{d\times k} }[/math] so that [math]\displaystyle{ \phi(x)=Ax }[/math] for any [math]\displaystyle{ x\in\mathbf{R}^{d} }[/math]. Therefore, the above JLT can be stated more precisely as follows.
Johnson-Lindenstrauss Theorem (linear embedding) - For any [math]\displaystyle{ 0\lt \epsilon\lt 1 }[/math] and any positive integer [math]\displaystyle{ n }[/math], there is a positive integer [math]\displaystyle{ k=O(\epsilon^{-2}\log n) }[/math] such that the following holds:
- For any set [math]\displaystyle{ S\subset\mathbf{R}^d }[/math] with [math]\displaystyle{ |S|=n }[/math], where [math]\displaystyle{ d }[/math] is arbitrary, there is a linear transformation [math]\displaystyle{ A\in\mathbf{R}^{d\times k} }[/math] such that
- [math]\displaystyle{ \forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|Ax-Ay\|\le(1+\epsilon)\|x-y\| }[/math].