Advanced Algorithms (Fall 2019)/Dimension Reduction

Revision as of 08:29, 15 October 2019

= Metric Embedding =

A '''metric space''' is a pair <math>(X,d)</math>, where <math>X</math> is a set and <math>d</math> is a metric (or distance) on <math>X</math>, i.e., a function
:<math>d:X^2\to\mathbb{R}_{\ge 0}</math>
such that for any <math>x,y,z\in X</math>, the following axioms hold:
# (identity of indiscernibles) <math>d(x,y)=0\Leftrightarrow x=y</math>
# (symmetry) <math>d(x,y)=d(y,x)</math>
# (triangle inequality) <math>d(x,z)\le d(x,y)+d(y,z)</math>
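As an illustration (not part of the notes), the Euclidean distance on <math>\mathbf{R}^d</math> satisfies all three axioms. The sketch below spot-checks them numerically on random points; a numeric check can of course only sample the axioms, not prove them:

```python
import numpy as np

def d(x, y):
    """Euclidean (l2) distance on R^d."""
    return np.linalg.norm(x - y)

rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 8))        # three random points in R^8

assert d(x, x) == 0.0                        # identity of indiscernibles (one direction)
assert d(x, y) == d(y, x)                    # symmetry
assert d(x, z) <= d(x, y) + d(y, z) + 1e-12  # triangle inequality (float tolerance)
print("all three metric axioms hold on this sample")
```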

Let <math>(X,d_X)</math> and <math>(Y,d_Y)</math> be two metric spaces. A mapping
:<math>\phi:X\to Y</math>
is called an '''embedding''' of metric space <math>X</math> into <math>Y</math>. The embedding is said to have '''distortion''' <math>\alpha\ge1</math> if for any <math>x,y\in X</math> it holds that
:<math>\frac{1}{\alpha}\cdot d_X(x,y)\le d_Y(\phi(x),\phi(y))\le \alpha\cdot d_X(x,y)</math>.

In particular, when an embedding reduces the dimension of the metric space, such a metric embedding is usually called '''dimension reduction'''.
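For a finite point set, the smallest distortion <math>\alpha</math> achieved by a given mapping can be computed directly from the definition. The sketch below (an illustration, not from the notes) does this for a hypothetical embedding that simply keeps the first 10 coordinates, using the Euclidean metric on both sides:

```python
import numpy as np

def distortion(points, phi):
    """Smallest alpha >= 1 with d(x,y)/alpha <= d(phi(x),phi(y)) <= alpha*d(x,y)
    over all pairs of distinct points (Euclidean metric on both sides)."""
    alpha = 1.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d_xy = np.linalg.norm(points[i] - points[j])
            d_img = np.linalg.norm(phi(points[i]) - phi(points[j]))
            ratio = d_img / d_xy
            alpha = max(alpha, ratio, 1.0 / ratio)
    return alpha

rng = np.random.default_rng(42)
S = rng.standard_normal((20, 100))   # 20 points in R^100
proj = lambda x: x[:10]              # hypothetical embedding: keep first 10 coordinates
print(distortion(S, proj))           # some alpha > 1: the projection distorts distances
```

Coordinate projection is a poor embedding in general; the Johnson-Lindenstrauss theorem below shows that a better choice of linear map achieves distortion close to 1 in surprisingly few dimensions.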

= The Johnson-Lindenstrauss Theorem =

The '''Johnson-Lindenstrauss Theorem''' ('''JLT''') is a fundamental result for dimension reduction in Euclidean space.

Recall that in Euclidean space <math>\mathbf{R}^d</math>, for any two points <math>x,y\in\mathbf{R}^d</math>, the Euclidean distance between them is given by <math>\|x-y\|=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2+\cdots +(x_d-y_d)^2}</math>, where <math>\|\cdot\|=\|\cdot\|_2</math> denotes the Euclidean norm (a.k.a. the <math>\ell_2</math>-norm).

The JLT says that in Euclidean space, it is always possible to embed a set of <math>n</math> points in ''arbitrary'' dimension into <math>O(\log n)</math> dimensions with constant distortion. The theorem itself is stated formally as follows.

{{Theorem
|Johnson-Lindenstrauss Theorem, 1984|
:For any <math>0<\epsilon<1</math> and any positive integer <math>n</math>, there is a positive integer <math>k=O(\epsilon^{-2}\log n)</math> such that the following holds:
:For any set <math>S\subset\mathbf{R}^d</math> with <math>|S|=n</math>, where <math>d</math> is arbitrary, there is an embedding <math>\phi:\mathbf{R}^d\rightarrow\mathbf{R}^k</math> such that
::<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|\phi(x)-\phi(y)\|\le(1+\epsilon)\|x-y\|</math>.
}}

In fact, the embedding <math>\phi:\mathbf{R}^d\rightarrow\mathbf{R}^k</math> can be as simple as a linear transformation <math>A\in\mathbf{R}^{k\times d}</math> so that <math>\phi(x)=Ax</math> for any <math>x\in\mathbf{R}^{d}</math>. Therefore, the above JLT can be stated more precisely as follows.

{{Theorem
|Johnson-Lindenstrauss Theorem (linear embedding)|
:For any <math>0<\epsilon<1</math> and any positive integer <math>n</math>, there is a positive integer <math>k=O(\epsilon^{-2}\log n)</math> such that the following holds:
:For any set <math>S\subset\mathbf{R}^d</math> with <math>|S|=n</math>, where <math>d</math> is arbitrary, there is a linear transformation <math>A\in\mathbf{R}^{k\times d}</math> such that
::<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|Ax-Ay\|\le(1+\epsilon)\|x-y\|</math>.
}}

= Nearest Neighbor Search (NNS) =

= Locality-Sensitive Hashing (LSH) =