imported>Etone |
imported>Etone |
Line 1: |
Line 1: |
| = Metric Embedding=
| | 学号(研究生) 姓名 |
| A '''metric space''' is a pair <math>(X,d)</math>, where <math>X</math> is a set and <math>d</math> is a '''metric''' (or '''distance''') on <math>X</math>, i.e., a function
| |
| :<math>d:X^2\to\mathbb{R}_{\ge 0}</math>
| |
| such that for any <math>x,y,z\in X</math>, the following axioms hold:
| |
| # (identity of indiscernibles) <math>d(x,y)=0\Leftrightarrow x=y</math>
| |
| # (symmetry) <math>d(x,y)=d(y,x)</math>
| |
| # (triangle inequality) <math>d(x,z)\le d(x,y)+d(y,z)</math>
| |
|
| |
|
| Let <math>(X,d_X)</math> and <math>(Y,d_Y)</math> be two metric spaces. A mapping
| | DZ1928004 刘尹成 |
| :<math>\phi:X\to Y</math>
| |
| is called an '''embedding''' of metric space <math>X</math> into <math>Y</math>. The embedding is said to be with '''distortion''' <math>\alpha\ge1</math> if for any <math>x,y\in X</math> it holds that
| |
| :<math>\frac{1}{\alpha}\cdot d(x,y)\le d(\phi(x),\phi(y))\le \alpha\cdot d(x,y)</math>.
| |
|
| |
|
| In Computer Science, a typical scenario for the metric embedding is as follows. We want to solve some difficult computation problem on a metric space <math>(X,d)</math>. Instead of solving this problem directly on the original metric space, we embed the metric into a new metric space <math>(Y,d_Y)</math> (with low distortion) where the computation problem is much easier to solve.
| | MG1928002 陈旭 |
| | |
| | MG1928003 邓煜恒 |
| | |
| | MG1928005 龚丹毅 |
| | |
| | MG1928006 冀雅琴 |
| | |
| | MG1928007 康志杰 |
| | |
| | MG1928008 李敏 |
| | |
| | MG1928009 李同新 |
| | |
| | MG1928012 蔺惠娟 |
|
| |
|
| One particular important case for the metric embedding is to embed a high-dimensional metric space to a new metric space whose dimension is much lower. This is called dimension reduction. This can be very helpful because various very common computation tasks can be very hard to solve on high-dimensional space due to the [https://en.wikipedia.org/wiki/Curse_of_dimensionality curse of dimensionality].
| | MG1928013 令狐飘 |
|
| |
|
| = The Johnson-Lindenstrauss Theorem =
| | MG1928016 刘姝君 |
| The '''Johnson-Lindenstrauss Theorem''' or '''Johnson-Lindenstrauss Transformation''' (both shorten as '''JLT''') is a fundamental result for dimension reduction in Euclidian space.
| | |
| | MG1928018 卢昱彤 |
| | |
| | MG1928019 陆晓娟 |
| | |
| | MG1928020 马晨明 |
| | |
| | MG1928026 石天润 |
|
| |
|
| Recall that in Euclidian space <math>\mathbf{R}^d</math>, for any two points <math>x,y\in\mathbf{R}^d</math>, the Euclidian distance between them is given by <math>\|x-y\|=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2+\cdots +(x_d-y_d)^2}</math>, where <math>\|\cdot\|=\|\cdot\|_2</math> denotes the Euclidian norm (a.k.a. the <math>\ell_2</math>-norm).
| | MG1928027 谭洁 |
| | |
| | MG1928029 陶智 |
| | |
| | MG1928030 肖成龙 |
|
| |
|
| The JLT says that in Euclidian space, it is always possible to embed a set of <math>n</math> points in ''arbitrary'' dimension to <math>O(\log n)</math> dimension with constant distortion. The theorem itself is stated formally as follows.
| | MG1928032 徐梓添 |
|
| |
|
| {{Theorem
| | MG1928037 赵驿航 |
| |Johnson-Lindenstrauss Theorem, 1984|
| |
| :For any <math>0<\epsilon<1</math> and any positive integer <math>n</math>, there is a positive integer <math>k=O(\epsilon^{-2}\log n)</math> such that the following holds:
| |
| :For any set <math>S\subset\mathbf{R}^d</math> with <math>|S|=n</math>, where <math>d</math> is arbitrary, there is an embedding <math>\phi:\mathbf{R}^d\rightarrow\mathbf{R}^k</math> such that
| |
| ::<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|\phi(x)-\phi(y)\|\le(1+\epsilon)\|x-y\|</math>.
| |
| }}
| |
|
| |
|
| In fact, the embedding <math>\phi:\mathbf{R}^d\rightarrow\mathbf{R}^k</math> can be as simple as a linear transformation <math>A\in\mathbf{R}^{d\times k}</math> so that <math>\phi(x)=Ax</math> for any <math>x\in\mathbf{R}^{d}</math>.
| | MG1928038 陈喆 |
| Therefore, the above JLT can be stated more precisely as follows.
| | |
| | MG1928039 都昊 |
|
| |
|
| {{Theorem
| | MG1928045 彭蔚然 |
| |Johnson-Lindenstrauss Theorem (linear embedding)|
| | |
| :For any <math>0<\epsilon<1</math> and any positive integer <math>n</math>, there is a positive integer <math>k=O(\epsilon^{-2}\log n)</math> such that the following holds:
| | MG1928046 邱子键 |
| :For any set <math>S\subset\mathbf{R}^d</math> with <math>|S|=n</math>, where <math>d</math> is arbitrary, there is a linear transformation <math>A\in\mathbf{R}^{d\times k}</math> such that
| |
| ::<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|Ax-Ay\|\le(1+\epsilon)\|x-y\|</math>.
| |
| }}
| |
|
| |
|
| The theorem is proved by the probabilistic method. Specifically, we construct a random matrix <math>A\in\mathbf{R}^{d\times k}</math> and show that with high probability (<math>1-O(1/n)</math>) it is a good embedding satisfying:
| | MG1928053 姚靖心 |
| :<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|Ax-Ay\|\le(1+\epsilon)\|x-y\|</math>.
| |
| Therefore, if such random matrix <math>A\in\mathbf{R}^{d\times k}</math> is efficient to construct, it immediately gives us an efficient randomized algorithm for dimension reduction in the Euclidian space.
| |
|
| |
|
| There are several such constructions of the random matrix <math>A\in\mathbf{R}^{d\times k}</math>, including:
| | MG1928054 战杨志豪 |
| * projection onto uniform random <math>k</math>-dimensional subspace of <math>\mathbf{R}^{d}</math>; (The original construction of Johnson and Lindenstrauss in 1984; a simplified analysis of Dasgupta and Gupta in 1999)
| |
| * random matrix with i.i.d. Gaussian entries; (due to Indyk and Motwani in 1998)
| |
| * random matrix with i.i.d. -1/+1 entries; (due to Achlioptas in 2003)
| |
|
| |
|
| ==JLT via Gaussian matrix==
| | 学号(本科生) 姓名 |
|
| |
|
| ==Concentration of <math>\chi^2</math>-distribution ==
| | 161200070 赵志展 |
|
| |
|
| = Nearest Neighbor Search (NNS)=
| | 161158085 张昱培 |
| | |
| | 161170043 王雅媛 |
| | |
| | 161170054 游振宇 |
|
| |
|
| = Locality-Sensitive Hashing (LSH)=
| | 161158084 王译铭 |
| | |
| | 161158040 马梦楠 |
| | |
| | 161158029 栗卓 |
| | |
| | 161170026 刘世淦 |
| | |
| | 学号(交换生) 姓名 |
| | |
| | 198354018 張沁全 |