高级算法 (Fall 2019)/Dimension Reduction and Namelist Assignment4 2019: Difference between pages

VisualWikitext

Latest revision as of 00:42, 4 December 2019

学号（研究生）姓名

DZ1928004 刘尹成

MG1928002 陈旭

MG1928003 邓煜恒

MG1928005 龚丹毅

MG1928006 冀雅琴

MG1928007 康志杰

MG1928008 李敏

MG1928009 李同新

MG1928012 蔺惠娟

MG1928013 令狐飘

MG1928016 刘姝君

MG1928018 卢昱彤

MG1928019 陆晓娟

MG1928020 马晨明

MG1928026 石天润

MG1928027 谭洁

MG1928029 陶智

MG1928030 肖成龙

MG1928032 徐梓添

MG1928037 赵驿航

MG1928038 陈喆

MG1928039 都昊

MG1928045 彭蔚然

MG1928046 邱子键

MG1928053 姚靖心

MG1928054 战杨志豪

学号（本科生）姓名

161200070 赵志展

161158085 张昱培

161170043 王雅媛

161170054 游振宇

161158084 王译铭

161158040 马梦楠

161158029 栗卓

161170026 刘世淦

学号（交换生）姓名

198354018 張沁全

@@ Line 1: / Line 1: @@
-= Metric Embedding=
+学号（研究生）             姓名
-A '''metric space''' is a pair <math>(X,d)</math>, where <math>X</math> is a set and <math>d</math> is a '''metric''' (or '''distance''') on <math>X</math>, i.e., a function
-:<math>d:X^2\to\mathbb{R}_{\ge 0}</math>
-such that for any <math>x,y,z\in X</math>, the following axioms hold:
-# (identity of indiscernibles) <math>d(x,y)=0\Leftrightarrow x=y</math>
-# (symmetry) <math>d(x,y)=d(y,x)</math>
-# (triangle inequality) <math>d(x,z)\le d(x,y)+d(y,z)</math>
-Let <math>(X,d_X)</math> and <math>(Y,d_Y)</math> be two metric spaces. A mapping
+DZ1928004		刘尹成
-:<math>\phi:X\to Y</math>
-is called an '''embedding''' of metric space <math>X</math> into <math>Y</math>. The embedding is said to be with '''distortion'''  <math>\alpha\ge1</math> if for any <math>x,y\in X</math> it holds that
-:<math>\frac{1}{\alpha}\cdot d(x,y)\le d(\phi(x),\phi(y))\le \alpha\cdot d(x,y)</math>.
-In Computer Science, a typical scenario for the metric embedding is as follows. We want to solve some difficult computation problem on a metric space <math>(X,d)</math>. Instead of solving this problem directly on the original metric space, we embed the metric into a new metric space <math>(Y,d_Y)</math> (with low distortion) where the computation problem is much easier to solve.
+MG1928002		陈旭
+MG1928003		邓煜恒
+MG1928005		龚丹毅
+MG1928006		冀雅琴
+MG1928007		康志杰
+MG1928008		李敏
+MG1928009		李同新
+MG1928012		蔺惠娟
-One particular important case for the metric embedding is to embed a high-dimensional metric space to a new metric space whose dimension is much lower. This is called dimension reduction. This can be very helpful because various very common computation tasks can be very hard to solve on high-dimensional space due to the [https://en.wikipedia.org/wiki/Curse_of_dimensionality curse of dimensionality].
+MG1928013		令狐飘
-= The Johnson-Lindenstrauss Theorem =
+MG1928016		刘姝君
-The '''Johnson-Lindenstrauss Theorem''' or '''Johnson-Lindenstrauss Transformation''' (both shorten as '''JLT''') is a fundamental result for dimension reduction in Euclidian space.
+MG1928018		卢昱彤
+MG1928019		陆晓娟
+MG1928020		马晨明
+MG1928026		石天润
-Recall that in Euclidian space <math>\mathbf{R}^d</math>, for any two points <math>x,y\in\mathbf{R}^d</math>, the Euclidian distance between them is given by <math>\|x-y\|=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2+\cdots +(x_d-y_d)^2}</math>, where <math>\|\cdot\|=\|\cdot\|_2</math> denotes the Euclidian norm (a.k.a. the <math>\ell_2</math>-norm).
+MG1928027		谭洁
+MG1928029		陶智
+MG1928030		肖成龙
-The JLT says that in Euclidian space, it is always possible to embed a set of <math>n</math> points in ''arbitrary'' dimension to <math>O(\log n)</math> dimension with constant distortion. The theorem itself is stated formally as follows.
+MG1928032		徐梓添
-{{Theorem
+MG1928037		赵驿航
-|Johnson-Lindenstrauss Theorem, 1984|
-:For any <math>0<\epsilon<1/2</math> and any positive integer <math>n</math>, there is a positive integer <math>k=O(\epsilon^{-2}\log n)</math> such that the following holds:
-:For any set <math>S\subset\mathbf{R}^d</math> with <math>|S|=n</math>, where <math>d</math> is arbitrary, there is an embedding <math>\phi:\mathbf{R}^d\rightarrow\mathbf{R}^k</math> such that
-::<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|^2\le\|\phi(x)-\phi(y)\|^2\le(1+\epsilon)\|x-y\|^2</math>.
-}}
-The Johnson-Lindenstrauss Theorem is usually stated for the <math>\ell_2^2</math>-norm <math>\|\cdot\|^2</math> instead of the Euclidian norm <math>\|\cdot\|</math> itself. Note that this does not change anything other than the constant faction in <math>k=O(\epsilon^{-2}\log n)</math> because <math>(1\pm\epsilon)^{\frac{1}{2}}=1\pm\Theta(\epsilon)</math>. The reason for stating the theorem in <math>\ell_2^2</math>-norm is because <math>\|\cdot\|^2</math> is a sum (rather than a square root) which is easier to analyze.
+MG1928038		陈喆
+MG1928039		都昊
-In fact, the embedding <math>\phi:\mathbf{R}^d\rightarrow\mathbf{R}^k</math> can be as simple as a linear transformation <math>A\in\mathbf{R}^{k\times d}</math> so that <math>\phi(x)=Ax</math> for any <math>x\in\mathbf{R}^{d}</math>.
+MG1928045		彭蔚然
-Therefore, the above theorem can be stated more precisely as follows.
+MG1928046		邱子键
-{{Theorem
+MG1928053		姚靖心
-|Johnson-Lindenstrauss Theorem (linear embedding)|
-:For any <math>0<\epsilon<1/2</math> and any positive integer <math>n</math>, there is a positive integer <math>k=O(\epsilon^{-2}\log n)</math> such that the following holds:
-:For any set <math>S\subset\mathbf{R}^d</math> with <math>|S|=n</math>, where <math>d</math> is arbitrary, there is a linear transformation <math>A\in\mathbf{R}^{k\times d}</math> such that
-::<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|^2\le\|Ax-Ay\|^2\le(1+\epsilon)\|x-y\|^2</math>.
-}}
-The theorem is proved by the probabilistic method. Specifically, we construct a random matrix <math>A\in\mathbf{R}^{k\times d}</math> and show that with high probability (<math>1-O(1/n)</math>) it is a good embedding satisfying:
+MG1928054		战杨志豪
-:<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|\le\|Ax-Ay\|\le(1+\epsilon)\|x-y\|</math>.
-Therefore, if such random matrix <math>A</math> is efficient to construct, it immediately gives us an efficient randomized algorithm for dimension reduction in the Euclidian space.
-There are several such constructions of the random matrix <math>A\in\mathbf{R}^{k\times d}</math>, including:
+学号（本科生）		姓名
-* projection onto uniform random <math>k</math>-dimensional subspace of <math>\mathbf{R}^{d}</math>; (the original construction of Johnson and Lindenstrauss in 1984; a simplified analysis due to [https://cseweb.ucsd.edu/~dasgupta/papers/jl.pdf Dasgupta and Gupta in 1999])
-* random matrix with i.i.d. Gaussian entries; (due to [https://www.cs.princeton.edu/courses/archive/spr04/cos598B/bib/IndykM-curse.pdf Indyk and Motwani in 1998])
-* random matrix with i.i.d. -1/+1 entries; (due to [https://www.sciencedirect.com/science/article/pii/S0022000003000254?via%3Dihub#BIB8 Achlioptas in 2003])
-==JLT via Gaussian matrix==
+161200070		赵志展
-Here we prove the Johnson-Lindenstrauss theorem with the second construction, by the random matrix <math>A\in\mathbf{R}^{k\times d}</math> with i.i.d. Gaussian entries. The construction of <math>A</math> is very simple:
-:{|border="2" width="100%" cellspacing="4" cellpadding="3" rules="all" style="margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;"
-|
-*Each entry of <math>A\in\mathbf{R}^{k\times d}</math> is drawn independently from the Gaussian distribution <math>\mathcal{N}(0,1/k)</math>.
-|}
-Recall that a [https://en.wikipedia.org/wiki/Normal_distribution Gaussian distribution] <math>\mathcal{N}(\mu,\sigma^2)</math> is specified by its mean <math>\mu</math> and standard deviation <math>\sigma</math> such that for a random variable <math>X</math> distributed as <math>\mathcal{N}(\mu,\sigma^2)</math>, we have
-:<math>\mathbf{E}[X]=\mu</math> and <math>\mathbf{Var}[X]=\sigma^2</math>,
-and the probability density function is given by <math>p(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\mathrm{e}^{-\frac{(x-\mu)^2}{2\sigma^2}}</math>, therefore
-:<math>\Pr[X\le t]=\int_{-\infty}^t\frac{1}{\sqrt{2\pi\sigma^2}}\mathrm{e}^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x</math>.
-Fix any two points <math>x,y</math> out of the <math>n</math> points in <math>S\subset \mathbf{R}^d</math>, if we can show that
+161158085		张昱培
-:<math>\Pr\left[(1-\epsilon)\|x-y\|^2\le\|Ax-Ay\|^2\le(1+\epsilon)\|x-y\|^2\right]\ge 1-\frac{1}{n^3}</math>,
-then by the union bound, the following event
+161170043		王雅媛
-:<math>\forall x,y\in S,\quad (1-\epsilon)\|x-y\|^2\le\|Ax-Ay\|^2\le(1+\epsilon)\|x-y\|^2</math>
-holds with probability <math>1-O(1/n)</math>. The Johnson-Lindenstrauss theorem follows.
+161170054		游振宇
-Furthermore, dividing both sides of the inequalities <math>(1-\epsilon)\|x-y\|^2\le\|Ax-Ay\|^2\le(1+\epsilon)\|x-y\|^2</math> by the factor <math>\|x-y\|^2</math> gives us
+161158084		王译铭
-:<math>(1-\epsilon)\le\frac{\|Ax-Ay\|^2}{\|x-y\|^2}=\left\|A\frac{(x-y)}{\|x-y\|^2}\right\|^2\le(1+\epsilon)</math>,
-where <math>\frac{(x-y)}{\|x-y\|^2}</math> is a unit vector.
-Therefore, the Johnson-Lindenstrauss theorem is proved once the following theorem on the unit vector is proved.
+161158040		马梦楠
-{{Theorem
-|Theorem (JLT on unit vector)|
-:For any positive <math>\epsilon,\delta<1/2</math> there is a positive integer <math>k=O\left(\epsilon^{-2}\log \frac{1}{\delta}\right)</math> such that for the random matrix <math>A\in\mathbf{R}^{k\times d}</math> with each entry drawn independently from the Gaussian distribution <math>\mathcal{N}(0,1/k)</math>, for any unit vector <math>u\in\mathbf{R}^d</math> with <math>\|u\|=1</math>,
-::<math>\Pr\left[\left|\|Au\|^2-1\right|>\epsilon\right]<\delta</math>.
-}}
-For any <math>u\in\mathbf{R}^d</math>, we have <math>\|Au\|^2=\sum_{i=1}^k(Au)_i^2</math>, where the <math>(Au)_i</math>'s are independent of each other, because each
+161158029		栗卓
-:<math>(Au)_i=\langle A_{i\cdot},u\rangle=\sum_{j=1}^dA_{ij}u_j</math>
-is determined by a distinct row vector <math>A_{i\cdot}</math> of the matrix <math>A</math> with independent entries.
+161170026		刘世淦
-Moreover, since each entry <math>A_{ij}\sim\mathcal{N}(0,1/k)</math> independently and recall that for any two independent Gaussian random variables <math>X_1\sim \mathcal{N}(\mu_1,\sigma_1^2)</math> and <math>X_2\sim \mathcal{N}(\mu_2,\sigma_2^2)</math>, their weighted sum <math>aX_1+bX_2</math> follows the Gaussian distribution <math>\mathcal{N}(a\mu_1+b\mu_2,a^2\sigma_1^2+b^2\sigma_2^2)</math>, therefore <math>(Au)_i=\langle A_{i\cdot},u\rangle=\sum_{j=1}^dA_{ij}u_j</math> follows the Gaussian distribution
+学号（交换生）		姓名
-:<math>(Au)_i\sim\mathcal{N}\left(0,\sum_{j=1}^d\frac{u_j^2}{k}\right)</math>,
-which is precisely <math>\mathcal{N}\left(0,\frac{1}{k}\right)</math> if <math>u\in\mathbf{R}^d</math> is a unit vector, i.e. <math>\|u\|^2=\sum_{j=1}^du_j^2=1</math>.
+198354018		張沁全
-In summary, the random variable <math>\|Au\|^2</math> that we are interested in, can be represented as a sum-of-square form
-:<math>\|Au\|^2=\sum_{i=1}^kY_i^2</math>,
-where each <math>Y_i=(Au)_i</math> independently follows the Gaussian distribution <math>\mathcal{N}\left(0,\frac{1}{k}\right)</math>.
-==Concentration of <math>\chi^2</math>-distribution ==
-By the above argument, the Johnson-Lindenstrauss theorem is proved by the following concentration inequality for the sum of the squares of <math>k</math> Gaussian distributions.
-{{Theorem
-|Chernoff bound for sum-of-squares of Gaussian distributions|
-:For any positive <math>\epsilon,\delta<1/2</math> there is a positive integer <math>k=O\left(\epsilon^{-2}\log \frac{1}{\delta}\right)</math> such that for i.i.d. Gaussian random variables <math>Y_1,Y_2,\ldots,Y_k\sim\mathcal{N}(0,1/k)</math>,
-::<math>\Pr\left[\left|\sum_{i=1}^kY_i^2-1\right|>\epsilon\right]<\delta</math>.
-}}
-First, we see that this is indeed a concentration inequality that bounds the deviation of a random variable from its mean.
-For each <math>Y_i\sim\mathcal{N}\left(0,\frac{1}{k}\right)</math>, we know that <math>\mathbf{E}[Y_i]=0</math> and  <math>\mathbf{Var}[Y_i]=\mathbf{E}[Y_i^2]-\mathbf{E}[Y_i]^2=\frac{1}{k}</math>, thus
-:<math>\mathbf{E}[Y_i^2]=\mathbf{Var}[Y_i]+\mathbf{E}[Y_i]^2=\frac{1}{k}</math>.
-By linearity of expectation, it holds that
-:<math>\mathbf{E}\left[\sum_{i=1}^kY_i^2\right]=\sum_{i=1}^k\mathbf{E}\left[Y_i^2\right]=1</math>.
-We prove an equivalent concentration bound stated for the [https://en.wikipedia.org/wiki/Chi-squared_distribution '''<math>\chi^2</math>-distribution'''] (sum of the squares of <math>k</math> '''standard''' Gaussian distributions).
-{{Theorem
-|Chernoff bound for the <math>\chi^2</math>-distribution|
-:For i.i.d. standard Gaussian random variables <math>X_1,X_2,\ldots,X_k\sim\mathcal{N}(0,1)</math>,
-:*<math>\Pr\left[\sum_{i=1}^kX_i^2>(1+\epsilon)k\right]<\mathrm{e}^{-\epsilon^2k/8}</math>,
-:*<math>\Pr\left[\sum_{i=1}^kX_i^2<(1-\epsilon)k\right]<\mathrm{e}^{-\epsilon^2k/8}</math>.
-}}
-Note that this indeed implies the above concentration bound by setting <math>X_i=\sqrt{k}\cdot Y_i</math> and <math>\delta\ge\mathrm{e}^{-\epsilon^2k/8}</math>.
-{{Proof|
-We first prove the upper tail. Let <math>\lambda>0</math> be a parameter to be determined.
-:<math>
-\begin{align}
-\Pr\left[\sum_{i=1}^kX_i^2>(1+\epsilon)k\right]
-&=\Pr\left[\mathrm{e}^{\lambda \sum_{i=1}^kX_i^2}>\mathrm{e}^{(1+\epsilon)\lambda k}\right]\\
-&< \mathrm{e}^{-(1+\epsilon)\lambda k}\cdot \mathbf{E}\left[\mathrm{e}^{\lambda \sum_{i=1}^kX_i^2}\right] && \text{(Markov's inequality)}\\
-&= \mathrm{e}^{-(1+\epsilon)\lambda k}\cdot \prod_{i=1}^k \mathbf{E}\left[\mathrm{e}^{\lambda X_i^2}\right] && \text{(independence)}
-\end{align}
-</math>
-{{Theorem|Proposition|
-:For standard Gaussian <math>X\sim\mathcal{N}(0,1)</math>, <math>\mathbf{E}\left[\mathrm{e}^{\lambda X^2}\right]=\frac{1}{\sqrt{1-2\lambda}}</math>.
-}}
-{{Proof|
-:<math>
-\begin{align}
-\mathbf{E}\left[\mathrm{e}^{\lambda X^2}\right]
-&=
-\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathrm{e}^{\lambda x^2}\mathrm{e}^{-x^2/2}\,\mathrm{d}x\\
-&=
-\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathrm{e}^{-(1-2\lambda)x^2/2}\,\mathrm{d}x\\
-&=
-\frac{1}{\sqrt{1-2\lambda}}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathrm{e}^{-y^2/2}\,\mathrm{d}y &&(\text{set }y=\sqrt{1-2\lambda}x)\\
-&=\frac{1}{\sqrt{1-2\lambda}}.
-\end{align}
-</math>
-The last equation is because <math>\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathrm{e}^{-y^2/2}\,\mathrm{d}y</math> is the total probability mass of a standard Gaussian distribution (which is obviously 1).
-}}
-Then continue the above calculation:
-:<math>
-\begin{align}
-\Pr\left[\sum_{i=1}^kX_i^2>(1+\epsilon)k\right]
-&<
-\mathrm{e}^{-(1+\epsilon)\lambda k}\cdot \left(\frac{1}{\sqrt{1-2\lambda}}\right)^k
-&=
-\mathrm{e}^{-\epsilon\lambda k}\cdot \left(\frac{\mathrm{e}^{-\lambda}}{\sqrt{1-2\lambda}}\right)^k
-\end{align}
-</math>
-}}
-= Nearest Neighbor Search (NNS)=
-= Locality-Sensitive Hashing (LSH)=

高级算法 (Fall 2019)/Dimension Reduction and Namelist Assignment4 2019: Difference between pages

Latest revision as of 00:42, 4 December 2019

Navigation menu

Search