高级算法 (Fall 2019)/Problem Set 2

*Submit the electronic version of your homework to <font color=blue>njuadvalg@163.com</font> by 23:59 on 2019/11/5.
*Every solution must include a <font color="red" size=5>complete solution process</font>. Either Chinese or English is acceptable.


== Problem 1 ==
Let <math>X</math> be a real-valued random variable with finite <math>\mathbb{E}[X]</math> and finite <math>\mathbb{E}\left[\mathrm{e}^{\lambda X}\right]</math> for all <math>\lambda\ge 0</math>. We define the '''log-moment-generating function''' as
:<math>\Psi_X(\lambda):=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0</math>,
and its ''dual function'':
:<math>\Psi_X^*(t):=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda))</math>.
Assume that <math>X</math> is NOT almost surely constant. Then due to the convexity of <math>\mathrm{e}^{\lambda X}</math> with respect to <math>\lambda</math>, the function <math>\Psi_X(\lambda)</math> is ''strictly'' convex over <math>\lambda\ge 0</math>.
*Prove the following Chernoff bound:
::<math>\Pr[X\ge t]\le\exp(-\Psi_X^*(t))</math>.
:In particular, if <math>\Psi_X(\lambda)</math> is continuously differentiable, prove that the supremum in <math>\Psi_X^*(t)</math> is attained at the unique <math>\lambda\ge 0</math> satisfying
::<math>\Psi_X'(\lambda)=t</math>
:where <math>\Psi_X'(\lambda)</math> denotes the derivative of <math>\Psi_X(\lambda)</math> with respect to <math>\lambda</math>.
 
*'''Normal random variables.''' Let <math>X\sim \mathrm{N}(\mu,\sigma)</math> be a Gaussian random variable with mean <math>\mu</math> and standard deviation <math>\sigma</math>. What are <math>\Psi_X(\lambda)</math> and <math>\Psi_X^*(t)</math>? Give a tail inequality that upper bounds the probability <math>\Pr[X\ge t]</math>.
 
*'''Poisson random variables.''' Let <math>X\sim \mathrm{Pois}(\nu)</math> be a Poisson random variable with parameter <math>\nu</math>, that is, <math>\Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k!</math> for all <math>k=0,1,2,\ldots</math>. What are <math>\Psi_X(\lambda)</math> and <math>\Psi_X^*(t)</math>? Give a tail inequality that upper bounds the probability <math>\Pr[X\ge t]</math>.
 
*'''Bernoulli random variables.''' Let <math>X\in\{0,1\}</math> be a single Bernoulli trial with probability of success <math>p</math>, that is, <math>\Pr[X=1]=1-\Pr[X=0]=p</math>. Show that for any <math>t\in(p,1)</math>, we have <math>\Psi_X^*(t)=D(Y \| X)</math> where <math>Y\in\{0,1\}</math> is a Bernoulli random variable with parameter <math>t</math> and <math>D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p}</math> is the [https://en.wikipedia.org/wiki/Kullback–Leibler_divergence '''Kullback-Leibler divergence'''] between <math>Y</math> and <math>X</math>.
 
*'''Sum of independent random variables.''' Let <math>X=\sum_{i=1}^nX_i</math> be the sum of <math>n</math> independent and identically distributed random variables <math>X_1,X_2,\ldots, X_n</math>. Show that <math>\Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda)</math> and <math>\Psi_X^*(t)=n\Psi^*_{X_i}(\frac{t}{n})</math>. Also, for a binomial random variable <math>X\sim \mathrm{Bin}(n,p)</math>, give an upper bound on the tail probability <math>\Pr[X\ge t]</math> in terms of the KL divergence.
 
:Give an upper bound on <math>\Pr[X\ge t]</math> when every <math>X_i</math> follows the geometric distribution with success probability <math>p</math>.
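
Below is an optional numerical sketch (not part of the required solutions) that estimates <math>\Psi_X(\lambda)</math> by Monte Carlo sampling and approximates <math>\Psi_X^*(t)</math> by a grid search over <math>\lambda\ge 0</math>; it can serve as a sanity check for the closed forms derived by hand. The distribution, sample size and grid used here are arbitrary choices.

<pre>
# Sketch: Monte Carlo estimate of the log-MGF Psi_X(lambda) = ln E[exp(lambda*X)]
# and of its dual Psi_X^*(t) = sup_{lambda >= 0} (lambda*t - Psi_X(lambda)),
# the supremum being approximated by a grid search over lambda.
import numpy as np

rng = np.random.default_rng(0)

def log_mgf(samples, lam):
    # ln E[e^{lam*X}], estimated from i.i.d. samples of X
    return np.log(np.mean(np.exp(lam * samples)))

def dual(samples, t, lambdas):
    # maximum over the grid of lambda >= 0 of (lambda*t - Psi_X(lambda))
    return max(lam * t - log_mgf(samples, lam) for lam in lambdas)

# Example: X ~ Pois(nu); compare exp(-Psi*(t)) with the empirical tail Pr[X >= t].
nu, t = 2.0, 5
samples = rng.poisson(nu, size=200_000)
lambdas = np.linspace(0.0, 5.0, 501)
print("Chernoff bound ~", np.exp(-dual(samples, t, lambdas)))
print("empirical tail ~", np.mean(samples >= t))
</pre>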


== Problem 2 ==
An <math>n</math>-dimensional hypercube <math>Q_n</math> is a graph with <math>2^n</math> vertices, where each vertex is represented by an <math>n</math>-bit vector, and there is an edge between two vertices if and only if their bit-vectors differ in exactly one bit.

Consider an <math>n</math>-dimensional hypercube in which the vertices of some non-empty subset <math>S</math> are marked black. Let <math>f(u)</math> denote the shortest distance from vertex <math>u</math> to any black vertex. Formally,
<math>f(u) = \min_{v \in S}\mathrm{dist}_{Q_n}(u,v),</math>
 
where <math>\mathrm{dist}_{Q_n}(u,v)</math> denotes the length of the shortest path between <math>u</math> and <math>v</math> in the graph <math>Q_n</math>.
 
Prove that if we choose <math>u</math> uniformly at random from all <math>2^n</math> vertices, then, with high probability, <math>f(u)</math> does not deviate too much from its expectation:
 
<math>\mathrm{Pr}[|f(u) - \mathbb{E}[f(u)]| \geq t\sqrt{n \log n}] \leq n^{-c}.</math>
 
Give the relation between <math>c</math> and <math>t</math>.
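
The following optional simulation sketch (with an arbitrary small dimension <math>n</math> and an arbitrary random black set <math>S</math>) computes <math>f(u)</math> for every vertex and reports how far it spreads around its mean; it only illustrates the concentration phenomenon and is not part of the proof.

<pre>
# Sketch: empirical spread of f(u) = min_{v in S} dist_{Q_n}(u, v) on a small hypercube.
# dist_{Q_n}(u, v) equals the Hamming distance between the bit-vectors of u and v.
# The dimension n and the size of the black set S are arbitrary choices.
import random

random.seed(0)
n = 12                                  # dimension of Q_n, so 2^n = 4096 vertices
S = random.sample(range(2 ** n), 8)     # a non-empty set of black vertices

def f(u):
    # shortest distance from u to any black vertex = minimum Hamming distance
    return min(bin(u ^ v).count("1") for v in S)

values = [f(u) for u in range(2 ** n)]  # u ranges over all 2^n vertices
mean = sum(values) / len(values)
print("E[f(u)] ~", round(mean, 3))
print("max |f(u) - E[f(u)]| =", round(max(abs(x - mean) for x in values), 3))
</pre>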


== Problem 3 ==
Let <math>Y_1,Y_2,Y_3,\ldots,Y_n</math> be a set of <math>n</math> random variables where each <math>Y_i \in \{0,1\}</math>. All variables <math>Y_i</math> follow some joint distribution over <math>\{0,1\}^n</math> and they may NOT be mutually independent. Assume that the following property holds for <math>(Y_i)_{1\leq i \leq n}</math>: for any <math>1\leq i \leq n</math>, it holds that
 
<math>\forall c_1\in \{0,1\},c_2\in \{0,1\},\ldots,c_{i-1}\in \{0,1\} , \quad \Pr[Y_i = 1 \mid \forall 1\leq j<i,  Y_j = c_j  ] \leq p.</math>
 
Let <math>X_1,X_2,X_3,\ldots,X_n</math> be a set of <math>n</math> mutually independent random variables where each <math>X_i \in \{0,1\}</math>. Assume
 
<math>\forall 1\leq i \leq n:\quad \Pr[X_i = 1] = p.</math>
 
Prove that for any <math>a \geq 0</math>, it holds that
 
<math>\Pr\left[\sum_{i= 1}^n Y_i \geq a\right] \leq \Pr\left[\sum_{i= 1}^n X_i \geq a\right].</math>
 
Prove that for any <math>t \geq 0</math>, it holds that
 
<math>\Pr\left[\sum_{i= 1}^n Y_i \geq np + t\right] \leq \exp\left( -\frac{2t^2}{n}\right).</math>
 
 
<strong>Hint</strong>: Although the random variables <math>Y_1,Y_2,Y_3,\ldots,Y_n</math> may not be mutually independent, we can still bound the tail probability of <math>\sum_{i=1}^nY_i</math>. The tool for doing so is called stochastic dominance.
 
To prove the first inequality, you only need to construct a coupling <math>\mathcal{C}</math> between <math>(X_i)_{1\leq i \leq n}</math>  and <math> (Y_i)_{ 1\leq i\leq n }</math> such that
 
<math> \Pr_{\mathcal{C}}[\,\forall 1\leq i \leq n, Y_i \leq X_i\,] = 1.</math>


This implies the random sequence  <math>(Y_i)_{1\leq i \leq n}</math> is stochastically dominated by the random sequence  <math>(X_i)_{1\leq i \leq n}</math>.
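
As an optional illustration (not the required coupling argument), the sketch below builds one toy dependent sequence <math>(Y_i)</math> whose conditional success probabilities never exceed <math>p</math>, and compares its empirical tail with that of the i.i.d. <math>X_i</math>; the particular dependence and all numbers are arbitrary choices.

<pre>
# Sketch: a toy dependent sequence (Y_i) with Pr[Y_i = 1 | history] <= p,
# compared against i.i.d. X_i with Pr[X_i = 1] = p, to illustrate the claimed
# tail domination.  The specific dependence below is an arbitrary example,
# not the coupling required in the proof.
import random

random.seed(1)
n, p, a, trials = 100, 0.3, 40, 20_000

def sum_Y():
    ys = []
    for _ in range(n):
        # success probability drops to p/2 right after a success, so it is always <= p
        q = p / 2 if ys and ys[-1] == 1 else p
        ys.append(1 if random.random() < q else 0)
    return sum(ys)

def sum_X():
    return sum(1 if random.random() < p else 0 for _ in range(n))

tail_Y = sum(sum_Y() >= a for _ in range(trials)) / trials
tail_X = sum(sum_X() >= a for _ in range(trials)) / trials
print("Pr[sum Y_i >= a] ~", tail_Y, "<=", "Pr[sum X_i >= a] ~", tail_X)
</pre>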


== Problem 4 ==
Let <math>U</math> be a universal set.
We use <math>2^U</math> to denote the collection of all subsets of <math>U</math>.
Let <math>\mathcal{F}</math> be a family of hash functions, in which each hash function <math>h:2^U \rightarrow \{0,1\}^m</math> maps subsets of <math>U</math> to 0-1 vectors of length <math>m</math>.
A locality sensitive hashing scheme is a distribution on a family <math>\mathcal{F}</math> of hash functions operating on <math>2^U</math>, such that for two subsets <math>A,B \in 2^U</math>,
 
<math>(1) \qquad \Pr_{h\in\mathcal{F}}[h(A)=h(B)]=sim(A,B).</math>
 
Here <math>sim:2^U \times 2^U \rightarrow [0,1] </math>  is called the similarity function. Given a hash function family <math> \mathcal{F} </math> that satisfies Equation (1), we will say that <math>\mathcal{F}</math> is a locality sensitive hash function family corresponding to similarity function <math>sim(\cdot,\cdot)</math>.
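
For concreteness (this is background, not part of the questions below), the classical min-wise hashing family is locality sensitive in the sense of Equation (1) for the Jaccard similarity <math>\frac{|A\cap B|}{|A\cup B|}</math>; its hash value is an element of <math>U</math>, which can be encoded as a bit string to match the range <math>\{0,1\}^m</math>. A minimal simulation sketch, with an arbitrary universe and arbitrary sets:

<pre>
# Sketch: min-wise hashing, the standard LSH family for Jaccard similarity,
# i.e. Pr_h[h(A) = h(B)] equals the Jaccard similarity of A and B.
# Universe, sets and number of sampled hash functions are arbitrary choices.
import random

random.seed(2)
U = list(range(50))
A = set(range(0, 30))
B = set(range(15, 45))

def minhash(s, rank):
    # the element of s that comes first under the random ordering 'rank'
    return min(s, key=lambda x: rank[x])

trials, collisions = 20_000, 0
for _ in range(trials):
    rank = {x: r for x, r in zip(U, random.sample(range(len(U)), len(U)))}
    if minhash(A, rank) == minhash(B, rank):
        collisions += 1

print("empirical collision rate ~", collisions / trials)
print("Jaccard similarity       =", len(A & B) / len(A | B))
</pre>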
 
*For any similarity function <math>sim(A,B)</math> that admits a locality sensitive hash function family as defined in Equation (1), prove that the distance function <math>d(A,B) \triangleq 1-sim(A,B)</math> satisfies the triangle inequality; formally,
 
<math> \forall A,B,C \in 2^U :\quad d(A,B) + d(B,C) \geq d(A,C).</math>
 
*Show that there is no locality sensitive hash function family corresponding to Dice's coefficient or the Overlap coefficient. Dice's coefficient is defined as:
 
<math>sim_{Dice}(A,B)=\frac{2|A\cap B|}{|A| + |B|}.</math>


Overlap coefficient is defined as:


<math>sim_{Ovl}(A,B) = \frac{|A\cap B|}{\min(|A|, |B|)}.</math>


<font color = "blue">Hint: use the triangle inequality result. </font>


* Construct a collection of hash functions <math>\mathcal{B}</math>, where <math>f : \{0,1\}^m \rightarrow \{0,1\}</math> for each <math>f \in \mathcal{B}</math>, together with a probability distribution on <math>\mathcal{B}</math> such that
 
<math>\forall x, y \in \{0,1\}^m:\quad
  \Pr_{f \in \mathcal{B}}[f(x) = f(y)] = \begin{cases}
1 &\text{ if } x= y;\\
\frac{1}{2} &\text{ if } x \neq y.
\end{cases}
</math>


Then use the hash function family <math>\mathcal{B}</math> to prove the following result.
Given a locality sensitive hash function family <math>\mathcal{F}</math> (<math>h:2^U \rightarrow \{0,1\}^m</math> for each <math>h \in \mathcal{F}</math>) corresponding to a similarity function <math>sim(A,B)</math>, we can obtain a locality sensitive hash function family <math>\mathcal{F}'</math> (<math>h':2^U \rightarrow \{0,1\}</math> for each <math>h' \in \mathcal{F}'</math>) corresponding to the similarity function <math>\frac{1+sim(A,B)}{2}</math>.
