== Lovász Local Lemma ==
Given a sequence of events <math>A_1,A_2,\ldots,A_n</math>, we use the '''dependency graph''' to describe the dependencies between these events.
{{Theorem
|Definition|
:Let <math>A_1,A_2,\ldots,A_n</math> be a sequence of events. A graph <math>D=(V,E)</math> on the set of vertices <math>V=\{1,2,\ldots,n\}</math> is called a '''dependency graph''' for the events <math>A_1,\ldots,A_n</math> if for each <math>i</math>, <math>1\le i\le n</math>, the event <math>A_i</math> is mutually independent of all the events <math>\{A_j\mid (i,j)\not\in E\}</math>.
}}
The notion of mutual independence between an event and a set of events is formally defined as follows.
{{Theorem|Definition|
:An event <math>A</math> is said to be '''mutually independent''' of events <math>B_1,B_2,\ldots, B_k</math>, if for any disjoint <math>I^+,I^-\subseteq\{1,2,\ldots,k\}</math>, it holds that
::<math>\Pr\left[A\mid \bigwedge_{i\in I^+}B_i\wedge \bigwedge_{i\in I^-}\overline{B_i}\right]=\Pr[A]</math>.
}}
;Example
:Let <math>X_1,X_2,\ldots,X_m</math> be a set of ''mutually independent'' random variables. Each event <math>A_i</math> is a predicate defined on a number of variables among <math>X_1,X_2,\ldots,X_m</math>. Let <math>v(A_i)</math> be the unique smallest set of variables which determine <math>A_i</math>. The dependency graph <math>D=(V,E)</math> is defined by
:::<math>(i,j)\in E</math> iff <math>v(A_i)\cap v(A_j)\neq \emptyset</math>.
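The construction in the example can be sketched directly in code. The following is a minimal sketch (the events and variable sets are hypothetical placeholders, not from the text): given the determining variable sets <math>v(A_i)</math>, connect two events whenever their sets intersect.

```python
def dependency_graph(var_sets):
    """Given v(A_1), ..., v(A_n) as sets of variable indices, return
    the edge set {(i, j) : v(A_i) and v(A_j) intersect, i < j}."""
    n = len(var_sets)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if var_sets[i] & var_sets[j]}

# Hypothetical example: A_1 depends on {X_1, X_2}, A_2 on {X_2, X_3},
# and A_3 on {X_4} alone; only A_1 and A_2 share a variable.
edges = dependency_graph([{1, 2}, {2, 3}, {4}])
print(edges)  # {(0, 1)}
```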
The following lemma, known as the Lovász local lemma, first proved by Erdős and Lovász in 1975, is an extremely powerful tool, as it supplies a way for dealing with rare events.
{{Theorem
|Lovász Local Lemma (symmetric case)|
:Let <math>A_1,A_2,\ldots,A_n</math> be a set of events, and assume that the following hold:
:#for all <math>1\le i\le n</math>, <math>\Pr[A_i]\le p</math>;
:#the maximum degree of the dependency graph for the events <math>A_1,A_2,\ldots,A_n</math> is <math>d</math>, and
:::<math>ep(d+1)\le 1</math>.
:Then
::<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]>0</math>.
}}
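The hypothesis of the symmetric case is a single numeric inequality, so it is easy to check mechanically. A small sketch (the probabilities and degrees below are hypothetical values chosen for illustration):

```python
import math

def symmetric_lll_applies(p, d):
    """Return True iff e*p*(d+1) <= 1, in which case the symmetric LLL
    guarantees that with positive probability no event A_i occurs."""
    return math.e * p * (d + 1) <= 1

# Each bad event has probability 2^-10 and at most 100 neighbors:
print(symmetric_lll_applies(2 ** -10, 100))  # e * 101 / 1024 < 1 -> True
# Probability 1/4 with 2 neighbors: e * 3 / 4 > 1, condition fails.
print(symmetric_lll_applies(0.25, 2))        # False
```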
We will prove a general version of the local lemma, where the events <math>A_i</math> are not symmetric. This generalization is due to Spencer.
{{Theorem
|Lovász Local Lemma (general case)|
:Let <math>D=(V,E)</math> be the dependency graph of events <math>A_1,A_2,\ldots,A_n</math>. Suppose there exist real numbers <math>x_1,x_2,\ldots, x_n</math> such that <math>0\le x_i<1</math> and for all <math>1\le i\le n</math>,
::<math>\Pr[A_i]\le x_i\prod_{(i,j)\in E}(1-x_j)</math>.
:Then
::<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)</math>.
}}
To see that the general LLL implies the symmetric LLL, we set <math>x_i=\frac{1}{d+1}</math> for all <math>i=1,2,\ldots,n</math>. Then we have <math>\left(1-\frac{1}{d+1}\right)^d>\frac{1}{\mathrm{e}}</math>.

Assume the conditions in the symmetric LLL:
:#for all <math>1\le i\le n</math>, <math>\Pr[A_i]\le p</math>;
:#<math>ep(d+1)\le 1</math>;
then it is easy to verify that for all <math>1\le i\le n</math>,
:<math>\Pr[A_i]\le p\le\frac{1}{e(d+1)}<\frac{1}{d+1}\left(1-\frac{1}{d+1}\right)^d\le x_i\prod_{(i,j)\in E}(1-x_j)</math>.
By the general LLL, we have
:<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]\ge\prod_{i=1}^n(1-x_i)=\left(1-\frac{1}{d+1}\right)^n>0</math>.
This proves the symmetric LLL.
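The two inequalities used in this derivation can be checked numerically. A small sketch, verifying <math>\left(1-\frac{1}{d+1}\right)^d>\frac{1}{\mathrm{e}}</math> and the resulting strict inequality <math>\frac{1}{e(d+1)}<\frac{1}{d+1}\left(1-\frac{1}{d+1}\right)^d</math> for a few values of <math>d</math>:

```python
import math

# With x_i = 1/(d+1), the factor (1 - 1/(d+1))^d decreases toward 1/e
# from above as d grows, so both inequalities hold for every d >= 1.
for d in (1, 2, 5, 10, 100, 1000):
    x = 1.0 / (d + 1)
    factor = (1 - x) ** d
    assert factor > 1 / math.e
    assert 1 / (math.e * (d + 1)) < x * factor
print("inequalities hold for all tested d")
```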
Now we prove the general LLL by the original induction proof.

{{Proof|
First, apply the chain rule. We have
:<math>\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]=\prod_{i=1}^n\left(1-\Pr\left[{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)</math>.
Next we prove by induction on <math>m</math> that for any <math>m</math> events <math>A_{i_1},A_{i_2},\ldots,A_{i_m}</math>,
:<math>\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]\le x_{i_1}</math>.
The local lemma then follows immediately from the above chain rule.
For <math>m=1</math>, this is obvious because
:<math>\Pr[A_{i_1}]\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)\le x_{i_1}</math>.
For general <math>m</math>, assume without loss of generality that <math>i_2,\ldots,i_k</math> are the indices among <math>i_2,\ldots,i_m</math> adjacent to <math>i_1</math> in the dependency graph, so that the event <math>A_{i_1}</math> is mutually independent of <math>A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}}</math>.

By conditional probability, we have
:<math>
\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]
=\frac{\Pr\left[ A_{i_1}\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]}
{\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]}
</math>.
First, we bound the numerator. Since <math>A_{i_1}</math> is mutually independent of <math>A_{i_{k+1}},A_{i_{k+2}},\ldots, A_{i_{m}}</math>, we have
:<math>
\begin{align}
\Pr\left[ A_{i_1}\wedge \bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]
&\le\Pr\left[ A_{i_1}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]\\
&=\Pr[A_{i_1}]\\
&\le x_{i_1}\prod_{(i_1,j)\in E}(1-x_j).
\end{align}
</math>
Next, we bound the denominator. Applying the chain rule, we have
:<math>
\Pr\left[\bigwedge_{j=2}^k\overline{A_{i_j}}\mid \bigwedge_{j=k+1}^m\overline{A_{i_j}}\right]
=\prod_{j=2}^k\Pr\left[\overline{A_{i_j}}\mid \bigwedge_{\ell=j+1}^m\overline{A_{i_\ell}}\right],
</math>
which, by the induction hypothesis (each factor involves at most <math>m-1</math> events), is at least
:<math>
\prod_{j=2}^k(1-x_{i_j})=\prod_{\{i_1,i_j\}\in E}(1-x_{i_j}),
</math>
where <math>E</math> is the set of edges in the dependency graph.
Altogether, we prove the induction hypothesis:
:<math>
\Pr\left[A_{i_1}\mid \bigwedge_{j=2}^m\overline{A_{i_j}}\right]
\le\frac{x_{i_1}\prod_{(i_1,j)\in E}(1-x_j)}{\prod_{\{i_1,i_j\}\in E}(1-x_{i_j})}\le x_{i_1},
</math>
where the last inequality holds because every factor in the denominator also appears in the numerator's product over all neighbors of <math>i_1</math>, and all factors are at most <math>1</math>.
Finally, by the chain rule, it holds that
:<math>
\begin{align}
\Pr\left[\bigwedge_{i=1}^n\overline{A_i}\right]
&=\prod_{i=1}^n\Pr\left[\overline{A_i}\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\\
&=\prod_{i=1}^n\left(1-\Pr\left[A_i\mid \bigwedge_{j=1}^{i-1}\overline{A_{j}}\right]\right)\\
&\ge\prod_{i=1}^n\left(1-x_i\right).
\end{align}
</math>
}}
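For an instance small enough to enumerate, the general LLL bound can be verified exhaustively. The following sketch uses a hypothetical instance of my own choosing (not from the text): 10 independent fair bits <math>X_0,\ldots,X_9</math>, with bad event <math>A_i</math> saying that the five consecutive bits <math>X_i,\ldots,X_{i+4}</math> are all 1. It builds the dependency graph via shared variables, checks the general LLL condition with <math>x_i=\frac{1}{d+1}</math>, and compares the true probability that no bad event occurs against the guaranteed lower bound <math>\prod_i(1-x_i)</math>.

```python
from itertools import product

# Hypothetical instance: 10 fair bits; bad event A_i = "bits i..i+4
# are all 1", for i = 0..5, so Pr[A_i] = 2^-5 for each i.
m, run = 10, 5
events = [set(range(i, i + run)) for i in range(m - run + 1)]
n = len(events)

# Dependency graph: A_i ~ A_j iff their variable sets intersect.
adj = [[j for j in range(n) if j != i and events[i] & events[j]]
       for i in range(n)]
d = max(len(a) for a in adj)

# Check the general LLL condition Pr[A_i] <= x_i * prod_{(i,j) in E}(1-x_j)
# with the symmetric choice x_i = 1/(d+1).
x = [1.0 / (d + 1)] * n
p = 2.0 ** (-run)
for i in range(n):
    bound = x[i]
    for j in adj[i]:
        bound *= 1 - x[j]
    assert p <= bound

# Exhaustively compute Pr[no A_i occurs] over all 2^m outcomes and
# compare with the LLL guarantee prod_i (1 - x_i).
good = sum(1 for bits in product([0, 1], repeat=m)
           if not any(all(bits[v] for v in ev) for ev in events))
prob_good = good / 2 ** m
lll_bound = 1.0
for xi in x:
    lll_bound *= 1 - xi
assert prob_good >= lll_bound > 0
print(f"Pr[no bad event] = {prob_good:.4f} >= LLL bound {lll_bound:.4f}")
```

As expected, the true probability (here <math>912/1024\approx 0.89</math>) comfortably exceeds the LLL guarantee <math>(1-\tfrac{1}{6})^6\approx 0.33</math>; the lemma's strength is that it certifies positivity even when exhaustive enumeration is infeasible.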