=Total Variation Distance=
In probability theory, the '''total variation distance''' measures the difference between two probability distributions.

{{Theorem|Definition (total variation distance)|
:Let <math>p</math> and <math>q</math> be two probability distributions over the same finite state space <math>\Omega</math>. The '''total variation distance''' between <math>p</math> and <math>q</math> is defined as
::<math>\|p-q\|_{TV}=\frac{1}{2}\sum_{x\in\Omega}|p(x)-q(x)|=\frac{1}{2}\|p-q\|_1</math>,
:where <math>\|\cdot\|_1</math> is the <math>\ell_1</math>-norm of vectors.
}}

It can be verified (left as an exercise) that
:<math>\max_{A\subset\Omega}|p(A)-q(A)|=\frac{1}{2}\sum_{x\in\Omega}|p(x)-q(x)|</math>,
thus the total variation distance can be equivalently defined as
:<math>\|p-q\|_{TV}=\max_{A\subset\Omega}|p(A)-q(A)|</math>.

So the total variation distance between two distributions gives an upper bound on the difference between the probabilities of the same event according to the two distributions.
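
To see that the two definitions agree on a concrete instance, here is a minimal Python sketch (our illustration, not part of the original notes; the function names and the sample distributions are made up) that computes the total variation distance both as half the <math>\ell_1</math>-norm and as a brute-force maximum over all events:

<syntaxhighlight lang="python">
import itertools

def tv_l1(p, q):
    # Half the l1-norm: (1/2) * sum over x of |p(x) - q(x)|.
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)

def tv_events(p, q):
    # Brute force over all events A subset of Omega: max |p(A) - q(A)|.
    # Exponential in |Omega|; only meant as a sanity check on tiny spaces.
    omega = list(p)
    return max(
        abs(sum(p[x] for x in A) - sum(q[x] for x in A))
        for r in range(len(omega) + 1)
        for A in itertools.combinations(omega, r)
    )

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}
q = {'a': 0.4, 'b': 0.4, 'c': 0.2}
print(tv_l1(p, q), tv_events(p, q))  # both are 0.1 (up to floating point)
</syntaxhighlight>

The maximum in the second definition is attained at the event <math>A=\{x\mid p(x)>q(x)\}</math>, which is why the two quantities coincide.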

= Mixing Time =
The '''mixing time''' of a Markov chain gives the rate at which the chain converges to its stationary distribution.

{{Theorem|Definition (mixing time)|
:Let <math>\pi</math> be the stationary distribution of the chain, and <math>p_x^{(t)}</math> be the distribution after <math>t</math> steps when the initial state is <math>x</math>.
:* <math>\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}</math> is the distance to the stationary distribution <math>\pi</math> after <math>t</math> steps, started at state <math>x</math>.
:* <math>\Delta(t)=\max_{x\in\Omega}\Delta_x(t)</math> is the maximum distance to the stationary distribution <math>\pi</math> after <math>t</math> steps.
:* <math>\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}</math> is the time until the total variation distance to the stationary distribution, started at the initial state <math>x</math>, reaches <math>\epsilon</math>.
:* <math>\tau(\epsilon)=\max_{x\in\Omega}\tau_x(\epsilon)</math> is the time until the total variation distance to the stationary distribution, started at the worst possible initial state, reaches <math>\epsilon</math>.
}}

We note that <math>\Delta_x(t)</math> is monotonically non-increasing in <math>t</math>. So the definition of <math>\tau_x(\epsilon)</math> makes sense, and <math>\tau_x(\epsilon)</math> is in fact the inverse of <math>\Delta_x(t)</math>.
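
To make these quantities concrete, the following Python sketch (an illustration we add here, using a made-up two-state chain) computes <math>\Delta_x(t)</math> by powering the transition matrix; running it exhibits the monotone decrease of <math>\Delta_x(t)</math> in <math>t</math>:

<syntaxhighlight lang="python">
import numpy as np

def tv_distance(p, q):
    # Total variation distance: half the l1-norm of p - q.
    return 0.5 * np.abs(p - q).sum()

def delta_x(P, pi, x, t):
    # Delta_x(t) = || p_x^(t) - pi ||_TV, where p_x^(t) is the
    # distribution after t steps of the chain started at state x.
    p = np.zeros(P.shape[0])
    p[x] = 1.0                                # point mass at x
    p = p @ np.linalg.matrix_power(P, t)      # run the chain t steps
    return tv_distance(p, pi)

# A made-up 2-state chain; row i holds the transition probabilities from i.
P = np.array([[0.75, 0.25],
              [0.50, 0.50]])
pi = np.array([2/3, 1/3])                     # stationary: pi @ P == pi

for t in range(5):
    print(t, delta_x(P, pi, 0, t))            # non-increasing in t
</syntaxhighlight>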

{{Theorem|Definition (mixing time)|
:The '''mixing time''' <math>\tau_{\mathrm{mix}}</math> of a Markov chain is <math>\tau(1/2\mathrm{e})</math>.
}}

The mixing time is the time until the total variation distance to the stationary distribution, starting from the worst possible initial state <math>x\in\Omega</math>, reaches <math>\frac{1}{2\mathrm{e}}</math>. The value <math>\frac{1}{2\mathrm{e}}</math> is chosen just for the convenience of calculation. The next proposition says that <math>\tau(\epsilon)</math> for general <math>\epsilon</math> can be estimated from <math>\tau_{\mathrm{mix}}</math>.

{{Theorem|Proposition|
# <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le \mathrm{e}^{-k}</math> for any integer <math>k\ge1</math>.
# <math>\tau(\epsilon)\le\tau_{\mathrm{mix}}\cdot\left\lceil\ln\frac{1}{\epsilon}\right\rceil</math>.
}}

So the distance to the stationary distribution <math>\Delta(t)</math> decays exponentially in multiples of <math>\tau_\mathrm{mix}</math>.
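
As a numerical sanity check of part 1 (again an added illustration on the same made-up two-state chain, not part of the original notes), the sketch below finds <math>\tau_{\mathrm{mix}}=\tau(1/2\mathrm{e})</math> by linear search and verifies <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le\mathrm{e}^{-k}</math>:

<syntaxhighlight lang="python">
import numpy as np

def tv_distance(p, q):
    return 0.5 * np.abs(p - q).sum()

def delta(P, pi, t):
    # Delta(t) = max over starting states x of || p_x^(t) - pi ||_TV.
    Pt = np.linalg.matrix_power(P, t)
    return max(tv_distance(Pt[x], pi) for x in range(P.shape[0]))

def tau(P, pi, eps, t_max=10_000):
    # tau(eps) = min { t : Delta(t) <= eps }, by linear search
    # (valid because Delta(t) is non-increasing in t).
    for t in range(t_max + 1):
        if delta(P, pi, t) <= eps:
            return t
    raise RuntimeError("chain did not mix within t_max steps")

# The same made-up 2-state chain as above.
P = np.array([[0.75, 0.25],
              [0.50, 0.50]])
pi = np.array([2/3, 1/3])

t_mix = tau(P, pi, 1 / (2 * np.e))                # tau(1/2e)
for k in range(1, 5):
    assert delta(P, pi, k * t_mix) <= np.exp(-k)  # part 1 of the proposition
</syntaxhighlight>

For a worked instance of part 2: to drive the distance below <math>\epsilon=10^{-6}</math>, it suffices to run the chain for <math>\left\lceil\ln 10^{6}\right\rceil\cdot\tau_{\mathrm{mix}}=14\,\tau_{\mathrm{mix}}</math> steps.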

The formal proofs of both the monotonicity of <math>\Delta_x(t)</math> and the above proposition use the coupling technique and are postponed to the next section.