=Total Variation Distance=
In probability theory, the '''total variation distance''' measures the difference between two probability distributions.

{{Theorem|Definition (total variation distance)|
:Let <math>p</math> and <math>q</math> be two probability distributions over the same finite state space <math>\Omega</math>. The '''total variation distance''' between <math>p</math> and <math>q</math> is defined as
::<math>\|p-q\|_{TV}=\frac{1}{2}\sum_{x\in\Omega}|p(x)-q(x)|=\frac{1}{2}\|p-q\|_1</math>,
:where <math>\|\cdot\|_1</math> is the <math>\ell_1</math>-norm of vectors.
}}

It can be verified (left as an exercise) that
:<math>\max_{A\subset\Omega}|p(A)-q(A)|=\frac{1}{2}\sum_{x\in\Omega}|p(x)-q(x)|</math>,
thus the total variation distance can be equivalently defined as
:<math>\|p-q\|_{TV}=\max_{A\subset\Omega}|p(A)-q(A)|</math>.

So the total variation distance between two distributions gives an upper bound on the difference between the probabilities of the same event according to the two distributions.
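
To see that the two definitions agree on a concrete instance, here is a minimal Python sketch (our illustration, not part of the original notes; the function names and the sample distributions are made up) that computes the total variation distance both as half the <math>\ell_1</math>-norm and as a brute-force maximum over all events:

<syntaxhighlight lang="python">
import itertools

def tv_l1(p, q):
    # Half the l1-norm: (1/2) * sum over x of |p(x) - q(x)|.
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)

def tv_events(p, q):
    # Brute force over all events A subset of Omega: max |p(A) - q(A)|.
    # Exponential in |Omega|; only meant as a sanity check on tiny spaces.
    omega = list(p)
    return max(
        abs(sum(p[x] for x in A) - sum(q[x] for x in A))
        for r in range(len(omega) + 1)
        for A in itertools.combinations(omega, r)
    )

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}
q = {'a': 0.4, 'b': 0.4, 'c': 0.2}
print(tv_l1(p, q), tv_events(p, q))  # both are 0.1 (up to floating point)
</syntaxhighlight>

The maximum in the second definition is attained at the event <math>A=\{x\mid p(x)>q(x)\}</math>, which is why the two quantities coincide.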

= Mixing Time =
The '''mixing time''' of a Markov chain gives the rate at which the chain converges to its stationary distribution.

{{Theorem|Definition (mixing time)|
:Let <math>\pi</math> be the stationary distribution of the chain, and <math>p_x^{(t)}</math> be the distribution after <math>t</math> steps when the initial state is <math>x</math>.
:* <math>\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}</math> is the distance to the stationary distribution <math>\pi</math> after <math>t</math> steps, started at state <math>x</math>.
:* <math>\Delta(t)=\max_{x\in\Omega}\Delta_x(t)</math> is the maximum distance to the stationary distribution <math>\pi</math> after <math>t</math> steps.
:* <math>\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}</math> is the time until the total variation distance to the stationary distribution, started at the initial state <math>x</math>, reaches <math>\epsilon</math>.
:* <math>\tau(\epsilon)=\max_{x\in\Omega}\tau_x(\epsilon)</math> is the time until the total variation distance to the stationary distribution, started at the worst possible initial state, reaches <math>\epsilon</math>.
}}

We note that <math>\Delta_x(t)</math> is monotonically non-increasing in <math>t</math>. So the definition of <math>\tau_x(\epsilon)</math> makes sense, and <math>\tau_x(\epsilon)</math> is in fact the inverse of <math>\Delta_x(t)</math>.
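
To make these quantities concrete, the following Python sketch (an illustration we add here, using a made-up two-state chain) computes <math>\Delta_x(t)</math> by powering the transition matrix; running it exhibits the monotone decrease of <math>\Delta_x(t)</math> in <math>t</math>:

<syntaxhighlight lang="python">
import numpy as np

def tv_distance(p, q):
    # Total variation distance: half the l1-norm of p - q.
    return 0.5 * np.abs(p - q).sum()

def delta_x(P, pi, x, t):
    # Delta_x(t) = || p_x^(t) - pi ||_TV, where p_x^(t) is the
    # distribution after t steps of the chain started at state x.
    p = np.zeros(P.shape[0])
    p[x] = 1.0                                # point mass at x
    p = p @ np.linalg.matrix_power(P, t)      # run the chain t steps
    return tv_distance(p, pi)

# A made-up 2-state chain; row i holds the transition probabilities from i.
P = np.array([[0.75, 0.25],
              [0.50, 0.50]])
pi = np.array([2/3, 1/3])                     # stationary: pi @ P == pi

for t in range(5):
    print(t, delta_x(P, pi, 0, t))            # non-increasing in t
</syntaxhighlight>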

{{Theorem|Definition (mixing time)|
:The '''mixing time''' <math>\tau_{\mathrm{mix}}</math> of a Markov chain is <math>\tau(1/2\mathrm{e})</math>.
}}

The mixing time is the time until the total variation distance to the stationary distribution, starting from the worst possible initial state <math>x\in\Omega</math>, reaches <math>\frac{1}{2\mathrm{e}}</math>. The value <math>\frac{1}{2\mathrm{e}}</math> is chosen just for the convenience of calculation. The next proposition says that <math>\tau(\epsilon)</math> for general <math>\epsilon</math> can be estimated from <math>\tau_{\mathrm{mix}}</math>.

{{Theorem|Proposition|
# <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le \mathrm{e}^{-k}</math> for any integer <math>k\ge1</math>.
# <math>\tau(\epsilon)\le\tau_{\mathrm{mix}}\cdot\left\lceil\ln\frac{1}{\epsilon}\right\rceil</math>.
}}

So the distance to the stationary distribution <math>\Delta(t)</math> decays exponentially in multiples of <math>\tau_\mathrm{mix}</math>.
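
As a numerical sanity check of part 1 (again an added illustration on the same made-up two-state chain, not part of the original notes), the sketch below finds <math>\tau_{\mathrm{mix}}=\tau(1/2\mathrm{e})</math> by linear search and verifies <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le\mathrm{e}^{-k}</math>:

<syntaxhighlight lang="python">
import numpy as np

def tv_distance(p, q):
    return 0.5 * np.abs(p - q).sum()

def delta(P, pi, t):
    # Delta(t) = max over starting states x of || p_x^(t) - pi ||_TV.
    Pt = np.linalg.matrix_power(P, t)
    return max(tv_distance(Pt[x], pi) for x in range(P.shape[0]))

def tau(P, pi, eps, t_max=10_000):
    # tau(eps) = min { t : Delta(t) <= eps }, by linear search
    # (valid because Delta(t) is non-increasing in t).
    for t in range(t_max + 1):
        if delta(P, pi, t) <= eps:
            return t
    raise RuntimeError("chain did not mix within t_max steps")

# The same made-up 2-state chain as above.
P = np.array([[0.75, 0.25],
              [0.50, 0.50]])
pi = np.array([2/3, 1/3])

t_mix = tau(P, pi, 1 / (2 * np.e))                # tau(1/2e)
for k in range(1, 5):
    assert delta(P, pi, k * t_mix) <= np.exp(-k)  # part 1 of the proposition
</syntaxhighlight>

For a worked instance of part 2: to drive the distance below <math>\epsilon=10^{-6}</math>, it suffices to run the chain for <math>\left\lceil\ln 10^{6}\right\rceil\cdot\tau_{\mathrm{mix}}=14\,\tau_{\mathrm{mix}}</math> steps.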

The formal proofs of both the monotonicity of <math>\Delta_x(t)</math> and the above proposition use the coupling technique and are postponed to the next section.