随机算法 (Spring 2013)/Problem Set 4 and 高级算法 (Fall 2017)/Hashing and Sketching: Difference between pages

From TCS Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>Etone
 
imported>Etone
 
Line 1: Line 1:
== Problem 1 ==
=Count Distinct Elements=
Consider the Markov chain of graph coloring
{{Theorem|Markov Chain for Graph Coloring|
:Start with a proper coloring of <math>G(V,E)</math>. At each step:
# Pick a vertex <math>v\in V</math> and a color <math>c\in[q]</math> uniformly at random.
# Change the color of <math>v</math> to <math>c</math> if the resulting coloring is proper; do nothing if otherwise.
}}


Show that the Markov chain is:
== An estimator by hashing ==
# aperiodic;
# irreducible if <math>q\ge \Delta+2</math>;
# with uniform stationary distribution.


== Problem 2 ==
==Flajolet-Martin algorithm==
Consider the following random walk on hypercube:
{{Theorem|Yet another random Walk on Hypercube|
: At each step, for the current state <math>x\in\{0,1\}^n</math>:
# pick an <math>i\in\{0,1,2,\ldots,n\}</math> uniformly at random;
# flip <math>x_i</math> (let <math>x_i=1-x_i</math>) if <math>i\neq 0</math>.
}}
* Show that the random walk is ergodic.
* Give the stationary distribution of the random walk.
* Analyze the mixing time of the random walk by coupling.


Hint.1: Construct a coupling rule such that the Hamming distance between two states never increases.
= Set  Membership=


Hint.2: When constructing the coupling, consider a cyclic permutation of disagreeing positions.
== Perfect hashing==


== Problem 3==
== Bloom filter ==
Consider the following random walk over all subsets <math>S\in{[n]\choose k}</math> for some <math>k\le \frac{n}{2}</math>:
{{Theorem|Random walk over <math>k</math>-subsets|
: At each step, for the current state <math>S\in{[n]\choose k}</math>:
# with probability <math>p</math>, do nothing;
# else pick an <math>x\in S</math> and a <math>y\in[n]\setminus S</math> independently and uniformly at random, and change the current set to be <math>S\setminus\{x\}\cup\{y\}</math>.
}}
You are allowed to choose a self-loop probability <math>p</math> for your convenience.
* Show that the random walk is ergodic
* Give the stationary distribution of the random walk.
* Prove that the mixing time is <math>O(n\log k)</math> by coupling.


Hint.1: Considering a coupling <math>(S,T)</math>, the <math>[n]</math> is partitioned into <math>S\cap T,S\setminus T,T\setminus S,\overline{S\cup T}</math>. Design a coupling rule to adopt different cases (of where <math>x</math> and <math>y</math> belong).
= Frequency Estimation=


Hint.2: Use a cyclic permutation of elements in <math>S\triangle T</math> which is the symmetric difference between <math>S</math> and <math>T</math>.
== Count-min sketch==
 
== Problem 4 ==
Let <math>G(V,E)</math> be a connected undirected simple graph (no self-loops and parallel edges) defined on <math>n</math> vertices. Let <math>\phi(G)</math> be the expansion ratio of <math>G</math>. <math>G</math> is NOT necessarily regular. For any <math>v\in V</math>, let <math>d_v</math> be the degree of vertex <math>v</math>.
* What is the largest possible value for <math>\phi(G)</math>? Construct a graph <math>G</math> with this expansion ratio and explain why it is the largest.
* What is the smallest possible value for <math>\phi(G)</math>? Construct a graph <math>G</math> with this expansion ratio and explain why it is the smallest.
* Run a lazy random walk on <math>G</math>. What is the stationary distribution? Starting from an arbitrary vertex in an arbitrary unknown <math>G</math>, how long in the worst case should you run the random walk to guarantee the distribution of the current position is within a total variation distance of <math>\epsilon</math> from the stationary distribution? Give an upper bound of the time in terms of <math>n</math> and <math>\epsilon</math>.
* Suppose that the maximum degree of <math>G</math> is known but the graph is not necessarily regular. Design a random walk with uniform stationary distribution. How long should you run the random walk to be within <math>\epsilon</math>-close to the uniform distribution in the worst case?

Revision as of 08:31, 10 October 2017

Count Distinct Elements

An estimator by hashing

Flajolet-Martin algorithm

Set Membership

Perfect hashing

Bloom filter

Frequency Estimation

Count-min sketch