# Advanced Algorithms (Fall 2018) / Problem Set 3

## Problem 1

In this problem, we will explore the idea of negative association, show that the classical Chernoff bounds also hold for sums of negatively associated random variables, and see negative association in action by considering occupancy numbers in the balls and bins model. Let ${\displaystyle {\boldsymbol {X}}=(X_{1},\cdots ,X_{n})}$ be a vector of random variables. We say the random variables ${\displaystyle {\boldsymbol {X}}}$ are negatively associated if for all disjoint subsets ${\displaystyle I,J\subseteq [n]}$,

${\displaystyle \mathbb {E} [f(X_{i},i\in I)g(X_{j},j\in J)]\leq \mathbb {E} [f(X_{i},i\in I)]\mathbb {E} [g(X_{j},j\in J)]}$

for all non-decreasing functions ${\displaystyle f:\mathbb {R} ^{|I|}\rightarrow \mathbb {R} }$ and ${\displaystyle g:\mathbb {R} ^{|J|}\rightarrow \mathbb {R} }$.

Intuitively, if a set of random variables is negatively associated, then whenever some monotone increasing function ${\displaystyle f}$ of one subset of the variables takes a large value, any monotone increasing function ${\displaystyle g}$ of a disjoint subset of the variables tends to take a small value.
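As a quick sanity check of the definition, one can enumerate a tiny example exhaustively: a single ball thrown uniformly into one of two bins, with ${\displaystyle (X_{1},X_{2})}$ the two bin indicators. The sketch below (the helper name `check_na` is illustrative, not part of the problem) tests the defining inequality against every non-decreasing function on ${\displaystyle \{0,1\}}$:

```python
from itertools import product

# One ball thrown uniformly into two bins: X1, X2 are the indicator
# variables of the two bins, so (X1, X2) is (1,0) or (0,1) w.p. 1/2 each.
outcomes = [((1, 0), 0.5), ((0, 1), 0.5)]

def check_na(f, g):
    """Check E[f(X1) g(X2)] <= E[f(X1)] E[g(X2)] for this distribution."""
    e_fg = sum(p * f(x1) * g(x2) for (x1, x2), p in outcomes)
    e_f = sum(p * f(x1) for (x1, x2), p in outcomes)
    e_g = sum(p * g(x2) for (x1, x2), p in outcomes)
    return e_fg <= e_f * e_g + 1e-12

# Every non-decreasing function {0,1} -> R is (up to affine scaling)
# one of: constant 0, identity, constant 1.
monotone = [lambda v: 0, lambda v: v, lambda v: 1]
assert all(check_na(f, g) for f, g in product(monotone, monotone))
print("defining inequality holds for all monotone test functions")
```

For instance, with ${\displaystyle f=g=\mathrm {id} }$ the left side is ${\displaystyle \mathbb {E} [X_{1}X_{2}]=0}$ while the right side is ${\displaystyle 1/4}$, so the inequality is strict there.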

(a) Let ${\displaystyle X_{1},\cdots ,X_{n}}$ be a set of negatively associated random variables, show that for any non-negative non-decreasing function ${\displaystyle f_{i}}$ where ${\displaystyle i\in [n]}$,

${\displaystyle \mathbb {E} \left[\prod _{i\in [n]}f_{i}(X_{i})\right]\leq \prod _{i\in [n]}\mathbb {E} [f_{i}(X_{i})]}$

(b) Show that the classical Chernoff bounds can be applied as is to ${\displaystyle X=\sum _{i\in [n]}X_{i}}$ if the random variables ${\displaystyle X_{1},\cdots ,X_{n}}$ are negatively associated. (Consider both the upper tail and the lower tail.)

To establish the negative association condition, the following two properties are usually very helpful:

• (Closure under products). If ${\displaystyle {\boldsymbol {X}}=(X_{1},\cdots ,X_{n})}$ is a set of negatively associated random variables, and ${\displaystyle {\boldsymbol {Y}}=(Y_{1},\cdots ,Y_{m})}$ is also a set of negatively associated random variables, and ${\displaystyle {\boldsymbol {X}}}$ and ${\displaystyle {\boldsymbol {Y}}}$ are mutually independent, then the augmented vector ${\displaystyle ({\boldsymbol {X}},{\boldsymbol {Y}})=(X_{1},\cdots ,X_{n},Y_{1},\cdots ,Y_{m})}$ is a set of negatively associated random variables.
• (Disjoint monotone aggregation). Let ${\displaystyle {\boldsymbol {X}}=(X_{1},\cdots ,X_{n})}$ be a set of negatively associated random variables. Let ${\displaystyle I_{1},\cdots ,I_{k}\subseteq [n]}$ be disjoint index sets for some positive integer ${\displaystyle k}$. For ${\displaystyle j\in [k]}$, let ${\displaystyle f_{j}:\mathbb {R} ^{|I_{j}|}\rightarrow \mathbb {R} }$ be functions that are all non–decreasing or all non–increasing, and define ${\displaystyle Y_{j}=f_{j}(X_{i},i\in I_{j})}$. Then, ${\displaystyle {\boldsymbol {Y}}=(Y_{1},\cdots ,Y_{k})}$ is also a set of negatively associated random variables. (That is, non–decreasing or non–increasing functions of disjoint subsets of negatively associated variables are also negatively associated.)

We now consider the paradigmatic example of negative dependence: occupancy numbers in the balls and bins model. Again, ${\displaystyle m}$ balls are thrown independently into ${\displaystyle n}$ bins. However, this time, the balls and the bins are not necessarily identical: ball ${\displaystyle k}$ has probability ${\displaystyle p_{i,k}}$ of landing in bin ${\displaystyle i}$, for ${\displaystyle k\in [m]}$ and ${\displaystyle i\in [n]}$, with ${\displaystyle \sum _{i\in [n]}p_{i,k}=1}$ for each ${\displaystyle k\in [m]}$. Define indicator random variable ${\displaystyle B_{i,k}}$ taking value one if and only if ball ${\displaystyle k}$ lands in bin ${\displaystyle i}$. The occupancy numbers are ${\displaystyle B_{i}=\sum _{k\in [m]}B_{i,k}}$. That is, ${\displaystyle B_{i}}$ denotes the number of balls that land in bin ${\displaystyle i}$.

(c) Intuitively, ${\displaystyle B_{1},\cdots ,B_{n}}$ are negatively associated: if we know one bin has more balls, then clearly other bins are likely to have fewer balls. Now, show formally that the occupancy numbers ${\displaystyle B_{1},\cdots ,B_{n}}$ are negatively associated.
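The intuition in part (c) can be observed empirically before proving it. The Monte Carlo sketch below (parameters are arbitrary, and this is of course no substitute for the formal proof) estimates ${\displaystyle \mathrm {Cov} (B_{1},B_{2})}$ for uniform bins; negative association implies this covariance is negative, and for the uniform multinomial case it equals ${\displaystyle -m/n^{2}}$ exactly:

```python
import random

# Estimate Cov(B_1, B_2) when m balls are thrown uniformly into n bins.
# For this uniform case the exact value is -m/n^2 = -0.8 below.
random.seed(0)
m, n, trials = 20, 5, 20000
s1 = s2 = s12 = 0.0
for _ in range(trials):
    counts = [0] * n
    for _ in range(m):
        counts[random.randrange(n)] += 1  # ball lands in a uniform bin
    s1 += counts[0]
    s2 += counts[1]
    s12 += counts[0] * counts[1]
cov = s12 / trials - (s1 / trials) * (s2 / trials)
print(f"empirical Cov(B_1, B_2) = {cov:.2f}")  # should be near -0.8
```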

## Problem 2

In the edge-coloring problem, we are given a graph ${\displaystyle G}$ and are asked to color the edges of the graph in such a way that adjacent edges receive different colors, using as few colors as possible. The minimum number of colors is usually called the chromatic index and denoted as ${\displaystyle \chi '(G)}$. It can be shown that if ${\displaystyle G}$ is a ${\displaystyle \Delta }$-regular bipartite graph, then ${\displaystyle \chi '(G)=\Delta }$.

In this problem, we study a simple and distributed procedure to compute an edge-coloring for a ${\displaystyle \Delta }$-regular bipartite graph ${\displaystyle G}$. Specifically, each edge ${\displaystyle e}$ starts with a list ${\displaystyle L_{e}=[\Delta ]}$ containing ${\displaystyle \Delta }$ colors. In each step, for each edge ${\displaystyle e}$ which has not decided its color yet, the edge ${\displaystyle e}$ picks a tentative color ${\displaystyle t_{e}\in L_{e}}$ independently and uniformly at random. If ${\displaystyle t_{e}\neq t_{e'}}$ for all edges ${\displaystyle e'}$ adjacent to ${\displaystyle e}$, then ${\displaystyle t_{e}}$ becomes the final color of edge ${\displaystyle e}$. After each step, each edge ${\displaystyle e}$ that has not decided its color will update ${\displaystyle L_{e}}$ to exclude the colors that are already chosen by its adjacent edges.
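One step of this procedure can be sketched concretely. The code below runs the tentative-coloring step on the complete bipartite graph ${\displaystyle K_{d,d}}$ (which is ${\displaystyle d}$-regular); the graph choice, seed, and variable names are illustrative only:

```python
import random

# One step of the tentative-coloring procedure on K_{d,d}.
random.seed(1)
d = 4
edges = [(u, v) for u in range(d) for v in range(d)]  # edge (u, v)
L = {e: set(range(d)) for e in edges}                 # color list [d] per edge
final = {}                                            # decided colors

undecided = [e for e in edges if e not in final]
tentative = {e: random.choice(sorted(L[e])) for e in undecided}
for (u, v) in undecided:
    # An edge keeps its tentative color iff no adjacent edge picked the same.
    clash = any(tentative[(u2, v2)] == tentative[(u, v)]
                for (u2, v2) in undecided
                if (u2, v2) != (u, v) and (u2 == u or v2 == v))
    if not clash:
        final[(u, v)] = tentative[(u, v)]

# Surviving edges drop colors claimed by newly colored adjacent edges.
for (u, v) in edges:
    if (u, v) in final:
        continue
    for (u2, v2), c in final.items():
        if u2 == u or v2 == v:
            L[(u, v)].discard(c)
print(f"{len(final)} of {len(edges)} edges colored in the first step")
```

Note that the decided edges always form a proper partial edge-coloring, since an edge commits only when its tentative color clashes with no adjacent edge.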

(a) To begin with, we want to analyze what happens to the degree (of uncolored edges) of a vertex after the first step. Fix a vertex ${\displaystyle u}$ and let ${\displaystyle Z}$ be the number of edges incident on ${\displaystyle u}$ that get a final color in the first step. Prove a bound of the following form:

${\displaystyle \Pr(|Z-\mathbb {E} [Z]|>t)\leq 2e^{-\Theta (t^{2}/\Delta )}}$

(b) [Bonus Question] Assume graph ${\displaystyle G}$ contains ${\displaystyle n}$ vertices. Will repeating the aforementioned step ${\displaystyle O(\log ^{2}{n})}$ times solve edge-coloring with high probability in ${\displaystyle n}$? If your answer is "yes", prove it. Otherwise, adjust the procedure and prove that the modified distributed algorithm solves edge-coloring with high probability in ${\displaystyle n}$ within ${\displaystyle O(\log ^{2}{n})}$ steps. Note that you may double the number of colors used by the algorithm (i.e., you can set ${\displaystyle L_{e}=[2\Delta ]}$ initially), but your modification to the procedure itself should not be too dramatic.

Hint: Leverage what you have proved in part (a) to show that, for any fixed vertex ${\displaystyle u}$, in the first step, a constant fraction of its incident edges will be colored, with at least some constant probability. Build upon this observation to develop a complete analysis. The complication, however, is whether this observation continues to hold throughout the entire execution.

## Problem 3

In this problem, we will demonstrate a common algorithmic application of the Johnson-Lindenstrauss Theorem (JLT). Specifically, we will consider the ${\displaystyle k}$-means clustering problem, which is widely used in areas like machine learning and data mining.

In the ${\displaystyle k}$-means clustering problem, we are given ${\displaystyle n}$ points ${\displaystyle \mathbf {x} _{1},\cdots ,\mathbf {x} _{n}\in \mathbb {R} ^{d}}$, and the goal is to partition these points into ${\displaystyle k}$ disjoint sets (or clusters) ${\displaystyle C_{1},\cdots ,C_{k}}$ so as to minimize the cost:

${\displaystyle {\texttt {cost}}(\mathbf {x} _{1},\cdots ,\mathbf {x} _{n},C_{1},\cdots ,C_{k})=\sum _{i=1}^{k}\sum _{j\in C_{i}}\lVert \mathbf {x} _{j}-\mathbf {\mu } _{i}\rVert _{2}^{2}}$

Here, ${\displaystyle \mathbf {\mu } _{i}=(1/|C_{i}|)\cdot \sum _{j\in C_{i}}\mathbf {x} _{j}}$ is the mean of the points in cluster ${\displaystyle C_{i}}$. In other words, we want to find ${\displaystyle k}$ clusters that have as little internal variance as possible.
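The cost function above translates directly into a few lines of numpy. In the sketch below, the helper name `kmeans_cost`, the cluster encoding (lists of point indices), and the sample data are all illustrative:

```python
import numpy as np

# k-means cost: squared distances from each point to its cluster mean.
def kmeans_cost(X, clusters):
    """X: (n, d) array of points; clusters: lists of row indices."""
    return sum(
        np.sum((X[idx] - X[idx].mean(axis=0)) ** 2)
        for idx in clusters if len(idx) > 0
    )

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
clusters = [[0, 1, 2], [3, 4, 5]]
print(f"cost = {kmeans_cost(X, clusters):.4f}")
```

As a boundary case, putting every point in its own cluster gives cost zero, since each point coincides with its cluster mean.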

(a) Prove the following equality:

${\displaystyle {\texttt {cost}}(\mathbf {x} _{1},\cdots ,\mathbf {x} _{n},C_{1},\cdots ,C_{k})=\sum _{i=1}^{k}{\frac {1}{|C_{i}|}}\sum _{j,l\in C_{i},j<l}\lVert \mathbf {x} _{j}-\mathbf {x} _{l}\rVert _{2}^{2}}$
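Before proving the identity in part (a), it is easy to confirm it numerically on random data. The sketch below (data and cluster assignment are arbitrary) computes both sides:

```python
import numpy as np

# Numerical check: within-cluster variance equals (1/|C_i|) times the
# sum of pairwise squared distances within each cluster.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))
clusters = [[0, 1, 2, 3], [4, 5, 6, 7]]

# Left side: sum of squared distances to cluster means.
lhs = sum(np.sum((X[idx] - X[idx].mean(axis=0)) ** 2) for idx in clusters)
# Right side: averaged pairwise squared distances, over pairs j < l.
rhs = sum(
    sum(np.sum((X[j] - X[l]) ** 2)
        for a, j in enumerate(idx) for l in idx[a + 1:]) / len(idx)
    for idx in clusters
)
assert abs(lhs - rhs) < 1e-9
print("identity verified numerically")
```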

(b) Suppose we use the Johnson-Lindenstrauss method taught in class to embed ${\displaystyle \mathbf {x} _{1},\cdots ,\mathbf {x} _{n}}$ into ${\displaystyle O(\log {n}/\epsilon ^{2})}$-dimensional points ${\displaystyle \mathbf {\tilde {x}} _{1},\cdots ,\mathbf {\tilde {x}} _{n}}$. Show that with high probability in ${\displaystyle n}$, the following inequality holds for all clusters:

${\displaystyle (1-\epsilon ){\texttt {cost}}(\mathbf {x} _{1},\cdots ,\mathbf {x} _{n},C_{1},\cdots ,C_{k})\leq {\texttt {cost}}(\mathbf {\tilde {x}} _{1},\cdots ,\mathbf {\tilde {x}} _{n},C_{1},\cdots ,C_{k})\leq (1+\epsilon ){\texttt {cost}}(\mathbf {x} _{1},\cdots ,\mathbf {x} _{n},C_{1},\cdots ,C_{k})}$
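Since the identity in part (a) expresses the cost purely in terms of pairwise squared distances, the inequality in part (b) reduces to the distance preservation guaranteed by the JL embedding. The sketch below uses a scaled Gaussian projection as the embedding; the leading constant 8 in the target dimension is hand-picked for illustration, not a tuned bound from the theorem:

```python
import numpy as np

# JL-style embedding: a random Gaussian matrix scaled by 1/sqrt(k),
# with target dimension k on the order of log(n) / eps^2.
rng = np.random.default_rng(2)
n, d, eps = 50, 200, 0.5
k = int(np.ceil(8 * np.log(n) / eps ** 2))
X = rng.normal(size=(n, d))
G = rng.normal(size=(d, k)) / np.sqrt(k)
Xt = X @ G  # embedded points, one row per point

# Measure the worst relative distortion of pairwise squared distances,
# which is what drives cost preservation via the identity in part (a).
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((X[i] - X[j]) ** 2)
        new = np.sum((Xt[i] - Xt[j]) ** 2)
        worst = max(worst, abs(new / orig - 1))
print(f"max pairwise distortion of squared distances: {worst:.3f}")
```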

(c) Suppose we know a set of clusters (i.e., a partition of the ${\displaystyle n}$ points) ${\displaystyle {\tilde {C}}_{1},\cdots ,{\tilde {C}}_{k}}$ such that:

${\displaystyle {\texttt {cost}}(\mathbf {\tilde {x}} _{1},\cdots ,\mathbf {\tilde {x}} _{n},{\tilde {C}}_{1},\cdots ,{\tilde {C}}_{k})\leq \gamma \cdot {\texttt {cost}}(\mathbf {\tilde {x}} _{1},\cdots ,\mathbf {\tilde {x}} _{n},{\tilde {C}}_{1}^{*},\cdots ,{\tilde {C}}_{k}^{*})}$

where ${\displaystyle {\tilde {C}}_{1}^{*},\cdots ,{\tilde {C}}_{k}^{*}}$ is the optimal partition for points ${\displaystyle \mathbf {\tilde {x}} _{1},\cdots ,\mathbf {\tilde {x}} _{n}}$. That is, assume we know a ${\displaystyle \gamma }$-approximation to the optimal clustering for the dimensionality-reduced points. Show that, when ${\displaystyle \epsilon \leq 1/2}$,

${\displaystyle {\texttt {cost}}(\mathbf {x} _{1},\cdots ,\mathbf {x} _{n},{\tilde {C}}_{1},\cdots ,{\tilde {C}}_{k})\leq (1+O(\epsilon ))\cdot \gamma \cdot {\texttt {cost}}(\mathbf {x} _{1},\cdots ,\mathbf {x} _{n},C_{1}^{*},\cdots ,C_{k}^{*})}$

where ${\displaystyle C_{1}^{*},\cdots ,C_{k}^{*}}$ is the optimal partition for ${\displaystyle \mathbf {x} _{1},\cdots ,\mathbf {x} _{n}}$.

In other words, we can compute an approximately optimal clustering for high dimensional original data using just the dimensionality reduced data. In many cases, this approach will speed up algorithms for solving ${\displaystyle k}$-means clustering.