In this problem, we will explore the idea of negative association, show that the classical Chernoff bounds also hold for sum of negatively associated random variables, and see negative association in action by considering occupancy numbers in the balls and bins model.
Let be a vector of random variables. We say random variables are negatively associated if for all disjoint subsets ,
for all non-decreasing function and .
Intuitively, if a set of random variables is negatively related, then if any monotone increasing function of one subset of variables increases then any other monotone increasing function of a disjoint set of variables must decrease.
(a) Let be a set of negatively associated random variables, show that for any non-negative non-decreasing function where ,
(b) Show that the classical Chernoff bounds can be applied as is to if the random variables are negatively associated. (Consider both the upper tail and the lower tail.)
To establish the negative association condition, the following two properties are usually very helpful:
- (Closure under products).If is a set of negatively associated random variables, and is also a set of negatively associated random variables, but and are mutually independent, then the augmented vector is a set of negatively associated random variables.
- (Disjoint monotone aggregation). Let be a set of negatively associated random variables. Let be disjoint index sets for some positive integer . For , let be functions that are all non–decreasing or all non–increasing, and define . Then, is also a set of negatively associated random variables. (That is, non–decreasing or non–increasing functions of disjoint subsets of negatively associated variables are also negatively associated.)
We now consider the paradigmatic example of negative dependence: occupancy numbers in the balls and bins model. Again, balls are thrown independently into bins. However, this time, the balls and the bins are not necessarily identical: ball has probability of landing in bin , for and , with for each . Define indicator random variable taking value one if and only if ball lands in bin . The occupancy numbers are . That is, denote the number of balls that land in bin .
(c) Intuitively, are negatively associated: if we know one bin has more balls, then clearly other bins are more likely to have less balls. Now, show that the occupancy numbers are negatively associated, formally.
In the edge-coloring problem, we are given a graph and are asked to color the edges of the graph in such a way that adjacent edges receive different colors, using as few colors as possible. The minimum number of colors is usually called the chromatic index and denoted as . It can been shown that if is a -regular bipartite graph, then .
In this problem, we study a simple and distributed procedure to compute an edge-coloring for a -regular bipartite graph . Specifically, each edge starts with a list containing colors. In each step, for each edge which has not decided its color yet, the edge picks a tentative color independently and uniformly at random. If for all edges adjacent to , then becomes the final color of edge . After each step, each edge that has not decided its color will update to exclude the colors that are already chosen by its adjacent edges.
(a) To begin with, we want to analyze what happens to the degree (of uncolored edges) of a vertex after the first step. Fix a vertex and let be the number of edges incident on that get a final color in the first step. Prove a bound of the following form:
(b) [Bonus Question] Assume graph contains vertices, will repeating the aforementioned step times solve edge-coloring with high probability in ? If your answer is "yes", prove it. Otherwise, adjust the procedure and prove that the modified distributed algorithm solves edge-coloring with high probability in within steps. Notice, you can enlarge the number of colors used by the algorithm by a factor of two (i.e., you can set initially), but your modification to the procedure itself should not be too dramatic.
Hint: Leverage what you have proved in part (a) to show that, for any fixed vertex , in the first step, a constant fraction of its incident edges will be colored, with at least some constant probability. Build upon this observation to develop a complete analysis. The complication, however, is that does the observation always hold true during the entire execution?
In this problem, we will demonstrate a common algorithmic application of the Johnson-Lindenstrauss Theorem (JLT). Specifically, we will consider the -means clustering problem, which is widely used in areas like machine learning and data mining.
In the -means clustering problem, we are given points , and the goal is to partition these points into disjoint sets (or, cluster) so as to minimize the cost:
Here, is the mean of the points in cluster . In other words, we want to find clusters that have as little internal variance as possible.
(a) Prove the following equality:
(b) Suppose we use the Johnson-Lindenstrauss method taught in class to embed into -dimensional points . Show that with high probability in , the following inequality holds for all clusters:
(c) Suppose we know a set of clusters (i.e., a partition of the points) such that:
where is the optimal partition for points . That is, assume we know a approximation to the optimal clustering for the dimensionality reduced points. Show that, when ,
where is the optimal partition for .
In other words, we can compute an approximately optimal clustering for high dimensional original data using just the dimensionality reduced data. In many cases, this approach will speed up algorithms for solving -means clustering.