高级算法 (Fall 2018)/Problem Set 3

Each problem must be answered with a complete solution showing all work. Solutions may be written in either Chinese or English.

Problem 1

In this problem, we will explore the idea of negative association, show that the classical Chernoff bounds also hold for sums of negatively associated random variables, and see negative association in action by considering occupancy numbers in the balls and bins model. Let [math]\displaystyle{ \boldsymbol{X}=(X_1,\cdots,X_n) }[/math] be a vector of random variables. We say the random variables [math]\displaystyle{ \boldsymbol{X} }[/math] are negatively associated if for all disjoint subsets [math]\displaystyle{ I,J\subseteq[n] }[/math],

[math]\displaystyle{ \mathbb{E}[f(X_i,i\in I)g(X_j,j\in J)]\leq \mathbb{E}[f(X_i,i\in I)]\mathbb{E}[g(X_j,j\in J)] }[/math]

for all non-decreasing functions [math]\displaystyle{ f:\mathbb{R}^{|I|}\rightarrow\mathbb{R} }[/math] and [math]\displaystyle{ g:\mathbb{R}^{|J|}\rightarrow\mathbb{R} }[/math].

Intuitively, if a set of random variables is negatively associated, then whenever a monotone increasing function [math]\displaystyle{ f }[/math] of one subset of the variables increases, any monotone increasing function [math]\displaystyle{ g }[/math] of a disjoint subset of the variables must decrease.
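To make the definition concrete, here is a minimal sanity check (a hypothetical illustration, not part of the problem set): one ball is thrown uniformly at random into two bins, X1 and X2 indicate which bin receives it, and f and g are both taken to be the identity function. Checking one pair of functions does not establish negative association; the snippet only verifies the defining inequality in this single case.

```python
# Hypothetical sanity check of the defining inequality for one ball thrown
# uniformly into two bins, with f and g both the identity (non-decreasing).
# E[f(X1) g(X2)] <= E[f(X1)] E[g(X2)] here reads E[X1*X2] <= E[X1]*E[X2].

# The two equally likely outcomes are (X1, X2) = (1, 0) and (0, 1).
outcomes = [(1, 0), (0, 1)]

e_product = sum(x1 * x2 for x1, x2 in outcomes) / len(outcomes)  # E[X1*X2] = 0
e_x1 = sum(x1 for x1, _ in outcomes) / len(outcomes)             # E[X1] = 1/2
e_x2 = sum(x2 for _, x2 in outcomes) / len(outcomes)             # E[X2] = 1/2

assert e_product <= e_x1 * e_x2   # 0 <= 1/4, consistent with negative association
print(e_product, e_x1 * e_x2)
```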

(a) Let [math]\displaystyle{ X_1,\cdots,X_n }[/math] be a set of negatively associated random variables. Show that for any non-negative, non-decreasing functions [math]\displaystyle{ f_i }[/math], where [math]\displaystyle{ i\in[n] }[/math],

[math]\displaystyle{ \mathbb{E}\left[\prod_{i\in[n]}f_i(X_i)\right]\leq\prod_{i\in[n]}\mathbb{E}[f_i(X_i)] }[/math]

(b) Show that the classical Chernoff bounds can be applied as is to [math]\displaystyle{ X=\sum_{i\in[n]}X_i }[/math] if the random variables [math]\displaystyle{ X_1,\cdots,X_n }[/math] are negatively associated. (Consider both the upper tail and the lower tail.)

To establish the negative association condition, the following two properties are usually very helpful:

  • (Closure under products). If [math]\displaystyle{ \boldsymbol{X}=(X_1,\cdots,X_n) }[/math] is a set of negatively associated random variables, [math]\displaystyle{ \boldsymbol{Y}=(Y_1,\cdots,Y_m) }[/math] is also a set of negatively associated random variables, and [math]\displaystyle{ \boldsymbol{X} }[/math] and [math]\displaystyle{ \boldsymbol{Y} }[/math] are mutually independent, then the augmented vector [math]\displaystyle{ (\boldsymbol{X},\boldsymbol{Y})=(X_1,\cdots,X_n,Y_1,\cdots,Y_m) }[/math] is a set of negatively associated random variables.
  • (Disjoint monotone aggregation). Let [math]\displaystyle{ \boldsymbol{X}=(X_1,\cdots,X_n) }[/math] be a set of negatively associated random variables. Let [math]\displaystyle{ I_1,\cdots,I_k\subseteq[n] }[/math] be disjoint index sets for some positive integer [math]\displaystyle{ k }[/math]. For [math]\displaystyle{ j\in[k] }[/math], let [math]\displaystyle{ f_j:\mathbb{R}^{|I_j|}\rightarrow\mathbb{R} }[/math] be functions that are all non-decreasing or all non-increasing, and define [math]\displaystyle{ Y_j=f_j(X_i,i\in I_j) }[/math]. Then, [math]\displaystyle{ \boldsymbol{Y}=(Y_1,\cdots,Y_k) }[/math] is also a set of negatively associated random variables. (That is, non-decreasing or non-increasing functions of disjoint subsets of negatively associated variables are also negatively associated.)

We now consider the paradigmatic example of negative dependence: occupancy numbers in the balls and bins model. Again, [math]\displaystyle{ m }[/math] balls are thrown independently into [math]\displaystyle{ n }[/math] bins. However, this time, the balls and the bins are not necessarily identical: ball [math]\displaystyle{ k }[/math] has probability [math]\displaystyle{ p_{i,k} }[/math] of landing in bin [math]\displaystyle{ i }[/math], for [math]\displaystyle{ k\in[m] }[/math] and [math]\displaystyle{ i\in[n] }[/math], with [math]\displaystyle{ \sum_{i\in[n]}p_{i,k}=1 }[/math] for each [math]\displaystyle{ k\in[m] }[/math]. Define the indicator random variable [math]\displaystyle{ B_{i,k} }[/math] taking value one if and only if ball [math]\displaystyle{ k }[/math] lands in bin [math]\displaystyle{ i }[/math]. The occupancy numbers are [math]\displaystyle{ B_i=\sum_{k\in[m]}B_{i,k} }[/math]. That is, [math]\displaystyle{ B_i }[/math] denotes the number of balls that land in bin [math]\displaystyle{ i }[/math].

(c) Intuitively, [math]\displaystyle{ B_1,\cdots,B_n }[/math] are negatively associated: if we know one bin has more balls, then clearly other bins are more likely to have fewer balls. Now, show formally that the occupancy numbers [math]\displaystyle{ B_1,\cdots,B_n }[/math] are negatively associated.
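As an aside, this negative-correlation intuition is easy to see empirically. The following Monte Carlo snippet (a hypothetical illustration, not a proof, using uniform bins for simplicity) estimates the covariance of two occupancy numbers; negative association implies this covariance is at most zero, and the estimate indeed comes out negative (for uniform bins the exact value is -m/n^2).

```python
import random

# Hypothetical Monte Carlo illustration (not a proof): throw m balls uniformly
# and independently into n bins and estimate Cov(B_1, B_2). Negative
# association implies Cov(B_1, B_2) <= 0.

def occupancy(m, n, rng):
    """Throw m balls uniformly at random into n bins; return occupancy numbers."""
    bins = [0] * n
    for _ in range(m):
        bins[rng.randrange(n)] += 1
    return bins

def estimate_cov(m, n, trials=100_000, seed=0):
    rng = random.Random(seed)
    s1 = s2 = s12 = 0.0
    for _ in range(trials):
        b = occupancy(m, n, rng)
        s1 += b[0]
        s2 += b[1]
        s12 += b[0] * b[1]
    return s12 / trials - (s1 / trials) * (s2 / trials)

# For uniform bins the exact covariance is -m/n^2 = -0.4 here.
print(estimate_cov(m=10, n=5))
```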

Problem 2

In the edge-coloring problem, we are given a graph [math]\displaystyle{ G }[/math] and are asked to color the edges of the graph in such a way that adjacent edges receive different colors, using as few colors as possible. The minimum number of colors is usually called the chromatic index and denoted as [math]\displaystyle{ \chi'(G) }[/math]. It can be shown that if [math]\displaystyle{ G }[/math] is a [math]\displaystyle{ \Delta }[/math]-regular bipartite graph, then [math]\displaystyle{ \chi'(G)=\Delta }[/math].

In this problem, we study a simple and distributed procedure to compute an edge-coloring for a [math]\displaystyle{ \Delta }[/math]-regular bipartite graph [math]\displaystyle{ G }[/math]. Specifically, each edge [math]\displaystyle{ e }[/math] starts with a list [math]\displaystyle{ L_e=[\Delta] }[/math] containing [math]\displaystyle{ \Delta }[/math] colors. In each step, for each edge [math]\displaystyle{ e }[/math] which has not decided its color yet, the edge [math]\displaystyle{ e }[/math] picks a tentative color [math]\displaystyle{ t_e\in L_e }[/math] independently and uniformly at random. If [math]\displaystyle{ t_e\neq t_{e'} }[/math] for all edges [math]\displaystyle{ e' }[/math] adjacent to [math]\displaystyle{ e }[/math], then [math]\displaystyle{ t_e }[/math] becomes the final color of edge [math]\displaystyle{ e }[/math]. After each step, each edge [math]\displaystyle{ e }[/math] that has not decided its color will update [math]\displaystyle{ L_e }[/math] to exclude the colors that are already chosen by its adjacent edges.
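For concreteness, below is a minimal, centralized simulation sketch of a single step of this procedure; the function name first_step_Z and the choice of the complete bipartite graph K_{Delta,Delta} as the test graph are our own, purely for illustration. It reports, for a fixed vertex, how many of its incident edges receive a final color in the first step (the quantity analyzed in part (a) below).

```python
import random

# Hypothetical, centralized simulation of one step of the procedure above,
# run on the complete bipartite graph K_{Delta,Delta} (which is Delta-regular).

def adjacent(e, f):
    """Two distinct edges are adjacent iff they share an endpoint."""
    return e != f and (e[0] == f[0] or e[1] == f[1])

def first_step_Z(delta, seed=0):
    rng = random.Random(seed)
    # Edges (u, v): u is a left vertex, v is a right vertex.
    edges = [(u, v) for u in range(delta) for v in range(delta)]
    # Each edge picks a tentative color uniformly from L_e = [Delta].
    tentative = {e: rng.randrange(delta) for e in edges}
    # A tentative color becomes final iff no adjacent edge picked the same color.
    final = {e: c for e, c in tentative.items()
             if all(tentative[f] != c for f in edges if adjacent(e, f))}
    # Z = number of edges incident on the fixed left vertex u = 0 that got colored.
    return sum(1 for (u, v) in final if u == 0)

print(first_step_Z(delta=8))
```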

(a) To begin with, we want to analyze what happens to the degree (of uncolored edges) of a vertex after the first step. Fix a vertex [math]\displaystyle{ u }[/math] and let [math]\displaystyle{ Z }[/math] be the number of edges incident on [math]\displaystyle{ u }[/math] that get a final color in the first step. Prove a bound of the following form:

[math]\displaystyle{ \Pr(|Z-\mathbb{E}[Z]|>t)\leq 2e^{-\Theta(t^2/\Delta)} }[/math]

(b) [Bonus Question] Assume graph [math]\displaystyle{ G }[/math] contains [math]\displaystyle{ n }[/math] vertices. Will repeating the aforementioned step [math]\displaystyle{ O(\log^2{n}) }[/math] times solve the edge-coloring problem with high probability in [math]\displaystyle{ n }[/math]? If your answer is "yes", prove it. Otherwise, adjust the procedure and prove that the modified distributed algorithm solves edge-coloring with high probability in [math]\displaystyle{ n }[/math] within [math]\displaystyle{ O(\log^2{n}) }[/math] steps. Note that you may enlarge the number of colors used by the algorithm by a factor of two (i.e., you can set [math]\displaystyle{ L_e=[2\Delta] }[/math] initially), but your modification to the procedure itself should not be too dramatic.

Hint: Leverage what you have proved in part (a) to show that, for any fixed vertex [math]\displaystyle{ u }[/math], a constant fraction of its incident edges will be colored in the first step with at least some constant probability. Build upon this observation to develop a complete analysis. The complication, however, is whether this observation continues to hold throughout the entire execution.

Problem 3

In this problem, we will demonstrate a common algorithmic application of the Johnson-Lindenstrauss Theorem (JLT). Specifically, we will consider the [math]\displaystyle{ k }[/math]-means clustering problem, which is widely used in areas like machine learning and data mining.

In the [math]\displaystyle{ k }[/math]-means clustering problem, we are given [math]\displaystyle{ n }[/math] points [math]\displaystyle{ \mathbf{x}_1,\cdots,\mathbf x_n\in\mathbb{R}^d }[/math], and the goal is to partition these points into [math]\displaystyle{ k }[/math] disjoint sets (or clusters) [math]\displaystyle{ C_1,\cdots,C_k }[/math] so as to minimize the cost:

[math]\displaystyle{ \texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k)=\sum_{i=1}^{k}\sum_{j\in C_i}\lVert\mathbf{x}_j-\mathbf{\mu}_i\rVert^2_2 }[/math]

Here, [math]\displaystyle{ \mathbf{\mu}_i=(1/|C_i|)\cdot\sum_{j\in C_i}\mathbf{x}_j }[/math] is the mean of the points in cluster [math]\displaystyle{ C_i }[/math]. In other words, we want to find [math]\displaystyle{ k }[/math] clusters that have as little internal variance as possible.

(a) Prove the following equality:

[math]\displaystyle{ \texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k)=\sum_{i=1}^{k}\frac{1}{|C_i|}\sum_{j,l\in C_i,j<l}\lVert\mathbf{x}_j-\mathbf{x}_l\rVert^2_2 }[/math]
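The identity can also be sanity-checked numerically. The sketch below (a hypothetical illustration on random data, not a proof) computes the cost both via cluster means and via pairwise squared distances and compares the two values; the helper names cost_via_means and cost_via_pairs are our own.

```python
import random

# Hypothetical numerical sanity check of the identity in (a): compute the
# clustering cost via cluster means and via pairwise squared distances.

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def cost_via_means(points, clusters):
    total = 0.0
    for C in clusters:
        d = len(points[0])
        mu = [sum(points[j][t] for j in C) / len(C) for t in range(d)]
        total += sum(sq_dist(points[j], mu) for j in C)
    return total

def cost_via_pairs(points, clusters):
    total = 0.0
    for C in clusters:
        pairs = sum(sq_dist(points[j], points[l])
                    for idx, j in enumerate(C) for l in C[idx + 1:])
        total += pairs / len(C)
    return total

rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(12)]       # n = 12, d = 3
clusters = [list(range(0, 4)), list(range(4, 9)), list(range(9, 12))]   # k = 3

print(cost_via_means(points, clusters))   # the two printed values should agree
print(cost_via_pairs(points, clusters))   # up to floating-point error
```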

(b) Suppose we use the Johnson-Lindenstrauss method taught in class to embed [math]\displaystyle{ \mathbf{x}_1,\cdots,\mathbf{x}_n }[/math] into [math]\displaystyle{ O(\log{n}/\epsilon^2) }[/math]-dimensional points [math]\displaystyle{ \mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n }[/math]. Show that with high probability in [math]\displaystyle{ n }[/math], the following inequality holds simultaneously for all possible clusterings [math]\displaystyle{ C_1,\cdots,C_k }[/math]:

[math]\displaystyle{ (1-\epsilon)\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k) \leq\texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,C_1,\cdots,C_k) \leq(1+\epsilon)\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C_1,\cdots,C_k) }[/math]
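For reference, the sketch below shows one standard Johnson-Lindenstrauss construction, a dense Gaussian random projection; the exact construction and constants taught in class may differ, and the target-dimension constant 8 here is only illustrative. It merely shows how reduced points such as ~x_i could be produced.

```python
import math
import random

# Hypothetical sketch of a Gaussian Johnson-Lindenstrauss projection: map each
# point in R^d to R^m with m = O(log n / eps^2), using a d x m matrix of
# i.i.d. N(0, 1/m) entries. The construction taught in class may differ.

def jl_embed(points, eps, seed=0):
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    m = max(1, math.ceil(8 * math.log(n) / eps ** 2))   # the constant 8 is illustrative
    # Random projection matrix with i.i.d. N(0, 1/m) entries.
    proj = [[rng.gauss(0, 1 / math.sqrt(m)) for _ in range(m)] for _ in range(d)]
    # Reduced point: the matrix-vector product x * proj, living in R^m.
    return [[sum(x[t] * proj[t][r] for t in range(d)) for r in range(m)]
            for x in points]

# Example usage (hypothetical): reduced = jl_embed(points, eps=0.5)
```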

(c) Suppose we know a set of clusters (i.e., a partition of the [math]\displaystyle{ n }[/math] points) [math]\displaystyle{ \tilde{C}_1,\cdots,\tilde{C}_k }[/math] such that:

[math]\displaystyle{ \texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,\tilde{C}_1,\cdots,\tilde{C}_k)\leq\gamma\cdot\texttt{cost}(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n,\tilde{C}^*_1,\cdots,\tilde{C}^*_k) }[/math]

where [math]\displaystyle{ \tilde{C}^*_1,\cdots,\tilde{C}^*_k }[/math] is the optimal partition for points [math]\displaystyle{ \mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_n }[/math]. That is, assume we know a [math]\displaystyle{ \gamma }[/math]-approximation to the optimal clustering for the dimensionality-reduced points. Show that, when [math]\displaystyle{ \epsilon\leq 1/2 }[/math],

[math]\displaystyle{ \texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,\tilde{C}_1,\cdots,\tilde{C}_k)\leq(1+O(\epsilon))\cdot\gamma\cdot\texttt{cost}(\mathbf{x}_1,\cdots,\mathbf{x}_n,C^*_1,\cdots,C^*_k) }[/math]

where [math]\displaystyle{ C^*_1,\cdots,C^*_k }[/math] is the optimal partition for [math]\displaystyle{ \mathbf{x}_1,\cdots,\mathbf{x}_n }[/math].

In other words, we can compute an approximately optimal clustering for the high-dimensional original data using just the dimensionality-reduced data. In many cases, this approach will speed up algorithms for solving [math]\displaystyle{ k }[/math]-means clustering.