Revision as of 06:04, 18 April 2013

Set Balancing

Suppose that we have an [math]\displaystyle{ n\times m }[/math] matrix [math]\displaystyle{ A }[/math] with 0-1 entries. We are looking for a [math]\displaystyle{ b\in\{-1,+1\}^m }[/math] that minimizes [math]\displaystyle{ \|Ab\|_\infty }[/math].

Recall that [math]\displaystyle{ \|\cdot\|_\infty }[/math] is the infinity norm (also called the [math]\displaystyle{ L_\infty }[/math] norm) of a vector, and for the vector [math]\displaystyle{ c=Ab }[/math],

[math]\displaystyle{ \|Ab\|_\infty=\max_{i=1,2,\ldots,n}|c_i| }[/math].

We can also describe this problem as an optimization:

[math]\displaystyle{ \begin{align} \mbox{minimize } &\quad \|Ab\|_\infty\\ \mbox{subject to: } &\quad b\in\{-1,+1\}^m. \end{align} }[/math]
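
For very small instances the optimization above can be solved by exhaustive search. The following is a minimal sketch, not part of the original notes: the function names and the example matrix are made up for illustration, and the search takes time exponential in [math]\displaystyle{ m }[/math].

```python
from itertools import product

def inf_norm_of_product(A, b):
    """Return ||Ab||_inf for a 0-1 matrix A (list of rows) and a +/-1 vector b."""
    return max(abs(sum(a * s for a, s in zip(row, b))) for row in A)

def brute_force_balance(A):
    """Try all 2^m sign vectors; return (optimal value, one optimal b)."""
    m = len(A[0])
    best_b = min(product((-1, 1), repeat=m),
                 key=lambda b: inf_norm_of_product(A, b))
    return inf_norm_of_product(A, best_b), best_b

# Hypothetical 3 x 4 example: 3 rows, 4 columns.
A = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
value, b = brute_force_balance(A)
print(value, b)
```

Exhaustive search is only a reference point for checking other methods on toy inputs.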

This problem is called set balancing for a reason.

The problem arises in designing statistical experiments. Suppose that we have [math]\displaystyle{ m }[/math] subjects, each of which may have up to [math]\displaystyle{ n }[/math] features. This gives us an [math]\displaystyle{ n\times m }[/math] matrix [math]\displaystyle{ A }[/math]:

[math]\displaystyle{ \begin{array}{c} \mbox{feature 1:}\\ \mbox{feature 2:}\\ \vdots\\ \mbox{feature n:}\\ \end{array} \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1m}\\ a_{21} & a_{22} & \cdots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nm}\\ \end{array} \right], }[/math]

where each column represents a subject and each row represents a feature. An entry [math]\displaystyle{ a_{ij}\in\{0,1\} }[/math] indicates whether subject [math]\displaystyle{ j }[/math] has feature [math]\displaystyle{ i }[/math].

By multiplying [math]\displaystyle{ A }[/math] by a vector [math]\displaystyle{ b\in\{-1,+1\}^m }[/math]

[math]\displaystyle{ \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1m}\\ a_{21} & a_{22} & \cdots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nm}\\ \end{array} \right] \left[ \begin{array}{c} b_{1}\\ b_{2}\\ \vdots\\ b_{m}\\ \end{array} \right] = \left[ \begin{array}{c} c_{1}\\ c_{2}\\ \vdots\\ c_{n}\\ \end{array} \right], }[/math]

the subjects are partitioned into two disjoint groups: one for -1 and the other for +1. Each [math]\displaystyle{ c_i }[/math] gives the difference between the numbers of subjects with feature [math]\displaystyle{ i }[/math] in the two groups. By minimizing [math]\displaystyle{ \|Ab\|_\infty=\|c\|_\infty }[/math], we ask for an optimal partition so that each feature is roughly as balanced as possible between the two groups.

In a scientific experiment, one of the groups serves as a control group. Ideally, we want the two groups to be statistically identical, which is usually impossible to achieve in practice. The requirement of minimizing [math]\displaystyle{ \|Ab\|_\infty }[/math] means that the statistical difference between the two groups is minimized.
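
To make the interpretation of [math]\displaystyle{ c_i }[/math] concrete, the sketch below counts, for each feature, how many subjects carrying it land in each group. The matrix, the partition, and the function name are hypothetical and chosen only for illustration.

```python
def feature_imbalance(A, b):
    """For each feature i return (count in +1 group, count in -1 group, c_i)."""
    report = []
    for row in A:
        plus = sum(a for a, s in zip(row, b) if s == +1)
        minus = sum(a for a, s in zip(row, b) if s == -1)
        report.append((plus, minus, plus - minus))
    return report

# 3 features (rows) and 6 subjects (columns); entries say who has which feature.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1],
]
b = [+1, -1, +1, -1, +1, -1]  # alternate subjects between the two groups

report = feature_imbalance(A, b)
for i, (plus, minus, c) in enumerate(report, start=1):
    print(f"feature {i}: +1 group has {plus}, -1 group has {minus}, c_{i} = {c}")
print("||Ab||_inf =", max(abs(c) for _, _, c in report))
```

Here the second feature is held only by subjects in the +1 group, so it alone determines the infinity norm of this partition.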

We propose an extremely simple "randomized algorithm" for computing a [math]\displaystyle{ b\in\{-1,+1\}^m }[/math]: for each [math]\displaystyle{ j=1,2,\ldots, m }[/math], let [math]\displaystyle{ b_j }[/math] be independently chosen from [math]\displaystyle{ \{-1,+1\} }[/math], such that

[math]\displaystyle{ b_j= \begin{cases} -1 & \mbox{with probability }\frac{1}{2}\\ +1 &\mbox{with probability }\frac{1}{2} \end{cases}. }[/math]

This procedure can hardly be called an "algorithm", because its decisions are made without regard to the input [math]\displaystyle{ A }[/math]. We then show that, despite this obliviousness, the algorithm chooses a good enough [math]\displaystyle{ b }[/math]: for any [math]\displaystyle{ A }[/math], [math]\displaystyle{ \|Ab\|_\infty=O(\sqrt{m\ln n}) }[/math] with high probability.
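
The oblivious procedure is short enough to write out in full. The sketch below is an illustration under assumptions: the matrix is random with arbitrarily chosen sizes and seed, and the function names are not from the notes. It draws one [math]\displaystyle{ b }[/math] and compares [math]\displaystyle{ \|Ab\|_\infty }[/math] with the quantity [math]\displaystyle{ 2\sqrt{2m\ln n} }[/math].

```python
import math
import random

def random_signs(m, rng):
    """Pick each b_j independently and uniformly from {-1, +1}, ignoring A."""
    return [rng.choice((-1, 1)) for _ in range(m)]

def discrepancy(A, b):
    """Return ||Ab||_inf for a 0-1 matrix A given as a list of rows."""
    return max(abs(sum(a * s for a, s in zip(row, b))) for row in A)

def threshold(n, m):
    """The quantity 2 * sqrt(2 m ln n)."""
    return 2 * math.sqrt(2 * m * math.log(n))

rng = random.Random(2013)   # fixed seed, chosen arbitrarily
n, m = 50, 200              # illustrative sizes, not taken from the text
A = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]

b = random_signs(m, rng)
d = discrepancy(A, b)
print("||Ab||_inf =", d, "threshold =", round(threshold(n, m), 1))
```

Note that `random_signs` never looks at `A`, which is exactly the obliviousness described above.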

Theorem

Let [math]\displaystyle{ A }[/math] be an [math]\displaystyle{ n\times m }[/math] matrix with 0-1 entries. For a random vector [math]\displaystyle{ b }[/math] with [math]\displaystyle{ m }[/math] entries chosen independently and with equal probability from [math]\displaystyle{ \{-1,+1\} }[/math],

[math]\displaystyle{ \Pr[\|Ab\|_\infty\gt 2\sqrt{2m\ln n}]\le\frac{2}{n} }[/math].

Proof.

Consider the [math]\displaystyle{ i }[/math]-th row of [math]\displaystyle{ A }[/math]. The entry of [math]\displaystyle{ Ab }[/math] contributed by row [math]\displaystyle{ i }[/math] is [math]\displaystyle{ c_i=\sum_{j=1}^m a_{ij}b_j }[/math].

Let [math]\displaystyle{ k }[/math] be the number of non-zero entries in the row. If [math]\displaystyle{ k\le2\sqrt{2m\ln n} }[/math], then clearly [math]\displaystyle{ |c_i| }[/math] is no greater than [math]\displaystyle{ 2\sqrt{2m\ln n} }[/math]. On the other hand, if [math]\displaystyle{ k\gt 2\sqrt{2m\ln n} }[/math], then the [math]\displaystyle{ k }[/math] nonzero terms in the sum

[math]\displaystyle{ c_i=\sum_{j=1}^m a_{ij}b_j }[/math]

are independent, each being +1 or -1 with probability 1/2.

Thus, for these [math]\displaystyle{ k }[/math] nonzero terms, each [math]\displaystyle{ b_j }[/math] is either positive or negative independently with equal probability. In expectation there are [math]\displaystyle{ \mu=\frac{k}{2} }[/math] positive [math]\displaystyle{ b_j }[/math]'s among these [math]\displaystyle{ k }[/math] terms, and [math]\displaystyle{ c_i\lt -2\sqrt{2m\ln n} }[/math] occurs only when there are fewer than [math]\displaystyle{ \frac{k}{2}-\sqrt{2m\ln n}=\left(1-\delta\right)\mu }[/math] positive [math]\displaystyle{ b_j }[/math]'s, where [math]\displaystyle{ \delta=\frac{2\sqrt{2m\ln n}}{k} }[/math]. Note that [math]\displaystyle{ 0\lt \delta\lt 1 }[/math] because [math]\displaystyle{ k\gt 2\sqrt{2m\ln n} }[/math]. Applying the Chernoff bound, this event occurs with probability at most

[math]\displaystyle{ \begin{align} \exp\left(-\frac{\mu\delta^2}{2}\right) &= \exp\left(-\frac{k}{2}\cdot\frac{8m\ln n}{2k^2}\right)\\ &= \exp\left(-\frac{2m\ln n}{k}\right)\\ &\le \exp\left(-\frac{2m\ln n}{m}\right)\\ &\le n^{-2}. \end{align} }[/math]

The same argument applies to the negative [math]\displaystyle{ b_j }[/math]'s, so the probability that [math]\displaystyle{ c_i\gt 2\sqrt{2m\ln n} }[/math] is also at most [math]\displaystyle{ n^{-2} }[/math]. Therefore, by the union bound,

[math]\displaystyle{ \Pr[|c_i|\gt 2\sqrt{2m\ln n}]\le\frac{2}{n^2} }[/math].

Applying the union bound to all [math]\displaystyle{ n }[/math] rows,

[math]\displaystyle{ \Pr[\|Ab\|_\infty\gt 2\sqrt{2m\ln n}]\le\sum_{i=1}^n\Pr[|c_i|\gt 2\sqrt{2m\ln n}]\le n\cdot\frac{2}{n^2}=\frac{2}{n} }[/math].

[math]\displaystyle{ \square }[/math]
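
The Chernoff step in the proof can be checked numerically for a single row. The sketch below is an illustration under assumed values of [math]\displaystyle{ m }[/math], [math]\displaystyle{ n }[/math] and [math]\displaystyle{ k }[/math], with made-up function names. It compares the exact lower tail of a Binomial([math]\displaystyle{ k }[/math], 1/2) count of positive signs with [math]\displaystyle{ \exp(-\mu\delta^2/2) }[/math] and with [math]\displaystyle{ n^{-2} }[/math].

```python
import math

def exact_lower_tail(k, threshold):
    """Pr[X < threshold] for X ~ Binomial(k, 1/2), computed exactly."""
    return sum(math.comb(k, x) for x in range(k + 1) if x < threshold) / 2 ** k

def chernoff_lower_tail(k, m, n):
    """exp(-mu * delta^2 / 2) with mu = k/2 and delta = 2*sqrt(2 m ln n)/k."""
    mu = k / 2
    delta = 2 * math.sqrt(2 * m * math.log(n)) / k
    return math.exp(-mu * delta ** 2 / 2)

m, n = 100, 10                       # illustrative values, not from the text
t = math.sqrt(2 * m * math.log(n))   # sqrt(2 m ln n)

rows = []
for k in (50, 60, 80, 100):          # each satisfies 2t < k <= m
    exact = exact_lower_tail(k, k / 2 - t)
    bound = chernoff_lower_tail(k, m, n)
    rows.append((k, exact, bound))
    print(f"k={k}: exact tail={exact:.3e}, Chernoff bound={bound:.3e}, n^-2={n ** -2:.3e}")
```

For every listed [math]\displaystyle{ k }[/math] the exact tail is below the Chernoff bound, which in turn is at most [math]\displaystyle{ n^{-2} }[/math], with equality when [math]\displaystyle{ k=m }[/math].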

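How good is this randomized algorithm? In fact, when [math]\displaystyle{ m=n }[/math], there exists a matrix [math]\displaystyle{ A }[/math] such that [math]\displaystyle{ \|Ab\|_\infty=\Omega(\sqrt{n}) }[/math] for any choice of [math]\displaystyle{ b\in\{-1,+1\}^n }[/math]. For such a matrix, the random choice is therefore within an [math]\displaystyle{ O(\sqrt{\ln n}) }[/math] factor of the optimum with high probability.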
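
The text does not say which matrix attains this lower bound. As an assumption for illustration only, the sketch below uses a standard candidate: the 0-1 matrix obtained from a Sylvester Hadamard matrix by replacing each -1 with 0. It finds the best possible [math]\displaystyle{ \|Ab\|_\infty }[/math] by exhaustive search for small [math]\displaystyle{ n }[/math] and prints it next to [math]\displaystyle{ \sqrt{n} }[/math]. All function names are hypothetical.

```python
import math
from itertools import product

def sylvester_hadamard(n):
    """Return the n x n Sylvester Hadamard matrix (+/-1 entries); n a power of 2."""
    H = [[1]]
    while len(H) < n:
        # Block construction [[H, H], [H, -H]].
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def min_discrepancy(A):
    """Minimum of ||Ab||_inf over all sign vectors b, by exhaustive search."""
    m = len(A[0])
    return min(
        max(abs(sum(a * s for a, s in zip(row, b))) for row in A)
        for b in product((-1, 1), repeat=m)
    )

results = {}
for n in (4, 8):  # kept small because the search is exponential in n
    H = sylvester_hadamard(n)
    A = [[(h + 1) // 2 for h in row] for row in H]  # map -1 -> 0, +1 -> 1
    results[n] = min_discrepancy(A)
    print(f"n={n}: best ||Ab||_inf = {results[n]}, sqrt(n) = {math.sqrt(n):.2f}")
```

Such small cases only illustrate the claim. The asymptotic statement needs a proof, which exhaustive search cannot replace.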

组合数学 (Spring 2013)/Problem Set 2 and 随机算法 (Spring 2013)/Applications of Chernoff Bound: Difference between pages

@@ Line 1: / Line 1: @@
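-*<font color="red" size=4>The solution to every problem must include the complete solution process. Either Chinese or English is acceptable.</font>
+=Set Balancing=
-*<font color="red" size=5>You have only one week for this assignment.</font>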
+Suppose that we have an <math>n\times m</math> matrix <math>A</math> with 0-1 entries. We are looking for a <math>b\in\{-1,+1\}^m</math> that minimizes <math>\|Ab\|_\infty</math>.
-== Problem 1==
+Recall that <math>\|\cdot\|_\infty</math> is the infinity norm (also called the <math>L_\infty</math> norm) of a vector, and for the vector <math>c=Ab</math>,
-Prove the following identity:
+:<math>\|Ab\|_\infty=\max_{i=1,2,\ldots,n}|c_i|</math>.
-*<math>\sum_{k=1}^n k{n\choose k}= n2^{n-1}</math>.
-(Hint: Use double counting.)
+We can also describe this problem as an optimization:
+:<math>\begin{align}
+\mbox{minimize }
+&\quad
+\|Ab\|_\infty\\
+\mbox{subject to: }
+&\quad
+b\in\{-1,+1\}^m.
+\end{align}</math>
-== Problem 2 ==
+This problem is called set balancing for a reason.
-Show that among any group of <math>n</math> people, where <math>n\ge 2</math>, there are at least two people who know exactly the same number of people in the group (assuming that "knowing" is a symmetric relation).
-== Problem 3 ==
+{|border="1"
-Let <math>S</math> be a subset of <math>\{1,2,\ldots,2n\}</math> such that <math>|S|>n</math>. Show that there exist <math>a,b\in S</math> such that <math>a</math> and <math>b</math> are coprime.
+|The problem arises in designing statistical experiments. Suppose that we have <math>m</math> '''subjects''', each of which may have up to <math>n</math> '''features'''. This gives us an <math>n\times m</math> matrix <math>A</math>:
+:<math>
+\begin{array}{c}
+\mbox{feature 1:}\\
+\mbox{feature 2:}\\
+\vdots\\
+\mbox{feature n:}\\
+\end{array}
+\left[
+\begin{array}{cccc}
+a_{11} & a_{12} & \cdots & a_{1m}\\
+a_{21} & a_{22} & \cdots & a_{2m}\\
+\vdots & \vdots & \ddots & \vdots\\
+a_{n1} & a_{n2} & \cdots & a_{nm}\\
+\end{array}
+\right],
+</math>
+where each column represents a subject and each row represents a feature. An entry <math>a_{ij}\in\{0,1\}</math> indicates whether subject <math>j</math> has feature <math>i</math>.
-== Problem 4 ==
+By multiplying <math>A</math> by a vector <math>b\in\{-1,+1\}^m</math>
-(Due to Karger)
+:<math>
+\left[
+\begin{array}{cccc}
+a_{11} & a_{12} & \cdots & a_{1m}\\
+a_{21} & a_{22} & \cdots & a_{2m}\\
+\vdots & \vdots & \ddots & \vdots\\
+a_{n1} & a_{n2} & \cdots & a_{nm}\\
+\end{array}
+\right]
+\left[
+\begin{array}{c}
+b_{1}\\
+b_{2}\\
+\vdots\\
+b_{m}\\
+\end{array}
+\right]
+=
+\left[
+\begin{array}{c}
+c_{1}\\
+c_{2}\\
+\vdots\\
+c_{n}\\
+\end{array}
+\right],
+</math>
+the subjects are partitioned into two disjoint groups: one for -1 and the other for +1. Each <math>c_i</math> gives the difference between the numbers of subjects with feature <math>i</math> in the two groups. By minimizing <math>\|Ab\|_\infty=\|c\|_\infty</math>, we ask for an optimal partition so that each feature is roughly as balanced as possible between the two groups.
-Balls of 8 different colors are in 6 bins. There are 20 balls of each color. Prove that there must be a bin containing 2 pairs of balls from two different colors.
+In a scientific experiment, one of the groups serves as a [http://en.wikipedia.org/wiki/Scientific_control control group]. Ideally, we want the two groups to be statistically identical, which is usually impossible to achieve in practice. The requirement of minimizing <math>\|Ab\|_\infty</math> means that the statistical difference between the two groups is minimized.
+|}
-== Problem 5 ==
-(Erdős-Spencer 1974)
-Let <math>n</math> coins of weights 0 and 1 be given. We are also given a scale with which we may weigh any subset of the coins. Our goal is to determine the weights of coins (i.e. which coins are 0 and which are 1) with the minimal number of weighings.
+We propose an extremely simple "randomized algorithm" for computing a <math>b\in\{-1,+1\}^m</math>: for each <math>j=1,2,\ldots, m</math>, let <math>b_j</math> be independently chosen from <math>\{-1,+1\}</math>, such that
+:<math>b_j=
+\begin{cases}
+-1 & \mbox{with probability }\frac{1}{2}\\
++1 &\mbox{with probability }\frac{1}{2}
+\end{cases}.
+</math>
-This problem can be formalized as follows: A collection <math>S_1,S_2,\ldots,S_m\subseteq [n]</math> is called '''determining''' if an arbitrary subset <math>T\subseteq[n]</math> can be uniquely determined by the cardinalities <math>|S_i\cap T|, 1\le i\le m</math>.
+This procedure can hardly be called an "algorithm", because its decisions are made without regard to the input <math>A</math>. We then show that, despite this obliviousness, the algorithm chooses a good enough <math>b</math>: for any <math>A</math>, <math>\|Ab\|_\infty=O(\sqrt{m\ln n})</math> with high probability.
+{{Theorem
+|Theorem|
+:Let <math>A</math> be an <math>n\times m</math> matrix with 0-1 entries. For a random vector <math>b</math> with <math>m</math> entries chosen independently and with equal probability from <math>\{-1,+1\}</math>,
+::<math>\Pr[\|Ab\|_\infty>2\sqrt{2m\ln n}]\le\frac{2}{n}</math>.
+}}
+{{Proof|
+Consider the <math>i</math>-th row of <math>A</math>. The entry of <math>Ab</math> contributed by row <math>i</math> is <math>c_i=\sum_{j=1}^m a_{ij}b_j</math>.
-* Prove that if there is a determining collection <math>S_1,S_2,\ldots,S_m\subseteq [n]</math>, then there is a way to determine the weights of <math>n</math> coins with <math>m</math> weighings.
+Let <math>k</math> be the number of non-zero entries in the row. If <math>k\le2\sqrt{2m\ln n}</math>, then clearly <math>|c_i|</math> is no greater than <math>2\sqrt{2m\ln n}</math>. On the other hand, if <math>k>2\sqrt{2m\ln n}</math>, then the <math>k</math> nonzero terms in the sum
-* Use the pigeonhole principle to show that if a collection <math>S_1,S_2,\ldots,S_m\subseteq [n]</math> is determining, then it must hold that <math>m\ge \frac{n}{\log_2(n+1)}</math>.
+:<math>c_i=\sum_{j=1}^m a_{ij}b_j</math>
+are independent, each being +1 or -1 with probability 1/2.
-(This gives a lower bound for the number of weighings required to determine the weights of coins.)
+Thus, for these <math>k</math> nonzero terms, each <math>b_j</math> is either positive or negative independently with equal probability. In expectation there are <math>\mu=\frac{k}{2}</math> positive <math>b_j</math>'s among these <math>k</math> terms, and <math>c_i<-2\sqrt{2m\ln n}</math> occurs only when there are fewer than <math>\frac{k}{2}-\sqrt{2m\ln n}=\left(1-\delta\right)\mu</math> positive <math>b_j</math>'s, where <math>\delta=\frac{2\sqrt{2m\ln n}}{k}</math>. Note that <math>0<\delta<1</math> because <math>k>2\sqrt{2m\ln n}</math>. Applying the Chernoff bound, this event occurs with probability at most
+:<math>\begin{align}
+\exp\left(-\frac{\mu\delta^2}{2}\right)
+&=
+\exp\left(-\frac{k}{2}\cdot\frac{8m\ln n}{2k^2}\right)\\
+&=
+\exp\left(-\frac{2m\ln n}{k}\right)\\
+&\le
+\exp\left(-\frac{2m\ln n}{m}\right)\\
+&\le n^{-2}.
+\end{align}
+</math>
+The same argument applies to the negative <math>b_j</math>'s, so the probability that <math>c_i>2\sqrt{2m\ln n}</math> is also at most <math>n^{-2}</math>. Therefore, by the union bound,
+:<math>\Pr[|c_i|> 2\sqrt{2m\ln n}]\le\frac{2}{n^2}</math>.
+Applying the union bound to all <math>n</math> rows,
+:<math>\Pr[\|Ab\|_\infty>2\sqrt{2m\ln n}]\le\sum_{i=1}^n\Pr[|c_i|> 2\sqrt{2m\ln n}]\le n\cdot\frac{2}{n^2}=\frac{2}{n}</math>.
+}}
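+How good is this randomized algorithm? In fact, when <math>m=n</math>, there exists a matrix <math>A</math> such that <math>\|Ab\|_\infty=\Omega(\sqrt{n})</math> for any choice of <math>b\in\{-1,+1\}^n</math>. For such a matrix, the random choice is therefore within an <math>O(\sqrt{\ln n})</math> factor of the optimum with high probability.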
