组合数学 (Spring 2013)/Problem Set 2 and 随机算法 (Spring 2013)/Applications of Chernoff Bound: Difference between pages
imported>Etone No edit summary |
imported>Etone (Created page with "=Set Balancing= Supposed that we have an <math>n\times m</math> matrix <math>A</math> with 0-1 entries. We are looking for a <math>b\in\{-1,+1\}^m</math> that minimizes <math>\|A…") |
||
Line 1: | Line 1: | ||
=Set Balancing= | |||
Supposed that we have an <math>n\times m</math> matrix <math>A</math> with 0-1 entries. We are looking for a <math>b\in\{-1,+1\}^m</math> that minimizes <math>\|Ab\|_\infty</math>. | |||
= | Recall that <math>\|\cdot\|_\infty</math> is the infinity norm (also called <math>L_\infty</math> norm) of a vector, and for the vector <math>c=Ab</math>, | ||
:<math>\|Ab\|_\infty=\max_{i=1,2,\ldots,n}|c_i|</math>. | |||
We can also describe this problem as an optimization: | |||
:<math>\begin{align} | |||
\mbox{minimize } | |||
&\quad | |||
\|Ab\|_\infty\\ | |||
\mbox{subject to: } | |||
&\quad | |||
b\in\{-1,+1\}^m. | |||
\end{align}</math> | |||
This problem is called set balancing for a reason. | |||
= | {|border="1" | ||
|The problem arises in designing statistical experiments. Suppose that we have <math>m</math> '''subjects''', each of which may have up to <math>n</math> '''features'''. This gives us an <math>n\times m</math> matrix <math>A</math>: | |||
:<math> | |||
\begin{array}{c} | |||
\mbox{feature 1:}\\ | |||
\mbox{feature 2:}\\ | |||
\vdots\\ | |||
\mbox{feature n:}\\ | |||
\end{array} | |||
\left[ | |||
\begin{array}{cccc} | |||
a_{11} & a_{12} & \cdots & a_{1m}\\ | |||
a_{21} & a_{22} & \cdots & a_{2m}\\ | |||
\vdots & \vdots & \ddots & \vdots\\ | |||
a_{n1} & a_{n2} & \cdots & a_{nm}\\ | |||
\end{array} | |||
\right], | |||
</math> | |||
where each column represents a subject and each row represent a feature. An entry <math>a_{ij}\in\{0,1\}</math> indicates whether subject <math>j</math> has feature <math>i</math>. | |||
== | By multiplying a vector <math>b\in\{-1,+1\}^m</math> | ||
:<math> | |||
\left[ | |||
\begin{array}{cccc} | |||
a_{11} & a_{12} & \cdots & a_{1m}\\ | |||
a_{21} & a_{22} & \cdots & a_{2m}\\ | |||
\vdots & \vdots & \ddots & \vdots\\ | |||
a_{n1} & a_{n2} & \cdots & a_{nm}\\ | |||
\end{array} | |||
\right] | |||
\left[ | |||
\begin{array}{c} | |||
b_{1}\\ | |||
b_{2}\\ | |||
\vdots\\ | |||
b_{m}\\ | |||
\end{array} | |||
\right] | |||
= | |||
\left[ | |||
\begin{array}{c} | |||
c_{1}\\ | |||
c_{2}\\ | |||
\vdots\\ | |||
c_{n}\\ | |||
\end{array} | |||
\right], | |||
</math> | |||
the subjects are partitioned into two disjoint groups: one for -1 and other other for +1. Each <math>c_i</math> gives the difference between the numbers of subjects with feature <math>i</math> in the two groups. By minimizing <math>\|Ab\|_\infty=\|c\|_\infty</math>, we ask for an optimal partition so that each feature is roughly as balanced as possible between the two groups. | |||
In a scientific experiment, one of the group serves as a [http://en.wikipedia.org/wiki/Scientific_control control group] (对照组). Ideally, we want the two groups are statistically identical, which is usually impossible to achieve in practice. The requirement of minimizing <math>\|Ab\|_\infty</math> actually means the statistical difference between the two groups are minimized. | |||
|} | |||
We propose an extremely simple "randomized algorithm" for computing a <math>b\in\{-1,+1\}^m</math>: for each <math>i=1,2,\ldots, m</math>, let <math>b_i</math> be independently chosen from <math>\{-1,+1\}</math>, such that | |||
:<math>b_i= | |||
\begin{cases} | |||
-1 & \mbox{with probability }\frac{1}{2}\\ | |||
+1 &\mbox{with probability }\frac{1}{2} | |||
\end{cases}. | |||
</math> | |||
This | This procedure can hardly be called as an "algorithm", because its decision is made disregard of the input <math>A</math>. We then show that despite of this obliviousness, the algorithm chooses a good enough <math>b</math>, such that for any <math>A</math>, <math>\|Ab\|_\infty=O(\sqrt{m\ln n})</math> with high probability. | ||
{{Theorem | |||
|Theorem| | |||
:Let <math>A</math> be an <math>n\times m</math> matrix with 0-1 entries. For a random vector <math>b</math> with <math>m</math> entries chosen independently and with equal probability from <math>\{-1,+1\}</math>, | |||
::<math>\Pr[\|Ab\|_\infty>2\sqrt{2m\ln n}]\le\frac{2}{n}</math>. | |||
}} | |||
{{Proof| | |||
Consider particularly the <math>i</math>-th row of <math>A</math>. The entry of <math>Ab</math> contributed by row <math>i</math> is <math>c_i=\sum_{j=1}^m a_{ij}b_j</math>. | |||
Let <math>k</math> be the non-zero entries in the row. If <math>k\le2\sqrt{2m\ln n}</math>, then clearly <math>|c_i|</math> is no greater than <math>2\sqrt{2m\ln n}</math>. On the other hand if <math>k>2\sqrt{2m\ln n}</math> then the <math>k</math> nonzero terms in the sum | |||
:<math>c_i=\sum_{j=1}^m a_{ij}b_j</math> | |||
are independent, each with probability 1/2 of being either +1 or -1. | |||
( | Thus, for these <math>k</math> nonzero terms, each <math>b_i</math> is either positive or negative independently with equal probability. There are expectedly <math>\mu=\frac{k}{2}</math> positive <math>b_i</math>'s among these <math>k</math> terms, and <math>c_i<-2\sqrt{2m\ln n}</math> only occurs when there are less than <math>\frac{k}{2}-\sqrt{2m\ln n}=\left(1-\delta\right)\mu</math> positive <math>b_i</math>'s, where <math>\delta=\frac{2\sqrt{2m\ln n}}{k}</math>. Applying Chernoff bound, this event occurs with probability at most | ||
:<math>\begin{align} | |||
\exp\left(-\frac{\mu\delta^2}{2}\right) | |||
&= | |||
\exp\left(-\frac{k}{2}\cdot\frac{8m\ln n}{2k^2}\right)\\ | |||
&= | |||
\exp\left(-\frac{2m\ln n}{k}\right)\\ | |||
&\le | |||
\exp\left(-\frac{2m\ln n}{m}\right)\\ | |||
&\le n^{-2}. | |||
\end{align} | |||
</math> | |||
The same argument can be applied to negative <math>b_i</math>'s, so that the probability that <math>c_i>2\sqrt{2m\ln n}</math> is at most <math>n^{-2}</math>. Therefore, by the union bound, | |||
:<math>\Pr[|c_i|> 2\sqrt{2m\ln n}]\le\frac{2}{n^2}</math>. | |||
Apply the union bound to all <math>n</math> rows. | |||
:<math>\Pr[\|Ab\|_\infty>2\sqrt{2m\ln n}]\le n\cdot\Pr[|c_i|> 2\sqrt{2m\ln n}]\le\frac{2}{n}</math>. | |||
}} | |||
How good is this randomized algorithm? In fact when <math>m=n</math> there exists a matrix <math>A</math> such that <math>\|Ab\|_\infty=\Omega(\sqrt{n})</math> for any choice of <math>b\in\{-1,+1\}^n</math>. |
Revision as of 06:04, 18 April 2013
Set Balancing
Supposed that we have an [math]\displaystyle{ n\times m }[/math] matrix [math]\displaystyle{ A }[/math] with 0-1 entries. We are looking for a [math]\displaystyle{ b\in\{-1,+1\}^m }[/math] that minimizes [math]\displaystyle{ \|Ab\|_\infty }[/math].
Recall that [math]\displaystyle{ \|\cdot\|_\infty }[/math] is the infinity norm (also called [math]\displaystyle{ L_\infty }[/math] norm) of a vector, and for the vector [math]\displaystyle{ c=Ab }[/math],
- [math]\displaystyle{ \|Ab\|_\infty=\max_{i=1,2,\ldots,n}|c_i| }[/math].
We can also describe this problem as an optimization:
- [math]\displaystyle{ \begin{align} \mbox{minimize } &\quad \|Ab\|_\infty\\ \mbox{subject to: } &\quad b\in\{-1,+1\}^m. \end{align} }[/math]
This problem is called set balancing for a reason.
The problem arises in designing statistical experiments. Suppose that we have [math]\displaystyle{ m }[/math] subjects, each of which may have up to [math]\displaystyle{ n }[/math] features. This gives us an [math]\displaystyle{ n\times m }[/math] matrix [math]\displaystyle{ A }[/math]:
where each column represents a subject and each row represent a feature. An entry [math]\displaystyle{ a_{ij}\in\{0,1\} }[/math] indicates whether subject [math]\displaystyle{ j }[/math] has feature [math]\displaystyle{ i }[/math]. By multiplying a vector [math]\displaystyle{ b\in\{-1,+1\}^m }[/math]
the subjects are partitioned into two disjoint groups: one for -1 and other other for +1. Each [math]\displaystyle{ c_i }[/math] gives the difference between the numbers of subjects with feature [math]\displaystyle{ i }[/math] in the two groups. By minimizing [math]\displaystyle{ \|Ab\|_\infty=\|c\|_\infty }[/math], we ask for an optimal partition so that each feature is roughly as balanced as possible between the two groups. In a scientific experiment, one of the group serves as a control group (对照组). Ideally, we want the two groups are statistically identical, which is usually impossible to achieve in practice. The requirement of minimizing [math]\displaystyle{ \|Ab\|_\infty }[/math] actually means the statistical difference between the two groups are minimized. |
We propose an extremely simple "randomized algorithm" for computing a [math]\displaystyle{ b\in\{-1,+1\}^m }[/math]: for each [math]\displaystyle{ i=1,2,\ldots, m }[/math], let [math]\displaystyle{ b_i }[/math] be independently chosen from [math]\displaystyle{ \{-1,+1\} }[/math], such that
- [math]\displaystyle{ b_i= \begin{cases} -1 & \mbox{with probability }\frac{1}{2}\\ +1 &\mbox{with probability }\frac{1}{2} \end{cases}. }[/math]
This procedure can hardly be called as an "algorithm", because its decision is made disregard of the input [math]\displaystyle{ A }[/math]. We then show that despite of this obliviousness, the algorithm chooses a good enough [math]\displaystyle{ b }[/math], such that for any [math]\displaystyle{ A }[/math], [math]\displaystyle{ \|Ab\|_\infty=O(\sqrt{m\ln n}) }[/math] with high probability.
Theorem - Let [math]\displaystyle{ A }[/math] be an [math]\displaystyle{ n\times m }[/math] matrix with 0-1 entries. For a random vector [math]\displaystyle{ b }[/math] with [math]\displaystyle{ m }[/math] entries chosen independently and with equal probability from [math]\displaystyle{ \{-1,+1\} }[/math],
- [math]\displaystyle{ \Pr[\|Ab\|_\infty\gt 2\sqrt{2m\ln n}]\le\frac{2}{n} }[/math].
- Let [math]\displaystyle{ A }[/math] be an [math]\displaystyle{ n\times m }[/math] matrix with 0-1 entries. For a random vector [math]\displaystyle{ b }[/math] with [math]\displaystyle{ m }[/math] entries chosen independently and with equal probability from [math]\displaystyle{ \{-1,+1\} }[/math],
Proof. Consider particularly the [math]\displaystyle{ i }[/math]-th row of [math]\displaystyle{ A }[/math]. The entry of [math]\displaystyle{ Ab }[/math] contributed by row [math]\displaystyle{ i }[/math] is [math]\displaystyle{ c_i=\sum_{j=1}^m a_{ij}b_j }[/math].
Let [math]\displaystyle{ k }[/math] be the non-zero entries in the row. If [math]\displaystyle{ k\le2\sqrt{2m\ln n} }[/math], then clearly [math]\displaystyle{ |c_i| }[/math] is no greater than [math]\displaystyle{ 2\sqrt{2m\ln n} }[/math]. On the other hand if [math]\displaystyle{ k\gt 2\sqrt{2m\ln n} }[/math] then the [math]\displaystyle{ k }[/math] nonzero terms in the sum
- [math]\displaystyle{ c_i=\sum_{j=1}^m a_{ij}b_j }[/math]
are independent, each with probability 1/2 of being either +1 or -1.
Thus, for these [math]\displaystyle{ k }[/math] nonzero terms, each [math]\displaystyle{ b_i }[/math] is either positive or negative independently with equal probability. There are expectedly [math]\displaystyle{ \mu=\frac{k}{2} }[/math] positive [math]\displaystyle{ b_i }[/math]'s among these [math]\displaystyle{ k }[/math] terms, and [math]\displaystyle{ c_i\lt -2\sqrt{2m\ln n} }[/math] only occurs when there are less than [math]\displaystyle{ \frac{k}{2}-\sqrt{2m\ln n}=\left(1-\delta\right)\mu }[/math] positive [math]\displaystyle{ b_i }[/math]'s, where [math]\displaystyle{ \delta=\frac{2\sqrt{2m\ln n}}{k} }[/math]. Applying Chernoff bound, this event occurs with probability at most
- [math]\displaystyle{ \begin{align} \exp\left(-\frac{\mu\delta^2}{2}\right) &= \exp\left(-\frac{k}{2}\cdot\frac{8m\ln n}{2k^2}\right)\\ &= \exp\left(-\frac{2m\ln n}{k}\right)\\ &\le \exp\left(-\frac{2m\ln n}{m}\right)\\ &\le n^{-2}. \end{align} }[/math]
The same argument can be applied to negative [math]\displaystyle{ b_i }[/math]'s, so that the probability that [math]\displaystyle{ c_i\gt 2\sqrt{2m\ln n} }[/math] is at most [math]\displaystyle{ n^{-2} }[/math]. Therefore, by the union bound,
- [math]\displaystyle{ \Pr[|c_i|\gt 2\sqrt{2m\ln n}]\le\frac{2}{n^2} }[/math].
Apply the union bound to all [math]\displaystyle{ n }[/math] rows.
- [math]\displaystyle{ \Pr[\|Ab\|_\infty\gt 2\sqrt{2m\ln n}]\le n\cdot\Pr[|c_i|\gt 2\sqrt{2m\ln n}]\le\frac{2}{n} }[/math].
- [math]\displaystyle{ \square }[/math]
How good is this randomized algorithm? In fact when [math]\displaystyle{ m=n }[/math] there exists a matrix [math]\displaystyle{ A }[/math] such that [math]\displaystyle{ \|Ab\|_\infty=\Omega(\sqrt{n}) }[/math] for any choice of [math]\displaystyle{ b\in\{-1,+1\}^n }[/math].