=Set Balancing=
Suppose that we have an <math>n\times m</math> matrix <math>A</math> with 0-1 entries. We are looking for a <math>b\in\{-1,+1\}^m</math> that minimizes <math>\|Ab\|_\infty</math>.
Recall that <math>\|\cdot\|_\infty</math> is the infinity norm (also called the <math>L_\infty</math> norm) of a vector; for the vector <math>c=Ab</math>,
:<math>\|Ab\|_\infty=\max_{i=1,2,\ldots,n}|c_i|</math>.
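For instance, with an arbitrary example vector, the infinity norm simply picks out the entry of largest absolute value:

```python
# ||c||_inf is the largest absolute value among the entries of c.
c = [3, -5, 2]  # an arbitrary example vector
inf_norm_c = max(abs(x) for x in c)  # 5, coming from the entry -5
```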
We can also describe this problem as an optimization problem:
:<math>\begin{align}
\mbox{minimize }
&\quad
\|Ab\|_\infty\\
\mbox{subject to: }
&\quad
b\in\{-1,+1\}^m.
\end{align}</math>
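For small instances this optimization can be solved exactly by enumerating all <math>2^m</math> sign vectors. Below is a minimal Python sketch of such a brute-force search; the matrix <code>A</code> is an arbitrary illustrative instance, and the function names are our own, not from any library:

```python
from itertools import product

def inf_norm(v):
    """Infinity norm ||v||_inf: the largest absolute entry of v."""
    return max(abs(x) for x in v)

def set_balance_brute_force(A):
    """Return a sign vector b in {-1,+1}^m minimizing ||Ab||_inf,
    found by exhaustive search over all 2^m candidates."""
    m = len(A[0])
    best_b, best_val = None, float("inf")
    for b in product((-1, +1), repeat=m):
        c = [sum(a * x for a, x in zip(row, b)) for row in A]
        if inf_norm(c) < best_val:
            best_b, best_val = list(b), inf_norm(c)
    return best_b, best_val

# An arbitrary 3x4 illustrative instance.
A = [[1, 1, 0, 1],
     [0, 1, 1, 1],
     [1, 0, 1, 0]]
b, val = set_balance_brute_force(A)
```

Note that the first two rows have an odd number of ones, so their entries of <math>Ab</math> are odd and the optimum cannot be 0; exhaustive search takes <math>O(2^m)</math> time, which is exactly why a cheap randomized alternative is interesting.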
This problem is called set balancing for a reason.
{|border="1"
|The problem arises in designing statistical experiments. Suppose that we have <math>m</math> '''subjects''', each of which may have up to <math>n</math> '''features'''. This gives us an <math>n\times m</math> matrix <math>A</math>:
:<math>
\begin{array}{c}
\mbox{feature 1:}\\
\mbox{feature 2:}\\
\vdots\\
\mbox{feature n:}\\
\end{array}
\left[
\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1m}\\
a_{21} & a_{22} & \cdots & a_{2m}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nm}\\
\end{array}
\right],
</math>
where each column represents a subject and each row represents a feature. An entry <math>a_{ij}\in\{0,1\}</math> indicates whether subject <math>j</math> has feature <math>i</math>.
By multiplying <math>A</math> with a vector <math>b\in\{-1,+1\}^m</math>,
:<math>
\left[
\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1m}\\
a_{21} & a_{22} & \cdots & a_{2m}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nm}\\
\end{array}
\right]
\left[
\begin{array}{c}
b_{1}\\
b_{2}\\
\vdots\\
b_{m}\\
\end{array}
\right]
=
\left[
\begin{array}{c}
c_{1}\\
c_{2}\\
\vdots\\
c_{n}\\
\end{array}
\right],
</math>
the subjects are partitioned into two disjoint groups: one for <math>-1</math> and the other for <math>+1</math>. Each <math>c_i</math> gives the difference between the numbers of subjects with feature <math>i</math> in the two groups. By minimizing <math>\|Ab\|_\infty=\|c\|_\infty</math>, we ask for an optimal partition so that each feature is as balanced as possible between the two groups.
In a scientific experiment, one of the groups serves as a [http://en.wikipedia.org/wiki/Scientific_control control group]. Ideally, we want the two groups to be statistically identical, which is usually impossible to achieve in practice. Minimizing <math>\|Ab\|_\infty</math> means that the statistical difference between the two groups is minimized.
|}
We propose an extremely simple "randomized algorithm" for computing a <math>b\in\{-1,+1\}^m</math>: for each <math>i=1,2,\ldots, m</math>, let <math>b_i</math> be independently chosen from <math>\{-1,+1\}</math>, such that
:<math>b_i=
\begin{cases}
-1 & \mbox{with probability }\frac{1}{2}\\
+1 & \mbox{with probability }\frac{1}{2}.
\end{cases}
</math>
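The random choice is immediate to implement. A minimal Python sketch (the function names are illustrative, not from any particular library):

```python
import random

def random_signs(m, rng=random):
    """Pick each b_i from {-1, +1} independently, with probability 1/2 each."""
    return [rng.choice((-1, +1)) for _ in range(m)]

def objective(A, b):
    """Compute ||Ab||_inf for a 0-1 matrix A and a sign vector b."""
    return max(abs(sum(a * x for a, x in zip(row, b))) for row in A)

b = random_signs(4)  # note: the choice of b never looks at A
```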
This procedure can hardly be called an "algorithm", because its decision is made regardless of the input <math>A</math>. We then show that despite this obliviousness, the algorithm chooses a good enough <math>b</math>: for any <math>A</math>, <math>\|Ab\|_\infty=O(\sqrt{m\ln n})</math> with high probability.
{{Theorem
|Theorem|
:Let <math>A</math> be an <math>n\times m</math> matrix with 0-1 entries. For a random vector <math>b</math> with <math>m</math> entries chosen independently and with equal probability from <math>\{-1,+1\}</math>,
::<math>\Pr[\|Ab\|_\infty>2\sqrt{2m\ln n}]\le\frac{2}{n}</math>.
}}
{{Proof|
Consider the <math>i</math>-th row of <math>A</math>. The entry of <math>Ab</math> contributed by row <math>i</math> is <math>c_i=\sum_{j=1}^m a_{ij}b_j</math>.

Let <math>k</math> be the number of non-zero entries in the row. If <math>k\le2\sqrt{2m\ln n}</math>, then clearly <math>|c_i|</math> is no greater than <math>2\sqrt{2m\ln n}</math>. On the other hand, if <math>k>2\sqrt{2m\ln n}</math>, then the <math>k</math> nonzero terms in the sum
:<math>c_i=\sum_{j=1}^m a_{ij}b_j</math>
are independent, each with probability 1/2 of being either +1 or -1.

Thus, for these <math>k</math> nonzero terms, each <math>b_j</math> is either positive or negative independently with equal probability. In expectation there are <math>\mu=\frac{k}{2}</math> positive <math>b_j</math>'s among these <math>k</math> terms, and <math>c_i<-2\sqrt{2m\ln n}</math> only occurs when there are fewer than <math>\frac{k}{2}-\sqrt{2m\ln n}=\left(1-\delta\right)\mu</math> positive <math>b_j</math>'s, where <math>\delta=\frac{2\sqrt{2m\ln n}}{k}</math>. Applying the Chernoff bound, this event occurs with probability at most
:<math>\begin{align}
\exp\left(-\frac{\mu\delta^2}{2}\right)
&=
\exp\left(-\frac{k}{2}\cdot\frac{8m\ln n}{2k^2}\right)\\
&=
\exp\left(-\frac{2m\ln n}{k}\right)\\
&\le
\exp\left(-\frac{2m\ln n}{m}\right)\\
&= n^{-2},
\end{align}
</math>
where the last inequality uses <math>k\le m</math>.

The same argument can be applied to negative <math>b_j</math>'s, so that the probability that <math>c_i>2\sqrt{2m\ln n}</math> is also at most <math>n^{-2}</math>. Therefore, by the union bound,
:<math>\Pr[|c_i|> 2\sqrt{2m\ln n}]\le\frac{2}{n^2}</math>.
Applying the union bound to all <math>n</math> rows,
:<math>\Pr[\|Ab\|_\infty>2\sqrt{2m\ln n}]\le n\cdot\Pr[|c_i|> 2\sqrt{2m\ln n}]\le\frac{2}{n}</math>.
}}
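The tail bound can also be checked empirically. The sketch below (with arbitrary parameters <code>n = m = 50</code> and a uniformly random 0-1 matrix; the function name is our own) estimates how often <math>\|Ab\|_\infty</math> exceeds the threshold <math>2\sqrt{2m\ln n}</math>, which the theorem guarantees happens with probability at most <math>2/n</math> for ''every'' 0-1 matrix:

```python
import math
import random

def estimate_failure_rate(n=50, m=50, trials=200, seed=1):
    """Empirically estimate Pr[||Ab||_inf > 2*sqrt(2m ln n)] for one random
    0-1 matrix A, by drawing `trials` independent random sign vectors b.
    The theorem bounds this probability by 2/n for every 0-1 matrix A."""
    rng = random.Random(seed)
    A = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]
    threshold = 2 * math.sqrt(2 * m * math.log(n))
    failures = 0
    for _ in range(trials):
        b = [rng.choice((-1, 1)) for _ in range(m)]
        c_inf = max(abs(sum(a * x for a, x in zip(row, b))) for row in A)
        if c_inf > threshold:
            failures += 1
    return failures / trials

rate = estimate_failure_rate()
```

In fact, for these parameters the observed rate is typically far below the guaranteed <math>2/n</math>, since the Chernoff bound is not tight.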
How good is this randomized algorithm? In fact, when <math>m=n</math> there exists a matrix <math>A</math> such that <math>\|Ab\|_\infty=\Omega(\sqrt{n})</math> for any choice of <math>b\in\{-1,+1\}^n</math>.