Randomized Algorithms (Spring 2010)/More on Chernoff bounds
Set balancing
Suppose that we have an $n\times m$ matrix $A$ with 0-1 entries. We are looking for a vector $b\in\{-1,+1\}^m$ that minimizes $\|Ab\|_\infty$.

Recall that $\|\cdot\|_\infty$ is the infinity norm (also called the $L_\infty$ norm) of a vector: for the $n$-dimensional vector $c=Ab$,
$\|Ab\|_\infty=\max_{1\le i\le n}|c_i|$.

We can also describe this problem as an optimization:
minimize $\|Ab\|_\infty$ subject to $b\in\{-1,+1\}^m$.
This problem is called set balancing for a reason.
The problem arises in designing statistical experiments. Suppose that we have $m$ subjects, each of which may have up to $n$ features. This gives an $n\times m$ matrix $A$ where each column represents a subject and each row represents a feature; an entry $A_{ij}\in\{0,1\}$ indicates whether subject $j$ has feature $i$. By multiplying a vector $b\in\{-1,+1\}^m$, the subjects are partitioned into two disjoint groups: one for $-1$ and the other for $+1$. Each entry $(Ab)_i$ gives the difference between the two groups with respect to feature $i$. In a scientific experiment, one of the groups serves as a control group. Ideally, we want the two groups to be statistically identical, i.e. all the differences to be 0, which is usually impossible to achieve in practice. The requirement of minimizing $\|Ab\|_\infty$ means that the worst difference between the two groups, over all features, should be as small as possible.
We propose an extremely simple "randomized algorithm" for computing such a $b\in\{-1,+1\}^m$: for each $i$, let $b_i$ be $+1$ or $-1$ independently with equal probability.

This procedure can hardly be called an "algorithm", because its decision is made regardless of the input $A$. Yet, as the theorem below shows, its output achieves a fairly small discrepancy with high probability.
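As a concrete illustration, here is a minimal sketch of this procedure in Python (the function names `random_partition` and `discrepancy` are ours, purely for illustration): it draws each $b_i$ uniformly from $\{-1,+1\}$ and evaluates $\|Ab\|_\infty$.

```python
import random

def random_partition(m):
    """The randomized "algorithm": each b_i is +1 or -1 with probability 1/2, independently."""
    return [random.choice((-1, 1)) for _ in range(m)]

def discrepancy(A, b):
    """Compute ||Ab||_inf for a 0-1 matrix A (given as a list of rows) and a +/-1 vector b."""
    return max(abs(sum(a_ij * b_j for a_ij, b_j in zip(row, b))) for row in A)

# Tiny example: a random 0-1 matrix with n rows (features) and m columns (subjects).
n, m = 200, 200
A = [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]
b = random_partition(m)
print(discrepancy(A, b))  # typically on the order of sqrt(4 m ln n), per the theorem below
```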
Theorem
Let $A$ be an $n\times m$ matrix with 0-1 entries. For a uniformly random $b\in\{-1,+1\}^m$,
$\Pr\left[\|Ab\|_\infty>\sqrt{4m\ln n}\right]\le\frac{2}{n}$.
Proof:
Consider particularly the $i$-th row of $A$, and let $c_i$ denote the corresponding entry of $Ab$, that is,
$c_i=\sum_{j=1}^m A_{ij}b_j$.

Let $k$ be the number of 1s in the $i$-th row. If $k\le\sqrt{4m\ln n}$, then clearly $|c_i|\le k\le\sqrt{4m\ln n}$. Otherwise, the $k$ nonzero terms in the sum defining $c_i$ are independent, each with probability 1/2 of being either +1 or -1.

Thus, for these $k$ independent uniform $\pm1$ terms, the Chernoff bound gives
$\Pr\left[c_i>\sqrt{4m\ln n}\right]\le\exp\left(-\frac{4m\ln n}{2k}\right)\le e^{-2\ln n}=\frac{1}{n^2}$,
where the last inequality uses $k\le m$. The same argument can be applied to negative deviations of $c_i$, thus
$\Pr\left[|c_i|>\sqrt{4m\ln n}\right]\le\frac{2}{n^2}$.

Apply the union bound to all $n$ rows:
$\Pr\left[\|Ab\|_\infty>\sqrt{4m\ln n}\right]\le n\cdot\frac{2}{n^2}=\frac{2}{n}$.
How good is this randomized algorithm? In fact when $m=n$, there exist 0-1 matrices $A$ for which $\|Ab\|_\infty=\Omega(\sqrt{n})$ for every choice of $b\in\{-1,+1\}^n$, so the $O(\sqrt{n\ln n})$ bound achieved by a uniformly random $b$ is nearly optimal (within an $O(\sqrt{\log n})$ factor).
Permutation Routing
The problem arises from parallel computing. Consider that we have $N$ processors, connected by a communication network. The processors communicate with each other by sending and receiving packets through the network. We consider the following permutation routing problem:

- Every processor is sending a packet to a unique destination. Therefore, for $[N]$ the set of processors, the destinations are given by a permutation $\pi$ of $[N]$, such that for every processor $i$, the processor $i$ is sending a packet to processor $\pi(i)$.
- The communication is synchronized, such that in each round, every link (an edge of the graph) can forward at most one packet.

With a complete graph as the network, for any permutation $\pi$ of $[N]$, all packets can be routed to their destinations in parallel within one round of communication. However, such an ideal connectivity is usually not available in practice, so we are interested in networks of low degree.
Routing on a hypercube
A hypercube (sometimes called Boolean cube, Hamming cube, or just cube) is defined over $N=2^d$ nodes, for some positive integer $d$. Each node is identified with a $d$-bit string in $\{0,1\}^d$, and two nodes are adjacent if and only if their bit strings differ in exactly one bit.

A natural routing scheme on the hypercube is the following bit-fixing algorithm, which "fixes" the differing bits one position at a time.

Bit-Fixing Routing Algorithm:
For each packet:
1. Let $u=(u_1,u_2,\ldots,u_d)$ and $v=(v_1,v_2,\ldots,v_d)$ be the origin and the destination of the packet, respectively.
2. For $i=1$ to $d$: if the $i$-th bit of the current node differs from $v_i$, then traverse the edge across the $i$-th dimension, i.e. move from $(v_1,\ldots,v_{i-1},u_i,u_{i+1},\ldots,u_d)$ to $(v_1,\ldots,v_{i-1},v_i,u_{i+1},\ldots,u_d)$.
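To make the procedure concrete, here is a minimal Python sketch (the function name `bit_fixing_route` is ours; nodes are encoded as $d$-bit integers and the bits are scanned from the lowest position, which is just one fixed scan order):

```python
def bit_fixing_route(u, v, d):
    """Return the list of nodes visited by the packet travelling from u to v on the
    d-dimensional hypercube under bit-fixing: scan the bit positions in a fixed order
    and traverse the edge flipping bit i whenever the current node and v differ there."""
    path, cur = [u], u
    for i in range(d):
        if (cur ^ v) & (1 << i):   # bit i still disagrees with the destination
            cur ^= 1 << i          # traverse the edge across dimension i
            path.append(cur)
    return path

# Example on the 4-dimensional hypercube (N = 16 nodes):
print(bit_fixing_route(0b0110, 0b1011, 4))  # [6, 7, 3, 11], i.e. 0110 -> 0111 -> 0011 -> 1011
```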
- Oblivious routing algorithms
- This algorithm is blessed with a desirable property: at each routing step, the choice of link depends only on the current node and the destination. We call algorithms with this property oblivious routing algorithms. (Actually, the standard definition of obliviousness allows the choice to also depend on the origin; the bit-fixing algorithm is even more oblivious than this standard definition requires.) Compared to routing algorithms which adapt to the path that the packet has traversed, oblivious routing is simpler and thus can be implemented with smaller routing tables (or simple devices called switches).
- Queuing policies
- When routing $N$ packets in parallel, it is possible that more than one packet wants to use the same edge at the same time. We assume that a queue is associated with each edge, such that the packets to be delivered through an edge are put into the queue associated with that edge. Under some queuing policy (e.g. FIFO, or furthest-to-go), the queued packets are delivered through the edge at a rate of at most one packet per round. (A minimal simulation of this synchronized, one-packet-per-edge-per-round model is sketched below.)
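The following Python sketch makes the model concrete under our own conventions (function names are ours, and FIFO is just one admissible queuing policy): each packet follows its bit-fixing route, every directed edge holds a queue, and in each synchronized round every non-empty queue forwards exactly one packet.

```python
from collections import defaultdict, deque
import random

def bit_fixing_route(u, v, d):
    """Bit-fixing route from u to v on the d-dimensional hypercube, as a list of nodes."""
    path, cur = [u], u
    for i in range(d):
        if (cur ^ v) & (1 << i):
            cur ^= 1 << i
            path.append(cur)
    return path

def route_all(destinations, d):
    """Synchronized routing with a FIFO queue per directed edge; one packet per edge per
    round.  Returns the number of rounds until every packet has reached its destination."""
    paths = {p: bit_fixing_route(p, dest, d) for p, dest in enumerate(destinations)}
    hop = {p: 0 for p in paths}                      # how many edges packet p has crossed
    queues = defaultdict(deque)                      # FIFO queue on each directed edge
    for p, path in paths.items():
        if len(path) > 1:
            queues[(path[0], path[1])].append(p)
    rounds = 0
    while queues:
        rounds += 1
        crossed = [q.popleft() for q in list(queues.values())]        # one packet per edge
        queues = defaultdict(deque, {e: q for e, q in queues.items() if q})
        for p in crossed:
            hop[p] += 1
            path = paths[p]
            if hop[p] + 1 < len(path):               # not at its destination yet: join next queue
                queues[(path[hop[p]], path[hop[p] + 1])].append(p)
    return rounds

# Example: N = 2^10 processors, each with an independent uniformly random destination.
d = 10
dests = [random.randrange(1 << d) for _ in range(1 << d)]
print(route_all(dests, d))   # typically O(d) rounds for random destinations (see the analysis below)
```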
For the bit-fixing routing algorithm defined above, regardless of the queuing policy, there always exists a bad permutation $\pi$ on which the routing takes $\Omega\left(\sqrt{N/d}\right)$ rounds to deliver all packets.

This is pretty bad, because we expect that the routing time is comparable to the diameter of the network, which is only $d=\log N$ for the hypercube.
The lower bound in fact applies to any deterministic oblivious routing algorithm:
Theorem [Kaklamanis, Krizanc, Tsantilas, 1991]
In any network with $N$ nodes and maximum degree $d$, any deterministic oblivious algorithm for routing an arbitrary permutation requires $\Omega\left(\sqrt{N/d}\right)$ parallel communication steps in the worst case.
The proof of the lower bound is rather technical and complicated. However, the intuition is quite clear: for any oblivious routing rule, there always exists a permutation that causes very high congestion, so that many packets have to be delivered through the same edge; thus no matter what queuing policy is used, the maximum delay must be very high.
Average-case Analysis for Independent Destinations
We analyze the average-case performance of the bit-fixing routing algorithm. We relax the problem to non-permutation destinations. That is, instead of requiring that every processor has a distinct destination, we now allow each processor to choose an arbitrary destination in $[N]$.

For the average case, for each node $v\in[N]$, its destination is a uniformly and independently chosen node from $[N]$.

For each node $v\in[N]$, let $P_v$ denote the route along which the packet from $v$ is delivered by the bit-fixing algorithm.
Reduce the delay of a packet to the number of packets that pass through its route

We consider the delay incurred by each packet, which is the total time that the packet spends waiting in queues. The time it takes a packet to reach its destination is its delay plus the length of its route, and the length of every route is at most $d$; hence the total running time of the algorithm is bounded by the maximum delay plus $d$.
We assume that the queueing policy satisfies a very natural requirement:
- Assumption of queuing policy
- If a queue is not empty at the beginning of a time step, some packet is sent along the edge associated with that queue during that time step.
Lemma 2.1:
Under any queuing policy satisfying the above assumption, the delay incurred by the packet from $v$ is at most the number of other packets whose routes share at least one edge with $P_v$.
Proof: See Lemma 4.5 in the textbook [MR].
Represent the delay as the sum of independent trials
Let the random variable $H_{uv}$ indicate whether $P_u$ and $P_v$ share at least one edge; that is, $H_{uv}=1$ if $P_u$ and $P_v$ share an edge and $H_{uv}=0$ otherwise.

Fix a node $v\in[N]$ and its route $P_v$. By Lemma 2.1, the delay incurred by the packet from $v$ is at most $\sum_{u\neq v}H_{uv}$. Conditioning on $P_v$, the variables $H_{uv}$ for $u\neq v$ are mutually independent, since they are determined by the destinations of distinct nodes $u$, which are chosen independently.

We will then bound the sum $\sum_{u\neq v}H_{uv}$ by the Chernoff bound, for which we first need to estimate its expectation.
Estimate the expectation of the sum
For any edge $e$ in the hypercube, let the random variable $T(e)$ denote the number of routes that pass through $e$. Since every route that shares an edge with $P_v$ must pass through some edge of $P_v$,
$\sum_{u\neq v}H_{uv}\le\sum_{e\in P_v}T(e)$,
where we abuse the notation $e\in P_v$ to mean that the edge $e$ appears in the route $P_v$.

Therefore,
$\mathbf{E}\left[\sum_{u\neq v}H_{uv}\right]\le\sum_{e\in P_v}\mathbf{E}[T(e)]$.

For every node $u\in[N]$, the length $|P_u|$ of the route $P_u$ equals the number of bits in which $u$ and its destination differ. Since the destination is uniformly random, $\mathbf{E}[|P_u|]=\frac{d}{2}$, and hence $\sum_{u}\mathbf{E}[|P_u|]=\frac{Nd}{2}$.

It is obvious that we can count the sum of lengths of a set of routes by accumulating their passes through edges, that is
$\sum_{u}|P_u|=\sum_{e}T(e)$.

Therefore,
$\sum_{e}\mathbf{E}[T(e)]=\sum_{u}\mathbf{E}[|P_u|]=\frac{Nd}{2}$,
where the sum $\sum_e$ is taken over all directed edges $e$ of the hypercube.

An important observation is that, by the symmetry of the hypercube and of the bit-fixing algorithm, $\mathbf{E}[T(e)]$ is the same for every directed edge $e$. Since there are $Nd$ directed edges in total, $\mathbf{E}[T(e)]=\frac{Nd/2}{Nd}=\frac{1}{2}$ for every edge $e$.

The length of $P_v$ is at most $d$, therefore
$\mathbf{E}\left[\sum_{u\neq v}H_{uv}\right]\le\sum_{e\in P_v}\mathbf{E}[T(e)]\le\frac{d}{2}$.
Apply the Chernoff bound
We apply the following form of the Chernoff bound:
Chernoff bound
Let $X=\sum_{i=1}^nX_i$, where $X_1,X_2,\ldots,X_n$ are independent Poisson trials, and let $\mu=\mathbf{E}[X]$. Then for any $t\ge 2e\mu$,
$\Pr[X\ge t]\le 2^{-t}$.

It holds that $3d>2e\cdot\frac{d}{2}=ed$. Applying the Chernoff bound with $t=3d$ to the sum $\sum_{u\neq v}H_{uv}$, whose expectation is at most $\frac{d}{2}$, gives
$\Pr\left[\sum_{u\neq v}H_{uv}\ge 3d\right]\le 2^{-3d}$.

Note that $\sum_{u\neq v}H_{uv}$ is an upper bound on the delay of the packet from $v$. Applying the union bound over all $N=2^d$ nodes $v$, the probability that any packet incurs a delay greater than $3d$ is at most $N\cdot 2^{-3d}=2^{-2d}$.

The running time is the maximum delay plus the length of a route, thus with probability at least $1-2^{-2d}$ it is at most $3d+d=4d=O(\log N)$.
A two-phase randomized routing algorithm
The above analysis of the performance of bit-fixing for independent random destinations suggests that we can first route the packets to random "relays" to avoid the high congestion. This idea was first discovered by Leslie Valiant, who used it to give a simple and elegant randomized routing algorithm for permutation routing.
The algorithm works in two phases.
Two-Phase Routing Algorithm:
For each packet:
Phase I: Route the packet to a uniformly and independently chosen random node using the bit-fixing algorithm.
Phase II: Route the packet from that random location to its final destination using the bit-fixing algorithm.
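A minimal Python sketch of the two phases (reusing the bit-fixing route from the earlier sketch; the names are ours): each packet independently draws a uniformly random intermediate node, and its full trajectory is the bit-fixing route to that node followed by the bit-fixing route from there to its true destination.

```python
import random

def bit_fixing_route(u, v, d):
    """Bit-fixing route from u to v on the d-dimensional hypercube, as a list of nodes."""
    path, cur = [u], u
    for i in range(d):
        if (cur ^ v) & (1 << i):
            cur ^= 1 << i
            path.append(cur)
    return path

def two_phase_routes(permutation, d):
    """For packet i with final destination permutation[i], pick an independent uniformly
    random relay node and return the pair (Phase I route, Phase II route)."""
    N = 1 << d
    routes = []
    for origin, dest in enumerate(permutation):
        relay = random.randrange(N)                         # uniformly random intermediate node
        routes.append((bit_fixing_route(origin, relay, d),  # Phase I: origin -> relay
                       bit_fixing_route(relay, dest, d)))   # Phase II: relay -> destination
    return routes

# Example: a hard-looking permutation (bit reversal) on the 8-dimensional hypercube.
d = 8
perm = [int(format(i, f"0{d}b")[::-1], 2) for i in range(1 << d)]
print(two_phase_routes(perm, d)[0])
```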
It looks counter-intuitive that first routing the packets to irrelevant intermediate nodes actually improves the overall performance.
To simplify the analysis, we assume that no packet is sent in Phase II before all packets have finished Phase I.
Phase I is exactly the bit-fixing routing for uniformly and independently random destinations, which, as we analyzed in the last section, finishes within $4d$ rounds with probability at least $1-2^{-2d}$.

The Phase II is a "backward" running of Phase I. All the analysis of Phase I can be directly applied to Phase II. Thus, the running time of Phase II is also at most $4d$ rounds with probability at least $1-2^{-2d}$. By the union bound, the two-phase algorithm routes any permutation within $8d=O(\log N)$ rounds with probability at least $1-2^{1-2d}$.
Low-Distortion Embeddings
Consider a problem as follows: We have a set of $n$ points in a high-dimensional Euclidean space $\mathbb{R}^d$. We want to project the points onto a space of low dimension $\mathbb{R}^k$ such that the pairwise distances between the points are approximately preserved.

Formally, we are looking for a map $f:\mathbb{R}^d\rightarrow\mathbb{R}^k$ such that for every pair of points $u,v$, the distance $\|f(u)-f(v)\|$ is close to $\|u-v\|$, where $\|\cdot\|$ denotes the Euclidean norm.

This problem has various important applications in both theory and practice. In many tasks, the data points are drawn from a high-dimensional space. However, computations on high-dimensional data are usually hard due to the infamous "curse of dimensionality". The computational tasks can be greatly eased if we can project the data points onto a space of low dimension while the pairwise relations between the points are approximately preserved.
Johnson-Lindenstrauss Theorem
The Johnson-Lindenstrauss Theorem states that it is possible to project $n$ points in a space of arbitrarily high dimension onto an $O(\log n)$-dimensional space such that all pairwise distances between the points are preserved up to a factor of $(1\pm\epsilon)$.
Johnson-Lindenstrauss Theorem
For any $0<\epsilon<1$ and any positive integer $n$, let $k$ be a positive integer such that
$k\ge4(\epsilon^2/2-\epsilon^3/3)^{-1}\ln n$.
Then for any set $V$ of $n$ points in $\mathbb{R}^d$, there is a map $f:\mathbb{R}^d\rightarrow\mathbb{R}^k$ such that for all $u,v\in V$,
$(1-\epsilon)\|u-v\|^2\le\|f(u)-f(v)\|^2\le(1+\epsilon)\|u-v\|^2$.
Furthermore, this map can be found in expected polynomial time.
The random projections
The map $f$ is given by applying a random linear projection to every point, i.e. $f(x)=Ax$ for a random matrix $A$. The choice of $A$ in the original proof of Johnson and Lindenstrauss is the projection onto a uniformly random $k$-dimensional subspace:

The projection (due to Johnson-Lindenstrauss):
Let $A$ be the orthogonal projection of $\mathbb{R}^d$ onto a uniformly random $k$-dimensional subspace, and let the map be $f(x)=\sqrt{\frac{d}{k}}\cdot Ax$.

The projected point $\sqrt{\frac{d}{k}}\cdot Ax$ is regarded as a vector in $\mathbb{R}^k$, by identifying the random $k$-dimensional subspace with $\mathbb{R}^k$.

The purpose of multiplying the scalar $\sqrt{\frac{d}{k}}$ is normalization: as computed below, projecting a unit vector onto a uniformly random $k$-dimensional subspace yields expected squared length $\frac{k}{d}$, and the scalar rescales this expectation back to 1, so that $\mathbf{E}\left[\|f(x)\|^2\right]=\|x\|^2$ for every fixed $x$.
Besides the uniform random subspace, there are other choices of random projections known to perform well, including:

- A matrix whose entries are i.i.d. random variables following the normal distribution $N(0,1)$. (Due to Indyk-Motwani)
- A matrix whose entries are i.i.d. random variables, each being $-1$ or $+1$ with equal probability. (Due to Achlioptas)
In both cases, the matrix is also multiplied by a fixed scalar for normalization.
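For illustration, here is a minimal NumPy sketch of the latter two constructions (the function names and the $1/\sqrt{k}$ normalizing scalar are ours, chosen so that $\mathbf{E}[\|Ax\|^2]=\|x\|^2$ for every fixed $x$; this is a sketch of the idea, not the exact construction analyzed below):

```python
import numpy as np

def gaussian_projection(d, k, rng):
    """k x d matrix with i.i.d. N(0,1) entries, scaled by 1/sqrt(k) (Indyk-Motwani style)."""
    return rng.standard_normal((k, d)) / np.sqrt(k)

def sign_projection(d, k, rng):
    """k x d matrix with i.i.d. uniform +/-1 entries, scaled by 1/sqrt(k) (Achlioptas style)."""
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

# Sanity check: pairwise distances of a few random points are roughly preserved.
rng = np.random.default_rng(0)
d, k, n = 1000, 200, 10
points = rng.standard_normal((n, d))
A = gaussian_projection(d, k, rng)
projected = points @ A.T
print(np.linalg.norm(points[0] - points[1]), np.linalg.norm(projected[0] - projected[1]))
```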
A proof of the Theorem
We present a proof due to Dasgupta and Gupta, which is much simpler than the original proof of Johnson and Lindenstrauss. The proof is for the projection onto a uniform random subspace. The idea of the proof is outlined as follows:

- To bound the distortions to pairwise distances, it is sufficient to bound the distortions to the lengths of unit vectors.
- Projecting a fixed unit vector onto a uniformly random subspace gives the same distribution of lengths as projecting a uniformly random unit vector onto a fixed subspace. We can fix the subspace to be the one spanned by the first $k$ coordinates, thus it is sufficient to bound the length (norm) of the first $k$ coordinates of a uniformly random unit vector.
- Prove that for a uniformly random unit vector, the squared length of its first $k$ coordinates is concentrated around its expectation.
From pairwise distances to unit vectors
Let $u,v\in V$ be any pair of points and consider the difference vector $u-v\in\mathbb{R}^d$.

Think of the map as the linear map $f(x)=\sqrt{\frac{d}{k}}\cdot Ax$ defined above, so that $f(u)-f(v)=\sqrt{\frac{d}{k}}\cdot A(u-v)$,

so the requirement
$(1-\epsilon)\|u-v\|^2\le\|f(u)-f(v)\|^2\le(1+\epsilon)\|u-v\|^2$
is a statement about the distortion to the norm of the single vector $u-v$.

We can further simplify the problem by normalizing the vectors: by linearity, dividing $u-v$ by its norm shows that the above event
is equivalent to that
$(1-\epsilon)\le\|f(w)\|^2\le(1+\epsilon)$
for the unit vector $w=\frac{u-v}{\|u-v\|}$.

Thus, we only need to bound the distortions for the unit vectors, i.e. the vectors $w\in\mathbb{R}^d$ with $\|w\|=1$, and then apply the union bound over the $\binom{n}{2}$ pairs of points in $V$.
Lemma 3.1
For any fixed unit vector $w\in\mathbb{R}^d$, let $f(w)=\sqrt{\frac{d}{k}}\cdot Aw$ be its image under the scaled random projection defined above, where $k\ge4(\epsilon^2/2-\epsilon^3/3)^{-1}\ln n$. Then
$\Pr\left[\|f(w)\|^2<1-\epsilon\ \mbox{ or }\ \|f(w)\|^2>1+\epsilon\right]\le\frac{2}{n^2}$.

As we argued above, this lemma implies the Johnson-Lindenstrauss Theorem: by the union bound over the $\binom{n}{2}<\frac{n^2}{2}$ pairs of points in $V$, the probability that some pairwise distance is distorted by more than a $(1\pm\epsilon)$ factor is at most $\binom{n}{2}\cdot\frac{2}{n^2}\le1-\frac{1}{n}$. Hence a single random projection succeeds with probability at least $\frac{1}{n}$, and repeating the projection an expected $n$ times (polynomially many) yields the map promised by the theorem.
Random projection of a fixed unit vector ≡ fixed projection of a random unit vector

Let $w\in\mathbb{R}^d$ be a fixed unit vector, and let $A$ be the projection onto a uniformly random $k$-dimensional subspace, as in the definition of $f$; so $\|Aw\|$ is the length of the random projection of the fixed unit vector $w$.

Let $Y\in\mathbb{R}^d$ be a uniformly random unit vector, and let $Z=(Y_1,Y_2,\ldots,Y_k)$ consist of its first $k$ coordinates; so $\|Z\|$ is the length of a fixed projection (onto the subspace spanned by the first $k$ coordinates) of a random unit vector.

In other words, the first quantity projects a fixed vector onto a random subspace, while the second projects a random vector onto a fixed subspace.

A key observation is that:

Observation
$\|Aw\|$ and $\|Z\|$ have exactly the same distribution.

The proof of this observation is omitted here; intuitively, the two settings differ only by a uniform random rotation of the whole space, which changes neither distribution.

With this observation, it is sufficient to work with the fixed subspace spanned by the first $k$ coordinates and a uniformly random unit vector $Y$, i.e. to study the distribution of $\|Z\|^2$.
Lemma 3.2
Let $Y\in\mathbb{R}^d$ be a uniformly random unit vector and let $Z=(Y_1,Y_2,\ldots,Y_k)$ consist of its first $k$ coordinates, where $k\ge4(\epsilon^2/2-\epsilon^3/3)^{-1}\ln n$. Then
$\Pr\left[\|Z\|^2<(1-\epsilon)\frac{k}{d}\ \mbox{ or }\ \|Z\|^2>(1+\epsilon)\frac{k}{d}\right]\le\frac{2}{n^2}$.

Due to the above observation, Lemma 3.2 implies Lemma 3.1 and thus proves the Johnson-Lindenstrauss theorem: $\|f(w)\|^2=\frac{d}{k}\|Aw\|^2$ has the same distribution as $\frac{d}{k}\|Z\|^2$, and $\frac{d}{k}\|Z\|^2\in[1-\epsilon,1+\epsilon]$ exactly when $\|Z\|^2\in\left[(1-\epsilon)\frac{k}{d},(1+\epsilon)\frac{k}{d}\right]$.

Note that
$\|Z\|^2=\sum_{i=1}^kY_i^2$.
Since $Y$ is a uniform random unit vector, it holds that $\sum_{i=1}^dY_i^2=1$, and by symmetry $\mathbf{E}[Y_i^2]=\frac{1}{d}$ for every $i$, so
$\mathbf{E}\left[\|Z\|^2\right]=\frac{k}{d}$.

Lemma 3.2 actually states that $\|Z\|^2$ is well-concentrated around its expectation $\frac{k}{d}$.
Concentration of the norm of the first $k$ entries of a uniform random unit vector

We now prove Lemma 3.2. Specifically, we will prove the lower-tail bound
$\Pr\left[\|Z\|^2<(1-\epsilon)\frac{k}{d}\right]\le\frac{1}{n^2}$.

The upper-tail bound $\Pr\left[\|Z\|^2>(1+\epsilon)\frac{k}{d}\right]\le\frac{1}{n^2}$ follows from the same argument, as shown at the end of this section; together they give Lemma 3.2.

Due to the discussion in the last section, this can be interpreted as a concentration bound for the squared length of the projection of a fixed unit vector onto a uniformly random $k$-dimensional subspace.
The following is a very useful fact regarding the generation of uniform unit vectors.
Generating uniform unit vector:
Let $X_1,X_2,\ldots,X_d$ be i.i.d. random variables, each drawn from the standard normal distribution $N(0,1)$, and let $X=(X_1,X_2,\ldots,X_d)$. Then
$Y=\frac{X}{\|X\|}$
is a uniformly random unit vector in $\mathbb{R}^d$.

Then for $Z=(Y_1,Y_2,\ldots,Y_k)$, the first $k$ coordinates of $Y$,
$\|Z\|^2=\frac{X_1^2+X_2^2+\cdots+X_k^2}{X_1^2+X_2^2+\cdots+X_d^2}$.
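This fact translates directly into a short NumPy sketch (the names are ours): sample a uniform unit vector by normalizing i.i.d. standard normals, and empirically check that the squared length of its first $k$ coordinates concentrates around $\frac{k}{d}$, as Lemma 3.2 asserts.

```python
import numpy as np

def random_unit_vector(d, rng):
    """Uniformly random unit vector in R^d: normalize a vector of i.i.d. N(0,1) entries."""
    x = rng.standard_normal(d)
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
d, k, trials = 1000, 50, 2000
samples = [np.sum(random_unit_vector(d, rng)[:k] ** 2) for _ in range(trials)]
print(np.mean(samples), k / d)             # empirical mean of ||Z||^2 vs. its expectation k/d
print(np.quantile(samples, [0.01, 0.99]))  # the tails stay close to k/d, as the lemma predicts
```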
To avoid writing a lot of fractions, we rewrite the event $\|Z\|^2<(1-\epsilon)\frac{k}{d}$ as
$d(X_1^2+\cdots+X_k^2)<(1-\epsilon)k(X_1^2+\cdots+X_d^2)$,
i.e.
$(1-\epsilon)k(X_1^2+\cdots+X_d^2)-d(X_1^2+\cdots+X_k^2)>0$.

The probability is a tail probability of the sum of $d$ independent random variables (each a function of a single $X_i^2$), and the situation is reminiscent of the Chernoff bound because:
- The $X_i^2$'s are independent, since the $X_i$'s are.
- Because the $X_i$'s are normal, the moment generating functions of the $X_i^2$'s can be computed in closed form:

Fact 3.3:
- If $X$ follows the standard normal distribution $N(0,1)$, then $\mathbf{E}\left[e^{\lambda X^2}\right]=(1-2\lambda)^{-1/2}$, for $\lambda<\frac{1}{2}$.
- If $X$ follows the normal distribution $N(0,\sigma^2)$, then $\mathbf{E}\left[e^{\lambda X^2}\right]=(1-2\sigma^2\lambda)^{-1/2}$, for $\lambda<\frac{1}{2\sigma^2}$.
Therefore, we can re-apply the technique of the Chernoff bound, i.e. apply Markov's inequality to the moment generating function and then optimize the parameter $\lambda>0$:

$\Pr\left[(1-\epsilon)k\sum_{i=1}^dX_i^2-d\sum_{i=1}^kX_i^2>0\right]$
$=\Pr\left[\exp\left\{\lambda\left((1-\epsilon)k\sum_{i=1}^dX_i^2-d\sum_{i=1}^kX_i^2\right)\right\}>1\right]$
$\le\mathbf{E}\left[\exp\left\{\lambda\left((1-\epsilon)k\sum_{i=1}^dX_i^2-d\sum_{i=1}^kX_i^2\right)\right\}\right]$ (Markov's inequality)
$=\left(\mathbf{E}\left[e^{\lambda(1-\epsilon)kX_1^2}\right]\right)^{d-k}\cdot\left(\mathbf{E}\left[e^{\lambda((1-\epsilon)k-d)X_1^2}\right]\right)^{k}$ (independence)
$=\left(1-2\lambda(1-\epsilon)k\right)^{-\frac{d-k}{2}}\left(1-2\lambda((1-\epsilon)k-d)\right)^{-\frac{k}{2}}$ (Fact 3.3),

valid for any $\lambda>0$ small enough that $2\lambda(1-\epsilon)k<1$.

The last term is minimized when
$\lambda=\frac{\epsilon}{2(1-\epsilon)(d-(1-\epsilon)k)}$,
so that the bound becomes
$(1-\epsilon)^{\frac{k}{2}}\left(1+\frac{\epsilon k}{d-k}\right)^{\frac{d-k}{2}}\le(1-\epsilon)^{\frac{k}{2}}e^{\frac{\epsilon k}{2}}=\exp\left(\frac{k}{2}\left(\epsilon+\ln(1-\epsilon)\right)\right)\le\exp\left(-\frac{k\epsilon^2}{4}\right)$,
which is at most $\frac{1}{n^2}$ for $k\ge4(\epsilon^2/2-\epsilon^3/3)^{-1}\ln n\ge\frac{8\ln n}{\epsilon^2}$.

So we have proved that
$\Pr\left[\|Z\|^2<(1-\epsilon)\frac{k}{d}\right]\le\frac{1}{n^2}$.

With the same argument, the other direction can be proved so that
$\Pr\left[\|Z\|^2>(1+\epsilon)\frac{k}{d}\right]\le\exp\left(\frac{k}{2}\left(-\epsilon+\ln(1+\epsilon)\right)\right)\le\exp\left(-\frac{k}{2}\left(\frac{\epsilon^2}{2}-\frac{\epsilon^3}{3}\right)\right)$,
which is also at most $\frac{1}{n^2}$ for the same choice of $k$.
Lemma 3.2 is proved. As we discussed in the previous sections, Lemma 3.2 implies Lemma 3.1, which implies the Johnson-Lindenstrauss theorem.