随机算法 (Fall 2011)/Problem set 3: Difference between revisions
imported>Etone |
imported>Etone |
||
Line 29: | Line 29: | ||
Questions: | Questions: | ||
# Give an integer programming for the problem and its linear programming relaxation. | # Give an integer programming for the problem and its linear programming relaxation. | ||
# Consider the following | # Consider the following randomized rounding algorithm: independently round each fractional value to <math>\{0,1\}</math>; and repeatedly apply this process to the variables rounded to 0 until every subset <math>S_i\in\mathcal{S}</math> is "hited" once. | ||
Use this idea to develop an randomized approximation algorithm and prove that your algorithm returns an <math>O(\log m)</math>-approximate solution with probability at least <math>\frac{1}{2}</math>. | Use this idea to develop an randomized approximation algorithm and prove that your algorithm returns an <math>O(\log m)</math>-approximate solution with probability at least <math>\frac{1}{2}</math>. |
Revision as of 09:17, 6 November 2011
Problem 1
A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{k\times n} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].
The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^k|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k}d(C(x_1),C(x_2)) }[/math].
Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.
- Prove that there exists a bolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
- Prove a similar result for linear boolean codes.
Problem 2
Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string
- [math]\displaystyle{ \underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{5}0\underbrace{1}_{1}0\underbrace{11}_{2} }[/math]
contains 5 runs, of length 3, 2, 6, 1, and 2.
Let [math]\displaystyle{ S }[/math] be a binary string of length [math]\displaystyle{ n }[/math], generated uniformly at random. Let [math]\displaystyle{ X_k }[/math] be the number of runs in [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ k }[/math] or more.
- Compute the exact value of [math]\displaystyle{ \mathbb{E}[X_k] }[/math] as a function of [math]\displaystyle{ n }[/math] and [math]\displaystyle{ k }[/math].
- Give the best concentration bound you can for [math]\displaystyle{ |X_k -\mathbb{E}[X_k]| }[/math].
Problem 3
The set cover problem is defined as follows:
- Let [math]\displaystyle{ U=\{u_1,u_2,\ldots,u_n\} }[/math] be a set of [math]\displaystyle{ n }[/math] elements, and let [math]\displaystyle{ \mathcal{S}=\{S_1,S_2,\ldots,S_m\} }[/math] be a family of subsets of [math]\displaystyle{ U }[/math]. For each [math]\displaystyle{ u_i\in U }[/math], let [math]\displaystyle{ w_i }[/math] be a nonnegative weight of [math]\displaystyle{ u_i }[/math]. The goal is to find a subset [math]\displaystyle{ V\subseteq U }[/math] with the minimum total weight [math]\displaystyle{ \sum_{i\in V}w_i }[/math], that intersects with all [math]\displaystyle{ S_i\in\mathcal{S} }[/math].
This problem is NP-hard.
(Remark: There are two equivalent definitions of the set cover problem. We take the hitting set version.)
Questions:
- Give an integer programming for the problem and its linear programming relaxation.
- Consider the following randomized rounding algorithm: independently round each fractional value to [math]\displaystyle{ \{0,1\} }[/math]; and repeatedly apply this process to the variables rounded to 0 until every subset [math]\displaystyle{ S_i\in\mathcal{S} }[/math] is "hited" once.
Use this idea to develop an randomized approximation algorithm and prove that your algorithm returns an [math]\displaystyle{ O(\log m) }[/math]-approximate solution with probability at least [math]\displaystyle{ \frac{1}{2} }[/math].
Problem 4
- The maximum directed cut problem (MAX-DICUT).
We are given as input a directed graph [math]\displaystyle{ G=(V,E) }[/math], with each directed edge [math]\displaystyle{ (u,v)\in E }[/math] having a nonnegative weight [math]\displaystyle{ w_{uv}\ge 0 }[/math]. The goal is to partition [math]\displaystyle{ V }[/math] into two sets [math]\displaystyle{ S\, }[/math] and [math]\displaystyle{ \bar{S}=V\setminus S }[/math] so as to maximize the value of [math]\displaystyle{ \sum_{(u,v)\in E\atop u\in S,v\not\in S}w_{uv} }[/math], that is, the total weight of the edges going from [math]\displaystyle{ S\, }[/math] to [math]\displaystyle{ \bar{S} }[/math].
- Give a randomized [math]\displaystyle{ \frac{1}{4} }[/math]-approximation algorithm based on random sampling.
- Prove that the following is an integer programming for the problem:
- [math]\displaystyle{ \begin{align} \text{maximize} && \sum_{(i,j)\in E}w_{ij}z_{ij}\\ \text{subject to} && z_{ij} &\le x_i, & \forall (i,j)&\in E,\\ && z_{ij} &\le 1-x_j, & \forall (i,j)&\in E,\\ && x_i &\in\{0,1\}, & \forall i&\in V,\\ && 0 \le z_{ij}&\le 1, & \forall (i,j)&\in E. \end{align} }[/math]
- Consider a randomized rounding algorithm that solves an LP relaxation of the above integer programming and puts vertex [math]\displaystyle{ i }[/math] in [math]\displaystyle{ S }[/math] with probability [math]\displaystyle{ f(x_i^*) }[/math]. We may assume that [math]\displaystyle{ f(x) }[/math] is a linear function in the form [math]\displaystyle{ f(x)=ax+b }[/math] with some constant [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] to be fixed. Try to find good [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] so that the randomized rounding algorithm has a good approximation ratio.