随机算法 (Spring 2013)/Problem Set 3

From TCS Wiki
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Problem 1

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

[math]\displaystyle{ \underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2} }[/math]

contains 5 runs, of length 3, 2, 6, 1, and 2.

Let [math]\displaystyle{ S }[/math] be a binary string of length [math]\displaystyle{ n }[/math], generated uniformly at random. Let [math]\displaystyle{ X_k }[/math] be the number of runs in [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ k }[/math] or more.

  • Compute the exact value of [math]\displaystyle{ \mathbb{E}[X_k] }[/math] as a function of [math]\displaystyle{ n }[/math] and [math]\displaystyle{ k }[/math].
  • Give the best concentration bound you can for [math]\displaystyle{ |X_k -\mathbb{E}[X_k]| }[/math].

Problem 2

A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{k\times n} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].

The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].

Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

  • Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
  • Prove a similar result for linear boolean codes.

Problem 3

The maximum directed cut problem (MAX-DICUT).

We are given as input a directed graph [math]\displaystyle{ G=(V,E) }[/math], with each directed edge [math]\displaystyle{ (u,v)\in E }[/math] having a nonnegative weight [math]\displaystyle{ w_{uv}\ge 0 }[/math]. The goal is to partition [math]\displaystyle{ V }[/math] into two sets [math]\displaystyle{ S\, }[/math] and [math]\displaystyle{ \bar{S}=V\setminus S }[/math] so as to maximize the value of [math]\displaystyle{ \sum_{(u,v)\in E\atop u\in S,v\not\in S}w_{uv} }[/math], that is, the total weight of the edges going from [math]\displaystyle{ S\, }[/math] to [math]\displaystyle{ \bar{S} }[/math].

  • Give a randomized [math]\displaystyle{ \frac{1}{4} }[/math]-approximation algorithm based on random sampling.
  • Prove that the following is an integer programming for the problem:
[math]\displaystyle{ \begin{align} \text{maximize} && \sum_{(i,j)\in E}w_{ij}z_{ij}\\ \text{subject to} && z_{ij} &\le x_i, & \forall (i,j)&\in E,\\ && z_{ij} &\le 1-x_j, & \forall (i,j)&\in E,\\ && x_i &\in\{0,1\}, & \forall i&\in V,\\ && 0 \le z_{ij}&\le 1, & \forall (i,j)&\in E. \end{align} }[/math]
  • Consider a randomized rounding algorithm that solves an LP relaxation of the above integer programming and puts vertex [math]\displaystyle{ i }[/math] in [math]\displaystyle{ S }[/math] with probability [math]\displaystyle{ f(x_i^*) }[/math]. We may assume that [math]\displaystyle{ f(x) }[/math] is a linear function in the form [math]\displaystyle{ f(x)=ax+b }[/math] with some constant [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] to be fixed. Try to find good [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] so that the randomized rounding algorithm has a good approximation ratio.

Problem 4

The set cover problem is defined as follows:

  • Let [math]\displaystyle{ U=\{u_1,u_2,\ldots,u_n\} }[/math] be a set of [math]\displaystyle{ n }[/math] elements, and let [math]\displaystyle{ \mathcal{S}=\{S_1,S_2,\ldots,S_m\} }[/math] be a family of subsets of [math]\displaystyle{ U }[/math]. For each [math]\displaystyle{ u_i\in U }[/math], let [math]\displaystyle{ w_i }[/math] be a nonnegative weight of [math]\displaystyle{ u_i }[/math]. The goal is to find a subset [math]\displaystyle{ V\subseteq U }[/math] with the minimum total weight [math]\displaystyle{ \sum_{i\in V}w_i }[/math], that intersects with all [math]\displaystyle{ S_i\in\mathcal{S} }[/math].

This problem is NP-hard.

(Remark: There are two equivalent definitions of the set cover problem. We take the hitting set version.)

Questions:

  • Prove that the following is an integer programming for the problem:
[math]\displaystyle{ \begin{align} \text{minimize} && \sum_{(i,j)\in E}w_{i}x_{i}\\ \text{subject to} && \sum_{i:u_i\in S_j}x_i &\ge 1, &1\le j\le m,\\ && x_i &\in\{0,1\}, & 1\le i\le n. \end{align} }[/math]
  • Give a randomized rounding algorithm which returns an [math]\displaystyle{ O(\log m) }[/math]-approximate solution with probability at least [math]\displaystyle{ \frac{1}{2} }[/math]. (Hint: you may repeat the randomized rounding process if there remains some uncovered subsets after one time of applying the randomized rounding.)