# 随机算法 (Spring 2013)/Problem Set 3

## Problem 1

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

${\displaystyle \underbrace {111} _{3}00\underbrace {11} _{2}00\underbrace {111111} _{6}0\underbrace {1} _{1}0\underbrace {11} _{2}}$

contains 5 runs, of length 3, 2, 6, 1, and 2.

Let ${\displaystyle S}$ be a binary string of length ${\displaystyle n}$, generated uniformly at random. Let ${\displaystyle X_{k}}$ be the number of runs in ${\displaystyle S}$ of length ${\displaystyle k}$ or more.

• Compute the exact value of ${\displaystyle \mathbb {E} [X_{k}]}$ as a function of ${\displaystyle n}$ and ${\displaystyle k}$.
• Give the best concentration bound you can for ${\displaystyle |X_{k}-\mathbb {E} [X_{k}]|}$.

## Problem 2

A boolean code is a mapping ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$. Each ${\displaystyle x\in \{0,1\}^{k}}$ is called a message and ${\displaystyle y=C(x)}$ is called a codeword. The code rate ${\displaystyle r}$ of a code ${\displaystyle C}$ is ${\displaystyle r={\frac {k}{n}}}$. A boolean code ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$ is a linear code if it is a linear transformation, i.e. there is a matrix ${\displaystyle A\in \{0,1\}^{k\times n}}$ such that ${\displaystyle C(x)=Ax}$ for any ${\displaystyle x\in \{0,1\}^{k}}$, where the additions and multiplications are defined over the finite field of order two, ${\displaystyle (\{0,1\},+_{\bmod {2}},\times _{\bmod {2}})}$.

The distance between two codeword ${\displaystyle y_{1}}$ and ${\displaystyle y_{2}}$, denoted by ${\displaystyle d(y_{1},y_{2})}$, is defined as the Hamming distance between them. Formally, ${\displaystyle d(y_{1},y_{2})=\|y_{1}-y_{2}\|_{1}=\sum _{i=1}^{n}|y_{1}(i)-y_{2}(i)|}$. The distance of a code ${\displaystyle C}$ is the minimum distance between any two codewords. Formally, ${\displaystyle d=\min _{x_{1},x_{2}\in \{0,1\}^{k} \atop x_{1}\neq x_{2}}d(C(x_{1}),C(x_{2}))}$.

Usually we want to make both the code rate ${\displaystyle r}$ and the code distance ${\displaystyle d}$ as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

• Use the probabilistic method to prove that there exists a boolean code ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$ of code rate ${\displaystyle r}$ and distance ${\displaystyle \left({\frac {1}{2}}-\Theta \left({\sqrt {r}}\right)\right)n}$. Try to optimize the constant in ${\displaystyle \Theta (\cdot )}$.
• Prove a similar result for linear boolean codes.

## Problem 3

The maximum directed cut problem (MAX-DICUT).

We are given as input a directed graph ${\displaystyle G=(V,E)}$, with each directed edge ${\displaystyle (u,v)\in E}$ having a nonnegative weight ${\displaystyle w_{uv}\geq 0}$. The goal is to partition ${\displaystyle V}$ into two sets ${\displaystyle S\,}$ and ${\displaystyle {\bar {S}}=V\setminus S}$ so as to maximize the value of ${\displaystyle \sum _{(u,v)\in E \atop u\in S,v\not \in S}w_{uv}}$, that is, the total weight of the edges going from ${\displaystyle S\,}$ to ${\displaystyle {\bar {S}}}$.

• Give a randomized ${\displaystyle {\frac {1}{4}}}$-approximation algorithm based on random sampling.
• Prove that the following is an integer programming for the problem:
{\displaystyle {\begin{aligned}{\text{maximize}}&&\sum _{(i,j)\in E}w_{ij}z_{ij}\\{\text{subject to}}&&z_{ij}&\leq x_{i},&\forall (i,j)&\in E,\\&&z_{ij}&\leq 1-x_{j},&\forall (i,j)&\in E,\\&&x_{i}&\in \{0,1\},&\forall i&\in V,\\&&0\leq z_{ij}&\leq 1,&\forall (i,j)&\in E.\end{aligned}}}
• Consider a randomized rounding algorithm that solves an LP relaxation of the above integer programming and puts vertex ${\displaystyle i}$ in ${\displaystyle S}$ with probability ${\displaystyle f(x_{i}^{*})}$. We may assume that ${\displaystyle f(x)}$ is a linear function in the form ${\displaystyle f(x)=ax+b}$ with some constant ${\displaystyle a}$ and ${\displaystyle b}$ to be fixed. Try to find good ${\displaystyle a}$ and ${\displaystyle b}$ so that the randomized rounding algorithm has a good approximation ratio.

## Problem 4

The set cover problem is defined as follows:

• Let ${\displaystyle U=\{u_{1},u_{2},\ldots ,u_{n}\}}$ be a set of ${\displaystyle n}$ elements, and let ${\displaystyle {\mathcal {S}}=\{S_{1},S_{2},\ldots ,S_{m}\}}$ be a family of subsets of ${\displaystyle U}$. For each ${\displaystyle u_{i}\in U}$, let ${\displaystyle w_{i}}$ be a nonnegative weight of ${\displaystyle u_{i}}$. The goal is to find a subset ${\displaystyle V\subseteq U}$ with the minimum total weight ${\displaystyle \sum _{i\in V}w_{i}}$, that intersects with all ${\displaystyle S_{i}\in {\mathcal {S}}}$.

This problem is NP-hard.

(Remark: There are two equivalent definitions of the set cover problem. We take the hitting set version.)

Questions:

• Prove that the following is an integer programming for the problem:
{\displaystyle {\begin{aligned}{\text{minimize}}&&\sum _{(i,j)\in E}w_{i}x_{i}\\{\text{subject to}}&&\sum _{i:u_{i}\in S_{j}}x_{i}&\geq 1,&1\leq j\leq m,\\&&x_{i}&\in \{0,1\},&1\leq i\leq n.\end{aligned}}}
• Give a randomized rounding algorithm which returns an ${\displaystyle O(\log m)}$-approximate solution with probability at least ${\displaystyle {\frac {1}{2}}}$. (Hint: you may repeat the randomized rounding process if there remains some uncovered subsets after one time of applying the randomized rounding.)