# 随机算法 (Spring 2014)/Problem Set 2

## Problem 1

(Due to D.R. Karger and R. Motwani.)

1. Let ${\displaystyle S,T}$ be two disjoint subsets of a universe ${\displaystyle U}$ such that ${\displaystyle |S|=|T|=n}$. Suppose we select a random set ${\displaystyle R\subseteq U}$ by independently sampling each element of ${\displaystyle U}$ with probability ${\displaystyle p}$. We say that the random sample ${\displaystyle R}$ is good if the following two conditions hold: ${\displaystyle R\cap S=\emptyset }$ and ${\displaystyle R\cap T\neq \emptyset }$. Show that for ${\displaystyle p=1/n}$, the probability that ${\displaystyle R}$ is good is larger than some positive constant.
2. Suppose now that the random set ${\displaystyle R}$ is chosen by sampling the elements of ${\displaystyle U}$ with only pairwise independence. Show that for a suitable choice of the value of ${\displaystyle p}$, the probability that ${\displaystyle R}$ is good is larger than some positive constant.
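The probability in part 1 can be checked empirically before proving it. The sketch below (an illustrative Monte Carlo estimate, not part of any required proof; the function name `prob_good` is ours) exploits the fact that only the elements of ${\displaystyle S}$ and ${\displaystyle T}$ affect the two conditions:

```python
import random

def prob_good(n, p, trials=20000, seed=1):
    """Monte Carlo estimate of Pr[R∩S=∅ and R∩T≠∅] when each element
    of the universe U is sampled independently with probability p.
    Elements outside S∪T never affect either condition, so only the
    2n coin flips for S and T need to be simulated."""
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        # R∩S=∅ iff none of the n elements of S is sampled
        miss_S = all(rng.random() >= p for _ in range(n))
        # R∩T≠∅ iff at least one of the n elements of T is sampled
        hit_T = any(rng.random() < p for _ in range(n))
        good += miss_S and hit_T
    return good / trials
```

Since ${\displaystyle S}$ and ${\displaystyle T}$ are disjoint, the two events are independent and the exact probability is ${\displaystyle (1-p)^{n}\left(1-(1-p)^{n}\right)}$, which at ${\displaystyle p=1/n}$ tends to ${\displaystyle e^{-1}(1-e^{-1})\approx 0.23}$ as ${\displaystyle n\to \infty }$; the simulation should agree with this constant.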

## Problem 2

1. Generalize the LazySelect algorithm to solve the ${\displaystyle k}$-selection problem: given as input an array of ${\displaystyle n}$ distinct numbers and an integer ${\displaystyle k}$, find the ${\displaystyle k}$th smallest number in the array.
2. Use Chernoff bounds instead of Chebyshev's inequality in the analysis of the LazySelect algorithm, and try to use as few random samples as possible.
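One possible shape of the generalized algorithm is sketched below (assumptions ours: a sample of size roughly ${\displaystyle n^{3/4}}$ drawn with replacement, and a slack of about ${\displaystyle {\sqrt {s}}}$ around the rescaled rank; these are choices to be justified in the analysis, not the unique correct parameters):

```python
import math
import random

def lazy_select(arr, k, seed=0):
    """Sketch of LazySelect generalized to k-selection (k is 1-based).
    Samples ~n^{3/4} elements, brackets the k-th smallest between two
    sample order statistics, filters the array, and retries on the
    (low-probability) event that the bracket misses."""
    n = len(arr)
    rng = random.Random(seed)
    s = max(1, int(n ** 0.75))
    while True:
        sample = sorted(rng.choice(arr) for _ in range(s))
        x = k * s / n                  # rank k rescaled to the sample
        slack = math.sqrt(s) + 1       # safety margin around rank x
        lo = sample[max(0, int(x - slack))]
        hi = sample[min(s - 1, int(x + slack))]
        # keep only candidates in [lo, hi]; count elements below lo
        P = [a for a in arr if lo <= a <= hi]
        below = sum(1 for a in arr if a < lo)
        # success: the k-th smallest lies in P and P is small enough
        # to sort cheaply; otherwise resample
        if below < k <= below + len(P) and len(P) <= 8 * s:
            P.sort()
            return P[k - below - 1]
```

The analysis asked for in part 2 amounts to bounding the probability that the bracket `[lo, hi]` misses the answer or that `P` is too large, via Chernoff rather than Chebyshev.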

## Problem 3

A boolean code is a mapping ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$. Each ${\displaystyle x\in \{0,1\}^{k}}$ is called a message and ${\displaystyle y=C(x)}$ is called a codeword. The code rate ${\displaystyle r}$ of a code ${\displaystyle C}$ is ${\displaystyle r={\frac {k}{n}}}$. A boolean code ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$ is a linear code if it is a linear transformation, i.e. there is a matrix ${\displaystyle A\in \{0,1\}^{n\times k}}$ such that ${\displaystyle C(x)=Ax}$ for any ${\displaystyle x\in \{0,1\}^{k}}$, where the additions and multiplications are defined over the finite field of order two, ${\displaystyle (\{0,1\},+_{\bmod {2}},\times _{\bmod {2}})}$.

The distance between two codewords ${\displaystyle y_{1}}$ and ${\displaystyle y_{2}}$, denoted by ${\displaystyle d(y_{1},y_{2})}$, is defined as the Hamming distance between them. Formally, ${\displaystyle d(y_{1},y_{2})=\|y_{1}-y_{2}\|_{1}=\sum _{i=1}^{n}|y_{1}(i)-y_{2}(i)|}$. The distance of a code ${\displaystyle C}$ is the minimum distance between any two codewords. Formally, ${\displaystyle d=\min _{x_{1},x_{2}\in \{0,1\}^{k} \atop x_{1}\neq x_{2}}d(C(x_{1}),C(x_{2}))}$.

Usually we want to make both the code rate ${\displaystyle r}$ and the code distance ${\displaystyle d}$ as large as possible, because a larger rate means more actual message carried per transmitted bit, while a larger distance allows for more error detection and correction.

• Use the probabilistic method to prove that there exists a boolean code ${\displaystyle C:\{0,1\}^{k}\rightarrow \{0,1\}^{n}}$ of code rate ${\displaystyle r}$ and distance ${\displaystyle \left({\frac {1}{2}}-\Theta \left({\sqrt {r}}\right)\right)n}$. Try to optimize the constant in ${\displaystyle \Theta (\cdot )}$.
• Prove a similar result for linear boolean codes.
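The definitions above can be made concrete on a small random linear code. The brute-force sketch below (illustrative only; it enumerates all ${\displaystyle 2^{k}}$ messages, so it is feasible only for tiny ${\displaystyle k}$) uses the standard fact that for a linear code the distance equals the minimum Hamming weight of a nonzero codeword, since ${\displaystyle d(Ax_{1},Ax_{2})=\|A(x_{1}-x_{2})\|_{1}}$:

```python
import itertools
import random

def random_linear_code_distance(k, n, seed=0):
    """Distance of a random linear code C(x) = Ax over GF(2), where
    A is a uniformly random n-by-k 0/1 matrix. Returns the minimum
    Hamming weight over all nonzero messages x."""
    rng = random.Random(seed)
    A = [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]

    def encode(x):
        # matrix-vector product over GF(2)
        return [sum(A[i][j] * x[j] for j in range(k)) % 2
                for i in range(n)]

    return min(sum(encode(x))
               for x in itertools.product([0, 1], repeat=k)
               if any(x))
```

Running this for several seeds gives a feel for how the distance of a random linear code concentrates, which is the phenomenon the probabilistic-method proof quantifies.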

## Problem 4

Let ${\displaystyle X_{1},X_{2},\dots ,X_{n}}$ be independent geometrically distributed random variables each having expectation 2 (each of the ${\displaystyle X_{i}}$ is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let ${\displaystyle X=\sum _{i=1}^{n}X_{i}}$ and ${\displaystyle \delta }$ be a positive real constant. Derive the best upper bound you can on ${\displaystyle \Pr[X>(1+\delta )(2n)]}$.
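One standard reformulation worth keeping in mind: ${\displaystyle X>m}$ exactly when fewer than ${\displaystyle n}$ HEADS appear in the first ${\displaystyle m}$ tosses, i.e. ${\displaystyle \Pr[X>m]=\Pr[\mathrm {Bin} (m,1/2)<n]}$, which is amenable to a Chernoff bound. The sketch below (an illustrative Monte Carlo check of the tail, not the derivation itself) simulates the quantity to be bounded:

```python
import random

def tail_estimate(n, delta, trials=20000, seed=2):
    """Monte Carlo estimate of Pr[X > (1+delta)*2n], where X is the
    total number of fair-coin tosses needed to see n HEADS, i.e. a
    sum of n independent Geometric(1/2) variables of mean 2 each."""
    rng = random.Random(seed)
    threshold = (1 + delta) * 2 * n
    exceed = 0
    for _ in range(trials):
        tosses = heads = 0
        while heads < n:          # toss until the n-th HEADS
            tosses += 1
            heads += rng.random() < 0.5
        exceed += tosses > threshold
    return exceed / trials
```

For fixed ${\displaystyle \delta >0}$ the estimate should decay rapidly in ${\displaystyle n}$, matching the exponential bound the problem asks for.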