概率论 (Summer 2013)/Problem Set 4

From TCS Wiki
Jump to navigation Jump to search

Problem 1

Let [math]\displaystyle{ X_1,X_2,\dots,X_n }[/math] be independent geometrically distributed random variables each having expectation 2 (each of the [math]\displaystyle{ X_i }[/math] is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] and [math]\displaystyle{ \delta }[/math] be a positive real constant. Derive the best upper bound you can on [math]\displaystyle{ \Pr[X\gt (1+\delta)(2n)] }[/math].

Problem 2

A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].

The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].

Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

  • Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
  • Prove a similar result for linear boolean codes.

Problem 3

  1. We have [math]\displaystyle{ n }[/math] servers to run a system. Each machine fails independently with probability [math]\displaystyle{ p\le 5% }[/math]. The whole system fails if over [math]\displaystyle{ 8% }[/math] of the machine fail. Show that the probability is exponentially small as the size of the system grows.
  2. We want to improve the robustness of the system using the same [math]\displaystyle{ n }[/math] machines. One approach is to utilize virtualization: We run [math]\displaystyle{ m\ge n }[/math] virtual machines over [math]\displaystyle{ n }[/math] real machines with the following requirements:
  • each virtual machines includes exactly [math]\displaystyle{ 4 }[/math] real machines; and
  • each real machine participates in at most [math]\displaystyle{ d }[/math] virtual machines.
Suppose that each real machine still fails independently with probability [math]\displaystyle{ 5% }[/math], and a virtual machines fails if all the real machines it includes fail (This is because we have a copy of the virtual machine on every real machine it includes). Now consider a more vulnerable system: it fails if over [math]\displaystyle{ 1/100000 }[/math] of the virtual machines fail. Show that for any constant [math]\displaystyle{ d }[/math], the probability that the system fails is exponentially small as [math]\displaystyle{ n }[/math] grows.

Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

[math]\displaystyle{ \underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2} }[/math]

contains 5 runs, of length 3, 2, 6, 1, and 2.

Let [math]\displaystyle{ S }[/math] be a binary string of length [math]\displaystyle{ n }[/math], generated uniformly at random. Let [math]\displaystyle{ X_k }[/math] be the number of runs in [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ k }[/math] or more.

  • Compute the exact value of [math]\displaystyle{ \mathbb{E}[X_k] }[/math] as a function of [math]\displaystyle{ n }[/math] and [math]\displaystyle{ k }[/math].
  • Give the best concentration bound you can for [math]\displaystyle{ |X_k -\mathbb{E}[X_k]| }[/math].