概率论 (Summer 2013)/Problem Set 4: Difference between revisions
imported>Zhangchihao |
imported>Zhangchihao |
||
(2 intermediate revisions by the same user not shown) | |||
Line 14: | Line 14: | ||
== Problem 3 == | == Problem 3 == | ||
# We have <math>n</math> servers to run a system. Each machine fails <i>independently</i> with probability <math>p\le 5%</math>. The whole system fails if over <math>8%</math> of the machine fail. Show that the probability is exponentially small as the size of the system grows. | |||
# We want to improve the robustness of the system using the same <math>n</math> machines. One approach is to utilize virtualization: We run <math>m\ge n</math> virtual machines over <math>n</math> real machines with the following requirements: | |||
:* each virtual machines includes exactly <math>4</math> real machines; and | |||
:* each real machine participates in at most <math>d</math> virtual machines. | |||
:Suppose that each real machine still fails independently with probability <math>5%</math>, and a virtual machines fails if all the real machines it includes fail (This is because we have a copy of the virtual machine on every real machine it includes). Now consider a more vulnerable system: it fails if over <math>1/100000</math> of the virtual machines fail. Show that for any constant <math>d</math>, the probability that the system fails is exponentially small as <math>n</math> grows. | |||
== Problem 4 == | == Problem 4 == | ||
Given a binary string, define a '''run''' as a <font color=red>maximal</font> sequence of contiguous 1s; for example, the following string | |||
:<math>\underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2}</math> | |||
contains 5 runs, of length 3, 2, 6, 1, and 2. | |||
Let <math>S</math> be a binary string of length <math>n</math>, generated uniformly at random. Let <math>X_k</math> be the number of runs in <math>S</math> of length <math>k</math> or more. | |||
*Compute the exact value of <math>\mathbb{E}[X_k]</math> as a function of <math>n</math> and <math>k</math>. | |||
*Give the best concentration bound you can for <math>|X_k -\mathbb{E}[X_k]|</math>. |
Latest revision as of 11:49, 25 July 2013
Problem 1
Let [math]\displaystyle{ X_1,X_2,\dots,X_n }[/math] be independent geometrically distributed random variables each having expectation 2 (each of the [math]\displaystyle{ X_i }[/math] is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] and [math]\displaystyle{ \delta }[/math] be a positive real constant. Derive the best upper bound you can on [math]\displaystyle{ \Pr[X\gt (1+\delta)(2n)] }[/math].
Problem 2
A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].
The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].
Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.
- Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
- Prove a similar result for linear boolean codes.
Problem 3
- We have [math]\displaystyle{ n }[/math] servers to run a system. Each machine fails independently with probability [math]\displaystyle{ p\le 5% }[/math]. The whole system fails if over [math]\displaystyle{ 8% }[/math] of the machine fail. Show that the probability is exponentially small as the size of the system grows.
- We want to improve the robustness of the system using the same [math]\displaystyle{ n }[/math] machines. One approach is to utilize virtualization: We run [math]\displaystyle{ m\ge n }[/math] virtual machines over [math]\displaystyle{ n }[/math] real machines with the following requirements:
- each virtual machines includes exactly [math]\displaystyle{ 4 }[/math] real machines; and
- each real machine participates in at most [math]\displaystyle{ d }[/math] virtual machines.
- Suppose that each real machine still fails independently with probability [math]\displaystyle{ 5% }[/math], and a virtual machines fails if all the real machines it includes fail (This is because we have a copy of the virtual machine on every real machine it includes). Now consider a more vulnerable system: it fails if over [math]\displaystyle{ 1/100000 }[/math] of the virtual machines fail. Show that for any constant [math]\displaystyle{ d }[/math], the probability that the system fails is exponentially small as [math]\displaystyle{ n }[/math] grows.
Problem 4
Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string
- [math]\displaystyle{ \underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2} }[/math]
contains 5 runs, of length 3, 2, 6, 1, and 2.
Let [math]\displaystyle{ S }[/math] be a binary string of length [math]\displaystyle{ n }[/math], generated uniformly at random. Let [math]\displaystyle{ X_k }[/math] be the number of runs in [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ k }[/math] or more.
- Compute the exact value of [math]\displaystyle{ \mathbb{E}[X_k] }[/math] as a function of [math]\displaystyle{ n }[/math] and [math]\displaystyle{ k }[/math].
- Give the best concentration bound you can for [math]\displaystyle{ |X_k -\mathbb{E}[X_k]| }[/math].