概率论 (Summer 2013)/Problem Set 4: Difference between revisions

Latest revision as of 11:49, 25 July 2013

Problem 1

Let [math]\displaystyle{ X_1,X_2,\dots,X_n }[/math] be independent geometrically distributed random variables each having expectation 2 (each of the [math]\displaystyle{ X_i }[/math] is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let [math]\displaystyle{ X=\sum_{i=1}^nX_i }[/math] and [math]\displaystyle{ \delta }[/math] be a positive real constant. Derive the best upper bound you can on [math]\displaystyle{ \Pr[X\gt (1+\delta)(2n)] }[/math].

Problem 2

A boolean code is a mapping [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math]. Each [math]\displaystyle{ x\in\{0,1\}^k }[/math] is called a message and [math]\displaystyle{ y=C(x) }[/math] is called a codeword. The code rate [math]\displaystyle{ r }[/math] of a code [math]\displaystyle{ C }[/math] is [math]\displaystyle{ r=\frac{k}{n} }[/math]. A boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] is a linear code if it is a linear transformation, i.e. there is a matrix [math]\displaystyle{ A\in\{0,1\}^{n\times k} }[/math] such that [math]\displaystyle{ C(x)=Ax }[/math] for any [math]\displaystyle{ x\in\{0,1\}^k }[/math], where the additions and multiplications are defined over the finite field of order two, [math]\displaystyle{ (\{0,1\},+_{\bmod 2},\times_{\bmod 2}) }[/math].

The distance between two codeword [math]\displaystyle{ y_1 }[/math] and [math]\displaystyle{ y_2 }[/math], denoted by [math]\displaystyle{ d(y_1,y_2) }[/math], is defined as the Hamming distance between them. Formally, [math]\displaystyle{ d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)| }[/math]. The distance of a code [math]\displaystyle{ C }[/math] is the minimum distance between any two codewords. Formally, [math]\displaystyle{ d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2)) }[/math].

Usually we want to make both the code rate [math]\displaystyle{ r }[/math] and the code distance [math]\displaystyle{ d }[/math] as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.

Use the probabilistic method to prove that there exists a boolean code [math]\displaystyle{ C:\{0,1\}^k\rightarrow\{0,1\}^n }[/math] of code rate [math]\displaystyle{ r }[/math] and distance [math]\displaystyle{ \left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n }[/math]. Try to optimize the constant in [math]\displaystyle{ \Theta(\cdot) }[/math].
Prove a similar result for linear boolean codes.

Problem 3

We have [math]\displaystyle{ n }[/math] servers to run a system. Each machine fails independently with probability [math]\displaystyle{ p\le 5% }[/math]. The whole system fails if over [math]\displaystyle{ 8% }[/math] of the machine fail. Show that the probability is exponentially small as the size of the system grows.
We want to improve the robustness of the system using the same [math]\displaystyle{ n }[/math] machines. One approach is to utilize virtualization: We run [math]\displaystyle{ m\ge n }[/math] virtual machines over [math]\displaystyle{ n }[/math] real machines with the following requirements:

each virtual machines includes exactly [math]\displaystyle{ 4 }[/math] real machines; and
each real machine participates in at most [math]\displaystyle{ d }[/math] virtual machines.

Suppose that each real machine still fails independently with probability [math]\displaystyle{ 5% }[/math], and a virtual machines fails if all the real machines it includes fail (This is because we have a copy of the virtual machine on every real machine it includes). Now consider a more vulnerable system: it fails if over [math]\displaystyle{ 1/100000 }[/math] of the virtual machines fail. Show that for any constant [math]\displaystyle{ d }[/math], the probability that the system fails is exponentially small as [math]\displaystyle{ n }[/math] grows.

Problem 4

Given a binary string, define a run as a maximal sequence of contiguous 1s; for example, the following string

[math]\displaystyle{ \underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2} }[/math]

contains 5 runs, of length 3, 2, 6, 1, and 2.

Let [math]\displaystyle{ S }[/math] be a binary string of length [math]\displaystyle{ n }[/math], generated uniformly at random. Let [math]\displaystyle{ X_k }[/math] be the number of runs in [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ k }[/math] or more.

Compute the exact value of [math]\displaystyle{ \mathbb{E}[X_k] }[/math] as a function of [math]\displaystyle{ n }[/math] and [math]\displaystyle{ k }[/math].
Give the best concentration bound you can for [math]\displaystyle{ |X_k -\mathbb{E}[X_k]| }[/math].

@@ Line 3: / Line 3: @@
 Let <math>X_1,X_2,\dots,X_n</math> be independent geometrically distributed random variables each having expectation 2 (each of the <math>X_i</math> is an independent experiment counting the number of tosses of an unbiased coin up to and including the first HEADS). Let <math>X=\sum_{i=1}^nX_i</math> and <math>\delta</math> be a positive real constant. Derive the best upper bound you can on <math>\Pr[X>(1+\delta)(2n)]</math>.
-== Problem 2 ==
+==Problem 2==
+A '''boolean code''' is a mapping <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math>. Each <math>x\in\{0,1\}^k</math> is called a '''message''' and <math>y=C(x)</math> is called a '''codeword'''. The '''code rate''' <math>r</math> of a code <math>C</math> is <math>r=\frac{k}{n}</math>. A boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> is a '''linear code''' if it is a linear transformation, i.e. there is a matrix <math>A\in\{0,1\}^{n\times k}</math> such that <math>C(x)=Ax</math> for any <math>x\in\{0,1\}^k</math>, where the additions and multiplications are defined over the finite field of order two, <math>(\{0,1\},+_{\bmod 2},\times_{\bmod 2})</math>.
+The '''distance''' between two codeword <math>y_1</math> and <math>y_2</math>, denoted by <math>d(y_1,y_2)</math>, is defined as the Hamming distance between them. Formally, <math>d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|</math>. The distance of a code <math>C</math> is the minimum distance between any two codewords. Formally, <math>d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))</math>.
+Usually we want to make both the code rate <math>r</math> and the code distance <math>d</math> as large as possible, because a larger rate means that the amount of actual message per transmitted bit is high, and a larger distance allows for more error correction and detection.
+* Use the probabilistic method to prove that there exists a boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> of code rate <math>r</math> and distance <math>\left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n</math>. Try to optimize the constant in <math>\Theta(\cdot)</math>.
+* Prove a similar result for linear boolean codes.
 == Problem 3 ==
+# We have <math>n</math> servers to run a system. Each machine fails <i>independently</i> with probability <math>p\le 5%</math>. The whole system fails if over <math>8%</math> of the machine fail. Show that the probability is exponentially small as the size of the system grows.
+# We want to improve the robustness of the system using the same <math>n</math> machines. One approach is to utilize virtualization: We run <math>m\ge n</math> virtual machines over <math>n</math> real machines with the following requirements:
+:* each virtual machines includes exactly <math>4</math> real machines; and
+:* each real machine participates in at most <math>d</math> virtual machines.
+:Suppose that each real machine still fails independently with probability <math>5%</math>, and a virtual machines fails if all the real machines it includes fail (This is because we have a copy of the virtual machine on every real machine it includes). Now consider a more vulnerable system: it fails if over <math>1/100000</math> of the virtual machines fail. Show that for any constant <math>d</math>, the probability that the system fails is exponentially small as <math>n</math> grows.
 == Problem 4 ==
+Given a binary string, define a '''run''' as a <font color=red>maximal</font> sequence of contiguous 1s; for example, the following string
+:<math>\underbrace{111}_{3}00\underbrace{11}_{2}00\underbrace{111111}_{6}0\underbrace{1}_{1}0\underbrace{11}_{2}</math>
+contains 5 runs, of length 3, 2, 6, 1, and 2.
+Let <math>S</math> be a binary string of length <math>n</math>, generated uniformly at random. Let <math>X_k</math> be the number of runs in <math>S</math> of length <math>k</math> or more.
+*Compute the exact value of <math>\mathbb{E}[X_k]</math> as a function of <math>n</math> and <math>k</math>.
+*Give the best concentration bound you can for <math>|X_k -\mathbb{E}[X_k]|</math>.

概率论 (Summer 2013)/Problem Set 4: Difference between revisions

Latest revision as of 11:49, 25 July 2013

Contents

Problem 1

Problem 2

Problem 3

Problem 4

Navigation menu

概率论 (Summer 2013)/Problem Set 4: Difference between revisions

Latest revision as of 11:49, 25 July 2013

Problem 1

Problem 2

Problem 3

Problem 4

Navigation menu

Search