Probability Theory and Mathematical Statistics (Spring 2024)/Problem Set 2

Zhangxy (talk | contribs)
Latest revision as of 19:27, 13 April 2024

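  • Every solution must include a complete derivation; you may write in either Chinese or English.
  • We recommend typesetting your homework with LaTeX, Markdown, or a similar tool.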

Assumption throughout Problem Set 2

Unless stated otherwise, we work on the probability space [math]\displaystyle{ (\Omega,\mathcal{F},\mathbf{Pr}) }[/math].

Unless stated otherwise, we assume that the expectations of random variables are well-defined.

The term [math]\displaystyle{ \log }[/math] used in this context refers to the natural logarithm.

Problem 1 (Warm-up problems, 12 points)

  • [Function of random variable (I)] Show that, if [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are random variables, then so are [math]\displaystyle{ X+Y }[/math], [math]\displaystyle{ XY }[/math] and [math]\displaystyle{ \min\{X,Y\} }[/math].
  • [Function of random variable (II)] Let [math]X[/math] be a random variable with distribution function [math]\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] is a random variable with distribution function [math]F[/math].
  • [Independence] Let [math]\displaystyle{ X_r }[/math], [math]\displaystyle{ 1\leq r\leq n }[/math] be independent random variables which are symmetric about [math]\displaystyle{ 0 }[/math]; that is, [math]\displaystyle{ X_r }[/math] and [math]\displaystyle{ -X_r }[/math] have the same distribution. Show that, for all [math]\displaystyle{ x }[/math], [math]\displaystyle{ \mathbf{Pr}[S_n \geq x] = \mathbf{Pr}[S_n \leq -x] }[/math] where [math]\displaystyle{ S_n = \sum_{r=1}^n X_r }[/math]. Is the conclusion true without the assumption of independence?
  • [Dependence] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be discrete random variables with joint mass function [math]\displaystyle{ f(x,y) = \frac{C}{(x+y-1)(x+y)(x+y+1)} }[/math] where [math]\displaystyle{ x,y \in \mathbb{N}_+ }[/math] (in other words, [math]\displaystyle{ x,y = 1,2,3,\cdots }[/math]). Find (1) the value of [math]\displaystyle{ C }[/math], (2) the marginal mass function of [math]\displaystyle{ X }[/math] and (3) [math]\displaystyle{ \mathbf{E}[X] }[/math].
  • [Expectation] Is it generally true that [math]\mathbf{E}[1/X] = 1/\mathbf{E}[X][/math]? Is it ever true that [math]\mathbf{E}[1/X] = 1/\mathbf{E}[X][/math]?
  • [Entropy of discrete random variable] Let [math]X[/math] be a discrete random variable with range of values [math][N] = \{1,2,\ldots,N\}[/math] and probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \le \log N[/math] using Jensen's inequality.
  • [Law of total expectation] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation.
  • [Random number of random variables] Let [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be identically distributed random variables and [math]\displaystyle{ N }[/math] be a random variable taking values in the non-negative integers and independent of the [math]\displaystyle{ X_n }[/math] for all [math]\displaystyle{ n \ge 1 }[/math]. Prove that [math]\displaystyle{ \mathbf{E}\left[\sum_{i=1}^N X_i\right] = \mathbf{E}[N] \mathbf{E}[X_1] }[/math].
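
As an informal numerical companion to the item [Function of random variable (II)] above, the following sketch draws [math]X[/math] uniformly on [math][0,1)[/math], applies [math]F^{-1}[/math], and compares the empirical distribution of [math]Y[/math] with [math]F[/math]. The choice of the logistic CDF [math]F(y) = 1/(1+e^{-y})[/math] and all function names are assumptions made only for illustration; the sketch is a sanity check, not a proof.

```python
import math
import random


def logistic_cdf(y):
    """Illustrative F: continuous and strictly increasing on the real line."""
    return 1.0 / (1.0 + math.exp(-y))


def logistic_inverse_cdf(u):
    """Inverse of logistic_cdf for 0 < u < 1."""
    return math.log(u / (1.0 - u))


def sample_via_inverse_cdf(inverse_cdf, rng):
    """Draw X uniform on (0, 1) and return F^{-1}(X)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); this happens with negligible probability
        u = rng.random()
    return inverse_cdf(u)


def empirical_cdf(samples, y):
    """Fraction of samples that are at most y."""
    return sum(1 for s in samples if s <= y) / len(samples)


rng = random.Random(0)
samples = [sample_via_inverse_cdf(logistic_inverse_cdf, rng) for _ in range(50_000)]
for y in (-1.0, 0.0, 2.0):
    # The two printed numbers should be close if Y has distribution function F.
    print(y, empirical_cdf(samples, y), logistic_cdf(y))
```

Any other continuous, strictly increasing [math]F[/math] with a computable inverse could be substituted for the logistic choice.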

Problem 2 (Distribution of random variable, 8 points)

  • [Cumulative distribution function (CDF)] Let [math]\displaystyle{ X }[/math] be a random variable with cumulative distribution function [math]\displaystyle{ F }[/math].
    1. Show that [math]\displaystyle{ Y = aX+b }[/math] is a random variable where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are real constants, and express the CDF of [math]\displaystyle{ Y }[/math] by [math]\displaystyle{ F }[/math]. (Hint: Try expressing the event [math]Y=aX+b\le y[/math] by countably many set operations on the events defined on [math]X[/math].)
    2. Let [math]\displaystyle{ G }[/math] be the CDF of a random variable [math]\displaystyle{ Z:\Omega\rightarrow \mathbb{R} }[/math], and let [math]\displaystyle{ 0\leq \lambda \leq 1 }[/math]. Show that
      • [math]\displaystyle{ \lambda F + (1-\lambda)G }[/math] is a CDF.
      • The product [math]\displaystyle{ FG }[/math] is a CDF, and if [math]\displaystyle{ Z }[/math] and [math]\displaystyle{ X }[/math] are independent, then [math]\displaystyle{ FG }[/math] is the CDF of [math]\displaystyle{ \max\{X,Z\} }[/math].
  • [Probability mass function (PMF)] We toss [math]\displaystyle{ n }[/math] coins, and each one shows heads with probability [math]\displaystyle{ p }[/math], independently of each of the others. Each coin that shows heads is tossed again. (A coin that shows tails is not tossed again.) Let [math]\displaystyle{ X }[/math] be the number of heads resulting from the second round of tosses, and [math]\displaystyle{ Y }[/math] be the total number of heads over all tosses, counting both the first toss and the (possible) second toss of each coin.
    1. Find the PMF of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
    2. Find [math]\displaystyle{ \mathbf{E}[X] }[/math] and [math]\displaystyle{ \mathbf{E}[Y] }[/math].
    3. Let [math]\displaystyle{ p_X }[/math] be the PMF of [math]\displaystyle{ X }[/math]. Show that [math]p_X(k-1)p_X(k+1)\leq [p_X(k)]^2[/math] for [math]\displaystyle{ 1\leq k \leq n-1 }[/math].
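
The two-round coin experiment in the [Probability mass function (PMF)] item can be simulated directly, which gives empirical PMFs to compare against a closed-form answer. The sketch below is one possible simulation; the function names, the seed, and the parameter values in the usage lines are illustrative assumptions and not part of the problem.

```python
import random
from collections import Counter


def two_round_tosses(n, p, rng):
    """Toss n coins; re-toss every coin that showed heads.

    Returns (X, Y): X counts heads in the second round only,
    Y counts heads over both rounds.
    """
    first_heads = sum(1 for _ in range(n) if rng.random() < p)
    second_heads = sum(1 for _ in range(first_heads) if rng.random() < p)
    return second_heads, first_heads + second_heads


def empirical_pmfs(n, p, trials, seed=0):
    """Estimate the PMFs of X (on 0..n) and Y (on 0..2n) by simulation."""
    rng = random.Random(seed)
    x_counts, y_counts = Counter(), Counter()
    for _ in range(trials):
        x, y = two_round_tosses(n, p, rng)
        x_counts[x] += 1
        y_counts[y] += 1
    pmf_x = {k: x_counts[k] / trials for k in range(n + 1)}
    pmf_y = {k: y_counts[k] / trials for k in range(2 * n + 1)}
    return pmf_x, pmf_y


pmf_x, pmf_y = empirical_pmfs(n=6, p=0.5, trials=20_000)
print(pmf_x)
print(pmf_y)
```

The empirical values fluctuate with the number of trials, so small discrepancies from an exact formula are expected.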

Problem 3 (Discrete random variable)

  • [Geometric distribution (I)] Every package of some intrinsically dull commodity includes a small and exciting plastic object. There are [math]c[/math] different types of object, and each package is equally likely to contain any given type. You buy one package each day.

    1. Find the expected number of days which elapse between the acquisitions of the [math]j[/math]-th new type of object and the [math](j + 1)[/math]-th new type.
    2. Find the expected number of days which elapse before you have a full set of objects.
  • [Geometric distribution (II)] Prove that the geometric distribution is the only discrete memoryless distribution with range [math]\displaystyle{ \mathbb{N}_+ }[/math].
  • [Binomial distribution] Let [math]\displaystyle{ n_1,n_2 \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ 0 \le p \le 1 }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Bin}(n_1,p),Y \sim \mathrm{Bin}(n_2,p) }[/math] be independent random variables. Prove that [math]\displaystyle{ X+Y \sim \mathrm{Bin}(n_1+n_2,p) }[/math].
  • [Negative binomial distribution] Let [math]\displaystyle{ X }[/math] follow the negative binomial distribution with parameters [math]\displaystyle{ r \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ p \in (0,1) }[/math]. Calculate [math]\displaystyle{ \mathbf{Var}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 }[/math].
  • [Hypergeometric distribution] An urn contains [math]N[/math] balls, [math]b[/math] of which are blue and [math]r = N -b[/math] of which are red. A random sample of [math]n[/math] balls is drawn without replacement from the urn. Let [math]B[/math] be the number of blue balls in this sample. Show that if [math]N, b[/math], and [math]r[/math] approach [math]+\infty[/math] in such a way that [math]b/N \rightarrow p[/math] and [math]r/N \rightarrow 1 - p[/math], then [math]\mathbf{Pr}(B = k) \rightarrow {n\choose k}p^k(1-p)^{n-k}[/math] for [math]0\leq k \leq n[/math].
  • [Poisson distribution] In your pocket is a random number [math]\displaystyle{ N }[/math] of coins, where [math]\displaystyle{ N }[/math] has the Poisson distribution with parameter [math]\displaystyle{ \lambda }[/math]. You toss each coin once, with heads showing with probability [math]\displaystyle{ p }[/math] each time. Let [math]\displaystyle{ X }[/math] be the (random) number of heads outcomes and [math]\displaystyle{ Y }[/math] be the (also random) number of tails.
    1. Find the joint mass function of [math]\displaystyle{ (X,Y) }[/math].
    2. Find the PMF of the marginal distribution of [math]\displaystyle{ X }[/math]. Are [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] independent?
  • [Conditional distribution (I)] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent [math]\displaystyle{ \text{Bin}(n, p) }[/math] random variables, and let [math]\displaystyle{ Z = X + Y }[/math]. Show that the conditional distribution of [math]\displaystyle{ X }[/math] given [math]\displaystyle{ Z = N }[/math] is the hypergeometric distribution.
  • [Conditional distribution (II)] Let [math]\displaystyle{ \lambda,\mu \gt 0 }[/math] and [math]\displaystyle{ n \in \mathbb{N} }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu) }[/math] be independent random variables. Find the conditional distribution of [math]\displaystyle{ X }[/math], given [math]\displaystyle{ X+Y = n }[/math].
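
For the package-collecting setting of [Geometric distribution (I)] above, a short simulation can serve as a check on a derived expectation. The sketch below assumes the [math]c[/math] object types are labelled [math]0,\ldots,c-1[/math] and that one package is bought per day; the function names and the value [math]c = 5[/math] are hypothetical choices for illustration.

```python
import random


def days_to_full_set(c, rng):
    """Buy one package per day until all c object types have been seen."""
    seen = set()
    days = 0
    while len(seen) < c:
        seen.add(rng.randrange(c))  # each type is equally likely
        days += 1
    return days


def estimate_expected_days(c, trials, seed=0):
    """Monte Carlo estimate of the expected number of days to a full set."""
    rng = random.Random(seed)
    return sum(days_to_full_set(c, rng) for _ in range(trials)) / trials


print(estimate_expected_days(c=5, trials=20_000))
```

The estimate is random, so it should only be compared with an exact answer up to sampling error.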

Problem 4 (Linearity of Expectation, 12 points)

  • [Streak] Suppose we flip a fair coin [math]\displaystyle{ n }[/math] times independently to obtain a sequence of flips [math]\displaystyle{ X_1, X_2, \ldots , X_n }[/math]. A streak of flips is a consecutive subsequence of flips that are all the same. For example, if [math]\displaystyle{ X_3 }[/math], [math]\displaystyle{ X_4 }[/math], and [math]\displaystyle{ X_5 }[/math] are all heads, there is a streak of length [math]\displaystyle{ 3 }[/math] starting at the third flip. (If [math]\displaystyle{ X_6 }[/math] is also heads, then there is also a streak of length [math]\displaystyle{ 4 }[/math] starting at the third flip.) Find the expected number of streaks of length [math]\displaystyle{ k }[/math] for some integer [math]\displaystyle{ k \ge 1 }[/math].
  • [Number of cycles] At a banquet, there are [math]\displaystyle{ n }[/math] people who shake hands according to the following process: In each round, two idle hands are randomly selected and shaken (these two hands are no longer idle). After [math]\displaystyle{ n }[/math] rounds, there will be no idle hands left, and the [math]\displaystyle{ n }[/math] people will form several cycles. For example, when [math]\displaystyle{ n=3 }[/math], the following situation may occur: the left and right hands of the first person are held together, the left hand of the second person and the right hand of the third person are held together, and the right hand of the second person and the left hand of the third person are held together. In this case, three people form two cycles. How many cycles are expected to be formed after [math]\displaystyle{ n }[/math] rounds?
  • [Paper Cutting] We have a rectangular piece of paper divided into [math]\displaystyle{ H \times W }[/math] squares, where two of those squares are painted black and the rest are painted white. If we let [math]\displaystyle{ (i,j) }[/math] denote the square at the [math]\displaystyle{ i }[/math]-th row and [math]\displaystyle{ j }[/math]-th column, the squares painted black are [math]\displaystyle{ (h_1,w_1) }[/math] and [math]\displaystyle{ (h_2,w_2) }[/math]. Bob will repeat the following operation to cut the piece of paper:
      Assume that we have [math]\displaystyle{ h \times w }[/math] squares remaining. There are [math]\displaystyle{ (h-1) }[/math] horizontal lines and [math]\displaystyle{ (w-1) }[/math] vertical lines that are parallel to the edges of the piece and pass the borders of the squares. He chooses one of these lines uniformly at random and cuts the piece into two along that line. Then, if the two black squares are on the same piece, he throws away the other piece and continues the process; otherwise, he ends the process.

    Find the expected value of the number of times Bob cuts a piece of paper until he ends the process.

  • [Expected Mex] Let [math]\displaystyle{ X_1,X_2,\ldots,X_{100} \sim \mathrm{Geo}(1/2) }[/math] be independent random variables. Compute [math]\displaystyle{ \mathbf{E}[\mathrm{mex}(X_1,X_2,\ldots,X_{100})] }[/math], where [math]\displaystyle{ \mathrm{mex}(a_1,a_2,\ldots,a_n) }[/math] is the smallest positive integer that does not appear in [math]\displaystyle{ a_1,a_2,\ldots,a_n }[/math]. Your answer is considered correct if the absolute error does not exceed [math]\displaystyle{ 10^{-6} }[/math]. (Hint: Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required.)
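
The [Expected Mex] item asks for six digits of accuracy, which a plain simulation cannot deliver, but a Monte Carlo estimate is still a useful rough check on an exact computation. The sketch below assumes that [math]\mathrm{Geo}(1/2)[/math] is supported on the positive integers (the number of trials up to and including the first success); the function names, trial count, and seed are illustrative assumptions.

```python
import random


def mex(values):
    """Smallest positive integer that does not appear in values."""
    present = set(values)
    k = 1
    while k in present:
        k += 1
    return k


def sample_geometric(p, rng):
    """Number of independent trials up to and including the first success."""
    k = 1
    while rng.random() >= p:  # failure with probability 1 - p
        k += 1
    return k


def estimate_expected_mex(n, p, trials, seed=0):
    """Monte Carlo estimate of E[mex(X_1, ..., X_n)] for i.i.d. geometric X_i."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += mex(sample_geometric(p, rng) for _ in range(n))
    return total / trials


print(estimate_expected_mex(n=100, p=0.5, trials=2_000))
```

The standard error of such an estimate shrinks only like the inverse square root of the number of trials, so the required [math]10^{-6}[/math] accuracy still calls for an exact method.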

Problem 5 (Probability meets graph theory)

  • [Random social networks] Let [math]\displaystyle{ G = (V, E) }[/math] be a fixed undirected graph without isolated vertices. Let [math]\displaystyle{ d_v }[/math] be the degree of vertex [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ Y }[/math] be a uniformly chosen vertex, and [math]\displaystyle{ Z }[/math] a uniformly chosen neighbor of [math]\displaystyle{ Y }[/math].
    1. Show that [math]\displaystyle{ \mathbf{E}[d_Z] \geq \mathbf{E}[d_Y] }[/math].
    2. Interpret this inequality in the context of social networks, in which the vertices represent people, and the edges represent friendship.
  • [Turán's Theorem] Let [math]\displaystyle{ G=(V,E) }[/math] be a fixed undirected graph, and write [math]\displaystyle{ d_v }[/math] for the degree of the vertex [math]\displaystyle{ v }[/math]. Use the probabilistic method to prove that [math]\displaystyle{ \alpha(G) \ge \sum_{v \in V} \frac{1}{d_v + 1} }[/math], where [math]\displaystyle{ \alpha(G) }[/math] is the size of a maximum independent set. (Hint: Consider the following random procedure for generating an independent set [math]\displaystyle{ I }[/math] from a graph with vertex set [math]\displaystyle{ V }[/math]: First, generate a random permutation of the vertices, denoted as [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math]. Then, construct the independent set [math]\displaystyle{ I }[/math] as follows: For each vertex [math]\displaystyle{ v_i \in V }[/math], add [math]\displaystyle{ v_i }[/math] to [math]\displaystyle{ I }[/math] if and only if none of its predecessors in the permutation, i.e., [math]\displaystyle{ v_1,\ldots,v_{i-1} }[/math], are neighbors of [math]\displaystyle{ v_i }[/math].)
  • [Dominating set] A dominating set of vertices in an undirected graph [math]\displaystyle{ G = (V, E) }[/math] is a set [math]\displaystyle{ S \subseteq V }[/math] such that every vertex of [math]\displaystyle{ G }[/math] belongs to [math]\displaystyle{ S }[/math] or has a neighbor in [math]\displaystyle{ S }[/math]. Let [math]\displaystyle{ G = (V, E) }[/math] be an [math]\displaystyle{ n }[/math]-vertex graph with minimum degree [math]\displaystyle{ d \gt 1 }[/math]. Prove that [math]\displaystyle{ G }[/math] has a dominating set with at most [math]\displaystyle{ \frac{n\left(1+\log(d+1)\right)}{d+1} }[/math] vertices. (Hint: Consider a random vertex subset [math]\displaystyle{ S \subseteq V }[/math] by including each vertex independently with probability [math]\displaystyle{ p := \log(d + 1)/(d + 1) }[/math].)
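
The random procedure described in the hint of [Turán's Theorem] above can be written out in a few lines, which may help in seeing what needs to be computed in the proof. The sketch below represents a graph as a dictionary mapping each vertex to its set of neighbors; this representation, the function names, and the 5-cycle example are assumptions made for illustration.

```python
import random


def random_permutation_independent_set(adjacency, rng):
    """Keep a vertex iff it precedes all of its neighbors in a random permutation."""
    order = list(adjacency)
    rng.shuffle(order)
    position = {v: i for i, v in enumerate(order)}
    return {
        v
        for v in order
        if all(position[u] > position[v] for u in adjacency[v])
    }


def degree_sum_bound(adjacency):
    """The quantity sum over v of 1 / (d_v + 1) from the problem statement."""
    return sum(1.0 / (len(neighbors) + 1) for neighbors in adjacency.values())


# Illustrative graph: a cycle on 5 vertices.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
rng = random.Random(0)
sizes = [len(random_permutation_independent_set(cycle, rng)) for _ in range(10_000)]
average_size = sum(sizes) / len(sizes)
print(average_size, degree_sum_bound(cycle))
```

Comparing the printed average with the bound is only a numerical observation on one graph; the proof itself must work for every graph.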

Problem 6 (1D random walk, 8 points)

Let [math]\displaystyle{ p \in (0,1) }[/math] be a constant, and [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be independent Bernoulli trials with success probability [math]\displaystyle{ p }[/math]. Define [math]\displaystyle{ S_n = 2\sum_{i=1}^n X_i - n }[/math] and [math]\displaystyle{ S_0 = 0 }[/math].

  • [Range of random walk] The range [math]\displaystyle{ R_n }[/math] of [math]\displaystyle{ S_0, S_1, \ldots, S_n }[/math] is defined as the number of distinct values taken by the sequence. Show that [math]\displaystyle{ \mathbf{Pr}\left(R_n = R_{n-1}+1\right) = \mathbf{Pr}\left(\forall 1 \le i \le n, S_i \neq 0\right) }[/math], and deduce that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n]\to \mathbf{Pr}(\forall i \ge 1, S_i \neq 0) }[/math] as [math]\displaystyle{ n \to \infty }[/math]. Hence show that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n] \to |2p-1| }[/math] as [math]\displaystyle{ n \to \infty }[/math].
  • [Symmetric 1D random walk (III)] Suppose [math]\displaystyle{ p = \frac{1}{2} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[|S_n|] = \Theta(\sqrt{n}) }[/math].
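
The quantities in Problem 6 are easy to explore numerically. The sketch below simulates the walk [math]S_0, S_1, \ldots, S_n[/math] with steps [math]2X_i - 1 \in \{+1,-1\}[/math] and estimates [math]\mathbf{E}[R_n][/math] and [math]\mathbf{E}[|S_n|][/math]; the function names, seed, and parameter values are illustrative assumptions, and the output is a heuristic check rather than a proof.

```python
import random


def walk_positions(n, p, rng):
    """Return [S_0, S_1, ..., S_n], where each step is +1 w.p. p and -1 otherwise."""
    positions = [0]
    for _ in range(n):
        step = 1 if rng.random() < p else -1
        positions.append(positions[-1] + step)
    return positions


def range_of_walk(positions):
    """R_n: the number of distinct values among S_0, ..., S_n."""
    return len(set(positions))


def estimate_range_and_abs(n, p, trials, seed=0):
    """Monte Carlo estimates of (E[R_n], E[|S_n|])."""
    rng = random.Random(seed)
    total_range = 0
    total_abs = 0
    for _ in range(trials):
        positions = walk_positions(n, p, rng)
        total_range += range_of_walk(positions)
        total_abs += abs(positions[-1])
    return total_range / trials, total_abs / trials


for p in (0.5, 0.7):
    mean_range, mean_abs = estimate_range_and_abs(n=1_000, p=p, trials=500)
    print(p, mean_range / 1_000, mean_abs)
```

Varying [math]n[/math] and [math]p[/math] in the last loop gives a feel for the limiting behaviour that the two items ask to establish.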