概率论与数理统计 (Spring 2025)/Problem Set 2

From TCS Wiki
Revision as of 08:10, 14 March 2025 by Zouzongrui (talk | contribs) (Problem 1 (Warm-up problems, 16 points))
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
  • 每道题目的解答都要有完整的解题过程,中英文不限。
  • 我们推荐大家使用LaTeX, markdown等对作业进行排版。

Assumption throughout Problem Set 2

Without further notice, we are working on probability space [math]\displaystyle{ (\Omega,\mathcal{F},\mathbf{Pr}) }[/math].

Without further notice, we assume that the expectation of random variables are well-defined.

The term [math]\displaystyle{ \log }[/math] used in this context refers to the natural logarithm.

Problem 1 (Warm-up problems, 16 points)

  • [Cumulative distribution function (CDF) (I)] Express the distribution functions of [math]X^+ = \max\{0,X\}[/math], [math]X^- = -\min\{0,X\}[/math], [math]|X|=X^+ + X^-[/math], [math]-X[/math], in terms of the distribution function [math]F[/math] of the random variable [math]X[/math].
  • [Cumulative distribution function (CDF) (II)] Let [math]X[/math] be a random variable with distribution function [math]\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] is a random variable with distribution function [math]F[/math].
  • [Probability density function (PDF)] We toss [math]\displaystyle{ n }[/math] coins, and each one shows heads with probability [math]\displaystyle{ p }[/math], independently of each of the others. Each coin which shows head is tossed again. (If the coin shows tail, it won't be tossed again.) Let [math]\displaystyle{ X }[/math] be the number of heads resulting from the second round of tosses, and [math]\displaystyle{ Y }[/math] be the number of heads resulting from all tosses, which includes the first and (possible) second round of each toss.
  1. Find the PDF of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
  2. Find [math]\displaystyle{ \mathbf{E}[X] }[/math] and [math]\displaystyle{ \mathbf{E}[Y] }[/math].
  3. Let [math]\displaystyle{ p_X }[/math] be the PDF of [math]\displaystyle{ X }[/math], show that [math]p_X(k-1)p_X(k+1)\leq [p_X(k)]^2[/math] for [math]\displaystyle{ 1\leq k \leq n-1 }[/math].
  • [Independence] Let [math]\displaystyle{ X_r }[/math], [math]\displaystyle{ 1\leq r\leq n }[/math] be independent random variables which are symmetric about [math]\displaystyle{ 0 }[/math]; that is, [math]\displaystyle{ X_r }[/math] and [math]\displaystyle{ -X_r }[/math] have the same distributions. Show that, for all [math]\displaystyle{ x }[/math], [math]\displaystyle{ \mathbf{Pr}[S_n \geq x] = \mathbf{Pr}[S_n \leq -x] }[/math] where [math]\displaystyle{ S_n = \sum_{r=1}^n X_r }[/math]. Is the conclusion true without the assumtion of independence?
  • [Dependence] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be discrete random variables with joint mass function [math]\displaystyle{ f(x,y) = \frac{C}{(x+y-1)(x+y)(x+y+1)} }[/math] where [math]\displaystyle{ x,y \in \mathbb{N}_+ }[/math] (in other words, [math]\displaystyle{ x,y = 1,2,3,\cdots }[/math]). Find (1) the value of [math]\displaystyle{ C }[/math], (2) marginal mass function of [math]\displaystyle{ X }[/math] and (3) [math]\displaystyle{ \mathbf{E}[X] }[/math].
  • [Expectation] It is required to place in order [math]\displaystyle{ n }[/math] books [math]\displaystyle{ B_1, B_2, \cdots, B_n }[/math] on a library shelf in such way that readers searching from left to right waste as little time as possible on average. Assuming that a random reader requires book [math]\displaystyle{ B_i }[/math] with probability [math]\displaystyle{ p_i }[/math], find the ordering of the books which minimizes the expected number of titles examined by a random reader before discovery of the required book.
  • [Law of total expectation] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation.
  • [Entropy of discrete random variable] Let [math]X[/math] be a discrete random variable with range of values [math][N] = \{1,2,\ldots,N\}[/math] and probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \le \log N[/math] using Jensen's inequality.

Problem 2 (Discrete random variable, 14 points)

  • [Geometric distribution (I)] Every package of some intrinsically dull commodity includes a small and exciting plastic object. There are [math]c[/math] different types of object, and each package is equally likely to contain any given type. You buy one package each day.

    1. Find the expected number of days which elapse between the acquisitions of the [math]j[/math]-th new type of object and the [math](j + 1)[/math]-th new type.
    2. Find the expected number of days which elapse before you have a full set of objects.
  • [Geometric distribution (II)] Prove that geometric distribution is the only discrete memoryless distribution with range values [math]\displaystyle{ \mathbb{N}_+ }[/math].
  • [Binomial distribution] Let [math]\displaystyle{ n_1,n_2 \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ 0 \le p \le 1 }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Bin}(n_1,p),Y \sim \mathrm{Bin}(n_2,p) }[/math] be independent random variables. Prove that [math]\displaystyle{ X+Y \sim \mathrm{Bin}(n_1+n_2,p) }[/math].
  • [Negative binomial distribution] Let [math]\displaystyle{ X }[/math] follows the negative binomial distribution with parameter [math]\displaystyle{ r \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ p \in (0,1) }[/math]. Calculate [math]\displaystyle{ \mathbf{Var}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 }[/math].
  • [Hypergeometric distribution] An urn contains [math]N[/math] balls, [math]b[/math] of which are blue and [math]r = N -b[/math] of which are red. A random sample of [math]n[/math] balls is drawn without replacement (无放回) from the urn. Let [math]B[/math] the number of blue balls in this sample. Show that if [math]N, b[/math], and [math]r[/math] approach [math]+\infty[/math] in such a way that [math]b/N \rightarrow p[/math] and [math]r/N \rightarrow 1 - p[/math], then [math]\mathbf{Pr}(B = k) \rightarrow {n\choose k}p^k(1-p)^{n-k}[/math] for [math]0\leq k \leq n[/math].
  • [Poisson distribution] In your pocket is a random number [math]\displaystyle{ N }[/math] of coins, where [math]\displaystyle{ N }[/math] has the Poisson distribution with parameter [math]\displaystyle{ \lambda }[/math]. You toss each coin once, with heads showing with probability [math]\displaystyle{ p }[/math] each time. Let [math]\displaystyle{ X }[/math] be the (random) number of heads outcomes and [math]\displaystyle{ Y }[/math] be the (also random) number of tails.
    1. Find the joint mass function of [math]\displaystyle{ (X,Y) }[/math].
    2. Find PDF of the marginal distribution of [math]\displaystyle{ X }[/math] in [math]\displaystyle{ (X,Y) }[/math]. Are [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] independent?
  • [Conditional distribution] Let [math]\displaystyle{ \lambda,\mu \gt 0 }[/math] and [math]\displaystyle{ n \in \mathbb{N} }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu) }[/math] be independent random variables. Find out the conditional distribution of [math]\displaystyle{ X }[/math], given [math]\displaystyle{ X+Y = n }[/math].

Problem 3 (Linearity of Expectation, 12 points)

  • [Streak] Suppose we flip a fair coin [math]\displaystyle{ n }[/math] times independently to obtain a sequence of flips [math]\displaystyle{ X_1, X_2, \ldots , X_n }[/math]. A streak of flips is a consecutive subsequence of flips that are all the same. For example, if [math]\displaystyle{ X_3 }[/math], [math]\displaystyle{ X_4 }[/math], and [math]\displaystyle{ X_5 }[/math] are all heads, there is a streak of length [math]\displaystyle{ 3 }[/math] starting at the third lip. (If [math]\displaystyle{ X_6 }[/math] is also heads, then there is also a streak of length [math]\displaystyle{ 4 }[/math] starting at the third lip.) Find the expected number of streaks of length [math]\displaystyle{ k }[/math] for some integer [math]\displaystyle{ k \ge 1 }[/math].
  • [Number of cycles] At a banquet, there are [math]\displaystyle{ n }[/math] people who shake hands according to the following process: In each round, two idle hands are randomly selected and shaken (these two hands are no longer idle). After [math]\displaystyle{ n }[/math] rounds, there will be no idle hands left, and the [math]\displaystyle{ n }[/math] people will form several cycles. For example, when [math]\displaystyle{ n=3 }[/math], the following situation may occur: the left and right hands of the first person are held together, the left hand of the second person and the right hand of the third person are held together, and the right hand of the second person and the left hand of the third person are held together. In this case, three people form two cycles. How many cycles are expected to be formed after [math]\displaystyle{ n }[/math] rounds?
  • [Number of operations] We have a directed graph [math]\displaystyle{ G=(V,E) }[/math] without self-loops and multi-edges. Until the graph becomes empty, repeat the following operation:
      Choose one (unerased) vertex uniformly at random (independently from the previous choices). Then, erase that vertex and all vertices that are reachable from the chosen vertex by traversing some edges. Erasing a vertex will also erase the edges incident to it.

    Let [math]\displaystyle{ R_v }[/math] denotes the number of vertices from which vertex [math]\displaystyle{ v }[/math] is reachable. Prove that the expected value of the number of operations equals to [math]\displaystyle{ \sum_{v \in V} \frac{1}{R_v} }[/math].

  • [Expected Mex] Let [math]\displaystyle{ X_1,X_2,\ldots,X_{100} \sim \mathrm{Geo}(1/2) }[/math] be independent random variables. Compute [math]\displaystyle{ \mathbf{E}[\mathrm{mex}(X_1,X_2,\ldots,X_{100})] }[/math], where [math]\displaystyle{ \mathrm{mex}(a_1,a_2,\ldots,a_n) }[/math] is the smallest positive integer that does not appear in [math]\displaystyle{ a_1,a_2,\ldots,a_n }[/math]. Your answer is considered correct if the absolute error does not exceed [math]\displaystyle{ 10^{-6} }[/math]. (Hint: Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required.)

Problem 4 (Probability meets graph theory)

  • [Random social networks] Let [math]\displaystyle{ G = (V, E) }[/math] be a fixed undirected graph without isolating vertex. Let [math]\displaystyle{ d_v }[/math] be the degree of vertex [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ Y }[/math] be a uniformly chosen vertex, and [math]\displaystyle{ Z }[/math] a uniformly chosen neighbor of [math]\displaystyle{ Y }[/math].
    1. Show that [math]\displaystyle{ \mathbf{E}[d_Z] \geq \mathbf{E}[d_Y] }[/math].
    2. Interpret this inequality in the context of social networks, in which the vertices represent people, and the edges represent friendship.
  • [Turán's Theorem] Let [math]\displaystyle{ G=(V,E) }[/math] be a fixed undirected graph, and write [math]\displaystyle{ d_v }[/math] for the degree of the vertex [math]\displaystyle{ v }[/math]. Use probablistic method to prove that [math]\displaystyle{ \alpha(G) \ge \sum_{v \in V} \frac{1}{d_v + 1} }[/math], where [math]\displaystyle{ \alpha(G) }[/math] is the size of a maximum independent set. (Hint: Consider the following random procedure for generating an independent set [math]\displaystyle{ I }[/math] from a graph with vertex set [math]\displaystyle{ V }[/math]: First, generate a random permutation of the vertices, denoted as [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math]. Then, construct the independent set [math]\displaystyle{ I }[/math] as follows: For each vertex [math]\displaystyle{ v_i \in V }[/math], add [math]\displaystyle{ v_i }[/math] to [math]\displaystyle{ I }[/math] if and only if none of its predecessors in the permutation, i.e., [math]\displaystyle{ v_1,\ldots,v_{i-1} }[/math], are neighbors of [math]\displaystyle{ v_i }[/math].)
  • [Tournament] Prove that there is a tournament [math]\displaystyle{ T }[/math] with [math]\displaystyle{ n }[/math] players and at least [math]\displaystyle{ n!/2^{n-1} }[/math]Hamiltonian paths.

Problem 5 (1D random walk, 8 points)

Let [math]\displaystyle{ p \in (0,1) }[/math] be a constant, and [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be independent Bernoulli trials with successful probability [math]\displaystyle{ p }[/math]. Define [math]\displaystyle{ S_n = 2\sum_{i=1}^n X_i - n }[/math] and [math]\displaystyle{ S_0 = 0 }[/math].

  • [Range of random walk] The range [math]\displaystyle{ R_n }[/math] of [math]\displaystyle{ S_0, S_1, \ldots, S_n }[/math] is defined as the number of distinct values taken by the sequence. Show that [math]\displaystyle{ \mathbf{Pr}\left(R_n = R_{n-1}+1\right) = \mathbf{Pr}\left(\forall 1 \le i \le n, S_i \neq 0\right) }[/math] as [math]\displaystyle{ n \to \infty }[/math], and deduce that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n]\to \mathbf{Pr}(\forall i \ge 1, S_i \neq 0) }[/math]. Hence show that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n] \to |2p-1| }[/math] as [math]\displaystyle{ n \to \infty }[/math].
  • [Symmetric 1D random walk (III)] Suppose [math]\displaystyle{ p = \frac{1}{2} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[|S_n|] = \Theta(\sqrt{n}) }[/math].