概率论与数理统计 (Spring 2023)/Problem Set 2
- 目前作业非最终版本!
- 每道题目的解答都要有完整的解题过程,中英文不限。
- 我们推荐大家使用LaTeX, markdown等对作业进行排版。
Assumption throughout Problem Set 2
Without further notice, we are working on probability space [math]\displaystyle{ (\Omega,\mathcal{F},\mathbf{Pr}) }[/math].
Without further notice, we assume that the expectation of random variables are well-defined.
The term [math]\displaystyle{ \log }[/math] used in this context refers to the natural logarithm.
Problem 1 (Warm-up problems)
- [Function of random variable (I)] Let [math]X[/math] be a random variable and [math]g:\mathbb{R} \to \mathbb{R}[/math] be a continuous and strictly increasing function. Show that [math]Y = g(X)[/math] is a random variable.
- [Function of random variable (II)] Let [math]X[/math] be a random variable with distribution function [math]\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] be a random variable with distribution function [math]F[/math].
- [Marginal distribution] Let [math](X_1, X_2)[/math] be a random vector satisfying [math]\mathbf{Pr}[(X_1,X_2) = (0,0)] = \mathbf{Pr}[(X_1,X_2) = (1,0)] = \mathbf{Pr}[(X_1,X_2)=(0,1)]=\frac{1}{3}[/math]. Find out the marginal distribution of [math]X_1[/math].
- [Independence] Show that discrete random variables [math]X[/math] and [math]Y[/math] are independent if and only if [math]p_{X,Y}(x,y)[/math] can be written as [math]g(x) h(y)[/math] for some function [math]g,h[/math], where [math]p_{X,Y}[/math] is the joint mass function of [math](X,Y)[/math].
- [Entropy of discrete random variable] Let [math]X[/math] be a discrete random variable with range of values [math]\mathbb{N}_+[/math] and probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \ge 0[/math] using Jensen's inequality.
- [Law of total expectation] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation.
- [Random number of random variables] Let [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be identically distributed random variable and [math]\displaystyle{ N }[/math] be a random variable taking values in the non-negative integers and independent of the [math]\displaystyle{ X_n }[/math] for all [math]\displaystyle{ n \ge 1 }[/math]. Prove that [math]\displaystyle{ \mathbf{E}\left[\sum_{i=1}^N X_i\right] = \mathbf{E}[N] \mathbf{E}[X_1] }[/math].
Problem 2 (Distribution of random variable)
- [Cumulative distribution function (CDF)] Let [math]\displaystyle{ X }[/math] be a random variable with cumulative distribution function [math]\displaystyle{ F }[/math].
- Show that [math]\displaystyle{ Y = aX+b }[/math] is a random variable where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are real constants, and express the CDF of [math]\displaystyle{ Y }[/math] by [math]\displaystyle{ F }[/math].
- Let [math]\displaystyle{ G }[/math] be the CDF of random variable [math]\displaystyle{ Z:\Omega\rightarrow \mathbb{R} }[/math] and [math]\displaystyle{ 0\leq \lambda \leq 1 }[/math], show that
- [math]\displaystyle{ \lambda F + (1-\lambda)G }[/math] is a CDF function.
- The product [math]\displaystyle{ FG }[/math] is a CDF function, and if [math]\displaystyle{ Z }[/math] and [math]\displaystyle{ X }[/math] are independent, then [math]\displaystyle{ FG }[/math] is the CDF of [math]\displaystyle{ \max\{X,Z\} }[/math].
- [Probability mass function (PMF)] We toss [math]\displaystyle{ n }[/math] coins, and each one shows heads with probability [math]\displaystyle{ p }[/math], independently of each of the others. Each coin which shows head is tossed again. (If the coin shows tail, it won't be tossed again.) Let [math]\displaystyle{ X }[/math] be the number of heads resulting from the second round of tosses, and [math]\displaystyle{ Y }[/math] be the number of heads resulting from all tosses, which includes the first and (possible) second round of each toss.
- Find the PMF of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
- Find [math]\displaystyle{ \mathbb{E}[X] }[/math] and [math]\displaystyle{ \mathbb{E}[Y] }[/math].
- Let [math]\displaystyle{ p_X }[/math] be the PMF of [math]\displaystyle{ X }[/math], show that [math]p_X(k-1)p_X(k+1)\leq [p_X(k)]^2[/math] for [math]\displaystyle{ 1\leq k \leq n-1 }[/math].
- [Probability density function (PDF)] Let [math]p(x) = C e^{-x-e^{-x}}[/math]. Find the value of [math]C[/math] so that [math]p[/math] is a probability density function.
Problem 3 (Discrete random variable, 20 points)
- [Geometric distribution (I)] Every package of some intrinsically dull commodity includes a small and exciting plastic object. There are [math]c[/math] different types of object, and each package is equally likely to contain any given type. You buy one package each day.
- Find the expected number of days which elapse between the acquisitions of the [math]j[/math]-th new type of object and the [math](j + 1)[/math]-th new type.
- Find the expected number of days which elapse before you have a full set of objects.
- [Geometric distribution (II)] Prove that geometry distribution is the only discrete memoryless distribution with range values [math]\displaystyle{ \mathbb{N}_+ }[/math].
- [Binomial distribution] Let [math]\displaystyle{ n_1,n_2 \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ 0 \le p \le 1 }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Bin}(n_1,p),Y \sim \mathrm{Bin}(n_2,p) }[/math] be independent random variables. Prove that [math]\displaystyle{ X+Y \sim \mathrm{Bin}(n_1+n_2,p) }[/math].
- [Negative binomial distribution] Let [math]\displaystyle{ X }[/math] follows the negative binomial distribution with parameter [math]\displaystyle{ r \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ p \in (0,1) }[/math]. Calculate [math]\displaystyle{ \mathbf{Var}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 }[/math].
- [Hypergeometric distribution] An urn contains [math]N[/math] balls, [math]b[/math] of which are blue and [math]r = N -b[/math] of which are red. A random sample of [math]n[/math] balls is drawn without replacement (无放回) from the urn. Let [math]B[/math] the number of blue balls in this sample. Show that if [math]N, b[/math], and [math]r[/math] approach [math]+\infty[/math] in such a way that [math]b/N \rightarrow p[/math] and [math]r/N \rightarrow 1 - p[/math], then [math]\mathbf{Pr}(B = k) \rightarrow {n\choose k}p^k(1-p)^{n-k}[/math] for [math]0\leq k \leq n[/math].
- [Poisson distribution] In your pocket is a random number [math]\displaystyle{ N }[/math] of coins, where [math]\displaystyle{ N }[/math] has the Poisson distribution with parameter [math]\displaystyle{ \lambda }[/math]. You toss each coin once, with heads showing with probability [math]\displaystyle{ p }[/math] each time. Let [math]\displaystyle{ X }[/math] be the (random) number of heads outcomes and [math]\displaystyle{ Y }[/math] be the (also random) number of tails.
- Find the joint mass function of [math]\displaystyle{ (X,Y) }[/math].
- Find PMF of the marginal distribution of [math]\displaystyle{ X }[/math] in [math]\displaystyle{ (X,Y) }[/math]. Are [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] independent?
- [Conditional distribution (I)] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent [math]\displaystyle{ \text{Bin}(n, p) }[/math] random variables, and let [math]\displaystyle{ Z = X + Y }[/math]. Show that the conditional distribution of [math]\displaystyle{ X }[/math] given [math]\displaystyle{ Z = N }[/math] is the hypergeometric distribution.
- [Conditional distribution (II)] Let [math]\displaystyle{ \lambda,\mu \gt 0 }[/math] and [math]\displaystyle{ n \in \mathbb{N} }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu) }[/math] be independent random variables. Find out the conditional distribution of [math]\displaystyle{ X }[/math], given [math]\displaystyle{ X+Y = n }[/math].
Problem 4 (Probability meets graph theory)
In this part we will work on undirected simple graphs. For any [math]\displaystyle{ G = (V,E) }[/math], a clique (团,完全子图) is a subset of vertices [math]\displaystyle{ C\subset V }[/math], such that every two distinct vertices of [math]\displaystyle{ C }[/math] are adjacent, and an independent set (独立集) is a subset of vertices [math]\displaystyle{ I\subset V }[/math] such that no two distinct vertices of [math]\displaystyle{ I }[/math] are adjacent.
- [Erdős–Rényi random graph]
Consider [math]\displaystyle{ G\sim G(n,p) }[/math] where [math]\displaystyle{ G(n,p) }[/math] is the Erdős–Rényi random graph model.
- Let [math]\displaystyle{ p\in (0,1) }[/math]. A " triangle " in a graph is a clique of size [math]\displaystyle{ 3 }[/math]. Find the expected number of triangles in [math]\displaystyle{ G }[/math].
- Let [math]\displaystyle{ p\in (0,1) }[/math]. For any [math]\displaystyle{ 2\leq q\leq n }[/math], let the random variable [math]\displaystyle{ N_q }[/math] be the number of [math]\displaystyle{ q }[/math]-cliques. Here, a [math]\displaystyle{ q }[/math]-clique is a clique of size [math]\displaystyle{ q }[/math]. Find [math]\displaystyle{ \mathbf{E}[N_q] }[/math]. (Hint: use indicators and the linearity of expectation.)
- Let [math]\displaystyle{ p = 1/2 }[/math]. For an undirected graph [math]\displaystyle{ G }[/math], define [math]\displaystyle{ \alpha(G) = \max\{|S|:S \text{ is an independent set}\} }[/math]. Show that when [math]\displaystyle{ n\rightarrow \infty }[/math], [math]\displaystyle{ \mathbf{Pr}[\alpha(G) \geq 3\log_2 n +1] \rightarrow 0 }[/math]. Also, please interpret this result in the context of social networks, in which the vertices represent people, and the edges represent friendship.
- [Random social networks]
Let [math]\displaystyle{ G = (V, E) }[/math] be a fixed undirected graph without isolating vertex.
Let [math]\displaystyle{ d_v }[/math] be the degree of vertex [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ Y }[/math] be a uniformly chosen vertex, and [math]\displaystyle{ Z }[/math] a uniformly chosen neighbor of [math]\displaystyle{ Y }[/math].
- Show that [math]\displaystyle{ \mathbf{E}[d_Z] \geq \mathbf{E}[d_Y] }[/math].
- Interpret this inequality in the context of social networks, in which the vertices represent people, and the edges represent friendship.
- [Turán's Theorem] Let [math]\displaystyle{ G=(V,E) }[/math] be a fixed undirected graph, and write [math]\displaystyle{ d_v }[/math] for the degree of the vertex [math]\displaystyle{ v }[/math]. Use probablistic method to prove that [math]\displaystyle{ \alpha(G) \ge \sum_{v \in V} \frac{1}{d_v + 1} }[/math]. (Hint: Consider the following random procedure for generating an independent set [math]\displaystyle{ I }[/math] from a graph with vertex set [math]\displaystyle{ V }[/math]: First, generate a random permutation of the vertices, denoted as [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math]. Then, construct the independent set [math]\displaystyle{ I }[/math] as follows: For each vertex [math]\displaystyle{ v_i \in V }[/math], add [math]\displaystyle{ v_i }[/math] to [math]\displaystyle{ I }[/math] if and only if none of its predecessors in the permutation, i.e., [math]\displaystyle{ v_1,\ldots,v_{i-1} }[/math], are neighbors of [math]\displaystyle{ v_i }[/math].)
Problem 5 (1D random walk)
Let [math]\displaystyle{ p \in (0,1) }[/math] be a constant, and [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be independent Bernoulli trials with successful probability [math]\displaystyle{ p }[/math]. Define [math]\displaystyle{ S_n = 2\sum_{i=1}^n X_i - n }[/math] and [math]\displaystyle{ S_0 = 0 }[/math].
- [Range of random walk] The range [math]\displaystyle{ R_n }[/math] of [math]\displaystyle{ S_0, S_1, \ldots, S_n }[/math] is defined as the number of distinct values taken by the sequence. Show that [math]\displaystyle{ \mathbf{Pr}\left(R_n = R_{n-1}+1\right) = \mathbf{Pr}\left(\forall 1 \le i \le n, S_i \neq 0\right) }[/math] as [math]\displaystyle{ n \to \infty }[/math], and deduce that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n]\to \mathbf{Pr}(\forall i \ge 1, S_i \neq 0) }[/math]. Hence show that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n] \to |2p-1| }[/math] as [math]\displaystyle{ n \to \infty }[/math].
- [Symmetric 1D random walk (IV)] Suppose [math]\displaystyle{ p=\frac{1}{2} }[/math]. Let [math]\displaystyle{ N_n }[/math] be the number of points that have been visited by [math]\displaystyle{ S }[/math] exactly once up to [math]\displaystyle{ n }[/math], that is the size of set [math]\displaystyle{ \{0 \le i \le n\mid \forall 0 \le j \le n \text{ and } j \neq i,S_i \neq S_j\} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[N_n] = 2 }[/math] for all [math]\displaystyle{ n \ge 1 }[/math].
- [Symmetric 1D random walk (V)] Suppose [math]\displaystyle{ p = \frac{1}{2} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[|S_n|] = \Theta(\sqrt{n}) }[/math].