概率论与数理统计 (Spring 2023)/Problem Set 2: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
Zouzongrui (talk | contribs)
No edit summary
Zouzongrui (talk | contribs)
No edit summary
Line 43: Line 43:
<li>Find the PMF of <math>X</math> and <math>Y</math>.</li>
<li>Find the PMF of <math>X</math> and <math>Y</math>.</li>
<li>Find <math>\mathbb{E}[X]</math> and <math>\mathbb{E}[Y]</math>.</li>
<li>Find <math>\mathbb{E}[X]</math> and <math>\mathbb{E}[Y]</math>.</li>
<li>Let <math>f</math> be the PMF of <math>X</math>, show that [math]f(k-1)f(k+1)\leq f(k)^2[/math] for <math>1\leq k \leq n-1</math>.</li>
<li>Let <math>p_X</math> be the PMF of <math>X</math>, show that [math]p_X(k-1)p_X(k+1)\leq p_X(k)^2[/math] for <math>1\leq k \leq n-1</math>.</li>
</ol></li>
</ol></li>



Revision as of 05:24, 28 March 2023

  • 目前作业非最终版本!
  • 每道题目的解答都要有完整的解题过程,中英文不限。
  • 我们推荐大家使用LaTeX, markdown等对作业进行排版。

Assumption throughout Problem Set 2

Without further notice, we are working on probability space [math]\displaystyle{ (\Omega,\mathcal{F},\mathbf{Pr}) }[/math].

Without further notice, we assume that the expectation of random variables are well-defined.

The term [math]\displaystyle{ \log }[/math] used in this context refers to the natural logarithm.

Problem 1 (Warm-up problems)

  • [Function of random variable (I)] Let [math]X[/math] be a random variable and [math]g:\mathbb{R} \to \mathbb{R}[/math] be a continuous and strictly increasing function. Show that [math]Y = g(X)[/math] is a random variable.
  • [Function of random variable (II)] Let [math]X[/math] be a random variable with distribution function [math]\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] be a random variable with distribution function [math]F[/math].
  • [Marginal distribution] Let [math](X_1, X_2)[/math] be a random vector satisfying [math]\mathbf{Pr}[(X_1,X_2) = (0,0)] = \mathbf{Pr}[(X_1,X_2) = (1,0)] = \mathbf{Pr}[(X_1,X_2)=(0,1)]=\frac{1}{3}[/math]. Find out the marginal distribution of [math]X_1[/math].
  • [Independence] Show that discrete random variables [math]X[/math] and [math]Y[/math] are independent if and only if [math]p_{X,Y}(x,y)[/math] can be written as [math]g(x) h(y)[/math] for some function [math]g,h[/math], where [math]p_{X,Y}[/math] is the joint mass function of [math](X,Y)[/math].
  • [Entropy of discrete random variable] Let [math]X[/math] be a discrete random variable with range of values [math]\mathbb{N}_+[/math] and probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \ge 0[/math] using Jensen's inequality.
  • [Law of total expectation] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation.
  • [Random number of random variables] Let [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be identically distributed random variable and [math]\displaystyle{ N }[/math] be a random variable taking values in the non-negative integers and independent of the [math]\displaystyle{ X_n }[/math] for all [math]\displaystyle{ n \ge 1 }[/math]. Prove that [math]\displaystyle{ \mathbf{E}\left[\sum_{i=1}^N X_i\right] = \mathbf{E}[N] \mathbf{E}[X_1] }[/math].

Problem 2 (Distribution of random variable)

  • [Cumulative distribution function (CDF)] Let [math]\displaystyle{ X }[/math] be a random variable with cumulative distribution function [math]\displaystyle{ F }[/math].
    1. Show that [math]\displaystyle{ Y = aX+b }[/math] is a random variable where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are real constants, and express the CDF of [math]\displaystyle{ Y }[/math] by [math]\displaystyle{ F }[/math].
    2. Let [math]\displaystyle{ G }[/math] be the CDF of random variable [math]\displaystyle{ Z:\Omega\rightarrow \mathbb{R} }[/math] and [math]\displaystyle{ 0\leq \lambda \leq 1 }[/math], show that
      • [math]\displaystyle{ \lambda F + (1-\lambda)G }[/math] is a CDF function.
      • The product [math]\displaystyle{ FG }[/math] is a CDF function, and if [math]\displaystyle{ Z }[/math] and [math]\displaystyle{ X }[/math] are independent, then [math]\displaystyle{ FG }[/math] is the CDF of [math]\displaystyle{ \max\{X,Z\} }[/math].
  • [Probability mass function (PMF)] We toss [math]\displaystyle{ n }[/math] coins, and each one shows heads with probability [math]\displaystyle{ p }[/math], independently of each of the others. Each coin which shows head is tossed again. (If the coin shows tail, it won't be tossed again.) Let [math]\displaystyle{ X }[/math] be the number of heads resulting from the second round of tosses, and [math]\displaystyle{ Y }[/math] be the number of heads resulting from all tosses, which includes the first and (possible) second round of each toss.
    1. Find the PMF of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
    2. Find [math]\displaystyle{ \mathbb{E}[X] }[/math] and [math]\displaystyle{ \mathbb{E}[Y] }[/math].
    3. Let [math]\displaystyle{ p_X }[/math] be the PMF of [math]\displaystyle{ X }[/math], show that [math]p_X(k-1)p_X(k+1)\leq p_X(k)^2[/math] for [math]\displaystyle{ 1\leq k \leq n-1 }[/math].
  • [Probability density function (PDF)] Let [math]p(x) = C e^{-x-e^{-x}}[/math]. Find the value of [math]C[/math] so that [math]p[/math] is a probability density function.

Problem 3 (Discrete random variable, 20 points)

  • [Geometric distribution (I)] Every package of some intrinsically dull commodity includes a small and exciting plastic object. There are [math]c[/math] different types of object, and each package is equally likely to contain any given type. You buy one package each day.

    1. Find the expected number of days which elapse between the acquisitions of the [math]j[/math]-th new type of object and the [math](j + 1)[/math]-th new type.
    2. Find the expected number of days which elapse before you have a full set of objects.
  • [Geometric distribution (II)] Prove that geometry distribution is the only discrete memoryless distribution with range values [math]\displaystyle{ \mathbb{N}_+ }[/math].
  • [Binomial distribution] Let [math]\displaystyle{ n_1,n_2 \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ 0 \le p \le 1 }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Bin}(n_1,p),Y \sim \mathrm{Bin}(n_2,p) }[/math] be independent random variables. Prove that [math]\displaystyle{ X+Y \sim \mathrm{Bin}(n_1+n_2,p) }[/math].
  • [Negative binomial distribution] Let [math]\displaystyle{ X }[/math] follows the negative binomial distribution with parameter [math]\displaystyle{ r \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ p \in (0,1) }[/math]. Calculate [math]\displaystyle{ \mathbf{Var}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 }[/math].
  • [Hypergeometric distribution] An urn contains [math]N[/math] balls, [math]b[/math] of which are blue and [math]r = N -b[/math] of which are red. A random sample of [math]n[/math] balls is drawn without replacement (无放回) from the urn. Let [math]B[/math] the number of blue balls in this sample. Show that if [math]N, b[/math], and [math]r[/math] approach [math]+\infty[/math] in such a way that [math]b/N \rightarrow p[/math] and [math]r/N \rightarrow 1 - p[/math], then [math]\mathbf{Pr}(B = k) \rightarrow {n\choose k}p^k(1-p)^{n-k}[/math] for [math]0\leq k \leq n[/math].
  • [Poisson distribution] In your pocket is a random number [math]\displaystyle{ N }[/math] of coins, where [math]\displaystyle{ N }[/math] has the Poisson distribution with parameter [math]\displaystyle{ \lambda }[/math]. You toss each coin once, with heads showing with probability [math]\displaystyle{ p }[/math] each time. Let [math]\displaystyle{ X }[/math] be the (random) number of heads outcomes and [math]\displaystyle{ Y }[/math] be the (also random) number of tails.
    1. Find the joint mass function of [math]\displaystyle{ (X,Y) }[/math].
    2. Find PMF of the marginal distribution of [math]\displaystyle{ X }[/math] in [math]\displaystyle{ (X,Y) }[/math]. Are [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] independent?
  • [Conditional distribution (I)] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent [math]\displaystyle{ \text{Bin}(n, p) }[/math] random variables, and let [math]\displaystyle{ Z = X + Y }[/math]. Show that the conditional distribution of [math]\displaystyle{ X }[/math] given [math]\displaystyle{ Z = N }[/math] is the hypergeometric distribution.
  • [Conditional distribution (II)] Let [math]\displaystyle{ \lambda,\mu \gt 0 }[/math] and [math]\displaystyle{ n \in \mathbb{N} }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu) }[/math] be independent random variables. Find out the conditional distribution of [math]\displaystyle{ X }[/math], given [math]\displaystyle{ X+Y = n }[/math].

Problem 4 (Probability meets graph theory)

In this part we will work on undirected simple graphs. For any [math]\displaystyle{ G = (V,E) }[/math], a clique (团,完全子图) is a subset of vertices [math]\displaystyle{ C\subset V }[/math], such that every two distinct vertices of [math]\displaystyle{ C }[/math] are adjacent, and an independent set (独立集) is a subset of vertices [math]\displaystyle{ I\subset V }[/math] such that no two distinct vertices of [math]\displaystyle{ I }[/math] are adjacent.

  • [Turan's Theorem] Let [math]\displaystyle{ G=(V,E) }[/math] be an undirected graph, and write [math]\displaystyle{ d_v }[/math] for the degree of the vertex [math]\displaystyle{ v }[/math]. Use probablistic method to prove that [math]\displaystyle{ \alpha(G) \ge \sum_{v \in V} \frac{1}{d_v + 1} }[/math]. (Hint: You may design a random procedure that generates an independent set [math]\displaystyle{ I }[/math] and prove that [math]\displaystyle{ \mathbf{E}[I] \ge \sum_{v \in V} \frac{1}{d_v+1} }[/math].)
  • [Erdős–Rényi random graph] Consider [math]\displaystyle{ G\sim G(n,p) }[/math] where [math]\displaystyle{ G(n,p) }[/math] is the Erdős–Rényi random graph model.
    1. Let [math]\displaystyle{ p\in (0,1) }[/math]. For any [math]\displaystyle{ 2\leq q\leq n }[/math], let [math]\displaystyle{ N_q }[/math] be the random variable of number of [math]\displaystyle{ q }[/math]-cliques. Here, a [math]\displaystyle{ q }[/math]-clique is a clique of size [math]\displaystyle{ q }[/math]. Find [math]\displaystyle{ \mathbf{E}[N_q] }[/math]. (Hint: use indicators and the linearity of expectation.)
    2. Let [math]\displaystyle{ p = 1/2 }[/math]. For an undirected graph [math]\displaystyle{ G }[/math], define [math]\displaystyle{ \alpha(G) = \max\{|S|:S \text{ is an independent set}\} }[/math]. Show that when [math]\displaystyle{ n\rightarrow \infty }[/math], [math]\displaystyle{ \mathbf{Pr}[\alpha(G) \geq 3\log_2 n +1] \rightarrow 0 }[/math]. Also, please interpret this result in the context of social networks, in which the vertices represent people, and the edges represent friendship.
  • [Random social networks] Let [math]\displaystyle{ G = (V, E) }[/math] be an undirected graph without isolating vertex. Let [math]\displaystyle{ d_v }[/math] be the degree of vertex [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ Y }[/math] be a uniformly chosen vertex, and [math]\displaystyle{ Z }[/math] a uniformly chosen neighbor of [math]\displaystyle{ Y }[/math].
    1. Show that [math]\displaystyle{ \mathbf{E}[d_Z] \geq \mathbf{E}[d_Y] }[/math].
    2. Interpret this inequality in the context of social networks, in which the vertices represent people, and the edges represent friendship.

Problem 5 (1D random walk)

Let [math]\displaystyle{ p \in (0,1) }[/math] be a constant, and [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be independent Bernoulli trials with successful probability [math]\displaystyle{ p }[/math]. Define [math]\displaystyle{ S_n = 2\sum_{i=1}^n X_i - n }[/math] and [math]\displaystyle{ S_0 = 0 }[/math].

  • [Range of random walk] The range [math]\displaystyle{ R_n }[/math] of [math]\displaystyle{ S_0, S_1, \ldots, S_n }[/math] is defined as the number of distinct values taken by the sequence. Show that [math]\displaystyle{ \mathbf{Pr}\left(R_n = R_{n-1}+1\right) = \mathbf{Pr}\left(\forall 1 \le i \le n, S_i \neq 0\right) }[/math], and deduce that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n]\to \mathbf{Pr}(\forall i \ge 1, S_i \neq 0) }[/math]. Hence show that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n] \to |2p-1| }[/math].
  • [Symmetric 1D random walk (IV)] Suppose [math]\displaystyle{ p=\frac{1}{2} }[/math]. Let [math]\displaystyle{ N_n }[/math] be the number of points that have been visited by [math]\displaystyle{ S }[/math] exactly once up to [math]\displaystyle{ n }[/math], that is the size of set [math]\displaystyle{ \{0 \le i \le n\mid \forall 0 \le j \le n \text{ and } j \neq i,S_i \neq S_j\} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[N_n] = 2 }[/math] for all [math]\displaystyle{ n \ge 1 }[/math].
  • [Symmetric 1D random walk (V)] Suppose [math]\displaystyle{ p = \frac{1}{2} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[|S_n|] = \Theta(\sqrt{n}) }[/math].