# Probability Theory and Mathematical Statistics (Spring 2023) / Problem Set 4

• Every problem must be answered with a complete solution process; you may write in either Chinese or English.
• We recommend typesetting your homework with LaTeX, Markdown, or similar tools.
• Bonus problems are additional (optional) problems.

## Assumption throughout Problem Set 4

Unless stated otherwise, we work on the probability space $\displaystyle{ (\Omega,\mathcal{F},\mathbf{Pr}) }$.

Unless stated otherwise, we assume that the expectations of all random variables involved are well-defined.

Throughout, $\displaystyle{ \log }$ denotes the natural logarithm.

## Problem 1 (Continuous Random Variables, 30 points)

• [Density function] Determine the value of $\displaystyle{ C }$ such that $\displaystyle{ f(x) = C\exp(-x-e^{-x}), x\in \mathbb{R} }$ is a probability density function (PDF) for a continuous random variable.
• [Independence] Let $\displaystyle{ X }$ and $\displaystyle{ Y }$ be independent and identically distributed continuous random variables with cumulative distribution function (CDF) $\displaystyle{ F }$ and probability density function (PDF) $\displaystyle{ f }$. Find out the density functions of $\displaystyle{ V = \max\{X,Y\} }$ and $\displaystyle{ U = \min\{X,Y\} }$.
• [Correlation] Let $\displaystyle{ X }$ be uniformly distributed on $\displaystyle{ (-1,1) }$ and $\displaystyle{ Y_k = \cos(k \pi X) }$ for $\displaystyle{ k=1,2,\ldots,n }$. Are the random variables $\displaystyle{ Y_1, Y_2, \ldots, Y_n }$ correlated? independent? You should prove your claim rigorously.
• [Expectation of random variables (I)] Let $\displaystyle{ X }$ be a continuous random variable with mean $\displaystyle{ \mu }$ and cumulative distribution function (CDF) $\displaystyle{ F }$.
• Suppose $\displaystyle{ X \ge 0 }$. Show that $\displaystyle{ \int_{0}^a F(x) dx = \int_{a}^{\infty} [1-F(x)] dx }$ if and only if $\displaystyle{ a = \mu }$.
• Suppose $\displaystyle{ X }$ has finite variance. Show that $\displaystyle{ g(a) = \mathbb{E}((X-a)^2) }$ achieves the minimum when $\displaystyle{ a = \mu }$.
• [Expectation of random variables (II)] Let $\displaystyle{ X, Y }$ be two independent and identically distributed continuous random variables with cumulative distribution function (CDF) $\displaystyle{ F }$. Furthermore, $\displaystyle{ X,Y \ge 0 }$. Show that $\displaystyle{ \mathbb{E}[|X-Y|] = 2 \left(\mathbb{E}[X] - \int_{0}^{\infty} (1-F(x))^2 dx\right) }$
• [Conditional distribution] Let $\displaystyle{ X }$ and $\displaystyle{ Y }$ be two random variables. The joint density of $\displaystyle{ X }$ and $\displaystyle{ Y }$ is given by $\displaystyle{ f(x,y) = c(x^2 - y^2)e^{-x} }$, where $\displaystyle{ 0\leq x \lt \infty }$ and $\displaystyle{ -x\leq y \leq x }$. Here, $\displaystyle{ c\in \mathbb{R}_+ }$ is a constant. Find out the conditional distribution of $\displaystyle{ Y }$, given $\displaystyle{ X = x }$.
• [Uniform Distribution (I)] Let $\displaystyle{ P_i = (X_i,Y_i), 1\leq i\leq n }$, be independent, uniformly distributed points in the unit square $\displaystyle{ [0,1]^2 }$. A point $\displaystyle{ P_i }$ is called "peripheral" if, for all $\displaystyle{ r = 1,2,\cdots,n }$, either $\displaystyle{ X_r \leq X_i }$ or $\displaystyle{ Y_r \leq Y_i }$, or both. Find out the expected number of peripheral points.
• [Uniform Distribution (II)] Derive the moment generating function of the standard uniform distribution, i.e., uniform distribution on $\displaystyle{ (0,1) }$.
• [Exponential distribution] Let $\displaystyle{ X }$ have an exponential distribution. Show that $\displaystyle{ \textbf{Pr}[X\gt s+x|X\gt s] = \textbf{Pr}[X\gt x] }$, for $\displaystyle{ x,s\geq 0 }$. This is the memoryless property. Show that the exponential distribution is the only continuous distribution with this property.
• [Normal distribution (I)] Let $\displaystyle{ X,Y\sim N(0,1) }$ be two independent and identically distributed normal random variables. Let $\displaystyle{ Z = X-Y }$. Find the density functions of $\displaystyle{ Z }$ and $\displaystyle{ |Z| }$ respectively.
• [Normal distribution (II)] Let $\displaystyle{ X }$ have the $\displaystyle{ N(0,1) }$ distribution and let $\displaystyle{ a\gt 0 }$. Show that the random variable $\displaystyle{ Y }$ given by $\displaystyle{ \begin{equation*} Y = \begin{cases} X, & |X|\lt a \\ -X, & |X|\geq a \end{cases} \end{equation*} }$ has the $\displaystyle{ N(0,1) }$ distribution, and find an expression for $\displaystyle{ \rho(a) = \textbf{Cov}(X,Y) }$ in terms of the density function $\displaystyle{ \phi }$ of $\displaystyle{ X }$.
• [Random process (I)] Given a real number $\displaystyle{ U \in (0,1) }$ as input of the following process, find the expected return value.
```
Process 1
Input: a real number U ∈ (0,1)
initialize x = 1 and count = 0
while x > U do
    choose y ∈ (0,1) uniformly at random
    update x = x * y and count = count + 1
return count
```
• [Random process (II)] Given a real number $\displaystyle{ U \in (0,1) }$ as input of the following process, find the expected return value.
```
Process 2
Input: a real number U ∈ (0,1)
initialize x = 0 and count = 0
while x < U do
    choose y ∈ (0,1) uniformly at random
    update x = x + y and count = count + 1
return count
```
• [Random semicircle] We sample $\displaystyle{ n }$ points within a circle $\displaystyle{ C=\{(x,y) \in \mathbb{R}^2 \mid x^2+y^2 \le 1\} }$ independently and uniformly at random (i.e., the density function $\displaystyle{ f(x,y) \propto 1_{(x,y) \in C} }$). Find out the probability that they all lie within some semicircle of the original circle $\displaystyle{ C }$. (Hint: you may apply the change-of-variables technique for functions of random variables; see Chapter 4.7 in [GS].)
• [Stochastic domination] Let $\displaystyle{ X, Y }$ be continuous random variables. Show that $\displaystyle{ X }$ dominates $\displaystyle{ Y }$ stochastically if and only if $\displaystyle{ \mathbb{E}[f(X)]\geq \mathbb{E}[f(Y)] }$ for any non-decreasing function $\displaystyle{ f }$ for which the expectations exist.
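Several items above, namely the two random processes and the random-semicircle question, admit quick Monte Carlo sanity checks. The sketch below (the parameter choices $U = e^{-1}$, $U = 0.5$, $n = 4$, and the trial count are purely illustrative) estimates the relevant expectations and probability empirically; it is a way to check your derived answers, not a substitute for proof. Note that Process 1 only terminates almost surely when $U \gt 0$, which is why the input is restricted to $(0,1)$.

```python
import math
import random

def process1(U, rng):
    """Run Process 1: multiply i.i.d. uniforms until the product drops to U or below."""
    x, count = 1.0, 0
    while x > U:
        x *= rng.random()
        count += 1
    return count

def process2(U, rng):
    """Run Process 2: add i.i.d. uniforms until the sum reaches U or more."""
    x, count = 0.0, 0
    while x < U:
        x += rng.random()
        count += 1
    return count

def all_in_semicircle(points):
    """Check whether all points of the disk lie in some closed semicircle.

    Only the angular coordinates matter: the points fit in a semicircle
    iff some circular gap between consecutive angles is at least pi.
    """
    angles = sorted(math.atan2(y, x) for x, y in points)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    return max(gaps) >= math.pi

def sample_disk(rng):
    """Rejection-sample a uniform point in the unit disk."""
    while True:
        x, y = 2 * rng.random() - 1, 2 * rng.random() - 1
        if x * x + y * y <= 1:
            return (x, y)

rng = random.Random(0)
trials = 100_000

mean1 = sum(process1(math.exp(-1), rng) for _ in range(trials)) / trials
mean2 = sum(process2(0.5, rng) for _ in range(trials)) / trials
freq = sum(all_in_semicircle([sample_disk(rng) for _ in range(4)])
           for _ in range(trials)) / trials

print(f"Process 1 (U = e^-1): mean count = {mean1:.3f}")
print(f"Process 2 (U = 0.5):  mean count = {mean2:.3f}")
print(f"4 points in a semicircle: frequency = {freq:.3f}")
```

Increasing `trials` tightens the estimates at the usual $O(1/\sqrt{\text{trials}})$ Monte Carlo rate.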

## Problem 2 (Modes of Convergence, 15 points) (Bonus problem)

• [Connection of convergence modes (I)] Let $\displaystyle{ (X_n)_{n \ge 1}, (Y_n)_{n \ge 1}, X, Y }$ be random variables and $\displaystyle{ c\in\mathbb{R} }$ be a real number.
• Suppose $\displaystyle{ X_n \overset{D}{\to} X }$ and $\displaystyle{ Y_n \overset{D}{\to} c }$. Prove that $\displaystyle{ X_nY_n \overset{D}{\to} cX }$.
• Construct an example such that $\displaystyle{ X_n \overset{D}{\to} X }$ and $\displaystyle{ Y_n \overset{D}{\to} Y }$ but $\displaystyle{ X_nY_n }$ does not converge to $\displaystyle{ XY }$ in distribution.
• [Connection of convergence modes (II)] Let $\displaystyle{ (X_n)_{n \ge 1}, X }$ be random variables. Prove that $\displaystyle{ X_n \overset{P}{\to} X }$ if and only if for every subsequence $\displaystyle{ X_{n(m)} }$, there exists a further subsequence $\displaystyle{ Y_k = X_{n(m_k)} }$ that converges almost surely to $\displaystyle{ X }$. (Hint: you may use the first Borel-Cantelli lemma.)
• [Extension of Borel-Cantelli Lemma] Let $\displaystyle{ (A_n)_{n \ge 1} }$ be events. Suppose $\displaystyle{ \sum_{n \ge 1} \mathbf{Pr}(A_n)=+\infty }$. Show that $\displaystyle{ \mathbf{Pr}(A_n \text{ i.o.}) \ge \limsup_{n \to \infty} \frac{ \left(\sum_{k=1}^n\mathbf{Pr}(A_k)\right)^2 }{\sum_{1\le j,k \le n} \mathbf{Pr}(A_j \cap A_k)} }$.

## Problem 3 (LLN and CLT, 15 points + 5 points)

In this problem, you may apply the results of Laws of Large Numbers (LLN) and the Central Limit Theorem (CLT) to solve the problems.

• [St. Petersburg paradox] Consider the well-known game involving a fair coin. In this game, if it takes $\displaystyle{ k }$ tosses to obtain a head, you will win $\displaystyle{ 2^k }$ dollars as the reward. Despite the game's expected reward being infinite, people tend to offer relatively modest amounts to participate. The following provides a mathematical explanation for this phenomenon.
• For each $\displaystyle{ n \ge 1 }$, let $\displaystyle{ X_{n,1}, X_{n,2},\ldots, X_{n,n} }$ be independent random variables. Furthermore, let $\displaystyle{ b_n \gt 0 }$ be real numbers with $\displaystyle{ b_n \to \infty }$, and let $\displaystyle{ \widetilde{X}_{n,k} = X_{n,k} \mathbf{1}_{|X_{n,k}| \le b_n} }$ for all $\displaystyle{ 1 \le k \le n }$. Show that if $\displaystyle{ \sum_{k=1}^n \mathbf{Pr}(|X_{n,k}| \gt b_n) \to 0 }$ and $\displaystyle{ b_n^{-2} \sum_{k=1}^n \mathbf{E}[\widetilde{X}_{n,k}^2] \to 0 }$ as $\displaystyle{ n \to \infty }$, then $\displaystyle{ (S_n-a_n)/b_n \overset{P}{\to} 0 }$, where $\displaystyle{ S_n = \sum_{k=1}^n X_{n,k} }$ and $\displaystyle{ a_n = \sum_{k=1}^n \mathbf{E}[\widetilde{X}_{n,k}] }$.
• Let $\displaystyle{ S_n }$ be the total winnings after playing $\displaystyle{ n }$ rounds of the game. Prove that $\displaystyle{ \frac{S_n}{n \log_2 n} \overset{P}{\to} 1 }$. (Therefore, a fair price to play this game $\displaystyle{ n }$ times is roughly $\displaystyle{ n \log_2 n }$ dollars.)
• (Bonus problem, 5 points) Let $\displaystyle{ S_n }$ be the total winnings after playing $\displaystyle{ n }$ rounds of the game. Prove that $\displaystyle{ \limsup_{n \to \infty} \frac{S_n}{n \log_2 n} = \infty }$ almost surely. (Hint: You may use Borel-Cantelli lemmas)
• [Asymptotic equipartition property] Let $\displaystyle{ X_1,X_2,\ldots \in \{1,2,\ldots,r\} }$ be independent and identically distributed random variables with probability mass function $\displaystyle{ p }$. Let $\displaystyle{ \pi_n(\omega) = \prod_{i=1}^n p(X_i(\omega)) }$ be the probability of the realization we observed in the first $\displaystyle{ n }$ random variables. Let $\displaystyle{ H = -\sum_{k=1}^r p(k) \log p(k) }$ be the entropy of $\displaystyle{ X_1 }$. Prove that for any $\displaystyle{ \epsilon \gt 0 }$, $\displaystyle{ \mathbf{Pr}\left(e^{-n(H+\epsilon)} \lt \pi_n(\omega) \lt e^{-n(H-\epsilon)}\right) \to 1 }$ when $\displaystyle{ n \to \infty }$.
• [Normalized sum] Let $\displaystyle{ X_1,X_2,\ldots }$ be i.i.d. random variables with $\displaystyle{ \mathbf{E}[X_1] = 0 }$ and $\displaystyle{ \mathbf{Var}[X_1] = \sigma^2 \in (0,+\infty) }$. Show $\displaystyle{ \frac{\sum_{k=1}^n X_k}{\left(\sum_{k=1}^n X_k^2\right)^{1/2}} \overset{D}{\to} N(0,1) }$ as $\displaystyle{ n \to \infty }$.
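Before attempting the St. Petersburg proofs, it may help to see the growth of the winnings empirically. The sketch below (the choice $n = 10^5$ and the seed are arbitrary illustrative choices) simulates $n$ rounds and prints $S_n/(n\log_2 n)$; a single run typically lands somewhere near 1, although the heavy upper tail of the payoff means a run can occasionally deviate substantially, consistent with the $\limsup$ statement in the bonus part.

```python
import math
import random

def st_petersburg_round(rng):
    """Toss a fair coin until the first head; pay out 2^k if it took k tosses."""
    k = 1
    while rng.random() < 0.5:  # tails with probability 1/2, keep tossing
        k += 1
    return 2 ** k

rng = random.Random(1)
n = 100_000
S_n = sum(st_petersburg_round(rng) for _ in range(n))
ratio = S_n / (n * math.log2(n))
print(f"S_n / (n log2 n) = {ratio:.3f} for n = {n}")
```

Note that every round pays at least 2 dollars, so $S_n \ge 2n$ deterministically; all the subtlety of the problem lies in the upper fluctuations.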

## Problem 4 (Concentration of measure)

• [Tossing coins] We repeatedly toss a fair coin (with an equal probability of heads and tails). Let the random variable $\displaystyle{ X }$ be the number of throws required to obtain a total of $\displaystyle{ n }$ heads. Show that $\displaystyle{ \textbf{Pr}[X \gt 2n + 2\sqrt{n\log n}]\leq O(1/n) }$.
• [Chernoff vs Chebyshev] We have a standard six-sided die. Let $\displaystyle{ X }$ be the number of times a 6 occurs in $\displaystyle{ n }$ throws of the die. Compare the best upper bounds on $\displaystyle{ \textbf{Pr}[X\geq n/4] }$ that you can obtain using Chebyshev's inequality and Chernoff bounds.
• [$\displaystyle{ k }$-th moment bound] Let $\displaystyle{ X }$ be a random variable with expectation $\displaystyle{ 0 }$ such that the moment generating function $\displaystyle{ \mathbf{E}[\exp(t|X|)] }$ is finite for some $\displaystyle{ t \gt 0 }$. We can use the following two kinds of tail inequalities for $\displaystyle{ X }$:

Chernoff Bound

$\displaystyle{ \mathbf{Pr}[|X| \geq \delta] \leq \min_{t \geq 0} \frac{\mathbf{E}[e^{t|X|}]}{e^{t\delta}} }$

$\displaystyle{ k }$th-Moment Bound

$\displaystyle{ \mathbf{Pr}[|X| \geq \delta] \leq \frac{\mathbf{E}[|X|^k]}{\delta^k} }$
1. Show that for each $\displaystyle{ \delta }$, there exists a choice of $\displaystyle{ k }$ such that the $\displaystyle{ k }$th-moment bound is no weaker than the Chernoff bound. (Hint: Use the probabilistic method.)
2. Why would we still prefer the Chernoff bound to the (seemingly) stronger $\displaystyle{ k }$-th moment bound?
• [Chernoff bound meets graph theory]
• Show that with a probability approaching 1 (as $\displaystyle{ n }$ tends to infinity), the Erdős–Rényi random graph $\displaystyle{ \textbf{G}(n,1/2) }$ has the property that the maximum degree is $\displaystyle{ \frac{n}{2} + O(\sqrt{n\log n}) }$.
• Show that with a probability approaching 1 (as $\displaystyle{ n }$ tends to infinity), the Erdős–Rényi random graph $\displaystyle{ \textbf{G}(n,1/2) }$ has the property that the diameter is exactly 2. The diameter of a graph $\displaystyle{ G }$ is the maximum distance between any pair of vertices.
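As a numerical companion to the Chernoff-vs-Chebyshev and maximum-degree items above, here is a sketch that evaluates both tail bounds for one specific $n$ and samples a single $\textbf{G}(n,1/2)$ instance. The grid search over $t$, the constant $2$ in front of $\sqrt{n \log n}$, and the sample sizes are all illustrative choices; the printed numbers only indicate the qualitative gap between the two bounds, and in your solution you should still optimize $t$ analytically.

```python
import math
import random

# Part 1: tail bounds for X = number of sixes in n die throws, event X >= n/4.
n = 1000
p = 1.0 / 6.0

# Chebyshev: Pr[X >= n/4] <= Pr[|X - np| >= n/4 - np] <= Var(X) / (n/4 - np)^2.
chebyshev = n * p * (1 - p) / (n / 4 - n * p) ** 2

# Chernoff: Pr[X >= n/4] <= min_t E[e^{tX}] / e^{t n/4},
# where E[e^{tX}] = (1 - p + p e^t)^n; minimize numerically over a grid of t.
chernoff = min(
    ((1 - p + p * math.exp(t)) * math.exp(-t / 4)) ** n
    for t in (i / 100 for i in range(1, 201))
)
print(f"Chebyshev bound: {chebyshev:.3e}, Chernoff bound: {chernoff:.3e}")

# Part 2: maximum degree of one sampled G(m, 1/2) versus m/2 + 2*sqrt(m log m).
rng = random.Random(2)
m = 300
deg = [0] * m
for i in range(m):
    for j in range(i + 1, m):
        if rng.random() < 0.5:  # include edge {i, j} with probability 1/2
            deg[i] += 1
            deg[j] += 1
max_deg = max(deg)
bound = m / 2 + 2 * math.sqrt(m * math.log(m))
print(f"max degree = {max_deg}, m/2 + 2*sqrt(m log m) = {bound:.1f}")
```

Even at $n = 1000$ the Chernoff bound is smaller than the Chebyshev bound by many orders of magnitude, which previews the answer to the comparison question; the single sampled graph is of course only anecdotal evidence for the with-high-probability claim.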