概率论与数理统计 (Spring 2023)/Problem Set 2
Problem 1 (Warm-up problems)
- [Function of random variable (I)] Let [math]X[/math] be a random variable and [math]g:\mathbb{R} \to \mathbb{R}[/math] be a continuous and strictly increasing function. Show that [math]Y = g(X)[/math] is a random variable.
- [Function of random variable (II)] Let [math]X[/math] be a random variable with distribution function [math]F_X(x)=\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] is a random variable with distribution function [math]F[/math]. (A simulation sketch of this inverse-transform construction follows this problem.)
- [Marginal distribution] Let [math](X_1, X_2)[/math] be a random vector satisfying [math]\mathbf{Pr}[(X_1,X_2) = (0,0)] = \mathbf{Pr}[(X_1,X_2) = (1,0)] = \mathbf{Pr}[(X_1,X_2)=(0,1)]=\frac{1}{3}[/math]. Find the marginal distribution of [math]X_1[/math].
- [Independence] Show that discrete random variables [math]X[/math] and [math]Y[/math] are independent if and only if [math]f_{X,Y}(x,y) = f_X(x) f_Y(y)[/math] for all [math]x[/math] and [math]y[/math], where [math]f_{X,Y}[/math] is the joint mass function of [math](X,Y)[/math], and [math]f_X[/math] (respectively, [math]f_Y[/math]) is the mass function of [math]X[/math] (respectively, [math]Y[/math]).
- [Entropy of discrete random variable] Let [math]X[/math] be a discrete random variable taking values in [math]\mathbb{N}_+[/math] with probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with the convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \ge 0[/math] using Jensen's inequality.
- [Law of total expectation] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation. (A Monte Carlo sanity check is sketched below.)
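The second warm-up problem is the inverse-transform construction used in practice to sample from a prescribed distribution. The following is a minimal sketch, assuming (purely for illustration, not part of the problem) that [math]F[/math] is the standard exponential CDF [math]F(y)=1-e^{-y}[/math]; the empirical CDF of [math]Y=F^{-1}(X)[/math] should track [math]F[/math].

```python
import numpy as np

rng = np.random.default_rng(0)

# X is uniform on [0, 1]: its CDF is max(0, min(1, x)), as in the problem.
x = rng.uniform(size=100_000)

# Illustrative choice (an assumption): F(y) = 1 - exp(-y), the Exp(1) CDF,
# which is continuous and strictly increasing on (0, inf).
# Its inverse is F^{-1}(u) = -log(1 - u).
y = -np.log(1.0 - x)

# Compare the empirical CDF of Y with F at a few points.
for t in [0.5, 1.0, 2.0]:
    empirical = np.mean(y <= t)
    print(f"t={t}: empirical CDF {empirical:.4f} vs F(t) {1 - np.exp(-t):.4f}")
```

Any other continuous, strictly increasing [math]F[/math] with an explicit inverse works the same way.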
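For the last warm-up, a quick Monte Carlo estimate (no substitute for the law-of-total-expectation derivation) can be compared with the value you derive. The parameter [math]p=0.3[/math] below is an arbitrary illustrative choice, and the sketch assumes the convention that [math]X[/math] counts trials up to and including the first success.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.3  # arbitrary illustrative parameter

# Geometric(p): number of trials up to and including the first success
# (numpy's convention; assumed to match the problem's Geom(p)).
samples = rng.geometric(p, size=1_000_000)
print("Monte Carlo estimate of E[X]:", samples.mean())
```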
Problem 2 (Distribution of random variable)
- [CDF] Given [math]\displaystyle{ (\Omega, \mathcal{F}, \mathbf{Pr}) }[/math], let [math]\displaystyle{ X }[/math] be a random variable with cumulative distribution function [math]\displaystyle{ F }[/math].
- Show that [math]\displaystyle{ Y = aX+b }[/math] is a random variable, where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are real constants, and express the CDF of [math]\displaystyle{ Y }[/math] in terms of [math]\displaystyle{ F }[/math].
- Let [math]\displaystyle{ G }[/math] be the CDF of a random variable [math]\displaystyle{ Z:\Omega\rightarrow \mathbb{R} }[/math] and let [math]\displaystyle{ 0\leq \lambda \leq 1 }[/math]. Show that:
- [math]\displaystyle{ \lambda F + (1-\lambda)G }[/math] is a CDF.
- The product [math]\displaystyle{ FG }[/math] is a CDF, and if [math]\displaystyle{ Z }[/math] and [math]\displaystyle{ X }[/math] are independent, then [math]\displaystyle{ FG }[/math] is the CDF of [math]\displaystyle{ \max\{X,Z\} }[/math]. (A simulation check of this fact is sketched at the end of this problem.)
- [PMF] We toss [math]\displaystyle{ n }[/math] coins, and each one shows heads with probability [math]\displaystyle{ p }[/math], independently of each of the others. Each coin that shows heads is tossed again. (If a coin shows tails, it is not tossed again.) Let [math]\displaystyle{ X }[/math] be the number of heads resulting from the second round of tosses, and [math]\displaystyle{ Y }[/math] be the number of heads resulting from all tosses, including the first and (possibly) the second round for each coin.
- Find the PMF of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math]. (A simulation sketch appears at the end of this problem.)
- Find [math]\displaystyle{ \mathbb{E}[X] }[/math] and [math]\displaystyle{ \mathbb{E}[Y] }[/math].
- Let [math]\displaystyle{ f(k) }[/math] for [math]\displaystyle{ 0\leq k\leq n }[/math] be the PMF of [math]\displaystyle{ X }[/math]. Show that [math]f(k-1)f(k+1)\leq f(k)^2[/math] for [math]\displaystyle{ 1\leq k \leq n-1 }[/math].
- [PDF] Find the values of [math]C_1[/math] and [math]C_2[/math] for the following probability density functions (a numerical sanity check is sketched at the end of this problem):
- [math]p(x) = C_1(x(1-x))^{-\frac{1}{2}}[/math], [math]0<x<1[/math].
- [math]p(x) = C_2e^{-x-e^{-x}}[/math], [math]x\in \mathbb{R}[/math].
- [Distance between distributions] In this part, let [math]\mu, \nu: \Omega \rightarrow [0,1][/math] be two arbitrary distributions on [math]\Omega[/math]. (Here, for any [math]v\in [0,1]^\Omega[/math], we say [math]v[/math] is a distribution on [math]\Omega[/math] if and only if it satisfies [math]\sum_{\omega \in \Omega} v(\omega) =1[/math].) For any [math]S\subset \Omega[/math] and distribution [math]v[/math] on [math]\Omega[/math], we write [math]v(S) = \sum_{\omega\in S}v(\omega)[/math] for simplicity. If [math]A:\Omega\rightarrow \Omega'[/math] is any deterministic mapping, then for any distribution [math]v[/math] on [math]\Omega[/math], we use [math]v' = A(v)[/math] to denote the new distribution on [math]\Omega'[/math], where [math]v'(\omega') = \sum_{\omega\in A^{-1}(\omega')} v(\omega)[/math] for any [math]\omega'\in \Omega'[/math]. (A numerical sketch of the quantities below is given at the end of this problem.)
- (Total variation distance) Let [math]d_{TV}(\mu,\nu) = \sup_{S\subset \Omega} |\mu(S) - \nu(S)|[/math]. Show that:
- [math]d_{TV}(\mu,\nu) =\frac{1}{2}\sum_{\omega \in \Omega} |\mu(\omega) - \nu(\omega)| = \sum_{\omega:\mu(\omega)\geq \nu(\omega)} \left(\mu(\omega)- \nu(\omega)\right) = 1-\sum_{\omega \in \Omega} \min(\mu(\omega), \nu(\omega))[/math]
- (Post-processing.) For any deterministic mapping [math]A:\Omega\rightarrow \Omega'[/math], [math]d_{TV}(A(\mu),A(\nu))\leq d_{TV}(\mu, \nu)[/math].
- There exist an [math]\Omega'[/math] and a mapping [math]A:\Omega \rightarrow \Omega'[/math] such that [math]d_{TV}(A(\mu),A(\nu))= d_{TV}(\mu, \nu)[/math].
- (Kullback-Leibler divergence) Let [math]d_{KL}(\mu||\nu) = \mathbb{E}_{\omega \sim \mu}\left[\log \frac{\mu(\omega)}{\nu(\omega)}\right][/math]. Here, [math]\mathbb{E}_{\omega \sim \mu}[\cdot][/math] means the expectation is taken over the distribution [math]\mu[/math]. Show that:
- [math]d_{KL}(\mu||\nu)\geq 0[/math] for any [math]\mu[/math] and [math]\nu[/math] on [math]\Omega[/math]. (Hint: use Jensen's inequality.)
- (Post-processing.) For any deterministic mapping [math]A:\Omega\rightarrow \Omega'[/math], [math]d_{KL}(A(\mu)||A(\nu))\leq d_{KL}(\mu || \nu)[/math].
- [math]d_{TV}(\mu ,\nu) \leq \sqrt{\frac{1}{2}d_{KL}(\mu||\nu)}[/math]. (Hint: Find an [math]\Omega'[/math] and a mapping [math]A:\Omega \rightarrow \Omega'[/math] such that [math]d_{TV}(\mu,\nu) = d_{TV}(A(\mu),A(\nu))\leq \sqrt{\frac{1}{2}d_{KL}(A(\mu)||A(\nu))} \leq \sqrt{\frac{1}{2}d_{KL}(\mu||\nu)}[/math] to complete the proof.)
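For the [CDF] part, the claim that [math]FG[/math] is the CDF of [math]\max\{X,Z\}[/math] when [math]X[/math] and [math]Z[/math] are independent can be checked by simulation. The sketch below uses assumed illustrative distributions ([math]X\sim N(0,1)[/math], [math]Z\sim \mathrm{Exp}(1)[/math]) and SciPy for the reference CDFs; these concrete choices are not part of the problem.

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(2)
n = 200_000

# Assumed illustrative distributions: X ~ N(0,1) with CDF F, Z ~ Exp(1) with CDF G, independent.
x = rng.standard_normal(n)
z = rng.exponential(1.0, n)
m = np.maximum(x, z)

# Empirical CDF of max{X, Z} versus the product F(t) * G(t).
for t in [0.5, 1.0, 2.0]:
    print(f"t={t}: empirical {np.mean(m <= t):.4f} vs F*G {norm.cdf(t) * expon.cdf(t):.4f}")
```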
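For the [PMF] part, simulating the two-round experiment gives empirical PMFs and means of [math]X[/math] and [math]Y[/math] to compare with the formulas you derive; the values of [math]n[/math], [math]p[/math], and the number of trials below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, trials = 5, 0.4, 200_000  # arbitrary illustrative values

first = rng.random((trials, n)) < p             # first-round heads
second = (rng.random((trials, n)) < p) & first  # second toss only for coins that showed heads

X = second.sum(axis=1)     # heads in the second round
Y = first.sum(axis=1) + X  # heads over all tosses

print("empirical PMF of X:", np.bincount(X, minlength=n + 1) / trials)
print("empirical E[X]:", X.mean(), " empirical E[Y]:", Y.mean())
```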
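For the [PDF] part, the normalizing constants can be sanity-checked numerically by integrating each unnormalized density and taking the reciprocal. This sketch assumes SciPy's quad is available; it is a numerical check, not a derivation.

```python
import numpy as np
from scipy.integrate import quad

# C1: integrate (x(1-x))^(-1/2) over (0, 1); the constant is the reciprocal of the integral.
i1, _ = quad(lambda x: (x * (1 - x)) ** -0.5, 0, 1)
print("C1 is approximately", 1 / i1)

# C2: integrate exp(-x - exp(-x)); the integrand is numerically zero outside [-30, 40],
# so a finite window suffices for the check.
i2, _ = quad(lambda x: np.exp(-x - np.exp(-x)), -30, 40)
print("C2 is approximately", 1 / i2)
```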
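For the distance part, the equivalent expressions for [math]d_{TV}[/math], the definition of [math]d_{KL}[/math], and the bound [math]d_{TV}(\mu,\nu)\leq\sqrt{\frac{1}{2}d_{KL}(\mu||\nu)}[/math] can all be verified numerically on a small finite [math]\Omega[/math]. The two three-point distributions below are arbitrary examples, and the sketch assumes [math]\nu(\omega)>0[/math] everywhere so the KL sum is finite.

```python
import numpy as np

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.4, 0.4])  # arbitrary example distributions on a 3-point Omega

# Three of the equivalent expressions for total variation distance.
tv_half_l1 = 0.5 * np.abs(mu - nu).sum()
tv_excess = (mu - nu)[mu >= nu].sum()
tv_min = 1.0 - np.minimum(mu, nu).sum()
print("d_TV:", tv_half_l1, tv_excess, tv_min)

# KL divergence (finite here since nu > 0 everywhere).
kl = (mu * np.log(mu / nu)).sum()
print("d_KL:", kl)

# The bound d_TV <= sqrt(d_KL / 2) from the last item.
print("sqrt(d_KL / 2):", np.sqrt(kl / 2))
```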