数据科学基础 (Fall 2024)/Problem Set 3: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
 
(One intermediate revision by the same user not shown)
Line 19: Line 19:


== Problem 2 (Expectation) ==  
== Problem 2 (Expectation) ==  
* ['''Expectation'''] Is it generally true that [math]\mathbf{E}[1/X] = 1/\mathbf{E}[X][/math]? Is it ever true that [math]\mathbf{E}[1/X] = 1/\mathbf{E}[X][/math]?
* ['''Expectation'''] Is it generally true that [math]\mathbb{E}[1/X] = 1/\mathbb{E}[X][/math]? Is it ever true that [math]\mathbb{E}[1/X] = 1/\mathbb{E}[X][/math]?
* ['''Error'''] Let <math>X</math> and <math>Y</math> be independent with mean <math>\mu</math>. Explain the error in the following equation: <math>\mathbb E[X|X+Y=z]=\mathbb E[X|X=z-Y]=\mathbb E[z-Y]=z-\mu</math>.
* ['''Error'''] Let <math>X</math> and <math>Y</math> be independent with mean <math>\mu</math>. Explain the error in the following equation: <math>\mathbb E[X|X+Y=z]=\mathbb E[X|X=z-Y]=\mathbb E[z-Y]=z-\mu</math>.
* ['''Run'''] A biased coin is tossed <math>n</math> times, and heads shows with probability <math>p</math> on each toss. A run is a sequence of throws which result in the same outcome, so that, for example, the sequence HHTHTTH contains five runs. Show that the expected number of runs is 1 + 2(n− 1)p(1− p).
* ['''Run'''] A biased coin is tossed <math>n</math> times, and heads shows with probability <math>p</math> on each toss. A run is a sequence of throws which result in the same outcome, so that, for example, the sequence HHTHTTH contains five runs. Show that the expected number of runs is 1 + 2(n− 1)p(1− p).
Line 30: Line 30:
  An urn contains <math>b</math> blue and <math>r</math> red balls. Balls are removed at random until the first blue ball is drawn. Show that the expected number drawn is <math>(b + r + 1)/(b + 1)</math>.
  An urn contains <math>b</math> blue and <math>r</math> red balls. Balls are removed at random until the first blue ball is drawn. Show that the expected number drawn is <math>(b + r + 1)/(b + 1)</math>.
</li>
</li>
<li> The balls are replaced and then removed at random until all the remaining balls are of the same colour. Find the expected number remaining in the urn.
<li> The balls are replaced and then removed at random until all the remaining balls are of the same colour. (现在我们把所有球都放回去,然后再次随机地把球无放回地取出来,直到剩下的小球都是同色的。) Find the expected number remaining in the urn.
</li>
</li>
<li> Let <math>X</math> and <math>Y</math> be independent random variables taking values in the non-negative integers, with finite means. Let <math>U= \min\{X,Y \}</math> and <math>V= \max\{X,Y \}</math>. Show that  
<li> Let <math>X</math> and <math>Y</math> be independent random variables taking values in the non-negative integers, with finite means. Let <math>U= \min\{X,Y \}</math> and <math>V= \max\{X,Y \}</math>. Show that  

Latest revision as of 07:22, 31 October 2024

  • 每道题目的解答都要有完整的解题过程,中英文不限。
  • 我们推荐大家使用LaTeX, markdown等对作业进行排版。
  • 没有条件的同学可以用纸笔完成作业之后拍照。

Assumption throughout Problem Set 3

Without further notice, we are working on probability space [math]\displaystyle{ (\Omega,\mathcal{F},\Pr) }[/math].

Without further notice, we assume that the expectation of random variables are well-defined.

Problem 1 (Discrete random variable)

  • [Multinomial distribution] Prove by calculating that the marginal distribution of multinomial distribution is binomial distribution.
  • [Capture–recapture] A population of [math]\displaystyle{ b }[/math] animals has had a number [math]\displaystyle{ a }[/math] of its members captured, marked, and released. Let [math]\displaystyle{ X }[/math] be the number of animals it is necessary to recapture (without re-release) in order to obtain [math]\displaystyle{ m }[/math] marked animals. Show that [math]\displaystyle{ \Pr[X=n]=\frac ab \binom{a-1}{m-1}\binom{b-a}{n-m}/\binom{b-1}{n-1} }[/math] and find [math]\displaystyle{ \mathbb E[X] }[/math]. This distribution has been called negative hypergeometric.
  • [Poisson distribution(i)] In your pocket is a random number [math]\displaystyle{ N }[/math] of coins, where [math]\displaystyle{ N }[/math] has the Poisson distribution with parameter [math]\displaystyle{ \lambda }[/math]. You toss each coin once, with heads showing with probability [math]\displaystyle{ p }[/math] each time. Let [math]\displaystyle{ X }[/math] be the (random) number of heads outcomes and [math]\displaystyle{ Y }[/math] be the (also random) number of tails.
  1. Find the joint mass function of [math]\displaystyle{ (X,Y) }[/math].
  2. Find PMF of the marginal distribution of [math]\displaystyle{ X }[/math] in [math]\displaystyle{ (X,Y) }[/math]. Are [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] independent?
  • [Poisson distribution(ii)] Let [math]\displaystyle{ \lambda,\mu \gt 0 }[/math] and [math]\displaystyle{ n \in \mathbb{N} }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu) }[/math] be independent random variables. Find out the conditional distribution of [math]\displaystyle{ X }[/math], given [math]\displaystyle{ X+Y = n }[/math].

Problem 2 (Expectation)

  • [Expectation] Is it generally true that [math]\mathbb{E}[1/X] = 1/\mathbb{E}[X][/math]? Is it ever true that [math]\mathbb{E}[1/X] = 1/\mathbb{E}[X][/math]?
  • [Error] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent with mean [math]\displaystyle{ \mu }[/math]. Explain the error in the following equation: [math]\displaystyle{ \mathbb E[X|X+Y=z]=\mathbb E[X|X=z-Y]=\mathbb E[z-Y]=z-\mu }[/math].
  • [Run] A biased coin is tossed [math]\displaystyle{ n }[/math] times, and heads shows with probability [math]\displaystyle{ p }[/math] on each toss. A run is a sequence of throws which result in the same outcome, so that, for example, the sequence HHTHTTH contains five runs. Show that the expected number of runs is 1 + 2(n− 1)p(1− p).
  • [Streak] Suppose we flip a fair coin [math]\displaystyle{ n }[/math] times independently to obtain a sequence of flips [math]\displaystyle{ X_1, X_2, \ldots , X_n }[/math]. A streak of flips is a consecutive subsequence of flips that are all the same. For example, if [math]\displaystyle{ X_3 }[/math], [math]\displaystyle{ X_4 }[/math], and [math]\displaystyle{ X_5 }[/math] are all heads, there is a streak of length [math]\displaystyle{ 3 }[/math] starting at the third lip. (If [math]\displaystyle{ X_6 }[/math] is also heads, then there is also a streak of length [math]\displaystyle{ 4 }[/math] starting at the third lip.) Find the expected number of streaks of length [math]\displaystyle{ k }[/math] for some integer [math]\displaystyle{ k \ge 1 }[/math].
  • [Tail sum for expectation (double counting)]
  1. If [math]\displaystyle{ X }[/math] takes non-negative integer values show that [math]\displaystyle{ \mathbb E[X]=\sum_{n=0}^\infty \Pr[X\gt n] }[/math].
  2. An urn contains [math]\displaystyle{ b }[/math] blue and [math]\displaystyle{ r }[/math] red balls. Balls are removed at random until the first blue ball is drawn. Show that the expected number drawn is [math]\displaystyle{ (b + r + 1)/(b + 1) }[/math].
  3. The balls are replaced and then removed at random until all the remaining balls are of the same colour. (现在我们把所有球都放回去,然后再次随机地把球无放回地取出来,直到剩下的小球都是同色的。) Find the expected number remaining in the urn.
  4. Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent random variables taking values in the non-negative integers, with finite means. Let [math]\displaystyle{ U= \min\{X,Y \} }[/math] and [math]\displaystyle{ V= \max\{X,Y \} }[/math]. Show that
    [math]\displaystyle{ \begin{align*} \mathbb E[U]&=\sum_{r=1}^\infty \Pr[X\ge r]\Pr[Y\ge r]\\ \mathbb E[V]&=\sum_{r=1}^\infty \bigg(\Pr[X\ge r]+\Pr[Y\ge r]-\Pr[X\ge r]\Pr[Y\ge r]\bigg)\\ \mathbb E[UV]&=\sum_{r,s=1}^\infty \Pr[X\ge r]\Pr[Y\ge s]. \end{align*} }[/math]
  5. Let [math]\displaystyle{ X }[/math] take values in the non-negative integers. Show that [math]\displaystyle{ \mathbb E[X^2]=\mathbb E[X]+2\sum_{r=0}^\infty r\Pr[X\gt r]=\sum_{r=0}^\infty (2r+1)\Pr[X\gt r] }[/math], and find a similar formula for [math]\displaystyle{ \mathbb E[X^3] }[/math].

Problem 3 (Random graph)

  • [Erdős–Rényi random graph] Consider [math]\displaystyle{ G\sim G(n,p) }[/math] where [math]\displaystyle{ G(n,p) }[/math] is the Erdős–Rényi random graph model.
  1. Let [math]\displaystyle{ p\in (0,1) }[/math]. A "triangle" in a graph is a clique of size [math]\displaystyle{ 3 }[/math]. Find the expected number of triangles in [math]\displaystyle{ G }[/math]. (Hint: use indicators and the linearity of expectation.)
  2. Let [math]\displaystyle{ p\in (0,1) }[/math]. For any [math]\displaystyle{ 2\leq q\leq n }[/math], let the random variable [math]\displaystyle{ N_q }[/math] be the number of [math]\displaystyle{ q }[/math]-cliques. Here, a [math]\displaystyle{ q }[/math]-clique is a clique of size [math]\displaystyle{ q }[/math]. Find [math]\displaystyle{ \mathbb{E}[N_q] }[/math].
  3. Let [math]\displaystyle{ p = 1/2 }[/math]. For an undirected graph [math]\displaystyle{ G }[/math], define [math]\displaystyle{ \alpha(G) = \max\{|S|:S \text{ is an independent set}\} }[/math]. Show that when [math]\displaystyle{ n\rightarrow \infty }[/math], [math]\displaystyle{ \Pr[\alpha(G) \geq 3\log_2 n +1] \rightarrow 0 }[/math]. Also, please interpret this result in the context of social networks, in which the vertices represent people, and the edges represent friendship.
  • [Random social networks] Let [math]\displaystyle{ G = (V, E) }[/math] be a random graph with [math]\displaystyle{ m = |V | }[/math] vertices and edge-set [math]\displaystyle{ E }[/math]. Write [math]\displaystyle{ d_v }[/math] for the degree of vertex [math]\displaystyle{ v }[/math], that is, the number of edges meeting at [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ Y }[/math] be a uniformly chosen vertex, and [math]\displaystyle{ Z }[/math] a uniformly chosen neighbour of [math]\displaystyle{ Y }[/math].
  1. Show that [math]\displaystyle{ \mathbb{E}[d_Z] \geq \mathbb{E}[d_Y] }[/math].
  2. Interpret this inequality when the vertices represent people, and the edges represent friendship.

Problem 4 (Symmetric 1D Random walk)

Let [math]\displaystyle{ p \in (0,1) }[/math] be a constant, and [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be independent Bernoulli trials with successful probability [math]\displaystyle{ p }[/math]. Define [math]\displaystyle{ S_n = 2\sum_{i=1}^n X_i - n }[/math] and [math]\displaystyle{ S_0 = 0 }[/math].

  • [Range of random walk] The range [math]\displaystyle{ R_n }[/math] of [math]\displaystyle{ S_0, S_1, \ldots, S_n }[/math] is defined as the number of distinct values taken by the sequence. Show that [math]\displaystyle{ \Pr\left[R_n = R_{n-1}+1\right] = \Pr\left[\forall 1 \le i \le n, S_i \neq 0\right] }[/math] as [math]\displaystyle{ n \to \infty }[/math], and deduce that [math]\displaystyle{ n^{-1} \mathbb{E}[R_n]\to \Pr[\forall i \ge 1, S_i \neq 0] }[/math]. Hence show that [math]\displaystyle{ n^{-1} \mathbb{E}[R_n] \to |2p-1| }[/math] as [math]\displaystyle{ n \to \infty }[/math].
  • [Symmetric 1D random walk (IV)] Suppose [math]\displaystyle{ p = \frac{1}{2} }[/math]. Prove that [math]\displaystyle{ \mathbb{E}[|S_n|] = \Theta(\sqrt{n}) }[/math].