数据科学基础 (Fall 2024)/Problem Set 3

  • Every problem must be answered with a complete solution process; you may write in either Chinese or English.
  • We recommend typesetting your homework with LaTeX, Markdown, or similar tools.
  • Students without access to such tools may complete the homework on paper and submit photos.

Assumptions throughout Problem Set 3

Without further notice, we work with the probability space [math]\displaystyle{ (\Omega,\mathcal{F},\mathbf{Pr}) }[/math].

Without further notice, we assume that the expectations of the random variables involved are well-defined.

Problem 1 (Discrete random variable)

  • [Multinomial distribution] Prove by direct calculation that each marginal distribution of the multinomial distribution is a binomial distribution.
  • [Capture–recapture] A population of [math]\displaystyle{ b }[/math] animals has had a number [math]\displaystyle{ a }[/math] of its members captured, marked, and released. Let [math]\displaystyle{ X }[/math] be the number of animals it is necessary to recapture (without re-release) in order to obtain [math]\displaystyle{ m }[/math] marked animals. Show that [math]\displaystyle{ \Pr[X=n]=\frac ab \binom{a-1}{m-1}\binom{b-a}{n-m}/\binom{b-1}{n-1} }[/math] and find [math]\displaystyle{ \mathbb E[X] }[/math]. This distribution has been called negative hypergeometric. (A Monte Carlo sanity check of this formula is sketched after this problem.)
  • [Poisson distribution (i)] In your pocket is a random number [math]\displaystyle{ N }[/math] of coins, where [math]\displaystyle{ N }[/math] has the Poisson distribution with parameter [math]\displaystyle{ \lambda }[/math]. You toss each coin once, with heads showing with probability [math]\displaystyle{ p }[/math] each time. Let [math]\displaystyle{ X }[/math] be the (random) number of heads and [math]\displaystyle{ Y }[/math] be the (random) number of tails.
  1. Find the joint mass function of [math]\displaystyle{ (X,Y) }[/math].
  2. Find the PMF of the marginal distribution of [math]\displaystyle{ X }[/math] in [math]\displaystyle{ (X,Y) }[/math]. Are [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] independent?
  • [Poisson distribution (ii)] Let [math]\displaystyle{ \lambda,\mu \gt 0 }[/math] and [math]\displaystyle{ n \in \mathbb{N} }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu) }[/math] be independent random variables. Find the conditional distribution of [math]\displaystyle{ X }[/math], given [math]\displaystyle{ X+Y = n }[/math].
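The capture–recapture formula above admits a quick numerical check. The following is a minimal Monte Carlo sketch, not part of the assignment: the parameter values and the helper names (simulate_draws, pmf) are illustrative assumptions, and the code simply compares the empirical frequencies of the number of draws with the stated negative hypergeometric PMF.

<syntaxhighlight lang="python">
import random
from math import comb

def simulate_draws(a, b, m):
    """Draw without replacement from b animals (a of them marked)
    until m marked animals have been obtained; return the number of draws."""
    population = [1] * a + [0] * (b - a)   # 1 = marked, 0 = unmarked
    random.shuffle(population)
    marked = 0
    for n, animal in enumerate(population, start=1):
        marked += animal
        if marked == m:
            return n

def pmf(a, b, m, n):
    """The PMF stated in the problem: Pr[X = n]."""
    return a / b * comb(a - 1, m - 1) * comb(b - a, n - m) / comb(b - 1, n - 1)

a, b, m, trials = 10, 30, 3, 200_000       # illustrative parameters
counts = {}
for _ in range(trials):
    n = simulate_draws(a, b, m)
    counts[n] = counts.get(n, 0) + 1

for n in range(m, m + 6):                   # compare a few values of n
    print(n, counts.get(n, 0) / trials, pmf(a, b, m, n))
</syntaxhighlight>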

Problem 2 (Expectation)

  • [Expectation] Is it generally true that [math]\displaystyle{ \mathbb E[1/X] = 1/\mathbb E[X] }[/math]? Is it ever true that [math]\displaystyle{ \mathbb E[1/X] = 1/\mathbb E[X] }[/math]?
  • [Error] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent with mean [math]\displaystyle{ \mu }[/math]. Explain the error in the following equation: [math]\displaystyle{ \mathbb E[X|X+Y=z]=\mathbb E[X|X=z-Y]=\mathbb E[z-Y]=z-\mu }[/math].
  • [Run] A biased coin is tossed [math]\displaystyle{ n }[/math] times, and heads shows with probability [math]\displaystyle{ p }[/math] on each toss. A run is a maximal sequence of consecutive tosses with the same outcome, so that, for example, the sequence HHTHTTH contains five runs. Show that the expected number of runs is [math]\displaystyle{ 1 + 2(n-1)p(1-p) }[/math]. (A simulation sketch of this formula follows this problem.)
  • [Streak] Suppose we flip a fair coin [math]\displaystyle{ n }[/math] times independently to obtain a sequence of flips [math]\displaystyle{ X_1, X_2, \ldots , X_n }[/math]. A streak of flips is a consecutive subsequence of flips that are all the same. For example, if [math]\displaystyle{ X_3 }[/math], [math]\displaystyle{ X_4 }[/math], and [math]\displaystyle{ X_5 }[/math] are all heads, there is a streak of length [math]\displaystyle{ 3 }[/math] starting at the third flip. (If [math]\displaystyle{ X_6 }[/math] is also heads, then there is also a streak of length [math]\displaystyle{ 4 }[/math] starting at the third flip.) Find the expected number of streaks of length [math]\displaystyle{ k }[/math] for a given integer [math]\displaystyle{ k \ge 1 }[/math].
  • [Tail sum for expectation (double counting)]
  1. If [math]\displaystyle{ X }[/math] takes non-negative integer values show that [math]\displaystyle{ \mathbb E[X]=\sum_{n=0}^\infty \Pr[X\gt n] }[/math].
  2. An urn contains [math]\displaystyle{ b }[/math] blue and [math]\displaystyle{ r }[/math] red balls. Balls are removed at random until the first blue ball is drawn. Show that the expected number drawn is [math]\displaystyle{ (b + r + 1)/(b + 1) }[/math].
  3. The balls are replaced and then removed at random until all the remaining balls are of the same colour. Find the expected number remaining in the urn.
  4. Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent random variables taking values in the non-negative integers, with finite means. Let [math]\displaystyle{ U= \min\{X,Y \} }[/math] and [math]\displaystyle{ V= \max\{X,Y \} }[/math]. Show that
    [math]\displaystyle{ \begin{align*} \mathbb E[U]&=\sum_{r=1}^\infty \Pr[X\ge r]\Pr[Y\ge r]\\ \mathbb E[V]&=\sum_{r=1}^\infty \bigg(\Pr[X\ge r]+\Pr[Y\ge r]-\Pr[X\ge r]\Pr[Y\ge r]\bigg)\\ \mathbb E[UV]&=\sum_{r,s=1}^\infty \Pr[X\ge r]\Pr[Y\ge s]. \end{align*} }[/math]
  5. Let [math]\displaystyle{ X }[/math] take values in the non-negative integers. Show that [math]\displaystyle{ \mathbb E[X^2]=\mathbb E[X]+2\sum_{r=0}^\infty r\Pr[X\gt r]=\sum_{r=0}^\infty (2r+1)\Pr[X\gt r] }[/math], and find a similar formula for [math]\displaystyle{ \mathbb E[X^3] }[/math].
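The expected-runs formula in the [Run] item above can be sanity-checked by simulation. The sketch below is illustrative only (the values of n, p, and the trial count are arbitrary assumptions); it counts one initial run plus a new run at every position where consecutive tosses differ, which matches the definition of a run given above.

<syntaxhighlight lang="python">
import random

def count_runs(flips):
    """Number of maximal blocks of consecutive equal outcomes."""
    return 1 + sum(flips[i] != flips[i - 1] for i in range(1, len(flips)))

n, p, trials = 20, 0.3, 100_000            # illustrative parameters
total = 0
for _ in range(trials):
    flips = [random.random() < p for _ in range(n)]   # True = heads
    total += count_runs(flips)

print("simulated average number of runs:", total / trials)
print("1 + 2(n-1)p(1-p)                :", 1 + 2 * (n - 1) * p * (1 - p))
</syntaxhighlight>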

Problem 3 (Random graph)

  • [Erdős–Rényi random graph] Consider [math]\displaystyle{ G\sim G(n,p) }[/math] where [math]\displaystyle{ G(n,p) }[/math] is the Erdős–Rényi random graph model.
  1. Let [math]\displaystyle{ p\in (0,1) }[/math]. A "triangle" in a graph is a clique of size [math]\displaystyle{ 3 }[/math]. Find the expected number of triangles in [math]\displaystyle{ G }[/math]. (Hint: use indicators and the linearity of expectation.)
  2. Let [math]\displaystyle{ p\in (0,1) }[/math]. For any [math]\displaystyle{ 2\leq q\leq n }[/math], let the random variable [math]\displaystyle{ N_q }[/math] be the number of [math]\displaystyle{ q }[/math]-cliques. Here, a [math]\displaystyle{ q }[/math]-clique is a clique of size [math]\displaystyle{ q }[/math]. Find [math]\displaystyle{ \mathbf{E}[N_q] }[/math].
  3. Let [math]\displaystyle{ p = 1/2 }[/math]. For an undirected graph [math]\displaystyle{ G }[/math], define [math]\displaystyle{ \alpha(G) = \max\{|S|:S \text{ is an independent set}\} }[/math]. Show that when [math]\displaystyle{ n\rightarrow \infty }[/math], [math]\displaystyle{ \mathbf{Pr}[\alpha(G) \geq 3\log_2 n +1] \rightarrow 0 }[/math]. Also, please interpret this result in the context of social networks, in which the vertices represent people, and the edges represent friendship.
  • [Random social networks] Let [math]\displaystyle{ G = (V, E) }[/math] be a random graph with [math]\displaystyle{ m = |V| }[/math] vertices and edge-set [math]\displaystyle{ E }[/math]. Write [math]\displaystyle{ d_v }[/math] for the degree of vertex [math]\displaystyle{ v }[/math], that is, the number of edges meeting at [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ Y }[/math] be a uniformly chosen vertex, and [math]\displaystyle{ Z }[/math] a uniformly chosen neighbour of [math]\displaystyle{ Y }[/math].
  1. Show that [math]\displaystyle{ \mathbf{E}[d_Z] \geq \mathbf{E}[d_Y] }[/math]. (A simulation sketch illustrating this inequality follows this problem.)
  2. Interpret this inequality when the vertices represent people, and the edges represent friendship.
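The inequality [math]\displaystyle{ \mathbf{E}[d_Z] \geq \mathbf{E}[d_Y] }[/math] claimed above can be observed empirically. The sketch below is an illustrative example rather than part of the problem: the choice of sampling the graph from G(n, p), the parameter values, and the helper names are assumptions. It draws a uniform vertex Y and a uniform neighbour Z of Y many times and compares the two average degrees.

<syntaxhighlight lang="python">
import random

def sample_gnp(n, p):
    """Adjacency lists of an Erdős–Rényi random graph G(n, p)."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def degree_comparison(adj, samples=100_000):
    """Estimate E[d_Y] (uniform vertex) and E[d_Z] (uniform neighbour of a uniform vertex)."""
    n = len(adj)
    d_y_total, d_z_total, z_count = 0, 0, 0
    for _ in range(samples):
        y = random.randrange(n)
        d_y_total += len(adj[y])
        if adj[y]:                          # Z is only defined when Y has a neighbour
            z = random.choice(adj[y])
            d_z_total += len(adj[z])
            z_count += 1
    return d_y_total / samples, d_z_total / max(z_count, 1)

adj = sample_gnp(n=200, p=0.05)             # illustrative parameters
e_dy, e_dz = degree_comparison(adj)
print("E[d_Y] ~", e_dy, "   E[d_Z] ~", e_dz)   # expect E[d_Z] >= E[d_Y]
</syntaxhighlight>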

Problem 4 (Symmetric 1D Random walk)

Let [math]\displaystyle{ p \in (0,1) }[/math] be a constant, and [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be independent Bernoulli trials with success probability [math]\displaystyle{ p }[/math]. Define [math]\displaystyle{ S_n = 2\sum_{i=1}^n X_i - n }[/math] and [math]\displaystyle{ S_0 = 0 }[/math].

  • [Range of random walk] The range [math]\displaystyle{ R_n }[/math] of [math]\displaystyle{ S_0, S_1, \ldots, S_n }[/math] is defined as the number of distinct values taken by the sequence. Show that [math]\displaystyle{ \Pr\left[R_n = R_{n-1}+1\right] = \Pr\left[\forall 1 \le i \le n, S_i \neq 0\right] }[/math], and deduce that [math]\displaystyle{ n^{-1} \mathbb{E}[R_n]\to \Pr[\forall i \ge 1, S_i \neq 0] }[/math] as [math]\displaystyle{ n \to \infty }[/math]. Hence show that [math]\displaystyle{ n^{-1} \mathbb{E}[R_n] \to |2p-1| }[/math] as [math]\displaystyle{ n \to \infty }[/math]. (A simulation sketch of these limits follows this problem.)
  • [Symmetric 1D random walk (IV)] Suppose [math]\displaystyle{ p = \frac{1}{2} }[/math]. Prove that [math]\displaystyle{ \mathbb{E}[|S_n|] = \Theta(\sqrt{n}) }[/math].
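Both limiting claims in Problem 4 can be explored numerically. The sketch below is illustrative only (the walk length, bias, trial count, and helper names are arbitrary assumptions): it simulates the walk, estimates [math]\displaystyle{ n^{-1}\mathbb{E}[R_n] }[/math] for a biased walk to compare with [math]\displaystyle{ |2p-1| }[/math], and estimates [math]\displaystyle{ \mathbb{E}[|S_n|]/\sqrt{n} }[/math] for the symmetric case [math]\displaystyle{ p = \tfrac{1}{2} }[/math].

<syntaxhighlight lang="python">
import random
from math import sqrt

def walk(n, p):
    """Path S_0, S_1, ..., S_n; each step is +1 with probability p and -1 otherwise,
    which agrees with S_n = 2*sum(X_i) - n for Bernoulli(p) trials X_i."""
    path = [0]
    for _ in range(n):
        path.append(path[-1] + (1 if random.random() < p else -1))
    return path

def estimate(n, p, trials=1000):
    """Return estimates of E[R_n]/n and E[|S_n|]/sqrt(n)."""
    range_total, abs_total = 0, 0
    for _ in range(trials):
        path = walk(n, p)
        range_total += len(set(path))       # R_n: number of distinct values taken
        abs_total += abs(path[-1])          # |S_n|
    return range_total / trials / n, abs_total / trials / sqrt(n)

n = 2000                                     # illustrative walk length
r_ratio, _ = estimate(n, p=0.7)
print("E[R_n]/n ~", r_ratio, "   |2p-1| =", abs(2 * 0.7 - 1))
_, s_ratio = estimate(n, p=0.5)
print("E[|S_n|]/sqrt(n) ~", s_ratio)         # should stay bounded away from 0 and infinity
</syntaxhighlight>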