Foundations of Data Science (Fall 2024)/Problem Set 6

From TCS Wiki
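  • Every solution must include the complete working; you may write in either Chinese or English.
  • We recommend typesetting your homework with LaTeX, Markdown, or similar tools.
  • Students who are unable to do so may complete the homework with pen and paper and then photograph it.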
  • under construction

Assumptions throughout Problem Set 6

Unless stated otherwise, we work on the probability space [math]\displaystyle{ (\Omega,\mathcal{F},\Pr) }[/math].

Unless stated otherwise, we assume that the expectations of random variables are well defined.

Problem 1 (Convergence) (Bonus)

  • [Convergence in [math]\displaystyle{ r }[/math]-th mean] Suppose [math]\displaystyle{ X_n{\xrightarrow {r}} X }[/math], where [math]\displaystyle{ r\ge 1 }[/math]. Prove or disprove that [math]\displaystyle{ \mathbb E[X_n^r]\to\mathbb E[X^r] }[/math].
  • [Dominated convergence] Suppose [math]\displaystyle{ |X_n|\le Z }[/math] for all [math]\displaystyle{ n\in\mathbb N }[/math], where [math]\displaystyle{ \mathbb E(Z)\lt \infty }[/math]. Prove that if [math]\displaystyle{ X_n \xrightarrow P X }[/math] then [math]\displaystyle{ X_n \xrightarrow 1 X }[/math].
  • [Slutsky’s theorem] Let [math]\displaystyle{ (X_n)_{n \ge 1}, (Y_n)_{n \ge 1}, X, Y }[/math] be random variables and [math]\displaystyle{ c\in\mathbb{R} }[/math] be a real number.
    1. Suppose [math]\displaystyle{ X_n \overset{D}{\to} X }[/math] and [math]\displaystyle{ Y_n \overset{D}{\to} c }[/math]. Prove that [math]\displaystyle{ X_nY_n \overset{D}{\to} cX }[/math].
    2. Construct an example such that [math]\displaystyle{ X_n \overset{D}{\to} X }[/math] and [math]\displaystyle{ Y_n \overset{D}{\to} Y }[/math] but [math]\displaystyle{ X_nY_n }[/math] does not converge to [math]\displaystyle{ XY }[/math] in distribution.

Problem 2 (LLN & CLT)

  • [Proportional betting] In each of a sequence of independent bets, a gambler either wins 30%, or loses 25% of her current fortune, each with probability [math]\displaystyle{ 1/2 }[/math]. Denoting her fortune after [math]\displaystyle{ n }[/math] bets by [math]\displaystyle{ F_n }[/math], show that [math]\displaystyle{ \mathbb E(F_n)\to\infty }[/math] as [math]\displaystyle{ n \to\infty }[/math], while [math]\displaystyle{ F_n \to 0 }[/math] almost surely.
  • [Transience] Let [math]\displaystyle{ X_1,X_2,\dots }[/math] be independent identically distributed random variables taking values in the integers [math]\displaystyle{ \mathbb Z }[/math] and having a finite mean. Show that the Markov chain [math]\displaystyle{ S = \{S_n\} }[/math] given by [math]\displaystyle{ S_n = \sum^n_{i=1} X_i }[/math] is transient, i.e. [math]\displaystyle{ \forall n\in\mathbb N,\Pr(\exists n'\gt n, S_{n'}=S_n)\lt 1 }[/math], if [math]\displaystyle{ \mathbb E(X_1)\ne 0 }[/math].
  • [Controlling a Fair Voting] In a society of [math]\displaystyle{ n }[/math] isolated (independent) and neutral (uniform) people, how many people suffice to manipulate the result of a majority vote with [math]\displaystyle{ 1-\delta }[/math] certainty? You must use the Berry–Esseen theorem to solve this problem.
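
As a numerical sanity check for the [Proportional betting] item (not a proof), the following minimal Python sketch compares the expected per-bet multiplier with the expected per-bet log-multiplier, then simulates a few gamblers. The names `WIN`, `LOSE`, `simulate_log_fortune`, the seed, and the sample sizes are illustrative choices, not part of the problem set.

```python
import math
import random

# Multipliers taken from the problem statement: win 30% or lose 25%.
WIN, LOSE = 1.30, 0.75

def expected_multiplier():
    """E[F_{n+1} / F_n]: the factor by which the mean fortune grows per bet."""
    return 0.5 * WIN + 0.5 * LOSE

def expected_log_multiplier():
    """E[log(F_{n+1} / F_n)]: the drift of log F_n per bet."""
    return 0.5 * math.log(WIN) + 0.5 * math.log(LOSE)

def simulate_log_fortune(n_bets, rng):
    """Return log(F_n / F_0) for one simulated gambler."""
    return sum(math.log(WIN if rng.random() < 0.5 else LOSE)
               for _ in range(n_bets))

# Assumed simulation parameters (hypothetical): 200 gamblers, 5000 bets each.
rng = random.Random(0)
log_fortunes = sorted(simulate_log_fortune(5000, rng) for _ in range(200))
median_log = log_fortunes[len(log_fortunes) // 2]

print("mean multiplier per bet:", expected_multiplier())
print("mean log-multiplier per bet:", expected_log_multiplier())
print("median of log(F_n / F_0) over the simulated gamblers:", median_log)
```

The mean multiplier exceeds 1 while the mean log-multiplier is negative, which is the tension the problem asks you to explain rigorously with the law of large numbers.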

Problem 3 (Concentration of measure)

  • [Tossing coins] We repeatedly toss a fair coin (with an equal probability of heads and tails). Let the random variable [math]\displaystyle{ X }[/math] be the number of throws required to obtain a total of [math]\displaystyle{ n }[/math] heads. Show that [math]\displaystyle{ \Pr[X \gt 2n + \delta\sqrt{n\log n}]\leq n^{-\delta^2/6} }[/math] for any real [math]\displaystyle{ 0\lt \delta\lt \sqrt{\frac{4n}{\log n}} }[/math].
  • [[math]\displaystyle{ k }[/math]-th moment bound] Let [math]\displaystyle{ X }[/math] be a random variable with expectation [math]\displaystyle{ 0 }[/math] such that the moment generating function [math]\displaystyle{ \mathbb{E}[\exp(t|X|)] }[/math] is finite for some [math]\displaystyle{ t \gt 0 }[/math]. We can use the following two kinds of tail inequalities for [math]\displaystyle{ X }[/math]:
    • Chernoff Bound: [math]\displaystyle{ \Pr[|X| \geq \delta] \leq \min_{t \geq 0} {\mathbb{E}[e^{t|X|}]}/{e^{t\delta}} }[/math]
    • [math]\displaystyle{ k }[/math]th-Moment Bound: [math]\displaystyle{ \Pr[|X| \geq \delta] \leq {\mathbb{E}[|X|^k]}/{\delta^k} }[/math]
    1. Show that for each [math]\displaystyle{ \delta }[/math], there exists a choice of [math]\displaystyle{ k }[/math] such that the [math]\displaystyle{ k }[/math]th-moment bound is no weaker than the Chernoff bound. (Hint: Use the probabilistic method.)
    2. Why would we still prefer the Chernoff bound to the (seemingly) stronger [math]\displaystyle{ k }[/math]-th moment bound?
  • [Cut size in random graph] Show that with probability at least [math]\displaystyle{ 2/3 }[/math], the size of the max-cut in the Erdős–Rényi random graph [math]\displaystyle{ G(n,1/2) }[/math] is at most [math]\displaystyle{ n^2/8 + O(n^{1.5}) }[/math]. In the [math]\displaystyle{ G(n,1/2) }[/math] model, each edge is included in the graph with probability [math]\displaystyle{ 1/2 }[/math], independently of every other edge.
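
For the [Tossing coins] item, the claimed inequality can be checked numerically for small parameters before attempting a proof. The sketch below is only a sanity check: it uses the observation that more than m tosses are needed exactly when the first m tosses contain at most n - 1 heads, and evaluates that binomial probability exactly. The function names `tail_prob` and `claimed_bound` and the tested values of n and δ are illustrative assumptions.

```python
import math

def tail_prob(n, delta):
    """Exact Pr[X > 2n + delta*sqrt(n log n)], X = tosses needed for n heads."""
    m = math.floor(2 * n + delta * math.sqrt(n * math.log(n)))
    # X > m holds exactly when the first m tosses contain at most n - 1 heads.
    favourable = sum(math.comb(m, k) for k in range(n))
    return favourable / 2 ** m

def claimed_bound(n, delta):
    """Right-hand side of the inequality in the problem: n^(-delta^2 / 6)."""
    return n ** (-delta ** 2 / 6)

# Assumed test grid (hypothetical); every delta lies in (0, sqrt(4n / log n)).
for n in (50, 200):
    for delta in (0.5, 1.0, 2.0):
        print(n, delta, tail_prob(n, delta), claimed_bound(n, delta))
```

Agreement on a handful of parameter values says nothing about the general case; the proof still requires a Chernoff-type argument.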

Problem 4 (Random processes)

  • [Hypergeometry] Given a bag with [math]\displaystyle{ r }[/math] red balls and [math]\displaystyle{ g }[/math] green balls, suppose that we uniformly sample [math]\displaystyle{ n }[/math] balls from the bag without replacement. Set up an appropriate martingale and use it to show that the number of red balls in the sample is tightly concentrated around [math]\displaystyle{ nr/(r+ g) }[/math].
  • [Pólya’s urn] A bag contains red and blue balls, with initially [math]\displaystyle{ r }[/math] red and [math]\displaystyle{ b }[/math] blue where [math]\displaystyle{ rb \gt 0 }[/math]. A ball is drawn from the bag, its color noted, and then it is returned to the bag together with a new ball of the same color. Let [math]\displaystyle{ R_n }[/math] be the number of red balls after [math]\displaystyle{ n }[/math] such operations.
    1. Show that [math]\displaystyle{ Y_n = R_n/(n + r + b) }[/math] is a martingale.
    2. Let [math]\displaystyle{ T }[/math] be the number of balls drawn until the first blue ball appears, and suppose that [math]\displaystyle{ r = b = 1 }[/math]. Show that [math]\displaystyle{ \mathbb E[(T + 2)^{-1}] = 1/4 }[/math].
  • [Family planning] Children are either female or male. Their sexes are independent random variables, being female with probability [math]\displaystyle{ q }[/math] or male with probability [math]\displaystyle{ p = 1 - q }[/math]. A woman ceases childbearing at stage [math]\displaystyle{ T }[/math], and we write [math]\displaystyle{ G_n }[/math] and [math]\displaystyle{ B_n }[/math] for the numbers of girls and boys born to her up to and including stage [math]\displaystyle{ n }[/math]. Assume that [math]\displaystyle{ T }[/math] is a finite stopping time for the sequence [math]\displaystyle{ \{(G_n,B_n) : n \ge 1\} }[/math]. Show that, no matter the stopping rule that yields [math]\displaystyle{ T }[/math], we have [math]\displaystyle{ \mathbb E(G_T)/\mathbb E(B_T) = q/p }[/math]. What can be said about [math]\displaystyle{ \mathbb E(G_T/B_T) }[/math]?
  • [Random walk on a graph] A particle performs a random walk on the vertex set of a connected graph [math]\displaystyle{ G }[/math], which for simplicity we assume to have neither loops nor multiple edges. At each stage it moves to a neighbor of its current position, each such neighbor being chosen with equal probability. If [math]\displaystyle{ G }[/math] has [math]\displaystyle{ \eta\lt \infty }[/math] edges, show that the stationary distribution is given by [math]\displaystyle{ \pi(v) = d_v/(2\eta) }[/math], where [math]\displaystyle{ d_v }[/math] is the degree of vertex [math]\displaystyle{ v }[/math].
  • [Reversibility versus periodicity] Can a reversible chain be periodic?
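
For the [Random walk on a graph] item, the formula for the stationary distribution can be checked on a concrete instance with exact rational arithmetic. The sketch below is a sanity check under stated assumptions, not a proof: the example graph (a triangle with one pendant vertex) and the helper `step` are hypothetical and do not come from the problem set.

```python
from fractions import Fraction

# Hypothetical example graph: triangle 0-1-2 plus a pendant vertex 3 joined to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
vertices = sorted({v for e in edges for v in e})

# Adjacency lists; the graph is simple, so len(neighbours[v]) is the degree d_v.
neighbours = {v: [] for v in vertices}
for u, v in edges:
    neighbours[u].append(v)
    neighbours[v].append(u)

eta = len(edges)
# Candidate stationary distribution from the problem: pi(v) = d_v / (2 * eta).
pi = {v: Fraction(len(neighbours[v]), 2 * eta) for v in vertices}

def step(dist):
    """Apply one step of the walk to a distribution, i.e. compute dist * P."""
    new = {v: Fraction(0) for v in vertices}
    for u in vertices:
        for v in neighbours[u]:
            # From u, each neighbour is chosen with probability 1 / d_u.
            new[v] += dist[u] / len(neighbours[u])
    return new

print("pi sums to 1:", sum(pi.values()) == 1)
print("pi is invariant under one step:", step(pi) == pi)
```

Using `Fraction` avoids floating-point error, so the equality checks are exact. Changing `edges` to another connected simple graph lets you test further instances.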