概率论与数理统计 (Spring 2023)/Problem Set 2: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
Zhangxy (talk | contribs)
No edit summary
 
(69 intermediate revisions by 3 users not shown)
Line 1: Line 1:
*每道题目的解答都要有完整的解题过程,中英文不限。
*我们推荐大家使用LaTeX, markdown等对作业进行排版。
== Assumption throughout Problem Set 2==
<p>Without further notice, we are working on probability space <math>(\Omega,\mathcal{F},\mathbf{Pr})</math>.</p>
<p>Without further notice, we assume that the expectation of random variables are well-defined.</p>
<p>The term <math>\log</math> used in this context refers to the natural logarithm.</p>
== Problem 1 (Warm-up problems) ==
== Problem 1 (Warm-up problems) ==


Line 5: Line 16:
<li>[<strong>Function of random variable (II)</strong>] Let [math]X[/math] be a random variable with distribution function [math]\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] be a random variable with distribution function [math]F[/math].</li>
<li>[<strong>Function of random variable (II)</strong>] Let [math]X[/math] be a random variable with distribution function [math]\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] be a random variable with distribution function [math]F[/math].</li>
<li>[<strong>Marginal distribution</strong>] Let [math](X_1, X_2)[/math] be a random vector satisfying [math]\mathbf{Pr}[(X_1,X_2) = (0,0)] = \mathbf{Pr}[(X_1,X_2) = (1,0)] = \mathbf{Pr}[(X_1,X_2)=(0,1)]=\frac{1}{3}[/math]. Find out the marginal distribution of [math]X_1[/math].</li>
<li>[<strong>Marginal distribution</strong>] Let [math](X_1, X_2)[/math] be a random vector satisfying [math]\mathbf{Pr}[(X_1,X_2) = (0,0)] = \mathbf{Pr}[(X_1,X_2) = (1,0)] = \mathbf{Pr}[(X_1,X_2)=(0,1)]=\frac{1}{3}[/math]. Find out the marginal distribution of [math]X_1[/math].</li>
<li>[<strong>Independence</strong>] Show that discrete random variables [math]X[/math] and [math]Y[/math] are independent if and only if [math]f_{X,Y}(x,y) = f_X(x) f_Y(y)[/math], where [math]f_{X,Y}[/math] is the joint mass function of [math](X,Y)[/math], and [math]f_X[/math] (respectively, [math]f_Y[/math]) is the mass function of [math]X[/math] (respectively, [math]Y[/math]).</li>
<li>[<strong>Independence</strong>] Show that discrete random variables [math]X[/math] and [math]Y[/math] are independent if and only if [math]p_{X,Y}(x,y)[/math] can be written as [math]g(x) h(y)[/math] for some function [math]g,h[/math], where [math]p_{X,Y}[/math] is the joint mass function of [math](X,Y)[/math].</li>
<li>[<strong>Entropy of discrete random variable</strong>] Let [math]X[/math] be a discrete random variable with range of values [math]\mathbb{N}_+[/math] and probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \ge 0[/math] using Jensen's inequality.</li>
<li>[<strong>Entropy of discrete random variable</strong>] Let [math]X[/math] be a discrete random variable with range of values [math][N] = \{1,2,\ldots,N\}[/math] and probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \le \log N[/math] using Jensen's inequality.</li>
<li>[<strong>Law of total expectation</strong>] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation.</li>
<li>[<strong>Law of total expectation</strong>] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation.</li>
<li>[<strong>Random number of random variables</strong>] Let <math>\{X_n\}_{n \ge 1}</math> be identically distributed random variable and <math>N</math> be a random variable taking values in the non-negative integers and independent of the <math>X_n</math> for all <math>n \ge 1</math>. Prove that <math>\mathbf{E}\left[\sum_{i=1}^N X_i\right] = \mathbf{E}[N] \mathbf{E}[X_1]</math>.
</li>
</ul>
</ul>


Line 13: Line 28:


<ul>
<ul>
<li><strong>CDF</strong> Given <math>(\Omega, \mathcal{F}, \mathbf{Pr})</math>, let <math>X</math> be a random variable such that <math>X:\Omega\rightarrow \mathbb{R}</math> with cumulative distribution function <math>F</math>.
<li>[<strong>Cumulative distribution function (CDF)</strong>] Let <math>X</math> be a random variable with cumulative distribution function <math>F</math>.
<ol>
<ol>
<li>Show that <math>Y = aX+b</math> is a random variable where <math>a</math> and <math>b</math> are real constants, and express the CDF of <math>Y</math> by <math>F</math>.</li>
<li>Show that <math>Y = aX+b</math> is a random variable where <math>a</math> and <math>b</math> are real constants, and express the CDF of <math>Y</math> by <math>F</math>. (Hint: Try expressing the event [math]Y=aX+b\le y[/math] by countably many set operations on the events defined on [math]X[/math].)</li>
<li>Let <math>G</math> be the CDF of random variable <math>Z:\Omega\rightarrow \mathbb{R}</math> and <math>0\leq \lambda \leq 1</math>, show that  
<li>Let <math>G</math> be the CDF of random variable <math>Z:\Omega\rightarrow \mathbb{R}</math> and <math>0\leq \lambda \leq 1</math>, show that  
<ul>
<ul>
Line 22: Line 37:
</ul></li>
</ul></li>
</ol></li>
</ol></li>
<li><strong>PMF</strong> We toss <math>n</math> coins, and each one shows heads with probability <math>p</math>, independently of each of the others. Each coin which shows head is tossed again. (If the coin shows tail, it won't be tossed again.) Let <math>X</math> be the number of heads resulting from the <strong>second</strong> round of tosses, and <math>Y</math> be the number of heads resulting from <strong>all</strong> tosses, which includes the first and (possible) second round of each toss.
<li>[<strong>Probability mass function (PMF)</strong>] We toss <math>n</math> coins, and each one shows heads with probability <math>p</math>, independently of each of the others. Each coin which shows head is tossed again. (If the coin shows tail, it won't be tossed again.) Let <math>X</math> be the number of heads resulting from the <strong>second</strong> round of tosses, and <math>Y</math> be the number of heads resulting from <strong>all</strong> tosses, which includes the first and (possible) second round of each toss.
<ol>
<ol>
<li>Find the PMF of <math>X</math> and <math>Y</math>.</li>
<li>Find the PMF of <math>X</math> and <math>Y</math>.</li>
<li>Find <math>\mathbb{E}[X]</math> and <math>\mathbb{E}[Y]</math>.</li>
<li>Find <math>\mathbf{E}[X]</math> and <math>\mathbf{E}[Y]</math>.</li>
<li>Let <math>f(k)</math> for <math>0\leq k\leq n</math> be the PMF of <math>X</math>, show that  
<li>Let <math>p_X</math> be the PMF of <math>X</math>, show that [math]p_X(k-1)p_X(k+1)\leq [p_X(k)]^2[/math] for <math>1\leq k \leq n-1</math>.</li>
$$f(k-1)f(k+1)\leq f(k)^2$$
for <math>1\leq k \leq n-1</math>.</li>
</ol></li>
</ol></li>
</ul>
== Problem 3 (Discrete random variable) ==
<ul>
<li> [<strong>Geometric distribution (I)</strong>] Every package of some intrinsically dull commodity includes a small and exciting plastic object. There are [math]c[/math] different types of object, and each package is equally likely to contain any given type. You buy one package each day.</p>
    <ol>
        <li>Find the expected number of days which elapse between the acquisitions of the [math]j[/math]-th new type of object and the [math](j + 1)[/math]-th new type.</li>
        <li>Find the expected number of days which elapse before you have a full set of objects.</li>
    </ol>
</li>
<li> [<strong>Geometric distribution (II)</strong>] Prove that geometry distribution is the only discrete memoryless distribution with range values <math>\mathbb{N}_+</math>.
</li>
<li> [<strong>Binomial distribution</strong>] Let <math>n_1,n_2 \in \mathbb{N}_+</math> and <math>0 \le p \le 1</math> be parameters, and <math>X \sim \mathrm{Bin}(n_1,p),Y \sim \mathrm{Bin}(n_2,p)</math> be independent random variables. Prove that <math>X+Y \sim \mathrm{Bin}(n_1+n_2,p)</math>.
</li>
<li> [<strong>Negative binomial distribution</strong>] Let <math>X</math> follows the negative binomial distribution with parameter <math>r \in \mathbb{N}_+</math> and <math>p \in (0,1)</math>. Calculate <math>\mathbf{Var}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2</math>.
</li>
<li>[<strong>Hypergeometric distribution</strong>] An urn contains [math]N[/math] balls, [math]b[/math] of which are blue and [math]r = N -b[/math] of which are red. A random sample of [math]n[/math] balls is drawn <strong>without replacement</strong> (无放回) from the urn. Let [math]B[/math] the number of blue balls in this sample. Show that if [math]N, b[/math], and [math]r[/math] approach [math]+\infty[/math] in such a way that [math]b/N \rightarrow  p[/math] and [math]r/N \rightarrow 1 - p[/math], then [math]\mathbf{Pr}(B = k) \rightarrow {n\choose k}p^k(1-p)^{n-k}[/math] for [math]0\leq k \leq n[/math].</li>
<li>[<strong>Poisson distribution</strong>] In your pocket is a random number <math>N</math> of coins, where <math>N</math> has the Poisson distribution with parameter <math>\lambda</math>. You toss each coin once, with heads showing with probability <math>p</math> each time. Let <math>X</math> be the (random) number of heads outcomes and <math>Y</math> be the (also random) number of tails.
<ol>
<li>Find the joint mass function of <math>(X,Y)</math>.</li>
<li>Find PMF of the marginal distribution of <math>X</math> in <math>(X,Y)</math>. Are <math>X</math> and <math>Y</math> independent?</li>
</ol>
</li>
<li>[<strong>Conditional distribution (I)</strong>]
Let <math>X</math> and <math>Y</math> be independent <math>\text{Bin}(n, p)</math> random variables, and let <math>Z = X + Y</math>. Show that the conditional distribution of <math>X</math> given <math>Z = N</math> is the hypergeometric distribution.
</li>
<li>[<strong>Conditional distribution (II)</strong>]
Let <math>\lambda,\mu > 0</math> and <math>n \in \mathbb{N}</math> be parameters, and <math>X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu)</math> be independent random variables. Find out the conditional distribution of <math>X</math>, given <math>X+Y = n</math>.
</li>
</ul>
== Problem 4 (Linearity of Expectation) ==
<ul>
<li> [<strong>Inversion</strong>]
Given a sequence of <math>n</math> elements <math>a_1, a_2, \ldots, a_n</math>, an inversion is a pair of integer <math>(i, j)</math> such that <math> 1 \le i < j \le n</math> and <math>a_i > a_j</math>. For instance, in the sequence <math>a = [1,2,5,1,3]</math> there are three inversions: <math>(2,4), (3,4), (3,5)</math>. Suppose we choose a sequence <math>\rho</math> from <math>[q]^n</math> uniformly at random, where <math>n</math> and <math>q</math> are given integers with <math>q \ge 1</math>. Find the expected number of inversions in <math>\rho</math>.
</li>
<li> [<strong>Number of cycles</strong>]
At a banquet, there are <math>n</math> people who shake hands according to the following process: In each round, two idle hands are randomly selected and shaken (<strong>these two hands are no longer idle</strong>). After <math>n</math> rounds, there will be no idle hands left, and the <math>n</math> people will form several cycles. For example, when <math>n=3</math>, the following situation may occur: the left and right hands of the first person are held together, the left hand of the second person and the right hand of the third person are held together, and the right hand of the second person and the left hand of the third person are held together. In this case, three people form two cycles. How many cycles are expected to be formed after <math>n</math> rounds?
</li>
<li>[<strong>Connected vertices</strong>]
Consider the [[wikipedia:Binary_tree#Types_of_binary_trees|perfect binary tree]] of depth <math>n</math>. Suppose that you delete each edge of the tree independently with probability <math>1/2</math>. Let <math>X</math> be the number of nodes which are still connected to the root after the deletion of the edges. Compute the expectation of <math>X</math>.
</li>
</ul>
== Problem 5 (Probability meets graph theory) ==
In this part we will work on undirected simple graphs. For any <math>G = (V,E)</math>, a <strong>clique</strong> (<em>团,完全子图</em>) is a subset of vertices <math>C\subset V</math>, such that every two distinct vertices of <math>C</math> are adjacent, and an <strong>independent set</strong> (<em>独立集</em>) is a subset of vertices <math>I\subset V</math> such that no two distinct vertices of <math>I</math> are adjacent.
<ul>
<li>[<strong>Erdős–Rényi random graph</strong>]
Consider <math>G\sim G(n,p)</math> where <math>G(n,p)</math> is the Erdős–Rényi random graph model.
<ol>
<li>Let <math>p\in (0,1)</math>. A "triangle" in a graph is a clique of size <math>3</math>. Find the expected number of triangles in <math>G</math>. (Hint: use indicators and the linearity of expectation.)</li>
<li>Let <math>p\in (0,1)</math>. For any <math>2\leq q\leq n</math>, let the random variable <math>N_q</math> be the number of <math>q</math>-cliques. Here, a <math>q</math>-clique is a clique of size <math>q</math>. Find <math>\mathbf{E}[N_q]</math>. </li>
<li>Let <math>p = 1/2</math>. For an undirected graph <math>G</math>, define
<math>\alpha(G) = \max\{|S|:S \text{ is an independent set}\}</math>.
Show that when <math>n\rightarrow \infty</math>,
<math>\mathbf{Pr}[\alpha(G) \geq 3\log_2 n +1] \rightarrow 0</math>.
Also, please interpret this result in the context of social networks, in which the vertices represent people, and the edges represent friendship.</li>
</ol>
</li>
<li>[<strong>Random social networks</strong>]
Let <math>G = (V, E)</math> be a <strong>fixed</strong> undirected graph without isolating vertex.
Let <math>d_v</math> be the degree of vertex <math>v</math>. Let <math>Y</math> be a uniformly chosen vertex, and <math>Z</math> a uniformly chosen neighbor of <math>Y</math>.
<ol>
<li>Show that <math>\mathbf{E}[d_Z] \geq \mathbf{E}[d_Y]</math>.</li>
<li>Interpret this inequality in the context of social networks, in which the vertices represent people, and the edges represent friendship.</li>
</ol>
</li>
<li>[<strong>Turán's Theorem</strong>]
Let <math>G=(V,E)</math> be a <strong>fixed</strong> undirected graph, and write <math>d_v</math> for the degree of the vertex <math>v</math>. Use probablistic method to prove that <math>\alpha(G) \ge \sum_{v \in V} \frac{1}{d_v + 1}</math>. (Hint: Consider the following random procedure for generating an independent set <math>I</math> from a graph with vertex set <math>V</math>: First, generate a random permutation of the vertices, denoted as <math>v_1,v_2,\ldots,v_n</math>. Then, construct the independent set <math>I</math> as follows: For each vertex <math>v_i \in V</math>, add <math>v_i</math> to <math>I</math> if and only if none of its predecessors in the permutation, i.e., <math>v_1,\ldots,v_{i-1}</math>, are neighbors of <math>v_i</math>.)
</li>
</ul>
== Problem 6 (1D random walk) ==
Let <math>p \in (0,1)</math> be a constant, and <math>\{X_n\}_{n \ge 1}</math> be independent Bernoulli trials with successful probability <math>p</math>.
Define <math>S_n = 2\sum_{i=1}^n X_i - n</math> and <math>S_0 = 0</math>.
<ul>
<li>
[<strong>Range of random walk</strong>] The range <math>R_n</math> of <math>S_0, S_1, \ldots, S_n</math> is defined as the number of distinct values taken by the sequence. Show that <math>\mathbf{Pr}\left(R_n = R_{n-1}+1\right) = \mathbf{Pr}\left(\forall 1 \le i \le n, S_i \neq 0\right)</math> as <math>n \to \infty</math>, and deduce that <math>n^{-1} \mathbf{E}[R_n]\to \mathbf{Pr}(\forall i \ge 1, S_i \neq 0)</math>. Hence show that <math>n^{-1} \mathbf{E}[R_n] \to |2p-1|</math> as <math>n \to \infty</math>.
</li>
<li>
[<strong>Symmetric 1D random walk (IV)</strong>] Suppose <math>p=\frac{1}{2}</math>. Let <math>N_n</math> be the number of points that have been visited by <math>S</math> exactly once up to <math>n</math>, that is the size of set <math>\{0 \le i \le n\mid \forall 0 \le j \le n \text{ and } j \neq i,S_i \neq S_j\}</math>. Prove that <math>\mathbf{E}[N_n] = 2</math> for all <math>n \ge 1</math>.
</li>
<li>
[<strong>Symmetric 1D random walk (V)</strong>] Suppose <math>p = \frac{1}{2}</math>. Prove that <math>\mathbf{E}[|S_n|] = \Theta(\sqrt{n})</math>.
</li>
</ul>
</ul>

Latest revision as of 10:41, 7 April 2023

  • 每道题目的解答都要有完整的解题过程,中英文不限。
  • 我们推荐大家使用LaTeX, markdown等对作业进行排版。

Assumption throughout Problem Set 2

Without further notice, we are working on probability space [math]\displaystyle{ (\Omega,\mathcal{F},\mathbf{Pr}) }[/math].

Without further notice, we assume that the expectation of random variables are well-defined.

The term [math]\displaystyle{ \log }[/math] used in this context refers to the natural logarithm.

Problem 1 (Warm-up problems)

  • [Function of random variable (I)] Let [math]X[/math] be a random variable and [math]g:\mathbb{R} \to \mathbb{R}[/math] be a continuous and strictly increasing function. Show that [math]Y = g(X)[/math] is a random variable.
  • [Function of random variable (II)] Let [math]X[/math] be a random variable with distribution function [math]\max(0,\min(1,x))[/math]. Let [math]F[/math] be a distribution function which is continuous and strictly increasing. Show that [math]Y=F^{-1}(X)[/math] be a random variable with distribution function [math]F[/math].
  • [Marginal distribution] Let [math](X_1, X_2)[/math] be a random vector satisfying [math]\mathbf{Pr}[(X_1,X_2) = (0,0)] = \mathbf{Pr}[(X_1,X_2) = (1,0)] = \mathbf{Pr}[(X_1,X_2)=(0,1)]=\frac{1}{3}[/math]. Find out the marginal distribution of [math]X_1[/math].
  • [Independence] Show that discrete random variables [math]X[/math] and [math]Y[/math] are independent if and only if [math]p_{X,Y}(x,y)[/math] can be written as [math]g(x) h(y)[/math] for some function [math]g,h[/math], where [math]p_{X,Y}[/math] is the joint mass function of [math](X,Y)[/math].
  • [Entropy of discrete random variable] Let [math]X[/math] be a discrete random variable with range of values [math][N] = \{1,2,\ldots,N\}[/math] and probability mass function [math]p[/math]. Define [math]H(X) = -\sum_{n \ge 1} p(n) \log p(n)[/math] with convention [math]0\log 0 = 0[/math]. Prove that [math]H(X) \le \log N[/math] using Jensen's inequality.
  • [Law of total expectation] Let [math]X \sim \mathrm{Geom}(p)[/math] for some parameter [math]p \in (0,1)[/math]. Calculate [math]\mathbf{E}[X][/math] using the law of total expectation.
  • [Random number of random variables] Let [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be identically distributed random variable and [math]\displaystyle{ N }[/math] be a random variable taking values in the non-negative integers and independent of the [math]\displaystyle{ X_n }[/math] for all [math]\displaystyle{ n \ge 1 }[/math]. Prove that [math]\displaystyle{ \mathbf{E}\left[\sum_{i=1}^N X_i\right] = \mathbf{E}[N] \mathbf{E}[X_1] }[/math].

Problem 2 (Distribution of random variable)

  • [Cumulative distribution function (CDF)] Let [math]\displaystyle{ X }[/math] be a random variable with cumulative distribution function [math]\displaystyle{ F }[/math].
    1. Show that [math]\displaystyle{ Y = aX+b }[/math] is a random variable where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are real constants, and express the CDF of [math]\displaystyle{ Y }[/math] by [math]\displaystyle{ F }[/math]. (Hint: Try expressing the event [math]Y=aX+b\le y[/math] by countably many set operations on the events defined on [math]X[/math].)
    2. Let [math]\displaystyle{ G }[/math] be the CDF of random variable [math]\displaystyle{ Z:\Omega\rightarrow \mathbb{R} }[/math] and [math]\displaystyle{ 0\leq \lambda \leq 1 }[/math], show that
      • [math]\displaystyle{ \lambda F + (1-\lambda)G }[/math] is a CDF function.
      • The product [math]\displaystyle{ FG }[/math] is a CDF function, and if [math]\displaystyle{ Z }[/math] and [math]\displaystyle{ X }[/math] are independent, then [math]\displaystyle{ FG }[/math] is the CDF of [math]\displaystyle{ \max\{X,Z\} }[/math].
  • [Probability mass function (PMF)] We toss [math]\displaystyle{ n }[/math] coins, and each one shows heads with probability [math]\displaystyle{ p }[/math], independently of each of the others. Each coin which shows head is tossed again. (If the coin shows tail, it won't be tossed again.) Let [math]\displaystyle{ X }[/math] be the number of heads resulting from the second round of tosses, and [math]\displaystyle{ Y }[/math] be the number of heads resulting from all tosses, which includes the first and (possible) second round of each toss.
    1. Find the PMF of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math].
    2. Find [math]\displaystyle{ \mathbf{E}[X] }[/math] and [math]\displaystyle{ \mathbf{E}[Y] }[/math].
    3. Let [math]\displaystyle{ p_X }[/math] be the PMF of [math]\displaystyle{ X }[/math], show that [math]p_X(k-1)p_X(k+1)\leq [p_X(k)]^2[/math] for [math]\displaystyle{ 1\leq k \leq n-1 }[/math].

Problem 3 (Discrete random variable)

  • [Geometric distribution (I)] Every package of some intrinsically dull commodity includes a small and exciting plastic object. There are [math]c[/math] different types of object, and each package is equally likely to contain any given type. You buy one package each day.

    1. Find the expected number of days which elapse between the acquisitions of the [math]j[/math]-th new type of object and the [math](j + 1)[/math]-th new type.
    2. Find the expected number of days which elapse before you have a full set of objects.
  • [Geometric distribution (II)] Prove that geometry distribution is the only discrete memoryless distribution with range values [math]\displaystyle{ \mathbb{N}_+ }[/math].
  • [Binomial distribution] Let [math]\displaystyle{ n_1,n_2 \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ 0 \le p \le 1 }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Bin}(n_1,p),Y \sim \mathrm{Bin}(n_2,p) }[/math] be independent random variables. Prove that [math]\displaystyle{ X+Y \sim \mathrm{Bin}(n_1+n_2,p) }[/math].
  • [Negative binomial distribution] Let [math]\displaystyle{ X }[/math] follows the negative binomial distribution with parameter [math]\displaystyle{ r \in \mathbb{N}_+ }[/math] and [math]\displaystyle{ p \in (0,1) }[/math]. Calculate [math]\displaystyle{ \mathbf{Var}[X] = \mathbf{E}[X^2] - \left(\mathbf{E}[X]\right)^2 }[/math].
  • [Hypergeometric distribution] An urn contains [math]N[/math] balls, [math]b[/math] of which are blue and [math]r = N -b[/math] of which are red. A random sample of [math]n[/math] balls is drawn without replacement (无放回) from the urn. Let [math]B[/math] the number of blue balls in this sample. Show that if [math]N, b[/math], and [math]r[/math] approach [math]+\infty[/math] in such a way that [math]b/N \rightarrow p[/math] and [math]r/N \rightarrow 1 - p[/math], then [math]\mathbf{Pr}(B = k) \rightarrow {n\choose k}p^k(1-p)^{n-k}[/math] for [math]0\leq k \leq n[/math].
  • [Poisson distribution] In your pocket is a random number [math]\displaystyle{ N }[/math] of coins, where [math]\displaystyle{ N }[/math] has the Poisson distribution with parameter [math]\displaystyle{ \lambda }[/math]. You toss each coin once, with heads showing with probability [math]\displaystyle{ p }[/math] each time. Let [math]\displaystyle{ X }[/math] be the (random) number of heads outcomes and [math]\displaystyle{ Y }[/math] be the (also random) number of tails.
    1. Find the joint mass function of [math]\displaystyle{ (X,Y) }[/math].
    2. Find PMF of the marginal distribution of [math]\displaystyle{ X }[/math] in [math]\displaystyle{ (X,Y) }[/math]. Are [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] independent?
  • [Conditional distribution (I)] Let [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] be independent [math]\displaystyle{ \text{Bin}(n, p) }[/math] random variables, and let [math]\displaystyle{ Z = X + Y }[/math]. Show that the conditional distribution of [math]\displaystyle{ X }[/math] given [math]\displaystyle{ Z = N }[/math] is the hypergeometric distribution.
  • [Conditional distribution (II)] Let [math]\displaystyle{ \lambda,\mu \gt 0 }[/math] and [math]\displaystyle{ n \in \mathbb{N} }[/math] be parameters, and [math]\displaystyle{ X \sim \mathrm{Pois}(\lambda), Y \sim \mathrm{Pois}(\mu) }[/math] be independent random variables. Find out the conditional distribution of [math]\displaystyle{ X }[/math], given [math]\displaystyle{ X+Y = n }[/math].

Problem 4 (Linearity of Expectation)

  • [Inversion] Given a sequence of [math]\displaystyle{ n }[/math] elements [math]\displaystyle{ a_1, a_2, \ldots, a_n }[/math], an inversion is a pair of integer [math]\displaystyle{ (i, j) }[/math] such that [math]\displaystyle{ 1 \le i \lt j \le n }[/math] and [math]\displaystyle{ a_i \gt a_j }[/math]. For instance, in the sequence [math]\displaystyle{ a = [1,2,5,1,3] }[/math] there are three inversions: [math]\displaystyle{ (2,4), (3,4), (3,5) }[/math]. Suppose we choose a sequence [math]\displaystyle{ \rho }[/math] from [math]\displaystyle{ [q]^n }[/math] uniformly at random, where [math]\displaystyle{ n }[/math] and [math]\displaystyle{ q }[/math] are given integers with [math]\displaystyle{ q \ge 1 }[/math]. Find the expected number of inversions in [math]\displaystyle{ \rho }[/math].
  • [Number of cycles] At a banquet, there are [math]\displaystyle{ n }[/math] people who shake hands according to the following process: In each round, two idle hands are randomly selected and shaken (these two hands are no longer idle). After [math]\displaystyle{ n }[/math] rounds, there will be no idle hands left, and the [math]\displaystyle{ n }[/math] people will form several cycles. For example, when [math]\displaystyle{ n=3 }[/math], the following situation may occur: the left and right hands of the first person are held together, the left hand of the second person and the right hand of the third person are held together, and the right hand of the second person and the left hand of the third person are held together. In this case, three people form two cycles. How many cycles are expected to be formed after [math]\displaystyle{ n }[/math] rounds?
  • [Connected vertices] Consider the perfect binary tree of depth [math]\displaystyle{ n }[/math]. Suppose that you delete each edge of the tree independently with probability [math]\displaystyle{ 1/2 }[/math]. Let [math]\displaystyle{ X }[/math] be the number of nodes which are still connected to the root after the deletion of the edges. Compute the expectation of [math]\displaystyle{ X }[/math].

Problem 5 (Probability meets graph theory)

In this part we will work on undirected simple graphs. For any [math]\displaystyle{ G = (V,E) }[/math], a clique (团,完全子图) is a subset of vertices [math]\displaystyle{ C\subset V }[/math], such that every two distinct vertices of [math]\displaystyle{ C }[/math] are adjacent, and an independent set (独立集) is a subset of vertices [math]\displaystyle{ I\subset V }[/math] such that no two distinct vertices of [math]\displaystyle{ I }[/math] are adjacent.

  • [Erdős–Rényi random graph] Consider [math]\displaystyle{ G\sim G(n,p) }[/math] where [math]\displaystyle{ G(n,p) }[/math] is the Erdős–Rényi random graph model.
    1. Let [math]\displaystyle{ p\in (0,1) }[/math]. A "triangle" in a graph is a clique of size [math]\displaystyle{ 3 }[/math]. Find the expected number of triangles in [math]\displaystyle{ G }[/math]. (Hint: use indicators and the linearity of expectation.)
    2. Let [math]\displaystyle{ p\in (0,1) }[/math]. For any [math]\displaystyle{ 2\leq q\leq n }[/math], let the random variable [math]\displaystyle{ N_q }[/math] be the number of [math]\displaystyle{ q }[/math]-cliques. Here, a [math]\displaystyle{ q }[/math]-clique is a clique of size [math]\displaystyle{ q }[/math]. Find [math]\displaystyle{ \mathbf{E}[N_q] }[/math].
    3. Let [math]\displaystyle{ p = 1/2 }[/math]. For an undirected graph [math]\displaystyle{ G }[/math], define [math]\displaystyle{ \alpha(G) = \max\{|S|:S \text{ is an independent set}\} }[/math]. Show that when [math]\displaystyle{ n\rightarrow \infty }[/math], [math]\displaystyle{ \mathbf{Pr}[\alpha(G) \geq 3\log_2 n +1] \rightarrow 0 }[/math]. Also, please interpret this result in the context of social networks, in which the vertices represent people, and the edges represent friendship.
  • [Random social networks] Let [math]\displaystyle{ G = (V, E) }[/math] be a fixed undirected graph without isolating vertex. Let [math]\displaystyle{ d_v }[/math] be the degree of vertex [math]\displaystyle{ v }[/math]. Let [math]\displaystyle{ Y }[/math] be a uniformly chosen vertex, and [math]\displaystyle{ Z }[/math] a uniformly chosen neighbor of [math]\displaystyle{ Y }[/math].
    1. Show that [math]\displaystyle{ \mathbf{E}[d_Z] \geq \mathbf{E}[d_Y] }[/math].
    2. Interpret this inequality in the context of social networks, in which the vertices represent people, and the edges represent friendship.
  • [Turán's Theorem] Let [math]\displaystyle{ G=(V,E) }[/math] be a fixed undirected graph, and write [math]\displaystyle{ d_v }[/math] for the degree of the vertex [math]\displaystyle{ v }[/math]. Use probablistic method to prove that [math]\displaystyle{ \alpha(G) \ge \sum_{v \in V} \frac{1}{d_v + 1} }[/math]. (Hint: Consider the following random procedure for generating an independent set [math]\displaystyle{ I }[/math] from a graph with vertex set [math]\displaystyle{ V }[/math]: First, generate a random permutation of the vertices, denoted as [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math]. Then, construct the independent set [math]\displaystyle{ I }[/math] as follows: For each vertex [math]\displaystyle{ v_i \in V }[/math], add [math]\displaystyle{ v_i }[/math] to [math]\displaystyle{ I }[/math] if and only if none of its predecessors in the permutation, i.e., [math]\displaystyle{ v_1,\ldots,v_{i-1} }[/math], are neighbors of [math]\displaystyle{ v_i }[/math].)

Problem 6 (1D random walk)

Let [math]\displaystyle{ p \in (0,1) }[/math] be a constant, and [math]\displaystyle{ \{X_n\}_{n \ge 1} }[/math] be independent Bernoulli trials with successful probability [math]\displaystyle{ p }[/math]. Define [math]\displaystyle{ S_n = 2\sum_{i=1}^n X_i - n }[/math] and [math]\displaystyle{ S_0 = 0 }[/math].

  • [Range of random walk] The range [math]\displaystyle{ R_n }[/math] of [math]\displaystyle{ S_0, S_1, \ldots, S_n }[/math] is defined as the number of distinct values taken by the sequence. Show that [math]\displaystyle{ \mathbf{Pr}\left(R_n = R_{n-1}+1\right) = \mathbf{Pr}\left(\forall 1 \le i \le n, S_i \neq 0\right) }[/math] as [math]\displaystyle{ n \to \infty }[/math], and deduce that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n]\to \mathbf{Pr}(\forall i \ge 1, S_i \neq 0) }[/math]. Hence show that [math]\displaystyle{ n^{-1} \mathbf{E}[R_n] \to |2p-1| }[/math] as [math]\displaystyle{ n \to \infty }[/math].
  • [Symmetric 1D random walk (IV)] Suppose [math]\displaystyle{ p=\frac{1}{2} }[/math]. Let [math]\displaystyle{ N_n }[/math] be the number of points that have been visited by [math]\displaystyle{ S }[/math] exactly once up to [math]\displaystyle{ n }[/math], that is the size of set [math]\displaystyle{ \{0 \le i \le n\mid \forall 0 \le j \le n \text{ and } j \neq i,S_i \neq S_j\} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[N_n] = 2 }[/math] for all [math]\displaystyle{ n \ge 1 }[/math].
  • [Symmetric 1D random walk (V)] Suppose [math]\displaystyle{ p = \frac{1}{2} }[/math]. Prove that [math]\displaystyle{ \mathbf{E}[|S_n|] = \Theta(\sqrt{n}) }[/math].