Difference between revisions of "高级算法 (Fall 2019)/Concentration of measure"
(Created page with "= Conditional Expectations = The '''conditional expectation''' of a random variable <math>Y</math> with respect to an event <math>\mathcal{E}</math> is defined by :<math> \mat...") 
(No difference)

Revision as of 05:52, 8 October 2019
Conditional Expectations
The conditional expectation of a random variable [math]Y[/math] with respect to an event [math]\mathcal{E}[/math] is defined by
 [math] \mathbf{E}[Y\mid \mathcal{E}]=\sum_{y}y\Pr[Y=y\mid\mathcal{E}]. [/math]
In particular, if the event [math]\mathcal{E}[/math] is [math]X=a[/math], the conditional expectation
 [math] \mathbf{E}[Y\mid X=a] [/math]
defines a function
 [math] f(a)=\mathbf{E}[Y\mid X=a]. [/math]
Thus, [math]\mathbf{E}[Y\mid X][/math] can be regarded as a random variable [math]f(X)[/math].
 Example
 Suppose that we uniformly sample a human from all human beings. Let [math]Y[/math] be his/her height, and let [math]X[/math] be the country where he/she is from. For any country [math]a[/math], [math]\mathbf{E}[Y\mid X=a][/math] gives the average height of that country. And [math]\mathbf{E}[Y\mid X][/math] is the random variable which can be defined in either ways:
 We choose a human uniformly at random from all human beings, and [math]\mathbf{E}[Y\mid X][/math] is the average height of the country where he/she comes from.
 We choose a country at random with a probability proportional to its population, and [math]\mathbf{E}[Y\mid X][/math] is the average height of the chosen country.
The following proposition states some fundamental facts about conditional expectation.
Proposition (fundamental facts about conditional expectation)  Let [math]X,Y[/math] and [math]Z[/math] be arbitrary random variables. Let [math]f[/math] and [math]g[/math] be arbitrary functions. Then
 [math]\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]][/math].
 [math]\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z][/math].
 [math]\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]][/math].
 Let [math]X,Y[/math] and [math]Z[/math] be arbitrary random variables. Let [math]f[/math] and [math]g[/math] be arbitrary functions. Then
The proposition can be formally verified by computing these expectations. Although these equations look formal, the intuitive interpretations to them are very clear.
The first equation:
 [math]\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]][/math]
says that there are two ways to compute an average. Suppose again that [math]X[/math] is the height of a uniform random human and [math]Y[/math] is the country where he/she is from. There are two ways to compute the average human height: one is to directly average over the heights of all humans; the other is that first compute the average height for each country, and then average over these heights weighted by the populations of the countries.
The second equation:
 [math]\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z][/math]
is the same as the first one, restricted to a particular subspace. As the previous example, inaddition to the height [math]X[/math] and the country [math]Y[/math], let [math]Z[/math] be the gender of the individual. Thus, [math]\mathbf{E}[X\mid Z][/math] is the average height of a human being of a given sex. Again, this can be computed either directly or on a countrybycountry basis.
The third equation:
 [math]\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]][/math].
looks obscure at the first glance, especially when considering that [math]X[/math] and [math]Y[/math] are not necessarily independent. Nevertheless, the equation follows the simple fact that conditioning on any [math]X=a[/math], the function value [math]f(X)=f(a)[/math] becomes a constant, thus can be safely taken outside the expectation due to the linearity of expectation. For any value [math]X=a[/math],
 [math] \mathbf{E}[f(X)g(X,Y)\mid X=a]=\mathbf{E}[f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf{E}[g(X,Y)\mid X=a]. [/math]
The proposition holds in more general cases when [math]X, Y[/math] and [math]Z[/math] are a sequence of random variables.