Difference between revisions of "高级算法 (Fall 2019)/Concentration of measure"

From EtoneWiki
Jump to: navigation, search
(Created page with "= Conditional Expectations = The '''conditional expectation''' of a random variable <math>Y</math> with respect to an event <math>\mathcal{E}</math> is defined by :<math> \mat...")
(No difference)

Revision as of 05:52, 8 October 2019

Conditional Expectations

The conditional expectation of a random variable [math]Y[/math] with respect to an event [math]\mathcal{E}[/math] is defined by

[math] \mathbf{E}[Y\mid \mathcal{E}]=\sum_{y}y\Pr[Y=y\mid\mathcal{E}]. [/math]

In particular, if the event [math]\mathcal{E}[/math] is [math]X=a[/math], the conditional expectation

[math] \mathbf{E}[Y\mid X=a] [/math]

defines a function

[math] f(a)=\mathbf{E}[Y\mid X=a]. [/math]

Thus, [math]\mathbf{E}[Y\mid X][/math] can be regarded as a random variable [math]f(X)[/math].

Example
Suppose that we uniformly sample a human from all human beings. Let [math]Y[/math] be his/her height, and let [math]X[/math] be the country where he/she is from. For any country [math]a[/math], [math]\mathbf{E}[Y\mid X=a][/math] gives the average height of that country. And [math]\mathbf{E}[Y\mid X][/math] is the random variable which can be defined in either ways:
  • We choose a human uniformly at random from all human beings, and [math]\mathbf{E}[Y\mid X][/math] is the average height of the country where he/she comes from.
  • We choose a country at random with a probability proportional to its population, and [math]\mathbf{E}[Y\mid X][/math] is the average height of the chosen country.

The following proposition states some fundamental facts about conditional expectation.

Proposition (fundamental facts about conditional expectation)
Let [math]X,Y[/math] and [math]Z[/math] be arbitrary random variables. Let [math]f[/math] and [math]g[/math] be arbitrary functions. Then
  1. [math]\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]][/math].
  2. [math]\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z][/math].
  3. [math]\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]][/math].

The proposition can be formally verified by computing these expectations. Although these equations look formal, the intuitive interpretations to them are very clear.

The first equation:

[math]\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]][/math]

says that there are two ways to compute an average. Suppose again that [math]X[/math] is the height of a uniform random human and [math]Y[/math] is the country where he/she is from. There are two ways to compute the average human height: one is to directly average over the heights of all humans; the other is that first compute the average height for each country, and then average over these heights weighted by the populations of the countries.

The second equation:

[math]\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z][/math]

is the same as the first one, restricted to a particular subspace. As the previous example, inaddition to the height [math]X[/math] and the country [math]Y[/math], let [math]Z[/math] be the gender of the individual. Thus, [math]\mathbf{E}[X\mid Z][/math] is the average height of a human being of a given sex. Again, this can be computed either directly or on a country-by-country basis.

The third equation:

[math]\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]][/math].

looks obscure at the first glance, especially when considering that [math]X[/math] and [math]Y[/math] are not necessarily independent. Nevertheless, the equation follows the simple fact that conditioning on any [math]X=a[/math], the function value [math]f(X)=f(a)[/math] becomes a constant, thus can be safely taken outside the expectation due to the linearity of expectation. For any value [math]X=a[/math],

[math] \mathbf{E}[f(X)g(X,Y)\mid X=a]=\mathbf{E}[f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf{E}[g(X,Y)\mid X=a]. [/math]

The proposition holds in more general cases when [math]X, Y[/math] and [math]Z[/math] are a sequence of random variables.