Conditional Expectations

The conditional expectation of a random variable $Y$ with respect to an event ${\mathcal {E}}$ is defined by

$\mathbf {E} [Y\mid {\mathcal {E}}]=\sum _{y}y\Pr[Y=y\mid {\mathcal {E}}].$

In particular, if the event ${\mathcal {E}}$ is $X=a$ , the conditional expectation $\mathbf {E} [Y\mid X=a]$ defines a function

$f(a)=\mathbf {E} [Y\mid X=a].$

Thus, $\mathbf {E} [Y\mid X]$ can be regarded as a random variable $f(X)$ .

Example
Suppose that we sample a human uniformly at random from all human beings. Let $Y$ be his/her height, and let $X$ be the country where he/she is from. For any country $a$ , $\mathbf {E} [Y\mid X=a]$ gives the average height in that country. And $\mathbf {E} [Y\mid X]$ is the random variable that can be defined in either of two ways:
• We choose a human uniformly at random from all human beings, and $\mathbf {E} [Y\mid X]$ is the average height of the country where he/she comes from.
• We choose a country at random with a probability proportional to its population, and $\mathbf {E} [Y\mid X]$ is the average height of the chosen country.
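
The two descriptions above can be checked on a toy population. The following is a minimal sketch, not part of the original notes: the population data and the helper names `conditional_expectation`, `law_via_humans` and `law_via_countries` are made up for illustration. It builds the function $f(a)=\mathbf {E} [Y\mid X=a]$ and computes the distribution of $f(X)$ under both sampling procedures.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy population: (country, height in cm). All values are made up.
population = [
    ("A", 160), ("A", 170), ("A", 180),
    ("B", 150), ("B", 170),
    ("C", 190),
]

def conditional_expectation(people):
    """Return f with f[a] = E[Y | X = a], the average height in country a."""
    heights = defaultdict(list)
    for country, height in people:
        heights[country].append(height)
    return {a: Fraction(sum(hs), len(hs)) for a, hs in heights.items()}

def law_via_humans(people, f):
    """Distribution of f(X) when a human is sampled uniformly at random."""
    dist = defaultdict(Fraction)
    for country, _ in people:
        dist[f[country]] += Fraction(1, len(people))
    return dict(dist)

def law_via_countries(people, f):
    """Distribution of f(X) when a country is sampled with probability
    proportional to its population."""
    sizes = defaultdict(int)
    for country, _ in people:
        sizes[country] += 1
    dist = defaultdict(Fraction)
    for country, size in sizes.items():
        dist[f[country]] += Fraction(size, len(people))
    return dict(dist)

f = conditional_expectation(population)
print(law_via_humans(population, f) == law_via_countries(population, f))  # True
```

Exact fractions are used so that the two distributions can be compared with `==` without rounding issues.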

The following proposition states some fundamental facts about conditional expectation.

Proposition (fundamental facts about conditional expectation)
Let $X,Y$ and $Z$ be arbitrary random variables. Let $f$ and $g$ be arbitrary functions. Then
• $\mathbf {E} [X]=\mathbf {E} [\mathbf {E} [X\mid Y]]$ .
• $\mathbf {E} [X\mid Z]=\mathbf {E} [\mathbf {E} [X\mid Y,Z]\mid Z]$ .
• $\mathbf {E} [\mathbf {E} [f(X)g(X,Y)\mid X]]=\mathbf {E} [f(X)\cdot \mathbf {E} [g(X,Y)\mid X]]$ .

The proposition can be formally verified by computing these expectations. Although the equations look formal, their intuitive interpretations are clear.
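
As an illustration of such a computation, here is a sketch of the verification of the identity $\mathbf {E} [X]=\mathbf {E} [\mathbf {E} [X\mid Y]]$ , assuming that $X$ and $Y$ are discrete and that the sums converge absolutely, so that the order of summation can be exchanged:

$\mathbf {E} [\mathbf {E} [X\mid Y]]=\sum _{y}\Pr[Y=y]\cdot \mathbf {E} [X\mid Y=y]=\sum _{y}\Pr[Y=y]\sum _{x}x\Pr[X=x\mid Y=y]=\sum _{x}x\sum _{y}\Pr[X=x,Y=y]=\sum _{x}x\Pr[X=x]=\mathbf {E} [X].$

The third equality uses $\Pr[Y=y]\Pr[X=x\mid Y=y]=\Pr[X=x,Y=y]$ , and the fourth uses the law of total probability.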

The first equation:

$\mathbf {E} [X]=\mathbf {E} [\mathbf {E} [X\mid Y]]$

says that there are two ways to compute an average. Suppose again that $X$ is the height of a uniform random human and $Y$ is the country where he/she is from. There are two ways to compute the average human height: one is to directly average over the heights of all humans; the other is to first compute the average height for each country, and then average these country averages, weighted by the populations of the countries.
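
The two ways of averaging can be compared numerically. The sketch below is only an illustration: the joint distribution `joint` and the function names `expectation_direct` and `expectation_tower` are hypothetical, and the probabilities are chosen so that $X$ and $Y$ are not independent.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr[X = x, Y = y]; the numbers are made up,
# and X and Y are deliberately not independent.
joint = {
    (1, "a"): Fraction(1, 8),
    (2, "a"): Fraction(3, 8),
    (1, "b"): Fraction(1, 4),
    (5, "b"): Fraction(1, 4),
}

def expectation_direct(joint):
    """E[X], computed by summing x * Pr[X = x, Y = y] over all outcomes."""
    return sum(x * p for (x, _), p in joint.items())

def expectation_tower(joint):
    """E[E[X | Y]] = sum over y of Pr[Y = y] * E[X | Y = y]."""
    marginal_y = {}
    for (_, y), p in joint.items():
        marginal_y[y] = marginal_y.get(y, 0) + p
    total = 0
    for y, py in marginal_y.items():
        # E[X | Y = y] uses the conditional probabilities p / Pr[Y = y].
        cond = sum(x * p / py for (x, yy), p in joint.items() if yy == y)
        total += py * cond
    return total

print(expectation_direct(joint), expectation_tower(joint))  # 19/8 19/8
```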

The second equation:

$\mathbf {E} [X\mid Z]=\mathbf {E} [\mathbf {E} [X\mid Y,Z]\mid Z]$

is the same as the first one, restricted to a particular subspace. As in the previous example, in addition to the height $X$ and the country $Y$ , let $Z$ be the sex of the individual. Thus, $\mathbf {E} [X\mid Z]$ is the average height of a human being of a given sex. Again, this can be computed either directly or on a country-by-country basis.
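
The second equation can be checked in the same spirit. In the following sketch the joint distribution over (height, country, sex) is invented for illustration, and `cond_exp`, `direct` and `country_by_country` are hypothetical helper names. For each value $z$ it compares $\mathbf {E} [X\mid Z=z]$ computed directly with the value obtained by averaging the per-country quantities $\mathbf {E} [X\mid Y=y,Z=z]$ .

```python
from fractions import Fraction

# Hypothetical joint distribution Pr[X = x, Y = y, Z = z] with
# X = height, Y = country, Z = sex. All numbers are made up.
joint = {
    (160, "A", "f"): Fraction(2, 10),
    (175, "A", "m"): Fraction(1, 10),
    (185, "A", "m"): Fraction(2, 10),
    (165, "B", "f"): Fraction(1, 10),
    (170, "B", "f"): Fraction(1, 10),
    (180, "B", "m"): Fraction(3, 10),
}

def cond_exp(value, event):
    """E[value(outcome) | event(outcome)] under the joint distribution."""
    mass = sum(p for o, p in joint.items() if event(o))
    return sum(value(o) * p for o, p in joint.items() if event(o)) / mass

def direct(z):
    """E[X | Z = z]: average the heights of everyone of sex z."""
    return cond_exp(lambda o: o[0], lambda o: o[2] == z)

def country_by_country(z):
    """E[E[X | Y, Z] | Z = z]: average the values E[X | Y = y, Z = z]
    over the countries y, conditioned on Z = z."""
    def inner(o):
        y = o[1]
        return cond_exp(lambda w: w[0], lambda w: w[1] == y and w[2] == z)
    return cond_exp(inner, lambda o: o[2] == z)

for z in ("f", "m"):
    print(z, direct(z), country_by_country(z))
```

For each sex, the two printed values agree.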

The third equation:

$\mathbf {E} [\mathbf {E} [f(X)g(X,Y)\mid X]]=\mathbf {E} [f(X)\cdot \mathbf {E} [g(X,Y)\mid X]]$

looks obscure at first glance, especially considering that $X$ and $Y$ are not necessarily independent. Nevertheless, the equation follows from the simple fact that, conditioned on any $X=a$ , the value $f(X)=f(a)$ is a constant and can therefore be safely taken outside the expectation by the linearity of expectation. For any value $X=a$ ,

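$\mathbf {E} [f(X)g(X,Y)\mid X=a]=\mathbf {E} [f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf {E} [g(X,Y)\mid X=a].$

Since this holds for every $a$ , we have $\mathbf {E} [f(X)g(X,Y)\mid X]=f(X)\cdot \mathbf {E} [g(X,Y)\mid X]$ as random variables, and taking the expectation of both sides gives the third equation.

The proposition also holds in the more general case where each of $X$ , $Y$ and $Z$ is a sequence of random variables.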
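
The third equation can also be checked on a small example with dependent $X$ and $Y$ . The sketch below is an illustration only: the joint distribution and the particular choices of `f` and `g` are arbitrary, and `cond_exp_given_x` and `outer_expectation` are hypothetical helper names.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr[X = x, Y = y]; X and Y are dependent.
joint = {
    (0, 1): Fraction(1, 6),
    (0, 2): Fraction(1, 3),
    (1, 1): Fraction(1, 4),
    (1, 3): Fraction(1, 4),
}

def f(x):
    # Arbitrary function of X alone, chosen for illustration.
    return 2 * x + 1

def g(x, y):
    # Arbitrary function of X and Y, chosen for illustration.
    return x * y + y * y

def cond_exp_given_x(h, a):
    """E[h(X, Y) | X = a] under the joint distribution."""
    mass = sum(p for (x, _), p in joint.items() if x == a)
    return sum(h(x, y) * p for (x, y), p in joint.items() if x == a) / mass

def outer_expectation(phi):
    """E[phi(X)], using the marginal distribution of X."""
    return sum(phi(x) * p for (x, _), p in joint.items())

# Left-hand side: E[ E[f(X) g(X, Y) | X] ].
lhs = outer_expectation(
    lambda a: cond_exp_given_x(lambda x, y: f(x) * g(x, y), a)
)
# Right-hand side: E[ f(X) * E[g(X, Y) | X] ].
rhs = outer_expectation(lambda a: f(a) * cond_exp_given_x(g, a))

print(lhs, rhs)  # 12 12
```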