# Conditional Expectations

The conditional expectation of a random variable ${\displaystyle Y}$ with respect to an event ${\displaystyle {\mathcal {E}}}$ is defined by

${\displaystyle \mathbf {E} [Y\mid {\mathcal {E}}]=\sum _{y}y\Pr[Y=y\mid {\mathcal {E}}].}$

In particular, if the event ${\displaystyle {\mathcal {E}}}$ is ${\displaystyle X=a}$, the conditional expectation

${\displaystyle \mathbf {E} [Y\mid X=a]}$

defines a function

${\displaystyle f(a)=\mathbf {E} [Y\mid X=a].}$

Thus, ${\displaystyle \mathbf {E} [Y\mid X]}$ can be regarded as a random variable ${\displaystyle f(X)}$.
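
For discrete random variables, this construction can be sketched in a few lines of Python. The joint distribution below is made up for illustration, and the names `joint`, `conditional_expectation` and `f` are hypothetical helpers, not part of the text. Exact fractions are used so that the values are not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr[X = a, Y = y]; all numbers are invented.
joint = {
    ("A", 160): Fraction(1, 8),
    ("A", 170): Fraction(3, 8),
    ("B", 170): Fraction(1, 4),
    ("B", 180): Fraction(1, 4),
}

def conditional_expectation(joint, a):
    """Return E[Y | X = a] = sum_y y * Pr[Y = y | X = a]."""
    pr_a = sum(p for (x, _), p in joint.items() if x == a)  # Pr[X = a]
    return sum(y * p / pr_a for (x, y), p in joint.items() if x == a)

# Tabulate f(a) = E[Y | X = a] for every value a that X can take.
f = {a: conditional_expectation(joint, a) for a in {x for x, _ in joint}}

for a in sorted(f):
    print(a, f[a])
```

The random variable ${\displaystyle \mathbf {E} [Y\mid X]}$ then corresponds to sampling ${\displaystyle X}$ and looking up `f[X]`.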

Example
Suppose that we sample a human uniformly at random from all human beings. Let ${\displaystyle Y}$ be his/her height, and let ${\displaystyle X}$ be the country where he/she is from. For any country ${\displaystyle a}$, ${\displaystyle \mathbf {E} [Y\mid X=a]}$ gives the average height of that country. And ${\displaystyle \mathbf {E} [Y\mid X]}$ is the random variable which can be defined in either of two equivalent ways:
• We choose a human uniformly at random from all human beings, and ${\displaystyle \mathbf {E} [Y\mid X]}$ is the average height of the country where he/she comes from.
• We choose a country at random with a probability proportional to its population, and ${\displaystyle \mathbf {E} [Y\mid X]}$ is the average height of the chosen country.

The following proposition states some fundamental facts about conditional expectation.

 Proposition (fundamental facts about conditional expectation)
 Let ${\displaystyle X,Y}$ and ${\displaystyle Z}$ be arbitrary random variables. Let ${\displaystyle f}$ and ${\displaystyle g}$ be arbitrary functions. Then
• ${\displaystyle \mathbf {E} [X]=\mathbf {E} [\mathbf {E} [X\mid Y]]}$.
• ${\displaystyle \mathbf {E} [X\mid Z]=\mathbf {E} [\mathbf {E} [X\mid Y,Z]\mid Z]}$.
• ${\displaystyle \mathbf {E} [\mathbf {E} [f(X)g(X,Y)\mid X]]=\mathbf {E} [f(X)\cdot \mathbf {E} [g(X,Y)\mid X]]}$.

The proposition can be formally verified by computing these expectations. Although these equations look formal, their intuitive interpretations are very clear.

The first equation:

${\displaystyle \mathbf {E} [X]=\mathbf {E} [\mathbf {E} [X\mid Y]]}$

says that there are two ways to compute an average. Suppose again that ${\displaystyle X}$ is the height of a uniform random human and ${\displaystyle Y}$ is the country where he/she is from. There are two ways to compute the average human height: one is to average directly over the heights of all humans; the other is to first compute the average height for each country, and then average over these heights weighted by the populations of the countries.
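
The two ways of averaging can be checked on a toy example. The following is a minimal sketch, assuming a made-up population of five individuals; the names `population`, `direct` and `weighted` are illustrative only.

```python
from fractions import Fraction

# Hypothetical toy population: (country Y, height X in cm) per individual.
population = [("A", 160), ("A", 170), ("A", 180), ("B", 150), ("B", 170)]

# Way 1: average directly over all individuals, i.e. E[X].
direct = Fraction(sum(h for _, h in population), len(population))

# Way 2: per-country averages E[X | Y = c], weighted by Pr[Y = c].
weighted = Fraction(0)
for c in {country for country, _ in population}:
    heights = [h for country, h in population if country == c]
    avg = Fraction(sum(heights), len(heights))        # E[X | Y = c]
    weight = Fraction(len(heights), len(population))  # Pr[Y = c]
    weighted += weight * avg

print(direct, weighted)
```

Both quantities agree because weighting each country's average by its share of the population counts every individual exactly once.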

The second equation:

${\displaystyle \mathbf {E} [X\mid Z]=\mathbf {E} [\mathbf {E} [X\mid Y,Z]\mid Z]}$

is the same as the first one, restricted to a particular subspace. As in the previous example, in addition to the height ${\displaystyle X}$ and the country ${\displaystyle Y}$, let ${\displaystyle Z}$ be the sex of the individual. Thus, ${\displaystyle \mathbf {E} [X\mid Z]}$ is the average height of a human being of a given sex. Again, this can be computed either directly or on a country-by-country basis.
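
As a rough illustration, the same check can be repeated within each value of ${\displaystyle Z}$. The sketch below assumes an invented population of six individuals sampled uniformly; `people`, `direct` and `by_country` are hypothetical names.

```python
from fractions import Fraction

# Hypothetical toy population: (height X, country Y, sex Z).
people = [
    (160, "A", "F"), (170, "A", "F"), (180, "A", "M"),
    (150, "B", "F"), (170, "B", "M"), (190, "B", "M"),
]

def mean(values):
    return Fraction(sum(values), len(values))

def direct(z):
    """E[X | Z = z]: average over everyone with sex z."""
    return mean([x for x, _, s in people if s == z])

def by_country(z):
    """E[E[X | Y, Z] | Z = z]: per-country averages within sex z,
    weighted by Pr[Y = c | Z = z]."""
    group = [(x, y) for x, y, s in people if s == z]
    total = Fraction(0)
    for c in {y for _, y in group}:
        heights = [x for x, y in group if y == c]
        total += Fraction(len(heights), len(group)) * mean(heights)
    return total

for z in ("F", "M"):
    print(z, direct(z), by_country(z))
```

Note that the country weights are the conditional probabilities given ${\displaystyle Z=z}$, not the overall country populations.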

The third equation:

${\displaystyle \mathbf {E} [\mathbf {E} [f(X)g(X,Y)\mid X]]=\mathbf {E} [f(X)\cdot \mathbf {E} [g(X,Y)\mid X]]}$

looks obscure at first glance, especially considering that ${\displaystyle X}$ and ${\displaystyle Y}$ are not necessarily independent. Nevertheless, the equation follows from the simple fact that, conditioning on any ${\displaystyle X=a}$, the function value ${\displaystyle f(X)=f(a)}$ becomes a constant and thus can be safely taken outside the expectation by the linearity of expectation. For any value ${\displaystyle X=a}$,

${\displaystyle \mathbf {E} [f(X)g(X,Y)\mid X=a]=\mathbf {E} [f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf {E} [g(X,Y)\mid X=a].}$
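
The identity can also be checked numerically when ${\displaystyle X}$ and ${\displaystyle Y}$ are dependent. The following is a sketch under stated assumptions: the joint distribution and the particular choices of `f` and `g` are arbitrary, made up for illustration, and `cond_exp` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr[X = x, Y = y]; X and Y are dependent.
joint = {
    (1, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}

def f(x):
    return x * x        # arbitrary illustrative choice of f

def g(x, y):
    return x + 3 * y    # arbitrary illustrative choice of g

# Marginal distribution Pr[X = a].
marginal = {}
for (x, _), p in joint.items():
    marginal[x] = marginal.get(x, Fraction(0)) + p

def cond_exp(h, a):
    """E[h(X, Y) | X = a] under the joint distribution above."""
    return sum(h(x, y) * p for (x, y), p in joint.items() if x == a) / marginal[a]

# Left side: E[ E[f(X) g(X, Y) | X] ].
lhs = sum(p * cond_exp(lambda x, y: f(x) * g(x, y), a) for a, p in marginal.items())
# Right side: E[ f(X) * E[g(X, Y) | X] ], with f(a) pulled out as a constant.
rhs = sum(p * f(a) * cond_exp(g, a) for a, p in marginal.items())

print(lhs, rhs)
```

For each fixed `a`, the inner conditional expectations on both sides coincide, which is exactly the pointwise equation above.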

The proposition also holds in the more general case where each of ${\displaystyle X,Y}$ and ${\displaystyle Z}$ is a sequence of random variables.