# Conditional Expectations

The conditional expectation of a random variable $Y$ with respect to an event $\mathcal{E}$ is defined by

$\mathbf{E}[Y\mid \mathcal{E}]=\sum_{y}y\Pr[Y=y\mid\mathcal{E}].$

In particular, if the event $\mathcal{E}$ is $X=a$, the conditional expectation

$\mathbf{E}[Y\mid X=a]$

defines a function

$f(a)=\mathbf{E}[Y\mid X=a].$

Thus, $\mathbf{E}[Y\mid X]$ can be regarded as a random variable $f(X)$.

Example
Suppose that we uniformly sample a human from all human beings. Let $Y$ be his/her height, and let $X$ be the country where he/she is from. For any country $a$, $\mathbf{E}[Y\mid X=a]$ gives the average height of that country. The random variable $\mathbf{E}[Y\mid X]$ can then be defined in either of two equivalent ways:
• We choose a human uniformly at random from all human beings, and $\mathbf{E}[Y\mid X]$ is the average height of the country where he/she comes from.
• We choose a country at random with a probability proportional to its population, and $\mathbf{E}[Y\mid X]$ is the average height of the chosen country.
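
The two views can be checked on a small example. The following is a minimal sketch, assuming a made-up population of five people in two hypothetical countries `"A"` and `"B"`; the names `population`, `conditional_expectation`, `view1` and `view2` are illustrative only. It computes the function $f(a)=\mathbf{E}[Y\mid X=a]$ and then the distribution of the random variable $f(X)$ under each of the two sampling procedures.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy population: one (country, height in cm) pair per person.
population = [
    ("A", 160), ("A", 170), ("A", 180),
    ("B", 150), ("B", 170),
]


def conditional_expectation(pop):
    """Return a dict f with f[a] = E[Y | X = a] under uniform sampling from pop."""
    total = defaultdict(Fraction)  # sum of heights per country
    count = defaultdict(int)       # number of people per country
    for country, height in pop:
        total[country] += height
        count[country] += 1
    return {a: total[a] / count[a] for a in total}


f = conditional_expectation(population)

# View 1: sample a person uniformly; the value is the mean height of their country.
view1 = defaultdict(Fraction)
for country, _ in population:
    view1[f[country]] += Fraction(1, len(population))

# View 2: sample a country with probability proportional to its population.
sizes = defaultdict(int)
for country, _ in population:
    sizes[country] += 1
view2 = defaultdict(Fraction)
for country, size in sizes.items():
    view2[f[country]] += Fraction(size, len(population))

# Both procedures induce the same distribution for E[Y | X].
assert dict(view1) == dict(view2)
```

In this toy population, $f$ takes the value 170 on country `"A"` and 160 on country `"B"`, and both views assign these values probabilities $3/5$ and $2/5$ respectively.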

The following proposition states some fundamental facts about conditional expectation.

Proposition (fundamental facts about conditional expectation)
Let $X,Y$ and $Z$ be arbitrary random variables. Let $f$ and $g$ be arbitrary functions. Then
• $\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]]$.
• $\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z]$.
• $\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]]$.

The proposition can be formally verified by computing these expectations. Although these equations look formal, their intuitive interpretations are very clear.
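
As an illustration of such a computation, here is a sketch of the verification of the equation $\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]]$, assuming that $X$ and $Y$ are discrete and that the sums range over the values $y$ with $\Pr[Y=y]>0$:

$\mathbf{E}[\mathbf{E}[X\mid Y]]=\sum_{y}\Pr[Y=y]\,\mathbf{E}[X\mid Y=y]=\sum_{y}\Pr[Y=y]\sum_{x}x\Pr[X=x\mid Y=y]=\sum_{x}x\sum_{y}\Pr[X=x, Y=y]=\sum_{x}x\Pr[X=x]=\mathbf{E}[X].$

The other two equations can be checked by similar expansions of the definition.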

The first equation:

$\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]]$

says that there are two ways to compute an average. Suppose again that $X$ is the height of a uniformly random human and $Y$ is the country where he/she is from. There are two ways to compute the average human height: one is to average directly over the heights of all humans; the other is to first compute the average height of each country, and then average these country averages, weighted by the populations of the countries.
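
A minimal numerical sketch of these two ways of averaging, assuming a made-up population in two hypothetical countries (all names and numbers below are illustrative only):

```python
from fractions import Fraction

# Hypothetical toy population: one (country, height in cm) pair per person.
population = [("A", 160), ("A", 170), ("A", 180), ("B", 150), ("B", 170)]
n = len(population)

# Way 1: average directly over all individuals, i.e. E[X].
direct = Fraction(sum(h for _, h in population), n)

# Way 2: per-country averages E[X | Y = c], weighted by Pr[Y = c].
countries = sorted({c for c, _ in population})
country_means = {}
by_country = Fraction(0)
for c in countries:
    heights = [h for cc, h in population if cc == c]
    country_means[c] = Fraction(sum(heights), len(heights))  # E[X | Y = c]
    weight = Fraction(len(heights), n)                       # Pr[Y = c]
    by_country += weight * country_means[c]

# For contrast: an unweighted mean of the country averages, which is NOT E[E[X | Y]].
unweighted = sum(country_means.values()) / len(country_means)

assert direct == by_country
```

Here both `direct` and `by_country` equal 166, while the unweighted mean of the two country averages is 165. The population weights are exactly what makes the outer expectation in $\mathbf{E}[\mathbf{E}[X\mid Y]]$ agree with $\mathbf{E}[X]$.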

The second equation:

$\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z]$

is the same as the first one, restricted to a particular subspace. As in the previous example, in addition to the height $X$ and the country $Y$, let $Z$ be the sex of the individual. Then $\mathbf{E}[X\mid Z]$ is the average height of a human being of a given sex. Again, this can be computed either directly or on a country-by-country basis.
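
The same check can be carried out within each value of $Z$. The sketch below assumes a made-up population with a country, a sex and a height per person; the helper names `direct` and `country_by_country` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy population: one (country, sex, height in cm) triple per person.
population = [
    ("A", "F", 160), ("A", "F", 164), ("A", "M", 176),
    ("B", "F", 158), ("B", "M", 170), ("B", "M", 174),
]


def mean(values):
    """Exact arithmetic mean of a non-empty list of integers."""
    return Fraction(sum(values), len(values))


def direct(z):
    """E[X | Z = z]: average height over everyone of sex z."""
    return mean([h for _, s, h in population if s == z])


def country_by_country(z):
    """E[ E[X | Y, Z] | Z = z ]: per-country means within sex z,
    weighted by Pr[Y = c | Z = z]."""
    groups = defaultdict(list)
    for c, s, h in population:
        if s == z:
            groups[c].append(h)
    total = sum(len(hs) for hs in groups.values())
    return sum(Fraction(len(hs), total) * mean(hs) for hs in groups.values())


for z in ("F", "M"):
    assert direct(z) == country_by_country(z)
```

Note that the weights in the inner sum are the conditional probabilities $\Pr[Y=c\mid Z=z]$, not the overall population shares of the countries.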

The third equation:

$\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]]$

looks obscure at first glance, especially considering that $X$ and $Y$ are not necessarily independent. Nevertheless, the equation follows from the simple fact that, conditioned on any $X=a$, the function value $f(X)=f(a)$ becomes a constant, and thus can be safely taken outside the expectation by the linearity of expectation. For any value $X=a$,

$\mathbf{E}[f(X)g(X,Y)\mid X=a]=\mathbf{E}[f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf{E}[g(X,Y)\mid X=a].$
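
This can be checked numerically on a joint distribution in which $X$ and $Y$ are dependent. The following is a minimal sketch; the distribution `joint` and the particular choices of `f` and `g` are arbitrary assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr[X = x, Y = y]; X and Y are dependent.
joint = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(1, 4),
}


def f(x):
    return 2 * x + 1  # arbitrary function of X alone


def g(x, y):
    return x + 3 * y  # arbitrary function of X and Y


def marginal_x(a):
    """Pr[X = a]."""
    return sum(p for (x, _), p in joint.items() if x == a)


def cond_exp(h, a):
    """E[h(X, Y) | X = a], computed from the joint distribution."""
    return sum(h(x, y) * p for (x, y), p in joint.items() if x == a) / marginal_x(a)


def fg(x, y):
    return f(x) * g(x, y)


xs = sorted({x for x, _ in joint})

# Pointwise identity: f(a) factors out of the conditional expectation.
for a in xs:
    assert cond_exp(fg, a) == f(a) * cond_exp(g, a)

# Averaging over X yields the two sides of the third equation.
lhs = sum(marginal_x(a) * cond_exp(fg, a) for a in xs)
rhs = sum(marginal_x(a) * f(a) * cond_exp(g, a) for a in xs)
assert lhs == rhs
```

The pointwise assertion inside the loop is the displayed identity for each value $a$; the equality of `lhs` and `rhs` then follows by averaging over the distribution of $X$.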

The proposition also holds in the more general case where each of $X$, $Y$ and $Z$ is a sequence of random variables.