# Review of conditional expectations

The conditional expectation of a random variable ${\displaystyle Y}$ with respect to an event ${\displaystyle {\mathcal {E}}}$ is defined by

${\displaystyle \mathbf {E} [Y\mid {\mathcal {E}}]=\sum _{y}y\Pr[Y=y\mid {\mathcal {E}}].}$

In particular, if the event ${\displaystyle {\mathcal {E}}}$ is ${\displaystyle X=a}$, the conditional expectation

${\displaystyle \mathbf {E} [Y\mid X=a]}$

defines a function

${\displaystyle f(a)=\mathbf {E} [Y\mid X=a].}$

Thus, ${\displaystyle \mathbf {E} [Y\mid X]}$ can be regarded as a random variable ${\displaystyle f(X)}$.

Example
Suppose that we sample a human uniformly at random from all human beings. Let ${\displaystyle Y}$ be their height, and let ${\displaystyle X}$ be the country they are from. For any country ${\displaystyle a}$, ${\displaystyle \mathbf {E} [Y\mid X=a]}$ gives the average height in that country. And ${\displaystyle \mathbf {E} [Y\mid X]}$ is the random variable that can be defined in either of the following ways:
• We choose a human uniformly at random from all human beings, and ${\displaystyle \mathbf {E} [Y\mid X]}$ is the average height of the country where he/she comes from.
• We choose a country at random with a probability proportional to its population, and ${\displaystyle \mathbf {E} [Y\mid X]}$ is the average height of the chosen country.
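The two equivalent views can be sketched in code. The toy example below uses made-up data (the countries "A" and "B" and the heights are hypothetical):

```python
import random
from collections import defaultdict

# Toy population: (country, height_cm) pairs -- purely illustrative data.
population = [("A", 170), ("A", 180), ("A", 175),
              ("B", 160), ("B", 165)]

# f(a) = E[Y | X = a]: the average height within each country.
groups = defaultdict(list)
for country, height in population:
    groups[country].append(height)
f = {a: sum(hs) / len(hs) for a, hs in groups.items()}

# E[Y | X] as a random variable: sample a uniform person and report f(country).
# Equivalently, sample a country with probability proportional to population.
person = random.choice(population)
cond_exp = f[person[0]]
print(f)         # {'A': 175.0, 'B': 162.5}
print(cond_exp)  # either 175.0 or 162.5, depending on the sampled person
```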

The following proposition states some fundamental facts about conditional expectation.

 Proposition (fundamental facts about conditional expectation) Let ${\displaystyle X,Y}$ and ${\displaystyle Z}$ be arbitrary random variables. Let ${\displaystyle f}$ and ${\displaystyle g}$ be arbitrary functions. Then
• ${\displaystyle \mathbf {E} [X]=\mathbf {E} [\mathbf {E} [X\mid Y]]}$;
• ${\displaystyle \mathbf {E} [X\mid Z]=\mathbf {E} [\mathbf {E} [X\mid Y,Z]\mid Z]}$;
• ${\displaystyle \mathbf {E} [\mathbf {E} [f(X)g(X,Y)\mid X]]=\mathbf {E} [f(X)\cdot \mathbf {E} [g(X,Y)\mid X]]}$.

The proposition can be formally verified by directly computing these expectations. Although the equations look formal, their intuitive interpretations are very clear.

The first equation:

${\displaystyle \mathbf {E} [X]=\mathbf {E} [\mathbf {E} [X\mid Y]]}$

says that there are two ways to compute an average. Suppose again that ${\displaystyle X}$ is the height of a uniformly random human and ${\displaystyle Y}$ is the country they are from. There are two ways to compute the average human height: one is to average directly over the heights of all humans; the other is to first compute the average height for each country, and then average these heights weighted by the populations of the countries.
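This two-way computation can be checked exactly on a toy example (the data below is made up for illustration):

```python
from collections import defaultdict

# Illustrative toy data: (country, height) pairs.
population = [("A", 170), ("A", 180), ("A", 175),
              ("B", 160), ("B", 165)]
n = len(population)

# Way 1: average directly over all people -- E[X].
direct = sum(h for _, h in population) / n

# Way 2: per-country averages, weighted by each country's population
# share -- E[E[X | Y]].
groups = defaultdict(list)
for c, h in population:
    groups[c].append(h)
weighted = sum((len(hs) / n) * (sum(hs) / len(hs)) for hs in groups.values())

print(direct, weighted)  # 170.0 170.0
```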

The second equation:

${\displaystyle \mathbf {E} [X\mid Z]=\mathbf {E} [\mathbf {E} [X\mid Y,Z]\mid Z]}$

is the same as the first one, restricted to a particular subspace. As in the previous example, in addition to the height ${\displaystyle X}$ and the country ${\displaystyle Y}$, let ${\displaystyle Z}$ be the gender of the individual. Then ${\displaystyle \mathbf {E} [X\mid Z]}$ is the average height of a human being of a given gender. Again, this can be computed either directly or on a country-by-country basis.

The third equation:

${\displaystyle \mathbf {E} [\mathbf {E} [f(X)g(X,Y)\mid X]]=\mathbf {E} [f(X)\cdot \mathbf {E} [g(X,Y)\mid X]]}$

looks obscure at first glance, especially considering that ${\displaystyle X}$ and ${\displaystyle Y}$ are not necessarily independent. Nevertheless, the equation follows from the simple fact that, conditioning on any ${\displaystyle X=a}$, the function value ${\displaystyle f(X)=f(a)}$ becomes a constant and thus can be safely taken outside the expectation by the linearity of expectation. For any value ${\displaystyle X=a}$,

${\displaystyle \mathbf {E} [f(X)g(X,Y)\mid X=a]=\mathbf {E} [f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf {E} [g(X,Y)\mid X=a].}$
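The identity can also be verified numerically. The joint distribution and the functions `f` and `g` below are arbitrary choices for illustration; note that ${\displaystyle X}$ and ${\displaystyle Y}$ are dependent here:

```python
# A small joint pmf of (X, Y) where X and Y are dependent.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
f = lambda x: 2 * x + 1
g = lambda x, y: x * y + y

def cond_exp(h, a):
    """E[h(X, Y) | X = a] under the joint pmf."""
    pa = sum(p for (x, _), p in joint.items() if x == a)
    return sum(p * h(x, y) for (x, y), p in joint.items() if x == a) / pa

# Marginal distribution of X, used to average the conditional expectations.
px = {a: sum(p for (x, _), p in joint.items() if x == a) for a in (0, 1)}

# Left side: E[ E[f(X)g(X,Y) | X] ].  Right side: E[ f(X) * E[g(X,Y) | X] ].
lhs = sum(px[a] * cond_exp(lambda x, y: f(x) * g(x, y), a) for a in (0, 1))
rhs = sum(px[a] * f(a) * cond_exp(g, a) for a in (0, 1))
print(lhs, rhs)  # 1.5 1.5
```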

The proposition also holds in more general cases, when ${\displaystyle X,Y}$ and ${\displaystyle Z}$ are sequences of random variables.

# Martingales

"Martingale" originally refers to a betting strategy in which the gambler doubles his bet after every loss. Assuming unlimited wealth, this strategy is guaranteed to eventually have a positive net profit. For example, starting from an initial stake 1, after ${\displaystyle n}$ losses, if the ${\displaystyle (n+1)}$th bet wins, then it gives a net profit of

${\displaystyle 2^{n}-\sum _{i=1}^{n}2^{i-1}=1,}$

which is a positive number.
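This arithmetic can be checked directly (a small sketch; `net_profit` is just an illustrative helper name):

```python
# Net profit of the doubling strategy after n losses followed by one win,
# starting from an initial stake of 1.
def net_profit(n):
    losses = sum(2 ** (i - 1) for i in range(1, n + 1))  # total stakes lost
    win = 2 ** n  # the (n+1)th bet stakes 2^n and wins that amount
    return win - losses

print([net_profit(n) for n in range(5)])  # [1, 1, 1, 1, 1]
```

The geometric sum of the losses always falls exactly one unit short of the winning stake, which is the whole point of the strategy.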

However, the assumption of unlimited wealth is unrealistic. With limited wealth and geometrically increasing bets, the gambler is very likely to end up bankrupt. You should never try this strategy in real life. And remember: gambling is bad!

Suppose that the gambler is allowed to use any strategy: his stake on the next bet is decided based on the results of all the bets so far. This gives us a highly dependent sequence of random variables ${\displaystyle X_{0},X_{1},\ldots ,}$, where ${\displaystyle X_{0}}$ is his initial capital, and ${\displaystyle X_{i}}$ represents his capital after the ${\displaystyle i}$th bet. Up to different betting strategies, ${\displaystyle X_{i}}$ can depend arbitrarily on ${\displaystyle X_{0},\ldots ,X_{i-1}}$. However, as long as the game is fair, namely winning and losing with equal chances, conditioning on the past variables ${\displaystyle X_{0},\ldots ,X_{i-1}}$, we expect no change in the value of the present variable ${\displaystyle X_{i}}$ on average. Random variables satisfying this property are called a martingale sequence.

 Definition (martingale) A sequence of random variables ${\displaystyle X_{0},X_{1},\ldots }$ is a martingale if for all ${\displaystyle i>0}$, ${\displaystyle \mathbf {E} [X_{i}\mid X_{0},\ldots ,X_{i-1}]=X_{i-1}.}$
Example (coin flips)
A fair coin is flipped for a number of times. Let ${\displaystyle Z_{j}\in \{-1,1\}}$ denote the outcome of the ${\displaystyle j}$th flip. Let
${\displaystyle X_{0}=0\quad {\mbox{ and }}\quad X_{i}=\sum _{j\leq i}Z_{j}}$.
The random variables ${\displaystyle X_{0},X_{1},\ldots }$ define a martingale.
Proof
We first observe that ${\displaystyle \mathbf {E} [X_{i}\mid X_{0},\ldots ,X_{i-1}]=\mathbf {E} [X_{i}\mid X_{i-1}]}$, which intuitively says that the distribution of the next partial sum depends only on the current partial sum. This property is also called the Markov property in stochastic processes.
${\displaystyle {\begin{aligned}\mathbf {E} [X_{i}\mid X_{0},\ldots ,X_{i-1}]&=\mathbf {E} [X_{i}\mid X_{i-1}]\\&=\mathbf {E} [X_{i-1}+Z_{i}\mid X_{i-1}]\\&=\mathbf {E} [X_{i-1}\mid X_{i-1}]+\mathbf {E} [Z_{i}\mid X_{i-1}]\\&=X_{i-1}+\mathbf {E} [Z_{i}\mid X_{i-1}]\\&=X_{i-1}+\mathbf {E} [Z_{i}]&\quad ({\mbox{independence of coin flips}})\\&=X_{i-1}\end{aligned}}}$
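As a sanity check, the martingale property can be verified exactly by enumerating all flip sequences of a fixed length and grouping them by the value of ${\displaystyle X_{i-1}}$ (which suffices by the Markov observation above); the choice ${\displaystyle i=4}$ is arbitrary:

```python
from itertools import product
from collections import defaultdict

# Exact check of the martingale property for i = 4: enumerate all 2^i
# equally likely flip sequences, group them by the value of X_{i-1},
# and average X_i within each group.
i = 4
groups = defaultdict(list)
for flips in product([-1, 1], repeat=i):
    x_prev = sum(flips[: i - 1])  # X_{i-1}
    x_i = sum(flips)              # X_i
    groups[x_prev].append(x_i)

for x_prev, values in sorted(groups.items()):
    avg = sum(values) / len(values)
    print(x_prev, avg)  # the group average equals x_prev in every group
```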
Example (Polya's urn scheme)
Consider an urn (just a container) that initially contains ${\displaystyle b}$ black balls and ${\displaystyle w}$ white balls. At each step, we uniformly select a ball from the urn, and replace it with ${\displaystyle c}$ balls of the same color. Let ${\displaystyle X_{0}=b/(b+w)}$, and let ${\displaystyle X_{i}}$ be the fraction of black balls in the urn after the ${\displaystyle i}$th step. The sequence ${\displaystyle X_{0},X_{1},\ldots }$ is a martingale.
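A one-step calculation makes the martingale property plausible: from any urn state, the expected black fraction after one draw equals the current fraction. The sketch below checks this exactly with `fractions.Fraction`; the state ${\displaystyle b=3,w=5,c=2}$ is an arbitrary example:

```python
from fractions import Fraction

# One-step check of the martingale property for Polya's urn: from a state
# with b black and w white balls, the drawn ball is replaced by c balls of
# the same color, so the urn size goes from b + w to b + w - 1 + c.
def expected_next_fraction(b, w, c):
    total = b + w
    p_black = Fraction(b, total)
    frac_if_black = Fraction(b - 1 + c, total - 1 + c)  # black ball drawn
    frac_if_white = Fraction(b, total - 1 + c)          # white ball drawn
    return p_black * frac_if_black + (1 - p_black) * frac_if_white

b, w, c = 3, 5, 2
print(expected_next_fraction(b, w, c) == Fraction(b, b + w))  # True
```

Exact rational arithmetic avoids any floating-point doubt: the expected next fraction is identically ${\displaystyle b/(b+w)}$.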
Example (edge exposure in a random graph)
Consider a random graph ${\displaystyle G}$ generated as follows. Let ${\displaystyle [n]}$ be the set of vertices, and let ${\displaystyle {[n] \choose 2}}$ be the set of all ${\displaystyle m={n \choose 2}}$ potential edges. For convenience, we enumerate these potential edges by ${\displaystyle e_{1},\ldots ,e_{m}}$. For each potential edge ${\displaystyle e_{j}}$, we independently flip a fair coin to decide whether the edge ${\displaystyle e_{j}}$ appears in ${\displaystyle G}$. Let ${\displaystyle I_{j}}$ be the random variable that indicates whether ${\displaystyle e_{j}\in G}$. We are interested in some graph-theoretic parameter of the random graph ${\displaystyle G}$, say its chromatic number. Let ${\displaystyle \chi (G)}$ be the chromatic number of ${\displaystyle G}$. Let ${\displaystyle X_{0}=\mathbf {E} [\chi (G)]}$, and for each ${\displaystyle i\geq 1}$, let ${\displaystyle X_{i}=\mathbf {E} [\chi (G)\mid I_{1},\ldots ,I_{i}]}$, namely, the expected chromatic number of the random graph after fixing the first ${\displaystyle i}$ edges. This process is called the edge exposure of a random graph, as we "expose" the potential edges one by one.
The sequence ${\displaystyle X_{0},X_{1},\ldots ,X_{m}}$ is a martingale. In particular, ${\displaystyle X_{0}=\mathbf {E} [\chi (G)]}$, and ${\displaystyle X_{m}=\chi (G)}$. The martingale ${\displaystyle X_{0},X_{1},\ldots ,X_{m}}$ moves from no information to full information (of the random graph ${\displaystyle G}$) in small steps.

It is nontrivial to formally verify that the edge exposure sequence of a random graph is a martingale. However, we will later see that this construction can be put into a more general context.
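That said, for a very small graph the martingale property can be checked by brute force: on ${\displaystyle n=4}$ vertices there are only ${\displaystyle 2^{6}}$ graphs, so every conditional expectation can be computed exactly. The sketch below is purely illustrative and not part of the formal argument:

```python
from itertools import combinations, product

# Brute-force check of the edge exposure martingale on n = 4 vertices.
n = 4
edges = list(combinations(range(n), 2))  # the m = 6 potential edges
m = len(edges)

def chromatic_number(present):
    """Smallest k such that the graph with the indicated edges is k-colorable."""
    es = [e for e, p in zip(edges, present) if p]
    for k in range(1, n + 1):
        for coloring in product(range(k), repeat=n):
            if all(coloring[u] != coloring[v] for u, v in es):
                return k
    return n

graphs = list(product([0, 1], repeat=m))       # all 2^m edge indicator vectors
chi = {g: chromatic_number(g) for g in graphs}

def X(i, prefix):
    """X_i = E[chi(G) | I_1..I_i = prefix]: average over consistent graphs."""
    consistent = [g for g in graphs if g[:i] == prefix]
    return sum(chi[g] for g in consistent) / len(consistent)

# Martingale property: since I_i is a fair coin independent of the prefix,
# averaging X_i over its two possible values must give back X_{i-1}.
ok = all(
    abs((X(i, p + (0,)) + X(i, p + (1,))) / 2 - X(i - 1, p)) < 1e-12
    for i in range(1, m + 1)
    for p in product([0, 1], repeat=i - 1)
)
print(ok)  # True
```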