# Conditional Expectations

The conditional expectation of a random variable $Y$ with respect to an event $\mathcal{E}$ is defined by

$\mathbf{E}[Y\mid \mathcal{E}]=\sum_{y}y\Pr[Y=y\mid\mathcal{E}].$

In particular, if the event $\mathcal{E}$ is $X=a$, the conditional expectation

$\mathbf{E}[Y\mid X=a]$

defines a function

$f(a)=\mathbf{E}[Y\mid X=a].$

Thus, $\mathbf{E}[Y\mid X]$ can be regarded as a random variable $f(X)$.
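To make the definition concrete, the following sketch computes $f(a)=\mathbf{E}[Y\mid X=a]$ for each value $a$ of $X$ on a small hypothetical joint distribution, so that $\mathbf{E}[Y\mid X]$ is the random variable taking value $f(a)$ with probability $\Pr[X=a]$ (the numbers here are arbitrary, chosen only for illustration):

```python
# A small hypothetical joint distribution Pr[X=x, Y=y].
joint = {(0, 1): 0.1, (0, 3): 0.3, (1, 2): 0.2, (1, 4): 0.4}

def cond_exp(joint, a):
    """f(a) = E[Y | X=a] = sum_y y * Pr[Y=y | X=a]."""
    p_a = sum(p for (x, _), p in joint.items() if x == a)
    return sum(y * p for (x, y), p in joint.items() if x == a) / p_a

# E[Y | X] as a random variable: it takes value f(a) with probability Pr[X=a].
f = {a: cond_exp(joint, a) for a in {x for (x, _) in joint}}
```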

Example
Suppose that we sample a human uniformly at random from all human beings. Let $Y$ be his/her height, and let $X$ be the country he/she is from. For any country $a$, $\mathbf{E}[Y\mid X=a]$ gives the average height in that country, and $\mathbf{E}[Y\mid X]$ is the random variable that can be described in either of the following ways:
• We choose a human uniformly at random from all human beings, and $\mathbf{E}[Y\mid X]$ is the average height of the country where he/she comes from.
• We choose a country at random with a probability proportional to its population, and $\mathbf{E}[Y\mid X]$ is the average height of the chosen country.

The following proposition states some fundamental facts about conditional expectation.

 Proposition (fundamental facts about conditional expectation) Let $X,Y$ and $Z$ be arbitrary random variables, and let $f$ and $g$ be arbitrary functions. Then:
• $\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]]$;
• $\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z]$;
• $\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]]$.

Each of these equations can be formally verified by directly computing the expectations involved. Although the equations look formal, their intuitive interpretations are quite clear.

The first equation:

$\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]]$

says that there are two ways to compute an average. Suppose again that $X$ is the height of a uniformly random human and $Y$ is the country he/she is from. There are two ways to compute the average human height: one is to directly average over the heights of all humans; the other is to first compute the average height for each country, and then average these heights weighted by the populations of the countries.
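Both ways of computing the average can be checked on a toy population (the data below is hypothetical, chosen only for illustration):

```python
# A toy population (hypothetical data): one (country, height) pair per person.
population = [("A", 170), ("A", 180), ("B", 160), ("B", 165), ("B", 175)]

# Direct average over all individuals: E[X].
direct = sum(h for _, h in population) / len(population)

# Country-by-country: E[E[X|Y]] = sum_c Pr[Y=c] * E[X | Y=c].
weighted = 0.0
for c in {c for c, _ in population}:
    heights = [h for cc, h in population if cc == c]
    weighted += (len(heights) / len(population)) * (sum(heights) / len(heights))

assert abs(direct - weighted) < 1e-9  # E[X] = E[E[X|Y]]
```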

The second equation:

$\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z]$

is the same as the first one, restricted to a particular subspace. As in the previous example, in addition to the height $X$ and the country $Y$, let $Z$ be the gender of the individual. Then $\mathbf{E}[X\mid Z]$ is the average height of a human being of a given gender. Again, this can be computed either directly or on a country-by-country basis.

The third equation:

$\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]]$.

looks obscure at first glance, especially considering that $X$ and $Y$ are not necessarily independent. Nevertheless, the equation follows from the simple fact that, conditioning on any $X=a$, the value $f(X)=f(a)$ becomes a constant and can therefore be safely taken outside the expectation by the linearity of expectation. Indeed, for any value $a$ of $X$,

$\mathbf{E}[f(X)g(X,Y)\mid X=a]=\mathbf{E}[f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf{E}[g(X,Y)\mid X=a].$
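The identity can also be checked numerically on a small joint distribution under which $X$ and $Y$ are dependent (the distribution and the choices of $f$ and $g$ below are arbitrary):

```python
# A hypothetical joint distribution Pr[X=x, Y=y]; X and Y are dependent here.
joint = {(1, 1): 0.2, (1, 2): 0.3, (2, 1): 0.4, (2, 2): 0.1}
f = lambda x: x * x    # arbitrary choices of f and g
g = lambda x, y: x + y

# Left-hand side: E[E[f(X)g(X,Y) | X]], which equals E[f(X)g(X,Y)].
lhs = sum(p * f(x) * g(x, y) for (x, y), p in joint.items())

# Right-hand side: E[f(X) * E[g(X,Y) | X]], conditioning on each value a of X.
rhs = 0.0
for a in {x for (x, _) in joint}:
    p_a = sum(p for (x, _), p in joint.items() if x == a)
    e_g = sum(p * g(x, y) for (x, y), p in joint.items() if x == a) / p_a
    rhs += p_a * f(a) * e_g

assert abs(lhs - rhs) < 1e-9
```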

The proposition also holds in the more general setting where $X$, $Y$, and $Z$ are each replaced by a sequence of random variables.

# Martingales

A martingale is a random sequence $X_0,X_1,\ldots$ satisfying the following so-called martingale property.

 Definition (martingale) A sequence of random variables $X_0,X_1,\ldots$ is a martingale if for all $i\gt 0$, \begin{align} \mathbf{E}[X_{i}\mid X_0,\ldots,X_{i-1}]=X_{i-1}. \end{align}

## Examples

coin flips
A fair coin is flipped a number of times. Let $Z_j\in\{-1,1\}$ denote the outcome of the $j$th flip. Let
$X_0=0\quad \mbox{ and } \quad X_i=\sum_{j\le i}Z_j$.
The random variables $X_0,X_1,\ldots$ define a martingale.
Proof.
 We first observe that $\mathbf{E}[X_i\mid X_0,\ldots,X_{i-1}] = \mathbf{E}[X_i\mid X_{i-1}]$, which intuitively says that the next value of the sum depends only on its current value, not on how that value was reached. This property is also called the Markov property in stochastic processes. \begin{align} \mathbf{E}[X_i\mid X_0,\ldots,X_{i-1}] &= \mathbf{E}[X_i\mid X_{i-1}]\\ &= \mathbf{E}[X_{i-1}+Z_{i}\mid X_{i-1}]\\ &= \mathbf{E}[X_{i-1}\mid X_{i-1}]+\mathbf{E}[Z_{i}\mid X_{i-1}]\\ &= X_{i-1}+\mathbf{E}[Z_{i}\mid X_{i-1}]\\ &= X_{i-1}+\mathbf{E}[Z_{i}] &\quad (\mbox{independence of coin flips})\\ &= X_{i-1} \end{align}
$\square$
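The martingale property of the coin-flip sums can also be checked by exact enumeration over all outcomes of a small number of flips, grouping the outcomes by the observed prefix $(X_1,\ldots,X_{i-1})$:

```python
from itertools import product

# Enumerate all outcomes of n fair coin flips (Z_j in {-1, +1}) exactly;
# each outcome has probability 2^-n.
n = 6
outcomes = list(product((-1, 1), repeat=n))

# Check E[X_i | X_0,...,X_{i-1}] = X_{i-1}: group the outcomes by the prefix
# (X_1,...,X_{i-1}) and average X_i within each group.
for i in range(1, n + 1):
    groups = {}
    for zs in outcomes:
        prefix = tuple(sum(zs[:j]) for j in range(1, i))  # (X_1,...,X_{i-1})
        groups.setdefault(prefix, []).append(sum(zs[:i]))  # X_i
    for prefix, xis in groups.items():
        x_prev = prefix[-1] if prefix else 0  # X_{i-1}, with X_0 = 0
        assert abs(sum(xis) / len(xis) - x_prev) < 1e-9
```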
edge exposure in a random graph
Consider a random graph $G$ generated as follows. Let $[n]$ be the set of vertices, and let ${[n]\choose 2}$ be the set of all $m={n\choose 2}$ potential edges. For convenience, we enumerate these potential edges as $e_1,\ldots, e_m$. For each potential edge $e_j$, we independently flip a fair coin to decide whether the edge $e_j$ appears in $G$. Let $I_j$ be the random variable that indicates whether $e_j\in G$. We are interested in some graph-theoretic parameter of the random graph $G$, say its chromatic number $\chi(G)$. Let $X_0=\mathbf{E}[\chi(G)]$, and for each $i\ge 1$, let $X_i=\mathbf{E}[\chi(G)\mid I_1,\ldots,I_{i}]$, namely, the expected chromatic number of the random graph after the first $i$ edges are fixed. This process is called the edge exposure of a random graph, as we "expose" the potential edges one by one.

It is nontrivial to formally verify that the edge exposure sequence for a random graph is a martingale. However, we will later see that this construction can be put into a more general context.

## Generalization

The martingale can be generalized to be with respect to another sequence of random variables.

 Definition (martingale, general version) A sequence of random variables $Y_0,Y_1,\ldots$ is a martingale with respect to the sequence $X_0,X_1,\ldots$ if, for all $i\ge 0$, the following conditions hold: $Y_i$ is a function of $X_0,X_1,\ldots,X_i$; \begin{align} \mathbf{E}[Y_{i+1}\mid X_0,\ldots,X_{i}]=Y_{i}. \end{align}

Therefore, a sequence $X_0,X_1,\ldots$ is a martingale if it is a martingale with respect to itself.

The purpose of this generalization is that we are usually more interested in a function of a sequence of random variables, rather than the sequence itself.
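A classical illustration of such a martingale (a standard example, not taken from the text above): for the fair flips $Z_j\in\{-1,+1\}$ with partial sums $S_i=\sum_{j\le i}Z_j$, the sequence $Y_i=S_i^2-i$ is a martingale with respect to $Z_1,Z_2,\ldots$; each $Y_i$ is a function of $Z_1,\ldots,Z_i$. The property can be checked by exact enumeration:

```python
from itertools import product

# Y_i = S_i^2 - i is a martingale with respect to the fair flips Z_j:
# E[Y_i | Z_1..Z_{i-1}] = ((S+1)^2 - i + (S-1)^2 - i)/2 = S^2 - (i-1).
n = 5
for i in range(1, n + 1):
    for prefix in product((-1, 1), repeat=i - 1):
        s = sum(prefix)           # S_{i-1}
        y_prev = s * s - (i - 1)  # Y_{i-1}
        # Average Y_i over the two equally likely values of Z_i.
        avg = (((s + 1) ** 2 - i) + ((s - 1) ** 2 - i)) / 2
        assert abs(avg - y_prev) < 1e-9
```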

The following definition describes a very general approach for constructing an important type of martingales.

 Definition (The Doob sequence) The Doob sequence of a function $f$ with respect to a sequence of random variables $X_1,\ldots,X_n$ is defined by $Y_i=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i}], \quad 0\le i\le n.$ In particular, $Y_0=\mathbf{E}[f(X_1,\ldots,X_n)]$ and $Y_n=f(X_1,\ldots,X_n)$.

The Doob sequence of a function defines a martingale. That is

$\mathbf{E}[Y_i\mid X_1,\ldots,X_{i-1}]=Y_{i-1},$

for any $1\le i\le n$.

To prove this claim, we recall the definition that $Y_i=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i}]$, thus,

\begin{align} \mathbf{E}[Y_i\mid X_1,\ldots,X_{i-1}] &=\mathbf{E}[\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i}]\mid X_1,\ldots,X_{i-1}]\\ &=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i-1}]\\ &=Y_{i-1}, \end{align}

where the second equation is due to the second fundamental fact about conditional expectation introduced in the first section, taking $X_i$ and $(X_1,\ldots,X_{i-1})$ in place of $Y$ and $Z$ respectively.

The Doob martingale describes a very natural procedure to determine a function value of a sequence of random variables. Suppose that we want to predict the value of a function $f(X_1,\ldots,X_n)$ of random variables $X_1,\ldots,X_n$. The Doob sequence $Y_0,Y_1,\ldots,Y_n$ represents a sequence of refined estimates of the value of $f(X_1,\ldots,X_n)$, gradually using more information on the values of the random variables $X_1,\ldots,X_n$. The first element $Y_0$ is just the expectation of $f(X_1,\ldots,X_n)$. Element $Y_i$ is the expected value of $f(X_1,\ldots,X_n)$ once the values of $X_1,\ldots,X_{i}$ are known, and $Y_n=f(X_1,\ldots,X_n)$, since at that point the value of $f(X_1,\ldots,X_n)$ is fully determined.
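For small $n$ the Doob sequence, and its martingale property, can be checked by exact enumeration; below $X_1,\ldots,X_n$ are fair bits and $f$ is an arbitrary (hypothetical) function of them:

```python
from itertools import product

# Doob sequence Y_i = E[f(X_1,...,X_n) | X_1,...,X_i] for fair bits
# X_j in {0,1} and a hypothetical function f, checked by exact enumeration.
n = 4
f = lambda xs: max(xs)  # any function of the n bits works here

def Y(i, prefix):
    """Y_i given X_1..X_i = prefix: average f over all continuations."""
    vals = [f(prefix + t) for t in product((0, 1), repeat=n - i)]
    return sum(vals) / len(vals)

# Martingale check: E[Y_i | X_1..X_{i-1}] = Y_{i-1} for every prefix.
for i in range(1, n + 1):
    for prefix in product((0, 1), repeat=i - 1):
        avg = (Y(i, prefix + (0,)) + Y(i, prefix + (1,))) / 2
        assert abs(avg - Y(i - 1, prefix)) < 1e-9
```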

The following two Doob martingales arise in evaluating the parameters of random graphs.

### edge exposure martingale

Let $G$ be a random graph on $n$ vertices. Let $f$ be a real-valued function of graphs, such as, chromatic number, number of triangles, the size of the largest clique or independent set, etc. Denote that $m={n\choose 2}$. Fix an arbitrary numbering of potential edges between the $n$ vertices, and denote the edges as $e_1,\ldots,e_m$. Let
$X_i=\begin{cases} 1& \mbox{if }e_i\in G,\\ 0& \mbox{otherwise}. \end{cases}$
Let $Y_0=\mathbf{E}[f(G)]$ and for $i=1,\ldots,m$, let $Y_i=\mathbf{E}[f(G)\mid X_1,\ldots,X_i]$.
The sequence $Y_0,Y_1,\ldots,Y_m$ gives a Doob martingale that is commonly called the edge exposure martingale.
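The chromatic number is expensive to evaluate, so the following sketch uses the number of triangles as $f$ instead; for $n=4$ vertices the whole probability space ($2^6$ graphs, each edge present with probability $1/2$) can be enumerated exactly:

```python
from itertools import combinations, product

# Edge exposure martingale on n = 4 vertices with f = number of triangles.
n = 4
edges = list(combinations(range(n), 2))  # a fixed numbering e_1, ..., e_m
m = len(edges)

def triangles(bits):
    """f(G) for the graph whose j-th potential edge is present iff bits[j]."""
    g = {edges[j] for j in range(m) if bits[j]}
    return sum(1 for t in combinations(range(n), 3)
               if all(p in g for p in combinations(t, 2)))

def Y(i, prefix):
    """Y_i = E[f(G) | X_1..X_i = prefix], averaging over unexposed edges."""
    vals = [triangles(prefix + t) for t in product((0, 1), repeat=m - i)]
    return sum(vals) / len(vals)

# Martingale check; note Y_0 = E[f(G)] = C(4,3) * (1/2)^3 = 0.5.
for i in range(1, m + 1):
    for prefix in product((0, 1), repeat=i - 1):
        avg = (Y(i, prefix + (0,)) + Y(i, prefix + (1,))) / 2
        assert abs(avg - Y(i - 1, prefix)) < 1e-9
```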

### vertex exposure martingale

Instead of revealing edges one at a time, we could reveal the set of edges connected to a given vertex, one vertex at a time. Suppose that the vertex set is $[n]$. Let $X_i$ be the subgraph of $G$ induced by the vertex set $[i]$, i.e. the first $i$ vertices.
Let $Y_0=\mathbf{E}[f(G)]$ and for $i=1,\ldots,n$, let $Y_i=\mathbf{E}[f(G)\mid X_1,\ldots,X_i]$.
The sequence $Y_0,Y_1,\ldots,Y_n$ gives a Doob martingale that is commonly called the vertex exposure martingale.
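As a minimal sketch, take $f$ to be the number of edges of $G$ (so the conditional expectations are easy to write down): revealing vertex $i+1$ exposes the $i$ new edges joining it to $[i]$, each still a fair bit:

```python
from itertools import combinations, product

# Vertex exposure on n = 4 vertices with f = number of edges, each potential
# edge present with probability 1/2. X_i fixes the edges inside [i]; every
# unexposed edge contributes 1/2 to the conditional expectation.
n = 4
edges = list(combinations(range(1, n + 1), 2))

def Y(i, fixed):
    """E[f(G) | X_i], where `fixed` maps each edge inside [i] to 0 or 1."""
    inside = [e for e in edges if max(e) <= i]
    return sum(fixed[e] for e in inside) + (len(edges) - len(inside)) / 2

# Martingale check: averaging Y_{i+1} over the fair values of the newly
# exposed edges recovers Y_i.
for i in range(n):
    inside = [e for e in edges if max(e) <= i]
    new = [e for e in edges if max(e) == i + 1]
    for bits in product((0, 1), repeat=len(inside)):
        fixed = dict(zip(inside, bits))
        vals = [Y(i + 1, {**fixed, **dict(zip(new, nb))})
                for nb in product((0, 1), repeat=len(new))]
        assert abs(sum(vals) / len(vals) - Y(i, fixed)) < 1e-9
```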