Advanced Algorithms (Fall 2021)/Conditional expectations

Conditional Expectations

The conditional expectation of a random variable [math]Y[/math] with respect to an event [math]\mathcal{E}[/math] is defined by

[math] \mathbf{E}[Y\mid \mathcal{E}]=\sum_{y}y\Pr[Y=y\mid\mathcal{E}]. [/math]

In particular, if the event [math]\mathcal{E}[/math] is [math]X=a[/math], the conditional expectation

[math] \mathbf{E}[Y\mid X=a] [/math]

defines a function

[math] f(a)=\mathbf{E}[Y\mid X=a]. [/math]

Thus, [math]\mathbf{E}[Y\mid X][/math] can be regarded as a random variable [math]f(X)[/math].

Example
Suppose that we uniformly sample a human from all human beings. Let [math]Y[/math] be his/her height, and let [math]X[/math] be the country where he/she is from. For any country [math]a[/math], [math]\mathbf{E}[Y\mid X=a][/math] gives the average height of people from that country. And [math]\mathbf{E}[Y\mid X][/math] is the random variable which can be equivalently described in either of the following two ways (see the sketch after the list):
  • We choose a human uniformly at random from all human beings, and [math]\mathbf{E}[Y\mid X][/math] is the average height of the country where he/she comes from.
  • We choose a country at random with probability proportional to its population, and [math]\mathbf{E}[Y\mid X][/math] is the average height of the chosen country.
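
The equivalence of these two descriptions can be checked on a toy population. The following is a minimal Python sketch; the countries, populations, and heights are made up purely for illustration, and both views induce the same distribution for the random variable [math]\mathbf{E}[Y\mid X][/math].

<pre>
from collections import Counter

# A sketch of E[Y|X] as a random variable on a made-up population: the two
# sampling descriptions above induce the same distribution over its values.
people = [
    # (height in cm, country)
    (170, "A"), (175, "A"), (180, "A"),
    (160, "B"), (165, "B"),
    (172, "C"),
]
countries = {c for _, c in people}
pop = {c: sum(1 for _, cc in people if cc == c) for c in countries}
avg = {c: sum(h for h, cc in people if cc == c) / pop[c] for c in countries}  # E[Y | X=c]

# View 1: pick a human uniformly at random, read off his/her country's average.
dist1 = Counter()
for _, c in people:
    dist1[avg[c]] += 1 / len(people)

# View 2: pick a country with probability proportional to its population,
# read off that country's average.
dist2 = Counter()
for c in countries:
    dist2[avg[c]] += pop[c] / len(people)

assert all(abs(dist1[v] - dist2[v]) < 1e-9 for v in set(dist1) | set(dist2))
</pre>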

The following proposition states some fundamental facts about conditional expectation.

Proposition (fundamental facts about conditional expectation)
Let [math]X,Y[/math] and [math]Z[/math] be arbitrary random variables. Let [math]f[/math] and [math]g[/math] be arbitrary functions. Then
  1. [math]\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]][/math].
  2. [math]\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z][/math].
  3. [math]\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]][/math].

The proposition can be formally verified by computing these expectations. Although the equations look rather formal, their intuitive interpretations are quite clear.

The first equation:

[math]\mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]][/math]

says that there are two ways to compute an average. Suppose again that [math]X[/math] is the height of a uniformly random human and [math]Y[/math] is the country where he/she is from. There are two ways to compute the average human height: one is to directly average over the heights of all humans; the other is to first compute the average height for each country, and then average these country averages weighted by the populations of the countries.
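
This identity can be verified numerically. The following minimal Python sketch uses a made-up population (the countries and heights are purely illustrative): the direct average coincides with the population-weighted average of the country averages.

<pre>
# A minimal sketch checking E[X] = E[ E[X | Y] ] on a toy population;
# the countries and the heights below are made up purely for illustration.
people = [
    # (height in cm, country)
    (170, "A"), (175, "A"), (180, "A"),
    (160, "B"), (165, "B"),
    (172, "C"),
]

# First way: directly average over all individuals.
direct = sum(h for h, _ in people) / len(people)

# Second way: average within each country, then weight by the country's
# share of the population, i.e. Pr[Y = country].
weighted = 0.0
for c in {c for _, c in people}:
    heights = [h for h, cc in people if cc == c]
    weighted += (sum(heights) / len(heights)) * len(heights) / len(people)

assert abs(direct - weighted) < 1e-9
</pre>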

The second equation:

[math]\mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z][/math]

is the same as the first one, restricted to a particular subspace. As in the previous example, in addition to the height [math]X[/math] and the country [math]Y[/math], let [math]Z[/math] be the gender of the individual. Thus, [math]\mathbf{E}[X\mid Z][/math] is the average height of a human being of a given gender. Again, this can be computed either directly or on a country-by-country basis.

The third equation:

[math]\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]][/math]

looks obscure at first glance, especially considering that [math]X[/math] and [math]Y[/math] are not necessarily independent. Nevertheless, the equation follows from the simple fact that, conditioning on any [math]X=a[/math], the function value [math]f(X)=f(a)[/math] becomes a constant, and thus can be safely taken out of the expectation by the linearity of expectation. For any value [math]X=a[/math],

[math] \mathbf{E}[f(X)g(X,Y)\mid X=a]=\mathbf{E}[f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf{E}[g(X,Y)\mid X=a]. [/math]
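
The identity can also be checked numerically. The following Python sketch fixes an explicit joint distribution under which [math]X[/math] and [math]Y[/math] are dependent; the distribution and the functions [math]f,g[/math] are arbitrary illustrative choices.

<pre>
# A small sketch checking E[ E[f(X)g(X,Y) | X] ] = E[ f(X) * E[g(X,Y) | X] ]
# on an explicit joint distribution in which X and Y are dependent; the
# distribution and the functions f, g are arbitrary choices for illustration.
joint = {  # Pr[X=x, Y=y]
    (0, 0): 0.3, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.4,
}

def f(x):
    return 2 * x + 1

def g(x, y):
    return x * y + y + 1

def pr_x(x):
    """Pr[X = x]."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def cond_exp(h, x):
    """E[ h(X, Y) | X = x ]."""
    return sum(p * h(xx, y) for (xx, y), p in joint.items() if xx == x) / pr_x(x)

xs = {x for x, _ in joint}
lhs = sum(pr_x(x) * cond_exp(lambda a, b: f(a) * g(a, b), x) for x in xs)
rhs = sum(pr_x(x) * f(x) * cond_exp(g, x) for x in xs)
assert abs(lhs - rhs) < 1e-12
</pre>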

The proposition also holds in the more general case where [math]X, Y[/math] and [math]Z[/math] are each replaced by a sequence of random variables.

Martingales

A martingale is a random sequence [math]X_0,X_1,\ldots[/math] satisfying the following so-called martingale property.

Definition (martingale)
A sequence of random variables [math]X_0,X_1,\ldots[/math] is a martingale if for all [math]i\gt 0[/math],
[math]\begin{align} \mathbf{E}[X_{i}\mid X_0,\ldots,X_{i-1}]=X_{i-1}. \end{align}[/math]

Examples

coin flips
A fair coin is flipped for a number of times. Let [math]Z_j\in\{-1,1\}[/math] denote the outcome of the [math]j[/math]th flip. Let
[math]X_0=0\quad \mbox{ and } \quad X_i=\sum_{j\le i}Z_j[/math].
The random variables [math]X_0,X_1,\ldots[/math] define a martingale.
Proof.
We first observe that [math]\mathbf{E}[X_i\mid X_0,\ldots,X_{i-1}] = \mathbf{E}[X_i\mid X_{i-1}][/math], which intuitively says that the distribution of the next partial sum depends only on the current partial sum, not on how it was reached. This property is also called the Markov property of stochastic processes.
[math] \begin{align} \mathbf{E}[X_i\mid X_0,\ldots,X_{i-1}] &= \mathbf{E}[X_i\mid X_{i-1}]\\ &= \mathbf{E}[X_{i-1}+Z_{i}\mid X_{i-1}]\\ &= \mathbf{E}[X_{i-1}\mid X_{i-1}]+\mathbf{E}[Z_{i}\mid X_{i-1}]\\ &= X_{i-1}+\mathbf{E}[Z_{i}\mid X_{i-1}]\\ &= X_{i-1}+\mathbf{E}[Z_{i}] &\quad (\mbox{independence of coin flips})\\ &= X_{i-1} \end{align} [/math]
[math]\square[/math]
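
The martingale property of this sequence can also be observed empirically. The following Python simulation sketch groups independent runs by the value of [math]X_{i-1}[/math] and estimates the conditional average of [math]X_i[/math]; the index [math]i[/math] and the number of runs are arbitrary choices.

<pre>
import random

# A simulation sketch of the coin-flip martingale X_i = Z_1 + ... + Z_i with
# Z_j uniform over {-1, +1}.  We estimate E[X_i | X_{i-1}] by grouping runs
# according to the value of X_{i-1}; the index i and the number of runs are
# arbitrary choices.
random.seed(0)
i, runs = 10, 200_000
by_prev = {}                      # value of X_{i-1}  ->  observed values of X_i
for _ in range(runs):
    x = 0
    for _ in range(i):
        prev = x
        x += random.choice((-1, 1))
    by_prev.setdefault(prev, []).append(x)

for prev, nxt in sorted(by_prev.items()):
    print(prev, sum(nxt) / len(nxt))   # close to prev, up to sampling error
</pre>
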
edge exposure in a random graph
Consider a random graph [math]G[/math] generated as follows. Let [math][n][/math] be the set of vertices, and let [math]{[n]\choose 2}[/math] be the set of all [math]m={n\choose 2}[/math] potential edges. For convenience, we enumerate these potential edges by [math]e_1,\ldots, e_m[/math]. For each potential edge [math]e_j[/math], we independently flip a fair coin to decide whether the edge [math]e_j[/math] appears in [math]G[/math]. Let [math]I_j[/math] be the random variable that indicates whether [math]e_j\in G[/math]. We are interested in some graph-theoretic parameter, say the chromatic number, of the random graph [math]G[/math]. Let [math]\chi(G)[/math] be the chromatic number of [math]G[/math]. Let [math]X_0=\mathbf{E}[\chi(G)][/math], and for each [math]i\ge 1[/math], let [math]X_i=\mathbf{E}[\chi(G)\mid I_1,\ldots,I_{i}][/math], namely, the expected chromatic number of the random graph after fixing the first [math]i[/math] edges. This process is called the edge exposure of a random graph, as we "expose" the potential edges one by one in a random graph.

It is nontrivial to formally verify that the edge exposure sequence for a random graph is a martingale. However, we will later see that this construction can be put into a more general context.
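
For a very small [math]n[/math], however, the martingale property of the edge exposure sequence can be checked exhaustively by brute force. The Python sketch below does so for [math]n=4[/math], using the number of triangles in place of the chromatic number only to keep the code short; the construction of [math]X_i[/math] is otherwise exactly the one described above.

<pre>
from itertools import combinations, product

# A brute-force check of the edge exposure sequence on n = 4 vertices, with
# each potential edge present independently with probability 1/2.  The number
# of triangles stands in for the chromatic number, only to keep the code short.
n = 4
edges = list(combinations(range(n), 2))   # e_1, ..., e_m
m = len(edges)

def f(present):
    """Number of triangles in the graph whose edge set is {e_j : present[j] = 1}."""
    edge_set = {e for e, b in zip(edges, present) if b}
    return sum(1 for t in combinations(range(n), 3)
               if all(p in edge_set for p in combinations(t, 2)))

def X(i, prefix):
    """E[ f(G) | I_1,...,I_i = prefix ]: average f over all completions."""
    vals = [f(prefix + rest) for rest in product((0, 1), repeat=m - i)]
    return sum(vals) / len(vals)

# Martingale property: E[ X_i | I_1,...,I_{i-1} ] = X_{i-1}, since I_i is a
# fair coin independent of the previously exposed edges.
for i in range(1, m + 1):
    for prefix in product((0, 1), repeat=i - 1):
        avg_next = (X(i, prefix + (0,)) + X(i, prefix + (1,))) / 2
        assert abs(avg_next - X(i - 1, prefix)) < 1e-9
</pre>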

Generalization

The notion of martingale can be generalized to be defined with respect to another sequence of random variables.

Definition (martingale, general version)
A sequence of random variables [math]Y_0,Y_1,\ldots[/math] is a martingale with respect to the sequence [math]X_0,X_1,\ldots[/math] if, for all [math]i\ge 0[/math], the following conditions hold:
  • [math]Y_i[/math] is a function of [math]X_0,X_1,\ldots,X_i[/math];
  • [math]\begin{align} \mathbf{E}[Y_{i+1}\mid X_0,\ldots,X_{i}]=Y_{i}. \end{align}[/math]

Therefore, a sequence [math]X_0,X_1,\ldots[/math] is a martingale if it is a martingale with respect to itself.

The purpose of this generalization is that we are usually more interested in a function of a sequence of random variables, rather than the sequence itself.

The following definition describes a very general approach for constructing an important type of martingales.

Definition (The Doob sequence)
The Doob sequence of a function [math]f[/math] with respect to a sequence of random variables [math]X_1,\ldots,X_n[/math] is defined by
[math] Y_i=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i}], \quad 0\le i\le n. [/math]
In particular, [math]Y_0=\mathbf{E}[f(X_1,\ldots,X_n)][/math] and [math]Y_n=f(X_1,\ldots,X_n)[/math].

The Doob sequence of a function defines a martingale with respect to [math]X_1,\ldots,X_n[/math]. That is,

[math] \mathbf{E}[Y_i\mid X_1,\ldots,X_{i-1}]=Y_{i-1}, [/math]

for any [math]1\le i\le n[/math].

To prove this claim, we recall the definition that [math]Y_i=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i}][/math], thus,

[math] \begin{align} \mathbf{E}[Y_i\mid X_1,\ldots,X_{i-1}] &=\mathbf{E}[\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i}]\mid X_1,\ldots,X_{i-1}]\\ &=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i-1}]\\ &=Y_{i-1}, \end{align} [/math]

where the second equation is due to the second fundamental fact about conditional expectation introduced in the first section.

The Doob martingale describes a very natural procedure for estimating a function value of a sequence of random variables. Suppose that we want to predict the value of a function [math]f(X_1,\ldots,X_n)[/math] of random variables [math]X_1,\ldots,X_n[/math]. The Doob sequence [math]Y_0,Y_1,\ldots,Y_n[/math] represents a sequence of refined estimates of the value of [math]f(X_1,\ldots,X_n)[/math], gradually using more information on the values of the random variables [math]X_1,\ldots,X_n[/math]. The first element [math]Y_0[/math] is just the expectation of [math]f(X_1,\ldots,X_n)[/math]. Element [math]Y_i[/math] is the expected value of [math]f(X_1,\ldots,X_n)[/math] when the values of [math]X_1,\ldots,X_{i}[/math] are known, and finally [math]Y_n=f(X_1,\ldots,X_n)[/math], since at that point [math]f(X_1,\ldots,X_n)[/math] is fully determined by [math]X_1,\ldots,X_n[/math].
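
For a small concrete case, this chain of refined estimates can be computed exhaustively. The Python sketch below takes [math]n[/math] independent fair bits and an arbitrary illustrative function [math]f[/math], fixes one realization of the bits, and prints [math]Y_0,Y_1,\ldots,Y_n[/math].

<pre>
from itertools import product

# A sketch of the Doob sequence as a chain of refined predictions: X_1..X_n
# are independent fair bits and f is an arbitrary illustrative function.
# We fix one realization of the bits and print Y_0, Y_1, ..., Y_n.
n = 5

def f(bits):
    return sum(bits) ** 2           # any function of X_1..X_n will do

def Y(prefix):
    """E[ f(X_1..X_n) | X_1..X_i = prefix ]: average over the unrevealed bits."""
    vals = [f(prefix + rest) for rest in product((0, 1), repeat=n - len(prefix))]
    return sum(vals) / len(vals)

realization = (1, 0, 1, 1, 0)       # one fixed outcome of the bits
for i in range(n + 1):
    print(i, Y(realization[:i]))    # starts at E[f(X_1..X_n)], ends at f(realization)
</pre>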

The following two Doob martingales arise in evaluating the parameters of random graphs.

edge exposure martingale

Let [math]G[/math] be a random graph on [math]n[/math] vertices. Let [math]f[/math] be a real-valued function of graphs, such as the chromatic number, the number of triangles, or the size of the largest clique or independent set. Denote [math]m={n\choose 2}[/math]. Fix an arbitrary numbering of potential edges between the [math]n[/math] vertices, and denote the edges as [math]e_1,\ldots,e_m[/math]. Let
[math] X_i=\begin{cases} 1& \mbox{if }e_i\in G,\\ 0& \mbox{otherwise}. \end{cases} [/math]
Let [math]Y_0=\mathbf{E}[f(G)][/math] and for [math]i=1,\ldots,m[/math], let [math]Y_i=\mathbf{E}[f(G)\mid X_1,\ldots,X_i][/math].
The sequence [math]Y_0,Y_1,\ldots,Y_m[/math] gives a Doob martingale that is commonly called the edge exposure martingale.

vertex exposure martingale

Instead of revealing edges one at a time, we could reveal the set of edges connected to a given vertex, one vertex at a time. Suppose that the vertex set is [math][n][/math]. Let [math]X_i[/math] be the subgraph of [math]G[/math] induced by the vertex set [math][i][/math], i.e. the first [math]i[/math] vertices.
Let [math]Y_0=\mathbf{E}[f(G)][/math] and for [math]i=1,\ldots,n[/math], let [math]Y_i=\mathbf{E}[f(G)\mid X_1,\ldots,X_i][/math].
The sequence [math]Y_0,Y_1,\ldots,Y_n[/math] gives a Doob martingale that is commonly called the vertex exposure martingale.
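
As with edge exposure, the vertex exposure martingale can be checked by brute force for a small [math]n[/math]: conditioning on [math]X_1,\ldots,X_i[/math] amounts to fixing all edges inside the first [math]i[/math] vertices and averaging over the rest. In the Python sketch below the number of triangles again stands in for [math]f[/math], only to keep the code short.

<pre>
from itertools import combinations, product

# A brute-force sketch of the vertex exposure martingale on n = 4 vertices.
# Conditioning on X_1,...,X_i fixes exactly the edges inside the first i
# vertices; the remaining potential edges stay uniform.  The number of
# triangles stands in for the graph parameter f, only to keep the code short.
n = 4
edges = list(combinations(range(n), 2))   # potential edges, vertices 0..n-1

def f(assign):
    """Number of triangles in the graph where edge e is present iff assign[e] == 1."""
    return sum(1 for t in combinations(range(n), 3)
               if all(assign[p] == 1 for p in combinations(t, 2)))

def Y(fixed):
    """E[ f(G) | edges inside the exposed vertex set are as in `fixed` ]."""
    free = [e for e in edges if e not in fixed]
    vals = []
    for rest in product((0, 1), repeat=len(free)):
        assign = dict(fixed)
        assign.update(zip(free, rest))
        vals.append(f(assign))
    return sum(vals) / len(vals)

# Exposing vertex i reveals the fair, independent edges between vertex i and
# the previously exposed vertices, so E[ Y_i | X_1,...,X_{i-1} ] = Y_{i-1}.
for i in range(1, n + 1):
    prev_edges = [e for e in edges if max(e) < i - 1]   # inside the first i-1 vertices
    new_edges = [e for e in edges if max(e) == i - 1]   # incident to the new vertex
    for prev_bits in product((0, 1), repeat=len(prev_edges)):
        fixed_prev = dict(zip(prev_edges, prev_bits))
        total = 0.0
        for new_bits in product((0, 1), repeat=len(new_edges)):
            fixed = dict(fixed_prev)
            fixed.update(zip(new_edges, new_bits))
            total += Y(fixed)
        assert abs(total / 2 ** len(new_edges) - Y(fixed_prev)) < 1e-9
</pre>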