Advanced Algorithms (Fall 2019)/Conditional expectations

From TCS Wiki

Revision as of 07:17, 14 October 2019

Conditional Expectations

The conditional expectation of a random variable [math]\displaystyle{ Y }[/math] with respect to an event [math]\displaystyle{ \mathcal{E} }[/math] is defined by

[math]\displaystyle{ \mathbf{E}[Y\mid \mathcal{E}]=\sum_{y}y\Pr[Y=y\mid\mathcal{E}]. }[/math]

In particular, if the event [math]\displaystyle{ \mathcal{E} }[/math] is [math]\displaystyle{ X=a }[/math], the conditional expectation

[math]\displaystyle{ \mathbf{E}[Y\mid X=a] }[/math]

defines a function

[math]\displaystyle{ f(a)=\mathbf{E}[Y\mid X=a]. }[/math]

Thus, [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] can be regarded as a random variable [math]\displaystyle{ f(X) }[/math].

Suppose that we sample a human uniformly at random from all human beings. Let [math]\displaystyle{ Y }[/math] be his/her height, and let [math]\displaystyle{ X }[/math] be the country where he/she is from. For any country [math]\displaystyle{ a }[/math], [math]\displaystyle{ \mathbf{E}[Y\mid X=a] }[/math] is the average height of the people in that country, and [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] is a random variable that can be defined in either of two ways:
  • We choose a human uniformly at random from all human beings, and [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] is the average height of the country where he/she comes from.
  • We choose a country at random with a probability proportional to its population, and [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] is the average height of the chosen country.
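
The two descriptions above can be checked on a toy population. The following is a minimal sketch, not part of the original notes: the population, the country labels and the function names are all made up for illustration. It computes [math]\displaystyle{ f(a)=\mathbf{E}[Y\mid X=a] }[/math] and then the distribution of [math]\displaystyle{ f(X) }[/math] in both ways, using exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy population: (country, height in cm). The numbers are made up.
population = [("A", 160), ("A", 170), ("A", 180), ("B", 150), ("B", 170)]


def conditional_expectation(population):
    """Return a dict f with f[a] = E[Y | X = a] under uniform sampling of individuals."""
    total = defaultdict(int)
    count = defaultdict(int)
    for country, height in population:
        total[country] += height
        count[country] += 1
    return {a: Fraction(total[a], count[a]) for a in total}


def distribution_via_individuals(population, f):
    """Pick an individual uniformly at random; the outcome is f(their country)."""
    dist = defaultdict(Fraction)
    for country, _ in population:
        dist[f[country]] += Fraction(1, len(population))
    return dict(dist)


def distribution_via_countries(population, f):
    """Pick a country with probability proportional to its population; the outcome is f(country)."""
    count = defaultdict(int)
    for country, _ in population:
        count[country] += 1
    dist = defaultdict(Fraction)
    for country, size in count.items():
        dist[f[country]] += Fraction(size, len(population))
    return dict(dist)


f = conditional_expectation(population)
by_individuals = distribution_via_individuals(population, f)
by_countries = distribution_via_countries(population, f)
```

On this toy data both procedures give the same distribution for [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math], which is why the two descriptions define the same random variable.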

The following proposition states some fundamental facts about conditional expectation.

Proposition (fundamental facts about conditional expectation)
Let [math]\displaystyle{ X,Y }[/math] and [math]\displaystyle{ Z }[/math] be arbitrary random variables. Let [math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] be arbitrary functions. Then
  1. [math]\displaystyle{ \mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]] }[/math].
  2. [math]\displaystyle{ \mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z] }[/math].
  3. [math]\displaystyle{ \mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]] }[/math].

The proposition can be formally verified by computing these expectations. Although the equations look formal, their intuitive interpretations are clear.

The first equation:

[math]\displaystyle{ \mathbf{E}[X]=\mathbf{E}[\mathbf{E}[X\mid Y]] }[/math]

says that there are two ways to compute an average. Suppose again that [math]\displaystyle{ X }[/math] is the height of a uniform random human and [math]\displaystyle{ Y }[/math] is the country where he/she is from. There are two ways to compute the average human height: one is to average directly over the heights of all humans; the other is to first compute the average height for each country, and then average these per-country heights weighted by the populations of the countries.
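
As a small sanity check of the first equation, the sketch below evaluates both sides on a made-up joint distribution of [math]\displaystyle{ (X,Y) }[/math]. The distribution and the helper names `expectation` and `iterated_expectation` are assumptions introduced only for this illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution Pr[X = x, Y = y]; the values are made up.
joint = {
    (1, "a"): Fraction(1, 8),
    (3, "a"): Fraction(3, 8),
    (2, "b"): Fraction(1, 4),
    (6, "b"): Fraction(1, 4),
}


def expectation(joint):
    """E[X], computed directly from the joint distribution."""
    return sum(x * p for (x, _), p in joint.items())


def iterated_expectation(joint):
    """E[E[X | Y]]: compute E[X | Y = y] for each y, then average weighted by Pr[Y = y]."""
    marginal = {}  # Pr[Y = y]
    weighted = {}  # sum over x of x * Pr[X = x, Y = y]
    for (x, y), p in joint.items():
        marginal[y] = marginal.get(y, 0) + p
        weighted[y] = weighted.get(y, 0) + x * p
    return sum(marginal[y] * (weighted[y] / marginal[y]) for y in marginal)


direct = expectation(joint)
two_stage = iterated_expectation(joint)
```

Here `direct` plays the role of averaging over all humans, and `two_stage` plays the role of averaging the per-country averages weighted by population.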

The second equation:

[math]\displaystyle{ \mathbf{E}[X\mid Z]=\mathbf{E}[\mathbf{E}[X\mid Y,Z]\mid Z] }[/math]

is the same as the first one, restricted to a particular subspace. As in the previous example, in addition to the height [math]\displaystyle{ X }[/math] and the country [math]\displaystyle{ Y }[/math], let [math]\displaystyle{ Z }[/math] be the gender of the individual. Thus, [math]\displaystyle{ \mathbf{E}[X\mid Z] }[/math] is the average height of a human being of a given gender. Again, this can be computed either directly or on a country-by-country basis.
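
The second equation can be checked in the same way. The sketch below uses a made-up joint distribution over triples (height, country, gender); the data and the function names are hypothetical and serve only to illustrate the identity for each fixed value of [math]\displaystyle{ Z }[/math].

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution Pr[X = x, Y = y, Z = z]; the values are made up.
joint = {
    (160, "A", "f"): Fraction(1, 6),
    (170, "A", "m"): Fraction(1, 6),
    (175, "A", "m"): Fraction(1, 6),
    (150, "B", "f"): Fraction(1, 3),
    (180, "B", "m"): Fraction(1, 6),
}


def cond_exp_direct(joint, z):
    """E[X | Z = z], computed directly."""
    mass = sum(p for (_, _, zz), p in joint.items() if zz == z)
    return sum(x * p for (x, _, zz), p in joint.items() if zz == z) / mass


def cond_exp_iterated(joint, z):
    """E[ E[X | Y, Z] | Z = z ]: average the per-(y, z) means, weighted by Pr[Y = y | Z = z]."""
    mass_y = defaultdict(Fraction)      # Pr[Y = y, Z = z]
    weighted_y = defaultdict(Fraction)  # sum over x of x * Pr[X = x, Y = y, Z = z]
    for (x, y, zz), p in joint.items():
        if zz == z:
            mass_y[y] += p
            weighted_y[y] += x * p
    mass_z = sum(mass_y.values())       # Pr[Z = z]
    return sum((mass_y[y] / mass_z) * (weighted_y[y] / mass_y[y]) for y in mass_y)


results = {z: (cond_exp_direct(joint, z), cond_exp_iterated(joint, z)) for z in ("f", "m")}
```

For each value of `z`, the two entries of `results[z]` agree, matching the "directly or country by country" reading of the equation.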

The third equation:

[math]\displaystyle{ \mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]=\mathbf{E}[f(X)\cdot \mathbf{E}[g(X,Y)\mid X]] }[/math]

looks obscure at first glance, especially considering that [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are not necessarily independent. Nevertheless, the equation follows from the simple fact that, conditioned on any [math]\displaystyle{ X=a }[/math], the function value [math]\displaystyle{ f(X)=f(a) }[/math] becomes a constant and can therefore be taken outside the expectation by the linearity of expectation. For any value [math]\displaystyle{ a }[/math] of [math]\displaystyle{ X }[/math],

[math]\displaystyle{ \mathbf{E}[f(X)g(X,Y)\mid X=a]=\mathbf{E}[f(a)g(X,Y)\mid X=a]=f(a)\cdot \mathbf{E}[g(X,Y)\mid X=a]. }[/math]
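
To pass from this pointwise identity to the third equation, one can average over the values of [math]\displaystyle{ X }[/math]. The derivation below is a sketch that assumes [math]\displaystyle{ X }[/math] is discrete, so that the outer expectation is a sum over the values [math]\displaystyle{ a }[/math] with [math]\displaystyle{ \Pr[X=a]\gt 0 }[/math]:

```latex
\begin{align}
\mathbf{E}[\mathbf{E}[f(X)g(X,Y)\mid X]]
&= \sum_{a}\Pr[X=a]\cdot\mathbf{E}[f(X)g(X,Y)\mid X=a] \\
&= \sum_{a}\Pr[X=a]\cdot f(a)\cdot\mathbf{E}[g(X,Y)\mid X=a] \\
&= \mathbf{E}[f(X)\cdot\mathbf{E}[g(X,Y)\mid X]].
\end{align}
```

The middle step is exactly the pointwise identity, and the last step reads the sum as the expectation of a function of [math]\displaystyle{ X }[/math].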

The proposition also holds in the more general case where each of [math]\displaystyle{ X, Y }[/math] and [math]\displaystyle{ Z }[/math] is a sequence of random variables.

Martingales

A martingale is a random sequence [math]\displaystyle{ X_0,X_1,\ldots }[/math] satisfying the following so-called martingale property.

Definition (martingale)
A sequence of random variables [math]\displaystyle{ X_0,X_1,\ldots }[/math] is a martingale if for all [math]\displaystyle{ i\gt 0 }[/math],
[math]\displaystyle{ \begin{align} \mathbf{E}[X_{i}\mid X_0,\ldots,X_{i-1}]=X_{i-1}. \end{align} }[/math]
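
A standard example is the simple random walk. The following sketch checks the martingale property by exhaustive enumeration. It assumes [math]\displaystyle{ X_0=0 }[/math] and that each step is [math]\displaystyle{ +1 }[/math] with probability `p_up` and [math]\displaystyle{ -1 }[/math] otherwise, independently of the past; the function names and the parameter `p_up` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def walk(steps):
    """Return (X_0, ..., X_k) for the walk with X_0 = 0 and the given +1/-1 steps."""
    xs = [0]
    for s in steps:
        xs.append(xs[-1] + s)
    return tuple(xs)


def is_martingale(n, p_up):
    """Check E[X_i | X_0, ..., X_{i-1}] = X_{i-1} for all 1 <= i <= n and all histories."""
    for i in range(1, n + 1):
        for past_steps in product((-1, 1), repeat=i - 1):
            history = walk(past_steps)  # one possible value of (X_0, ..., X_{i-1})
            last = history[-1]
            # The next step is independent of the history (an assumption of this sketch).
            cond_exp = p_up * (last + 1) + (1 - p_up) * (last - 1)
            if cond_exp != last:
                return False
    return True


fair = is_martingale(5, Fraction(1, 2))
biased = is_martingale(5, Fraction(2, 3))
```

With fair steps the check passes. With a biased step the conditional expectation drifts by [math]\displaystyle{ 2p-1 }[/math] per step, where [math]\displaystyle{ p }[/math] is `p_up`, so the property fails.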

The definition of a martingale can be generalized to be with respect to another sequence of random variables.

Definition (martingale, general version)
A sequence of random variables [math]\displaystyle{ Y_0,Y_1,\ldots }[/math] is a martingale with respect to the sequence [math]\displaystyle{ X_0,X_1,\ldots }[/math] if, for all [math]\displaystyle{ i\ge 0 }[/math], the following conditions hold:
  • [math]\displaystyle{ Y_i }[/math] is a function of [math]\displaystyle{ X_0,X_1,\ldots,X_i }[/math];
  • [math]\displaystyle{ \begin{align} \mathbf{E}[Y_{i+1}\mid X_0,\ldots,X_{i}]=Y_{i}. \end{align} }[/math]

Therefore, a sequence [math]\displaystyle{ X_0,X_1,\ldots }[/math] is a martingale if it is a martingale with respect to itself.

The purpose of this generalization is that we are usually more interested in a function of a sequence of random variables, rather than the sequence itself.
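
As an illustration of a function of a sequence, consider independent fair [math]\displaystyle{ \pm 1 }[/math] steps [math]\displaystyle{ X_1,X_2,\ldots }[/math] (taking [math]\displaystyle{ X_0 }[/math] to be a constant) and let [math]\displaystyle{ Y_i=(X_1+\cdots+X_i)^2-i }[/math]. The sketch below checks by enumeration that [math]\displaystyle{ Y_0,Y_1,\ldots }[/math] is a martingale with respect to the steps. The example and all names in the code are assumptions made for illustration and do not come from the original notes.

```python
from fractions import Fraction
from itertools import product


def y_value(steps):
    """Y_i = (X_1 + ... + X_i)^2 - i, as a function of the first i steps."""
    s = sum(steps)
    return s * s - len(steps)


def is_martingale_wrt_steps(func, n):
    """Check E[Y_{i+1} | X_1, ..., X_i] = Y_i for 0 <= i < n, where Y_i = func(steps)."""
    half = Fraction(1, 2)
    for i in range(n):
        for history in product((-1, 1), repeat=i):
            # Average func over the two equally likely next steps.
            cond_exp = sum(half * func(history + (s,)) for s in (-1, 1))
            if cond_exp != func(history):
                return False
    return True


squared_minus_time = is_martingale_wrt_steps(y_value, 6)
squared_only = is_martingale_wrt_steps(lambda steps: sum(steps) ** 2, 4)
```

The check passes for `y_value` but fails for the squared position alone, because the expected square grows by 1 at every step; subtracting [math]\displaystyle{ i }[/math] compensates for exactly that growth.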