Randomized Algorithms (Spring 2010)/Martingales

From TCS Wiki
Revision as of 05:28, 8 April 2010 by imported>WikiSysop (→‎Martingales and Azuma's Inequality)
Jump to navigation Jump to search

Martingales

Review of conditional probability

The conditional expectation of a random variable [math]\displaystyle{ Y }[/math] with respect to an event [math]\displaystyle{ \mathcal{E} }[/math] is defined by

[math]\displaystyle{ \mathbf{E}[Y\mid \mathcal{E}]=\sum_{y}y\Pr[Y=y\mid\mathcal{E}]. }[/math]

In particular, if the event [math]\displaystyle{ \mathcal{E} }[/math] is [math]\displaystyle{ X=a }[/math], the conditional expectation

[math]\displaystyle{ \mathbf{E}[Y\mid X=a] }[/math]

defines a function

[math]\displaystyle{ f(a)=\mathbf{E}[Y\mid X=a]. }[/math]

Thus, [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] can be regarded as a random variable [math]\displaystyle{ f(X) }[/math].

Example
Suppose that we uniformly sample a human from all human beings. Let [math]\displaystyle{ Y }[/math] be his/her height, and let [math]\displaystyle{ X }[/math] be the country where he/she is from. For any country [math]\displaystyle{ a }[/math], [math]\displaystyle{ \mathbf{E}[Y\mid X=a] }[/math] gives the average height of that country. And [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] is the random variable which can be defined in either ways:
  • We choose a human uniformly at random from all human beings, and [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] is the average height of the country where he/she comes from.
  • We choose a country at random with a probability proportional to its population, and [math]\displaystyle{ \mathbf{E}[Y\mid X] }[/math] is the average height of the chosen country.

Martingales and Azuma's Inequality

Definition (martingale):
A sequence of random variables [math]\displaystyle{ X_0,X_1,\ldots }[/math] is a martingale if for all [math]\displaystyle{ i\gt 0 }[/math],
[math]\displaystyle{ \begin{align} \mathbf{E}[X_{i}\mid X_0,\ldots,X_{i-1}]=X_{i-1}. \end{align} }[/math]
Example 1
A fair coin is flipped for a number of times. Let [math]\displaystyle{ Z_j\in\{-1,1\} }[/math] denote the outcome of the [math]\displaystyle{ j }[/math]th flip. Let
[math]\displaystyle{ X_0=0\quad \mbox{ and } \quad X_i=\sum_{j\le i}Z_j }[/math].
The random variables [math]\displaystyle{ X_0,X_1,\ldots }[/math] defines a martingale.
Proof
We first observe that [math]\displaystyle{ \mathbf{E}[X_i\mid X_0,\ldots,X_{i-1}] = \mathbf{E}[X_i\mid X_{i-1}] }[/math], which intuitively says that the next number of HEADs depends only on the current number of HEADs. This property is also called the Markov property in statistic processes.
[math]\displaystyle{ \begin{align} \mathbf{E}[X_i\mid X_0,\ldots,X_{i-1}] &= \mathbf{E}[X_i\mid X_{i-1}]\\ &= \mathbf{E}[X_{i-1}+Z_{i}\mid X_{i-1}]\\ &= \mathbf{E}[X_{i-1}\mid X_{i-1}]+\mathbf{E}[Z_{i}\mid X_{i-1}]\\ &= X_{i-1}+\mathbf{E}[Z_{i}\mid X_{i-1}]\\ &= X_{i-1}+\mathbf{E}[Z_{i}] &\quad (\mbox{independence of coin flips})\\ &= X_{i-1} \end{align} }[/math]
Example 2
Consider an infinite grid. A random walk starts from the origin, and at each step moves to one of the four directions with equal probability. Let [math]\displaystyle{ X_i }[/math] be the distance from the origin, measured by [math]\displaystyle{ \ell_1 }[/math]-distance (the length of the shortest path on the grid). The sequence [math]\displaystyle{ X_0,X_1,\ldots }[/math] defines a martingale.
File:Gridwalk.png
The proof is almost the same as the previous one.
Azuma's Inequality:
Let [math]\displaystyle{ X_0,X_1,\ldots }[/math] be a martingale such that, for all [math]\displaystyle{ k\ge 1 }[/math],
[math]\displaystyle{ |X_{k}-X_{k-1}|\le c_k, }[/math]
Then
[math]\displaystyle{ \begin{align} \Pr\left[|X_n-X_0|\ge t\right]\le 2\exp\left(-\frac{t^2}{2\sum_{k=1}^nc_k^2}\right). \end{align} }[/math]


Corollary:
Let [math]\displaystyle{ X_0,X_1,\ldots }[/math] be a martingale such that, for all [math]\displaystyle{ k\ge 1 }[/math],
[math]\displaystyle{ |X_{k}-X_{k-1}|\le c, }[/math]
Then
[math]\displaystyle{ \begin{align} \Pr\left[|X_n-X_0|\ge ct\sqrt{n}\right]\le 2 e^{-t^2/2}. \end{align} }[/math]

Generalizations

Definition (martingale, general version):
A sequence of random variables [math]\displaystyle{ Y_0,Y_1,\ldots }[/math] is a martingale with respect to the sequence [math]\displaystyle{ X_0,X_1,\ldots }[/math] if, for all [math]\displaystyle{ i\ge 0 }[/math], the following conditions hold:
  • [math]\displaystyle{ Y_i }[/math] is a function of [math]\displaystyle{ X_0,X_1,\ldots,X_i }[/math];
  • [math]\displaystyle{ \begin{align} \mathbf{E}[Y_{i+1}\mid X_0,\ldots,X_{i}]=Y_{i}. \end{align} }[/math]

Therefore, a sequence [math]\displaystyle{ X_0,X_1,\ldots }[/math] is a martingale if it is a martingale with respect to itself.

Azuma's Inequality (general version):
Let [math]\displaystyle{ Y_0,Y_1,\ldots }[/math] be a martingale with respect to the sequence [math]\displaystyle{ X_0,X_1,\ldots }[/math] such that, for all [math]\displaystyle{ k\ge 1 }[/math],
[math]\displaystyle{ |Y_{k}-Y_{k-1}|\le c_k, }[/math]
Then
[math]\displaystyle{ \begin{align} \Pr\left[|Y_n-Y_0|\ge t\right]\le 2\exp\left(-\frac{t^2}{2\sum_{k=1}^nc_k^2}\right). \end{align} }[/math]


Definition (The Doob sequence):
The Doob sequence of a function [math]\displaystyle{ f }[/math] with respect to a sequence of random variables [math]\displaystyle{ X_1,\ldots,X_n }[/math] is defined by
[math]\displaystyle{ Y_i=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_{i}], \quad 0\le i\le n. }[/math]
In particular, [math]\displaystyle{ Y_0=\mathbf{E}[f(X_1,\ldots,X_n)] }[/math] and [math]\displaystyle{ Y_n=f(X_1,\ldots,X_n) }[/math].

The Method of Bounded Differences

For arbitrary random variables

Theorem (The method of averaged bounded differences):
Let [math]\displaystyle{ \boldsymbol{X}=(X_1,\ldots, X_n) }[/math] be arbitrary random variables and let [math]\displaystyle{ f }[/math] be a function of [math]\displaystyle{ X_0,X_1,\ldots, X_n }[/math] satisfying that, for all [math]\displaystyle{ 1\le i\le n }[/math],
[math]\displaystyle{ |\mathbf{E}[f(\boldsymbol{X})\mid X_1,\ldots,X_i]-\mathbf{E}[f(\boldsymbol{X})\mid X_1,\ldots,X_{i-1}]|\le c_i, }[/math]
Then
[math]\displaystyle{ \begin{align} \Pr\left[|f(\boldsymbol{X})-\mathbf{E}[f(\boldsymbol{X})]|\ge t\right]\le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^nc_i^2}\right). \end{align} }[/math]

Define the Doob Martingale sequence [math]\displaystyle{ Y_0,Y_1,\ldots,Y_n }[/math] by setting [math]\displaystyle{ Y_0=\mathbf{E}[f(X_1,\ldots,X_n)] }[/math] and, for [math]\displaystyle{ 1\le i\le n }[/math], [math]\displaystyle{ Y_i=\mathbf{E}[f(X_1,\ldots,X_n)\mid X_1,\ldots,X_i] }[/math]. Then the above theorem is a restatement of the Azuma's inequality holding for [math]\displaystyle{ Y_0,Y_1,\ldots,Y_n }[/math].

For independent random variables

Definition (Lipschitz condition):
A function [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] satisfies the Lipschitz condition, if for any [math]\displaystyle{ x_1,\ldots,x_n }[/math] and any [math]\displaystyle{ y_i }[/math],
[math]\displaystyle{ \begin{align} |f(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n)-f(x_1,\ldots,x_{i-1},y_i,x_{i+1},\ldots,x_n)|\le 1. \end{align} }[/math]

In other words, the function satisfies the Lipschitz condition if an arbitrary change in the value of any one argument does not change the value of the function by more than 1.

Definition (Lipschitz condition, general version):
A function [math]\displaystyle{ f(x_1,\ldots,x_n) }[/math] satisfies the Lipschitz condition with constants [math]\displaystyle{ c_i }[/math], [math]\displaystyle{ 1\le i\le n }[/math], if for any [math]\displaystyle{ x_1,\ldots,x_n }[/math] and any [math]\displaystyle{ y_i }[/math],
[math]\displaystyle{ \begin{align} |f(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n)-f(x_1,\ldots,x_{i-1},y_i,x_{i+1},\ldots,x_n)|\le c_i. \end{align} }[/math]


Corollary (Method of bounded differences):
Let [math]\displaystyle{ \boldsymbol{X}=(X_1,\ldots, X_n) }[/math] be [math]\displaystyle{ n }[/math] independent random variables and let [math]\displaystyle{ f }[/math] be a function satisfying the Lipschitz condition with constants [math]\displaystyle{ c_i }[/math], [math]\displaystyle{ 1\le i\le n }[/math].
Then
[math]\displaystyle{ \begin{align} \Pr\left[|f(\boldsymbol{X})-\mathbf{E}[f(\boldsymbol{X})]|\ge t\right]\le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^nc_i^2}\right). \end{align} }[/math]

Applications