# Randomized Algorithms (Fall 2011)/Azuma's Inequality

We introduce a martingale tail inequality, called Azuma's inequality.

 Azuma's Inequality Let ${\displaystyle X_{0},X_{1},\ldots }$ be a martingale such that, for all ${\displaystyle k\geq 1}$, ${\displaystyle |X_{k}-X_{k-1}|\leq c_{k}.}$ Then for all ${\displaystyle n\geq 1}$ and ${\displaystyle t>0}$, ${\displaystyle {\begin{aligned}\Pr \left[|X_{n}-X_{0}|\geq t\right]\leq 2\exp \left(-{\frac {t^{2}}{2\sum _{k=1}^{n}c_{k}^{2}}}\right).\end{aligned}}}$

Before formally proving this theorem, some comments are in order. First, unlike the Chernoff bounds, there is no assumption of independence. This shows the power of martingale inequalities.

Second, the condition that

${\displaystyle |X_{k}-X_{k-1}|\leq c_{k}}$

is central to the proof. This condition is sometimes called the bounded difference condition. If we think of the martingale ${\displaystyle X_{0},X_{1},\ldots }$ as a process evolving through time, where ${\displaystyle X_{i}}$ gives some measurement at time ${\displaystyle i}$, the bounded difference condition states that the process does not make big jumps. Azuma's inequality says that if this is the case, then it is unlikely that the process wanders far from its starting point.

A special case is when the differences are bounded by a constant. The following corollary is directly implied by Azuma's inequality.

 Corollary Let ${\displaystyle X_{0},X_{1},\ldots }$ be a martingale such that, for all ${\displaystyle k\geq 1}$, ${\displaystyle |X_{k}-X_{k-1}|\leq c.}$ Then for all ${\displaystyle n\geq 1}$ and ${\displaystyle t>0}$, ${\displaystyle {\begin{aligned}\Pr \left[|X_{n}-X_{0}|\geq ct{\sqrt {n}}\right]\leq 2e^{-t^{2}/2}.\end{aligned}}}$

This corollary states that for any martingale sequence whose differences are bounded by a constant, the probability that it deviates ${\displaystyle \omega ({\sqrt {n}})}$ far away from the starting point after ${\displaystyle n}$ steps is bounded by ${\displaystyle o(1)}$.
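As a quick sanity check of the corollary, a fair ${\displaystyle \pm 1}$ random walk is a martingale whose differences are bounded by the constant ${\displaystyle c=1}$. The following sketch (the function name and parameter values are our own, chosen for illustration) estimates the deviation probability by Monte Carlo simulation and compares it against the bound ${\displaystyle 2e^{-t^{2}/2}}$:

```python
import math
import random

def deviation_probability(n, t, trials, seed=0):
    """Estimate Pr[|X_n - X_0| >= t * sqrt(n)] for a fair +/-1
    random walk, a martingale with difference bound c = 1."""
    rng = random.Random(seed)
    threshold = t * math.sqrt(n)
    exceed = 0
    for _ in range(trials):
        # X_n - X_0 is the sum of n independent +/-1 steps.
        walk = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(walk) >= threshold:
            exceed += 1
    return exceed / trials

n, t = 400, 2.0
empirical = deviation_probability(n, t, trials=20000)
azuma_bound = 2 * math.exp(-t * t / 2)   # corollary with c = 1
print(empirical, azuma_bound)            # the estimate stays below the bound
```

The empirical frequency is well below the bound, which is expected: Azuma's inequality holds for every martingale with bounded differences, so for this particular walk it is not tight.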

The proof of Azuma's inequality uses several ideas from the proof of the Chernoff bounds. We first observe that the total deviation of the martingale sequence can be represented as the sum of the differences made in each step. Thus, as with the Chernoff bounds, we are bounding the deviation of a sum of random variables. The strategy of the proof is almost the same as that of the Chernoff bounds: we first apply Markov's inequality to the moment generating function, then we bound the moment generating function, and at last we optimize the parameter of the moment generating function. However, unlike in the Chernoff bounds, the martingale differences are no longer independent, so we replace the use of independence with the martingale property. The proof is detailed as follows.

In order to bound the probability of ${\displaystyle |X_{n}-X_{0}|\geq t}$, we first bound the upper tail ${\displaystyle \Pr[X_{n}-X_{0}\geq t]}$. The bound on the lower tail is proved symmetrically, with ${\displaystyle X_{i}}$ replaced by ${\displaystyle -X_{i}}$.

### Represent the deviation as the sum of differences

We define the martingale difference sequence: for ${\displaystyle i\geq 1}$, let

${\displaystyle Y_{i}=X_{i}-X_{i-1}.}$

It holds that

${\displaystyle {\begin{aligned}\mathbf {E} [Y_{i}\mid X_{0},\ldots ,X_{i-1}]&=\mathbf {E} [X_{i}-X_{i-1}\mid X_{0},\ldots ,X_{i-1}]\\&=\mathbf {E} [X_{i}\mid X_{0},\ldots ,X_{i-1}]-\mathbf {E} [X_{i-1}\mid X_{0},\ldots ,X_{i-1}]\\&=X_{i-1}-X_{i-1}\\&=0.\end{aligned}}}$

The second-to-last equation is due to the fact that ${\displaystyle X_{0},X_{1},\ldots }$ is a martingale, together with the definition of conditional expectation.

Let ${\displaystyle Z_{n}}$ be the accumulated differences

${\displaystyle Z_{n}=\sum _{i=1}^{n}Y_{i}.}$

The deviation ${\displaystyle (X_{n}-X_{0})}$ can be computed by the accumulated differences:

${\displaystyle {\begin{aligned}X_{n}-X_{0}&=(X_{1}-X_{0})+(X_{2}-X_{1})+\cdots +(X_{n}-X_{n-1})\\&=\sum _{i=1}^{n}Y_{i}\\&=Z_{n}.\end{aligned}}}$

We then only need to upper bound the probability of the event ${\displaystyle Z_{n}\geq t}$.

### Apply Markov's inequality to the moment generating function

For any ${\displaystyle \lambda >0}$, the event ${\displaystyle Z_{n}\geq t}$ is equivalent to the event ${\displaystyle e^{\lambda Z_{n}}\geq e^{\lambda t}}$. Applying Markov's inequality, we have

${\displaystyle {\begin{aligned}\Pr \left[Z_{n}\geq t\right]&=\Pr \left[e^{\lambda Z_{n}}\geq e^{\lambda t}\right]\\&\leq {\frac {\mathbf {E} \left[e^{\lambda Z_{n}}\right]}{e^{\lambda t}}}.\end{aligned}}}$

This is exactly the same as what we did to prove the Chernoff bound. Next, we need to bound the moment generating function ${\displaystyle \mathbf {E} \left[e^{\lambda Z_{n}}\right]}$.

### Bound the moment generating functions

The moment generating function satisfies

${\displaystyle {\begin{aligned}\mathbf {E} \left[e^{\lambda Z_{n}}\right]&=\mathbf {E} \left[\mathbf {E} \left[e^{\lambda Z_{n}}\mid X_{0},\ldots ,X_{n-1}\right]\right]\\&=\mathbf {E} \left[\mathbf {E} \left[e^{\lambda (Z_{n-1}+Y_{n})}\mid X_{0},\ldots ,X_{n-1}\right]\right]\\&=\mathbf {E} \left[\mathbf {E} \left[e^{\lambda Z_{n-1}}\cdot e^{\lambda Y_{n}}\mid X_{0},\ldots ,X_{n-1}\right]\right]\\&=\mathbf {E} \left[e^{\lambda Z_{n-1}}\cdot \mathbf {E} \left[e^{\lambda Y_{n}}\mid X_{0},\ldots ,X_{n-1}\right]\right].\end{aligned}}}$

The first and the last equations are due to fundamental facts about conditional expectation proved in the first section.

We then upper bound ${\displaystyle \mathbf {E} \left[e^{\lambda Y_{n}}\mid X_{0},\ldots ,X_{n-1}\right]}$ by a constant. To do so, we need the following technical lemma, which is proved using the convexity of ${\displaystyle e^{\lambda x}}$.

 Lemma Let ${\displaystyle X}$ be a random variable such that ${\displaystyle \mathbf {E} [X]=0}$ and ${\displaystyle |X|\leq c}$. Then for ${\displaystyle \lambda >0}$, ${\displaystyle \mathbf {E} [e^{\lambda X}]\leq e^{\lambda ^{2}c^{2}/2}.}$
Proof.
 Observe that for ${\displaystyle \lambda >0}$, the function ${\displaystyle e^{\lambda X}}$ of the variable ${\displaystyle X}$ is convex in the interval ${\displaystyle [-c,c]}$. We draw a line between the two endpoints ${\displaystyle (-c,e^{-\lambda c})}$ and ${\displaystyle (c,e^{\lambda c})}$. By convexity, the curve of ${\displaystyle e^{\lambda X}}$ lies entirely below this line. Thus, ${\displaystyle {\begin{aligned}e^{\lambda X}&\leq {\frac {c-X}{2c}}e^{-\lambda c}+{\frac {c+X}{2c}}e^{\lambda c}\\&={\frac {e^{\lambda c}+e^{-\lambda c}}{2}}+{\frac {X}{2c}}(e^{\lambda c}-e^{-\lambda c}).\end{aligned}}}$ Since ${\displaystyle \mathbf {E} [X]=0}$, we have ${\displaystyle {\begin{aligned}\mathbf {E} [e^{\lambda X}]&\leq \mathbf {E} \left[{\frac {e^{\lambda c}+e^{-\lambda c}}{2}}+{\frac {X}{2c}}(e^{\lambda c}-e^{-\lambda c})\right]\\&={\frac {e^{\lambda c}+e^{-\lambda c}}{2}}+{\frac {e^{\lambda c}-e^{-\lambda c}}{2c}}\mathbf {E} [X]\\&={\frac {e^{\lambda c}+e^{-\lambda c}}{2}}.\end{aligned}}}$ By expanding both sides as Taylor series, it can be verified that ${\displaystyle {\frac {e^{\lambda c}+e^{-\lambda c}}{2}}\leq e^{\lambda ^{2}c^{2}/2}}$.
${\displaystyle \square }$
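The final Taylor-series step, ${\displaystyle {\frac {e^{\lambda c}+e^{-\lambda c}}{2}}=\cosh(\lambda c)\leq e^{\lambda ^{2}c^{2}/2}}$, is easy to check numerically. A minimal sketch (the grid of test points is an arbitrary choice of ours):

```python
import math

def cosh_bound_holds(lam, c):
    # (e^{lam*c} + e^{-lam*c}) / 2 equals cosh(lam * c);
    # compare it against e^{lam^2 * c^2 / 2}.
    return math.cosh(lam * c) <= math.exp(lam * lam * c * c / 2)

# Check the inequality on a grid of positive parameters.
checks = [cosh_bound_holds(lam / 10, c / 10)
          for lam in range(1, 51) for c in range(1, 51)]
print(all(checks))  # True
```

The inequality actually holds for every real ${\displaystyle \lambda c}$, since the Taylor coefficient comparison ${\displaystyle (2k)!\geq 2^{k}k!}$ is term-by-term.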

Applying the above lemma to the random variable

${\displaystyle (Y_{n}\mid X_{0},\ldots ,X_{n-1})}$

We have already shown that its expectation is ${\displaystyle \mathbf {E} [(Y_{n}\mid X_{0},\ldots ,X_{n-1})]=0,}$ and by the bounded difference condition of Azuma's inequality, we have ${\displaystyle |Y_{n}|=|X_{n}-X_{n-1}|\leq c_{n}.}$ Thus, by the above lemma, it holds that

${\displaystyle \mathbf {E} [e^{\lambda Y_{n}}\mid X_{0},\ldots ,X_{n-1}]\leq e^{\lambda ^{2}c_{n}^{2}/2}.}$

Back to our analysis of the expectation ${\displaystyle \mathbf {E} \left[e^{\lambda Z_{n}}\right]}$, we have

${\displaystyle {\begin{aligned}\mathbf {E} \left[e^{\lambda Z_{n}}\right]&=\mathbf {E} \left[e^{\lambda Z_{n-1}}\cdot \mathbf {E} \left[e^{\lambda Y_{n}}\mid X_{0},\ldots ,X_{n-1}\right]\right]\\&\leq \mathbf {E} \left[e^{\lambda Z_{n-1}}\cdot e^{\lambda ^{2}c_{n}^{2}/2}\right]\\&=e^{\lambda ^{2}c_{n}^{2}/2}\cdot \mathbf {E} \left[e^{\lambda Z_{n-1}}\right].\end{aligned}}}$

Applying the same analysis to ${\displaystyle \mathbf {E} \left[e^{\lambda Z_{n-1}}\right]}$ recursively, we can solve the recursion to obtain

${\displaystyle {\begin{aligned}\mathbf {E} \left[e^{\lambda Z_{n}}\right]&\leq \prod _{k=1}^{n}e^{\lambda ^{2}c_{k}^{2}/2}\\&=\exp \left(\lambda ^{2}\sum _{k=1}^{n}c_{k}^{2}/2\right).\end{aligned}}}$

Returning to Markov's inequality, we have

${\displaystyle {\begin{aligned}\Pr \left[Z_{n}\geq t\right]&\leq {\frac {\mathbf {E} \left[e^{\lambda Z_{n}}\right]}{e^{\lambda t}}}\\&\leq \exp \left(\lambda ^{2}\sum _{k=1}^{n}c_{k}^{2}/2-\lambda t\right).\end{aligned}}}$

We then only need to choose a proper ${\displaystyle \lambda >0}$.

### Optimization

By choosing ${\displaystyle \lambda ={\frac {t}{\sum _{k=1}^{n}c_{k}^{2}}}}$, we have that

${\displaystyle \exp \left(\lambda ^{2}\sum _{k=1}^{n}c_{k}^{2}/2-\lambda t\right)=\exp \left(-{\frac {t^{2}}{2\sum _{k=1}^{n}c_{k}^{2}}}\right).}$
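This choice of ${\displaystyle \lambda }$ is optimal: writing ${\displaystyle S=\sum _{k=1}^{n}c_{k}^{2}}$, the exponent ${\displaystyle \lambda ^{2}S/2-\lambda t}$ is a parabola in ${\displaystyle \lambda }$, minimized at ${\displaystyle \lambda =t/S}$ with minimum value ${\displaystyle -t^{2}/(2S)}$. A small numeric sketch with arbitrary sample values of ${\displaystyle S}$ and ${\displaystyle t}$:

```python
def exponent(lam, S, t):
    # the exponent from Markov's inequality: lambda^2 * S / 2 - lambda * t
    return lam * lam * S / 2 - lam * t

S, t = 4.0, 3.0          # arbitrary sample values; S stands for sum_k c_k^2
lam_star = t / S         # the claimed optimal choice of lambda
best = exponent(lam_star, S, t)
print(best == -t * t / (2 * S))  # True: the minimum equals -t^2 / (2 S)
# any nearby lambda gives a weaker (larger) exponent
print(all(exponent(lam_star + d, S, t) >= best
          for d in (-0.5, -0.1, 0.1, 0.5)))  # True
```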

Thus, the probability

${\displaystyle {\begin{aligned}\Pr \left[X_{n}-X_{0}\geq t\right]&=\Pr \left[Z_{n}\geq t\right]\\&\leq \exp \left(\lambda ^{2}\sum _{k=1}^{n}c_{k}^{2}/2-\lambda t\right)\\&=\exp \left(-{\frac {t^{2}}{2\sum _{k=1}^{n}c_{k}^{2}}}\right).\end{aligned}}}$

The upper tail of Azuma's inequality is proved. By replacing ${\displaystyle X_{i}}$ with ${\displaystyle -X_{i}}$, the lower tail is bounded in exactly the same way. Applying the union bound to the two tails completes the proof of Azuma's inequality.