随机算法 (Fall 2011)/Coupling

=Coupling of Two Distributions=
Coupling is a powerful proof technique in probability theory. It allows us to compare two unrelated variables (or processes) by forcing them to share some randomness.
{{Theorem|Definition (coupling)|
:Let <math>p</math> and <math>q</math> be two probability distributions over <math>\Omega</math>. A probability distribution <math>\mu</math> over <math>\Omega\times\Omega</math> is said to be a '''coupling''' if its marginal distributions are <math>p</math> and <math>q</math>; that is
::<math>
\begin{align}
p(x) &=\sum_{y\in\Omega}\mu(x,y);\\
q(x) &=\sum_{y\in\Omega}\mu(y,x).
\end{align}
</math>
}}
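
For a concrete illustration (a small example added here for clarity), let <math>\Omega=\{0,1\}</math> and let <math>p</math> and <math>q</math> be the distributions with <math>p(1)=\frac{1}{2}</math> and <math>q(1)=\frac{1}{4}</math>. The product distribution <math>\mu(x,y)=p(x)q(y)</math> is one coupling (with the two coordinates independent); another coupling is given by
:<math>\mu(0,0)=\frac{1}{2},\quad\mu(1,1)=\frac{1}{4},\quad\mu(1,0)=\frac{1}{4},\quad\mu(0,1)=0,</math>
which has the same marginals but makes the two coordinates equal with as large a probability as possible.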


A pair of random variables <math>(X,Y)</math> jointly distributed according to such a <math>\mu</math> is also called a coupling of <math>p</math> and <math>q</math>; marginally, <math>X</math> follows <math>p</math> and <math>Y</math> follows <math>q</math>. The interesting case is when <math>X</math> and <math>Y</math> are not independent. In particular, we want the probability that <math>X=Y</math> to be as large as possible, while still maintaining the respective marginal distributions <math>p</math> and <math>q</math>. When <math>p</math> and <math>q</math> are different it is inevitable that <math>X\neq Y</math> with some probability, but how small can this probability be? That is, how well can two distributions be coupled? The following coupling lemma states that this is determined by the total variation distance between the two distributions.


{{Theorem|Lemma (coupling lemma)|
:Let <math>p</math> and <math>q</math> be two probability distributions over <math>\Omega</math>.
# For any coupling <math>(X,Y)</math> of <math>p</math> and <math>q</math>, it holds that <math>\Pr[X\neq Y]\ge\|p-q\|_{TV}</math>.
# There exists a coupling <math>(X,Y)</math> of <math>p</math> and <math>q</math> such that <math>\Pr[X\neq Y]=\|p-q\|_{TV}</math>.
}}
{{Proof|
;1.
Suppose <math>(X,Y)</math> follows the distribution <math>\mu</math> over <math>\Omega\times\Omega</math>. Due to the definition of coupling,
:<math>
\begin{align}
p(x) =\sum_{y\in\Omega}\mu(x,y) &\quad\text{and } &q(x) =\sum_{y\in\Omega}\mu(y,x).
\end{align}
</math>
Then for any <math>z\in\Omega</math>, <math>\mu(z,z)\le\min\{p(z),q(z)\}</math>, thus
:<math>
\begin{align}
\Pr[X=Y]
&=
\sum_{z\in\Omega}\mu(z,z)
\le\sum_{z\in\Omega}\min\{p(z),q(z)\}.
\end{align}
</math>
Therefore,
:<math>
\begin{align}
\Pr[X\neq Y]
&\ge
1-\sum_{z\in\Omega}\min\{p(z),q(z)\}\\
&=\sum_{z\in\Omega}(p(z)-\min\{p(z),q(z)\})\\
&=\sum_{z\in\Omega\atop p(z)>q(z)}(p(z)-q(z))\\
&=\frac{1}{2}\sum_{z\in\Omega}|p(z)-q(z)|\\
&=\|p-q\|_{TV}.
\end{align}
</math>
;2.
We can couple <math>X</math> and <math>Y</math> so that for each <math>z\in\Omega</math>, <math>X=Y=z</math> with probability <math>\min\{p(z),q(z)\}</math>; the remaining probability mass is distributed so that the marginals are <math>p</math> and <math>q</math> (when <math>p\neq q</math>, the leftover masses of <math>p</math> and <math>q</math> are supported on disjoint sets, so <math>X\neq Y</math> on that event). Then <math>\Pr[X=Y]=\sum_{z\in\Omega}\min\{p(z),q(z)\}</math>, and the calculation above shows <math>\Pr[X\neq Y]=\|p-q\|_{TV}</math>.
}}


The proof is depicted by the following figure, where the curves are the probability density functions of the two distributions <math>p</math> and <math>q</math>, the colored area is the difference between the two distributions, and the "lower envelope" (the red line) is <math>\min\{p(x), q(x)\}</math>.


:[[image:Coupling.png|350px]]
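
To make the optimal coupling in the lemma concrete, here is a minimal Python sketch of a maximal-coupling sampler for finite distributions (the function name and the representation of distributions as dicts are our own choices, not part of the notes):

<syntaxhighlight lang="python">
import random

def maximal_coupling(p, q):
    """Sample a pair (X, Y) with X ~ p, Y ~ q, and
    Pr[X != Y] equal to the total variation distance.

    p, q: dicts mapping each element of the finite sample space
    to its probability."""
    support = set(p) | set(q)
    # Overlap mass min{p(z), q(z)}: on this event the samples coincide.
    overlap = {z: min(p.get(z, 0.0), q.get(z, 0.0)) for z in support}
    w = sum(overlap.values())  # w = 1 - ||p - q||_TV
    if random.random() < w:
        z = random.choices(list(overlap), weights=list(overlap.values()))[0]
        return z, z
    # Otherwise sample X and Y independently from the normalized leftovers;
    # the leftovers have disjoint supports, so X != Y on this event.
    rp = {z: p.get(z, 0.0) - overlap[z] for z in support}
    rq = {z: q.get(z, 0.0) - overlap[z] for z in support}
    x = random.choices(list(rp), weights=list(rp.values()))[0]
    y = random.choices(list(rq), weights=list(rq.values()))[0]
    return x, y
</syntaxhighlight>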


===Monotonicity of <math>\Delta_x(t)</math>===


Consider a Markov chain on state space <math>\Omega</math>. Let <math>\pi</math> be the stationary distribution, and <math>p_x^{(t)}</math> be the distribution after <math>t</math> steps when the initial state is <math>x</math>. Recall that <math>\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}</math> is the distance to stationary distribution <math>\pi</math> after <math>t</math> steps, started at state <math>x</math>.


We will show that <math>\Delta_x(t)</math> is non-increasing in <math>t</math>. That is, for a Markov chain, the variation distance to the stationary distribution monotonically decreases as time passes. Although this is rather intuitive at first glance, the proof is nontrivial. A brute-force (algebraic) proof can be obtained by analyzing the change to the 1-norm of <math>p_x^{(t)}-\pi</math> caused by multiplying by the transition matrix <math>P</math>; the analysis uses eigendecomposition. We introduce a cleverer proof using coupling.


{{Theorem|Proposition|
:<math>\Delta_x(t)</math> is non-increasing in <math>t</math>.
}}
{{Proof|
Let <math>X_0=x</math> and let <math>Y_0</math> follow the stationary distribution <math>\pi</math>. Two chains <math>X_0,X_1,X_2,\ldots</math> and <math>Y_0,Y_1,Y_2,\ldots</math> are defined by running the Markov chain from these initial distributions. The distributions of <math>X_t</math> and <math>Y_t</math> are <math>p_x^{(t)}</math> and <math>\pi</math>, respectively.


For any fixed <math>t</math>, we can couple <math>X_t</math> and <math>Y_t</math> such that <math>\Pr[X_t\neq Y_t]=\|p_x^{(t)}-\pi\|_{TV}=\Delta_x(t)</math>; such a coupling exists due to the Coupling Lemma. We then define a coupling between <math>X_{t+1}</math> and <math>Y_{t+1}</math> as follows.
* If <math>X_t=Y_t</math>, then <math>X_{t+1}=Y_{t+1}</math> following the transition matrix of the Markov chain.
* Otherwise, do the transitions <math>X_t\rightarrow X_{t+1}</math> and <math>Y_t\rightarrow Y_{t+1}</math> independently, following the transition matrix.
It is easy to see that the marginal distributions of <math>X_{t+1}</math> and <math>Y_{t+1}</math> both follow the original Markov chain, and
:<math>
\Pr[X_{t+1}\neq Y_{t+1}]\le \Pr[X_t\neq Y_{t}].
</math>
In conclusion, it holds that
:<math>
\Delta_x(t+1)=\|p_x^{(t+1)}-\pi\|_{TV}\le\Pr[X_{t+1}\neq Y_{t+1}]\le \Pr[X_t\neq Y_{t}]=\Delta_x(t),
</math>
where the first inequality is due to the easy direction of the Coupling Lemma.
}}
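
As a sanity check, the monotonicity can be observed numerically. The following minimal Python sketch (the 3-state chain is an arbitrary example of ours, not from the notes) computes <math>\Delta_x(t)</math> by power iteration and asserts that it never increases:

<syntaxhighlight lang="python">
import numpy as np

# An arbitrary ergodic 3-state chain (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

p = np.array([1.0, 0.0, 0.0])  # start deterministically at state 0
prev = float("inf")
for t in range(30):
    d = 0.5 * np.abs(p - pi).sum()  # total variation distance to pi
    assert d <= prev + 1e-12        # Delta_x(t) is non-increasing
    prev = d
    p = p @ P                       # one step of the chain
</syntaxhighlight>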


=Rapid Mixing by Coupling=
Consider an ergodic (irreducible and aperiodic) Markov chain on state space <math>\Omega</math>. We want to upper bound its mixing time. The coupling technique for bounding the mixing time can be summarized as follows:
:Consider two random walks starting at state <math>x</math> and <math>y</math> in <math>\Omega</math>. Each individual walk follows the transition rule of the original Markov chain. But the two random walks may be "''coupled''" in some way, that is, may be correlated with each other. A key observation is that for any such coupling, it always holds that
::<math>\Delta(t)
\le\max_{x,y\in\Omega}\Pr[\text{ the two } \mathit{coupled} \text{ random walks started at }x,y\text{ have not met by time }t]
</math>
:where we recall that <math>\Delta(t)=\max_{x\in\Omega}\|p_x^{(t)}-\pi\|_{TV}</math>.


{{Theorem|Definition (coupling of Markov chain)|
:A '''coupling''' of a Markov chain with state space <math>\Omega</math> and transition matrix <math>P</math>, is a Markov chain <math>(X_t,Y_t)</math> with state space <math>\Omega\times\Omega</math>, satisfying
# each of <math>X_t</math> and <math>Y_t</math> viewed in isolation is a faithful copy of the original Markov chain, i.e.
#:<math>\Pr[X_{t+1}=v\mid X_t=u]=\Pr[Y_{t+1}=v\mid Y_{t}=u]=P(u,v)</math>;
# once <math>X_t</math> and <math>Y_t</math> reach the same state, they make identical moves from then on, i.e.
#:<math>X_{t+1}=Y_{t+1}</math> if <math>X_t=Y_t</math>.
}}


{{Theorem|Lemma (coupling lemma for Markov chains)|
:Let <math>(X_t,Y_t)</math> be a coupling of a Markov chain <math>M</math> with state space <math>\Omega</math>. Then <math>\Delta(t)</math> for <math>M</math> is bounded by
::<math>\Delta(t)
\le\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]</math>.
}}
{{Proof|
Due to the coupling lemma (for probability distributions),  
:<math>
\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]
\ge
\|p_x^{(t)}-p_y^{(t)}\|_{TV},
</math>
where <math>p_x^{(t)},p_y^{(t)}</math> are distributions of the Markov chain <math>M</math> at time <math>t</math>, started at states <math>x, y</math>.


Therefore,
:<math>
\begin{align}
\Delta(t)
&=
\max_{x\in\Omega}\|p_x^{(t)}-\pi\|_{TV}\\
&\le
\max_{x,y\in\Omega}\|p_x^{(t)}-p_y^{(t)}\|_{TV}\\
&\le
\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y].
\end{align}
</math>
The first inequality holds because <math>\pi</math> is stationary, so <math>\pi=\sum_{y\in\Omega}\pi(y)p_y^{(t)}</math>, and the total variation distance is convex in its arguments.
}}




{{Theorem|Corollary|
:Let <math>(X_t,Y_t)</math> be a coupling of a Markov chain <math>M</math> with state space <math>\Omega</math>. Then <math>\tau(\epsilon)</math> for <math>M</math> is bounded as follows:
::<math>\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]\le \epsilon\quad\Longrightarrow\quad\tau(\epsilon)\le t</math>.
}}


=== Geometric convergence ===
Consider a Markov chain on state space <math>\Omega</math>. Let <math>\pi</math> be the stationary distribution, and <math>p_x^{(t)}</math> be the distribution after <math>t</math> steps when the initial state is <math>x</math>. Recall that
*<math>\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}</math>;
*<math>\Delta(t)=\max_{x\in\Omega}\Delta_x(t)</math>;
*<math>\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}</math>;
* <math>\tau(\epsilon)=\max_{x\in\Omega}\tau_x(\epsilon)</math>;
*<math>\tau_{\mathrm{mix}}=\tau(1/2\mathrm{e})\,</math>.


We prove that
{{Theorem|Proposition|
# <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le \mathrm{e}^{-k}</math> for any integer <math>k\ge1</math>.
#<math>\tau(\epsilon)\le\tau_{\mathrm{mix}}\cdot\left\lceil\ln\frac{1}{\epsilon}\right\rceil</math>.
}}
{{Proof|
;1.
Denote <math>\tau=\tau_{\mathrm{mix}}\,</math>. We construct a coupling of the Markov chain as follows. Suppose <math>t=k\tau</math> for some integer <math>k</math>.
* If <math>X_t=Y_t</math> then <math>X_{t+i}=Y_{t+i}</math> for <math>i=1,2,\ldots,\tau</math>.
* For the case that <math>X_t=u, Y_t=v</math> for some <math>u\neq v</math>, we couple <math>X_{t+\tau}</math> and <math>Y_{t+\tau}</math> so that <math>\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t=u,Y_t=v]=\|p_u^{(\tau)}-p_v^{(\tau)}\|_{TV}</math>. Due to the Coupling Lemma, such a coupling does exist.


For any <math>u,v\in\Omega</math> that <math>u\neq v</math>,
:<math>\begin{align}
\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t=u,Y_t=v]
&=\|p_u^{(\tau)}-p_v^{(\tau)}\|_{TV}\\
&\le \|p_u^{(\tau)}-\pi\|_{TV}+\|p_v^{(\tau)}-\pi\|_{TV}\\
&= \Delta_u(\tau)+\Delta_v(\tau)\\
&\le \frac{1}{\mathrm{e}}.
\end{align}
</math>
Thus <math>\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t\neq Y_t]\le \frac{1}{\mathrm{e}}</math> by the law of total probability. Therefore, for any <math>x,y\in\Omega</math>,
:<math>
\begin{align}
&\quad\, \Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_0=x,Y_0=y]\\
&=
\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t\neq Y_t]\cdot \Pr[X_{t}\neq Y_{t}\mid X_0=x,Y_0=y]\\
&\le \frac{1}{\mathrm{e}}\Pr[X_{t}\neq Y_{t}\mid X_0=x,Y_0=y].
\end{align}
</math>
Then by induction,
:<math>
\Pr[X_{k\tau}\neq Y_{k\tau}\mid X_0=x,Y_0=y]\le \mathrm{e}^{-k},
</math>
and this holds for any <math>x,y\in\Omega</math>. By the Coupling Lemma for Markov chains, this means <math>\Delta(k\tau_{\mathrm{mix}})=\Delta(k\tau)\le\mathrm{e}^{-k}</math>.


;2.
By the definition of <math>\tau(\epsilon)</math>, the inequality is straightforwardly implied by (1): taking <math>k=\left\lceil\ln\frac{1}{\epsilon}\right\rceil</math> gives <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le\mathrm{e}^{-k}\le\epsilon</math>.
}}
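
For instance, to bring the total variation distance below <math>\epsilon=0.01</math>, it suffices to run the chain for <math>\tau_{\mathrm{mix}}\cdot\lceil\ln 100\rceil=5\,\tau_{\mathrm{mix}}</math> steps.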


=Random Walk on the Hypercube=
An <math>n</math>-dimensional [http://en.wikipedia.org/wiki/Hypercube '''hypercube'''] is a graph on vertex set <math>\{0,1\}^n</math>. For any <math>x,y\in\{0,1\}^n</math>, there is an edge between <math>x</math> and <math>y</math> if <math>x</math> and <math>y</math> differ in exactly one coordinate (that is, the Hamming distance between <math>x</math> and <math>y</math> is 1).


We use coupling to bound the mixing time of the following simple random walk on a hypercube.
{{Theorem|Random Walk on Hypercube|
: At each step, for the current state <math>x\in\{0,1\}^n</math>:
# with probability <math>1/2</math>, do nothing;
# else, pick a coordinate <math>i\in\{1,2,\ldots,n\}</math> uniformly at random and flip the bit <math>x_i</math> (change <math>x_i</math> to <math>1-x_i</math>).
}}
 
This is a lazy uniform random walk in an <math>n</math>-dimensional hypercube. It is equivalent to the following random walk.
 
{{Theorem|Random Walk on Hypercube|
: At each step, for the current state <math>x\in\{0,1\}^n</math>:
# pick a coordinate <math>i\in\{1,2,\ldots,n\}</math> uniformly at random and a bit <math>b\in\{0,1\}</math> uniformly at random;
# let <math>x_i=b</math>.
}}
 
It is easy to see that the two random walks are the same. The second form hints at coupling the random choices of <math>i</math> and <math>b</math> in each step. Consider two copies of the random walk, <math>X_t</math> and <math>Y_t</math>, started from two arbitrary states <math>X_0</math> and <math>Y_0</math>. The coupling rule is:
* At each step, <math>X_t</math> and <math>Y_t</math> choose the same (uniformly random) coordinate <math>i</math> and the same (uniformly random) bit <math>b</math>.
Obviously the two individual random walks are both faithful copies of the original walk.
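
The following minimal Python sketch (function and variable names are our own) simulates this coupled walk and returns the coalescence time, i.e. the first time the two copies meet:

<syntaxhighlight lang="python">
import random

def coupled_hypercube_walk(x, y):
    """Run the coupled walks from bit-lists x and y of equal length n;
    return the number of steps until the two copies meet."""
    n = len(x)
    x, y = list(x), list(y)
    t = 0
    while x != y:
        i = random.randrange(n)   # shared random coordinate
        b = random.randint(0, 1)  # shared random bit
        x[i] = b                  # both walks set coordinate i to b,
        y[i] = b                  # so they agree on i from now on
        t += 1
    return t

# Worst case: antipodal vertices of the 10-dimensional hypercube.
n = 10
print(coupled_hypercube_walk([0] * n, [1] * n))
</syntaxhighlight>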
 
For arbitrary <math>x,y\in\{0,1\}^n</math>, started at <math>X_0=x, Y_0=y</math>, the event <math>X_t=Y_t</math> occurs once every coordinate on which <math>x</math> and <math>y</math> disagree has been picked at least once. In the worst case, <math>x</math> and <math>y</math> disagree on all <math>n</math> coordinates. The time <math>T</math> until all <math>n</math> coordinates have been picked is exactly the coupon collector time with <math>n</math> coupons, and we know from the coupon collector problem that for <math>c>0</math>,
:<math>
\Pr[T\ge n\ln n+cn]\le \mathrm{e}^{-c}.
</math>
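This tail bound follows from a union bound over the <math>n</math> coordinates: the probability that a fixed coordinate is never picked during the first <math>t</math> steps is <math>\left(1-\frac{1}{n}\right)^t</math>, so the probability that some coordinate is missed is at most
:<math>n\left(1-\frac{1}{n}\right)^t\le n\mathrm{e}^{-t/n}=\mathrm{e}^{-c}\quad\text{for }t=n\ln n+cn.</math>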
Thus for any <math>x,y\in\{0,1\}^n</math>, if <math>t\ge n\ln n+cn</math>, then <math>\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]\le \mathrm{e}^{-c}</math>. Due to the coupling lemma for Markov chains,
:<math>\Delta(n\ln n+cn)\le \mathrm{e}^{-c},</math>
which implies that
:<math>\tau(\epsilon)\le n\ln n+n\ln\frac{1}{\epsilon}</math>.
So the random walk achieves a polynomially small variation distance <math>\epsilon=\frac{1}{\mathrm{poly}(n)}</math> to the stationary distribution in time <math>O(n\ln n)</math>.
 
Note that the number of states (vertices in the <math>n</math>-dimensional hypercube) is <math>2^n</math>. Therefore, the mixing time is exponentially small compared to the size of the state space.
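For example, for <math>n=100</math> and <math>\epsilon=1/n</math>, the bound gives <math>\tau(\epsilon)\le 2n\ln n\approx 921</math> steps, while the state space has <math>2^{100}\approx 1.3\times 10^{30}</math> vertices.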
