随机算法 (Fall 2011)/The Spectral Gap: Difference between revisions

From TCS Wiki
Jump to navigation Jump to search
imported>Etone
No edit summary
imported>Etone
No edit summary
 
Line 18: Line 18:
</math>.
</math>.


== Rayleigh-Ritz theorem ==
== Optimization Characterization of Eigenvalues ==
{{Theorem
{{Theorem
|Theorem (Rayleigh-Ritz theorem)|
|Theorem (Rayleigh-Ritz theorem)|
Line 154: Line 154:
\phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\min_{\overset{x\in\{0,1\}^n}{\|x\|_1\le\frac{n}{2}}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.
\phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\min_{\overset{x\in\{0,1\}^n}{\|x\|_1\le\frac{n}{2}}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.
</math>
</math>
On the other hand, due to the variational characterization, the spectral gap is characterized by:
On the other hand, due to the variational characterization of the spectral gap, we have
:<math>
:<math>
d-\lambda_2=\min_{x\bot\boldsymbol{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.
d-\lambda_2=\min_{x\bot\boldsymbol{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}.
Line 160: Line 160:


We can easily observe the similarity between the two formulas. Both the expansion ration <math>\phi(G)</math> and the spectral gap <math>d-\lambda_2</math> can be characterized by optimizations of the same objective function <math>\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}</math> over different domains (for the spectral gap, the optimization is over all <math>x\bot\boldsymbol{1}</math>; and for the expansion ratio, it is over all such vectors <math>x\in\{0,1\}^n</math> with at most <math>n/2</math> many 1-entries).
We can easily observe the similarity between the two formulas. Both the expansion ration <math>\phi(G)</math> and the spectral gap <math>d-\lambda_2</math> can be characterized by optimizations of the same objective function <math>\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}</math> over different domains (for the spectral gap, the optimization is over all <math>x\bot\boldsymbol{1}</math>; and for the expansion ratio, it is over all such vectors <math>x\in\{0,1\}^n</math> with at most <math>n/2</math> many 1-entries).
-------
;Notations
Throughout the proof, we assume that <math>G(V,E)</math> is the <math>d</math>-regular graph of <math>n</math> vertices, <math>A</math> is the adjacency matrix, whose eigenvalues are <math>\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n</math>, and <math>L=(dI-A)</math> is the graph Laplacian.


=== Large spectral gap implies high expansion ===
=== Large spectral gap implies high expansion ===

Latest revision as of 15:16, 13 December 2011

Cheeger's Inequality

One of the most exciting results in spectral graph theory is the following theorem which relate the graph expansion to the spectral gap.

Theorem (Cheeger's inequality)
Let [math]\displaystyle{ G }[/math] be a [math]\displaystyle{ d }[/math]-regular graph with spectrum [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. Then
[math]\displaystyle{ \frac{d-\lambda_2}{2}\le \phi(G) \le \sqrt{2d(d-\lambda_2)}. }[/math]

The theorem was first stated for Riemannian manifolds, and was proved by Cheeger and Buser (for different directions of the inequalities). The discrete case is proved independently by Dodziuk and Alon-Milman.

For a [math]\displaystyle{ d }[/math]-regular graph, the quantity [math]\displaystyle{ (d-\lambda_2) }[/math] is called the spectral gap. The name is due to the fact that it is the gap between the first and the second largest eigenvalues of a graph.

If we write [math]\displaystyle{ \alpha=1-\frac{\lambda_2}{d} }[/math] (sometimes it is called the normalized spectral gap), the Cheeger's inequality is turned into a nicer form:

[math]\displaystyle{ \frac{\alpha}{2}\le \frac{\phi}{d}\le\sqrt{2\alpha} }[/math] or equivalently [math]\displaystyle{ \frac{1}{2}\left(\frac{\phi}{d}\right)^2\le \alpha\le 2\left(\frac{\phi}{d}\right) }[/math].

Optimization Characterization of Eigenvalues

Theorem (Rayleigh-Ritz theorem)
Let [math]\displaystyle{ A }[/math] be a symmetric [math]\displaystyle{ n\times n }[/math] matrix. Let [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math] be the eigen values of [math]\displaystyle{ A }[/math] and [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] be the corresponding eigenvectors. Then
[math]\displaystyle{ \begin{align} \lambda_1 &=\max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx} \end{align} }[/math] and [math]\displaystyle{ \begin{align} \lambda_2 &=\max_{x\bot v_1}\frac{x^TAx}{x^Tx}. \end{align} }[/math]
Proof.

Without loss of generality, we may assume that [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] are orthonormal eigen-basis. Then it holds that

[math]\displaystyle{ \frac{v_1^TAv_1}{v_1^Tv_1}=\lambda_1v_1^Tv_1=\lambda_1 }[/math],

thus we have [math]\displaystyle{ \max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}\ge\lambda_1 }[/math].

Let [math]\displaystyle{ x\in\mathbb{R}^n }[/math] be an arbitrary vector and let [math]\displaystyle{ y=\frac{x}{\sqrt{x^Tx}}=\frac{x}{\|x\|} }[/math] be its normalization. Since [math]\displaystyle{ v_1,v_2,\ldots,v_n }[/math] are orthonormal basis, [math]\displaystyle{ y }[/math] can be expressed as [math]\displaystyle{ y=\sum_{i=1}^nc_iv_i }[/math]. Then

[math]\displaystyle{ \begin{align} \frac{x^TAx}{x^Tx} &=y^TAy =\left(\sum_{i=1}^nc_iv_i\right)^TA\left(\sum_{i=1}^nc_iv_i\right) =\left(\sum_{i=1}^nc_iv_i\right)^T\left(\sum_{i=1}^n\lambda_ic_iv_i\right)\\ &=\sum_{i=1}^n\lambda_ic_i^2 \le\lambda_1\sum_{i=1}^nc_i^2 =\lambda_1\|y\| =\lambda_1. \end{align} }[/math]

Therefore, [math]\displaystyle{ \max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}\le\lambda_1 }[/math]. Altogether we have [math]\displaystyle{ \max_{x\in\mathbb{R}^n}\frac{x^TAx}{x^Tx}=\lambda_1 }[/math]

It is similar to prove [math]\displaystyle{ \max_{x\bot v_1}\frac{x^TAx}{x^Tx}=\lambda_2 }[/math]. In the first part take [math]\displaystyle{ x=v_2 }[/math] to show that [math]\displaystyle{ \max_{x\bot v_1}\frac{x^TAx}{x^Tx}\ge\lambda_2 }[/math]; and in the second part take an arbitrary [math]\displaystyle{ x\bot v_1 }[/math] and [math]\displaystyle{ y=\frac{x}{\|x\|} }[/math]. Notice that [math]\displaystyle{ y\bot v_1 }[/math], thus [math]\displaystyle{ y=\sum_{i=1}^nc_iv_i }[/math] with [math]\displaystyle{ c_1=0 }[/math].

[math]\displaystyle{ \square }[/math]

The Rayleigh-Ritz Theorem is a special case of a fundamental theorem in linear algebra, called the Courant-Fischer theorem, which characterizes the eigenvalues of a symmetric matrix by a series of optimizations:

Theorem (Courant-Fischer theorem)
Let [math]\displaystyle{ A }[/math] be a symmetric matrix with eigenvalues [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. Then
[math]\displaystyle{ \begin{align} \lambda_k &=\max_{v_1,v_2,\ldots,v_{n-k}\in \mathbb{R}^n}\min_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{n-k}}}\frac{x^TAx}{x^Tx}\\ &= \min_{v_1,v_2,\ldots,v_{k-1}\in \mathbb{R}^n}\max_{\overset{x\in\mathbb{R}^n, x\neq \mathbf{0}}{x\bot v_1,v_2,\ldots,v_{k-1}}}\frac{x^TAx}{x^Tx}. \end{align} }[/math]

Graph Laplacian

Let [math]\displaystyle{ G(V,E) }[/math] be a [math]\displaystyle{ d }[/math]-regular graph of [math]\displaystyle{ n }[/math] vertices and let [math]\displaystyle{ A }[/math] be its adjacency matrix. We define [math]\displaystyle{ L=dI-A }[/math] to be the Laplacian of the graph [math]\displaystyle{ G }[/math]. Take [math]\displaystyle{ x\in \mathbb{R}^V }[/math] as a distribution over vertices, its Laplacian quadratic form [math]\displaystyle{ x^TLx }[/math] measures the "smoothness" of [math]\displaystyle{ x }[/math] over the graph topology, just as what the Laplacian operator does to the differentiable functions.

Laplacian Property
For any vector [math]\displaystyle{ x\in\mathbb{R}^n }[/math], it holds that
[math]\displaystyle{ x^TLx=\sum_{uv\in E}(x_u-x_v)^2 }[/math].
Proof.
[math]\displaystyle{ \begin{align} x^TLx &= \sum_{u,v\in V}x_u(dI-A)_{uv}x_v\\ &= \sum_{u}\left(dx_u^2-\sum_{uv\in E}x_ux_v\right)\\ &= \sum_{u\in V}\sum_{uv\in E}(x_u^2-x_ux_v). \end{align} }[/math]

On the other hand,

[math]\displaystyle{ \begin{align} \sum_{uv\in E}(x_u-x_v)^2 &= \sum_{uv\in E}\left(x_u^2-2x_ux_v+x_v^2\right)\\ &= \sum_{uv\in E}\left((x_u^2-x_ux_v)+(x_v^2-x_vx_u)\right)\\ &= \sum_{u\in V}\sum_{uv\in E}(x_u^2-x_ux_v). \end{align} }[/math]
[math]\displaystyle{ \square }[/math]

Applying the Rayleigh-Ritz theorem to the Laplacian matrix of the graph, we have the following "variational characterization" of the spectral gap [math]\displaystyle{ d-\lambda_2 }[/math].

Theorem (Variational Characterization)
Let [math]\displaystyle{ G(V,E) }[/math] be a [math]\displaystyle{ d }[/math]-regular graph of [math]\displaystyle{ n }[/math] vertices. Suppose that its adjacency matrix is [math]\displaystyle{ A }[/math], whose eigenvalues are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math]. Let [math]\displaystyle{ L=dI-A }[/math] be the Laplacian matrix. Then
[math]\displaystyle{ \begin{align} d-\lambda_2 &=\min_{x\bot \boldsymbol{1}}\frac{x^TLx}{x^Tx} =\min_{x\bot \boldsymbol{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}. \end{align} }[/math]
Proof.

For [math]\displaystyle{ d }[/math]-regular graph, we know that [math]\displaystyle{ \lambda_1=d }[/math] and [math]\displaystyle{ \boldsymbol{1}A=d\boldsymbol{1} }[/math], thus [math]\displaystyle{ \boldsymbol{1} }[/math] is the eigenvector of [math]\displaystyle{ \lambda_1 }[/math]. Due to Rayleigh-Ritz Theorem, it holds that [math]\displaystyle{ \lambda_2 =\max_{x\bot \boldsymbol{1}}\frac{x^TAx}{x^Tx} }[/math]. Then

[math]\displaystyle{ \begin{align} \min_{x\bot \boldsymbol{1}}\frac{x^TLx}{x^Tx} &=\min_{x\bot \boldsymbol{1}}\frac{x^T(dI-A)x}{x^Tx}\\ &=\min_{x\bot \boldsymbol{1}}\frac{dx^Tx-x^TAx}{x^Tx}\\ &=\min_{x\bot \boldsymbol{1}}\left(d-\frac{x^TAx}{x^Tx}\right)\\ &=d-\max_{x\bot \boldsymbol{1}}\frac{x^TAx}{x^Tx}\\ &=d-\lambda_2. \end{align} }[/math]

We know it holds for the graph Laplacian that [math]\displaystyle{ x^TLx=\sum_{uv\in E}(x_u-x_v)^2 }[/math]. So the variational characterization of the second eigenvalue of graph is proved.

[math]\displaystyle{ \square }[/math]

Proof of Cheeger's Inequality

We will first give an informal explanation why Cheeger's inequality holds.

Recall that the expansion is defined as

[math]\displaystyle{ \phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{|\partial S|}{|S|}. }[/math]

Let [math]\displaystyle{ \chi_S }[/math] be the characteristic vector of the set [math]\displaystyle{ S }[/math] such that

[math]\displaystyle{ \chi_S(v)=\begin{cases} 1 & v\in S,\\ 0 & v\not\in S. \end{cases} }[/math]

It is easy to see that

[math]\displaystyle{ \frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\frac{|\partial S|}{|S|}. }[/math]

Thus, the expansion can be expressed algebraically as

[math]\displaystyle{ \phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}}\frac{\sum_{uv\in E}(\chi_S(u)-\chi_S(v))^2}{\sum_{v\in V}\chi_S(v)^2}=\min_{\overset{x\in\{0,1\}^n}{\|x\|_1\le\frac{n}{2}}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}. }[/math]

On the other hand, due to the variational characterization of the spectral gap, we have

[math]\displaystyle{ d-\lambda_2=\min_{x\bot\boldsymbol{1}}\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}. }[/math]

We can easily observe the similarity between the two formulas. Both the expansion ration [math]\displaystyle{ \phi(G) }[/math] and the spectral gap [math]\displaystyle{ d-\lambda_2 }[/math] can be characterized by optimizations of the same objective function [math]\displaystyle{ \frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2} }[/math] over different domains (for the spectral gap, the optimization is over all [math]\displaystyle{ x\bot\boldsymbol{1} }[/math]; and for the expansion ratio, it is over all such vectors [math]\displaystyle{ x\in\{0,1\}^n }[/math] with at most [math]\displaystyle{ n/2 }[/math] many 1-entries).


Notations

Throughout the proof, we assume that [math]\displaystyle{ G(V,E) }[/math] is the [math]\displaystyle{ d }[/math]-regular graph of [math]\displaystyle{ n }[/math] vertices, [math]\displaystyle{ A }[/math] is the adjacency matrix, whose eigenvalues are [math]\displaystyle{ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n }[/math], and [math]\displaystyle{ L=(dI-A) }[/math] is the graph Laplacian.

Large spectral gap implies high expansion

Cheeger's inequality (lower bound)
[math]\displaystyle{ \phi(G)\ge\frac{d-\lambda_2}{2}. }[/math]
Proof.

Let [math]\displaystyle{ S^*\subset V }[/math], [math]\displaystyle{ |S^*|\le\frac{n}{2} }[/math], be the vertex set achieving the optimal expansion ratio [math]\displaystyle{ \phi(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}} \frac{|\partial S|}{|S|}=\frac{|\partial S^*|}{|S^*|} }[/math], and [math]\displaystyle{ x\in\mathbb{R}^n }[/math] be a vector defined as

[math]\displaystyle{ x_v=\begin{cases} 1/|S^*| & \text{if } v\in S^*,\\ -1/\left|\overline{S^*}\right| &\text{if }v\in\overline{S^*}. \end{cases} }[/math]

Clearly, [math]\displaystyle{ x\cdot \boldsymbol{1}=\sum_{v\in S^*}\frac{1}{|S^*|} -\sum_{v\in\overline{S^*}}\frac{1}{\left|\overline{S^*}\right|}=0 }[/math], thus [math]\displaystyle{ x\bot\boldsymbol{1} }[/math].

Due to the variational characterization of the second eigenvalue,

[math]\displaystyle{ \begin{align} d-\lambda_2 &\le\frac{\sum_{uv\in E}(x_u-x_v)^2}{\sum_{v\in V}x_v^2}\\ &=\frac{\sum_{u\in S^*,v\in\overline{S^*},uv\in E}\left(1/|S^*|+1/|\overline{S^*}|\right)^2}{1/|S^*|+1/|\overline{S^*}|}\\ &=\left(\frac{1}{|S^*|}+\frac{1}{\left|\overline{S^*}\right|}\right)\cdot|\partial S^*|\\ &\le \frac{2|\partial S^*|}{|S^*|} &(\text{since }|S^*|\le\frac{n}{2})\\ &=2\phi(G). \end{align} }[/math]
[math]\displaystyle{ \square }[/math]

High expansion implies large spectral gap

We next prove the upper bound direction of the Cheeger's inequality:

Cheeger's inequality (upper bound)
[math]\displaystyle{ \phi(G) \le \sqrt{2d(d-\lambda_2)}. }[/math]

This direction is harder than the lower bound direction. But it is mathematically more interesting and also more useful to us for analyzing the mixing time of random walks.

We prove the following equivalent inequality:

[math]\displaystyle{ \frac{\phi^2}{2d} \le d-\lambda_2. }[/math]

Let [math]\displaystyle{ x }[/math] satisfy that

  • [math]\displaystyle{ Ax=\lambda_2x }[/math], i.e., it is a eigenvector for [math]\displaystyle{ \lambda_2 }[/math];
  • [math]\displaystyle{ |\{v\in V\mid x_v\gt 0\}|\le\frac{n}{2} }[/math], i.e., [math]\displaystyle{ x }[/math] has at most [math]\displaystyle{ n/2 }[/math] positive entries. (We can always choose [math]\displaystyle{ x }[/math] to be [math]\displaystyle{ -x }[/math] if this is not satisfied.)

And let nonnegative vector [math]\displaystyle{ y }[/math] be defined as

[math]\displaystyle{ y_v=\begin{cases} x_v & x_v\gt 0,\\ 0 & \text{otherwise.} \end{cases} }[/math]

We then prove the following inequalities:

  1. [math]\displaystyle{ \frac{y^TLy}{y^Ty}\le d-\lambda_2 }[/math];
  2. [math]\displaystyle{ \frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty} }[/math].

The theorem is then a simple consequence by combining these two inequalities.

We prove the first inequality:

Lemma
[math]\displaystyle{ \frac{y^TLy}{y^Ty}\le d-\lambda_2 }[/math].
Proof.

If [math]\displaystyle{ x_u\ge 0 }[/math], then

[math]\displaystyle{ \begin{align} (Ly)_u &=((dI-A)y)_u =dy_u-\sum_{v}A(u,v)y_v =dx_u-\sum_{v:x_v\ge 0}A(u,v)x_v\\ &\le dx_u-\sum_{v}A(u,v)x_v =((dI-A)x)_u =(d-\lambda_2)x_u. \end{align} }[/math]

Then

[math]\displaystyle{ \begin{align} y^TLy &=\sum_{u}y_u(Ly)_u =\sum_{u:x_u\ge 0}y_u(Ly)_u =\sum_{u:x_u\ge 0}x_u(Ly)_u\\ &\le (d-\lambda_2)\sum_{u:x_u\ge 0}x_u^2 =(d-\lambda_2)\sum_{u}y_u^2 =(d-\lambda_2)y^Ty, \end{align} }[/math]

which proves the lemma.

[math]\displaystyle{ \square }[/math]

We then prove the second inequality:

Lemma
[math]\displaystyle{ \frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty} }[/math].
Proof.

To prove this, we introduce a new quantity [math]\displaystyle{ \frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty} }[/math] and shows that

[math]\displaystyle{ \phi\le\frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty}\le\sqrt{2d}\cdot\sqrt{\frac{y^TLy}{y^Ty}} }[/math].

This will give us the desired inequality [math]\displaystyle{ \frac{\phi^2}{2d}\le\frac{y^TLy}{y^Ty} }[/math].

Lemma
[math]\displaystyle{ \frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty}\le\sqrt{2d}\cdot\sqrt{\frac{y^TLy}{y^Ty}} }[/math].
Proof.

By the Cauchy-Schwarz Inequality,

[math]\displaystyle{ \begin{align} \sum_{uv\in E}|y_u^2-y_v^2| &=\sum_{uv\in E}|y_u-y_v||y_u+y_v|\\ &\le\sqrt{\sum_{uv\in E}(y_u-y_v)^2}\cdot\sqrt{\sum_{uv\in E}(y_u+y_v)^2}. \end{align} }[/math]

By the Laplacian property, the first term [math]\displaystyle{ \sqrt{\sum_{uv\in E}(y_u-y_v)^2}=\sqrt{y^TLy} }[/math]. By the Inequality of Arithmetic and Geometric Means, the second term

[math]\displaystyle{ \sqrt{\sum_{uv\in E}(y_u+y_v)^2} \le\sqrt{2\sum_{uv\in E}(y_u^2+y_v^2)} =\sqrt{2d\sum_{u\in V}y_u^2} =\sqrt{2dy^Ty}. }[/math]

Combining them together, we have

[math]\displaystyle{ \sum_{uv\in E}|y_u^2-y_v^2|\le\sqrt{2d}\cdot\sqrt{y^TLy}\cdot\sqrt{y^Ty} }[/math].
[math]\displaystyle{ \square }[/math]
Lemma
[math]\displaystyle{ \phi\le\frac{\sum_{uv\in E}|y_u^2-y_v^2|}{y^Ty} }[/math].
Proof.

Suppose that [math]\displaystyle{ y }[/math] has [math]\displaystyle{ t }[/math] nonzero entries. We know that [math]\displaystyle{ t\le n/2 }[/math] due to the definition of [math]\displaystyle{ y }[/math]. We enumerate the vertices [math]\displaystyle{ u_1,u_2,\ldots,u_n\in V }[/math] such that

[math]\displaystyle{ y_{u_1}\ge y_{u_2}\ge\cdots\ge y_{u_t}\gt y_{u_{t+1}}=\cdots=y_{u_n}=0 }[/math].

Then

[math]\displaystyle{ \begin{align} \sum_{uv\in E}|y_u^2-y_v^2| &=\sum_{u_iu_j\in E\atop i\lt j}(y_{u_i}^2-y_{u_j}^2) =\sum_{u_iu_j\in E\atop i\lt j}\sum_{k=i}^{j-1}(y_{u_k}^2-y_{u_{k+1}}^2)\\ &=\sum_{i=1}^n\sum_{j\gt i}A(u_i,u_j)\sum_{k=i}^{j-1}(y_{u_k}^2-y_{u_{k+1}}^2) =\sum_{i=1}^n\sum_{j\gt i}\sum_{k=i}^{j-1}A(u_i,u_j)(y_{u_k}^2-y_{u_{k+1}}^2). \end{align} }[/math]

We have the following universal equation for sums:

[math]\displaystyle{ \begin{align} \sum_{i=1}^n\sum_{j\gt i}\sum_{k=i}^{j-1}A(u_i,u_j)(y_{u_k}^2-y_{u_{k+1}}^2) &=\sum_{k=1}^n\sum_{i\le k}\sum_{j\gt k}A(u_i,u_j)(y_{u_k}^2-y_{u_{k+1}}^2)\\ &=\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)\sum_{i\le k}\sum_{j\gt k}A(u_i,u_j) \end{align} }[/math]

Notice that [math]\displaystyle{ \sum_{i\le k}\sum_{j\gt k}A(u_i,u_j)=|\partial\{u_1,\ldots, u_k\}| }[/math], which is at most [math]\displaystyle{ \phi k }[/math] since [math]\displaystyle{ k\le t\le n/2 }[/math]. Therefore, combining these together, we have

[math]\displaystyle{ \begin{align} \sum_{uv\in E}|y_u^2-y_v^2| &=\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)\sum_{i\le k}\sum_{j\gt k}A(u_i,u_j)\\ &=\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)|\partial\{u_1,\ldots, u_k\}|\\ &\le \phi\sum_{k=1}^t(y_{u_k}^2-y_{u_{k+1}}^2)k\\ &=\phi\sum_{k=1}^ty_{u_k}^2\\ &=\phi y^Ty. \end{align} }[/math]
[math]\displaystyle{ \square }[/math]
[math]\displaystyle{ \square }[/math]