随机算法 (Fall 2011)/Coupling
=Coupling of Two Distributions=
Coupling is a powerful proof technique in probability theory. It allows us to compare two unrelated variables (or processes) by forcing them to share some randomness.
{{Theorem|Definition (coupling)|
:Let <math>p</math> and <math>q</math> be two probability distributions over <math>\Omega</math>. A probability distribution <math>\mu</math> over <math>\Omega\times\Omega</math> is said to be a '''coupling''' if its marginal distributions are <math>p</math> and <math>q</math>; that is
::<math>
\begin{align}
p(x) &=\sum_{y\in\Omega}\mu(x,y);\\
q(x) &=\sum_{y\in\Omega}\mu(y,x).
\end{align}
</math>
}}


Given a coupling <math>\mu</math> of <math>p</math> and <math>q</math>, let <math>(X,Y)</math> be a pair of random variables jointly distributed according to <math>\mu</math>; then <math>X</math> follows <math>p</math> and <math>Y</math> follows <math>q</math>. One trivial coupling is the product distribution <math>\mu(x,y)=p(x)q(y)</math>, under which <math>X</math> and <math>Y</math> are independent. The interesting case is when <math>X</math> and <math>Y</math> are not independent. In particular, we want the probability that <math>X=Y</math> to be as large as possible while still maintaining the respective marginal distributions <math>p</math> and <math>q</math>. When <math>p</math> and <math>q</math> are different, it is inevitable that <math>X\neq Y</math> with some probability. But how small can this probability be? That is, how well can two distributions be coupled? The following coupling lemma states that this is determined by the total variation distance between the two distributions.


{{Theorem|Lemma (coupling lemma)|
:Let <math>p</math> and <math>q</math> be two probability distributions over <math>\Omega</math>.
# For any coupling <math>(X,Y)</math> of <math>p</math> and <math>q</math>, it holds that <math>\Pr[X\neq Y]\ge\|p-q\|_{TV}</math>.
# There exists a coupling <math>(X,Y)</math> of <math>p</math> and <math>q</math> such that <math>\Pr[X\neq Y]=\|p-q\|_{TV}</math>.
}}
{{Proof|
;1.
Suppose <math>(X,Y)</math> follows the distribution <math>\mu</math> over <math>\Omega\times\Omega</math>. Due to the definition of coupling,
:<math>
\begin{align}
p(x) =\sum_{y\in\Omega}\mu(x,y) &\quad\text{and } &q(x) =\sum_{y\in\Omega}\mu(y,x).
\end{align}
</math>
Then for any <math>z\in\Omega</math>, <math>\mu(z,z)\le\min\{p(z),q(z)\}</math>, thus
:<math>
\begin{align}
\Pr[X=Y]
&=
\sum_{z\in\Omega}\mu(z,z)
\le\sum_{z\in\Omega}\min\{p(z),q(z)\}.
\end{align}
</math>
Therefore,
:<math>
\begin{align}
\Pr[X\neq Y]
&\ge
1-\sum_{z\in\Omega}\min\{p(z),q(z)\}\\
&=\sum_{z\in\Omega}(p(z)-\min\{p(z),q(z)\})\\
&=\sum_{z\in\Omega\atop p(z)>q(z)}(p(z)-q(z))\\
&=\frac{1}{2}\sum_{z\in\Omega}|p(z)-q(z)|\\
&=\|p-q\|_{TV}.
\end{align}
</math>
;2.
We can couple <math>X</math> and <math>Y</math> so that for each <math>z\in\Omega</math>, <math>X=Y=z</math> with probability <math>\min\{p(z),q(z)\}</math>, and distribute the remaining probability mass so that the marginals are still <math>p</math> and <math>q</math> (this is always possible because the leftovers <math>p(\cdot)-\min\{p(\cdot),q(\cdot)\}</math> and <math>q(\cdot)-\min\{p(\cdot),q(\cdot)\}</math> have equal total mass). Then <math>\Pr[X=Y]=\sum_{z\in\Omega}\min\{p(z),q(z)\}</math>, and the calculation above shows that <math>\Pr[X\neq Y]=\|p-q\|_{TV}</math>.
}}
The proof is depicted by the following figure, where the curves are the probability density functions of the two distributions <math>p</math> and <math>q</math>, the colored area is the difference between the two distributions, and the "lower envelope" (the red line) is <math>\min\{p(x), q(x)\}</math>.
:[[image:Coupling.png|350px]]
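The optimal coupling from part 2 of the proof is straightforward to implement. The following is a minimal sketch in Python (the distributions <code>p</code> and <code>q</code> are hypothetical examples): with probability <math>\sum_{z}\min\{p(z),q(z)\}</math> we draw <math>X=Y</math> from the normalized lower envelope, and otherwise we draw <math>X</math> and <math>Y</math> from the normalized leftovers of <math>p</math> and <math>q</math>; the leftovers have disjoint supports, so <math>X\neq Y</math> in that branch.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Two example distributions over a 5-element state space (hypothetical data).
p = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
q = np.array([0.10, 0.15, 0.20, 0.25, 0.30])

tv = 0.5 * np.abs(p - q).sum()          # total variation distance

def sample_optimal_coupling(p, q, rng):
    """Sample (X, Y) from the coupling in the proof of the coupling lemma:
    with probability sum_z min{p(z), q(z)}, set X = Y = z drawn from the
    normalized lower envelope; otherwise draw X and Y from the normalized
    leftovers of p and q, which have disjoint supports."""
    m = np.minimum(p, q)                 # lower envelope min{p(z), q(z)}
    overlap = m.sum()                    # = 1 - ||p - q||_TV
    if rng.random() < overlap:
        z = rng.choice(len(p), p=m / overlap)
        return z, z
    x = rng.choice(len(p), p=(p - m) / (1 - overlap))
    y = rng.choice(len(q), p=(q - m) / (1 - overlap))
    return x, y

samples = [sample_optimal_coupling(p, q, rng) for _ in range(200000)]
mismatch = np.mean([x != y for x, y in samples])
print(f"||p-q||_TV = {tv:.4f}, empirical Pr[X != Y] = {mismatch:.4f}")
</syntaxhighlight>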
===Monotonicity of <math>\Delta_x(t)</math>===
Consider a Markov chain on state space <math>\Omega</math>. Let <math>\pi</math> be the stationary distribution, and <math>p_x^{(t)}</math> be the distribution after <math>t</math> steps when the initial state is <math>x</math>. Recall that <math>\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}</math> is the distance to stationary distribution <math>\pi</math> after <math>t</math> steps, started at state <math>x</math>.


We will show that <math>\Delta_x(t)</math> is non-increasing in <math>t</math>. That is, for a Markov chain, the variation distance to the stationary distribution monotonically decreases as time passes. Although this is rather intuitive at first glance, the proof is nontrivial. A brute force (algebraic) proof can be obtained by analyzing the change to the 1-norm of <math>p_x^{(t)}-\pi</math> caused by multiplying by the transition matrix <math>P</math>; the analysis uses eigendecomposition. We introduce a cleverer proof using coupling.


{{Theorem|Proposition|
:<math>\Delta_x(t)</math> is non-increasing in <math>t</math>.
}}
{{Proof|
Let <math>X_0=x</math> and let <math>Y_0</math> follow the stationary distribution <math>\pi</math>. Run two chains <math>X_0,X_1,X_2,\ldots</math> and <math>Y_0,Y_1,Y_2,\ldots</math> from these initial conditions. The distributions of <math>X_t</math> and <math>Y_t</math> are <math>p_x^{(t)}</math> and <math>\pi</math>, respectively.


For any fixed <math>t</math>, we can couple <math>X_t</math> and <math>Y_t</math> such that <math>\Pr[X_t\neq Y_t]=\|p_x^{(t)}-\pi\|_{TV}=\Delta_x(t)</math>; due to the Coupling Lemma, such a coupling exists. We then define a coupling between <math>X_{t+1}</math> and <math>Y_{t+1}</math> as follows.
* If <math>X_t=Y_t</math>, then <math>X_{t+1}=Y_{t+1}</math>, following the transition matrix of the Markov chain.
* Otherwise, do the transitions <math>X_t\rightarrow X_{t+1}</math> and <math>Y_t\rightarrow Y_{t+1}</math> independently, following the transition matrix.
It is easy to see that the marginal distributions of <math>X_{t+1}</math> and <math>Y_{t+1}</math> both follow the original Markov chain, and
:<math>
\Pr[X_{t+1}\neq Y_{t+1}]\le \Pr[X_t\neq Y_{t}].
</math>
In conclusion, it holds that
:<math>
\Delta_x(t+1)=\|p_x^{(t+1)}-\pi\|_{TV}\le\Pr[X_{t+1}\neq Y_{t+1}]\le \Pr[X_t\neq Y_{t}]=\Delta_x(t),
</math>
where the first inequality is due to the easy direction of the Coupling Lemma.
}}
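The proposition can be checked numerically. The following sketch (assuming a small, hypothetical 3-state transition matrix <code>P</code>) computes <math>\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}</math> from the rows of <math>P^t</math> and prints one value per starting state <math>x</math>; every column of the output is non-increasing in <math>t</math>.

<syntaxhighlight lang="python">
import numpy as np

# A small ergodic chain (hypothetical 3-state transition matrix).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Delta_x(t) = ||p_x^{(t)} - pi||_TV, where p_x^{(t)} is row x of P^t.
p_t = np.eye(3)                  # p_x^{(0)} is the point mass at x
for t in range(10):
    delta = 0.5 * np.abs(p_t - pi).sum(axis=1)   # one value per start state x
    print(t, np.round(delta, 4))                 # each column never increases
    p_t = p_t @ P
</syntaxhighlight>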


=Rapid Mixing by Coupling=
Consider an ergodic (irreducible and aperiodic) Markov chain on state space <math>\Omega</math>. We want to upper bound its mixing time. The coupling technique for bounding the mixing time can be summarized as follows:
:Consider two random walks starting at state <math>x</math> and <math>y</math> in <math>\Omega</math>. Each individual walk follows the transition rule of the original Markov chain. But the two random walks may be "''coupled''" in some way, that is, may be correlated with each other. A key observation is that for any such coupling, it always holds that
::<math>\Delta(t)
\le\max_{x,y\in\Omega}\Pr[\text{ the two } \mathit{coupled} \text{ random walks started at }x,y\text{ have not met by time }t]
</math>
:where we recall that <math>\Delta(t)=\max_{x\in\Omega}\|p_x^{(t)}-\pi\|_{TV}</math>.


{{Theorem|Definition (coupling of Markov chain)|
:A '''coupling''' of a Markov chain with state space <math>\Omega</math> and transition matrix <math>P</math> is a Markov chain <math>(X_t,Y_t)</math> with state space <math>\Omega\times\Omega</math>, satisfying
# each of <math>X_t</math> and <math>Y_t</math> viewed in isolation is a faithful copy of the original Markov chain, i.e.
#:<math>\Pr[X_{t+1}=v\mid X_t=u]=\Pr[Y_{t+1}=v\mid Y_{t}=u]=P(u,v)</math>;
# once <math>X_t</math> and <math>Y_t</math> reach the same state, they make identical moves from then on, i.e.
#:<math>X_{t+1}=Y_{t+1}</math> if <math>X_t=Y_t</math>.
}}


{{Theorem|Lemma (coupling lemma for Markov chains)|
:Let <math>(X_t,Y_t)</math> be a coupling of a Markov chain <math>M</math> with state space <math>\Omega</math>. Then <math>\Delta(t)</math> for <math>M</math> is bounded by
::<math>\Delta(t)
\le\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]</math>.
}}
{{Proof|
Due to the coupling lemma (for probability distributions),
:<math>
\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]
\ge
\|p_x^{(t)}-p_y^{(t)}\|_{TV},
</math>
where <math>p_x^{(t)},p_y^{(t)}</math> are distributions of the Markov chain <math>M</math> at time <math>t</math>, started at states <math>x, y</math>.


Therefore,
:<math>
\begin{align}
\Delta(t)
&=
\max_{x\in\Omega}\|p_x^{(t)}-\pi\|_{TV}\\
&\le
\max_{x,y\in\Omega}\|p_x^{(t)}-p_y^{(t)}\|_{TV}\\
&\le
\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y].
\end{align}
</math>
where the first inequality holds because <math>\pi=\sum_{y\in\Omega}\pi(y)p_y^{(t)}</math> is a convex combination of the distributions <math>p_y^{(t)}</math> and the total variation distance is convex, so that <math>\|p_x^{(t)}-\pi\|_{TV}\le\max_{y\in\Omega}\|p_x^{(t)}-p_y^{(t)}\|_{TV}</math>.
}}




{{Theorem|Corollary|
:Let <math>(X_t,Y_t)</math> be a coupling of a Markov chain <math>M</math> with state space <math>\Omega</math>. Then <math>\tau(\epsilon)</math> for <math>M</math> is bounded as follows:
::<math>\max_{x,y\in\Omega}\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]\le \epsilon\quad\Longrightarrow\quad\tau(\epsilon)\le t</math>.
}}
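As a concrete illustration of how the corollary is applied, the following sketch simulates one classic coupling of the lazy random walk on the cycle <math>\mathbb{Z}_n</math> (an example chain not from these notes): before the two copies meet, a fair coin decides which of them takes a <math>\pm 1</math> step while the other holds, so each copy viewed in isolation is the lazy walk; after they meet they move together. The empirical frequency of <math>X_t\neq Y_t</math> then bounds <math>\Delta(t)</math> for the chosen (here antipodal) pair of starting states.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def coupled_lazy_cycle_walk(n, x0, y0, t_max, rng):
    """Run one classic coupling of two lazy random walks on the cycle Z_n
    for t_max steps and report whether the copies still disagree.
    Before meeting, a fair coin picks which copy takes a +/-1 step while
    the other holds; viewed alone, each copy holds w.p. 1/2 and otherwise
    moves to a uniform neighbor, i.e. it is a faithful lazy walk.
    After meeting, the copies make identical moves."""
    x, y = x0, y0
    for _ in range(t_max):
        step = rng.choice([-1, 1])       # uniform +/-1 step
        if x == y:
            if rng.random() < 0.5:       # identical lazy moves after meeting
                x = (x + step) % n
                y = x
        else:
            if rng.random() < 0.5:       # coin picks which copy moves
                x = (x + step) % n
            else:
                y = (y + step) % n
    return x != y

n, t, runs = 20, 400, 2000
disagree = np.mean([coupled_lazy_cycle_walk(n, 0, n // 2, t, rng)
                    for _ in range(runs)])
print(f"empirical Pr[X_{t} != Y_{t}] ~ {disagree:.3f}")
</syntaxhighlight>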


=== Geometric convergence ===
Consider a Markov chain on state space <math>\Omega</math>. Let <math>\pi</math> be the stationary distribution, and <math>p_x^{(t)}</math> be the distribution after <math>t</math> steps when the initial state is <math>x</math>. Recall that
*<math>\Delta_x(t)=\|p_x^{(t)}-\pi\|_{TV}</math>;
*<math>\Delta(t)=\max_{x\in\Omega}\Delta_x(t)</math>;
*<math>\tau_x(\epsilon)=\min\{t\mid\Delta_x(t)\le\epsilon\}</math>;
* <math>\tau(\epsilon)=\max_{x\in\Omega}\tau_x(\epsilon)</math>;
*<math>\tau_{\mathrm{mix}}=\tau(1/2\mathrm{e})\,</math>.
 
We prove that
{{Theorem|Proposition|
# <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le \mathrm{e}^{-k}</math> for any integer <math>k\ge1</math>.
#<math>\tau(\epsilon)\le\tau_{\mathrm{mix}}\cdot\left\lceil\ln\frac{1}{\epsilon}\right\rceil</math>.
}}
{{Proof|
;1.
Denote <math>\tau=\tau_{\mathrm{mix}}\,</math>. We construct a coupling of the Markov chain as follows. Suppose <math>t=k\tau</math> for some integer <math>k</math>.
* If <math>X_t=Y_t</math> then <math>X_{t+i}=Y_{t+i}</math> for <math>i=1,2,\ldots,\tau</math>.
* For the case that <math>X_t=u, Y_t=v</math> for some <math>u\neq v</math>, we couple <math>X_{t+\tau}</math> and <math>Y_{t+\tau}</math> so that <math>\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t=u,Y_t=v]=\|p_u^{(\tau)}-p_v^{(\tau)}\|_{TV}</math>. Due to the Coupling Lemma, such a coupling exists.
 
For any <math>u,v\in\Omega</math> such that <math>u\neq v</math>,
:<math>\begin{align}
\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t=u,Y_t=v]
&=\|p_u^{(\tau)}-p_v^{(\tau)}\|_{TV}\\
&\le \|p_u^{(\tau)}-\pi\|_{TV}+\|p_v^{(\tau)}-\pi\|_{TV}\\
&= \Delta_u(\tau)+\Delta_v(\tau)\\
&\le \frac{1}{\mathrm{e}}.
\end{align}
</math>
Thus <math>\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t\neq Y_t]\le \frac{1}{\mathrm{e}}</math> by the law of total probability. Therefore, for any <math>x,y\in\Omega</math>,
:<math>
\begin{align}
&\quad\, \Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_0=x,Y_0=y]\\
&=
\Pr[X_{t+\tau}\neq Y_{t+\tau}\mid X_t\neq Y_t]\cdot \Pr[X_{t}\neq Y_{t}\mid X_0=x,Y_0=y]\\
&\le \frac{1}{\mathrm{e}}\Pr[X_{t}\neq Y_{t}\mid X_0=x,Y_0=y].
\end{align}
</math>
Then by induction,
:<math>
\Pr[X_{k\tau}\neq Y_{k\tau}\mid X_0=x,Y_0=y]\le \mathrm{e}^{-k},
</math>
and this holds for any <math>x,y\in\Omega</math>. By the Coupling Lemma for Markov chains, this means <math>\Delta(k\tau_{\mathrm{mix}})=\Delta(k\tau)\le\mathrm{e}^{-k}</math>.
 
;2.
Take <math>k=\left\lceil\ln\frac{1}{\epsilon}\right\rceil</math>. By (1), <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le\mathrm{e}^{-k}\le\epsilon</math>, so by the definition of <math>\tau(\epsilon)</math>, <math>\tau(\epsilon)\le k\cdot\tau_{\mathrm{mix}}</math>.
}}
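The geometric decay can also be observed directly. The sketch below (reusing the same hypothetical 3-state chain as in the earlier sketch) computes <math>\Delta(t)</math> exactly from <math>P^t</math>, finds <math>\tau_{\mathrm{mix}}=\tau(1/2\mathrm{e})</math>, and verifies <math>\Delta(k\cdot\tau_{\mathrm{mix}})\le\mathrm{e}^{-k}</math> for small <math>k</math>.

<syntaxhighlight lang="python">
import numpy as np

# The same hypothetical 3-state ergodic chain as in the earlier sketch.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
w, v = np.linalg.eig(P.T)                     # stationary distribution pi:
pi = np.real(v[:, np.argmax(np.real(w))])     # left eigenvector for
pi /= pi.sum()                                # eigenvalue 1, normalized

def Delta(t):
    """Delta(t) = max_x ||p_x^{(t)} - pi||_TV, where p_x^{(t)} is row x of P^t."""
    Pt = np.linalg.matrix_power(P, t)
    return 0.5 * np.abs(Pt - pi).sum(axis=1).max()

# tau_mix = tau(1/2e): the first t with Delta(t) <= 1/(2e).
tau_mix = next(t for t in range(1, 10**6) if Delta(t) <= 1 / (2 * np.e))
for k in range(1, 6):
    print(k, Delta(k * tau_mix) <= np.exp(-k))   # always True, by part 1
</syntaxhighlight>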
 
=Random Walk on the Hypercube=
An <math>n</math>-dimensional [http://en.wikipedia.org/wiki/Hypercube '''hypercube'''] is a graph on vertex set <math>\{0,1\}^n</math>. For any <math>x,y\in\{0,1\}^n</math>, there is an edge between <math>x</math> and <math>y</math> if and only if <math>x</math> and <math>y</math> differ in exactly one coordinate (that is, the Hamming distance between <math>x</math> and <math>y</math> is 1).
 
We use coupling to bound the mixing time of the following simple random walk on a hypercube.
{{Theorem|Random Walk on Hypercube|
: At each step, for the current state <math>x\in\{0,1\}^n</math>:
# with probability <math>1/2</math>, do nothing;
# else, pick a coordinate <math>i\in\{1,2,\ldots,n\}</math> uniformly at random and flip the bit <math>x_i</math> (change <math>x_i</math> to <math>1-x_i</math>).
}}
 
This is a lazy uniform random walk in an <math>n</math>-dimensional hypercube. It is equivalent to the following random walk.
 
{{Theorem|Random Walk on Hypercube|
: At each step, for the current state <math>x\in\{0,1\}^n</math>:
# pick a coordinate <math>i\in\{1,2,\ldots,n\}</math> uniformly at random and a bit <math>b\in\{0,1\}</math> uniformly at random;
# let <math>x_i=b</math>.
}}


It is easy to see that the two random walks are the same. The second form hints that we should couple the random choices of <math>i</math> and <math>b</math> in each step. Consider two copies of the random walk <math>X_t</math> and <math>Y_t</math>, started from two arbitrary states <math>X_0</math> and <math>Y_0</math>. The coupling rule is:
* At each step, <math>X_t</math> and <math>Y_t</math> choose the same (uniformly random) coordinate <math>i</math> and the same (uniformly random) bit <math>b</math>.
Obviously the two individual random walks are both faithful copies of the original walk.


For arbitrary <math>x,y\in\{0,1\}^n</math>, started at <math>X_0=x, Y_0=y</math>, the event <math>X_t=Y_t</math> occurs once every one of the coordinates on which <math>x</math> and <math>y</math> disagree has been picked at least once. In the worst case, <math>x</math> and <math>y</math> disagree on all <math>n</math> coordinates. The time <math>T</math> by which all <math>n</math> coordinates have been picked at least once is exactly the completion time of the coupon collector problem with <math>n</math> coupons, and we know from the coupon collector problem that for <math>c>0</math>,
:<math>
\Pr[T\ge n\ln n+cn]\le \mathrm{e}^{-c}.
</math>
Thus for any <math>x,y\in\{0,1\}^n</math>, if <math>t\ge n\ln n+cn</math>, then <math>\Pr[X_t\neq Y_t\mid X_0=x,Y_0=y]\le \mathrm{e}^{-c}</math>. Due to the coupling lemma for Markov chains,
:<math>\Delta(n\ln n+cn)\le \mathrm{e}^{-c},</math>
which implies that
:<math>\tau(\epsilon)\le n\ln n+n\ln\frac{1}{\epsilon}</math>.
So the random walk achieves a polynomially small variation distance <math>\epsilon=\frac{1}{\mathrm{poly}(n)}</math> to the stationary distribution in time <math>O(n\ln n)</math>.
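The coupon-collector tail bound can be checked by simulating the coupled walks directly. In the sketch below (a simulation, with parameters chosen for illustration), the two copies start at the antipodal pair <math>0^n</math> and <math>1^n</math>; since both use the same coordinate <math>i</math> and the same bit <math>b</math>, they meet exactly when every coordinate has been picked at least once, so only the set of picked coordinates needs to be tracked.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

def meeting_time(n, rng, t_max):
    """Coupled lazy walks on the hypercube {0,1}^n started at 0^n and 1^n.
    Both copies set the same coordinate i to the same bit b, so they meet
    exactly when every coordinate has been picked at least once; we only
    track which coordinates have been picked."""
    picked = np.zeros(n, dtype=bool)
    for t in range(1, t_max + 1):
        picked[rng.integers(n)] = True    # shared coordinate i (the bit b is
        if picked.all():                  # irrelevant to the meeting time)
            return t
    return t_max + 1                      # not met within t_max steps

n, c, runs = 50, 1.0, 2000
threshold = n * np.log(n) + c * n
times = np.array([meeting_time(n, rng, t_max=int(3 * threshold))
                  for _ in range(runs)])
tail = (times >= threshold).mean()
print(f"Pr[T >= n ln n + cn] ~ {tail:.3f} <= e^{{-c}} = {np.exp(-c):.3f}")
</syntaxhighlight>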


Note that the number of states (vertices of the <math>n</math>-dimensional hypercube) is <math>2^n</math>. Therefore, the mixing time <math>O(n\ln n)</math> is exponentially small compared to the size of the state space.