高级算法 (Fall 2018)/Problem Set 2 and 高级算法 (Fall 2019): Difference between pages

= 高级算法 (Fall 2018)/Problem Set 2 =
Every problem must be answered with a <font color="red">complete solution process</font>. You may write in either Chinese or English.

== Problem 1 ==
Starting with a graph on <math>n</math> vertices and no edges, we consider the following process for building a random undirected simple graph. At each step, we choose an edge uniformly at random from the set of all remaining unchosen edges and add it to the graph. That is, in the first step we choose an edge from the set of <math>{n \choose 2}</math> edges; in the second step we choose from the remaining <math>{n \choose 2} - 1</math> edges; and so on.

Let <math>X</math> denote the number of edges added until the graph becomes connected. Give an upper bound for <math>\mathbf{E}[X]</math>.
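
A quick way to build intuition (though not a proof) is to simulate the process and estimate <math>\mathbf{E}[X]</math> empirically. The following Python sketch is only an illustration; the function name and the use of a union-find structure to detect connectivity are our own choices, not part of the problem.

<pre>
import random

def edges_until_connected(n):
    """Add uniformly random unchosen edges until the graph is connected;
    return the number of edges added."""
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    random.shuffle(edges)  # a uniformly random order of all possible edges
    components, added = n, 0
    for u, v in edges:
        added += 1
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
            if components == 1:  # the graph has just become connected
                return added
    return added

# rough empirical estimate of E[X] for one value of n
n, trials = 200, 100
print(sum(edges_until_connected(n) for _ in range(trials)) / trials)
</pre>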

== Problem 2 ==
In the balls-and-bins model, we throw <math>m</math> balls independently and uniformly at random into <math>n</math> bins. We know that the maximum load is <math>\Theta\left(\frac{\log n}{\log\log n}\right)</math> with high probability when <math>m=\Theta(n)</math>.

The two-choice paradigm is another way to throw <math>m</math> balls into <math>n</math> bins: each ball is thrown into the least loaded of two bins chosen independently and uniformly at random (the two chosen bins may coincide, in which case the ball is simply thrown into that bin), with ties broken arbitrarily. When <math>m=\Theta(n)</math>, the maximum load of the two-choice paradigm is known to be <math>\Theta(\log\log n)</math> with high probability, which is exponentially smaller than the maximum load when there is only one random choice. This phenomenon is called '''''the power of two choices'''''. (A simulation sketch of these throwing schemes is given after the questions below.)

Here are the questions:
* Consider the following paradigm: we throw <math>n</math> balls into <math>n</math> bins. The first <math>\frac{n}{2}</math> balls are thrown into bins independently and uniformly at random. The remaining <math>\frac{n}{2}</math> balls are thrown into bins using the two-choice paradigm. What is the maximum load with high probability? You need to give an asymptotically tight bound (in the form of <math>\Theta(\cdot)</math>).

* Replace the above paradigm with the following: the first <math>\frac{n}{2}</math> balls are thrown into bins using the two-choice paradigm, while the remaining <math>\frac{n}{2}</math> balls are thrown into bins independently and uniformly at random. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.

* Replace the above paradigm with the following: assume all <math>n</math> balls are thrown in a sequence. For every <math>1\le i\le n</math>, if <math>i</math> is odd, we throw the <math>i</math>-th ball into bins independently and uniformly at random; otherwise, we throw it into bins using the two-choice paradigm. What is the maximum load with high probability in this case? You need to give an asymptotically tight bound.
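
The asymptotic bounds above can also be explored experimentally. The following Python sketch is only an illustration (the function <code>max_load</code> and its <code>rule</code> parameter are our own naming, not part of the problem): it throws <math>n</math> balls into <math>n</math> bins, deciding for each ball whether to use one uniform choice or the two-choice rule, and reports the maximum load.

<pre>
import random

def max_load(n, rule):
    """Throw n balls into n bins; rule(i) is 'one' or 'two' and decides
    how the i-th ball (1-indexed) is placed. Returns the maximum load."""
    load = [0] * n
    for i in range(1, n + 1):
        if rule(i) == 'one':        # one bin chosen uniformly at random
            b = random.randrange(n)
        else:                       # two-choice: least loaded of two random bins
            b1, b2 = random.randrange(n), random.randrange(n)
            b = b1 if load[b1] <= load[b2] else b2
        load[b] += 1
    return max(load)

n = 100000
print(max_load(n, lambda i: 'one'))                            # one choice only
print(max_load(n, lambda i: 'two'))                            # two choices only
print(max_load(n, lambda i: 'one' if i <= n // 2 else 'two'))  # first question
print(max_load(n, lambda i: 'two' if i <= n // 2 else 'one'))  # second question
print(max_load(n, lambda i: 'one' if i % 2 == 1 else 'two'))   # third question
</pre>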

== Problem 3 ==
Let <math>X</math> be a real-valued random variable with finite <math>\mathbb{E}[X]</math> and finite <math>\mathbb{E}\left[\mathrm{e}^{\lambda X}\right]</math> for all <math>\lambda\ge 0</math>. We define the '''log-moment-generating function''' as
:<math>\Psi_X(\lambda):=\ln\mathbb{E}[\mathrm{e}^{\lambda X}] \quad\text{ for all }\lambda\ge 0</math>,
and its ''dual function'':
:<math>\Psi_X^*(t):=\sup_{\lambda\ge 0}(\lambda t-\Psi_X(\lambda))</math>.
Assume that <math>X</math> is NOT almost surely constant. Then, due to the convexity of <math>\mathrm{e}^{\lambda X}</math> with respect to <math>\lambda</math>, the function <math>\Psi_X(\lambda)</math> is ''strictly'' convex over <math>\lambda\ge 0</math>.
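
Before attacking the questions, the definitions can be made concrete numerically. The following Python sketch is only an illustration (the example distribution, a fair six-sided die, and the crude grid search over <math>\lambda</math> are our own choices, not part of the problem): it evaluates <math>\Psi_X(\lambda)</math> exactly for a small discrete random variable, approximates <math>\Psi_X^*(t)</math>, and compares <math>\exp(-\Psi_X^*(t))</math>, the quantity appearing in the bound below, with the exact tail probability <math>\Pr[X\ge t]</math>.

<pre>
import math

# a fair six-sided die: values 1..6, each with probability 1/6
values = [1, 2, 3, 4, 5, 6]
probs  = [1.0 / 6] * 6

def psi(lam):
    """Log-moment-generating function: Psi_X(lambda) = ln E[exp(lambda*X)]."""
    return math.log(sum(p * math.exp(lam * v) for v, p in zip(values, probs)))

def psi_star(t, lam_max=20.0, steps=20000):
    """Crude grid-search approximation of sup_{lambda>=0} (lambda*t - Psi_X(lambda))."""
    return max(lam * t - psi(lam)
               for lam in (i * lam_max / steps for i in range(steps + 1)))

t = 5
exact_tail = sum(p for v, p in zip(values, probs) if v >= t)
print("exp(-Psi*_X(t)) =", math.exp(-psi_star(t)))
print("exact Pr[X>=t]  =", exact_tail)
</pre>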
* Prove the following Chernoff bound:
::<math>\Pr[X\ge t]\le\exp(-\Psi_X^*(t))</math>.
:In particular, if <math>\Psi_X(\lambda)</math> is continuously differentiable, prove that the supremum in <math>\Psi_X^*(t)</math> is achieved at the unique <math>\lambda\ge 0</math> satisfying
::<math>\Psi_X'(\lambda)=t</math>,
:where <math>\Psi_X'(\lambda)</math> denotes the derivative of <math>\Psi_X(\lambda)</math> with respect to <math>\lambda</math>.

* '''Normal random variables.''' Let <math>X\sim \mathrm{N}(\mu,\sigma)</math> be a Gaussian random variable with mean <math>\mu</math> and standard deviation <math>\sigma</math>. What are <math>\Psi_X(\lambda)</math> and <math>\Psi_X^*(t)</math>? Give a tail inequality to upper bound the probability <math>\Pr[X\ge t]</math>.

* '''Poisson random variables.''' Let <math>X\sim \mathrm{Pois}(\nu)</math> be a Poisson random variable with parameter <math>\nu</math>, that is, <math>\Pr[X=k]=\mathrm{e}^{-\nu}\nu^k/k!</math> for all <math>k=0,1,2,\ldots</math>. What are <math>\Psi_X(\lambda)</math> and <math>\Psi_X^*(t)</math>? Give a tail inequality to upper bound the probability <math>\Pr[X\ge t]</math>.

* '''Bernoulli random variables.''' Let <math>X\in\{0,1\}</math> be a single Bernoulli trial with probability of success <math>p</math>, that is, <math>\Pr[X=1]=1-\Pr[X=0]=p</math>. Show that for any <math>t\in(p,1)</math>, we have <math>\Psi_X^*(t)=D(Y \| X)</math>, where <math>Y\in\{0,1\}</math> is a Bernoulli random variable with parameter <math>t</math> and <math>D(Y \| X)=(1-t)\ln\frac{1-t}{1-p}+t\ln\frac{t}{p}</math> is the [https://en.wikipedia.org/wiki/Kullback–Leibler_divergence '''Kullback-Leibler divergence'''] between <math>Y</math> and <math>X</math>.

* '''Sum of independent random variables.''' Let <math>X=\sum_{i=1}^nX_i</math> be the sum of <math>n</math> independent and identically distributed random variables <math>X_1,X_2,\ldots, X_n</math>. Show that <math>\Psi_X(\lambda)=\sum_{i=1}^n\Psi_{X_i}(\lambda)</math> and <math>\Psi_X^*(t)=n\Psi^*_{X_i}\left(\frac{t}{n}\right)</math>. Also, for a binomial random variable <math>X\sim \mathrm{Bin}(n,p)</math>, give an upper bound on the tail probability <math>\Pr[X\ge t]</math> in terms of the KL divergence.
:Give an upper bound on <math>\Pr[X\ge t]</math> when every <math>X_i</math> follows the geometric distribution with success probability <math>p</math>.

== Problem 4 ==
A '''boolean code''' is a mapping <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math>. Each <math>x\in\{0,1\}^k</math> is called a '''message''' and <math>y=C(x)</math> is called a '''codeword'''. The '''code rate''' <math>r</math> of a code <math>C</math> is <math>r=\frac{k}{n}</math>. A boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> is a '''linear code''' if it is a linear transformation, i.e., there is a matrix <math>A\in\{0,1\}^{n\times k}</math> such that <math>C(x)=Ax</math> for any <math>x\in\{0,1\}^k</math>, where the additions and multiplications are defined over the finite field of order two, <math>(\{0,1\},+_{\bmod 2},\times_{\bmod 2})</math>.

The '''distance''' between two codewords <math>y_1</math> and <math>y_2</math>, denoted by <math>d(y_1,y_2)</math>, is defined as the Hamming distance between them. Formally, <math>d(y_1,y_2)=\|y_1-y_2\|_1=\sum_{i=1}^n|y_1(i)-y_2(i)|</math>. The distance of a code <math>C</math> is the minimum distance between any two distinct codewords. Formally, <math>d=\min_{x_1,x_2\in \{0,1\}^k\atop x_1\neq x_2}d(C(x_1),C(x_2))</math>.

Usually we want to make both the code rate <math>r</math> and the code distance <math>d</math> as large as possible, because a larger rate means that the amount of actual message per transmitted bit is higher, and a larger distance allows for more error correction and detection.
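
For small parameters these definitions can be checked directly by brute force. The following Python sketch is only an illustration (the example generator matrix is arbitrary and our own choice, not part of the problem): it enumerates all messages of a small linear code, encodes them, and computes the code rate and the code distance.

<pre>
from itertools import product

# an arbitrary generator matrix A over GF(2), giving a linear code with n = 6, k = 3
A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]

def encode(x):
    """Codeword C(x) = Ax, with additions and multiplications modulo 2."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % 2 for row in A)

def hamming(y1, y2):
    """Hamming distance between two codewords."""
    return sum(b1 != b2 for b1, b2 in zip(y1, y2))

codewords = [encode(x) for x in product([0, 1], repeat=3)]
# code distance: minimum Hamming distance over all pairs of distinct codewords
d = min(hamming(c1, c2)
        for i, c1 in enumerate(codewords)
        for c2 in codewords[i + 1:])
print("code rate:", 3 / 6, "  distance:", d)
</pre>

For a linear code this minimum also equals the minimum Hamming weight of a nonzero codeword, which can speed up such brute-force checks.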

* Use the probabilistic method to prove that there exists a boolean code <math>C:\{0,1\}^k\rightarrow\{0,1\}^n</math> of code rate <math>r</math> and distance <math>\left(\frac{1}{2}-\Theta\left(\sqrt{r}\right)\right)n</math>. Try to optimize the constant in <math>\Theta(\cdot)</math>.

* Prove a similar result for linear boolean codes.

= 高级算法 (Fall 2019) =
{{Infobox
|name         = Infobox
|bodystyle    =
|title        = <font size=3>高级算法<br>Advanced Algorithms</font>
|titlestyle   =
|image        =
|imagestyle   =
|caption      =
|captionstyle =
|headerstyle  = background:#ccf;
|labelstyle   = background:#ddf;
|datastyle    =
|header1 = Instructor
|label1  =
|data1   =
|header2 =
|label2  =
|data2   = 尹一通
|header3 =
|label3  = Email
|data3   = yinyt@nju.edu.cn chaodong@nju.edu.cn
|header4 =
|label4  = Office
|data4   = 计算机系 804
|header5 = Class
|label5  =
|data5   =
|header6 =
|label6  = Class meetings
|data6   = Wednesday, 10am-12pm <br> 仙I-108
|header7 =
|label7  = Place
|data7   =
|header8 =
|label8  = Office hours
|data8   = Wednesday, 4pm-6pm <br> 804
|header9 = Textbooks
|label9  =
|data9   =
|header10 =
|label10  =
|data10   = [[File:MR-randomized-algorithms.png|border|100px]]
|header11 =
|label11  =
|data11   = Motwani and Raghavan. <br>''Randomized Algorithms''.<br> Cambridge Univ Press, 1995.
|header12 =
|label12  =
|data12   = [[File:Approximation_Algorithms.jpg|border|100px]]
|header13 =
|label13  =
|data13   = Vazirani. <br>''Approximation Algorithms''. <br> Springer-Verlag, 2001.
|belowstyle = background:#ddf;
|below =
}}

This is the webpage for the ''Advanced Algorithms'' class of fall 2019. Students who take this class should check this page periodically for content updates and new announcements.

= Announcement =
TBA

= Course info =
* '''Instructor''': 尹一通
:*'''email''': yinyt@nju.edu.cn
* '''Class meeting''': Wednesday 10am-12pm, 仙I-108.
* '''Office hour''': Wednesday 4pm-6pm, 计算机系 804.

= Syllabus =

=== 先修课程 Prerequisites ===
* Required: discrete mathematics, probability theory, linear algebra.
* Recommended: design and analysis of algorithms.

=== Course materials ===
* [[高级算法 (Fall 2019) / Course materials|<font size=3>Textbooks and references</font>]]

=== 成绩 Grades ===
* Grading: the course will have several problem sets and a final exam. The final grade will be determined by a combination of the problem-set grades and the final-exam grade.
* Late submissions: if there is a special reason why you cannot finish an assignment on time, contact the instructor in advance and give a proper justification. Otherwise, late assignments will not be accepted.

=== <font color=red> 学术诚信 Academic Integrity </font>===
Academic integrity is the most basic professional and ethical standard for every student and scholar engaged in academic work. This course will spare no effort in upholding academic integrity, and violations of this standard will not be tolerated.

The principle for completing assignments: work that bears your name must be done by you. Discussion is allowed, but each assignment must be completed independently, and you must list everyone you discussed it with. No other form of collaboration is allowed; in particular, you may not "discuss" an assignment with students who have already finished it.

This course takes a zero-tolerance attitude toward plagiarism. When completing assignments, directly copying text from the work of others (publications, material from the Internet, other students' assignments, etc.), as well as copying key ideas or key elements, is regarded as plagiarism under the interpretation of the [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism]. Plagiarists will have their grades voided. If mutual copying is discovered, <font color=red>both the student who copied and the student whose work was copied will have their grades voided</font>. Please therefore take the initiative to prevent your assignments from being copied by others.

Academic integrity concerns not only a student's personal character but also the normal functioning of the entire educational system. Committing academic misconduct for a few points not only turns you into a cheater, it also renders other people's honest efforts meaningless. Let us work together to maintain an environment of integrity.

= Assignments =
TBA

= Lecture Notes =
# [[高级算法 (Fall 2019)/Min-Cut and Max-Cut|Min-Cut and Max-Cut]]
#: [[高级算法 (Fall 2019)/Probability Basics|Probability basics]]

= Related Online Courses =
* Ankur Moitra's [http://people.csail.mit.edu/moitra/854.html Advanced Algorithms] at MIT.
