<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://tcs.nju.edu.cn/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Kvrmnks</id>
	<title>TCS Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://tcs.nju.edu.cn/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Kvrmnks"/>
	<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=Special:Contributions/Kvrmnks"/>
	<updated>2026-04-28T16:34:40Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E6%95%B0%E6%8D%AE%E7%A7%91%E5%AD%A6%E5%9F%BA%E7%A1%80_(Fall_2025)&amp;diff=13233</id>
		<title>数据科学基础 (Fall 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E6%95%B0%E6%8D%AE%E7%A7%91%E5%AD%A6%E5%9F%BA%E7%A1%80_(Fall_2025)&amp;diff=13233"/>
		<updated>2025-08-19T07:19:51Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Course info */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;&#039;&#039;&#039;数据科学基础&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
Foundations of Data Science&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data5   = &#039;&#039;&#039;刘明谋&#039;&#039;&#039;&lt;br /&gt;
|header6 = &lt;br /&gt;
|label6  = Email&lt;br /&gt;
|data6   = lmm@nju.edu.cn&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Office&lt;br /&gt;
|data7   = 南雍-西229&lt;br /&gt;
|header8 = Class&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 =&lt;br /&gt;
|label9  = Class meeting&lt;br /&gt;
|data9   = Friday, 2pm-5pm &amp;lt;br/&amp;gt;苏教楼C204&lt;br /&gt;
|header10=&lt;br /&gt;
|label10 = Office hour&lt;br /&gt;
|data10  = Thursday, 3pm-5pm&amp;lt;br/&amp;gt;南雍-西229&lt;br /&gt;
|header11= Textbook&lt;br /&gt;
|label11 = &lt;br /&gt;
|data11  = &lt;br /&gt;
|header12=&lt;br /&gt;
|label12 = &lt;br /&gt;
|data12  = [[File:概率导论.jpeg|border|100px]]&lt;br /&gt;
|header13=&lt;br /&gt;
|label13 = &lt;br /&gt;
|data13  = &#039;&#039;&#039;概率导论&#039;&#039;&#039; (Introduction to Probability; 2nd edition, revised)&amp;lt;br&amp;gt; Dimitri P. Bertsekas and John N. Tsitsiklis&amp;lt;br&amp;gt; translated by 郑忠国 and 童行伟; 人民邮电出版社 (2022)&lt;br /&gt;
|header14=&lt;br /&gt;
|label14 = &lt;br /&gt;
|data14  = [[File:Probability_and_Computing_2ed.jpg|border|100px]]&lt;br /&gt;
|header15=&lt;br /&gt;
|label15 = &lt;br /&gt;
|data15  = &#039;&#039;&#039;Probability and Computing&#039;&#039;&#039; (2E) &amp;lt;br&amp;gt; Michael Mitzenmacher and Eli Upfal &amp;lt;br&amp;gt;   Cambridge University Press (2017)&lt;br /&gt;
|header16=&lt;br /&gt;
|label16 = &lt;br /&gt;
|data16  = [[File:Foundations_of_Data_Science.jpg|border|100px]]&lt;br /&gt;
|header17= &lt;br /&gt;
|label17 = &lt;br /&gt;
|data17  = &#039;&#039;&#039;Foundations of Data Science&#039;&#039;&#039; &amp;lt;br&amp;gt; Avrim Blum, John Hopcroft, Ravi Kannan &amp;lt;br&amp;gt;   Cambridge University Press (2020)&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
This is the webpage for the &#039;&#039;Foundations of Data Science&#039;&#039; (数据科学基础) class of Fall 2025. Students who take this class should check this page periodically for content updates and new announcements. &lt;br /&gt;
&lt;br /&gt;
= Announcement =&lt;br /&gt;
* First class of the new semester: August 29, 2025, in 苏教楼C204.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Course info =&lt;br /&gt;
* &#039;&#039;&#039;Instructor&#039;&#039;&#039;: &lt;br /&gt;
** [https://liumingmou.github.io 刘明谋]: [mailto:lmm@nju.edu.cn &amp;lt;lmm@nju.edu.cn&amp;gt;], 南雍-西229&lt;br /&gt;
* &#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;:&lt;br /&gt;
** 梁梓豪: [mailto:zhliang@smail.nju.edu.cn 📧] 仙林校区计科楼北栋426&lt;br /&gt;
** 周海刚: [mailto:hgzhou2003@outlook.com 📧] 仙林校区计科楼北栋410&lt;br /&gt;
** 欧丰宁: [mailto:oufn02@outlook.com 📧] 仙林校区计科楼北栋410&lt;br /&gt;
** 于逸潇: [mailto:yixiaoyu@smail.nju.edu.cn 📧] 仙林校区计科楼北栋410&lt;br /&gt;
** 缪天顺: [mailto:*@smail.nju.edu.cn 📧] 仙林校区计科楼北栋426&lt;br /&gt;
* &#039;&#039;&#039;Class meeting&#039;&#039;&#039;:&lt;br /&gt;
** Friday: 2pm-5pm, 苏教楼C204&lt;br /&gt;
* &#039;&#039;&#039;Office hour&#039;&#039;&#039;: &lt;br /&gt;
:* Thursday: 3pm-5pm, 南雍-西229 (刘明谋)&lt;br /&gt;
:* &#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019436733 (to join, provide your name, department, and student ID)&lt;br /&gt;
&lt;br /&gt;
= Syllabus =&lt;br /&gt;
The course is organized into three major parts:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Classical probability theory&#039;&#039;&#039;: probability spaces, random variables and their numerical characteristics, and multivariate and continuous random variables&lt;br /&gt;
* &#039;&#039;&#039;Probability and computing&#039;&#039;&#039;: concentration of measure, the probabilistic method, and discrete stochastic processes&lt;br /&gt;
* &#039;&#039;&#039;Mathematical statistics&#039;&#039;&#039;: statistical inference, including parameter estimation, hypothesis testing, Bayesian estimation, analysis of variance, and correlation and regression analysis&lt;br /&gt;
&lt;br /&gt;
For the first two parts, you are expected to have a firm grasp of the basic concepts, a deep understanding of the key phenomena and laws and the principles behind them, and the ability to apply the methods flexibly to solve related problems. For the third part, you are expected to be familiar with the basic concepts of mathematical statistics and with the typical statistical models and inference methods.&lt;br /&gt;
&lt;br /&gt;
Through this course, students will master the basic theory and methods of probability and statistics and gain the ability to process and analyze real data, laying a solid foundation for later study in data mining, machine learning, big data technology, and other areas of data science. The course combines lectures, case studies, and exercises, emphasizes the integration of theory with practice, and cultivates the ability to apply what is learned to real problems.&lt;br /&gt;
&lt;br /&gt;
=== 教材与参考书 Course Materials ===&lt;br /&gt;
* &#039;&#039;&#039;[BT]&#039;&#039;&#039; 概率导论 (Introduction to Probability; 2nd edition, revised), by Dimitri P. Bertsekas and John N. Tsitsiklis; translated by 郑忠国 and 童行伟; 人民邮电出版社 (2022).&lt;br /&gt;
* &#039;&#039;&#039;[MU]&#039;&#039;&#039; &#039;&#039;Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis&#039;&#039;, by Michael Mitzenmacher, Eli Upfal; Cambridge University Press; 2nd edition (2017).&lt;br /&gt;
* &#039;&#039;&#039;[GS]&#039;&#039;&#039; &#039;&#039;Probability and Random Processes&#039;&#039;, by Geoffrey Grimmett and David Stirzaker; Oxford University Press; 4th edition (2020).&lt;br /&gt;
* &#039;&#039;&#039;[BHK]&#039;&#039;&#039; &#039;&#039;Foundations of Data Science&#039;&#039;, by Avrim Blum, John Hopcroft, and Ravindran Kannan; Cambridge University Press (2020).&lt;br /&gt;
&lt;br /&gt;
=== 成绩 Grading Policy ===&lt;br /&gt;
* Grading: there will be several homework assignments and one final exam; the final grade combines the homework scores with the final exam score.&lt;br /&gt;
* Late submissions: if a special circumstance prevents you from finishing an assignment on time, contact the instructor in advance with a valid justification; otherwise, late homework will not be accepted.&lt;br /&gt;
&lt;br /&gt;
=== &amp;lt;font color=red&amp;gt; 学术诚信 Academic Integrity &amp;lt;/font&amp;gt;===&lt;br /&gt;
Academic integrity is the most basic professional and ethical standard for every student and scholar engaged in academic work. This course will spare no effort in upholding it, and violations will not be tolerated.&lt;br /&gt;
&lt;br /&gt;
The rule for assignments: &#039;&#039;&#039;work bearing your name must be your own contribution, and any part of an assignment that is not your own work must be clearly marked&#039;&#039;&#039;, in particular any part generated by AI; otherwise it constitutes suspected plagiarism. Discussion while working on assignments is allowed, provided that every participant is at a comparable stage of completion. The execution of the key ideas and the writing of the assignment text, however, must be done independently, and you must acknowledge everyone who took part in the discussion. Discussion and acknowledgment that follow these rules will not affect your score. No other form of collaboration is allowed; in particular, &#039;&#039;discussing&#039;&#039; with a classmate who has already finished the assignment is not.&lt;br /&gt;
&lt;br /&gt;
This course takes a zero-tolerance stance toward plagiarism. In completing assignments, both verbatim copying of text from the work of others (publications, internet sources, other students&#039; assignments, etc.) and copying of key ideas or key elements count as plagiarism, following the [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism]. A plagiarist&#039;s grade will be voided. If mutual copying is discovered, &amp;lt;font color=red&amp;gt; the grades of both the copying and the copied party will be voided&amp;lt;/font&amp;gt;. Please therefore actively prevent your own work from being copied.&lt;br /&gt;
&lt;br /&gt;
Academic integrity bears on a student&#039;s personal character and on the proper functioning of the entire educational system. Committing academic misconduct for a few points not only makes you a cheater but also devalues the honest efforts of others. Let us work together to maintain an environment of integrity.&lt;br /&gt;
&lt;br /&gt;
= Assignments =&lt;br /&gt;
*TBA&lt;br /&gt;
&lt;br /&gt;
= Lectures =&lt;br /&gt;
# TBA&lt;br /&gt;
&lt;br /&gt;
= Concepts =&lt;br /&gt;
* [https://plato.stanford.edu/entries/probability-interpret/ Interpretations of probability]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/History_of_probability History of probability]&lt;br /&gt;
* Example problems:&lt;br /&gt;
** [https://dornsifecms.usc.edu/assets/sites/520/docs/VonNeumann-ams12p36-38.pdf von Neumann&#039;s Bernoulli factory] and other [https://peteroupc.github.io/bernoulli.html Bernoulli factory algorithms] (see the Python sketch after this list)&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Boy_or_Girl_paradox Boy or Girl paradox]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Monty_Hall_problem Monty Hall problem]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Bertrand_paradox_(probability) Bertrand paradox]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Hard_spheres Hard spheres model] and [https://en.wikipedia.org/wiki/Ising_model Ising model]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/PageRank &#039;&#039;PageRank&#039;&#039;] and stationary [https://en.wikipedia.org/wiki/Random_walk random walk]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Diffusion_process Diffusion process] and [https://en.wikipedia.org/wiki/Diffusion_model diffusion model]&lt;br /&gt;
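The &#039;&#039;von Neumann Bernoulli factory&#039;&#039; above turns a coin of unknown bias into a perfectly fair one: flip twice, output the first flip if the two flips differ, and retry otherwise. Here is a minimal Python sketch of that classic trick; the bias value 0.3 and the helper names are ad hoc illustrations, not fixed by the course.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import random&lt;br /&gt;
&lt;br /&gt;
def biased_coin(p=0.3):&lt;br /&gt;
    # one flip of a coin that shows 1 with probability p&lt;br /&gt;
    return 1 if random.random() &amp;lt; p else 0&lt;br /&gt;
&lt;br /&gt;
def fair_bit(coin):&lt;br /&gt;
    # von Neumann: flip twice; if the flips differ, output the first one,&lt;br /&gt;
    # otherwise retry. The pairs 01 and 10 each occur with probability&lt;br /&gt;
    # p(1-p), so the output is an unbiased bit whatever the bias p is.&lt;br /&gt;
    while True:&lt;br /&gt;
        a, b = coin(), coin()&lt;br /&gt;
        if a != b:&lt;br /&gt;
            return a&lt;br /&gt;
&lt;br /&gt;
# example: eight unbiased bits from the biased coin&lt;br /&gt;
bits = [fair_bit(biased_coin) for _ in range(8)]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;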
*[https://en.wikipedia.org/wiki/Probability_space Probability space]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Sample_space Sample space]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Event_(probability_theory) Event] and [https://en.wikipedia.org/wiki/Σ-algebra &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;-algebra]&lt;br /&gt;
** Kolmogorov&#039;s [https://en.wikipedia.org/wiki/Probability_axioms axioms of probability]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Discrete_uniform_distribution Classical] and [https://en.wikipedia.org/wiki/Geometric_probability geometric probability]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Boole%27s_inequality Union bound]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle Inclusion-Exclusion principle]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Boole%27s_inequality#Bonferroni_inequalities Bonferroni inequalities]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Conditional_probability Conditional probability]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Chain_rule_(probability) Chain rule]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Law_of_total_probability Law of total probability]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Bayes%27_theorem Bayes&#039; law]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Independence_(probability_theory) Independence] &lt;br /&gt;
** [https://en.wikipedia.org/wiki/Pairwise_independence Pairwise independence]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Random_variable Random variable]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Cumulative_distribution_function Cumulative distribution function]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Probability_mass_function Probability mass function]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Probability_density_function Probability density function]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Multivariate_random_variable Random vector]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Joint_probability_distribution Joint probability distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Conditional_probability_distribution Conditional probability distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Marginal_distribution Marginal distribution]&lt;br /&gt;
* Some &#039;&#039;&#039;discrete&#039;&#039;&#039; probability distributions&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trial] and [https://en.wikipedia.org/wiki/Bernoulli_distribution Bernoulli distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Discrete_uniform_distribution Discrete uniform distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Binomial_distribution Binomial distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Geometric_distribution Geometric distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Negative_binomial_distribution Negative binomial distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Hypergeometric_distribution Hypergeometric distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]&lt;br /&gt;
** and [https://en.wikipedia.org/wiki/List_of_probability_distributions#Discrete_distributions others]&lt;br /&gt;
* Balls into bins model&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Multinomial_distribution Multinomial distribution]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Birthday_problem Birthday problem] (see the Python sketch after this list)&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Coupon_collector%27s_problem Coupon collector]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Balls_into_bins_problem Occupancy problem]&lt;br /&gt;
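For the &#039;&#039;birthday problem&#039;&#039; above, the chance that n uniform birthdays are all distinct is the product of (days - k)/days over k = 0, ..., n-1, and the collision probability is one minus that product. A minimal Python sketch under the usual idealization of 365 equally likely birthdays (the function name is ad hoc):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def birthday_collision_prob(n, days=365):&lt;br /&gt;
    # P(at least two of n people share a birthday), uniform birthdays&lt;br /&gt;
    p_distinct = 1.0&lt;br /&gt;
    for k in range(n):&lt;br /&gt;
        p_distinct *= (days - k) / days&lt;br /&gt;
    return 1.0 - p_distinct&lt;br /&gt;
&lt;br /&gt;
# birthday_collision_prob(23) is about 0.507, already past one half&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;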
* Random graphs&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model Erdős–Rényi random graph model]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Galton%E2%80%93Watson_process Galton–Watson branching process]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Expected_value Expectation]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Law_of_the_unconscious_statistician Law of the unconscious statistician, &#039;&#039;LOTUS&#039;&#039;]&lt;br /&gt;
** [https://dlsun.github.io/probability/linearity.html Linearity of expectation]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Conditional_expectation Conditional expectation]&lt;br /&gt;
** [https://en.wikipedia.org/wiki/Law_of_total_expectation Law of total expectation]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95Numerical_method_(Spring_2025)/Homework6_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=13165</id>
		<title>计算方法Numerical method (Spring 2025)/Homework6 提交名单</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95Numerical_method_(Spring_2025)/Homework6_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=13165"/>
		<updated>2025-05-21T07:21:28Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Created page with &amp;quot; 如有错漏请邮件联系助教. &amp;lt;center&amp;gt; {| class=&amp;quot;wikitable&amp;quot; |- ! 学号 !! 姓名 |- | 221220090 || 周思桥  |- | 221240002 || 季悦宁  |- | 221240040 || 郑雯琪  |- | 231098068 || 戎昱  |- | 231098091 || 刘棣文  |- | 231098166 || 陈展  |- | 231200035 || 葛翰飞  |- | 231220036 || 周楚函  |- | 231220065 || 劳汉显  |- | 231220067 || 黄裕书琪  |- | 231220071 || 吴江涛  |- | 231220122 || 安琦煜  |- | 231220166 || 苏易  |- | 231220171 |...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt; If there are any errors or omissions, please email the TAs.&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Student ID !! Name&lt;br /&gt;
|-&lt;br /&gt;
| 221220090 || 周思桥 &lt;br /&gt;
|-&lt;br /&gt;
| 221240002 || 季悦宁 &lt;br /&gt;
|-&lt;br /&gt;
| 221240040 || 郑雯琪 &lt;br /&gt;
|-&lt;br /&gt;
| 231098068 || 戎昱 &lt;br /&gt;
|-&lt;br /&gt;
| 231098091 || 刘棣文 &lt;br /&gt;
|-&lt;br /&gt;
| 231098166 || 陈展 &lt;br /&gt;
|-&lt;br /&gt;
| 231200035 || 葛翰飞 &lt;br /&gt;
|-&lt;br /&gt;
| 231220036 || 周楚函 &lt;br /&gt;
|-&lt;br /&gt;
| 231220065 || 劳汉显 &lt;br /&gt;
|-&lt;br /&gt;
| 231220067 || 黄裕书琪 &lt;br /&gt;
|-&lt;br /&gt;
| 231220071 || 吴江涛 &lt;br /&gt;
|-&lt;br /&gt;
| 231220122 || 安琦煜 &lt;br /&gt;
|-&lt;br /&gt;
| 231220166 || 苏易 &lt;br /&gt;
|-&lt;br /&gt;
| 231220171 || 刘正阳 &lt;br /&gt;
|-&lt;br /&gt;
| 231220176 || 罗皓然 &lt;br /&gt;
|-&lt;br /&gt;
| 231220179 || 徐钰炜 &lt;br /&gt;
|-&lt;br /&gt;
| 231230102 || 庄铸锴 &lt;br /&gt;
|-&lt;br /&gt;
| 231240002 || 余孟凡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240004 || 何梓杨 &lt;br /&gt;
|-&lt;br /&gt;
| 231240009 || 陈心怡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240011 || 陈力琰 &lt;br /&gt;
|-&lt;br /&gt;
| 231240013 || 南晨曦 &lt;br /&gt;
|-&lt;br /&gt;
| 231240016 || 徐冰冰 &lt;br /&gt;
|-&lt;br /&gt;
| 231240018 || 渠翔凯 &lt;br /&gt;
|-&lt;br /&gt;
| 231240027 || 彭浩楠 &lt;br /&gt;
|-&lt;br /&gt;
| 231240029 || 朱非凡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240038 || 肖金宇 &lt;br /&gt;
|-&lt;br /&gt;
| 231240045 || 杨俊炜 &lt;br /&gt;
|-&lt;br /&gt;
| 231240047 || 姜淮仁 &lt;br /&gt;
|-&lt;br /&gt;
| 231240051 || 王昱霏 &lt;br /&gt;
|-&lt;br /&gt;
| 231240053 || 王艺文 &lt;br /&gt;
|-&lt;br /&gt;
| 231240056 || 靳濡搏 &lt;br /&gt;
|-&lt;br /&gt;
| 231502012 || 许立恒 &lt;br /&gt;
|-&lt;br /&gt;
| 231502013 || 卢林强 &lt;br /&gt;
|-&lt;br /&gt;
| 231502014 || 杨子烨 &lt;br /&gt;
|-&lt;br /&gt;
| 231502015 || 胡子豪 &lt;br /&gt;
|-&lt;br /&gt;
| 231502016 || 綦浩量 &lt;br /&gt;
|-&lt;br /&gt;
| 231830135 || 周林辉 &lt;br /&gt;
|-&lt;br /&gt;
| 231870073 || 朱伟鹏 &lt;br /&gt;
|-&lt;br /&gt;
| 231870127 || 李熠城 &lt;br /&gt;
|-&lt;br /&gt;
| 231880394 || 翟笑晨 &lt;br /&gt;
|-&lt;br /&gt;
| 248355139 || 朴召怡 &lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
42 students in total&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13164</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13164"/>
		<updated>2025-05-21T07:21:11Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   = Timothy Sauer&amp;lt;br&amp;gt;数值分析 (Numerical Analysis), Chinese translation of the 2nd edition.&amp;lt;br&amp;gt;机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (to join, provide your name, major, and student ID)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 (Numerical Analysis), Chinese translation of the 2nd edition, by Timothy Sauer. 机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b, Laplacian Solver and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have trouble obtaining the textbooks, contact the TAs. (English editions only.)&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches but your solution must be written by you and you only. You should acknowledge everyone whom you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so your solutions are not publicly visible. Many popular version control systems provide free repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late submission requests ONLY IF you make the request ahead of time.&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 1.pdf| Homework1]] Please submit to nm_nju_2025@163.com by 23:59 on March 4, 2025 (filename: &#039;学号_姓名_A1.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework1 提交名单|Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2025@163.com by 23:59 on March 18, 2025 (filename: &#039;学号_姓名_A2.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework2 提交名单|Homework2 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 3.pdf| Homework3]] Please submit to nm_nju_2025@163.com by 23:59 on April 1, 2025 (filename: &#039;学号_姓名_A3.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework3 提交名单|Homework3 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 4.pdf| Homework4]] Please submit to nm_nju_2025@163.com by 23:59 on April 22, 2025 (filename: &#039;学号_姓名_A4.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework4 提交名单|Homework4 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 5.pdf| Homework5]] Please submit to nm_nju_2025@163.com by 23:59 on May 6, 2025 (filename: &#039;学号_姓名_A5.pdf&#039;) [[计算方法Numerical method (Spring 2025)/Homework5_提交名单|Homework5 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 6 v3.pdf| Homework6]] Please submit to nm_nju_2025@163.com by 23:59 on May 20, 2025 (filename: &#039;学号_姓名_A6.pdf&#039;) [[计算方法Numerical method (Spring 2025)/Homework6_提交名单|Homework6 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 7.pdf| Homework7]] Please submit to nm_nju_2025@163.com by 23:59 on June 3, 2025 (filename: &#039;学号_姓名_A7.pdf&#039;)&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, contact the TAs promptly.&lt;br /&gt;
&lt;br /&gt;
# [[Media:计算方法1-2025.pdf|Course introduction; root finding]]&lt;br /&gt;
# [[Media:计算方法2-2025.pdf|Newton&#039;s method; interpolation; secret sharing; error-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-2025.pdf|Chebyshev interpolation and Chebyshev polynomials; norms]]&lt;br /&gt;
# [[Media:计算方法4-2025.pdf|Least squares; Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-2025.pdf|FFT; Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-2025.pdf|Operator norms, condition numbers, and iterative methods]]&lt;br /&gt;
# [[Media:计算方法 7-特征值与幂迭代.pdf|Eigenvalues and power iteration]]&lt;br /&gt;
# [[Media:计算方法 8-特征值的其它迭代方法与SVD.pdf|Other iterative eigenvalue methods and the SVD]]&lt;br /&gt;
#* Further reading: [https://web.stanford.edu/class/cs168/l/l9.pdf lecture note by Tim Roughgarden and Greg Valiant on matrix completions]&lt;br /&gt;
# [[Media:计算方法9.pdf|Iterative methods for linear systems: gradient descent and conjugate gradient]]&lt;br /&gt;
# [[Media:计算方法10.pdf|A special case of power iteration: random walks and Markov chains]]&lt;br /&gt;
# [[Media:计算方法11.pdf|Spectral graph theory]]&lt;br /&gt;
# [[Media:计算方法12.pdf|Resistor networks, hitting times, and cover times]]&lt;br /&gt;
# [[Media:计算方法13.pdf|Introduction to linear programming]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13092</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13092"/>
		<updated>2025-04-23T02:19:46Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   = Timothy Sauer&amp;lt;br&amp;gt;数值分析 (Numerical Analysis), Chinese translation of the 2nd edition.&amp;lt;br&amp;gt;机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (to join, provide your name, major, and student ID)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 (Numerical Analysis), Chinese translation of the 2nd edition, by Timothy Sauer. 机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b, Laplacian Solver and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have trouble obtaining the textbooks, contact the TAs. (English editions only.)&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches but your solution must be written by you and you only. You should acknowledge everyone whom you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so your solutions are not publicly visible. Many popular version control systems provide free repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late submission requests ONLY IF you make the request ahead of time.&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 1.pdf| Homework1]] Please submit to nm_nju_2025@163.com by 23:59 on March 4, 2025 (filename: &#039;学号_姓名_A1.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework1 提交名单|Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2025@163.com by 23:59 on March 18, 2025 (filename: &#039;学号_姓名_A2.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework2 提交名单|Homework2 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 3.pdf| Homework3]] Please submit to nm_nju_2025@163.com by 23:59 on April 1, 2025 (filename: &#039;学号_姓名_A3.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework3 提交名单|Homework3 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 4.pdf| Homework4]] Please submit to nm_nju_2025@163.com by 23:59 on April 22, 2025 (filename: &#039;学号_姓名_A4.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework4 提交名单|Homework4 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 5.pdf| Homework5]] Please submit to nm_nju_2025@163.com by 23:59 on May 6, 2025 (filename: &#039;学号_姓名_A5.pdf&#039;)&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, contact the TAs promptly.&lt;br /&gt;
&lt;br /&gt;
# [[Media:计算方法1-2025.pdf|Course introduction; root finding]]&lt;br /&gt;
# [[Media:计算方法2-2025.pdf|Newton&#039;s method; interpolation; secret sharing; error-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-2025.pdf|Chebyshev interpolation and Chebyshev polynomials; norms]]&lt;br /&gt;
# [[Media:计算方法4-2025.pdf|Least squares; Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-2025.pdf|FFT; Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-2025.pdf|Operator norms, condition numbers, and iterative methods]]&lt;br /&gt;
# [[Media:计算方法 7-特征值与幂迭代.pdf|Eigenvalues and power iteration]]&lt;br /&gt;
# [[Media:计算方法 8-特征值的其它迭代方法与SVD.pdf|Other iterative eigenvalue methods and the SVD]]&lt;br /&gt;
#* Further reading: [https://web.stanford.edu/class/cs168/l/l9.pdf lecture note by Tim Roughgarden and Greg Valiant on matrix completions]&lt;br /&gt;
# [[Media:计算方法9.pdf|Iterative methods for linear systems: gradient descent and conjugate gradient]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_5.pdf&amp;diff=13091</id>
		<title>File:Computational Method 2025 Assignments 5.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_5.pdf&amp;diff=13091"/>
		<updated>2025-04-23T02:18:51Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)/Homework4_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=13090</id>
		<title>计算方法 Numerical method (Spring 2025)/Homework4 提交名单</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)/Homework4_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=13090"/>
		<updated>2025-04-23T02:02:48Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Created page with &amp;quot; 如有错漏请邮件联系助教. &amp;lt;center&amp;gt; {| class=&amp;quot;wikitable&amp;quot; |- ! 学号 !! 姓名 |- | 221220090 || 周思桥  |- | 221240040 || 郑雯琪  |- | 231098068 || 戎昱  |- | 231098091 || 刘棣文  |- | 231098166 || 陈展  |- | 231200035 || 葛翰飞  |- | 231220006 || 陆华均  |- | 231220036 || 周楚函  |- | 231220065 || 劳汉显  |- | 231220067 || 黄裕书琪  |- | 231220071 || 吴江涛  |- | 231220122 || 安琦煜  |- | 231220166 || 苏易  |- | 231220171 |...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt; If there are any errors or omissions, please email the TAs.&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Student ID !! Name&lt;br /&gt;
|-&lt;br /&gt;
| 221220090 || 周思桥 &lt;br /&gt;
|-&lt;br /&gt;
| 221240040 || 郑雯琪 &lt;br /&gt;
|-&lt;br /&gt;
| 231098068 || 戎昱 &lt;br /&gt;
|-&lt;br /&gt;
| 231098091 || 刘棣文 &lt;br /&gt;
|-&lt;br /&gt;
| 231098166 || 陈展 &lt;br /&gt;
|-&lt;br /&gt;
| 231200035 || 葛翰飞 &lt;br /&gt;
|-&lt;br /&gt;
| 231220006 || 陆华均 &lt;br /&gt;
|-&lt;br /&gt;
| 231220036 || 周楚函 &lt;br /&gt;
|-&lt;br /&gt;
| 231220065 || 劳汉显 &lt;br /&gt;
|-&lt;br /&gt;
| 231220067 || 黄裕书琪 &lt;br /&gt;
|-&lt;br /&gt;
| 231220071 || 吴江涛 &lt;br /&gt;
|-&lt;br /&gt;
| 231220122 || 安琦煜 &lt;br /&gt;
|-&lt;br /&gt;
| 231220166 || 苏易 &lt;br /&gt;
|-&lt;br /&gt;
| 231220171 || 刘正阳 &lt;br /&gt;
|-&lt;br /&gt;
| 231220176 || 罗皓然 &lt;br /&gt;
|-&lt;br /&gt;
| 231220179 || 徐钰炜 &lt;br /&gt;
|-&lt;br /&gt;
| 231230102 || 庄铸锴 &lt;br /&gt;
|-&lt;br /&gt;
| 231240002 || 余孟凡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240004 || 何梓杨 &lt;br /&gt;
|-&lt;br /&gt;
| 231240009 || 陈心怡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240011 || 陈力琰 &lt;br /&gt;
|-&lt;br /&gt;
| 231240013 || 南晨曦 &lt;br /&gt;
|-&lt;br /&gt;
| 231240016 || 徐冰冰 &lt;br /&gt;
|-&lt;br /&gt;
| 231240018 || 渠翔凯 &lt;br /&gt;
|-&lt;br /&gt;
| 231240027 || 彭浩楠 &lt;br /&gt;
|-&lt;br /&gt;
| 231240029 || 朱非凡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240031 || 吴天祥 &lt;br /&gt;
|-&lt;br /&gt;
| 231240038 || 肖金宇 &lt;br /&gt;
|-&lt;br /&gt;
| 231240045 || 杨俊炜 &lt;br /&gt;
|-&lt;br /&gt;
| 231240047 || 姜淮仁 &lt;br /&gt;
|-&lt;br /&gt;
| 231240051 || 王昱霏 &lt;br /&gt;
|-&lt;br /&gt;
| 231240053 || 王艺文 &lt;br /&gt;
|-&lt;br /&gt;
| 231240056 || 靳濡搏 &lt;br /&gt;
|-&lt;br /&gt;
| 231502012 || 许立恒 &lt;br /&gt;
|-&lt;br /&gt;
| 231502013 || 卢林强 &lt;br /&gt;
|-&lt;br /&gt;
| 231502014 || 杨子烨 &lt;br /&gt;
|-&lt;br /&gt;
| 231502015 || 胡子豪 &lt;br /&gt;
|-&lt;br /&gt;
| 231502016 || 綦浩量 &lt;br /&gt;
|-&lt;br /&gt;
| 231830135 || 周林辉 &lt;br /&gt;
|-&lt;br /&gt;
| 231840160 || 温昊臻 &lt;br /&gt;
|-&lt;br /&gt;
| 231870073 || 朱伟鹏 &lt;br /&gt;
|-&lt;br /&gt;
| 231870127 || 李熠城 &lt;br /&gt;
|-&lt;br /&gt;
| 231880394 || 翟笑晨 &lt;br /&gt;
|-&lt;br /&gt;
| 248355139 || 朴召怡 &lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
44 students in total&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13089</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13089"/>
		<updated>2025-04-23T02:02:32Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   = Timothy Sauer&amp;lt;br&amp;gt;数值分析 (Numerical Analysis), Chinese translation of the 2nd edition.&amp;lt;br&amp;gt;机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (to join, provide your name, major, and student ID)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 (Numerical Analysis), Chinese translation of the 2nd edition, by Timothy Sauer. 机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b, Laplacian Solver and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have trouble obtaining the textbooks, contact the TAs. (English editions only.)&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches but your solution must be written by you and you only. You should acknowledge everyone whom you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so your solutions are not publicly visible. Many popular version control systems provide free repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late submission requests ONLY IF you make the request ahead of time.&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 1.pdf| Homework1]] Please submit to nm_nju_2025@163.com by 23:59 on March 4, 2025 (filename: &#039;学号_姓名_A1.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework1 提交名单|Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2025@163.com by 23:59 on March 18, 2025 (filename: &#039;学号_姓名_A2.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework2 提交名单|Homework2 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 3.pdf| Homework3]] Please submit to nm_nju_2025@163.com by 23:59 on April 1, 2025 (filename: &#039;学号_姓名_A3.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework3 提交名单|Homework3 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 4.pdf| Homework4]] Please submit to nm_nju_2025@163.com by 23:59 on April 22, 2025 (filename: &#039;学号_姓名_A4.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework4 提交名单|Homework4 submission list]]&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, contact the TAs promptly.&lt;br /&gt;
&lt;br /&gt;
# [[Media:计算方法1-2025.pdf|Course introduction; root finding]]&lt;br /&gt;
# [[Media:计算方法2-2025.pdf|Newton&#039;s method; interpolation; secret sharing; error-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-2025.pdf|Chebyshev interpolation and Chebyshev polynomials; norms]]&lt;br /&gt;
# [[Media:计算方法4-2025.pdf|Least squares; Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-2025.pdf|FFT; Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-2025.pdf|Operator norms, condition numbers, and iterative methods]]&lt;br /&gt;
# [[Media:计算方法 7-特征值与幂迭代.pdf|Eigenvalues and power iteration]]&lt;br /&gt;
# [[Media:计算方法 8-特征值的其它迭代方法与SVD.pdf|Other iterative eigenvalue methods and the SVD]]&lt;br /&gt;
#* Further reading: [https://web.stanford.edu/class/cs168/l/l9.pdf lecture note by Tim Roughgarden and Greg Valiant on matrix completions]&lt;br /&gt;
# [[Media:计算方法9.pdf|Iterative methods for linear systems: gradient descent and conjugate gradient]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13025</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13025"/>
		<updated>2025-04-02T02:41:21Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */  upload assignment 4&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   = Timothy Sauer&amp;lt;br&amp;gt;数值分析 (Numerical Analysis), Chinese translation of the 2nd edition.&amp;lt;br&amp;gt;机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (to join, provide your name, major, and student ID)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 (Numerical Analysis), Chinese translation of the 2nd edition, by Timothy Sauer. 机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b, Laplacian Solver and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have trouble obtaining the textbooks, contact the TAs. (English editions only.)&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches but your solution must be written by you and you only. You should acknowledge everyone whom you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so your solutions are not publicly visible. Many popular version control systems provide free repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late submission requests ONLY IF you make the request ahead of time.&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 1.pdf| Homework1]] Please submit to nm_nju_2025@163.com by 23:59 on March 4, 2025 (filename: &#039;学号_姓名_A1.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework1 提交名单|Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2025@163.com by 23:59 on March 18, 2025 (filename: &#039;学号_姓名_A2.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework2 提交名单|Homework2 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 3.pdf| Homework3]] Please submit to nm_nju_2025@163.com by 23:59 on April 1, 2025 (filename: &#039;学号_姓名_A3.pdf&#039;)&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 4.pdf| Homework4]] Please submit to nm_nju_2025@163.com by 23:59 on April 22, 2025 (filename: &#039;学号_姓名_A4.pdf&#039;)&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, contact the TAs promptly.&lt;br /&gt;
&lt;br /&gt;
# [[Media:计算方法1-2025.pdf|Course introduction; root finding]]&lt;br /&gt;
# [[Media:计算方法2-2025.pdf|Newton&#039;s method; interpolation; secret sharing; error-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-2025.pdf|Chebyshev interpolation and Chebyshev polynomials; norms]]&lt;br /&gt;
# [[Media:计算方法4-2025.pdf|Least squares; Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-2025.pdf|FFT; Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-2025.pdf|Operator norms, condition numbers, and iterative methods]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_4.pdf&amp;diff=13024</id>
		<title>File:Computational Method 2025 Assignments 4.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_4.pdf&amp;diff=13024"/>
		<updated>2025-04-02T02:40:27Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13001</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=13001"/>
		<updated>2025-03-19T03:12:34Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   = Timothy Sauer&amp;lt;br&amp;gt;数值分析 (Numerical Analysis), Chinese translation of the 2nd edition.&amp;lt;br&amp;gt;机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (to join, provide your name, major, and student ID)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 (Numerical Analysis), Chinese translation of the 2nd edition, by Timothy Sauer. 机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b, Laplacian Solver and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have trouble obtaining the textbooks, contact the TAs. (English editions only.)&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches but your solution must be written by you and you only. You should acknowledge everyone whom you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so your solutions are not publicly visible. Many popular version control systems provide free repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late submission requests ONLY IF you make the request ahead of time.&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 1.pdf| Homework1]] Please submit to nm_nju_2025@163.com by 23:59 on March 4, 2025 (filename: &#039;学号_姓名_A1.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework1 提交名单|Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2025@163.com by 23:59 on March 18, 2025 (filename: &#039;学号_姓名_A2.pdf&#039;) [[计算方法 Numerical method (Spring 2025)/Homework2 提交名单|Homework2 submission list]]&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, please contact the TAs promptly.&lt;br /&gt;
&lt;br /&gt;
# [[Media:计算方法1-2025.pdf|Course introduction; root finding]]&lt;br /&gt;
# [[Media:计算方法2-2025.pdf|Newton&#039;s method; interpolation; secret sharing; self-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-2025.pdf|Chebyshev interpolation and Chebyshev polynomials; norms]]&lt;br /&gt;
# [[Media:计算方法4-2025.pdf|Least squares; Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-2025.pdf|FFT; Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-2025.pdf|Operator norms, condition numbers, and iterative methods]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)/Homework2_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=13000</id>
		<title>计算方法 Numerical method (Spring 2025)/Homework2 提交名单</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)/Homework2_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=13000"/>
		<updated>2025-03-19T03:12:25Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Created page with &amp;quot; 如有错漏请邮件联系助教. &amp;lt;center&amp;gt; {| class=&amp;quot;wikitable&amp;quot; |- ! 学号 !! 姓名 |- | 221220090 || 周思桥  |- | 221240002 || 季悦宁  |- | 221240040 || 郑雯琪  |- | 231098068 || 戎昱  |- | 231098091 || 刘棣文  |- | 231098166 || 陈展  |- | 231200035 || 葛翰飞  |- | 231220006 || 陆华均  |- | 231220036 || 周楚函  |- | 231220049 || 张泽宇  |- | 231220065 || 劳汉显  |- | 231220067 || 黄裕书琪  |- | 231220071 || 吴江涛  |- | 23122012...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt; If there are errors or omissions, please email the TAs.&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Student ID !! Name&lt;br /&gt;
|-&lt;br /&gt;
| 221220090 || 周思桥 &lt;br /&gt;
|-&lt;br /&gt;
| 221240002 || 季悦宁 &lt;br /&gt;
|-&lt;br /&gt;
| 221240040 || 郑雯琪 &lt;br /&gt;
|-&lt;br /&gt;
| 231098068 || 戎昱 &lt;br /&gt;
|-&lt;br /&gt;
| 231098091 || 刘棣文 &lt;br /&gt;
|-&lt;br /&gt;
| 231098166 || 陈展 &lt;br /&gt;
|-&lt;br /&gt;
| 231200035 || 葛翰飞 &lt;br /&gt;
|-&lt;br /&gt;
| 231220006 || 陆华均 &lt;br /&gt;
|-&lt;br /&gt;
| 231220036 || 周楚函 &lt;br /&gt;
|-&lt;br /&gt;
| 231220049 || 张泽宇 &lt;br /&gt;
|-&lt;br /&gt;
| 231220065 || 劳汉显 &lt;br /&gt;
|-&lt;br /&gt;
| 231220067 || 黄裕书琪 &lt;br /&gt;
|-&lt;br /&gt;
| 231220071 || 吴江涛 &lt;br /&gt;
|-&lt;br /&gt;
| 231220122 || 安琦煜 &lt;br /&gt;
|-&lt;br /&gt;
| 231220132 || 陈志远 &lt;br /&gt;
|-&lt;br /&gt;
| 231220166 || 苏易 &lt;br /&gt;
|-&lt;br /&gt;
| 231220171 || 刘正阳 &lt;br /&gt;
|-&lt;br /&gt;
| 231220176 || 罗皓然 &lt;br /&gt;
|-&lt;br /&gt;
| 231220179 || 徐钰炜 &lt;br /&gt;
|-&lt;br /&gt;
| 231230102 || 庄铸锴 &lt;br /&gt;
|-&lt;br /&gt;
| 231240002 || 余孟凡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240004 || 何梓杨 &lt;br /&gt;
|-&lt;br /&gt;
| 231240009 || 陈心怡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240011 || 陈力琰 &lt;br /&gt;
|-&lt;br /&gt;
| 231240013 || 南晨曦 &lt;br /&gt;
|-&lt;br /&gt;
| 231240016 || 徐冰冰 &lt;br /&gt;
|-&lt;br /&gt;
| 231240018 || 渠翔凯 &lt;br /&gt;
|-&lt;br /&gt;
| 231240027 || 彭浩楠 &lt;br /&gt;
|-&lt;br /&gt;
| 231240029 || 朱非凡 &lt;br /&gt;
|-&lt;br /&gt;
| 231240031 || 吴天祥 &lt;br /&gt;
|-&lt;br /&gt;
| 231240038 || 肖金宇 &lt;br /&gt;
|-&lt;br /&gt;
| 231240045 || 杨俊炜 &lt;br /&gt;
|-&lt;br /&gt;
| 231240047 || 姜淮仁 &lt;br /&gt;
|-&lt;br /&gt;
| 231240051 || 王昱霏 &lt;br /&gt;
|-&lt;br /&gt;
| 231240053 || 王艺文 &lt;br /&gt;
|-&lt;br /&gt;
| 231240056 || 靳濡搏 &lt;br /&gt;
|-&lt;br /&gt;
| 231300082 || 王鹭天 &lt;br /&gt;
|-&lt;br /&gt;
| 231502012 || 许立恒 &lt;br /&gt;
|-&lt;br /&gt;
| 231502013 || 卢林强 &lt;br /&gt;
|-&lt;br /&gt;
| 231502014 || 杨子烨 &lt;br /&gt;
|-&lt;br /&gt;
| 231502015 || 胡子豪 &lt;br /&gt;
|-&lt;br /&gt;
| 231502016 || 綦浩量 &lt;br /&gt;
|-&lt;br /&gt;
| 231830135 || 周林辉 &lt;br /&gt;
|-&lt;br /&gt;
| 231840160 || 温昊臻 &lt;br /&gt;
|-&lt;br /&gt;
| 231870073 || 朱伟鹏 &lt;br /&gt;
|-&lt;br /&gt;
| 231870127 || 李熠城 &lt;br /&gt;
|-&lt;br /&gt;
| 231880140 || 桂天麟 &lt;br /&gt;
|-&lt;br /&gt;
| 231880394 || 翟笑晨 &lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
48 students in total&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12924</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12924"/>
		<updated>2025-03-05T02:58:38Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;数值分析 （Numerical Analysis）（原书第2版）.&amp;lt;br&amp;gt; 机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系 516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (provide your name, major, and student ID when joining)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 （Numerical Analysis）（原书第2版）. Timothy Sauer.   机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b: Laplacian Solvers and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, contact the TAs (English editions only).&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches, but your solution must be written by you and you alone. You should acknowledge everyone you have worked with and anyone who has given you significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so that your solutions are not publicly visible. Many popular version control systems provide free private repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late submission requests ONLY IF you make such requests ahead of time.&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 1.pdf| Homework1]] Please submit to nm_nju_2025@163.com by 23:59 on March 4, 2025 (file name: &#039;学号_姓名_A1.pdf&#039;)&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2025@163.com by 23:59 on March 18, 2025 (file name: &#039;学号_姓名_A2.pdf&#039;)&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, please contact the TAs promptly.&lt;br /&gt;
&lt;br /&gt;
# [[Media:计算方法1-2025.pdf|Course introduction; root finding]]&lt;br /&gt;
# [[Media:计算方法2-2025.pdf|Newton&#039;s method; interpolation; secret sharing; self-correcting codes]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_2.pdf&amp;diff=12923</id>
		<title>File:Computational Method 2025 Assignments 2.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_2.pdf&amp;diff=12923"/>
		<updated>2025-03-05T02:54:06Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_1.pdf&amp;diff=12894</id>
		<title>File:Computational Method 2025 Assignments 1.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_1.pdf&amp;diff=12894"/>
		<updated>2025-02-19T12:14:50Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks reverted File:Computational Method 2025 Assignments 1.pdf to an old version&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Computational Method 2025 Assignments 1&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_1.pdf&amp;diff=12893</id>
		<title>File:Computational Method 2025 Assignments 1.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_1.pdf&amp;diff=12893"/>
		<updated>2025-02-19T12:12:37Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks reverted File:Computational Method 2025 Assignments 1.pdf to an old version&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Computational Method 2025 Assignments 1&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_1.pdf&amp;diff=12892</id>
		<title>File:Computational Method 2025 Assignments 1.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2025_Assignments_1.pdf&amp;diff=12892"/>
		<updated>2025-02-19T12:11:55Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks uploaded a new version of File:Computational Method 2025 Assignments 1.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Computational Method 2025 Assignments 1&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12891</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12891"/>
		<updated>2025-02-19T12:06:05Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;数值分析 （Numerical Analysis）（原书第2版）.&amp;lt;br&amp;gt; 机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系 516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (provide your name, major, and student ID when joining)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 （Numerical Analysis）（原书第2版）. Timothy Sauer.   机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b: Laplacian Solvers and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, contact the TAs (English editions only).&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches, but your solution must be written by you and you alone. You should acknowledge everyone you have worked with and anyone who has given you significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so that your solutions are not publicly visible. Many popular version control systems provide free private repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late submission requests ONLY IF you make such requests ahead of time.&lt;br /&gt;
# [[Media:Computational Method 2025 Assignments 1.pdf| Homework1]] Please submit to nm_nju_2025@163.com by 23:59 on March 4, 2025 (file name: &#039;学号_姓名_A1.pdf&#039;)&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, please contact the TAs promptly.&lt;br /&gt;
&lt;br /&gt;
* [[Media:计算方法1-2025.pdf|Lecture 1]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12862</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12862"/>
		<updated>2025-02-17T02:25:27Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: change classroom&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-405&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;数值分析 （Numerical Analysis）（原书第2版）.&amp;lt;br&amp;gt; 机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-405&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系 516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (provide your name, major, and student ID when joining)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 （Numerical Analysis）（原书第2版）. Timothy Sauer.   机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b: Laplacian Solvers and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, contact the TAs (English editions only).&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches, but your solution must be written by you and you alone. You should acknowledge everyone you have worked with and anyone who has given you significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so that your solutions are not publicly visible. Many popular version control systems provide free private repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, please contact the TAs promptly.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12858</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12858"/>
		<updated>2025-02-16T12:20:20Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: change email address&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-112&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;数值分析 （Numerical Analysis）（原书第2版）.&amp;lt;br&amp;gt; 机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {houzhe, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2025@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-112&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系 516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (provide your name, major, and student ID when joining)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 （Numerical Analysis）（原书第2版）. Timothy Sauer.   机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b: Laplacian Solvers and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, contact the TAs (English editions only).&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches, but your solution must be written by you and you alone. You should acknowledge everyone you have worked with and anyone who has given you significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so that your solutions are not publicly visible. Many popular version control systems provide free private repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, please contact the TAs promptly.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=Main_Page&amp;diff=12853</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=Main_Page&amp;diff=12853"/>
		<updated>2025-02-16T07:32:50Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Home Pages for Courses and Seminars */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a course/seminar wiki run by the [http://tcs.nju.edu.cn theory group] in the Department of Computer Science and Technology at Nanjing University.&lt;br /&gt;
&lt;br /&gt;
== Home Pages for Courses and Seminars==&lt;br /&gt;
*[[计算方法 Numerical method (Spring 2025)|计算方法 Numerical Method (Spring 2025)]]&lt;br /&gt;
*[[高级算法 (Fall 2024)|高级算法 Advanced Algorithms (Fall 2024)]]&lt;br /&gt;
&lt;br /&gt;
*[[高级算法 (Spring 2025)|高级算法 Advanced Algorithms (Spring 2025 苏州校区)]]&lt;br /&gt;
&lt;br /&gt;
* [[概率论与数理统计 (Spring 2025)| 概率论与数理统计 Probability Theory (Spring 2025)]]&lt;br /&gt;
&lt;br /&gt;
*[[Theory Seminar|理论计算机科学讨论班]]&lt;br /&gt;
&lt;br /&gt;
*[[Study Group|理论计算机科学学习小组]]&lt;br /&gt;
&lt;br /&gt;
;Past courses&lt;br /&gt;
&lt;br /&gt;
* Advanced Algorithms: [[高级算法 (Fall 2024)|Fall 2024]], [[高级算法 (Fall 2023)|Fall 2023]], [[高级算法 (Fall 2022)|Fall 2022]], [[高级算法 (Fall 2021)|Fall 2021]], [[高级算法 (Fall 2020)|Fall 2020]], [[高级算法 (Fall 2019)|Fall 2019]], [[高级算法 (Fall 2018)|Fall 2018]], [[高级算法 (Fall 2017)|Fall 2017]], [[随机算法 \ 高级算法 (Fall 2016)|Fall 2016]].&lt;br /&gt;
&lt;br /&gt;
*Algorithm Design and Analysis: [https://tcs.nju.edu.cn/shili/courses/2024spring-algo/ Spring 2024]&lt;br /&gt;
&lt;br /&gt;
* Combinatorics: [[组合数学 (Spring 2024)|Spring 2024]], [[组合数学 (Spring 2023)|Spring 2023]], [[组合数学 (Fall 2019)|Fall 2019]], [[组合数学 (Fall 2017)|Fall 2017]], [[组合数学 (Fall 2016)|Fall 2016]], [[组合数学 (Fall 2015)|Fall 2015]], [[组合数学 (Spring 2014)|Spring 2014]], [[组合数学 (Spring 2013)|Spring 2013]], [[组合数学 (Fall 2011)|Fall 2011]], [[Combinatorics (Fall 2010)|Fall 2010]].&lt;br /&gt;
&lt;br /&gt;
* Computational Complexity: [[计算复杂性 (Spring 2025)|Spring 2025]], [[计算复杂性 (Spring 2024)|Spring 2024]], [[计算复杂性 (Spring 2023)|Spring 2023]], [[计算复杂性 (Fall 2019)|Fall 2019]], [[计算复杂性 (Fall 2018)|Fall 2018]].&lt;br /&gt;
&lt;br /&gt;
* Numerical Method: [[计算方法 Numerical method (Spring 2024)|Spring 2024]], [[计算方法 Numerical method (Spring 2023)|Spring 2023]], [https://liuexp.github.io/numerical.html Spring 2022].&lt;br /&gt;
&lt;br /&gt;
* Probability Theory: [[概率论与数理统计 (Spring 2024)|Spring 2024]], [[概率论与数理统计 (Spring 2023)|Spring 2023]].&lt;br /&gt;
&lt;br /&gt;
* Quantum Computation: [[量子计算 (Spring 2022)|Spring 2022]], [[量子计算 (Spring 2021)|Spring 2021]], [[量子计算 (Fall 2019)|Fall 2019]].&lt;br /&gt;
&lt;br /&gt;
* Randomized Algorithms:  [[随机算法 (Fall 2015)|Fall 2015]], [[随机算法 (Spring 2014)|Spring 2014]], [[随机算法 (Spring 2013)|Spring 2013]], [[随机算法 (Fall 2011)|Fall 2011]], [[Randomized Algorithms (Spring 2010)|Spring 2010]].&lt;br /&gt;
&lt;br /&gt;
;Past seminars, workshops and summer schools&lt;br /&gt;
*计算理论之美暑期学校: [[计算理论之美 (Summer 2024)|2024]], [[计算理论之美 (Summer 2023)|2023]], [[计算理论之美 (Summer 2021)|2021]]&lt;br /&gt;
*[[TCSPhD2020| 理论计算机科学优秀博士生论坛2020]]&lt;br /&gt;
*[[Quantum|量子算法与物理实现研讨会]]&lt;br /&gt;
*Nanjing Theory Day: [[Theory@Nanjing 2019|2019]], [[Theory@Nanjing 2018|2018]], [[Theory@Nanjing 2017|2017]]&lt;br /&gt;
*[[\Delta Seminar on Logic, Philosophy, and Computer Science|Δ Seminar on Logic, Philosophy, and Computer Science]]&lt;br /&gt;
*[[近似算法讨论班 (Fall 2011)|近似算法 Approximation Algorithms, Fall 2011.]]&lt;br /&gt;
&lt;br /&gt;
; Other links&lt;br /&gt;
* [[General Circulation(Fall 2024)|大气环流 General Circulation of the Atmosphere, Fall 2024]]&lt;br /&gt;
* [[General Circulation(Fall 2023)|大气环流 General Circulation of the Atmosphere, Fall 2023]]&lt;br /&gt;
&lt;br /&gt;
* [[概率论 (Summer 2014)| 概率与计算 (上海交大 Summer 2014)]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=Main_Page&amp;diff=12852</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=Main_Page&amp;diff=12852"/>
		<updated>2025-02-16T07:32:20Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: add numerical method in list /* Home Pages for Courses and Seminars */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a course/seminar wiki run by the [http://tcs.nju.edu.cn theory group] in the Department of Computer Science and Technology at Nanjing University.&lt;br /&gt;
&lt;br /&gt;
== Home Pages for Courses and Seminars==&lt;br /&gt;
*[[计算方法 Numerical method (Spring 2025)|计算方法 Numerical Method (Spring 2025)]]&lt;br /&gt;
*[[高级算法 (Fall 2024)|高级算法 Advanced Algorithms (Fall 2024)]]&lt;br /&gt;
&lt;br /&gt;
*[[高级算法 (Spring 2025)|高级算法 Advanced Algorithms (Spring 2025 苏州校区)]]&lt;br /&gt;
&lt;br /&gt;
* [[概率论与数理统计 (Spring 2025)| 概率论与数理统计 Probability Theory (Spring 2025)]]&lt;br /&gt;
&lt;br /&gt;
*[[Theory Seminar|理论计算机科学讨论班]]&lt;br /&gt;
&lt;br /&gt;
*[[Study Group|理论计算机科学学习小组]]&lt;br /&gt;
&lt;br /&gt;
;Past courses&lt;br /&gt;
&lt;br /&gt;
* Advanced Algorithms: [[高级算法 (Fall 2024)|Fall 2024]], [[高级算法 (Fall 2023)|Fall 2023]], [[高级算法 (Fall 2022)|Fall 2022]], [[高级算法 (Fall 2021)|Fall 2021]], [[高级算法 (Fall 2020)|Fall 2020]], [[高级算法 (Fall 2019)|Fall 2019]], [[高级算法 (Fall 2018)|Fall 2018]], [[高级算法 (Fall 2017)|Fall 2017]], [[随机算法 \ 高级算法 (Fall 2016)|Fall 2016]].&lt;br /&gt;
&lt;br /&gt;
*Algorithm Design and Analysis: [https://tcs.nju.edu.cn/shili/courses/2024spring-algo/ Spring 2024]&lt;br /&gt;
&lt;br /&gt;
* Combinatorics: [[组合数学 (Spring 2024)|Spring 2024]], [[组合数学 (Spring 2023)|Spring 2023]], [[组合数学 (Fall 2019)|Fall 2019]], [[组合数学 (Fall 2017)|Fall 2017]], [[组合数学 (Fall 2016)|Fall 2016]], [[组合数学 (Fall 2015)|Fall 2015]], [[组合数学 (Spring 2014)|Spring 2014]], [[组合数学 (Spring 2013)|Spring 2013]], [[组合数学 (Fall 2011)|Fall 2011]], [[Combinatorics (Fall 2010)|Fall 2010]].&lt;br /&gt;
&lt;br /&gt;
* Computational Complexity: [[计算复杂性 (Spring 2025)|Spring 2025]], [[计算复杂性 (Spring 2024)|Spring 2024]], [[计算复杂性 (Spring 2023)|Spring 2023]], [[计算复杂性 (Fall 2019)|Fall 2019]], [[计算复杂性 (Fall 2018)|Fall 2018]].&lt;br /&gt;
&lt;br /&gt;
* Numerical Method: [[计算方法 Numerical method (Spring 2025)|Spring 2025]], [[计算方法 Numerical method (Spring 2024)|Spring 2024]], [[计算方法 Numerical method (Spring 2023)|Spring 2023]], [https://liuexp.github.io/numerical.html Spring 2022].&lt;br /&gt;
&lt;br /&gt;
* Probability Theory: [[概率论与数理统计 (Spring 2024)|Spring 2024]], [[概率论与数理统计 (Spring 2023)|Spring 2023]].&lt;br /&gt;
&lt;br /&gt;
* Quantum Computation: [[量子计算 (Spring 2022)|Spring 2022]], [[量子计算 (Spring 2021)|Spring 2021]], [[量子计算 (Fall 2019)|Fall 2019]].&lt;br /&gt;
&lt;br /&gt;
* Randomized Algorithms:  [[随机算法 (Fall 2015)|Fall 2015]], [[随机算法 (Spring 2014)|Spring 2014]], [[随机算法 (Spring 2013)|Spring 2013]], [[随机算法 (Fall 2011)|Fall 2011]], [[Randomized Algorithms (Spring 2010)|Spring 2010]].&lt;br /&gt;
&lt;br /&gt;
;Past seminars, workshops and summer schools&lt;br /&gt;
*计算理论之美暑期学校: [[计算理论之美 (Summer 2024)|2024]], [[计算理论之美 (Summer 2023)|2023]], [[计算理论之美 (Summer 2021)|2021]]&lt;br /&gt;
*[[TCSPhD2020| 理论计算机科学优秀博士生论坛2020]]&lt;br /&gt;
*[[Quantum|量子算法与物理实现研讨会]]&lt;br /&gt;
*Nanjing Theory Day: [[Theory@Nanjing 2019|2019]], [[Theory@Nanjing 2018|2018]], [[Theory@Nanjing 2017|2017]]&lt;br /&gt;
*[[\Delta Seminar on Logic, Philosophy, and Computer Science|Δ Seminar on Logic, Philosophy, and Computer Science]]&lt;br /&gt;
*[[近似算法讨论班 (Fall 2011)|近似算法 Approximation Algorithms, Fall 2011.]]&lt;br /&gt;
&lt;br /&gt;
; Other links&lt;br /&gt;
* [[General Circulation(Fall 2024)|大气环流 General Circulation of the Atmosphere, Fall 2024]]&lt;br /&gt;
* [[General Circulation(Fall 2023)|大气环流 General Circulation of the Atmosphere, Fall 2023]]&lt;br /&gt;
&lt;br /&gt;
* [[概率论 (Summer 2014)| 概率与计算 (上海交大 Summer 2014)]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12850</id>
		<title>计算方法 Numerical method (Spring 2025)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2025)&amp;diff=12850"/>
		<updated>2025-02-16T07:30:12Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: create page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-112&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;数值分析 （Numerical Analysis）（原书第2版）.&amp;lt;br&amp;gt; 机械工业出版社.&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 侯哲，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 侯哲，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2024@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-112&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系 516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 1019649082 (provide your name, major, and student ID when joining)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*数值分析 （Numerical Analysis）（原书第2版）. Timothy Sauer.   机械工业出版社.&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b: Laplacian Solvers and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, contact the TAs (English editions only).&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches, but your solution must be written by you and you alone. You should acknowledge everyone you have worked with and anyone who has given you significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source control system to store your solutions electronically, you must ensure your account is configured so that your solutions are not publicly visible. Many popular version control systems provide free private repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have trouble downloading the lecture slides, please contact the TAs promptly.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=Main_Page&amp;diff=12848</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=Main_Page&amp;diff=12848"/>
		<updated>2025-02-16T07:27:23Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: add numerical method 2025 /* Home Pages for Courses and Seminars */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a course/seminar wiki run by the [http://tcs.nju.edu.cn theory group] in the Department of Computer Science and Technology at Nanjing University.&lt;br /&gt;
&lt;br /&gt;
== Home Pages for Courses and Seminars==&lt;br /&gt;
*[[高级算法 (Fall 2024)|高级算法 Advanced Algorithms (Fall 2024)]]&lt;br /&gt;
&lt;br /&gt;
*[[高级算法 (Spring 2025)|高级算法 Advanced Algorithms (Spring 2025 苏州校区)]]&lt;br /&gt;
&lt;br /&gt;
* [[概率论与数理统计 (Spring 2025) | 概率论与数理统计 Probability Theory (Spring 2025)]]&lt;br /&gt;
&lt;br /&gt;
*[[Theory Seminar|理论计算机科学讨论班]]&lt;br /&gt;
&lt;br /&gt;
*[[Study Group|理论计算机科学学习小组]]&lt;br /&gt;
&lt;br /&gt;
;Past courses&lt;br /&gt;
&lt;br /&gt;
* Advanced Algorithms: [[高级算法 (Fall 2024)|Fall 2024]], [[高级算法 (Fall 2023)|Fall 2023]], [[高级算法 (Fall 2022)|Fall 2022]], [[高级算法 (Fall 2021)|Fall 2021]], [[高级算法 (Fall 2020)|Fall 2020]], [[高级算法 (Fall 2019)|Fall 2019]], [[高级算法 (Fall 2018)|Fall 2018]], [[高级算法 (Fall 2017)|Fall 2017]], [[随机算法 \ 高级算法 (Fall 2016)|Fall 2016]].&lt;br /&gt;
&lt;br /&gt;
*Algorithm Design and Analysis: [https://tcs.nju.edu.cn/shili/courses/2024spring-algo/ Spring 2024]&lt;br /&gt;
&lt;br /&gt;
* Combinatorics: [[组合数学 (Spring 2024)|Spring 2024]], [[组合数学 (Spring 2023)|Spring 2023]], [[组合数学 (Fall 2019)|Fall 2019]], [[组合数学 (Fall 2017)|Fall 2017]], [[组合数学 (Fall 2016)|Fall 2016]], [[组合数学 (Fall 2015)|Fall 2015]], [[组合数学 (Spring 2014)|Spring 2014]], [[组合数学 (Spring 2013)|Spring 2013]], [[组合数学 (Fall 2011)|Fall 2011]], [[Combinatorics (Fall 2010)|Fall 2010]].&lt;br /&gt;
&lt;br /&gt;
* Computational Complexity: [[计算复杂性 (Spring 2025)|Spring 2025]], [[计算复杂性 (Spring 2024)|Spring 2024]], [[计算复杂性 (Spring 2023)|Spring 2023]], [[计算复杂性 (Fall 2019)|Fall 2019]], [[计算复杂性 (Fall 2018)|Fall 2018]].&lt;br /&gt;
&lt;br /&gt;
* Numerical Method: [[计算方法 Numerical method (Spring 2025)|Spring 2025]], [[计算方法 Numerical method (Spring 2024)|Spring 2024]], [[计算方法 Numerical method (Spring 2023)|Spring 2023]], [https://liuexp.github.io/numerical.html Spring 2022].&lt;br /&gt;
&lt;br /&gt;
* Probability Theory: [[概率论与数理统计 (Spring 2024)|Spring 2024]], [[概率论与数理统计 (Spring 2023)|Spring 2023]].&lt;br /&gt;
&lt;br /&gt;
* Quantum Computation: [[量子计算 (Spring 2022)|Spring 2022]], [[量子计算 (Spring 2021)|Spring 2021]], [[量子计算 (Fall 2019)|Fall 2019]].&lt;br /&gt;
&lt;br /&gt;
* Randomized Algorithms:  [[随机算法 (Fall 2015)|Fall 2015]], [[随机算法 (Spring 2014)|Spring 2014]], [[随机算法 (Spring 2013)|Spring 2013]], [[随机算法 (Fall 2011)|Fall 2011]], [[Randomized Algorithms (Spring 2010)|Spring 2010]].&lt;br /&gt;
&lt;br /&gt;
;Past seminars, workshops and summer schools&lt;br /&gt;
*计算理论之美暑期学校: [[计算理论之美 (Summer 2024)|2024]], [[计算理论之美 (Summer 2023)|2023]], [[计算理论之美 (Summer 2021)|2021]]&lt;br /&gt;
*[[TCSPhD2020| 理论计算机科学优秀博士生论坛2020]]&lt;br /&gt;
*[[Quantum|量子算法与物理实现研讨会]]&lt;br /&gt;
*Nanjing Theory Day: [[Theory@Nanjing 2019|2019]], [[Theory@Nanjing 2018|2018]], [[Theory@Nanjing 2017|2017]]&lt;br /&gt;
*[[\Delta Seminar on Logic, Philosophy, and Computer Science|Δ Seminar on Logic, Philosophy, and Computer Science]]&lt;br /&gt;
*[[近似算法讨论班 (Fall 2011)|近似算法 Approximation Algorithms, Fall 2011.]]&lt;br /&gt;
&lt;br /&gt;
; Other links&lt;br /&gt;
* [[General Circulation(Fall 2024)|大气环流 General Circulation of the Atmosphere, Fall 2024]]&lt;br /&gt;
* [[General Circulation(Fall 2023)|大气环流 General Circulation of the Atmosphere, Fall 2023]]&lt;br /&gt;
&lt;br /&gt;
* [[概率论 (Summer 2014)| 概率与计算 (上海交大 Summer 2014)]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Advanced_algorithm_2024_Fall_take_home_final.pdf&amp;diff=12826</id>
		<title>File:Advanced algorithm 2024 Fall take home final.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Advanced_algorithm_2024_Fall_take_home_final.pdf&amp;diff=12826"/>
		<updated>2024-12-26T14:36:04Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks uploaded a new version of File:Advanced algorithm 2024 Fall take home final.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Min_Cut,_Max_Cut,_and_Spectral_Cut&amp;diff=12639</id>
		<title>高级算法 (Fall 2024)/Min Cut, Max Cut, and Spectral Cut</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Min_Cut,_Max_Cut,_and_Spectral_Cut&amp;diff=12639"/>
		<updated>2024-10-04T07:49:58Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Graph Cut */  Modify the definition of graph cut&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Graph Cut =&lt;br /&gt;
Let &amp;lt;math&amp;gt;G(V, E)&amp;lt;/math&amp;gt; be an undirected graph.&lt;br /&gt;
Let &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; be a &#039;&#039;&#039;bipartition&#039;&#039;&#039; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into nonempty subsets &amp;lt;math&amp;gt;S,T\subseteq V&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;S\cap T=\emptyset&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;S\cup T=V&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
A cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is defined by a bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; as&lt;br /&gt;
:&amp;lt;math&amp;gt;C=E(S,T)\,&amp;lt;/math&amp;gt;,&lt;br /&gt;
where &amp;lt;math&amp;gt;E(S,T)&amp;lt;/math&amp;gt; denotes the set of &amp;quot;crossing edges&amp;quot; with one endpoint in each of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;, formally defined as&lt;br /&gt;
:&amp;lt;math&amp;gt;E(S,T)=\{uv\in E\mid u\in S, v\in T\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Given a graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, there might be many cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, and we are interested in finding the &#039;&#039;&#039;minimum&#039;&#039;&#039; or &#039;&#039;&#039;maximum&#039;&#039;&#039; cut.&lt;br /&gt;
&lt;br /&gt;
= Min-Cut =&lt;br /&gt;
The &#039;&#039;&#039;min-cut problem&#039;&#039;&#039;, also called the &#039;&#039;&#039;global minimum cut problem&#039;&#039;&#039;, is defined as follows.&lt;br /&gt;
{{Theorem|Min-cut problem|&lt;br /&gt;
*&#039;&#039;&#039;Input&#039;&#039;&#039;: an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output&#039;&#039;&#039;: a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; with the smallest size &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Equivalently, the problem asks to find a bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into disjoint non-empty subsets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that minimizes &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We consider the problem in a slightly more general setting, where the input graphs &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; can be &#039;&#039;&#039;multi-graphs&#039;&#039;&#039;, meaning that there can be multiple &#039;&#039;&#039;parallel edges&#039;&#039;&#039; between two vertices &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;. Cuts in multi-graphs are defined in the same way as before, and the cost of a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is given by the total number of edges (including parallel edges) in &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;. Equivalently, one may think of a multi-graph as a graph with integer edge weights, in which case the cost of a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is the total weight of the edges in &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A canonical deterministic algorithm for this problem goes through the [http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem max-flow min-cut theorem]. The max-flow algorithm finds a minimum &#039;&#039;&#039;&amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt;-&amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; cut&#039;&#039;&#039;, which disconnects a &#039;&#039;&#039;source&#039;&#039;&#039; &amp;lt;math&amp;gt;s\in V&amp;lt;/math&amp;gt; from a &#039;&#039;&#039;sink&#039;&#039;&#039; &amp;lt;math&amp;gt;t\in V&amp;lt;/math&amp;gt;, both specified as part of the input. A global min cut can be found by exhaustively computing the minimum &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt;-&amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; cut for an arbitrarily fixed source &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; and every possible sink &amp;lt;math&amp;gt;t\neq s&amp;lt;/math&amp;gt;. This takes &amp;lt;math&amp;gt;(n-1)\times&amp;lt;/math&amp;gt;max-flow time, where &amp;lt;math&amp;gt;n=|V|&amp;lt;/math&amp;gt; is the number of vertices.&lt;br /&gt;
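&lt;br /&gt;
As a concrete illustration (our sketch, not part of the original notes), this reduction is a few lines of Python on top of an off-the-shelf max-flow routine; we assume the networkx library, whose flow functions read a &#039;capacity&#039; attribute on each edge, and use unit capacities for an unweighted graph.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import networkx as nx&lt;br /&gt;
&lt;br /&gt;
def global_min_cut_value(G):&lt;br /&gt;
    # Fix an arbitrary source s and take the best s-t min cut over all sinks t.&lt;br /&gt;
    s = next(iter(G))&lt;br /&gt;
    return min(nx.minimum_cut_value(G, s, t) for t in G if t != s)&lt;br /&gt;
&lt;br /&gt;
G = nx.cycle_graph(4)                          # every min cut of a cycle has size 2&lt;br /&gt;
nx.set_edge_attributes(G, 1, name=&#039;capacity&#039;)  # unit capacities = unweighted edges&lt;br /&gt;
print(global_min_cut_value(G))                 # 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;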
&lt;br /&gt;
The fastest known deterministic algorithm for the minimum cut problem on multi-graphs is the [https://en.wikipedia.org/wiki/Stoer–Wagner_algorithm Stoer–Wagner algorithm], which achieves an &amp;lt;math&amp;gt;O(mn+n^2\log n)&amp;lt;/math&amp;gt; time complexity, where &amp;lt;math&amp;gt;m=|E|&amp;lt;/math&amp;gt; is the total number of edges (counting the parallel edges).&lt;br /&gt;
&lt;br /&gt;
If we restrict the input to &#039;&#039;&#039;simple graphs&#039;&#039;&#039; (meaning there are no parallel edges) with no edge weights, there are better algorithms. A deterministic algorithm of [https://dl.acm.org/citation.cfm?id=2746588 Ken-ichi Kawarabayashi and Mikkel Thorup], published in STOC 2015, achieves a near-linear (in the number of edges) time complexity.&lt;br /&gt;
&lt;br /&gt;
== Karger&#039;s &#039;&#039;Contraction&#039;&#039; algorithm ==&lt;br /&gt;
We will describe a simple and elegant randomized algorithm for the min-cut problem. The algorithm is due to [http://people.csail.mit.edu/karger/ David Karger].&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;G(V, E)&amp;lt;/math&amp;gt; be a &#039;&#039;&#039;multi-graph&#039;&#039;&#039;, which allows multiple &#039;&#039;&#039;parallel edges&#039;&#039;&#039; between two distinct vertices &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; but does not allow any &#039;&#039;&#039;self-loops&#039;&#039;&#039;: edges that adjoin a vertex to itself. A multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; can be represented by an adjacency matrix &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; in which each non-diagonal entry &amp;lt;math&amp;gt;A(u,v)&amp;lt;/math&amp;gt; takes nonnegative integer values instead of just 0 or 1, representing the number of parallel edges between &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, and all diagonal entries &amp;lt;math&amp;gt;A(v,v)=0&amp;lt;/math&amp;gt; (since there are no self-loops).&lt;br /&gt;
&lt;br /&gt;
Given a multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt; and an edge &amp;lt;math&amp;gt;e\in E&amp;lt;/math&amp;gt;, we define the following &#039;&#039;&#039;contraction&#039;&#039;&#039; operator Contract(&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;e&amp;lt;/math&amp;gt;), which transforms &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; into a new multi-graph.&lt;br /&gt;
{{Theorem|The contraction operator &#039;&#039;Contract&#039;&#039;(&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;e&amp;lt;/math&amp;gt;)|&lt;br /&gt;
:say &amp;lt;math&amp;gt;e=uv&amp;lt;/math&amp;gt;:&lt;br /&gt;
:*replace &amp;lt;math&amp;gt;\{u,v\}&amp;lt;/math&amp;gt; by a new vertex &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;;&lt;br /&gt;
:*for every edge (parallel or not) of the form &amp;lt;math&amp;gt;uw&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;vw&amp;lt;/math&amp;gt; that connects one of &amp;lt;math&amp;gt;\{u,v\}&amp;lt;/math&amp;gt; to a vertex &amp;lt;math&amp;gt;w\in V\setminus\{u,v\}&amp;lt;/math&amp;gt;, replace it by a new edge &amp;lt;math&amp;gt;xw&amp;lt;/math&amp;gt;;&lt;br /&gt;
:*the rest of the graph does not change.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
In other words, &amp;lt;math&amp;gt;Contract(G,uv)&amp;lt;/math&amp;gt; merges the two vertices &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; into a new vertex &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, whose incident edges are precisely the edges incident to &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in the original graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, except for the parallel edges between &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; themselves, which are removed. Now you should see why we consider multi-graphs instead of simple graphs: even if we start with a simple graph without parallel edges, the contraction operator may create parallel edges.&lt;br /&gt;
&lt;br /&gt;
The contraction operator is illustrated by the following picture:&lt;br /&gt;
[[Image:Contract.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
Karger&#039;s algorithm uses a simple idea: &lt;br /&gt;
*At each step we randomly select an edge in the current multi-graph to contract until there are only two vertices left. &lt;br /&gt;
*The parallel edges between these two remaining vertices must be a cut of the original graph. &lt;br /&gt;
*We return this cut and hope that with good chance this gives us a minimum cut.&lt;br /&gt;
The following is the pseudocode for Karger&#039;s algorithm.&lt;br /&gt;
{{Theorem|&#039;&#039;RandomContract&#039;&#039; (Karger 1993)|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:while &amp;lt;math&amp;gt;|V|&amp;gt;2&amp;lt;/math&amp;gt; do&lt;br /&gt;
:* choose an edge &amp;lt;math&amp;gt;uv\in E&amp;lt;/math&amp;gt; uniformly at random;&lt;br /&gt;
:* &amp;lt;math&amp;gt;G=Contract(G,uv)&amp;lt;/math&amp;gt;; &lt;br /&gt;
:return &amp;lt;math&amp;gt;C=E&amp;lt;/math&amp;gt; (the parallel edges between the only two vertices in &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt;);&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Another way of looking at the contraction operator Contract(&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;e&amp;lt;/math&amp;gt;) is that we are dealing with classes of vertices. Let &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt; be the set of all vertices. We start with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertex classes &amp;lt;math&amp;gt;S_1,S_2,\ldots, S_n&amp;lt;/math&amp;gt;, with each class &amp;lt;math&amp;gt;S_i=\{v_i\}&amp;lt;/math&amp;gt; containing a single vertex. By calling &amp;lt;math&amp;gt;Contract(G,uv)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;u\in S_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v\in S_j&amp;lt;/math&amp;gt; for distinct &amp;lt;math&amp;gt;i\neq j&amp;lt;/math&amp;gt;, we take the union of &amp;lt;math&amp;gt;S_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;S_j&amp;lt;/math&amp;gt;. The edges in the contracted multi-graph are exactly the edges that cross between different vertex classes.&lt;br /&gt;
&lt;br /&gt;
This view of contraction is illustrated by the following picture:&lt;br /&gt;
[[Image:Contract_class.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
The following claim is left as an exercise for the class:&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*With suitable choice of data structures, each operation &amp;lt;math&amp;gt;Contract(G,e)&amp;lt;/math&amp;gt; can be implemented within running time &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;n=|V|&amp;lt;/math&amp;gt; is the number of vertices.&lt;br /&gt;
|}&lt;br /&gt;
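&lt;br /&gt;
To make the vertex-class view concrete, the following is a minimal Python sketch of &#039;&#039;RandomContract&#039;&#039; (the function name &#039;&#039;random_contract&#039;&#039; and the edge-list representation are our own illustration, not part of the original algorithm statement). It maintains the vertex classes with a union-find structure and keeps the list of edges that still cross two distinct classes. Note that, for simplicity, this sketch filters the edge list after each contraction, which costs &amp;lt;math&amp;gt;O(m)&amp;lt;/math&amp;gt; per contraction rather than the &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; claimed above.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import random&lt;br /&gt;
&lt;br /&gt;
# One run of Karger&#039;s RandomContract on a connected multigraph.&lt;br /&gt;
# edges: list of (u, v) pairs over vertices 0..n-1; parallel edges&lt;br /&gt;
# are simply repeated pairs in the list.&lt;br /&gt;
def random_contract(edges, n):&lt;br /&gt;
    parent = list(range(n))      # union-find over vertex classes&lt;br /&gt;
&lt;br /&gt;
    def find(x):                 # find with path halving&lt;br /&gt;
        while parent[x] != x:&lt;br /&gt;
            parent[x] = parent[parent[x]]&lt;br /&gt;
            x = parent[x]&lt;br /&gt;
        return x&lt;br /&gt;
&lt;br /&gt;
    classes = n&lt;br /&gt;
    alive = edges[:]             # edges still crossing two classes&lt;br /&gt;
    while classes &amp;gt; 2:&lt;br /&gt;
        u, v = random.choice(alive)    # uniform over surviving edges&lt;br /&gt;
        parent[find(u)] = find(v)      # Contract(G, uv)&lt;br /&gt;
        classes -= 1&lt;br /&gt;
        # edges inside the merged class become self-loops: remove them&lt;br /&gt;
        alive = [(a, b) for (a, b) in alive if find(a) != find(b)]&lt;br /&gt;
    return alive    # the parallel edges between the two remaining classes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;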
&lt;br /&gt;
In the above &#039;&#039;&#039;&#039;&#039;RandomContract&#039;&#039;&#039;&#039;&#039; algorithm, there are precisely &amp;lt;math&amp;gt;n-2&amp;lt;/math&amp;gt; contractions. Therefore, we have the following upper bound on the running time.&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
: For any multigraph with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the running time of the &#039;&#039;&#039;&#039;&#039;RandomContract&#039;&#039;&#039;&#039;&#039; algorithm is &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
We emphasize that this is the time complexity of a &amp;quot;single run&amp;quot; of the algorithm: later we will see that we may need to run this algorithm many times to guarantee a desirable accuracy.&lt;br /&gt;
&lt;br /&gt;
== Analysis of accuracy ==&lt;br /&gt;
We now analyze the performance of the above algorithm. Since the algorithm is &#039;&#039;&#039;&#039;&#039;randomized&#039;&#039;&#039;&#039;&#039;, its output cut is a random variable even when the input is fixed, so &#039;&#039;the output may not always be correct&#039;&#039;. We want to give a theoretical guarantee of the chance that the algorithm returns a correct answer on an arbitrary input.&lt;br /&gt;
&lt;br /&gt;
More precisely, on an arbitrarily fixed input multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, we want to answer the following question rigorously:&lt;br /&gt;
:&amp;lt;math&amp;gt;p_{\text{correct}}=\Pr[\,\text{a minimum cut is returned by }RandomContract\,]\ge ?&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To answer this question, we prove a stronger statement: for arbitrarily fixed input multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and a particular minimum cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, &lt;br /&gt;
:&amp;lt;math&amp;gt;p_{C}=\Pr[\,C\mbox{ is returned by }RandomContract\,]\ge ?&amp;lt;/math&amp;gt;&lt;br /&gt;
Obviously this will imply the previous lower bound for &amp;lt;math&amp;gt;p_{\text{correct}}&amp;lt;/math&amp;gt; because the event in &amp;lt;math&amp;gt;p_{C}&amp;lt;/math&amp;gt; implies the event in &amp;lt;math&amp;gt;p_{\text{correct}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*In the above argument we use the simple law of probability that &amp;lt;math&amp;gt;\Pr[A]\le \Pr[B]&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;A\subseteq B&amp;lt;/math&amp;gt;, i.e. event &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; implies event &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
We introduce the following notations:&lt;br /&gt;
*Let &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt; denote the sequence of random edges chosen to contract in a run of the &#039;&#039;RandomContract&#039;&#039; algorithm. &lt;br /&gt;
*Let &amp;lt;math&amp;gt;G_1=G&amp;lt;/math&amp;gt; denote the original input multi-graph. And for &amp;lt;math&amp;gt;i=1,2,\ldots,n-2&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;G_{i+1}=Contract(G_{i},e_i)&amp;lt;/math&amp;gt; be the multigraph after the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th contraction.&lt;br /&gt;
Obviously &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt; are random variables, and they are the &#039;&#039;only&#039;&#039; random choices used in the algorithm: they, along with the input &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, uniquely determine the sequence of multi-graphs &amp;lt;math&amp;gt;G_1,G_2,\ldots,G_{n-1}&amp;lt;/math&amp;gt; in every iteration as well as the final output. &lt;br /&gt;
&lt;br /&gt;
We now compute the probability &amp;lt;math&amp;gt;p_C&amp;lt;/math&amp;gt; by decomposing it into more elementary events involving &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt;. This is due to the following proposition.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Proposition 1|&lt;br /&gt;
:If &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a minimum cut in a multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e\not\in C&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is still a minimum cut in the contracted graph &amp;lt;math&amp;gt;G&#039;=Contract(G,e)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
We first observe that contraction will never create new cuts: every cut in the contracted graph &amp;lt;math&amp;gt;G&#039;&amp;lt;/math&amp;gt; must also be a cut in the original graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We then observe that a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; &amp;quot;survives&amp;quot; in the contracted graph &amp;lt;math&amp;gt;G&#039;&amp;lt;/math&amp;gt; if and only if the contracted edge &amp;lt;math&amp;gt;e\not\in C&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Both observations are easy to verify by the definition of contraction operator (in particular, easier to verify if we take the vertex class interpretation). The detailed proofs are left as an exercise.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt; denote the sequence of random edges chosen to contract in a run of the &#039;&#039;RandomContract&#039;&#039; algorithm.&lt;br /&gt;
&lt;br /&gt;
By Proposition 1, the event &amp;lt;math&amp;gt;\mbox{``}C\mbox{ is returned by }RandomContract\mbox{&#039;&#039;}\,&amp;lt;/math&amp;gt; is equivalent to the event &amp;lt;math&amp;gt;\mbox{``}e_i\not\in C\mbox{ for all }i=1,2,\ldots,n-2\mbox{&#039;&#039;}&amp;lt;/math&amp;gt;. Therefore:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
p_C &lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,C\mbox{ is returned by }{RandomContract}\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,e_i\not\in C\mbox{ for all }i=1,2,\ldots,n-2\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{i=1}^{n-2}\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C].&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The last equation is due to the so-called &#039;&#039;&#039;chain rule&#039;&#039;&#039; in probability.&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*The &#039;&#039;&#039;chain rule&#039;&#039;&#039;, also known as the &#039;&#039;&#039;law of progressive conditioning&#039;&#039;&#039;, is the following proposition: for a sequence of events (not necessarily independent) &amp;lt;math&amp;gt;A_1,A_2,\ldots,A_n&amp;lt;/math&amp;gt;,&lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[\forall i, A_i]=\prod_{i=1}^n\Pr[A_i\mid \forall j&amp;lt;i, A_j]&amp;lt;/math&amp;gt;.&lt;br /&gt;
:It is a simple consequence of the definition of conditional probability. By definition of conditional probability, &lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[A_n\mid \forall j&amp;lt;n, A_j]=\frac{\Pr[\forall i, A_i]}{\Pr[\forall j&amp;lt;n, A_j]}&amp;lt;/math&amp;gt;, &lt;br /&gt;
:and equivalently we have&lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[\forall i, A_i]=\Pr[\forall j&amp;lt;n, A_j]\cdot\Pr[A_n\mid \forall j&amp;lt;n, A_j]&amp;lt;/math&amp;gt;.&lt;br /&gt;
:Recursively applying this to &amp;lt;math&amp;gt;\Pr[\forall j&amp;lt;n, A_j]&amp;lt;/math&amp;gt;, we obtain the chain rule.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Back to the analysis of probability &amp;lt;math&amp;gt;p_C&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Now our task is to give a lower bound for each &amp;lt;math&amp;gt;p_i=\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]&amp;lt;/math&amp;gt;. The condition &amp;lt;math&amp;gt;\mbox{``}\forall j&amp;lt;i, e_j\not\in C\mbox{&#039;&#039;}&amp;lt;/math&amp;gt; means that the min-cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; survives the first &amp;lt;math&amp;gt;i-1&amp;lt;/math&amp;gt; contractions &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{i-1}&amp;lt;/math&amp;gt;, which due to Proposition 1 means that &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is also a min-cut in the multi-graph &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt; obtained from applying the first &amp;lt;math&amp;gt;(i-1)&amp;lt;/math&amp;gt; contractions.&lt;br /&gt;
&lt;br /&gt;
Then the conditional probability &amp;lt;math&amp;gt;p_i=\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]&amp;lt;/math&amp;gt; is the probability that no edge in &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is hit when a uniformly random edge in the current multi-graph is chosen, given that &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a minimum cut in the current multi-graph. Intuitively this probability should be bounded away from zero, because as a min-cut, &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; should be sparse among all edges. This intuition is justified by the following proposition.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Proposition 2|&lt;br /&gt;
:If &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a min-cut in a multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;|E|\ge \frac{|V||C|}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| &lt;br /&gt;
:It must hold that the degree of each vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt; is at least &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt;, or otherwise the set of edges incident to &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; forms a cut of size smaller than &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt; which separates &amp;lt;math&amp;gt;\{v\}&amp;lt;/math&amp;gt; from the rest of the graph, contradicting that &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a min-cut. And the bound &amp;lt;math&amp;gt;|E|\ge \frac{|V||C|}{2}&amp;lt;/math&amp;gt; follows directly from applying the [https://en.wikipedia.org/wiki/Handshaking_lemma handshaking lemma] to the fact that every vertex in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; has degree at least &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;V_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_i&amp;lt;/math&amp;gt; denote the vertex set and edge set of the multi-graph &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt; respectively, and recall that &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt; is the multi-graph obtained from applying the first &amp;lt;math&amp;gt;(i-1)&amp;lt;/math&amp;gt; contractions. Obviously &amp;lt;math&amp;gt;|V_{i}|=n-i+1&amp;lt;/math&amp;gt;. And due to Proposition 2, &amp;lt;math&amp;gt;|E_i|\ge \frac{|V_i||C|}{2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is still a min-cut in &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The probability &amp;lt;math&amp;gt;p_i=\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]&amp;lt;/math&amp;gt; can be computed as&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
p_i&lt;br /&gt;
&amp;amp;=1-\frac{|C|}{|E_i|}\\&lt;br /&gt;
&amp;amp;\ge1-\frac{2}{|V_i|}\\&lt;br /&gt;
&amp;amp;=1-\frac{2}{n-i+1}&lt;br /&gt;
\end{align},&amp;lt;/math&amp;gt;&lt;br /&gt;
where the inequality is due to Proposition 2. &lt;br /&gt;
&lt;br /&gt;
We can now put everything together. We arbitrarily fix the input multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and any particular minimum cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. &lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
p_{\text{correct}}&lt;br /&gt;
&amp;amp;=\Pr[\,\text{a minimum cut is returned by }RandomContract\,]\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
\Pr[\,C\mbox{ is returned by }{RandomContract}\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,e_i\not\in C\mbox{ for all }i=1,2,\ldots,n-2\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{i=1}^{n-2}\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
\prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{k=3}^{n}\frac{k-2}{k}\\&lt;br /&gt;
&amp;amp;= \frac{2}{n(n-1)}.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the following theorem.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Theorem|&lt;br /&gt;
: For any multigraph with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the &#039;&#039;RandomContract&#039;&#039; algorithm returns a minimum cut with probability at least &amp;lt;math&amp;gt;\frac{2}{n(n-1)}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
At first glance this seems to be a miserable chance of success. However, notice that there may be exponentially many cuts in a graph (because potentially every nonempty subset &amp;lt;math&amp;gt;S\subset V&amp;lt;/math&amp;gt; corresponds to a cut &amp;lt;math&amp;gt;C=E(S,\overline{S})&amp;lt;/math&amp;gt;), and Karger&#039;s algorithm effectively reduces this exponential-sized space of feasible solutions to one of quadratic size, an exponential improvement!&lt;br /&gt;
&lt;br /&gt;
We can run &#039;&#039;RandomContract&#039;&#039; independently for &amp;lt;math&amp;gt;t=\frac{n(n-1)\ln n}{2}&amp;lt;/math&amp;gt; times and return the smallest cut found among all runs. The probability that a minimum cut is found is at least:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
&amp;amp;\quad 1-\Pr[\,\mbox{all }t\mbox{ independent runs of } RandomContract\mbox{ fail to find a min-cut}\,] \\&lt;br /&gt;
&amp;amp;= 1-\Pr[\,\mbox{a single run of }{RandomContract}\mbox{ fails}\,]^{t} \\&lt;br /&gt;
&amp;amp;\ge 1- \left(1-\frac{2}{n(n-1)}\right)^{\frac{n(n-1)\ln n}{2}} \\&lt;br /&gt;
&amp;amp;\ge 1-\frac{1}{n}.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
The last inequality is due to the fact that &amp;lt;math&amp;gt;1-x\le e^{-x}&amp;lt;/math&amp;gt; for all real &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, so the failure probability is at most &amp;lt;math&amp;gt;e^{-\ln n}=\frac{1}{n}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Recall that a single run of the &#039;&#039;RandomContract&#039;&#039; algorithm takes &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt; time. Altogether this gives us a randomized algorithm which runs in time &amp;lt;math&amp;gt;O(n^4\log n)&amp;lt;/math&amp;gt; and finds a minimum cut [https://en.wikipedia.org/wiki/With_high_probability &#039;&#039;&#039;with high probability&#039;&#039;&#039;].&lt;br /&gt;
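&lt;br /&gt;
This boosting scheme takes only a few lines of Python, reusing the &#039;&#039;random_contract&#039;&#039; sketch given earlier (the name &#039;&#039;boosted_min_cut&#039;&#039; is our own):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
# Run RandomContract n(n-1)ln(n)/2 times; keep the smallest cut found.&lt;br /&gt;
def boosted_min_cut(edges, n):&lt;br /&gt;
    runs = math.ceil(n * (n - 1) * math.log(n) / 2)&lt;br /&gt;
    best = None&lt;br /&gt;
    for _ in range(runs):&lt;br /&gt;
        cut = random_contract(edges, n)&lt;br /&gt;
        if best is None or len(cut) &amp;lt; len(best):&lt;br /&gt;
            best = cut&lt;br /&gt;
    return best&lt;br /&gt;
&lt;br /&gt;
# example: on a 4-cycle the minimum cut has size 2&lt;br /&gt;
# boosted_min_cut([(0, 1), (1, 2), (2, 3), (3, 0)], 4)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;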
&lt;br /&gt;
== A Corollary by the Probabilistic Method ==&lt;br /&gt;
The analysis of Karger&#039;s algorithm implies the following combinatorial proposition for the number of distinct minimum cuts in a graph.&lt;br /&gt;
{{Theorem|Corollary|&lt;br /&gt;
:For any graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the number of distinct minimum cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is at most &amp;lt;math&amp;gt;\frac{n(n-1)}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}&amp;lt;/math&amp;gt; denote the set of all minimum cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. For each min-cut &amp;lt;math&amp;gt;C\in\mathcal{C}&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;A_C&amp;lt;/math&amp;gt; denote the event &amp;quot;&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is returned by &#039;&#039;RandomContract&#039;&#039;&amp;quot;, whose probability is given by &lt;br /&gt;
:&amp;lt;math&amp;gt;p_C=\Pr[A_C]\,&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Clearly we have:&lt;br /&gt;
* for any distinct &amp;lt;math&amp;gt;C,D\in\mathcal{C}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;A_C\,&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A_{D}\,&amp;lt;/math&amp;gt; are &#039;&#039;&#039;disjoint events&#039;&#039;&#039;; and&lt;br /&gt;
* the union &amp;lt;math&amp;gt;\bigcup_{C\in\mathcal{C}}A_C&amp;lt;/math&amp;gt; is precisely the event &amp;quot;a minimum cut is returned by &#039;&#039;RandomContract&#039;&#039;&amp;quot;, whose probability is given by&lt;br /&gt;
::&amp;lt;math&amp;gt;p_{\text{correct}}=\Pr[\,\text{a minimum cut is returned by } RandomContract\,]&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to the [https://en.wikipedia.org/wiki/Probability_axioms#Third_axiom &#039;&#039;&#039;additivity of probability&#039;&#039;&#039;], it holds that&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
p_{\text{correct}}=\sum_{C\in\mathcal{C}}\Pr[A_C]=\sum_{C\in\mathcal{C}}p_C.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By the analysis of Karger&#039;s algorithm, we know &amp;lt;math&amp;gt;p_C\ge\frac{2}{n(n-1)}&amp;lt;/math&amp;gt;. And since &amp;lt;math&amp;gt;p_{\text{correct}}&amp;lt;/math&amp;gt; is a well defined probability, due to the [https://en.wikipedia.org/wiki/Probability_axioms#Second_axiom &#039;&#039;&#039;unitarity of probability&#039;&#039;&#039;], it must hold that &amp;lt;math&amp;gt;p_{\text{correct}}\le 1&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;1\ge p_{\text{correct}}=\sum_{C\in\mathcal{C}}p_C\ge|\mathcal{C}|\frac{2}{n(n-1)}&amp;lt;/math&amp;gt;,&lt;br /&gt;
which means &amp;lt;math&amp;gt;|\mathcal{C}|\le\frac{n(n-1)}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Note that the statement of this theorem has no randomness at all, while the proof consists of a randomized procedure. This is an example of [http://en.wikipedia.org/wiki/Probabilistic_method the probabilistic method].&lt;br /&gt;
&lt;br /&gt;
== Fast Min-Cut ==&lt;br /&gt;
In the analysis of the &#039;&#039;RandomContract&#039;&#039; algorithm, recall that we lower bounded the probability &amp;lt;math&amp;gt;p_C&amp;lt;/math&amp;gt; that a min-cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is returned by &#039;&#039;RandomContract&#039;&#039; by the following &#039;&#039;&#039;telescoping product&#039;&#039;&#039;:&lt;br /&gt;
:&amp;lt;math&amp;gt;p_C\ge\prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
Here the index &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; corresponds to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th contraction. The factor &amp;lt;math&amp;gt;\left(1-\frac{2}{n-i+1}\right)&amp;lt;/math&amp;gt; is decreasing in &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;, which means:&lt;br /&gt;
* The probability of success only deteriorates when the graph becomes &amp;quot;too contracted&amp;quot;, that is, when the number of remaining vertices gets small. &lt;br /&gt;
This motivates us to consider the following modification of the algorithm: first use random contractions to reduce the number of vertices to a moderately small number, and then recursively find a min-cut in this smaller instance. By itself this is just a restatement of what we have been doing. Inspired by the idea of boosting the accuracy via independent repetition, we instead apply the recursion on &#039;&#039;two&#039;&#039; smaller instances generated independently.&lt;br /&gt;
&lt;br /&gt;
The algorithm obtained in this way is called &#039;&#039;FastCut&#039;&#039;. We first define a procedure to randomly contract edges until only &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; vertices are left.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|&#039;&#039;RandomContract&#039;&#039;&amp;lt;math&amp;gt;(G, t)&amp;lt;/math&amp;gt;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, and integer &amp;lt;math&amp;gt;t\ge 2&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:while &amp;lt;math&amp;gt;|V|&amp;gt;t&amp;lt;/math&amp;gt; do&lt;br /&gt;
:* choose an edge &amp;lt;math&amp;gt;uv\in E&amp;lt;/math&amp;gt; uniformly at random;&lt;br /&gt;
:* &amp;lt;math&amp;gt;G=Contract(G,uv)&amp;lt;/math&amp;gt;; &lt;br /&gt;
:return &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;FastCut&#039;&#039; algorithm is recursively defined as follows.&lt;br /&gt;
{{Theorem|&#039;&#039;FastCut&#039;&#039;&amp;lt;math&amp;gt;(G)&amp;lt;/math&amp;gt;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:if &amp;lt;math&amp;gt;|V|\le 6&amp;lt;/math&amp;gt; then return a mincut by brute force;&lt;br /&gt;
:else let &amp;lt;math&amp;gt;t=\left\lceil1+|V|/\sqrt{2}\right\rceil&amp;lt;/math&amp;gt;;&lt;br /&gt;
:: &amp;lt;math&amp;gt;G_1=RandomContract(G,t)&amp;lt;/math&amp;gt;;&lt;br /&gt;
:: &amp;lt;math&amp;gt;G_2=RandomContract(G,t)&amp;lt;/math&amp;gt;; &lt;br /&gt;
::return the smaller one of &amp;lt;math&amp;gt;FastCut(G_1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;FastCut(G_2)&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
As before, all &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; are multigraphs.&lt;br /&gt;
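&lt;br /&gt;
For concreteness, the following Python sketch implements &#039;&#039;FastCut&#039;&#039; on top of the same union-find representation used earlier (the helper &#039;&#039;contract_to&#039;&#039; plays the role of &#039;&#039;RandomContract&#039;&#039;&amp;lt;math&amp;gt;(G,t)&amp;lt;/math&amp;gt;; the representation choices are ours, and a connected multigraph with at least two vertices is assumed):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import math, random&lt;br /&gt;
from itertools import combinations&lt;br /&gt;
&lt;br /&gt;
# RandomContract(G, t): contract random edges until t classes remain,&lt;br /&gt;
# then relabel the classes as 0..t-1 and return the contracted multigraph.&lt;br /&gt;
def contract_to(edges, n, t):&lt;br /&gt;
    parent = list(range(n))&lt;br /&gt;
    def find(x):&lt;br /&gt;
        while parent[x] != x:&lt;br /&gt;
            parent[x] = parent[parent[x]]&lt;br /&gt;
            x = parent[x]&lt;br /&gt;
        return x&lt;br /&gt;
    classes, alive = n, edges[:]&lt;br /&gt;
    while classes &amp;gt; t:&lt;br /&gt;
        u, v = random.choice(alive)&lt;br /&gt;
        parent[find(u)] = find(v)&lt;br /&gt;
        classes -= 1&lt;br /&gt;
        alive = [(a, b) for (a, b) in alive if find(a) != find(b)]&lt;br /&gt;
    label = {}&lt;br /&gt;
    for x in range(n):&lt;br /&gt;
        label.setdefault(find(x), len(label))&lt;br /&gt;
    return [(label[find(a)], label[find(b)]) for (a, b) in alive], classes&lt;br /&gt;
&lt;br /&gt;
def fast_cut(edges, n):&lt;br /&gt;
    if n &amp;lt;= 6:   # base case: brute force over all bipartitions&lt;br /&gt;
        best = None&lt;br /&gt;
        for size in range(1, n // 2 + 1):&lt;br /&gt;
            for side in combinations(range(n), size):&lt;br /&gt;
                s = set(side)&lt;br /&gt;
                cut = [e for e in edges if (e[0] in s) != (e[1] in s)]&lt;br /&gt;
                if best is None or len(cut) &amp;lt; len(best):&lt;br /&gt;
                    best = cut&lt;br /&gt;
        return best&lt;br /&gt;
    t = math.ceil(1 + n / math.sqrt(2))&lt;br /&gt;
    cut1 = fast_cut(*contract_to(edges, n, t))   # two independent&lt;br /&gt;
    cut2 = fast_cut(*contract_to(edges, n, t))   # half-contractions&lt;br /&gt;
    return cut1 if len(cut1) &amp;lt;= len(cut2) else cut2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;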
&lt;br /&gt;
Fix a min-cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in the original multigraph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. By the same analysis as in the case of &#039;&#039;RandomContract&#039;&#039;, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
&amp;amp;\Pr[C\text{ survives all contractions in }RandomContract(G,t)]\\&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\prod_{i=1}^{n-t}\Pr[C\text{ survives the }i\text{-th contraction}\mid C\text{ survives the first }(i-1)\text{ contractions}]\\&lt;br /&gt;
\ge&lt;br /&gt;
&amp;amp;\prod_{i=1}^{n-t}\left(1-\frac{2}{n-i+1}\right)\\&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\prod_{k=t+1}^{n}\frac{k-2}{k}\\&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\frac{t(t-1)}{n(n-1)}.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
When &amp;lt;math&amp;gt;t=\left\lceil1+n/\sqrt{2}\right\rceil&amp;lt;/math&amp;gt;, this probability is at least &amp;lt;math&amp;gt;1/2&amp;lt;/math&amp;gt;; the choice of &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is made precisely to ensure this. You will see that this is crucial in the following analysis of accuracy.&lt;br /&gt;
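To verify the claim, note that &amp;lt;math&amp;gt;t\ge 1+n/\sqrt{2}&amp;lt;/math&amp;gt; and hence &amp;lt;math&amp;gt;t-1\ge n/\sqrt{2}&amp;lt;/math&amp;gt;, so a direct calculation gives&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\frac{t(t-1)}{n(n-1)}\ge\frac{(1+n/\sqrt{2})\cdot(n/\sqrt{2})}{n(n-1)}\ge\frac{n^2/2}{n(n-1)}=\frac{n}{2(n-1)}\ge\frac{1}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;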
&lt;br /&gt;
We denote by &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; the following events:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
A:&lt;br /&gt;
&amp;amp;\quad C\text{  survives all contractions in }RandomContract(G,t);\\&lt;br /&gt;
B:&lt;br /&gt;
&amp;amp;\quad\text{size of min-cut is unchanged after }RandomContract(G,t);&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Clearly, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;, and by the above analysis &amp;lt;math&amp;gt;\Pr[B]\ge\Pr[A]\ge\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We denote by &amp;lt;math&amp;gt;p(n)&amp;lt;/math&amp;gt; the worst-case probability (over all multigraphs of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices) that &amp;lt;math&amp;gt;FastCut(G)&amp;lt;/math&amp;gt; succeeds in returning a min-cut, that is&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
p(n)&lt;br /&gt;
=\min_{G: |V|=n}\Pr[\,FastCut(G)\text{ returns a min-cut in }G\,].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Suppose that &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is the multigraph that achieves the minimum in the above definition. The following recurrence holds for &amp;lt;math&amp;gt;p(n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
p(n)&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,FastCut(G)\text{ returns a min-cut in }G\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,\text{ a min-cut of }G\text{ is returned by }FastCut(G_1)\text{ or }FastCut(G_2)\,]\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
1-\left(1-\Pr[B\wedge FastCut(G_1)\text{ returns a min-cut in }G_1\,]\right)^2\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
1-\left(1-\Pr[A\wedge FastCut(G_1)\text{ returns a min-cut in }G_1\,]\right)^2\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
1-\left(1-\Pr[A]\Pr[ FastCut(G_1)\text{ returns a min-cut in }G_1\mid A]\right)^2\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
1-\left(1-\frac{1}{2}p\left(\left\lceil1+n/\sqrt{2}\right\rceil\right)\right)^2,&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; are defined as above such that &amp;lt;math&amp;gt;\Pr[A]\ge\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The base case is that  &amp;lt;math&amp;gt;p(n)=1&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;n\le 6&amp;lt;/math&amp;gt;. By induction it is easy to prove that&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
p(n)=\Omega\left(\frac{1}{\log n}\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
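One way to carry out this induction is the following sketch (the notation &amp;lt;math&amp;gt;q_k&amp;lt;/math&amp;gt; is ours, and we ignore the ceilings). Let &amp;lt;math&amp;gt;q_k&amp;lt;/math&amp;gt; denote the lower bound guaranteed by the above recurrence for an instance that is &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; levels of recursion above the base case, so that &amp;lt;math&amp;gt;q_0=1&amp;lt;/math&amp;gt; and&lt;br /&gt;
:&amp;lt;math&amp;gt;q_{k+1}=1-\left(1-\frac{q_k}{2}\right)^2=q_k-\frac{q_k^2}{4}&amp;lt;/math&amp;gt;.&lt;br /&gt;
By induction on &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; it holds that &amp;lt;math&amp;gt;q_k\ge\frac{1}{k+1}&amp;lt;/math&amp;gt;: since the function &amp;lt;math&amp;gt;x-x^2/4&amp;lt;/math&amp;gt; is increasing on &amp;lt;math&amp;gt;[0,2]&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;q_{k+1}\ge\frac{1}{k+1}-\frac{1}{4(k+1)^2}\ge\frac{1}{k+2}&amp;lt;/math&amp;gt;,&lt;br /&gt;
where the last inequality is equivalent to &amp;lt;math&amp;gt;4(k+1)\ge k+2&amp;lt;/math&amp;gt;. Since each level of recursion reduces the number of vertices by a factor of about &amp;lt;math&amp;gt;\sqrt{2}&amp;lt;/math&amp;gt;, the recursion depth is &amp;lt;math&amp;gt;k=O(\log n)&amp;lt;/math&amp;gt;, which gives &amp;lt;math&amp;gt;p(n)=\Omega\left(\frac{1}{\log n}\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;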
&lt;br /&gt;
Recall that we can implement an edge contraction in &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; time, so it is easy to verify the following recurrence for the time complexity:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
T(n)=2T\left(\left\lceil1+n/\sqrt{2}\right\rceil\right)+O(n^2),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;T(n)&amp;lt;/math&amp;gt; denotes the running time of &amp;lt;math&amp;gt;FastCut(G)&amp;lt;/math&amp;gt; on a multigraph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices.&lt;br /&gt;
&lt;br /&gt;
By induction with the base case &amp;lt;math&amp;gt;T(n)=O(1)&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;n\le 6&amp;lt;/math&amp;gt;, it is easy to verify that &amp;lt;math&amp;gt;T(n)=O(n^2\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
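To see this, one can unroll the recurrence into a recursion tree (a sketch, again ignoring the ceilings): at depth &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; there are &amp;lt;math&amp;gt;2^i&amp;lt;/math&amp;gt; subproblems, each on about &amp;lt;math&amp;gt;n/2^{i/2}&amp;lt;/math&amp;gt; vertices, so the total work at depth &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; is&lt;br /&gt;
:&amp;lt;math&amp;gt;2^i\cdot O\left(\left(\frac{n}{2^{i/2}}\right)^2\right)=O(n^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
The depth of the recursion is &amp;lt;math&amp;gt;\log_{\sqrt{2}}n=O(\log n)&amp;lt;/math&amp;gt;, and summing the &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt; work over all levels gives &amp;lt;math&amp;gt;T(n)=O(n^2\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;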
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Theorem|&lt;br /&gt;
: For any multigraph with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the &#039;&#039;FastCut&#039;&#039; algorithm returns a minimum cut with probability &amp;lt;math&amp;gt;\Omega\left(\frac{1}{\log n}\right)&amp;lt;/math&amp;gt; in time &amp;lt;math&amp;gt;O(n^2\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
At this point, we see that the name &#039;&#039;FastCut&#039;&#039; is misleading: it is actually slower than the original &#039;&#039;RandomContract&#039;&#039; algorithm; what improves is the chance of successfully finding a min-cut (from an &amp;lt;math&amp;gt;\Omega(1/n^2)&amp;lt;/math&amp;gt; to an &amp;lt;math&amp;gt;\Omega(1/\log n)&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
Given any input multi-graph, by running the &#039;&#039;FastCut&#039;&#039; algorithm independently for some &amp;lt;math&amp;gt;O((\log n)^2)&amp;lt;/math&amp;gt; times and returning the smallest cut found, we have an algorithm which runs in time &amp;lt;math&amp;gt;O(n^2\log^3n)&amp;lt;/math&amp;gt; and returns a min-cut with probability &amp;lt;math&amp;gt;1-O(1/n)&amp;lt;/math&amp;gt;, i.e. with high probability.&lt;br /&gt;
&lt;br /&gt;
Recall that the running time of the best known deterministic algorithm for min-cut on multi-graphs is &amp;lt;math&amp;gt;O(mn+n^2\log n)&amp;lt;/math&amp;gt;. On dense graphs, the randomized algorithm outperforms the best known deterministic algorithm.&lt;br /&gt;
&lt;br /&gt;
Finally, Karger further improves this and obtains a near-linear (in the number of edges) time [https://arxiv.org/abs/cs/9812007 randomized algorithm] for minimum cut in multi-graphs.&lt;br /&gt;
&lt;br /&gt;
= Max-Cut=&lt;br /&gt;
The &#039;&#039;&#039;maximum cut problem&#039;&#039;&#039;, in short the &#039;&#039;&#039;max-cut problem&#039;&#039;&#039;, is defined as follows.&lt;br /&gt;
{{Theorem|Max-cut problem|&lt;br /&gt;
*&#039;&#039;&#039;Input&#039;&#039;&#039;: an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output&#039;&#039;&#039;: a bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into disjoint subsets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that maximizes &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The problem is a typical MAX-CSP, an optimization version of the [https://en.wikipedia.org/wiki/Constraint_satisfaction_problem constraint satisfaction problem]. An instance of CSP consists of:&lt;br /&gt;
* a set of variables &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; usually taking values from some finite domain;&lt;br /&gt;
* a sequence of constraints (predicates) &amp;lt;math&amp;gt;C_1,C_2,\ldots, C_m&amp;lt;/math&amp;gt; defined on those variables.&lt;br /&gt;
The MAX-CSP asks to find an assignment of values to variables &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; which maximizes the number of satisfied constraints.&lt;br /&gt;
&lt;br /&gt;
In particular, when the variables &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; take Boolean values &amp;lt;math&amp;gt;\{0,1\}&amp;lt;/math&amp;gt; and every constraint is a binary constraint &amp;lt;math&amp;gt;\cdot\neq\cdot&amp;lt;/math&amp;gt; in the form of &amp;lt;math&amp;gt;x_i\neq x_j&amp;lt;/math&amp;gt;, then the MAX-CSP is precisely the max-cut problem.&lt;br /&gt;
&lt;br /&gt;
Unlike the min-cut problem, which can be solved in polynomial time, the max-cut is known to be [https://en.wikipedia.org/wiki/NP-hardness &#039;&#039;&#039;NP-hard&#039;&#039;&#039;]. Its decision version is among the [https://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems 21 &#039;&#039;&#039;NP-complete&#039;&#039;&#039; problems found by Karp]. This means we should not hope for a polynomial-time algorithm for solving the problem if [https://en.wikipedia.org/wiki/P_versus_NP_problem a famous conjecture in computational complexity] is correct. And due to another [https://en.wikipedia.org/wiki/BPP_(complexity)#Problems less famous conjecture in computational complexity], randomization alone probably cannot help this situation either.&lt;br /&gt;
&lt;br /&gt;
We may compromise our goal and allow the algorithm to &#039;&#039;not always find the optimal solution&#039;&#039;. However, we still want to guarantee that the algorithm &#039;&#039;always returns a relatively good solution on all possible instances&#039;&#039;. This notion is formally captured by &#039;&#039;&#039;approximation algorithms&#039;&#039;&#039; and the &#039;&#039;&#039;approximation ratio&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Greedy algorithm ==&lt;br /&gt;
A natural heuristic for solving max-cut is to sequentially assign the vertices to one of the two disjoint subsets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; so as to &#039;&#039;greedily&#039;&#039; maximize the &#039;&#039;current&#039;&#039; number of edges crossing between &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
To state the algorithm, we overload the definition &amp;lt;math&amp;gt;E(S,T)&amp;lt;/math&amp;gt;. Given an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, for any disjoint subsets &amp;lt;math&amp;gt;S,T\subseteq V&amp;lt;/math&amp;gt; of vertices, we define&lt;br /&gt;
:&amp;lt;math&amp;gt;E(S,T)=\{uv\in E\mid u\in S, v\in T\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We also assume that the vertices are ordered arbitrarily as &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The greedy heuristic is then described as follows.&lt;br /&gt;
{{Theorem|&#039;&#039;GreedyMaxCut&#039;&#039;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, &lt;br /&gt;
:::with an arbitrary order of vertices &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:initially &amp;lt;math&amp;gt;S=T=\emptyset&amp;lt;/math&amp;gt;;&lt;br /&gt;
:for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;&lt;br /&gt;
::&amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt; to maximize the current &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt; (breaking ties arbitrarily);&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The algorithm certainly runs in polynomial time.&lt;br /&gt;
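&lt;br /&gt;
As an illustration, here is a straightforward Python sketch of &#039;&#039;GreedyMaxCut&#039;&#039; (the representation choices are ours; vertices are labeled &amp;lt;math&amp;gt;0,\ldots,n-1&amp;lt;/math&amp;gt; and processed in this order):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# GreedyMaxCut: each vertex joins the side that maximizes the current cut,&lt;br /&gt;
# i.e. the side opposite to the majority of its already-placed neighbors.&lt;br /&gt;
def greedy_max_cut(n, edges):&lt;br /&gt;
    adj = [[] for _ in range(n)]&lt;br /&gt;
    for u, v in edges:&lt;br /&gt;
        adj[u].append(v)&lt;br /&gt;
        adj[v].append(u)&lt;br /&gt;
    S, T = set(), set()&lt;br /&gt;
    for v in range(n):&lt;br /&gt;
        to_S = sum(1 for w in adj[v] if w in S)   # |E(S_i, {v_i})|&lt;br /&gt;
        to_T = sum(1 for w in adj[v] if w in T)   # |E(T_i, {v_i})|&lt;br /&gt;
        # joining T gains to_S cut edges; joining S gains to_T cut edges&lt;br /&gt;
        (T if to_S &amp;gt;= to_T else S).add(v)&lt;br /&gt;
    return S, T&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;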
&lt;br /&gt;
Without any guarantee of how well the solution returned by the algorithm approximates the optimal solution, the algorithm is only a heuristic, not an &#039;&#039;&#039;approximation algorithm&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
=== Approximation ratio ===&lt;br /&gt;
For now we restrict ourselves to the max-cut problem, although the notion applies more generally.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; be an arbitrary instance of the max-cut problem. Let &amp;lt;math&amp;gt;OPT_G&amp;lt;/math&amp;gt; denote the size of the max-cut in graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. More precisely,&lt;br /&gt;
:&amp;lt;math&amp;gt;OPT_G=\max_{S\subseteq V}|E(S,\overline{S})|&amp;lt;/math&amp;gt;. &lt;br /&gt;
Let &amp;lt;math&amp;gt;SOL_G&amp;lt;/math&amp;gt; be the size of the cut &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt; returned by the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm on input graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As a maximization problem it is trivial that &amp;lt;math&amp;gt;SOL_G\le OPT_G&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. To guarantee that the &#039;&#039;GreedyMaxCut&#039;&#039; gives good approximation of optimal solution, we need the other direction:&lt;br /&gt;
{{Theorem|Approximation ratio|&lt;br /&gt;
:We say that the &#039;&#039;&#039;approximation ratio&#039;&#039;&#039; of the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm is &amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;, or &#039;&#039;GreedyMaxCut&#039;&#039; is an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;-approximation&#039;&#039;&#039; algorithm, for some &amp;lt;math&amp;gt;0&amp;lt;\alpha\le 1&amp;lt;/math&amp;gt;, if &lt;br /&gt;
::&amp;lt;math&amp;gt;\frac{SOL_G}{OPT_G}\ge \alpha&amp;lt;/math&amp;gt; for every possible instance &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; of max-cut.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
With this notion, we now try to analyze the approximation ratio of the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm.&lt;br /&gt;
&lt;br /&gt;
A difficulty in applying this notion in our analysis is that the definition of approximation ratio compares the solution returned by the algorithm with the &#039;&#039;&#039;optimal solution&#039;&#039;&#039;, yet in the analysis we can hardly make such comparisons to the optimal solutions directly: computing an optimal solution is &#039;&#039;&#039;NP-hard&#039;&#039;&#039;, so there is no easy way to get a handle on it (e.g. a closed form).&lt;br /&gt;
&lt;br /&gt;
A standard way around this difficulty (usually the first step in analyzing an approximation ratio) is to compare not with the optimal solution directly, but with an &#039;&#039;&#039;upper bound&#039;&#039;&#039; on the optimal solution (for a minimization problem, this needs to be a lower bound), that is, with something even better than the optimal solution, which may not be realized by any feasible solution.&lt;br /&gt;
&lt;br /&gt;
For the max-cut problem, a simple upper bound to &amp;lt;math&amp;gt;OPT_G&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt;, the number of all edges. This is a trivial upper bound of max-cut since any cut is a subset of edges.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt; be the input graph and &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;. Initially &amp;lt;math&amp;gt;S_1=T_1=\emptyset&amp;lt;/math&amp;gt;. And for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;, we let &amp;lt;math&amp;gt;S_{i+1}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}&amp;lt;/math&amp;gt; be the respective &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; after &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt;. More precisely,&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\cup\{v_i\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\,&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;|E(S_{i}\cup\{v_i\},T_i)|&amp;gt;|E(S_{i},T_i\cup\{v_i\})|&amp;lt;/math&amp;gt;;&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\,&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\cup\{v_i\}&amp;lt;/math&amp;gt;  if otherwise. &lt;br /&gt;
Finally, the max-cut is given by&lt;br /&gt;
:&amp;lt;math&amp;gt;SOL_G=|E(S_{n+1},T_{n+1})|&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We first observe that we can count the number of edges &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt; by summing up the contributions of the individual &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt;&#039;s.&lt;br /&gt;
{{Theorem|Proposition 1|&lt;br /&gt;
:&amp;lt;math&amp;gt;|E| = \sum_{i=1}^n\left(|E(S_i,\{v_i\})|+|E(T_i,\{v_i\})|\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
Note that &amp;lt;math&amp;gt;S_i\cup T_i=\{v_1,v_2,\ldots,v_{i-1}\}&amp;lt;/math&amp;gt;, i.e. &amp;lt;math&amp;gt;S_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_i&amp;lt;/math&amp;gt; together contain precisely those vertices preceding &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt;. Therefore, by taking the sum &lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i=1}^n\left(|E(S_i,\{v_i\})|+|E(T_i,\{v_i\})|\right)&amp;lt;/math&amp;gt;,&lt;br /&gt;
we effectively enumerate all &amp;lt;math&amp;gt;(v_j,v_i)&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;v_jv_i\in E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;i&amp;lt;/math&amp;gt;. The total number is precisely &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
We then observe that &amp;lt;math&amp;gt;SOL_G&amp;lt;/math&amp;gt; can be decomposed into the contributions of the individual &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt;&#039;s in the same way.&lt;br /&gt;
{{Theorem|Proposition 2|&lt;br /&gt;
:&amp;lt;math&amp;gt;SOL_G = \sum_{i=1}^n\max\left(|E(S_i, \{v_i\})|,|E(T_i, \{v_i\})|\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
It is easy to observe that &amp;lt;math&amp;gt;E(S_i,T_i)\subseteq E(S_{i+1},T_{i+1})&amp;lt;/math&amp;gt;, i.e. once an edge joins the cut between the current &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt;, it never drops from the cut in the future. &lt;br /&gt;
&lt;br /&gt;
We then define &lt;br /&gt;
:&amp;lt;math&amp;gt;\Delta_i= |E(S_{i+1},T_{i+1})|-|E(S_i,T_i)|=|E(S_{i+1},T_{i+1})\setminus E(S_i,T_i)|&amp;lt;/math&amp;gt;&lt;br /&gt;
to be the contribution of &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; in the final cut.&lt;br /&gt;
&lt;br /&gt;
It holds that&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i=1}^n\Delta_i=|E(S_{n+1},T_{n+1})|-|E(S_{1},T_{1})|=|E(S_{n+1},T_{n+1})|=SOL_G&amp;lt;/math&amp;gt;.&lt;br /&gt;
On the other hand, due to the greedy rule:&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\cup\{v_i\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\,&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;|E(S_{i}\cup\{v_i\},T_i)|&amp;gt;|E(S_{i},T_i\cup\{v_i\})|&amp;lt;/math&amp;gt;;&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\,&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\cup\{v_i\}&amp;lt;/math&amp;gt;  if otherwise;&lt;br /&gt;
it holds that&lt;br /&gt;
:&amp;lt;math&amp;gt;\Delta_i=|E(S_{i+1},T_{i+1})\setminus E(S_i,T_i)| = \max\left(|E(S_i, \{v_i\})|,|E(T_i, \{v_i\})|\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
Together the proposition follows.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Combining the above Proposition 1 and Proposition 2, we have&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
\begin{align}&lt;br /&gt;
SOL_G &lt;br /&gt;
&amp;amp;= \sum_{i=1}^n\max\left(|E(S_i, \{v_i\})|,|E(T_i, \{v_i\})|\right)\\&lt;br /&gt;
&amp;amp;\ge \frac{1}{2}\sum_{i=1}^n\left(|E(S_i, \{v_i\})|+|E(T_i, \{v_i\})|\right)\\&lt;br /&gt;
&amp;amp;=\frac{1}{2}|E|\\&lt;br /&gt;
&amp;amp;\ge\frac{1}{2}OPT_G.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
:The &#039;&#039;GreedyMaxCut&#039;&#039; is a &amp;lt;math&amp;gt;0.5&amp;lt;/math&amp;gt;-approximation algorithm for the max-cut problem.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
This is not the best approximation ratio achieved by polynomial-time algorithms for max-cut.&lt;br /&gt;
* The best approximation ratio known to be achievable by a polynomial-time algorithm is due to the [http://www-math.mit.edu/~goemans/PAPERS/maxcut-jacm.pdf Goemans-Williamson algorithm], which relies on rounding an [https://en.wikipedia.org/wiki/Semidefinite_programming SDP] relaxation of the max-cut, and achieves an approximation ratio &amp;lt;math&amp;gt;\alpha^*\approx 0.878&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\alpha^*&amp;lt;/math&amp;gt; is an irrational number whose precise value is given by &amp;lt;math&amp;gt;\alpha^*=\frac{2}{\pi}\inf_{x\in[-1,1]}\frac{\arccos(x)}{1-x}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Assuming the [https://en.wikipedia.org/wiki/Unique_games_conjecture unique games conjecture], there does not exist any polynomial-time algorithm for max-cut with approximation ratio &amp;lt;math&amp;gt;\alpha&amp;gt;\alpha^*&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Derandomization by conditional expectation ==&lt;br /&gt;
There is a probabilistic interpretation of the greedy algorithm, which may explain why we use a greedy scheme for max-cut and why it works for finding an approximate max-cut.&lt;br /&gt;
&lt;br /&gt;
Given an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, let us calculate the average size of cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. For every vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt; let &amp;lt;math&amp;gt;X_v\in\{0,1\}&amp;lt;/math&amp;gt; be a &#039;&#039;uniform&#039;&#039; and &#039;&#039;independent&#039;&#039; random bit which indicates whether &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. This gives us a uniform random bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The size of the random cut &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt; is given by&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
|E(S,T)| = \sum_{uv\in E} I[X_u\neq X_v],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;I[X_u\neq X_v]&amp;lt;/math&amp;gt; is the Boolean indicator random variable that indicates whether event &amp;lt;math&amp;gt;X_u\neq X_v&amp;lt;/math&amp;gt; occurs.&lt;br /&gt;
&lt;br /&gt;
Due to &#039;&#039;&#039;linearity of expectation&#039;&#039;&#039;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]=\sum_{uv\in E} \mathbb{E}[I[X_u\neq X_v]] =\sum_{uv\in E} \Pr[X_u\neq X_v]=\frac{|E|}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt; is a trivial upper bound for the max-cut &amp;lt;math&amp;gt;OPT_G&amp;lt;/math&amp;gt;. Due to the above argument, we have &lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]\ge\frac{OPT_G}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*In the above argument we use a few propositions from probability theory.&lt;br /&gt;
: &#039;&#039;&#039;linearity of expectation:&#039;&#039;&#039;&lt;br /&gt;
:: Let &amp;lt;math&amp;gt;\boldsymbol{X}=(X_1,X_2,\ldots,X_n)&amp;lt;/math&amp;gt; be a random vector. Then&lt;br /&gt;
:::&amp;lt;math&amp;gt;\mathbb{E}\left[\sum_{i=1}^nc_iX_i\right]=\sum_{i=1}^nc_i\mathbb{E}[X_i]&amp;lt;/math&amp;gt;,&lt;br /&gt;
::where &amp;lt;math&amp;gt;c_1,c_2,\ldots,c_n&amp;lt;/math&amp;gt; are scalars.&lt;br /&gt;
::That is, the order of computations of expectation and linear (affine) function of a random vector can be exchanged. &lt;br /&gt;
::Note that this property ignores the dependency between random variables, and hence is very useful.&lt;br /&gt;
:&#039;&#039;&#039;Expectation of indicator random variable:&#039;&#039;&#039;&lt;br /&gt;
::We usually use the notation &amp;lt;math&amp;gt;I[A]&amp;lt;/math&amp;gt; to represent the Boolean indicator random variable that indicates whether the event &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; occurs: i.e. &amp;lt;math&amp;gt;I[A]=1&amp;lt;/math&amp;gt; if event &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; occurs and &amp;lt;math&amp;gt;I[A]=0&amp;lt;/math&amp;gt; if otherwise. &lt;br /&gt;
::It is easy to see that &amp;lt;math&amp;gt;\mathbb{E}[I[A]]=\Pr[A]&amp;lt;/math&amp;gt;. The expectation of an indicator random variable equals the probability of the event it indicates.&lt;br /&gt;
|}&lt;br /&gt;
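&lt;br /&gt;
This average-case guarantee is trivial to realize in code (a sketch; the list &#039;&#039;x&#039;&#039; plays the role of the random bits &amp;lt;math&amp;gt;X_v&amp;lt;/math&amp;gt;):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import random&lt;br /&gt;
&lt;br /&gt;
# Assign each vertex an independent uniform bit and count the cut edges;&lt;br /&gt;
# the expected size of this random cut is |E|/2, hence at least OPT/2.&lt;br /&gt;
def uniform_random_cut(n, edges):&lt;br /&gt;
    x = [random.randint(0, 1) for _ in range(n)]&lt;br /&gt;
    return sum(1 for u, v in edges if x[u] != x[v])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;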
&lt;br /&gt;
By the above analysis, the average (under the uniform distribution) size of all cuts in any graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; must be at least &amp;lt;math&amp;gt;\frac{OPT_G}{2}&amp;lt;/math&amp;gt;. Due to &#039;&#039;&#039;the probabilistic method&#039;&#039;&#039;, in particular &#039;&#039;&#039;the averaging principle&#039;&#039;&#039;, there must exist a bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; whose cut &amp;lt;math&amp;gt;E(S,T)&amp;lt;/math&amp;gt; is of size at least &amp;lt;math&amp;gt;\frac{OPT_G}{2}&amp;lt;/math&amp;gt;. The next question is how to find such a bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; &#039;&#039;algorithmically&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
We still fix an arbitrary order of all vertices as &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;. Recall that each vertex &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; is associated with a uniform and independent random bit &amp;lt;math&amp;gt;X_{v_i}&amp;lt;/math&amp;gt; to indicate whether &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. We want to fix the value of &amp;lt;math&amp;gt;X_{v_i}&amp;lt;/math&amp;gt; one after another to construct a bipartition &amp;lt;math&amp;gt;\{\hat{S},\hat{T}\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; such that &lt;br /&gt;
:&amp;lt;math&amp;gt;|E(\hat{S},\hat{T})|\ge\mathbb{E}[|E(S,T)|]\ge\frac{OPT_G}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We start with the first vertex &amp;lt;math&amp;gt;v_1&amp;lt;/math&amp;gt; and its random variable &amp;lt;math&amp;gt;X_{v_1}&amp;lt;/math&amp;gt;. By the &#039;&#039;&#039;law of total expectation&#039;&#039;&#039;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[E(S,T)]=\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=0]+\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=1].&lt;br /&gt;
&amp;lt;/math&amp;gt; &lt;br /&gt;
There must exist an assignment &amp;lt;math&amp;gt;x_1\in\{0,1\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;X_{v_1}&amp;lt;/math&amp;gt; such that &lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[E(S,T)\mid X_{v_1}=x_1]\ge \mathbb{E}[E(S,T)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
We can apply this argument repeatedly. In general, for any &amp;lt;math&amp;gt;i\le n&amp;lt;/math&amp;gt; and any particular partial assignment &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_{i-1}\in\{0,1\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;X_{v_1},X_{v_2},\ldots,X_{v_{i-1}}&amp;lt;/math&amp;gt;, by the law of total expectation&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}]&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}, X_{v_{i}}=0]\\&lt;br /&gt;
&amp;amp;+\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}, X_{v_{i}}=1].&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
There must exist an assignment &amp;lt;math&amp;gt;x_{i}\in\{0,1\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;X_{v_i}&amp;lt;/math&amp;gt; such that &lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i}}=x_{i}]\ge \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
By this argument, we can find a sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in\{0,1\}&amp;lt;/math&amp;gt; of bits which forms a &#039;&#039;monotone path&#039;&#039;:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[E(S,T)]\le \cdots \le \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}] \le \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i}}=x_{i}] \le \cdots \le  \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{n}}=x_{n}].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
We already know that the first step of this monotone path satisfies &amp;lt;math&amp;gt;\mathbb{E}[E(S,T)]\ge\frac{OPT_G}{2}&amp;lt;/math&amp;gt;. And for the last step of the monotone path, &amp;lt;math&amp;gt;\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{n}}=x_{n}]&amp;lt;/math&amp;gt;, since all random bits have been fixed, a bipartition &amp;lt;math&amp;gt;(\hat{S},\hat{T})&amp;lt;/math&amp;gt; is determined by the assignment &amp;lt;math&amp;gt;x_1,\ldots, x_n&amp;lt;/math&amp;gt;, so the expectation has no effect except just returning the size of that cut &amp;lt;math&amp;gt;|E(\hat{S},\hat{T})|&amp;lt;/math&amp;gt;. We have thus found a cut &amp;lt;math&amp;gt;E(\hat{S},\hat{T})&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;|E(\hat{S},\hat{T})|\ge \frac{OPT_G}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We translate the procedure of constructing this monotone path of conditional expectations into the following algorithm. &lt;br /&gt;
{{Theorem|&#039;&#039;MonotonePath&#039;&#039;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, &lt;br /&gt;
:::with an arbitrary order of vertices &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:initially &amp;lt;math&amp;gt;S=T=\emptyset&amp;lt;/math&amp;gt;;&lt;br /&gt;
:for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;&lt;br /&gt;
::&amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt; to maximize the average size of cut conditioning on the choices made so far by the vertices &amp;lt;math&amp;gt;v_1,v_2,\ldots,v_i&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
We leave it as an exercise to verify that the choice of each &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; (to join which one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt;) in the &#039;&#039;MonotonePath&#039;&#039; algorithm (which maximizes the average size of the cut conditioning on the choices made so far by the vertices &amp;lt;math&amp;gt;v_1,v_2,\ldots,v_i&amp;lt;/math&amp;gt;) must be the same choice made by &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; in the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm (which maximizes the current &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
Therefore, the greedy algorithm for max-cut can in fact be seen as a derandomization of the average-case argument.&lt;br /&gt;
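&lt;br /&gt;
The &#039;&#039;MonotonePath&#039;&#039; algorithm can be implemented directly by evaluating the conditional expectations (a Python sketch with our own representation; side[v] is None while &amp;lt;math&amp;gt;X_v&amp;lt;/math&amp;gt; is still random):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Derandomization by conditional expectation: the conditional expectation&lt;br /&gt;
# of the cut size equals (#crossing edges among decided vertices)&lt;br /&gt;
# + (#edges with an undecided endpoint) / 2.&lt;br /&gt;
def monotone_path_cut(n, edges):&lt;br /&gt;
    side = [None] * n   # None: undecided; 0: in S; 1: in T&lt;br /&gt;
&lt;br /&gt;
    def cond_exp():&lt;br /&gt;
        e = 0.0&lt;br /&gt;
        for u, v in edges:&lt;br /&gt;
            if side[u] is None or side[v] is None:&lt;br /&gt;
                e += 0.5   # Pr[X_u != X_v] = 1/2 while undecided&lt;br /&gt;
            elif side[u] != side[v]:&lt;br /&gt;
                e += 1.0   # already a crossing edge&lt;br /&gt;
        return e&lt;br /&gt;
&lt;br /&gt;
    for v in range(n):   # fix the bits one after another&lt;br /&gt;
        side[v] = 0&lt;br /&gt;
        exp_if_S = cond_exp()&lt;br /&gt;
        side[v] = 1&lt;br /&gt;
        exp_if_T = cond_exp()&lt;br /&gt;
        side[v] = 0 if exp_if_S &amp;gt;= exp_if_T else 1&lt;br /&gt;
    S = {v for v in range(n) if side[v] == 0}&lt;br /&gt;
    return S, set(range(n)) - S&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Each choice made here coincides with the choice made by &#039;&#039;GreedyMaxCut&#039;&#039;, as the exercise above asks you to verify.&lt;br /&gt;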
&lt;br /&gt;
== Derandomization by pairwise independence ==&lt;br /&gt;
We still construct a random bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. But this time the random choices have &#039;&#039;&#039;bounded independence&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
For each vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt;, we use a Boolean random variable &amp;lt;math&amp;gt;Y_v\in\{0,1\}&amp;lt;/math&amp;gt; to indicate whether &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. The dependencies between the &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s are to be specified later.&lt;br /&gt;
&lt;br /&gt;
By linearity of expectation, regardless of the dependencies between &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s, it holds that:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]=\sum_{uv\in E} \Pr[Y_u\neq Y_v].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
In order to have the average cut &amp;lt;math&amp;gt;\mathbb{E}[|E(S,T)|]=\frac{|E|}{2}&amp;lt;/math&amp;gt; as the fully random case, we need &amp;lt;math&amp;gt;\Pr[Y_u\neq Y_v]=\frac{1}{2}&amp;lt;/math&amp;gt;. This only requires that the Boolean random variables &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s are uniform and &#039;&#039;&#039;pairwise independent&#039;&#039;&#039; instead of being &#039;&#039;&#039;mutually independent&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; pairwise independent random bits &amp;lt;math&amp;gt;\{Y_v\}_{v\in V}&amp;lt;/math&amp;gt; can be constructed from at most &amp;lt;math&amp;gt;k=\lceil\log (n+1)\rceil&amp;lt;/math&amp;gt; mutually independent random bits &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt; via the following standard routine.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
:Let &amp;lt;math&amp;gt;X_1, X_2, \ldots, X_k\in\{0,1\}&amp;lt;/math&amp;gt; be mutually independent uniform random bits. &lt;br /&gt;
:Let &amp;lt;math&amp;gt;S_1, S_2, \ldots, S_{2^k-1}\subseteq \{1,2,\ldots,k\}&amp;lt;/math&amp;gt; enumerate the &amp;lt;math&amp;gt;2^k-1&amp;lt;/math&amp;gt; nonempty subsets of &amp;lt;math&amp;gt;\{1,2,\ldots,k\}&amp;lt;/math&amp;gt;. &lt;br /&gt;
:For each &amp;lt;math&amp;gt;1\le i\le2^k-1&amp;lt;/math&amp;gt;, let &lt;br /&gt;
::&amp;lt;math&amp;gt;Y_i=\bigoplus_{j\in S_i}X_j=\left(\sum_{j\in S_i}X_j\right)\bmod 2.&amp;lt;/math&amp;gt;&lt;br /&gt;
:Then &amp;lt;math&amp;gt;Y_1,Y_2,\ldots,Y_{2^k-1}&amp;lt;/math&amp;gt; are pairwise independent uniform random bits.&lt;br /&gt;
}}&lt;br /&gt;
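&lt;br /&gt;
The construction is easy to implement. Below is a short Python sketch (ours, for illustration only) that reads the subset &amp;lt;math&amp;gt;S_i&amp;lt;/math&amp;gt; off the binary representation of &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;, so that &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; is the XOR of the &amp;lt;math&amp;gt;X_j&amp;lt;/math&amp;gt;&#039;s selected by the 1-bits of &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import random&lt;br /&gt;
&lt;br /&gt;
def pairwise_independent_bits(k, rng=random):&lt;br /&gt;
    x = [rng.randrange(2) for _ in range(k)]   # k mutually independent fair bits&lt;br /&gt;
    y = []&lt;br /&gt;
    for i in range(1, 2 ** k):                 # nonempty subsets, i = 1 .. 2^k - 1&lt;br /&gt;
        bit = 0&lt;br /&gt;
        for j in range(k):&lt;br /&gt;
            if (i &amp;gt;&amp;gt; j) &amp;amp; 1:          # bit j of i selects X_{j+1}&lt;br /&gt;
                bit ^= x[j]&lt;br /&gt;
        y.append(bit)&lt;br /&gt;
    return y&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;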
&lt;br /&gt;
If &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt; for each vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt; is constructed in this way from at most &amp;lt;math&amp;gt;k=\lceil\log (n+1)\rceil&amp;lt;/math&amp;gt; mutually independent random bits &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt;, then the &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s are uniform and pairwise independent, and by the above calculation it holds for the corresponding bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; that&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]=\sum_{uv\in E} \Pr[Y_u\neq Y_v]=\frac{|E|}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Note that the average is taken over the random choices of &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt; (because they are the only random choices used to construct the bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt;). By the probabilistic method, there must exist an assignment of &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt; such that the corresponding &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s and the bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; indicated by the &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s have that &lt;br /&gt;
:&amp;lt;math&amp;gt;|E(S,T)|\ge \frac{|E|}{2}\ge\frac{OPT}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This gives us the following algorithm, an exhaustive search in a smaller solution space of size &amp;lt;math&amp;gt;2^k=O(n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Algorithm|&lt;br /&gt;
:Enumerate vertices as &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:let &amp;lt;math&amp;gt;k=\lceil\log (n+1)\rceil&amp;lt;/math&amp;gt;;&lt;br /&gt;
:for all &amp;lt;math&amp;gt;\vec{x}\in\{0,1\}^k&amp;lt;/math&amp;gt;&lt;br /&gt;
::initialize &amp;lt;math&amp;gt;S_{\vec{x}}=T_{\vec{x}}=\emptyset&amp;lt;/math&amp;gt;;&lt;br /&gt;
::for &amp;lt;math&amp;gt;i=1, 2, \ldots, n&amp;lt;/math&amp;gt;&lt;br /&gt;
:::if &amp;lt;math&amp;gt;\bigoplus_{j:\lfloor i/2^{j-1}\rfloor\bmod 2=1}x_j=1&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S_{\vec{x}}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:::else &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;T_{\vec{x}}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:return the &amp;lt;math&amp;gt;\{S_{\vec{x}},T_{\vec{x}}\}&amp;lt;/math&amp;gt; with the largest &amp;lt;math&amp;gt;|E(S_{\vec{x}},T_{\vec{x}})|&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
The algorithm has approximation ratio 1/2 and runs in polynomial time.&lt;br /&gt;
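&lt;br /&gt;
The whole derandomized algorithm fits in a few lines. The following Python sketch (ours) enumerates all &amp;lt;math&amp;gt;2^k&amp;lt;/math&amp;gt; seeds and keeps the best induced bipartition; vertex &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; receives the parity of the bitwise AND of the seed and &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;, which is exactly the subset-XOR &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; from the construction above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def derandomized_max_cut(n, edges):&lt;br /&gt;
    # edges: 0-indexed pairs; side[u] is the bit Y_{u+1} of vertex v_{u+1}&lt;br /&gt;
    k = math.ceil(math.log2(n + 1))&lt;br /&gt;
    best_side, best_cut = None, -1&lt;br /&gt;
    for seed in range(2 ** k):&lt;br /&gt;
        side = [bin(seed &amp;amp; i).count(&#039;1&#039;) % 2 for i in range(1, n + 1)]&lt;br /&gt;
        cut = sum(1 for u, v in edges if side[u] != side[v])&lt;br /&gt;
        if cut &amp;gt; best_cut:&lt;br /&gt;
            best_side, best_cut = side, cut&lt;br /&gt;
    return best_side, best_cut&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Since the average cut size over all seeds is &amp;lt;math&amp;gt;|E|/2&amp;lt;/math&amp;gt;, the maximum found by the loop is at least &amp;lt;math&amp;gt;|E|/2&amp;lt;/math&amp;gt;.&lt;br /&gt;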
&lt;br /&gt;
= Spectral Cut =&lt;br /&gt;
== Expansion, Conductance and Sparsest Cut in Regular Graphs==&lt;br /&gt;
Consider an undirected &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular (multi-)graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, where parallel edges between two vertices are allowed.&lt;br /&gt;
For &amp;lt;math&amp;gt;S,T\subset V&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;E(S,T)=\{uv\in E\mid u\in S,v\in T\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (Edge expansion)|&lt;br /&gt;
:The &#039;&#039;&#039;Edge expansion&#039;&#039;&#039; of an undirected graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices is defined as&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
h(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}} \frac{|E(S, \bar{S})|}{|S|}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
As a side note, the edge expansion of an irregular graph has exactly the same definition.&lt;br /&gt;
Edge expansion is very hard to approximate. Under the [https://en.wikipedia.org/wiki/Unique_games_conjecture Unique games conjecture], it is NP-hard to approximate within any constant factor.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (Conductance)|&lt;br /&gt;
:The &#039;&#039;&#039;Conductance&#039;&#039;&#039; of an undirected &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices is defined as&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\varphi(G)=h(G)/d.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (Sparsest cut)|&lt;br /&gt;
:The &#039;&#039;&#039;Sparsest cut&#039;&#039;&#039; of an undirected graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, is a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; minimizing&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\frac{|E(S, \bar{S})|}{\min\{|S|,|\bar{S}|\}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
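&lt;br /&gt;
To make the definitions concrete, here is a brute-force Python sketch (ours; exponential time, for tiny graphs only) that evaluates the minimization in the definition of edge expansion; dividing the result by &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt; gives the conductance of a &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular graph.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from itertools import combinations&lt;br /&gt;
&lt;br /&gt;
def edge_expansion(n, edges):&lt;br /&gt;
    best = float(&#039;inf&#039;)&lt;br /&gt;
    for size in range(1, n // 2 + 1):          # nonempty S with |S| &amp;lt;= n/2&lt;br /&gt;
        for S in combinations(range(n), size):&lt;br /&gt;
            S = set(S)&lt;br /&gt;
            crossing = sum(1 for u, v in edges if (u in S) != (v in S))&lt;br /&gt;
            best = min(best, crossing / len(S))&lt;br /&gt;
    return best&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;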
&lt;br /&gt;
== Spectrum of Regular Graphs==&lt;br /&gt;
The &#039;&#039;&#039;adjacency matrix&#039;&#039;&#039; of an &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;-vertex graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;A = A(G)&amp;lt;/math&amp;gt;, is an &amp;lt;math&amp;gt;n\times n&amp;lt;/math&amp;gt; matrix where &amp;lt;math&amp;gt;A(u,v)&amp;lt;/math&amp;gt; is the number of edges in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; between vertex &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and vertex &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;. Because &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; is a symmetric matrix with real entries, due to the [https://en.wikipedia.org/wiki/Spectral_theorem Spectral theorem], it has real eigenvalues &amp;lt;math&amp;gt;\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n&amp;lt;/math&amp;gt;, associated with an orthonormal system of eigenvectors &amp;lt;math&amp;gt;v_1,v_2,\ldots, v_n\,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;Av_i=\lambda_i v_i\,&amp;lt;/math&amp;gt;. We call the eigenvalues of &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; the &#039;&#039;&#039;spectrum&#039;&#039;&#039; of the graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The spectrum of a graph carries a lot of information about the graph. For example, if &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular, the following lemma holds.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma|&lt;br /&gt;
# &amp;lt;math&amp;gt;|\lambda_i|\le d&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le n&amp;lt;/math&amp;gt;.&lt;br /&gt;
# &amp;lt;math&amp;gt;\lambda_1=d&amp;lt;/math&amp;gt; and the corresponding eigenvector is &amp;lt;math&amp;gt;\vec{1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
# &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is connected if and only if &amp;lt;math&amp;gt;\lambda_2&amp;lt;\lambda_1&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the adjacency matrix of &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, with entries &amp;lt;math&amp;gt;a_{ij}&amp;lt;/math&amp;gt;. It is obvious that &amp;lt;math&amp;gt;\sum_{j}a_{ij}=d\,&amp;lt;/math&amp;gt; for any &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;.&lt;br /&gt;
*(1) Suppose that &amp;lt;math&amp;gt;Ax=\lambda x, x\neq \mathbf{0}&amp;lt;/math&amp;gt;, and let &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; be an entry of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with the largest absolute value. Since &amp;lt;math&amp;gt;(Ax)_i=\lambda x_i&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_{j}a_{ij}x_j=\lambda x_i,\,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:and so&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
|\lambda||x_i|=\left|\sum_{j}a_{ij}x_j\right|\le \sum_{j}a_{ij}|x_j|\le \sum_{j}a_{ij}|x_i| \le d|x_i|.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:Thus &amp;lt;math&amp;gt;|\lambda|\le d&amp;lt;/math&amp;gt;.&lt;br /&gt;
*(2) is easy to check.&lt;br /&gt;
*(3) Suppose that &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is connected. Let &amp;lt;math&amp;gt;x\neq 0&amp;lt;/math&amp;gt; be an eigenvector for which &amp;lt;math&amp;gt;Ax=dx&amp;lt;/math&amp;gt;. Without loss of generality we can assume that &amp;lt;math&amp;gt;\max_i x_i&amp;gt;0 &amp;lt;/math&amp;gt;.&lt;br /&gt;
:We first show that &amp;lt;math&amp;gt;\forall i\left(x_i = \max_j x_j \implies \forall k\sim i, x_k=x_i \right).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:This claim will imply that &amp;lt;math&amp;gt;x=c\vec{1}&amp;lt;/math&amp;gt; for a connected graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, so we prove it next, by contradiction. &lt;br /&gt;
:Let &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; be an entry of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with the largest value. Suppose that &amp;lt;math&amp;gt;\exists k\sim i, x_k &amp;lt; x_i&amp;lt;/math&amp;gt;. Since &amp;lt;math&amp;gt;\sum_{j}a_{ij}=d&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_j \le x_i&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_{j}a_{ij}x_j &amp;lt; d x_i.\,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:However,&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
(Ax)_i=d x_i \implies \sum_{j}a_{ij}x_j=d x_i.\,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:A contradiction. So it follows that &amp;lt;math&amp;gt;x_j=x_i&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;a_{ij}&amp;gt;0&amp;lt;/math&amp;gt;, which verifies the claim. Since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is connected, &amp;lt;math&amp;gt;x=c\vec{1}&amp;lt;/math&amp;gt;, and the eigenvalue &amp;lt;math&amp;gt;d=\lambda_1&amp;lt;/math&amp;gt; has multiplicity 1, thus &amp;lt;math&amp;gt;\lambda_1&amp;gt;\lambda_2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:Conversely, suppose that &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is disconnected with components &amp;lt;math&amp;gt;G_1 \uplus G_2 = G&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;&lt;br /&gt;
A(G) = \begin{pmatrix}&lt;br /&gt;
A(G_1) &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; A(G_2)&lt;br /&gt;
\end{pmatrix}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:Therefore, &amp;lt;math&amp;gt;A\vec{1}_{G_1}=d\vec{1}_{G_1}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A\vec{1}_{G_2}=d\vec{1}_{G_2}&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\vec{1}_{G_1}, \vec{1}_{G_2}&amp;lt;/math&amp;gt; are all-one vectors only supported  on &amp;lt;math&amp;gt;G_1, G_2&amp;lt;/math&amp;gt; respectively. Since &amp;lt;math&amp;gt;\vec{1}_{G_1}, \vec{1}_{G_2}&amp;lt;/math&amp;gt; are linearly independent, the multiplicity of eigenvalue &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt; is greater than 1, so &amp;lt;math&amp;gt;\lambda_1=\lambda_2&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
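&lt;br /&gt;
The lemma can also be checked numerically. The following numpy sketch (ours) does so for the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;-cycle, a connected 2-regular graph.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
n, d = 8, 2&lt;br /&gt;
A = np.zeros((n, n))&lt;br /&gt;
for i in range(n):                             # adjacency matrix of the n-cycle&lt;br /&gt;
    A[i, (i + 1) % n] += 1&lt;br /&gt;
    A[i, (i - 1) % n] += 1&lt;br /&gt;
evals = np.sort(np.linalg.eigvalsh(A))[::-1]   # lambda_1 &amp;gt;= ... &amp;gt;= lambda_n&lt;br /&gt;
assert all(abs(ev) &amp;lt;= d + 1e-9 for ev in evals)   # (1) |lambda_i| &amp;lt;= d&lt;br /&gt;
assert abs(evals[0] - d) &amp;lt; 1e-9                   # (2) lambda_1 = d&lt;br /&gt;
assert evals[1] &amp;lt; d - 1e-9                        # (3) connected: lambda_2 &amp;lt; d&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;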
&lt;br /&gt;
== Spectral Partitioning Algorithm ==&lt;br /&gt;
This is a popular heuristic in practice:&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Spectral Partitioning Algorithm|&lt;br /&gt;
# Compute the second largest eigenvalue of the adjacency matrix and its corresponding eigenvector &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
# Sort the vertices &amp;lt;math&amp;gt; V=\{ u_1, \ldots, u_n\} &amp;lt;/math&amp;gt; so that &amp;lt;math&amp;gt; x(u_1) \ge x(u_2) \ge \ldots \ge x(u_n) &amp;lt;/math&amp;gt;&lt;br /&gt;
# Let &amp;lt;math&amp;gt; S_i:=\begin{cases}&lt;br /&gt;
\{u_1,u_2,\ldots,u_i\}, \qquad&amp;amp;\hbox{ if }i\le n/2 \\&lt;br /&gt;
V \setminus \{u_1,u_2,\ldots,u_i\}, &amp;amp;\hbox{ otherwise}&lt;br /&gt;
\end{cases} &amp;lt;/math&amp;gt;, and output the &amp;lt;math&amp;gt;S_{i^*}&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt; i^* = \arg\min_{1\le i \le n} \varphi(S_i) &amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\varphi(S):=\frac{|E(S,\bar{S})|}{d|S|}&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The performance guarantee of this algorithm comes from Cheeger&#039;s inequality. In the worst case, it only guarantees that the output set &amp;lt;math&amp;gt;S_i&amp;lt;/math&amp;gt; has conductance &amp;lt;math&amp;gt;\varphi(S_i) \le 2 \sqrt{\varphi(G)}&amp;lt;/math&amp;gt;. &lt;br /&gt;
In practice, however, the performance is usually much better. We will discuss an improved Cheeger&#039;s inequality under additional assumptions on the spectral gap later in class, which explains the effectiveness of the algorithm in practice.&lt;br /&gt;
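&lt;br /&gt;
Here is a compact numpy sketch (ours, for illustration) of the heuristic for a &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular graph given by its adjacency matrix: compute the eigenvector of the second largest eigenvalue, sort the vertices by it, and sweep over the prefix/suffix cuts.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def spectral_partition(A, d):&lt;br /&gt;
    n = A.shape[0]&lt;br /&gt;
    evals, evecs = np.linalg.eigh(A)    # eigenvalues in ascending order&lt;br /&gt;
    x = evecs[:, -2]                    # eigenvector of the 2nd largest eigenvalue&lt;br /&gt;
    order = np.argsort(-x)              # sort vertices by decreasing x(u)&lt;br /&gt;
    best_set, best_phi = None, float(&#039;inf&#039;)&lt;br /&gt;
    for i in range(1, n):&lt;br /&gt;
        S = set(order[:i]) if i &amp;lt;= n // 2 else set(order[i:])&lt;br /&gt;
        crossing = sum(A[u, v] for u in S for v in range(n) if v not in S)&lt;br /&gt;
        phi = crossing / (d * len(S))   # conductance of the candidate cut&lt;br /&gt;
        if phi &amp;lt; best_phi:&lt;br /&gt;
            best_set, best_phi = S, phi&lt;br /&gt;
    return best_set, best_phi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;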
&lt;br /&gt;
== Graph visualization ==&lt;br /&gt;
See slides.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Min_Cut,_Max_Cut,_and_Spectral_Cut&amp;diff=12638</id>
		<title>高级算法 (Fall 2024)/Min Cut, Max Cut, and Spectral Cut</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Min_Cut,_Max_Cut,_and_Spectral_Cut&amp;diff=12638"/>
		<updated>2024-10-04T07:47:16Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Graph Cut */  Remove the definition of graph cut via connectivity&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Graph Cut =&lt;br /&gt;
Let &amp;lt;math&amp;gt;G(V, E)&amp;lt;/math&amp;gt; be an undirected graph.&lt;br /&gt;
Let &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; be a &#039;&#039;&#039;bipartition&#039;&#039;&#039; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into nonempty subsets &amp;lt;math&amp;gt;S,T\subseteq V&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;S\cap T=\emptyset&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;S\cup T=V&amp;lt;/math&amp;gt;.  A cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is specified by this bipartition as&lt;br /&gt;
:&amp;lt;math&amp;gt;C=E(S,T)\,&amp;lt;/math&amp;gt;,&lt;br /&gt;
where &amp;lt;math&amp;gt;E(S,T)&amp;lt;/math&amp;gt; denotes the set of &amp;quot;crossing edges&amp;quot; with one endpoint in each of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;, formally defined as&lt;br /&gt;
:&amp;lt;math&amp;gt;E(S,T)=\{uv\in E\mid u\in S, v\in T\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Given a graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, there might be many cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, and we are interested in finding the &#039;&#039;&#039;minimum&#039;&#039;&#039; or &#039;&#039;&#039;maximum&#039;&#039;&#039; cut.&lt;br /&gt;
&lt;br /&gt;
= Min-Cut =&lt;br /&gt;
The &#039;&#039;&#039;min-cut problem&#039;&#039;&#039;, also called the &#039;&#039;&#039;global minimum cut problem&#039;&#039;&#039;, is defined as follows.&lt;br /&gt;
{{Theorem|Min-cut problem|&lt;br /&gt;
*&#039;&#039;&#039;Input&#039;&#039;&#039;: an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output&#039;&#039;&#039;: a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; with the smallest size &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Equivalently, the problem asks to find a bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into disjoint non-empty subsets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that minimizes &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We consider the problem in a slightly more generalized setting, where the input graphs &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; can be &#039;&#039;&#039;multi-graphs&#039;&#039;&#039;, meaning that there could be multiple &#039;&#039;&#039;parallel edges&#039;&#039;&#039; between two vertices &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;. The cuts in multi-graphs are defined in the same way as before, and the cost of a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is given by the total number of edges (including parallel edges) in &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;. Equivalently, one may think of a multi-graph as a graph with integer edge weights, and the cost of a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is the total weights of all edges in &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A canonical deterministic algorithm for this problem is through the [http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem max-flow min-cut theorem]. The max-flow algorithm finds a minimum &#039;&#039;&#039;&amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt;-&amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; cut&#039;&#039;&#039;, which disconnects a &#039;&#039;&#039;source&#039;&#039;&#039; &amp;lt;math&amp;gt;s\in V&amp;lt;/math&amp;gt; from a &#039;&#039;&#039;sink&#039;&#039;&#039; &amp;lt;math&amp;gt;t\in V&amp;lt;/math&amp;gt;, both specified as part of the input. A global min cut can be found by exhaustively computing the minimum &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt;-&amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; cut for an arbitrarily fixed source &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; and all possible sinks &amp;lt;math&amp;gt;t\neq s&amp;lt;/math&amp;gt;. This takes &amp;lt;math&amp;gt;(n-1)\times&amp;lt;/math&amp;gt;max-flow time where &amp;lt;math&amp;gt;n=|V|&amp;lt;/math&amp;gt; is the number of vertices.&lt;br /&gt;
&lt;br /&gt;
The fastest known deterministic algorithm for the minimum cut problem on multi-graphs is the [https://en.wikipedia.org/wiki/Stoer–Wagner_algorithm Stoer–Wagner algorithm], which achieves an &amp;lt;math&amp;gt;O(mn+n^2\log n)&amp;lt;/math&amp;gt; time complexity where &amp;lt;math&amp;gt;m=|E|&amp;lt;/math&amp;gt; is the total number of edges (counting the parallel edges).&lt;br /&gt;
&lt;br /&gt;
If we restrict the input to &#039;&#039;&#039;simple graphs&#039;&#039;&#039; (meaning there are no parallel edges) with no edge weights, there are better algorithms. A deterministic algorithm of [https://dl.acm.org/citation.cfm?id=2746588 Ken-ichi Kawarabayashi and Mikkel Thorup] published in STOC 2015 achieves near-linear (in the number of edges) time complexity.&lt;br /&gt;
&lt;br /&gt;
== Karger&#039;s &#039;&#039;Contraction&#039;&#039; algorithm ==&lt;br /&gt;
We will describe a simple and elegant randomized algorithm for the min-cut problem. The algorithm is due to [http://people.csail.mit.edu/karger/ David Karger].&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;G(V, E)&amp;lt;/math&amp;gt; be a &#039;&#039;&#039;multi-graph&#039;&#039;&#039;, which allows more than one &#039;&#039;&#039;parallel edges&#039;&#039;&#039; between two distinct vertices &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; but does not allow any &#039;&#039;&#039;self-loops&#039;&#039;&#039;: the edges that adjoin a vertex to itself. A multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; can be represented by an adjacency matrix &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, in the way that each non-diagonal entry &amp;lt;math&amp;gt;A(u,v)&amp;lt;/math&amp;gt; takes nonnegative integer values instead of just 0 or 1, representing the number of parallel edges between &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, and all diagonal entries &amp;lt;math&amp;gt;A(v,v)=0&amp;lt;/math&amp;gt; (since there is no self-loop).&lt;br /&gt;
&lt;br /&gt;
Given a multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt; and an edge &amp;lt;math&amp;gt;e\in E&amp;lt;/math&amp;gt;, we define the following &#039;&#039;&#039;contraction&#039;&#039;&#039; operator Contract(&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;e&amp;lt;/math&amp;gt;), which transforms &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; into a new multi-graph.&lt;br /&gt;
{{Theorem|The contraction operator &#039;&#039;Contract&#039;&#039;(&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;e&amp;lt;/math&amp;gt;)|&lt;br /&gt;
:say &amp;lt;math&amp;gt;e=uv&amp;lt;/math&amp;gt;:&lt;br /&gt;
:*replace &amp;lt;math&amp;gt;\{u,v\}&amp;lt;/math&amp;gt; by a new vertex &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;;&lt;br /&gt;
:*for every edge (parallel or not) of the form &amp;lt;math&amp;gt;uw&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;vw&amp;lt;/math&amp;gt; that connects one of &amp;lt;math&amp;gt;\{u,v\}&amp;lt;/math&amp;gt; to a vertex &amp;lt;math&amp;gt;w\in V\setminus\{u,v\}&amp;lt;/math&amp;gt;, replace it by a new edge &amp;lt;math&amp;gt;xw&amp;lt;/math&amp;gt;;&lt;br /&gt;
:*the rest of the graph does not change.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
In other words, &amp;lt;math&amp;gt;Contract(G,uv)&amp;lt;/math&amp;gt; merges the two vertices &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; into a new vertex &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, whose incident edges preserve the edges incident to &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in the original graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, except for the parallel edges between &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;, which are removed. Now you should realize why we consider multi-graphs instead of simple graphs: even if we start with a simple graph without parallel edges, the contraction operator may create parallel edges.&lt;br /&gt;
&lt;br /&gt;
The contraction operator is illustrated by the following picture:&lt;br /&gt;
[[Image:Contract.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
Karger&#039;s algorithm uses a simple idea: &lt;br /&gt;
*At each step we randomly select an edge in the current multi-graph to contract until there are only two vertices left. &lt;br /&gt;
*The parallel edges between these two remaining vertices must be a cut of the original graph. &lt;br /&gt;
*We return this cut and hope that with good chance this gives us a minimum cut.&lt;br /&gt;
The following is the pseudocode for Karger&#039;s algorithm.&lt;br /&gt;
{{Theorem|&#039;&#039;RandomContract&#039;&#039; (Karger 1993)|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:while &amp;lt;math&amp;gt;|V|&amp;gt;2&amp;lt;/math&amp;gt; do&lt;br /&gt;
:* choose an edge &amp;lt;math&amp;gt;uv\in E&amp;lt;/math&amp;gt; uniformly at random;&lt;br /&gt;
:* &amp;lt;math&amp;gt;G=Contract(G,uv)&amp;lt;/math&amp;gt;; &lt;br /&gt;
:return &amp;lt;math&amp;gt;C=E&amp;lt;/math&amp;gt; (the parallel edges between the only two vertices in &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt;);&lt;br /&gt;
}}&lt;br /&gt;
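&lt;br /&gt;
For concreteness, here is a minimal Python sketch (ours, not the official implementation) of &#039;&#039;RandomContract&#039;&#039;, based on the vertex-class view described below: contracting &amp;lt;math&amp;gt;uv&amp;lt;/math&amp;gt; merges two classes in a union-find structure, and self-loops are the edges whose endpoints fall in the same class.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import random&lt;br /&gt;
&lt;br /&gt;
def random_contract(n, edges, rng=random):&lt;br /&gt;
    # edges: list of 0-indexed pairs; assumes a connected multigraph&lt;br /&gt;
    parent = list(range(n))&lt;br /&gt;
    def find(v):                         # union-find representative of v&#039;s class&lt;br /&gt;
        while parent[v] != v:&lt;br /&gt;
            parent[v] = parent[parent[v]]&lt;br /&gt;
            v = parent[v]&lt;br /&gt;
        return v&lt;br /&gt;
    remaining, E = n, list(edges)&lt;br /&gt;
    while remaining &amp;gt; 2:&lt;br /&gt;
        E = [(u, v) for u, v in E if find(u) != find(v)]   # discard self-loops&lt;br /&gt;
        u, v = E[rng.randrange(len(E))]  # a uniform random remaining edge&lt;br /&gt;
        parent[find(v)] = find(u)        # contract uv: merge the two classes&lt;br /&gt;
        remaining -= 1&lt;br /&gt;
    return [(u, v) for u, v in E if find(u) != find(v)]    # the output cut&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;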
&lt;br /&gt;
Another way of looking at the contraction operator Contract(&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;e&amp;lt;/math&amp;gt;) is that we are dealing with classes of vertices. Let &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt; be the set of all vertices. We start with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertex classes &amp;lt;math&amp;gt;S_1,S_2,\ldots, S_n&amp;lt;/math&amp;gt;, each class &amp;lt;math&amp;gt;S_i=\{v_i\}&amp;lt;/math&amp;gt; containing a single vertex. By calling &amp;lt;math&amp;gt;Contract(G,uv)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;u\in S_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v\in S_j&amp;lt;/math&amp;gt; for distinct &amp;lt;math&amp;gt;i\neq j&amp;lt;/math&amp;gt;, we take the union of &amp;lt;math&amp;gt;S_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;S_j&amp;lt;/math&amp;gt;. The edges in the contracted multi-graph are the edges that cross between different vertex classes.&lt;br /&gt;
&lt;br /&gt;
This view of contraction is illustrated by the following picture:&lt;br /&gt;
[[Image:Contract_class.png|600px|center]]&lt;br /&gt;
&lt;br /&gt;
The following claim is left as an exercise for the class:&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*With suitable choice of data structures, each operation &amp;lt;math&amp;gt;Contract(G,e)&amp;lt;/math&amp;gt; can be implemented within running time &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;n=|V|&amp;lt;/math&amp;gt; is the number of vertices.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In the above &#039;&#039;&#039;&#039;&#039;RandomContract&#039;&#039;&#039;&#039;&#039; algorithm, there are precisely &amp;lt;math&amp;gt;n-2&amp;lt;/math&amp;gt; contractions. Therefore, we have the following time upper bound.&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
: For any multigraph with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the running time of the &#039;&#039;&#039;&#039;&#039;RandomContract&#039;&#039;&#039;&#039;&#039; algorithm is &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
We emphasize that this is the time complexity of a &amp;quot;single run&amp;quot; of the algorithm: later we will see that we may need to run this algorithm many times to guarantee a desirable accuracy.&lt;br /&gt;
&lt;br /&gt;
== Analysis of accuracy ==&lt;br /&gt;
We now analyze the performance of the above algorithm. Since the algorithm is &#039;&#039;&#039;&#039;&#039;randomized&#039;&#039;&#039;&#039;&#039;, its output cut is a random variable even when the input is fixed, so &#039;&#039;the output may not always be correct&#039;&#039;. We want to give a theoretical guarantee of the chance that the algorithm returns a correct answer on an arbitrary input.&lt;br /&gt;
&lt;br /&gt;
More precisely, on an arbitrarily fixed input multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, we want to answer the following question rigorously:&lt;br /&gt;
:&amp;lt;math&amp;gt;p_{\text{correct}}=\Pr[\,\text{a minimum cut is returned by }RandomContract\,]\ge ?&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To answer this question, we prove a stronger statement: for arbitrarily fixed input multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and a particular minimum cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, &lt;br /&gt;
:&amp;lt;math&amp;gt;p_{C}=\Pr[\,C\mbox{ is returned by }RandomContract\,]\ge ?&amp;lt;/math&amp;gt;&lt;br /&gt;
Obviously this will imply the previous lower bound for &amp;lt;math&amp;gt;p_{\text{correct}}&amp;lt;/math&amp;gt; because the event in &amp;lt;math&amp;gt;p_{C}&amp;lt;/math&amp;gt; implies the event in &amp;lt;math&amp;gt;p_{\text{correct}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*In the above argument we use the simple law in probability that &amp;lt;math&amp;gt;\Pr[A]\le \Pr[B]&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;A\subseteq B&amp;lt;/math&amp;gt;, i.e. event &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; implies event &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
We introduce the following notations:&lt;br /&gt;
*Let &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt; denote the sequence of random edges chosen to contract in a running of &#039;&#039;RandomContract&#039;&#039; algorithm. &lt;br /&gt;
*Let &amp;lt;math&amp;gt;G_1=G&amp;lt;/math&amp;gt; denote the original input multi-graph. And for &amp;lt;math&amp;gt;i=1,2,\ldots,n-2&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;G_{i+1}=Contract(G_{i},e_i)&amp;lt;/math&amp;gt; be the multigraph after the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th contraction.&lt;br /&gt;
Obviously &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt; are random variables, and they are the &#039;&#039;only&#039;&#039; random choices used in the algorithm: meaning that they, along with the input &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, uniquely determine the sequence of multi-graphs &amp;lt;math&amp;gt;G_1,G_2,\ldots,G_{n-1}&amp;lt;/math&amp;gt; in every iteration as well as the final output. &lt;br /&gt;
&lt;br /&gt;
We now compute the probability &amp;lt;math&amp;gt;p_C&amp;lt;/math&amp;gt; by decomposing it into more elementary events involving &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt;. This is due to the following proposition.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Proposition 1|&lt;br /&gt;
:If &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a minimum cut in a multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e\not\in C&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is still a minimum cut in the contracted graph &amp;lt;math&amp;gt;G&#039;=contract(G,e)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
We first observe that contraction will never create new cuts: every cut in the contracted graph &amp;lt;math&amp;gt;G&#039;&amp;lt;/math&amp;gt; must also be a cut in the original graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We then observe that a cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; &amp;quot;survives&amp;quot; in the contracted graph &amp;lt;math&amp;gt;G&#039;&amp;lt;/math&amp;gt; if and only if the contracted edge &amp;lt;math&amp;gt;e\not\in C&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Both observations are easy to verify by the definition of contraction operator (in particular, easier to verify if we take the vertex class interpretation). The detailed proofs are left as an exercise.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{n-2}&amp;lt;/math&amp;gt; denote the sequence of random edges chosen to contract in a running of &#039;&#039;RandomContract&#039;&#039; algorithm.&lt;br /&gt;
&lt;br /&gt;
By Proposition 1, the event &amp;lt;math&amp;gt;\mbox{``}C\mbox{ is returned by }RandomContract\mbox{&#039;&#039;}\,&amp;lt;/math&amp;gt; is equivalent to the event &amp;lt;math&amp;gt;\mbox{``}e_i\not\in C\mbox{ for all }i=1,2,\ldots,n-2\mbox{&#039;&#039;}&amp;lt;/math&amp;gt;. Therefore:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
p_C &lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,C\mbox{ is returned by }{RandomContract}\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,e_i\not\in C\mbox{ for all }i=1,2,\ldots,n-2\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{i=1}^{n-2}\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C].&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The last equation is due to the so called &#039;&#039;&#039;chain rule&#039;&#039;&#039; in probability.&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*The &#039;&#039;&#039;chain rule&#039;&#039;&#039;, also known as the &#039;&#039;&#039;law of progressive conditioning&#039;&#039;&#039;, is the following proposition: for a sequence of events (not necessarily independent) &amp;lt;math&amp;gt;A_1,A_2,\ldots,A_n&amp;lt;/math&amp;gt;,&lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[\forall i, A_i]=\prod_{i=1}^n\Pr[A_i\mid \forall j&amp;lt;i, A_j]&amp;lt;/math&amp;gt;.&lt;br /&gt;
:It is a simple consequence of the definition of conditional probability. By definition of conditional probability, &lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[A_n\mid \forall j&amp;lt;n, A_j]=\frac{\Pr[\forall i, A_i]}{\Pr[\forall j&amp;lt;n, A_j]}&amp;lt;/math&amp;gt;, &lt;br /&gt;
:and equivalently we have&lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[\forall i, A_i]=\Pr[\forall j&amp;lt;n, A_j]\cdot\Pr[A_n\mid \forall j&amp;lt;n, A_j]&amp;lt;/math&amp;gt;.&lt;br /&gt;
:Recursively applying this to &amp;lt;math&amp;gt;\Pr[\forall j&amp;lt;n, A_j]&amp;lt;/math&amp;gt;, we obtain the chain rule.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Back to the analysis of probability &amp;lt;math&amp;gt;p_C&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Now our task is to give a lower bound to each &amp;lt;math&amp;gt;p_i=\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]&amp;lt;/math&amp;gt;. The condition &amp;lt;math&amp;gt;\mbox{``}\forall j&amp;lt;i, e_j\not\in C\mbox{&#039;&#039;}&amp;lt;/math&amp;gt; means the min-cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; survives the first &amp;lt;math&amp;gt;i-1&amp;lt;/math&amp;gt; contractions &amp;lt;math&amp;gt;e_1,e_2,\ldots,e_{i-1}&amp;lt;/math&amp;gt;, which due to Proposition 1 means that &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is also a min-cut in the multi-graph &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt; obtained from applying the first &amp;lt;math&amp;gt;(i-1)&amp;lt;/math&amp;gt; contractions.&lt;br /&gt;
&lt;br /&gt;
Then the conditional probability &amp;lt;math&amp;gt;p_i=\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]&amp;lt;/math&amp;gt; is the probability that no edge in &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is hit when a uniform random edge in the current multi-graph is chosen assuming that &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a minimum cut in the current multi-graph. Intuitively this probability should be bounded from below, because as a min-cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; should be sparse among all edges. This intuition is justified by the following proposition.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Proposition 2|&lt;br /&gt;
:If &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a min-cut in a multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;|E|\ge \frac{|V||C|}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| &lt;br /&gt;
:It must hold that the degree of each vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt; is at least &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt;, or otherwise the set of edges incident to &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; forms a cut of size smaller than &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt; which separates &amp;lt;math&amp;gt;\{v\}&amp;lt;/math&amp;gt; from the rest of the graph, contradicting that &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is a min-cut. And the bound &amp;lt;math&amp;gt;|E|\ge \frac{|V||C|}{2}&amp;lt;/math&amp;gt; follows directly from applying the [https://en.wikipedia.org/wiki/Handshaking_lemma handshaking lemma] to the fact that every vertex in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; has degree at least &amp;lt;math&amp;gt;|C|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;V_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E_i&amp;lt;/math&amp;gt; denote the vertex set and edge set of the multi-graph &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt; respectively, and recall that &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt; is the multi-graph obtained from applying the first &amp;lt;math&amp;gt;(i-1)&amp;lt;/math&amp;gt; contractions. Obviously &amp;lt;math&amp;gt;|V_{i}|=n-i+1&amp;lt;/math&amp;gt;. And due to Proposition 2, &amp;lt;math&amp;gt;|E_i|\ge \frac{|V_i||C|}{2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is still a min-cut in &amp;lt;math&amp;gt;G_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The probability &amp;lt;math&amp;gt;p_i=\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]&amp;lt;/math&amp;gt; can be computed as&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
p_i&lt;br /&gt;
&amp;amp;=1-\frac{|C|}{|E_i|}\\&lt;br /&gt;
&amp;amp;\ge1-\frac{2}{|V_i|}\\&lt;br /&gt;
&amp;amp;=1-\frac{2}{n-i+1}&lt;br /&gt;
\end{align},&amp;lt;/math&amp;gt;&lt;br /&gt;
where the inequality is due to Proposition 2. &lt;br /&gt;
&lt;br /&gt;
We now can put everything together. We arbitrarily fix the input multi-graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; and any particular minimum cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. &lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
p_{\text{correct}}&lt;br /&gt;
&amp;amp;=\Pr[\,\text{a minimum cut is returned by }RandomContract\,]\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
\Pr[\,C\mbox{ is returned by }{RandomContract}\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,e_i\not\in C\mbox{ for all }i=1,2,\ldots,n-2\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{i=1}^{n-2}\Pr[e_i\not\in C\mid \forall j&amp;lt;i, e_j\not\in C]\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
\prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{k=3}^{n}\frac{k-2}{k}\\&lt;br /&gt;
&amp;amp;= \frac{2}{n(n-1)}.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This gives us the following theorem.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Theorem|&lt;br /&gt;
: For any multigraph with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the &#039;&#039;RandomContract&#039;&#039; algorithm returns a minimum cut with probability at least &amp;lt;math&amp;gt;\frac{2}{n(n-1)}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
At first glance this seems to be a miserable chance of success. However, notice that there may be exponentially many cuts in a graph (because potentially every nonempty subset &amp;lt;math&amp;gt;S\subset V&amp;lt;/math&amp;gt; corresponds to a cut &amp;lt;math&amp;gt;C=E(S,\overline{S})&amp;lt;/math&amp;gt;), and Karger&#039;s algorithm effectively reduces this exponential-sized space of feasible solutions to one of quadratic size, an exponential improvement!&lt;br /&gt;
&lt;br /&gt;
We can run &#039;&#039;RandomContract&#039;&#039; independently for &amp;lt;math&amp;gt;t=\frac{n(n-1)\ln n}{2}&amp;lt;/math&amp;gt; times and return the smallest cut ever returned. The probability that a minimum cut is found is at least:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
&amp;amp;\quad 1-\Pr[\,\mbox{all }t\mbox{ independent runnings of } RandomContract\mbox{ fails to find a min-cut}\,] \\&lt;br /&gt;
&amp;amp;= 1-\Pr[\,\mbox{a single running of }{RandomContract}\mbox{ fails}\,]^{t} \\&lt;br /&gt;
&amp;amp;\ge 1- \left(1-\frac{2}{n(n-1)}\right)^{\frac{n(n-1)\ln n}{2}} \\&lt;br /&gt;
&amp;amp;\ge 1-\frac{1}{n}.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Recall that a run of the &#039;&#039;RandomContract&#039;&#039; algorithm takes &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt; time. Altogether this gives us a randomized algorithm which runs in time &amp;lt;math&amp;gt;O(n^4\log n)&amp;lt;/math&amp;gt; and finds a minimum cut [https://en.wikipedia.org/wiki/With_high_probability &#039;&#039;&#039;with high probability&#039;&#039;&#039;].&lt;br /&gt;
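&lt;br /&gt;
Using the random_contract sketch from above, this amplification by independent repetition is only a few more lines (again merely an illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def min_cut_whp(n, edges):&lt;br /&gt;
    t = math.ceil(n * (n - 1) * math.log(n) / 2)   # number of independent runs&lt;br /&gt;
    best = None&lt;br /&gt;
    for _ in range(t):&lt;br /&gt;
        cut = random_contract(n, edges)&lt;br /&gt;
        if best is None or len(cut) &amp;lt; len(best):&lt;br /&gt;
            best = cut&lt;br /&gt;
    return best    # a minimum cut with probability at least 1 - 1/n&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;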
&lt;br /&gt;
== A Corollary by the Probabilistic Method ==&lt;br /&gt;
The analysis of Karger&#039;s algorithm implies the following combinatorial proposition for the number of distinct minimum cuts in a graph.&lt;br /&gt;
{{Theorem|Corollary|&lt;br /&gt;
:For any graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the number of distinct minimum cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is at most &amp;lt;math&amp;gt;\frac{n(n-1)}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}&amp;lt;/math&amp;gt; denote the set of all minimum cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. For each min-cut &amp;lt;math&amp;gt;C\in\mathcal{C}&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;A_C&amp;lt;/math&amp;gt; denote the event &amp;quot;&amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is returned by &#039;&#039;RandomContract&#039;&#039;&amp;quot;, whose probability is given by &lt;br /&gt;
:&amp;lt;math&amp;gt;p_C=\Pr[A_C]\,&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Clearly we have:&lt;br /&gt;
* for any distinct &amp;lt;math&amp;gt;C,D\in\mathcal{C}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;A_C\,&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A_{D}\,&amp;lt;/math&amp;gt; are &#039;&#039;&#039;disjoint events&#039;&#039;&#039;; and&lt;br /&gt;
* the union &amp;lt;math&amp;gt;\bigcup_{C\in\mathcal{C}}A_C&amp;lt;/math&amp;gt; is precisely the event &amp;quot;a minimum cut is returned by &#039;&#039;RandomContract&#039;&#039;&amp;quot;, whose probability is given by&lt;br /&gt;
::&amp;lt;math&amp;gt;p_{\text{correct}}=\Pr[\,\text{a minimum cut is returned by } RandomContract\,]&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to the [https://en.wikipedia.org/wiki/Probability_axioms#Third_axiom &#039;&#039;&#039;additivity of probability&#039;&#039;&#039;], it holds that&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
p_{\text{correct}}=\sum_{C\in\mathcal{C}}\Pr[A_C]=\sum_{C\in\mathcal{C}}p_C.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By the analysis of Karger&#039;s algorithm, we know &amp;lt;math&amp;gt;p_C\ge\frac{2}{n(n-1)}&amp;lt;/math&amp;gt;. And since &amp;lt;math&amp;gt;p_{\text{correct}}&amp;lt;/math&amp;gt; is a well defined probability, due to the [https://en.wikipedia.org/wiki/Probability_axioms#Second_axiom &#039;&#039;&#039;unitarity of probability&#039;&#039;&#039;], it must hold that &amp;lt;math&amp;gt;p_{\text{correct}}\le 1&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;1\ge p_{\text{correct}}=\sum_{C\in\mathcal{C}}p_C\ge|\mathcal{C}|\frac{2}{n(n-1)}&amp;lt;/math&amp;gt;,&lt;br /&gt;
which means &amp;lt;math&amp;gt;|\mathcal{C}|\le\frac{n(n-1)}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Note that the statement of this theorem has no randomness at all, while the proof consists of a randomized procedure. This is an example of [http://en.wikipedia.org/wiki/Probabilistic_method the probabilistic method].&lt;br /&gt;
&lt;br /&gt;
== Fast Min-Cut ==&lt;br /&gt;
In the analysis of &#039;&#039;RandomContract&#039;&#039; algorithm, recall that we lower bound the probability &amp;lt;math&amp;gt;p_C&amp;lt;/math&amp;gt; that a min-cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; is returned by &#039;&#039;RandomContract&#039;&#039; by the following &#039;&#039;&#039;telescopic product&#039;&#039;&#039;:&lt;br /&gt;
:&amp;lt;math&amp;gt;p_C\ge\prod_{i=1}^{n-2}\left(1-\frac{2}{n-i+1}\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
Here the index &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; corresponds to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th contraction. The factor &amp;lt;math&amp;gt;\left(1-\frac{2}{n-i+1}\right)&amp;lt;/math&amp;gt; is decreasing in &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;, which means:&lt;br /&gt;
* The probability of success only deteriorates when the graph becomes &amp;quot;too contracted&amp;quot;, that is, when the number of remaining vertices becomes small. &lt;br /&gt;
This motivates us to consider the following alteration of the algorithm: first use random contractions to reduce the number of vertices to a moderately small number, and then recursively find a min-cut in this smaller instance. This seems to be just a restatement of exactly what we have been doing. Inspired by the idea of boosting the accuracy via independent repetition, here we apply the recursion on &#039;&#039;two&#039;&#039; smaller instances generated independently.&lt;br /&gt;
&lt;br /&gt;
The algorithm obtained in this way is called &#039;&#039;FastCut&#039;&#039;. We first define a procedure that randomly contracts edges until only &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; vertices are left.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|&#039;&#039;RandomContract&#039;&#039;&amp;lt;math&amp;gt;(G, t)&amp;lt;/math&amp;gt;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, and integer &amp;lt;math&amp;gt;t\ge 2&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:while &amp;lt;math&amp;gt;|V|&amp;gt;t&amp;lt;/math&amp;gt; do&lt;br /&gt;
:* choose an edge &amp;lt;math&amp;gt;uv\in E&amp;lt;/math&amp;gt; uniformly at random;&lt;br /&gt;
:* &amp;lt;math&amp;gt;G=Contract(G,uv)&amp;lt;/math&amp;gt;; &lt;br /&gt;
:return &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;FastCut&#039;&#039; algorithm is recursively defined as follows.&lt;br /&gt;
{{Theorem|&#039;&#039;FastCut&#039;&#039;&amp;lt;math&amp;gt;(G)&amp;lt;/math&amp;gt;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; multi-graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:if &amp;lt;math&amp;gt;|V|\le 6&amp;lt;/math&amp;gt; then return a mincut by brute force;&lt;br /&gt;
:else let &amp;lt;math&amp;gt;t=\left\lceil1+|V|/\sqrt{2}\right\rceil&amp;lt;/math&amp;gt;;&lt;br /&gt;
:: &amp;lt;math&amp;gt;G_1=RandomContract(G,t)&amp;lt;/math&amp;gt;;&lt;br /&gt;
:: &amp;lt;math&amp;gt;G_2=RandomContract(G,t)&amp;lt;/math&amp;gt;; &lt;br /&gt;
::return the smaller one of &amp;lt;math&amp;gt;FastCut(G_1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;FastCut(G_2)&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
As before, all &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; are multigraphs.&lt;br /&gt;
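&lt;br /&gt;
The following Python sketch (ours) mirrors this recursion on an edge-list multigraph over vertices &amp;lt;math&amp;gt;0,\ldots,n-1&amp;lt;/math&amp;gt;. Here contract_to plays the role of &#039;&#039;RandomContract&#039;&#039;&amp;lt;math&amp;gt;(G,t)&amp;lt;/math&amp;gt; and returns a relabeled contracted instance, so the returned cut is a cut of the contracted instance whose size equals the size of a cut in the original graph.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import math, random&lt;br /&gt;
from itertools import combinations&lt;br /&gt;
&lt;br /&gt;
def contract_to(n, edges, t, rng=random):&lt;br /&gt;
    # randomly contract until t vertex classes remain, then relabel to 0..t-1&lt;br /&gt;
    parent = list(range(n))&lt;br /&gt;
    def find(v):&lt;br /&gt;
        while parent[v] != v:&lt;br /&gt;
            parent[v] = parent[parent[v]]&lt;br /&gt;
            v = parent[v]&lt;br /&gt;
        return v&lt;br /&gt;
    remaining = n&lt;br /&gt;
    while remaining &amp;gt; t:&lt;br /&gt;
        live = [(u, v) for u, v in edges if find(u) != find(v)]&lt;br /&gt;
        u, v = live[rng.randrange(len(live))]&lt;br /&gt;
        parent[find(v)] = find(u)&lt;br /&gt;
        remaining -= 1&lt;br /&gt;
    relabel = {r: i for i, r in enumerate(sorted({find(v) for v in range(n)}))}&lt;br /&gt;
    return [(relabel[find(u)], relabel[find(v)])&lt;br /&gt;
            for u, v in edges if find(u) != find(v)]&lt;br /&gt;
&lt;br /&gt;
def fast_cut(n, edges):&lt;br /&gt;
    if n &amp;lt;= 6:                       # brute force over all bipartitions&lt;br /&gt;
        best = None&lt;br /&gt;
        for size in range(1, n // 2 + 1):&lt;br /&gt;
            for S in combinations(range(n), size):&lt;br /&gt;
                S = set(S)&lt;br /&gt;
                cut = [(u, v) for u, v in edges if (u in S) != (v in S)]&lt;br /&gt;
                if best is None or len(cut) &amp;lt; len(best):&lt;br /&gt;
                    best = cut&lt;br /&gt;
        return best&lt;br /&gt;
    t = math.ceil(1 + n / math.sqrt(2))&lt;br /&gt;
    c1 = fast_cut(t, contract_to(n, edges, t))&lt;br /&gt;
    c2 = fast_cut(t, contract_to(n, edges, t))&lt;br /&gt;
    return c1 if len(c1) &amp;lt;= len(c2) else c2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;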
&lt;br /&gt;
Fix a min-cut &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; in the original multigraph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. By the same analysis as in the case of &#039;&#039;RandomContract&#039;&#039;, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
&amp;amp;\Pr[C\text{ survives all contractions in }RandomContract(G,t)]\\&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\prod_{i=1}^{n-t}\Pr[C\text{ survives the }i\text{-th contraction}\mid C\text{ survives the first }(i-1)\text{ contractions}]\\&lt;br /&gt;
\ge&lt;br /&gt;
&amp;amp;\prod_{i=1}^{n-t}\left(1-\frac{2}{n-i+1}\right)\\&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\prod_{k=t+1}^{n}\frac{k-2}{k}\\&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\frac{t(t-1)}{n(n-1)}.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
When &amp;lt;math&amp;gt;t=\left\lceil1+n/\sqrt{2}\right\rceil&amp;lt;/math&amp;gt;, this probability is at least &amp;lt;math&amp;gt;1/2&amp;lt;/math&amp;gt;; indeed, &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is chosen precisely to make this probability at least &amp;lt;math&amp;gt;1/2&amp;lt;/math&amp;gt;. You will see that this is crucial in the following analysis of accuracy.&lt;br /&gt;
&lt;br /&gt;
We denote by &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; the following events:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
A:&lt;br /&gt;
&amp;amp;\quad C\text{  survives all contractions in }RandomContract(G,t);\\&lt;br /&gt;
B:&lt;br /&gt;
&amp;amp;\quad\text{size of min-cut is unchanged after }RandomContract(G,t);&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Clearly, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; implies &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; and by above analysis &amp;lt;math&amp;gt;\Pr[B]\ge\Pr[A]\ge\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We denote by &amp;lt;math&amp;gt;p(n)&amp;lt;/math&amp;gt; the lower bound on the probability that &amp;lt;math&amp;gt;FastCut(G)&amp;lt;/math&amp;gt; succeeds for a multigraph of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, that is&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
p(n)&lt;br /&gt;
=\min_{G: |V|=n}\Pr[\,FastCut(G)\text{ returns a min-cut in }G\,].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Suppose that &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is the multigraph that achieves the minimum in the above definition. The following recurrence holds for &amp;lt;math&amp;gt;p(n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
p(n)&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,FastCut(G)\text{ returns a min-cut in }G\,]\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\Pr[\,\text{ a min-cut of }G\text{ is returned by }FastCut(G_1)\text{ or }FastCut(G_2)\,]\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
1-\left(1-\Pr[B\wedge FastCut(G_1)\text{ returns a min-cut in }G_1\,]\right)^2\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
1-\left(1-\Pr[A\wedge FastCut(G_1)\text{ returns a min-cut in }G_1\,]\right)^2\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
1-\left(1-\Pr[A]\Pr[ FastCut(G_1)\text{ returns a min-cut in }G_1\mid A]\right)^2\\&lt;br /&gt;
&amp;amp;\ge&lt;br /&gt;
1-\left(1-\frac{1}{2}p\left(\left\lceil1+n/\sqrt{2}\right\rceil\right)\right)^2,&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; are defined as above such that &amp;lt;math&amp;gt;\Pr[A]\ge\frac{1}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The base case is that  &amp;lt;math&amp;gt;p(n)=1&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;n\le 6&amp;lt;/math&amp;gt;. By induction it is easy to prove that&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
p(n)=\Omega\left(\frac{1}{\log n}\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Recall that we can implement an edge contraction in &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; time, thus it is easy to verify the following recursion of time complexity:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
T(n)=2T\left(\left\lceil1+n/\sqrt{2}\right\rceil\right)+O(n^2),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;T(n)&amp;lt;/math&amp;gt; denotes the running time of &amp;lt;math&amp;gt;FastCut(G)&amp;lt;/math&amp;gt; on a multigraph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices.&lt;br /&gt;
&lt;br /&gt;
By induction with the base case &amp;lt;math&amp;gt;T(n)=O(1)&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;n\le 6&amp;lt;/math&amp;gt;, it is easy to verify that &amp;lt;math&amp;gt;T(n)=O(n^2\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Theorem|&lt;br /&gt;
: For any multigraph with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, the &#039;&#039;FastCut&#039;&#039; algorithm returns a minimum cut with probability &amp;lt;math&amp;gt;\Omega\left(\frac{1}{\log n}\right)&amp;lt;/math&amp;gt; in time &amp;lt;math&amp;gt;O(n^2\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
At this point, we see that the name &#039;&#039;FastCut&#039;&#039; is misleading: it is actually slower than the original &#039;&#039;RandomContract&#039;&#039; algorithm; only the chance of successfully finding a min-cut is much better (improved from &amp;lt;math&amp;gt;\Omega(1/n^2)&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\Omega(1/\log n)&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
Given any input multi-graph, by running the &#039;&#039;FastCut&#039;&#039; algorithm independently for some &amp;lt;math&amp;gt;O((\log n)^2)&amp;lt;/math&amp;gt; times and returning the smallest cut found, we obtain an algorithm which runs in time &amp;lt;math&amp;gt;O(n^2\log^3n)&amp;lt;/math&amp;gt; and returns a min-cut with probability &amp;lt;math&amp;gt;1-O(1/n)&amp;lt;/math&amp;gt;, i.e. with high probability.&lt;br /&gt;
&lt;br /&gt;
Recall that the running time of the best known deterministic algorithm for min-cut on multi-graphs is &amp;lt;math&amp;gt;O(mn+n^2\log n)&amp;lt;/math&amp;gt;. On dense graphs, the randomized algorithm outperforms the best known deterministic algorithm.&lt;br /&gt;
&lt;br /&gt;
Finally, Karger further improves this and obtains a near-linear (in the number of edges) time [https://arxiv.org/abs/cs/9812007 randomized algorithm] for minimum cut in multi-graphs.&lt;br /&gt;
&lt;br /&gt;
= Max-Cut=&lt;br /&gt;
The &#039;&#039;&#039;maximum cut problem&#039;&#039;&#039;, in short the &#039;&#039;&#039;max-cut problem&#039;&#039;&#039;, is defined as follows.&lt;br /&gt;
{{Theorem|Max-cut problem|&lt;br /&gt;
*&#039;&#039;&#039;Input&#039;&#039;&#039;: an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output&#039;&#039;&#039;: a bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into disjoint subsets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that maximizes &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The problem is a typical MAX-CSP, an optimization version of the [https://en.wikipedia.org/wiki/Constraint_satisfaction_problem constraint satisfaction problem]. An instance of CSP consists of:&lt;br /&gt;
* a set of variables &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; usually taking values from some finite domain;&lt;br /&gt;
* a sequence of constraints (predicates) &amp;lt;math&amp;gt;C_1,C_2,\ldots, C_m&amp;lt;/math&amp;gt; defined on those variables.&lt;br /&gt;
The MAX-CSP asks to find an assignment of values to variables &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; which maximizes the number of satisfied constraints.&lt;br /&gt;
&lt;br /&gt;
In particular, when the variables &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; take Boolean values &amp;lt;math&amp;gt;\{0,1\}&amp;lt;/math&amp;gt; and every constraint is a binary constraint &amp;lt;math&amp;gt;\cdot\neq\cdot&amp;lt;/math&amp;gt; in the form of &amp;lt;math&amp;gt;x_i\neq x_j&amp;lt;/math&amp;gt;, then the MAX-CSP is precisely the max-cut problem.&lt;br /&gt;
&lt;br /&gt;
Unlike the min-cut problem, which can be solved in polynomial time, the max-cut is known to be [https://en.wikipedia.org/wiki/NP-hardness &#039;&#039;&#039;NP-hard&#039;&#039;&#039;]. Its decision version is among the [https://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems 21 &#039;&#039;&#039;NP-complete&#039;&#039;&#039; problems found by Karp]. This means we should not hope for a polynomial-time algorithm for solving the problem if [https://en.wikipedia.org/wiki/P_versus_NP_problem a famous conjecture in computational complexity] is correct. And due to another [https://en.wikipedia.org/wiki/BPP_(complexity)#Problems less famous conjecture in computational complexity], randomization alone probably cannot help this situation either.&lt;br /&gt;
&lt;br /&gt;
We may compromise our goal and allow the algorithm to &#039;&#039;not always find the optimal solution&#039;&#039;. However, we still want to guarantee that the algorithm &#039;&#039;always returns a relatively good solution on all possible instances&#039;&#039;. This notion is formally captured by &#039;&#039;&#039;approximation algorithms&#039;&#039;&#039; and the &#039;&#039;&#039;approximation ratio&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Greedy algorithm ==&lt;br /&gt;
A natural heuristic for solving the max-cut is to sequentially join the vertices to one of the two disjoint subsets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; so as to &#039;&#039;greedily&#039;&#039; maximize the &#039;&#039;current&#039;&#039; number of edges crossing between &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
To state the algorithm, we overload the definition &amp;lt;math&amp;gt;E(S,T)&amp;lt;/math&amp;gt;. Given an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, for any disjoint subsets &amp;lt;math&amp;gt;S,T\subseteq V&amp;lt;/math&amp;gt; of vertices, we define&lt;br /&gt;
:&amp;lt;math&amp;gt;E(S,T)=\{uv\in E\mid u\in S, v\in T\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We also assume that the vertices are ordered arbitrarily as &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The greedy heuristic is then described as follows.&lt;br /&gt;
{{Theorem|&#039;&#039;GreedyMaxCut&#039;&#039;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, &lt;br /&gt;
:::with an arbitrary order of vertices &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:initially &amp;lt;math&amp;gt;S=T=\emptyset&amp;lt;/math&amp;gt;;&lt;br /&gt;
:for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;&lt;br /&gt;
::&amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt; to maximize the current &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt; (breaking ties arbitrarily);&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The algorithm certainly runs in polynomial time.&lt;br /&gt;
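&lt;br /&gt;
For concreteness, here is a minimal Python sketch of &#039;&#039;GreedyMaxCut&#039;&#039; (an illustrative implementation, assuming the graph is given as an edge list over vertices &amp;lt;math&amp;gt;0,\ldots,n-1&amp;lt;/math&amp;gt; in the processing order):&lt;br /&gt;
 def greedy_max_cut(n, edges):&lt;br /&gt;
     # vertices 0, ..., n-1 are processed in this (arbitrary) order&lt;br /&gt;
     adj = [set() for _ in range(n)]&lt;br /&gt;
     for u, v in edges:&lt;br /&gt;
         adj[u].add(v)&lt;br /&gt;
         adj[v].add(u)&lt;br /&gt;
     S, T = set(), set()&lt;br /&gt;
     for i in range(n):&lt;br /&gt;
         # cut edges gained if v_i joins S are those to T, and vice versa&lt;br /&gt;
         gain_if_S = len(adj[i].intersection(T))&lt;br /&gt;
         gain_if_T = len(adj[i].intersection(S))&lt;br /&gt;
         if gain_if_S &amp;gt;= gain_if_T:   # break ties in favor of S&lt;br /&gt;
             S.add(i)&lt;br /&gt;
         else:&lt;br /&gt;
             T.add(i)&lt;br /&gt;
     cut = sum(1 for u, v in edges if (u in S) != (v in S))&lt;br /&gt;
     return S, T, cut&lt;br /&gt;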
&lt;br /&gt;
Without any guarantee on how well the solution returned by the algorithm approximates the optimal solution, the algorithm is only a heuristic, not an &#039;&#039;&#039;approximation algorithm&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
=== Approximation ratio ===&lt;br /&gt;
For now we restrict ourselves to the max-cut problem, although the notion applies more generally.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; be an arbitrary instance of the max-cut problem. Let &amp;lt;math&amp;gt;OPT_G&amp;lt;/math&amp;gt; denote the size of the max-cut in graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. More precisely,&lt;br /&gt;
:&amp;lt;math&amp;gt;OPT_G=\max_{S\subseteq V}|E(S,\overline{S})|&amp;lt;/math&amp;gt;. &lt;br /&gt;
Let &amp;lt;math&amp;gt;SOL_G&amp;lt;/math&amp;gt; be the size of the cut &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt; returned by the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm on input graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
As a maximization problem, it is trivial that &amp;lt;math&amp;gt;SOL_G\le OPT_G&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. To guarantee that &#039;&#039;GreedyMaxCut&#039;&#039; gives a good approximation of the optimal solution, we need the other direction:&lt;br /&gt;
{{Theorem|Approximation ratio|&lt;br /&gt;
:We say that the &#039;&#039;&#039;approximation ratio&#039;&#039;&#039; of the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm is &amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;, or &#039;&#039;GreedyMaxCut&#039;&#039; is an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;-approximation&#039;&#039;&#039; algorithm, for some &amp;lt;math&amp;gt;0&amp;lt;\alpha\le 1&amp;lt;/math&amp;gt;, if &lt;br /&gt;
::&amp;lt;math&amp;gt;\frac{SOL_G}{OPT_G}\ge \alpha&amp;lt;/math&amp;gt; for every possible instance &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; of max-cut.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
With this notion, we now try to analyze the approximation ratio of the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm.&lt;br /&gt;
&lt;br /&gt;
A difficulty in applying this notion in our analysis is that the definition of approximation ratio compares the solution returned by the algorithm with the &#039;&#039;&#039;optimal solution&#039;&#039;&#039;, yet in the analysis we can hardly conduct such comparisons directly: computing the optimal solution is &#039;&#039;&#039;NP-hard&#039;&#039;&#039;, so there is no easy way to get a handle on it (e.g. a closed form).&lt;br /&gt;
&lt;br /&gt;
A popular step (usually the first step of analyzing an approximation ratio) to avoid this difficulty is to compare, instead of directly to the optimal solution, to an &#039;&#039;&#039;upper bound&#039;&#039;&#039; of the optimal solution (for a minimization problem, this needs to be a lower bound); that is, we compare to something which is even better than the optimal solution, and hence may not be realized by any feasible solution.&lt;br /&gt;
&lt;br /&gt;
For the max-cut problem, a simple upper bound on &amp;lt;math&amp;gt;OPT_G&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt;, the number of all edges. This is a trivial upper bound on the max-cut since any cut is a subset of edges.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt; be the input graph and &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;. Initially &amp;lt;math&amp;gt;S_1=T_1=\emptyset&amp;lt;/math&amp;gt;. For &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;, we let &amp;lt;math&amp;gt;S_{i+1}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}&amp;lt;/math&amp;gt; be the respective &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; after &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt;. More precisely,&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\cup\{v_i\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\,&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;|E(S_{i}\cup\{v_i\},T_i)|&amp;gt;|E(S_{i},T_i\cup\{v_i\})|&amp;lt;/math&amp;gt;;&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\,&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\cup\{v_i\}&amp;lt;/math&amp;gt;  if otherwise. &lt;br /&gt;
Finally, the max-cut is given by&lt;br /&gt;
:&amp;lt;math&amp;gt;SOL_G=|E(S_{n+1},T_{n+1})|&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We first observe that we can count the number of edges &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt; by summing the contributions of the individual &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt;&#039;s.&lt;br /&gt;
{{Theorem|Proposition 1|&lt;br /&gt;
:&amp;lt;math&amp;gt;|E| = \sum_{i=1}^n\left(|E(S_i,\{v_i\})|+|E(T_i,\{v_i\})|\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
Note that &amp;lt;math&amp;gt;S_i\cup T_i=\{v_1,v_2,\ldots,v_{i-1}\}&amp;lt;/math&amp;gt;, i.e. &amp;lt;math&amp;gt;S_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_i&amp;lt;/math&amp;gt; together contain precisely those vertices preceding &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt;. Therefore, by taking the sum &lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i=1}^n\left(|E(S_i,\{v_i\})|+|E(T_i,\{v_i\})|\right)&amp;lt;/math&amp;gt;,&lt;br /&gt;
we effectively enumerate all &amp;lt;math&amp;gt;(v_j,v_i)&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;v_jv_i\in E&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;i&amp;lt;/math&amp;gt;. The total number is precisely &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
We then observe that &amp;lt;math&amp;gt;SOL_G&amp;lt;/math&amp;gt; can be decomposed into the contributions of individual &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt;&#039;s in the same way.&lt;br /&gt;
{{Theorem|Proposition 2|&lt;br /&gt;
:&amp;lt;math&amp;gt;SOL_G = \sum_{i=1}^n\max\left(|E(S_i, \{v_i\})|,|E(T_i, \{v_i\})|\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
It is easy to observe that &amp;lt;math&amp;gt;E(S_i,T_i)\subseteq E(S_{i+1},T_{i+1})&amp;lt;/math&amp;gt;, i.e. once an edge joins the cut between the current &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt; it will never drop from the cut in the future. &lt;br /&gt;
&lt;br /&gt;
We then define &lt;br /&gt;
:&amp;lt;math&amp;gt;\Delta_i= |E(S_{i+1},T_{i+1})|-|E(S_i,T_i)|=|E(S_{i+1},T_{i+1})\setminus E(S_i,T_i)|&amp;lt;/math&amp;gt;&lt;br /&gt;
to be the contribution of &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; in the final cut.&lt;br /&gt;
&lt;br /&gt;
It holds that&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i=1}^n\Delta_i=|E(S_{n+1},T_{n+1})|-|E(S_{1},T_{1})|=|E(S_{n+1},T_{n+1})|=SOL_G&amp;lt;/math&amp;gt;.&lt;br /&gt;
On the other hand, due to the greedy rule:&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\cup\{v_i\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\,&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;|E(S_{i}\cup\{v_i\},T_i)|&amp;gt;|E(S_{i},T_i\cup\{v_i\})|&amp;lt;/math&amp;gt;;&lt;br /&gt;
* &amp;lt;math&amp;gt;S_{i+1}=S_i\,&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_{i+1}=T_i\cup\{v_i\}&amp;lt;/math&amp;gt;  if otherwise;&lt;br /&gt;
it holds that&lt;br /&gt;
:&amp;lt;math&amp;gt;\Delta_i=|E(S_{i+1},T_{i+1})\setminus E(S_i,T_i)| = \max\left(|E(S_i, \{v_i\})|,|E(T_i, \{v_i\})|\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
Together the proposition follows.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Combining the above Proposition 1 and Proposition 2, we have&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
\begin{align}&lt;br /&gt;
SOL_G &lt;br /&gt;
&amp;amp;= \sum_{i=1}^n\max\left(|E(S_i, \{v_i\})|,|E(T_i, \{v_i\})|\right)\\&lt;br /&gt;
&amp;amp;\ge \frac{1}{2}\sum_{i=1}^n\left(|E(S_i, \{v_i\})|+|E(T_i, \{v_i\})|\right)\\&lt;br /&gt;
&amp;amp;=\frac{1}{2}|E|\\&lt;br /&gt;
&amp;amp;\ge\frac{1}{2}OPT_G.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
:The &#039;&#039;GreedyMaxCut&#039;&#039; is a &amp;lt;math&amp;gt;0.5&amp;lt;/math&amp;gt;-approximation algorithm for the max-cut problem.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
This is not the best approximation ratio achieved by polynomial-time algorithms for max-cut.&lt;br /&gt;
* The best approximation ratio known to be achievable in polynomial time is that of the [http://www-math.mit.edu/~goemans/PAPERS/maxcut-jacm.pdf Goemans-Williamson algorithm], which relies on rounding an [https://en.wikipedia.org/wiki/Semidefinite_programming SDP] relaxation of the max-cut, and achieves an approximation ratio &amp;lt;math&amp;gt;\alpha^*\approx 0.878&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\alpha^*&amp;lt;/math&amp;gt; is an irrational number whose precise value is given by &amp;lt;math&amp;gt;\alpha^*=\frac{2}{\pi}\inf_{x\in[-1,1]}\frac{\arccos(x)}{1-x}&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Assuming the [https://en.wikipedia.org/wiki/Unique_games_conjecture unique game conjecture], there does not exist any polynomial-time algorithm for max-cut with approximation ratio &amp;lt;math&amp;gt;\alpha&amp;gt;\alpha^*&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Derandomization by conditional expectation ==&lt;br /&gt;
There is a probabilistic interpretation of the greedy algorithm, which may explain why we use a greedy scheme for max-cut and why it works for finding an approximate max-cut.&lt;br /&gt;
&lt;br /&gt;
Given an undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, let us calculate the average size of cuts in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;. For every vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt; let &amp;lt;math&amp;gt;X_v\in\{0,1\}&amp;lt;/math&amp;gt; be a &#039;&#039;uniform&#039;&#039; and &#039;&#039;independent&#039;&#039; random bit which indicates whether &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. This gives us a uniform random bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The size of the random cut &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt; is given by&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
|E(S,T)| = \sum_{uv\in E} I[X_u\neq X_v],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;I[X_u\neq X_v]&amp;lt;/math&amp;gt; is the Boolean indicator random variable that indicates whether event &amp;lt;math&amp;gt;X_u\neq X_v&amp;lt;/math&amp;gt; occurs.&lt;br /&gt;
&lt;br /&gt;
Due to &#039;&#039;&#039;linearity of expectation&#039;&#039;&#039;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]=\sum_{uv\in E} \mathbb{E}[I[X_u\neq X_v]] =\sum_{uv\in E} \Pr[X_u\neq X_v]=\frac{|E|}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;|E|&amp;lt;/math&amp;gt; is a trivial upper bound for the max-cut &amp;lt;math&amp;gt;OPT_G&amp;lt;/math&amp;gt;. Due to the above argument, we have &lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]\ge\frac{OPT_G}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:{|border=&amp;quot;2&amp;quot; width=&amp;quot;100%&amp;quot; cellspacing=&amp;quot;4&amp;quot; cellpadding=&amp;quot;3&amp;quot; rules=&amp;quot;all&amp;quot; style=&amp;quot;margin:1em 1em 1em 0; border:solid 1px #AAAAAA; border-collapse:collapse;empty-cells:show;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
*In the above argument we use a few probability propositions.&lt;br /&gt;
: &#039;&#039;&#039;linearity of expectation:&#039;&#039;&#039;&lt;br /&gt;
:: Let &amp;lt;math&amp;gt;\boldsymbol{X}=(X_1,X_2,\ldots,X_n)&amp;lt;/math&amp;gt; be a random vector. Then&lt;br /&gt;
:::&amp;lt;math&amp;gt;\mathbb{E}\left[\sum_{i=1}^nc_iX_i\right]=\sum_{i=1}^nc_i\mathbb{E}[X_i]&amp;lt;/math&amp;gt;,&lt;br /&gt;
::where &amp;lt;math&amp;gt;c_1,c_2,\ldots,c_n&amp;lt;/math&amp;gt; are scalars.&lt;br /&gt;
::That is, taking the expectation commutes with applying a linear (affine) function to a random vector. &lt;br /&gt;
::Note that this property holds regardless of any dependency between the random variables, which makes it very useful.&lt;br /&gt;
:&#039;&#039;&#039;Expectation of indicator random variable:&#039;&#039;&#039;&lt;br /&gt;
::We usually use the notation &amp;lt;math&amp;gt;I[A]&amp;lt;/math&amp;gt; to represent the Boolean indicator random variable that indicates whether the event &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; occurs: i.e. &amp;lt;math&amp;gt;I[A]=1&amp;lt;/math&amp;gt; if event &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; occurs and &amp;lt;math&amp;gt;I[A]=0&amp;lt;/math&amp;gt; if otherwise. &lt;br /&gt;
::It is easy to see that &amp;lt;math&amp;gt;\mathbb{E}[I[A]]=\Pr[A]&amp;lt;/math&amp;gt;. The expectation of an indicator random variable equals the probability of the event it indicates.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
By the above analysis, the average (under the uniform distribution) size of all cuts in any graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; must be at least &amp;lt;math&amp;gt;\frac{OPT_G}{2}&amp;lt;/math&amp;gt;. Due to &#039;&#039;&#039;the probabilistic method&#039;&#039;&#039;, in particular &#039;&#039;&#039;the averaging principle&#039;&#039;&#039;, there must exist a bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; whose cut &amp;lt;math&amp;gt;E(S,T)&amp;lt;/math&amp;gt; is of size at least &amp;lt;math&amp;gt;\frac{OPT_G}{2}&amp;lt;/math&amp;gt;. The next question is how to find such a bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; &#039;&#039;algorithmically&#039;&#039;.&lt;br /&gt;
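&lt;br /&gt;
The claim about the average cut size is easy to check empirically. The following is a minimal Python sketch that samples uniform random bipartitions, assuming the graph is given as an edge list over vertices &amp;lt;math&amp;gt;0,\ldots,n-1&amp;lt;/math&amp;gt;:&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 def random_cut(n, edges):&lt;br /&gt;
     # a uniform and independent random bit X_v per vertex&lt;br /&gt;
     x = [random.randrange(2) for _ in range(n)]&lt;br /&gt;
     return sum(1 for u, v in edges if x[u] != x[v])&lt;br /&gt;
 &lt;br /&gt;
 # for a triangle, |E| = 3, so the average cut size tends to 3/2&lt;br /&gt;
 edges = [(0, 1), (1, 2), (0, 2)]&lt;br /&gt;
 samples = [random_cut(3, edges) for _ in range(100000)]&lt;br /&gt;
 print(sum(samples) / len(samples))   # close to 1.5&lt;br /&gt;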
&lt;br /&gt;
We still fix an arbitrary order of all vertices as &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;. Recall that each vertex &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; is associated with a uniform and independent random bit &amp;lt;math&amp;gt;X_{v_i}&amp;lt;/math&amp;gt; to indicate whether &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. We want to fix the value of &amp;lt;math&amp;gt;X_{v_i}&amp;lt;/math&amp;gt; one after another to construct a bipartition &amp;lt;math&amp;gt;\{\hat{S},\hat{T}\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; such that &lt;br /&gt;
:&amp;lt;math&amp;gt;|E(\hat{S},\hat{T})|\ge\mathbb{E}[|E(S,T)|]\ge\frac{OPT_G}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We start with the first vertex &amp;lt;math&amp;gt;v_1&amp;lt;/math&amp;gt; and its random variable &amp;lt;math&amp;gt;X_{v_1}&amp;lt;/math&amp;gt;. By the &#039;&#039;&#039;law of total expectation&#039;&#039;&#039;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[E(S,T)]=\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=0]+\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=1].&lt;br /&gt;
&amp;lt;/math&amp;gt; &lt;br /&gt;
There must exist an assignment &amp;lt;math&amp;gt;x_1\in\{0,1\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;X_{v_1}&amp;lt;/math&amp;gt; such that &lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[E(S,T)\mid X_{v_1}=x_1]\ge \mathbb{E}[E(S,T)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
We can apply this argument repeatedly. In general, for any &amp;lt;math&amp;gt;i\le n&amp;lt;/math&amp;gt; and any particular partial assignment &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_{i-1}\in\{0,1\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;X_{v_1},X_{v_2},\ldots,X_{v_{i-1}}&amp;lt;/math&amp;gt;, by the law of total expectation,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}]&lt;br /&gt;
=&lt;br /&gt;
&amp;amp;\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}, X_{v_{i}}=0]\\&lt;br /&gt;
&amp;amp;+\frac{1}{2}\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}, X_{v_{i}}=1].&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
There must exist an assignment &amp;lt;math&amp;gt;x_{i}\in\{0,1\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;X_{v_i}&amp;lt;/math&amp;gt; such that &lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i}}=x_{i}]\ge \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
By this argument, we can find a sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in\{0,1\}&amp;lt;/math&amp;gt; of bits which forms a &#039;&#039;monotone path&#039;&#039;:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[E(S,T)]\le \cdots \le \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i-1}}=x_{i-1}] \le \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{i}}=x_{i}] \le \cdots \le  \mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{n}}=x_{n}].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
We already know that the first step of this monotone path satisfies &amp;lt;math&amp;gt;\mathbb{E}[E(S,T)]\ge\frac{OPT_G}{2}&amp;lt;/math&amp;gt;. As for the last step of the monotone path, &amp;lt;math&amp;gt;\mathbb{E}[E(S,T)\mid X_{v_1}=x_1,\ldots, X_{v_{n}}=x_{n}]&amp;lt;/math&amp;gt;, since all random bits have been fixed, a bipartition &amp;lt;math&amp;gt;(\hat{S},\hat{T})&amp;lt;/math&amp;gt; is determined by the assignment &amp;lt;math&amp;gt;x_1,\ldots, x_n&amp;lt;/math&amp;gt;, so the expectation has no effect except returning the size of that cut &amp;lt;math&amp;gt;|E(\hat{S},\hat{T})|&amp;lt;/math&amp;gt;. We have thus found a cut &amp;lt;math&amp;gt;E(\hat{S},\hat{T})&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;|E(\hat{S},\hat{T})|\ge \frac{OPT_G}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We translate the procedure of constructing this monotone path of conditional expectation to the following algorithm. &lt;br /&gt;
{{Theorem|&#039;&#039;MonotonePath&#039;&#039;|&lt;br /&gt;
:&#039;&#039;&#039;Input:&#039;&#039;&#039; undirected graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, &lt;br /&gt;
:::with an arbitrary order of vertices &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;;&lt;br /&gt;
----&lt;br /&gt;
:initially &amp;lt;math&amp;gt;S=T=\emptyset&amp;lt;/math&amp;gt;;&lt;br /&gt;
:for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;&lt;br /&gt;
::&amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt; to maximize the average size of cut conditioning on the choices made so far by the vertices &amp;lt;math&amp;gt;v_1,v_2,\ldots,v_i&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
We leave it as an exercise to verify that the choice of each &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; (to join which one of &amp;lt;math&amp;gt;S,T&amp;lt;/math&amp;gt;) in the &#039;&#039;MonotonePath&#039;&#039; algorithm, which maximizes the average size of cut conditioning on the choices made so far by the vertices &amp;lt;math&amp;gt;v_1,v_2,\ldots,v_i&amp;lt;/math&amp;gt;, must be the same as the choice made by &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; in the &#039;&#039;GreedyMaxCut&#039;&#039; algorithm, which maximizes the current &amp;lt;math&amp;gt;|E(S,T)|&amp;lt;/math&amp;gt;. The sketch below makes these conditional expectations concrete. &lt;br /&gt;
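&lt;br /&gt;
The key fact behind the sketch is that, once some bits are fixed, every fully decided edge contributes &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; to the conditional expectation, while every edge with at least one undecided endpoint contributes exactly &amp;lt;math&amp;gt;1/2&amp;lt;/math&amp;gt;. A minimal Python sketch (graph given as an edge list over vertices &amp;lt;math&amp;gt;0,\ldots,n-1&amp;lt;/math&amp;gt;):&lt;br /&gt;
 def conditional_expectation(edges, assigned):&lt;br /&gt;
     # assigned maps each already-fixed vertex to its bit; a fully&lt;br /&gt;
     # decided edge contributes 1 if it crosses the cut, and every&lt;br /&gt;
     # edge with an undecided endpoint contributes exactly 1/2&lt;br /&gt;
     total = 0.0&lt;br /&gt;
     for u, v in edges:&lt;br /&gt;
         if u in assigned and v in assigned:&lt;br /&gt;
             if assigned[u] != assigned[v]:&lt;br /&gt;
                 total += 1.0&lt;br /&gt;
         else:&lt;br /&gt;
             total += 0.5&lt;br /&gt;
     return total&lt;br /&gt;
 &lt;br /&gt;
 def monotone_path(n, edges):&lt;br /&gt;
     assigned = {}&lt;br /&gt;
     for i in range(n):&lt;br /&gt;
         assigned[i] = 0&lt;br /&gt;
         e0 = conditional_expectation(edges, assigned)&lt;br /&gt;
         assigned[i] = 1&lt;br /&gt;
         e1 = conditional_expectation(edges, assigned)&lt;br /&gt;
         assigned[i] = 1 if e1 &amp;gt; e0 else 0&lt;br /&gt;
     return assigned   # bit 0: the vertex joins S; bit 1: it joins T&lt;br /&gt;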
&lt;br /&gt;
Therefore, the greedy algorithm for max-cut can be seen as a derandomization of the average-case argument.&lt;br /&gt;
&lt;br /&gt;
== Derandomization by pairwise independence ==&lt;br /&gt;
We still construct a random bipartition of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. But this time the random choices have &#039;&#039;&#039;bounded independence&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
For each vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt;, we use a Boolean random variable &amp;lt;math&amp;gt;Y_v\in\{0,1\}&amp;lt;/math&amp;gt; to indicate whether &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;. The dependencies between the &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s are to be specified later.&lt;br /&gt;
&lt;br /&gt;
By linearity of expectation, regardless of the dependencies between &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s, it holds that:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]=\sum_{uv\in E} \Pr[Y_u\neq Y_v].&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
In order to have the average cut &amp;lt;math&amp;gt;\mathbb{E}[|E(S,T)|]=\frac{|E|}{2}&amp;lt;/math&amp;gt; as in the fully random case, we need &amp;lt;math&amp;gt;\Pr[Y_u\neq Y_v]=\frac{1}{2}&amp;lt;/math&amp;gt;. This only requires that the Boolean random variables &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s are uniform and &#039;&#039;&#039;pairwise independent&#039;&#039;&#039; instead of being &#039;&#039;&#039;mutually independent&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; pairwise independent random bits &amp;lt;math&amp;gt;\{Y_v\}_{v\in V}&amp;lt;/math&amp;gt; can be constructed from at most &amp;lt;math&amp;gt;k=\lceil\log (n+1)\rceil&amp;lt;/math&amp;gt; mutually independent random bits &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt; via the following standard routine.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
:Let &amp;lt;math&amp;gt;X_1, X_2, \ldots, X_k\in\{0,1\}&amp;lt;/math&amp;gt; be mutually independent uniform random bits. &lt;br /&gt;
:Let &amp;lt;math&amp;gt;S_1, S_2, \ldots, S_{2^k-1}\subseteq \{1,2,\ldots,k\}&amp;lt;/math&amp;gt; enumerate the &amp;lt;math&amp;gt;2^k-1&amp;lt;/math&amp;gt; nonempty subsets of &amp;lt;math&amp;gt;\{1,2,\ldots,k\}&amp;lt;/math&amp;gt;. &lt;br /&gt;
:For each &amp;lt;math&amp;gt;1\le i\le2^k-1&amp;lt;/math&amp;gt;, let &lt;br /&gt;
::&amp;lt;math&amp;gt;Y_i=\bigoplus_{j\in S_i}X_j=\left(\sum_{j\in S_i}X_j\right)\bmod 2.&amp;lt;/math&amp;gt;&lt;br /&gt;
:Then &amp;lt;math&amp;gt;Y_1,Y_2,\ldots,Y_{2^k-1}&amp;lt;/math&amp;gt; are pairwise independent uniform random bits.&lt;br /&gt;
}}&lt;br /&gt;
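&lt;br /&gt;
The construction is straightforward to implement. Here is a minimal Python sketch, with the nonempty subsets enumerated explicitly (in the max-cut application one would index the resulting bits by vertices):&lt;br /&gt;
 import itertools, random&lt;br /&gt;
 &lt;br /&gt;
 def pairwise_independent_bits(k):&lt;br /&gt;
     x = [random.randrange(2) for _ in range(k)]   # k mutually independent bits&lt;br /&gt;
     ys = []&lt;br /&gt;
     for r in range(1, k + 1):      # every nonempty subset of {0, ..., k-1}&lt;br /&gt;
         for subset in itertools.combinations(range(k), r):&lt;br /&gt;
             ys.append(sum(x[j] for j in subset) % 2)&lt;br /&gt;
     return ys                      # 2^k - 1 pairwise independent uniform bits&lt;br /&gt;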
&lt;br /&gt;
If &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt; for each vertex &amp;lt;math&amp;gt;v\in V&amp;lt;/math&amp;gt; is constructed in this way from at most &amp;lt;math&amp;gt;k=\lceil\log (n+1)\rceil&amp;lt;/math&amp;gt; mutually independent random bits &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt;, then the &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s are uniform and pairwise independent, and by the above calculation, it holds for the corresponding bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; that&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbb{E}[|E(S,T)|]=\sum_{uv\in E} \Pr[Y_u\neq Y_v]=\frac{|E|}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Note that the average is taken over the random choices of &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt; (because they are the only random choices used to construct the bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt;). By the probabilistic method, there must exist an assignment of &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_k\in\{0,1\}&amp;lt;/math&amp;gt; such that the corresponding &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s and the bipartition &amp;lt;math&amp;gt;\{S,T\}&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; indicated by the &amp;lt;math&amp;gt;Y_v&amp;lt;/math&amp;gt;&#039;s have that &lt;br /&gt;
:&amp;lt;math&amp;gt;|E(S,T)|\ge \frac{|E|}{2}\ge\frac{OPT_G}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This gives us the following algorithm, which exhaustively searches a much smaller solution space of size &amp;lt;math&amp;gt;2^k=O(n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Algorithm|&lt;br /&gt;
:Enumerate vertices as &amp;lt;math&amp;gt;V=\{v_1,v_2,\ldots,v_n\}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:let &amp;lt;math&amp;gt;k=\lceil\log (n+1)\rceil&amp;lt;/math&amp;gt;;&lt;br /&gt;
:for all &amp;lt;math&amp;gt;\vec{x}\in\{0,1\}^k&amp;lt;/math&amp;gt;&lt;br /&gt;
::initialize &amp;lt;math&amp;gt;S_{\vec{x}}=T_{\vec{x}}=\emptyset&amp;lt;/math&amp;gt;;&lt;br /&gt;
::for &amp;lt;math&amp;gt;i=1, 2, \ldots, n&amp;lt;/math&amp;gt;&lt;br /&gt;
:::if &amp;lt;math&amp;gt;\bigoplus_{j:\lfloor i/2^j\rfloor\bmod 2=1}x_j=1&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;S_{\vec{x}}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:::else &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; joins &amp;lt;math&amp;gt;T_{\vec{x}}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:return the &amp;lt;math&amp;gt;\{S_{\vec{x}},T_{\vec{x}}\}&amp;lt;/math&amp;gt; with the largest &amp;lt;math&amp;gt;|E(S_{\vec{x}},T_{\vec{x}})|&amp;lt;/math&amp;gt;;&lt;br /&gt;
}}&lt;br /&gt;
The algorithm has approximation ratio 1/2 and runs in polynomial time.&lt;br /&gt;
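&lt;br /&gt;
Concretely, the seed enumeration can be implemented as follows (a minimal Python sketch; vertices are &amp;lt;math&amp;gt;1,\ldots,n&amp;lt;/math&amp;gt; and the graph is given as an edge list):&lt;br /&gt;
 def derandomized_max_cut(n, edges):&lt;br /&gt;
     # vertices are 1, ..., n; edges is a list of pairs (u, v)&lt;br /&gt;
     k = n.bit_length()              # equals ceil(log2(n+1))&lt;br /&gt;
     best_cut, best_S, best_T = -1, None, None&lt;br /&gt;
     for seed in range(2 ** k):      # all assignments of X_1, ..., X_k&lt;br /&gt;
         S, T = set(), set()&lt;br /&gt;
         for i in range(1, n + 1):&lt;br /&gt;
             # Y_i = XOR of the seed bits selected by the binary digits of i&lt;br /&gt;
             y = sum(((i // 2 ** j) % 2) * ((seed // 2 ** j) % 2)&lt;br /&gt;
                     for j in range(k)) % 2&lt;br /&gt;
             (S if y == 1 else T).add(i)&lt;br /&gt;
         cut = sum(1 for u, v in edges if (u in S) != (v in S))&lt;br /&gt;
         if cut &amp;gt; best_cut:&lt;br /&gt;
             best_cut, best_S, best_T = cut, S, T&lt;br /&gt;
     return best_S, best_T, best_cut&lt;br /&gt;
Since only &amp;lt;math&amp;gt;2^k=O(n)&amp;lt;/math&amp;gt; seeds are tried and each trial takes polynomial time, the whole search runs in polynomial time.&lt;br /&gt;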
&lt;br /&gt;
= Spectral Cut =&lt;br /&gt;
== Expansion, Conductance and Sparsest Cut in Regular Graphs==&lt;br /&gt;
Consider an undirected &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular (multi-)graph &amp;lt;math&amp;gt;G(V,E)&amp;lt;/math&amp;gt;, where parallel edges between two vertices are allowed.&lt;br /&gt;
For &amp;lt;math&amp;gt;S,T\subset V&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;E(S,T)=\{uv\in E\mid u\in S,v\in T\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (Edge expansion)|&lt;br /&gt;
:The &#039;&#039;&#039;Edge expansion&#039;&#039;&#039; of an undirected graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, is defined as&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
h(G)=\min_{\overset{S\subset V}{|S|\le\frac{n}{2}}} \frac{|E(S, \bar{S})|}{|S|}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
As a side note, the edge expansion of an irregular graph has exactly the same definition.&lt;br /&gt;
Edge expansion is a very hard problem to approximate: under the [https://en.wikipedia.org/wiki/Unique_games_conjecture Unique games conjecture], it is NP-hard to approximate within any constant factor.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (Conductance)|&lt;br /&gt;
:The &#039;&#039;&#039;Conductance&#039;&#039;&#039; of an undirected graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, is defined as&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\varphi(G)=h(G)/d.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (Sparsest cut)|&lt;br /&gt;
:The &#039;&#039;&#039;Sparsest cut&#039;&#039;&#039; of an undirected graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; vertices, is a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; minimizing&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\frac{|E(S, \bar{S})|}{\min\{|S|,|\bar{S}|\}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Spectrum of Regular Graphs==&lt;br /&gt;
The &#039;&#039;&#039;adjacency matrix&#039;&#039;&#039; of an &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;-vertex graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, denoted &amp;lt;math&amp;gt;A = A(G)&amp;lt;/math&amp;gt;, is an &amp;lt;math&amp;gt;n\times n&amp;lt;/math&amp;gt; matrix where &amp;lt;math&amp;gt;A(u,v)&amp;lt;/math&amp;gt; is the number of edges in &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; between vertex &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and vertex &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;. Because &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; is a symmetric matrix with real entries, due to the [https://en.wikipedia.org/wiki/Spectral_theorem Spectral theorem], it has real eigenvalues &amp;lt;math&amp;gt;\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n&amp;lt;/math&amp;gt;, which associate with an orthonormal system of eigenvectors &amp;lt;math&amp;gt;v_1,v_2,\ldots, v_n\,&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;Av_i=\lambda_i v_i\,&amp;lt;/math&amp;gt;. We call the eigenvalues of &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; the &#039;&#039;&#039;spectrum&#039;&#039;&#039; of the graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The spectrum of a graph carries a lot of information about the graph. For example, suppose that &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular; then the following lemma holds.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma|&lt;br /&gt;
# &amp;lt;math&amp;gt;|\lambda_i|\le d&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le n&amp;lt;/math&amp;gt;.&lt;br /&gt;
# &amp;lt;math&amp;gt;\lambda_1=d&amp;lt;/math&amp;gt; and the corresponding eigenvector is &amp;lt;math&amp;gt;\vec{1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
# &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is connected if and only if &amp;lt;math&amp;gt;\lambda_2&amp;lt;\lambda_1&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| Let &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; be the adjacency matrix of &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, with entries &amp;lt;math&amp;gt;a_{ij}&amp;lt;/math&amp;gt;. It is obvious that &amp;lt;math&amp;gt;\sum_{j}a_{ij}=d\,&amp;lt;/math&amp;gt; for any &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;.&lt;br /&gt;
*(1) Suppose that &amp;lt;math&amp;gt;Ax=\lambda x, x\neq \mathbf{0}&amp;lt;/math&amp;gt;, and let &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; be an entry of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with the largest absolute value. Since &amp;lt;math&amp;gt;(Ax)_i=\lambda x_i&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_{j}a_{ij}x_j=\lambda x_i,\,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:and so&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
|\lambda||x_i|=\left|\sum_{j}a_{ij}x_j\right|\le \sum_{j}a_{ij}|x_j|\le \sum_{j}a_{ij}|x_i| \le d|x_i|.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:Thus &amp;lt;math&amp;gt;|\lambda|\le d&amp;lt;/math&amp;gt;.&lt;br /&gt;
*(2) is easy to check.&lt;br /&gt;
*(3) Suppose that &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is connected. Let &amp;lt;math&amp;gt;x\neq 0&amp;lt;/math&amp;gt; be an eigenvector for which &amp;lt;math&amp;gt;Ax=dx&amp;lt;/math&amp;gt;. Without loss of generality we can assume that &amp;lt;math&amp;gt;\max_i x_i&amp;gt;0 &amp;lt;/math&amp;gt;.&lt;br /&gt;
:We first show that &amp;lt;math&amp;gt;\forall i\left(x_i = \max_j x_j \implies \forall k\sim i, x_k=x_i \right).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:This claim will imply that &amp;lt;math&amp;gt;x=c\vec{1}&amp;lt;/math&amp;gt; for a connected graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;, so we prove it next, by contradiction. &lt;br /&gt;
:Let &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; be an entry of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; attaining the maximum value &amp;lt;math&amp;gt;\max_j x_j&amp;lt;/math&amp;gt;. Suppose that &amp;lt;math&amp;gt;\exists k\sim i, x_k &amp;lt; x_i&amp;lt;/math&amp;gt;. Since &amp;lt;math&amp;gt;\sum_{j}a_{ij}=d&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_j \le x_i&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;j\sim i&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_{j}a_{ij}x_j &amp;lt; d x_i.\,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:However,&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
(Ax)_i=d x_i \implies \sum_{j}a_{ij}x_j=d x_i.\,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
:A contradiction. So it follows that &amp;lt;math&amp;gt;x_j=x_i&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;a_{ij}&amp;gt;0&amp;lt;/math&amp;gt;, which verifies the claim. Since &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is connected, &amp;lt;math&amp;gt;x=c\vec{1}&amp;lt;/math&amp;gt;, and the eigenvalue &amp;lt;math&amp;gt;d=\lambda_1&amp;lt;/math&amp;gt; has multiplicity 1, thus &amp;lt;math&amp;gt;\lambda_1&amp;gt;\lambda_2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:Conversely, suppose that &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is disconnected with components &amp;lt;math&amp;gt;G_1 \uplus G_2 = G&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;&lt;br /&gt;
A(G) = \begin{pmatrix}&lt;br /&gt;
A(G_1) &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; A(G_2)&lt;br /&gt;
\end{pmatrix}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:Therefore, &amp;lt;math&amp;gt;A\vec{1}_{G_1}=d\vec{1}_{G_1}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A\vec{1}_{G_2}=d\vec{1}_{G_2}&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\vec{1}_{G_1}, \vec{1}_{G_2}&amp;lt;/math&amp;gt; are all-one vectors only supported  on &amp;lt;math&amp;gt;G_1, G_2&amp;lt;/math&amp;gt; respectively. Since &amp;lt;math&amp;gt;\vec{1}_{G_1}, \vec{1}_{G_2}&amp;lt;/math&amp;gt; are linearly independent, the multiplicity of eigenvalue &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt; is greater than 1, so &amp;lt;math&amp;gt;\lambda_1=\lambda_2&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Spectral Partitioning Algorithm ==&lt;br /&gt;
This is a popular heuristic in practice:&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Spectral Partitioning Algorithm|&lt;br /&gt;
# Compute the second largest eigenvalue of the adjacency matrix and its corresponding eigenvector &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
# Sort the vertices &amp;lt;math&amp;gt; V=\{ u_1, \ldots, u_n\} &amp;lt;/math&amp;gt; so that &amp;lt;math&amp;gt; x(u_1) \ge x(u_2) \ge \ldots \ge x(u_n) &amp;lt;/math&amp;gt;&lt;br /&gt;
# Let &amp;lt;math&amp;gt; S_i:=\begin{cases}&lt;br /&gt;
\{1,2,\ldots,i\}, \qquad&amp;amp;\hbox{ if }i\le n/2 \\&lt;br /&gt;
V \setminus \{1,2,\ldots,i\}, &amp;amp;\hbox{ otherwise}&lt;br /&gt;
\end{cases} &amp;lt;/math&amp;gt;, and output &amp;lt;math&amp;gt; i = \arg\min_{1\le i \le n} \{ \varphi(S_i)\} &amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The performance guarantee of this algorithm comes from Cheeger&#039;s inequality. In the worst case, it only guarantees that the output set &amp;lt;math&amp;gt;S_i&amp;lt;/math&amp;gt; has conductance &amp;lt;math&amp;gt;\varphi(S_i) \le 2 \sqrt{\varphi(G)}&amp;lt;/math&amp;gt;. &lt;br /&gt;
In practice, however, the performance is usually much better. We will discuss an improved Cheeger&#039;s inequality under additional assumptions on the spectral gap later in class, which explains the effectiveness of the algorithm in practice.&lt;br /&gt;
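&lt;br /&gt;
The sweep is easy to implement. Here is a minimal Python sketch using numpy for the eigendecomposition, assuming the adjacency matrix of the &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt;-regular graph is supplied as a numpy array:&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def spectral_partition(A, d):&lt;br /&gt;
     # A is the n x n adjacency matrix of a d-regular graph&lt;br /&gt;
     n = A.shape[0]&lt;br /&gt;
     vals, vecs = np.linalg.eigh(A)     # eigenvalues in ascending order&lt;br /&gt;
     x = vecs[:, -2]                    # eigenvector of the 2nd largest eigenvalue&lt;br /&gt;
     order = np.argsort(-x)             # vertices sorted by decreasing x(u)&lt;br /&gt;
     best_phi, best_S = np.inf, None&lt;br /&gt;
     for i in range(1, n):              # sweep over all prefix cuts&lt;br /&gt;
         mask = np.zeros(n, dtype=bool)&lt;br /&gt;
         mask[order[:i]] = True&lt;br /&gt;
         cut = A[mask][:, ~mask].sum()  # edges leaving the prefix&lt;br /&gt;
         phi = cut / (d * min(i, n - i))&lt;br /&gt;
         if phi &amp;lt; best_phi:&lt;br /&gt;
             side = order[:i] if 2 * i &amp;lt;= n else order[i:]&lt;br /&gt;
             best_phi, best_S = phi, set(side.tolist())&lt;br /&gt;
     return best_S, best_phi&lt;br /&gt;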
&lt;br /&gt;
== Graph visualization ==&lt;br /&gt;
See slides.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Hashing_and_Sketching&amp;diff=12637</id>
		<title>高级算法 (Fall 2024)/Hashing and Sketching</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Hashing_and_Sketching&amp;diff=12637"/>
		<updated>2024-10-02T12:21:22Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* 2-universal hash families */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Balls into Bins=&lt;br /&gt;
The following is the so-called balls into bins model.&lt;br /&gt;
Consider throwing &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; balls into &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; bins uniformly and independently at random. This is equivalent to a random mapping &amp;lt;math&amp;gt;f:[m]\to[n]&amp;lt;/math&amp;gt;. Needless to say, random mapping is an important random model and may have many applications in Computer Science, e.g. hashing.&lt;br /&gt;
&lt;br /&gt;
We are concerned with the following three questions regarding the balls into bins model:&lt;br /&gt;
* birthday problem: the probability that every bin contains at most one ball (the mapping is 1-1);&lt;br /&gt;
* coupon collector problem: the probability that every bin contains at least one ball (the mapping is onto);&lt;br /&gt;
* occupancy problem: the maximum load of bins.&lt;br /&gt;
&lt;br /&gt;
== Birthday Problem==&lt;br /&gt;
We now consider the &#039;&#039;&#039;birthday problem&#039;&#039;&#039;.&lt;br /&gt;
There are &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students in the class. Assume that for each student, his/her birthday is uniformly and independently distributed over the 365 days in a year. We ask for the probability that no two students share a birthday.&lt;br /&gt;
&lt;br /&gt;
Due to the [http://en.wikipedia.org/wiki/Pigeonhole_principle pigeonhole principle], it is obvious that for &amp;lt;math&amp;gt;m&amp;gt;365&amp;lt;/math&amp;gt;, there must be two students with the same birthday. Surprisingly, for any &amp;lt;math&amp;gt;m&amp;gt;57&amp;lt;/math&amp;gt; this event occurs with more than 99% probability. This is called the [http://en.wikipedia.org/wiki/Birthday_problem &#039;&#039;&#039;birthday paradox&#039;&#039;&#039;]. Despite the name, the birthday paradox is not a real paradox.&lt;br /&gt;
&lt;br /&gt;
We can model this problem as a balls-into-bins problem. &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; different balls (students) are uniformly and independently thrown into 365 bins (days). More generally, let &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; be the number of bins. We ask for the probability of the following event &amp;lt;math&amp;gt;\mathcal{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\mathcal{E}&amp;lt;/math&amp;gt;: there is no bin with more than one ball (i.e. no two students share a birthday).&lt;br /&gt;
&lt;br /&gt;
We first analyze this by counting. There are in total &amp;lt;math&amp;gt;n^m&amp;lt;/math&amp;gt; ways of assigning &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; balls to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; bins. The number of assignments in which no two balls share a bin is &amp;lt;math&amp;gt;{n\choose m}m!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Thus the probability is given by:&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]&lt;br /&gt;
=&lt;br /&gt;
\frac{{n\choose m}m!}{n^m}.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;{n\choose m}=\frac{n!}{(n-m)!m!}&amp;lt;/math&amp;gt;. Then &lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]&lt;br /&gt;
=&lt;br /&gt;
\frac{{n\choose m}m!}{n^m}&lt;br /&gt;
=&lt;br /&gt;
\frac{n!}{n^m(n-m)!}&lt;br /&gt;
=&lt;br /&gt;
\frac{n}{n}\cdot\frac{n-1}{n}\cdot\frac{n-2}{n}\cdots\frac{n-(m-1)}{n}&lt;br /&gt;
=&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right).&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is also a more &amp;quot;probabilistic&amp;quot; argument for the above equation. Consider again that &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students are mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; possible birthdays uniformly at random.&lt;br /&gt;
&lt;br /&gt;
The first student has a birthday for sure. The probability that the second student has a different birthday from the first student is &amp;lt;math&amp;gt;\left(1-\frac{1}{n}\right)&amp;lt;/math&amp;gt;. Given that the first two students have different birthdays, the probability that the third student has a different birthday from the first two students is &amp;lt;math&amp;gt;\left(1-\frac{2}{n}\right)&amp;lt;/math&amp;gt;. Continuing this on, assuming that the first &amp;lt;math&amp;gt;k-1&amp;lt;/math&amp;gt; students all have different birthdays, the probability that the &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;th student has a different birthday than the first &amp;lt;math&amp;gt;k-1&amp;lt;/math&amp;gt;, is given by &amp;lt;math&amp;gt;\left(1-\frac{k-1}{n}\right)&amp;lt;/math&amp;gt;. By the chain rule, the probability that all &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students have different birthdays is:&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]=\left(1-\frac{1}{n}\right)\cdot \left(1-\frac{2}{n}\right)\cdots \left(1-\frac{m-1}{n}\right)&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right),&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which is the same as what we got by the counting argument.&lt;br /&gt;
&lt;br /&gt;
[[File:Birthday.png|border|450px|right]]&lt;br /&gt;
&lt;br /&gt;
There are several ways of analyzing this formula. Here is a convenient one: due to [http://en.wikipedia.org/wiki/Taylor_series Taylor&#039;s expansion], &amp;lt;math&amp;gt;e^{-k/n}\approx 1-k/n&amp;lt;/math&amp;gt;. Then&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right)&lt;br /&gt;
&amp;amp;\approx&lt;br /&gt;
\prod_{k=1}^{m-1}e^{-\frac{k}{n}}\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\exp\left(-\sum_{k=1}^{m-1}\frac{k}{n}\right)\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
e^{-m(m-1)/2n}\\&lt;br /&gt;
&amp;amp;\approx&lt;br /&gt;
e^{-m^2/2n}.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
The quality of this approximation is shown in the Figure.&lt;br /&gt;
&lt;br /&gt;
Therefore, for &amp;lt;math&amp;gt;m=\sqrt{2n\ln \frac{1}{\epsilon}}&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\Pr[\mathcal{E}]\approx\epsilon&amp;lt;/math&amp;gt;.&lt;br /&gt;
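&lt;br /&gt;
Both the exact product and the approximation are easy to evaluate numerically; a minimal Python sketch:&lt;br /&gt;
 import math&lt;br /&gt;
 &lt;br /&gt;
 def birthday_exact(m, n):&lt;br /&gt;
     # exact probability that m balls land in distinct bins&lt;br /&gt;
     p = 1.0&lt;br /&gt;
     for k in range(1, m):&lt;br /&gt;
         p *= 1 - k / n&lt;br /&gt;
     return p&lt;br /&gt;
 &lt;br /&gt;
 n = 365&lt;br /&gt;
 for m in (23, 57):&lt;br /&gt;
     print(m, birthday_exact(m, n), math.exp(-m * m / (2 * n)))&lt;br /&gt;
 # m = 23 gives roughly 0.49, and m = 57 roughly 0.01,&lt;br /&gt;
 # consistent with the 99% claim for the birthday paradox above&lt;br /&gt;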
&lt;br /&gt;
==Universal Hashing ==&lt;br /&gt;
Hashing is one of the oldest tools in Computer Science. Knuth&#039;s memorandum in 1963 on analysis of hash tables is now considered to be the birth of the area of analysis of algorithms.&lt;br /&gt;
* Knuth. Notes on &amp;quot;open&amp;quot; addressing, July 22 1963. Unpublished memorandum.&lt;br /&gt;
&lt;br /&gt;
The idea of hashing is simple: an unknown set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; data &#039;&#039;&#039;items&#039;&#039;&#039; (or keys) is drawn from a large &#039;&#039;&#039;universe&#039;&#039;&#039; &amp;lt;math&amp;gt;U=[N]&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;N\gg n&amp;lt;/math&amp;gt;; in order to store &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; in a table of &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; entries (slots), we assume a consistent mapping (called a &#039;&#039;&#039;hash function&#039;&#039;&#039;) from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to a small range &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This idea seems clever: we use a consistent mapping to deal with an arbitrary unknown data set. However, there is a fundamental flaw in hashing.&lt;br /&gt;
* For a sufficiently large universe (&amp;lt;math&amp;gt;N&amp;gt; M(n-1)&amp;lt;/math&amp;gt;), for any fixed hash function, there exists a bad data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; such that all items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; are mapped to the same entry in the table.&lt;br /&gt;
&lt;br /&gt;
A simple use of the pigeonhole principle proves the above statement. &lt;br /&gt;
&lt;br /&gt;
To overcome this situation, randomization is introduced into hashing. We assume that the hash function is a random mapping from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;. In order to ease the analysis, the following ideal assumption is used:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simple Uniform Hash Assumption&#039;&#039;&#039; (&#039;&#039;&#039;SUHA&#039;&#039;&#039; or &#039;&#039;&#039;UHA&#039;&#039;&#039;, a.k.a. the random oracle model): &lt;br /&gt;
:A &#039;&#039;uniform&#039;&#039; random function &amp;lt;math&amp;gt;h:[N]\rightarrow[M]&amp;lt;/math&amp;gt; is available and the computation of &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is efficient.&lt;br /&gt;
&lt;br /&gt;
=== Families of universal hash functions ===&lt;br /&gt;
The assumption of a completely random function simplifies the analysis. However, in practice, a truly uniform random hash function is extremely expensive to compute and store, so this simple assumption can hardly represent reality.&lt;br /&gt;
&lt;br /&gt;
There are two approaches to implementing practical hash functions. One is to use &#039;&#039;ad hoc&#039;&#039; implementations and hope that they work. The other is to construct classes of hash functions which are efficient to compute and store but come with weaker randomness guarantees, and then analyze the applications of hash functions based on this weaker assumption of randomness.&lt;br /&gt;
&lt;br /&gt;
This route was taken by Carter and Wegman in 1977, when they introduced universal families of hash functions.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (universal hash families)|&lt;br /&gt;
:Let &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; be a universe with &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;. A family of hash functions &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; is said to be &#039;&#039;&#039;&amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;-universal&#039;&#039;&#039; if, for any distinct items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_k\in [N]&amp;lt;/math&amp;gt; and for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly at random from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)=\cdots=h(x_k)]\le\frac{1}{M^{k-1}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:A family of hash functions &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; is said to be &#039;&#039;&#039;strongly &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;-universal&#039;&#039;&#039; if, for any distinct items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_k\in [N]&amp;lt;/math&amp;gt;, any values &amp;lt;math&amp;gt;y_1,y_2,\ldots,y_k\in[M]&amp;lt;/math&amp;gt;, and for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly at random from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=y_1\wedge h(x_2)=y_2 \wedge \cdots \wedge h(x_k)=y_k]=\frac{1}{M^{k}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
In particular, for a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any distinct elements &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt;, a uniform random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; has&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
For a strongly 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any distinct elements &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; and any values &amp;lt;math&amp;gt;y_1,y_2\in[M]&amp;lt;/math&amp;gt;, a uniform random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; has&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=y_1\wedge h(x_2)=y_2]=\frac{1}{M^2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This behavior is exactly the same as that of a uniform random hash function on any distinct pair of inputs. For this reason, a strongly 2-universal hash family is also called a family of pairwise independent hash functions.&lt;br /&gt;
&lt;br /&gt;
=== 2-universal hash families ===&lt;br /&gt;
&lt;br /&gt;
The construction of pairwise independent random variables via modulo a prime introduced in Section 1 already provides a way of constructing a strongly 2-universal hash family.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; be a prime. The function &amp;lt;math&amp;gt;h_{a,b}:[p]\rightarrow [p]&amp;lt;/math&amp;gt; is defined by&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a,b}(x)=(ax+b)\bmod p,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family is&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a,b}\mid a,b\in[p]\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is strongly 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| In Section 1, we have proved the pairwise independence of the sequence of &amp;lt;math&amp;gt;(a i+b)\bmod p&amp;lt;/math&amp;gt;, for &amp;lt;math&amp;gt;i=0,1,\ldots, p-1&amp;lt;/math&amp;gt;, which directly implies that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is strongly 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
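&lt;br /&gt;
As a sanity check, the family is small enough to enumerate exhaustively for a small prime. The following minimal Python sketch verifies the strong 2-universality condition directly (the prime &amp;lt;math&amp;gt;p=13&amp;lt;/math&amp;gt; and the test points are arbitrary choices):&lt;br /&gt;
 p = 13                      # any prime; the universe and range are [p]&lt;br /&gt;
 x1, x2, y1, y2 = 3, 7, 5, 11&lt;br /&gt;
 hits = 0&lt;br /&gt;
 for a in range(p):          # enumerate the whole family of h_{a,b}&lt;br /&gt;
     for b in range(p):&lt;br /&gt;
         if (a * x1 + b) % p == y1 and (a * x2 + b) % p == y2:&lt;br /&gt;
             hits += 1&lt;br /&gt;
 print(hits, p * p)          # exactly 1 out of p^2 functions, i.e. 1/p^2&lt;br /&gt;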
&lt;br /&gt;
;The original construction of Carter-Wegman&lt;br /&gt;
What if we want to have hash functions from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; for non-prime &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt;? Carter and Wegman developed the following method.&lt;br /&gt;
&lt;br /&gt;
Suppose that the universe is &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt;, and the functions map &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;. For some prime &amp;lt;math&amp;gt;p\ge N&amp;lt;/math&amp;gt;, let&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a,b}(x)=((ax+b)\bmod p)\bmod M,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a,b}\mid 1\le a\le p-1, b\in[p]\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Note that unlike the first construction, now &amp;lt;math&amp;gt;a\neq 0&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma (Carter-Wegman)|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| Due to the definition of &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, there are &amp;lt;math&amp;gt;p(p-1)&amp;lt;/math&amp;gt; many different hash functions in &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, because each hash function in &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; corresponds to a pair of &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt;. We only need to count for any particular pair of &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, the number of hash functions that &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We first note that for any &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a x_1+b\not\equiv a x_2+b \pmod p&amp;lt;/math&amp;gt;. This is because &amp;lt;math&amp;gt;a x_1+b\equiv a x_2+b \pmod p&amp;lt;/math&amp;gt; would imply that &amp;lt;math&amp;gt;a(x_1-x_2)\equiv 0\pmod p&amp;lt;/math&amp;gt;, which can never happen since &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt; (note that &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; for an &amp;lt;math&amp;gt;N\le p&amp;lt;/math&amp;gt;).  Therefore, we can assume that &amp;lt;math&amp;gt;(a x_1+b)\bmod p=u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(a x_2+b)\bmod p=v&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;u\neq v&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
By linear algebra (over a finite field), for any &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt; and any &amp;lt;math&amp;gt;u,v\in[p]&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;u\neq v&amp;lt;/math&amp;gt;, there is exactly one solution &amp;lt;math&amp;gt;(a,b)&amp;lt;/math&amp;gt; satisfying:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{cases}&lt;br /&gt;
a x_1+b \equiv u \pmod p\\&lt;br /&gt;
a x_2+b \equiv v \pmod p.&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
After modulo &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt;, every &amp;lt;math&amp;gt;u\in[p]&amp;lt;/math&amp;gt; has at most &amp;lt;math&amp;gt;\lceil p/M\rceil -1&amp;lt;/math&amp;gt; many &amp;lt;math&amp;gt;v\in[p]&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;v\neq u&amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;v\equiv u\pmod M&amp;lt;/math&amp;gt;. Therefore, for every pair of &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, there exist at most &amp;lt;math&amp;gt;p(\lceil p/M\rceil -1)\le p(p-1)/M&amp;lt;/math&amp;gt; pairs of &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;((ax_1+b)\bmod p)\bmod M=((ax_2+b)\bmod p)\bmod M&amp;lt;/math&amp;gt;, which means there are at most &amp;lt;math&amp;gt; p(p-1)/M&amp;lt;/math&amp;gt; many hash functions &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; having &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;. For &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; uniformly chosen from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le \frac{p(p-1)/M}{p(p-1)}=\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This proves that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
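&lt;br /&gt;
The &amp;lt;math&amp;gt;1/M&amp;lt;/math&amp;gt; collision bound can also be checked by brute force for small parameters; a minimal Python sketch (the parameters are arbitrary, with a prime &amp;lt;math&amp;gt;p\ge N&amp;lt;/math&amp;gt;):&lt;br /&gt;
 p, N, M = 17, 16, 4              # prime p, universe [N], table size M&lt;br /&gt;
 &lt;br /&gt;
 def collision_prob(x1, x2):&lt;br /&gt;
     # fraction of pairs (a, b) with h_{a,b}(x1) = h_{a,b}(x2)&lt;br /&gt;
     coll = 0&lt;br /&gt;
     for a in range(1, p):        # note a != 0 in this construction&lt;br /&gt;
         for b in range(p):&lt;br /&gt;
             if ((a * x1 + b) % p) % M == ((a * x2 + b) % p) % M:&lt;br /&gt;
                 coll += 1&lt;br /&gt;
     return coll / (p * (p - 1))&lt;br /&gt;
 &lt;br /&gt;
 # every distinct pair collides with probability at most 1/M = 0.25&lt;br /&gt;
 print(max(collision_prob(x1, x2)&lt;br /&gt;
           for x1 in range(N) for x2 in range(x1 + 1, N)))&lt;br /&gt;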
&lt;br /&gt;
;A construction used in practice&lt;br /&gt;
The main issue with the Carter-Wegman construction is efficiency: the mod operation is very slow, and has remained so for more than 30 years.&lt;br /&gt;
&lt;br /&gt;
The following construction is due to Dietzfelbinger &#039;&#039;et al&#039;&#039;. It was published in 1997 and has since been used in practice in various applications of universal hashing.&lt;br /&gt;
&lt;br /&gt;
The family of hash functions is from &amp;lt;math&amp;gt;[2^u]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[2^v]&amp;lt;/math&amp;gt;. With a binary representation, the functions map binary strings of length &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; to binary strings of length &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;.&lt;br /&gt;
Let&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a}(x)=\left\lfloor\frac{a\cdot x\bmod 2^u}{2^{u-v}}\right\rfloor,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a}\mid a\in[2^u]\mbox{ and }a\mbox{ is odd}\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This family of hash functions does not exactly meet the requirement of a 2-universal family. However, Dietzfelbinger &#039;&#039;et al&#039;&#039; proved that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is close to 2-universal. Specifically, for any distinct input values &amp;lt;math&amp;gt;x_1,x_2\in[2^u]&amp;lt;/math&amp;gt; and a uniformly random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le\frac{1}{2^{v-1}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
So &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is within a factor of 2 of being 2-universal. The proof uses the fact that odd numbers are relatively prime to any power of 2.&lt;br /&gt;
&lt;br /&gt;
The function is extremely simple to compute in the C language.&lt;br /&gt;
We exploit the fact that C multiplication (*) of unsigned &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt;-bit numbers is done &amp;lt;math&amp;gt;\bmod 2^u&amp;lt;/math&amp;gt;, and obtain one-line C code for computing the hash function:&lt;br /&gt;
 h_a(x) = (a*x)&amp;gt;&amp;gt;(u-v)&lt;br /&gt;
The bit-wise shift is a lot faster than the mod operation, which explains why this scheme is more popular in practice than the original Carter-Wegman construction.&lt;br /&gt;
&lt;br /&gt;
== Collision number ==&lt;br /&gt;
Consider a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; of hash functions from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;. Let &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; be a hash function chosen uniformly from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;. For a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; distinct elements from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt;, say &amp;lt;math&amp;gt;S=\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;, the elements are mapped to the hash values &amp;lt;math&amp;gt;h(x_1), h(x_2), \ldots, h(x_n)&amp;lt;/math&amp;gt;. This can be seen as throwing &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; balls to &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; bins, with pairwise independent choices of bins.&lt;br /&gt;
&lt;br /&gt;
As in the balls-into-bins setting with full independence, we are curious about questions such as the birthday problem or the maximum load. These questions are interesting not only because they are natural to ask in a balls-into-bins setting, but also because, in the context of hashing, they are closely related to the performance of hash functions.&lt;br /&gt;
&lt;br /&gt;
The techniques we used earlier for analyzing balls-into-bins rely heavily on the full independence of the bin choices, and therefore can hardly be extended to the setting of 2-universal hash families. However, it turns out that several balls-into-bins questions can be answered by analyzing a very natural quantity: the number of &#039;&#039;&#039;collision pairs&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
A collision pair for hashing is a pair of elements &amp;lt;math&amp;gt;x_1,x_2\in S&amp;lt;/math&amp;gt; which are mapped to the same hash value, i.e. &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt;. Formally, for a fixed set of elements &amp;lt;math&amp;gt;S=\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;, for any &amp;lt;math&amp;gt;1\le i,j\le n&amp;lt;/math&amp;gt;, let the random variable&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
X_{ij}&lt;br /&gt;
=&lt;br /&gt;
\begin{cases}&lt;br /&gt;
1 &amp;amp; \text{if }h(x_i)=h(x_j),\\&lt;br /&gt;
0 &amp;amp; \text{otherwise.}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The total number of collision pairs among the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; is &lt;br /&gt;
:&amp;lt;math&amp;gt;X=\sum_{i&amp;lt;j} X_{ij}.\,&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal, for any &amp;lt;math&amp;gt;i\neq j&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[X_{ij}=1]=\Pr[h(x_i)=h(x_j)]\le\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected number of collision pairs is &lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{E}[X]=\mathbf{E}\left[\sum_{i&amp;lt;j}X_{ij}\right]=\sum_{i&amp;lt;j}\mathbf{E}[X_{ij}]=\sum_{i&amp;lt;j}\Pr[X_{ij}=1]\le{n\choose 2}\frac{1}{M}&amp;lt;\frac{n^2}{2M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In particular, for &amp;lt;math&amp;gt;n=M&amp;lt;/math&amp;gt;, i.e. &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; hash values by a pairwise independent hash function, the expected collision number is &amp;lt;math&amp;gt;\mathbf{E}[X]&amp;lt;\frac{n^2}{2M}=\frac{n}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The above analysis gives us an upper bound &amp;lt;math&amp;gt;\mathbf{E}[X]&amp;lt;\frac{n^2}{2M}&amp;lt;/math&amp;gt; on the expected number of collision pairs. Applying Markov&#039;s inequality, for &amp;lt;math&amp;gt;0&amp;lt;\epsilon&amp;lt;1&amp;lt;/math&amp;gt;, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr\left[X\ge \frac{n^2}{2\epsilon M}\right]\le\Pr\left[X\ge \frac{1}{\epsilon}\mathbf{E}[X]\right]\le\epsilon.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;n\le\sqrt{2\epsilon M}&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\frac{n^2}{2\epsilon M}\le 1&amp;lt;/math&amp;gt;, so the number of collision pairs satisfies &amp;lt;math&amp;gt;X\ge1&amp;lt;/math&amp;gt; with probability at most &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;; therefore with probability at least &amp;lt;math&amp;gt;1-\epsilon&amp;lt;/math&amp;gt;, there is no collision at all. We thus have the following theorem.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Theorem|&lt;br /&gt;
:If &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is chosen uniformly from a 2-universal family of hash functions mapping the universe &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;, then for any set &amp;lt;math&amp;gt;S\subset [N]&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items, where &amp;lt;math&amp;gt;n\le\sqrt{2\epsilon M}&amp;lt;/math&amp;gt;, the probability that there exists a collision pair is&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[\mbox{collision occurs}]\le\epsilon.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Recall that for mutually independent choices of bins, for &amp;lt;math&amp;gt;n=\sqrt{2M\ln(1/\epsilon)}&amp;lt;/math&amp;gt;, the probability that a collision occurs is about &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;. For constant &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;, this is essentially the same bound as in the pairwise independent setting. Therefore, &lt;br /&gt;
the behavior of a pairwise independent hash function is essentially the same as that of a uniform random hash function for the birthday problem. This is easy to understand, because the birthday problem is about the behavior of collisions, and the definition of a 2-universal hash family can be interpreted as &amp;quot;functions for which the probability of collision is as low as for a uniform random function&amp;quot;.&lt;br /&gt;
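&lt;br /&gt;
The bound &amp;lt;math&amp;gt;\mathbf{E}[X]&amp;lt;\frac{n^2}{2M}&amp;lt;/math&amp;gt; is easy to check empirically. The following C sketch draws a random Carter-Wegman function &amp;lt;math&amp;gt;h(x)=((ax+b)\bmod p)\bmod M&amp;lt;/math&amp;gt; and counts collision pairs among &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; distinct keys; the prime, the key set, and the use of rand() as a stand-in for uniform sampling are all illustrative assumptions.&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define P 2147483647ULL  /* the prime 2^31 - 1 */&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     const uint64_t n = 1000, M = 1 &amp;lt;&amp;lt; 20;&lt;br /&gt;
     uint64_t a = 1 + rand() % (P - 1), b = rand() % P;&lt;br /&gt;
     uint64_t collisions = 0;&lt;br /&gt;
     /* hash the n distinct keys 0..n-1 and count colliding pairs */&lt;br /&gt;
     for (uint64_t i = 0; i &amp;lt; n; i++)&lt;br /&gt;
         for (uint64_t j = i + 1; j &amp;lt; n; j++)&lt;br /&gt;
             if (((a * i + b) % P) % M == ((a * j + b) % P) % M)&lt;br /&gt;
                 collisions++;&lt;br /&gt;
     /* expectation over (a,b) is below n^2/(2M), i.e. below 0.48 here */&lt;br /&gt;
     printf(&amp;quot;collision pairs: %llu\n&amp;quot;, (unsigned long long)collisions);&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;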
&lt;br /&gt;
= Set  Membership=&lt;br /&gt;
A basic question in Computer Science is:&lt;br /&gt;
:&amp;quot;&amp;lt;math&amp;gt;\mbox{Is }x\in S?&amp;lt;/math&amp;gt;&amp;quot;&lt;br /&gt;
for a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and an element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. This is the &#039;&#039;&#039;set membership&#039;&#039;&#039; problem.&lt;br /&gt;
&lt;br /&gt;
Formally, given an arbitrary set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, we want to use a succinct &#039;&#039;&#039;data structure&#039;&#039;&#039; to represent this set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, so that upon each &#039;&#039;&#039;query&#039;&#039;&#039; of any element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, the question of whether &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt; is efficiently answered. The complexity of such a data structure is measured in two respects:&lt;br /&gt;
* &#039;&#039;&#039;space cost&#039;&#039;&#039;: size of the data structure to represent a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;;&lt;br /&gt;
* &#039;&#039;&#039;time cost&#039;&#039;&#039;: time complexity of answering each query by accessing to the data structure.&lt;br /&gt;
&lt;br /&gt;
Suppose that the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is of size &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;. Clearly, the membership problem can be solved by a &#039;&#039;&#039;dictionary data structure&#039;&#039;&#039;, e.g.:&lt;br /&gt;
* &#039;&#039;&#039;sorted table / balanced search tree&#039;&#039;&#039;: with space cost &amp;lt;math&amp;gt;O(n\log N)&amp;lt;/math&amp;gt; bits and time cost &amp;lt;math&amp;gt;O(\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Note that &amp;lt;math&amp;gt;\log{N\choose n}=\Theta\left(n\log \frac{N}{n}\right)&amp;lt;/math&amp;gt; is the entropy of a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;, so at least this many bits are necessary to represent such a set without losing any information. &lt;br /&gt;
With hashing, we can solve this fundamental problem with asymptotic optimal space cost and time cost at the same time.&lt;br /&gt;
&lt;br /&gt;
== Perfect hashing using quadratic space==&lt;br /&gt;
The idea of perfect hashing is that we use a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; to map the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items to distinct entries of the table; store every item &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt; in the entry &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;; and also store the hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; in a fixed location in the table (usually the beginning of the table). The algorithm for searching for an item is as follows:&lt;br /&gt;
&lt;br /&gt;
:search for &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in table &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;:&lt;br /&gt;
# retrieve &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; from a fixed location in the table;&lt;br /&gt;
# if &amp;lt;math&amp;gt;x=T[h(x)]&amp;lt;/math&amp;gt; return &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;; else return NOT_FOUND;&lt;br /&gt;
&lt;br /&gt;
This scheme works as long as the hash function satisfies the following two conditions:&lt;br /&gt;
* The description of &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is sufficiently short, so that &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; can be stored in one entry (or in constant many entries) of the table.&lt;br /&gt;
* &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; has no collisions on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, i.e. there is no pair of items &amp;lt;math&amp;gt;x_1,x_2\in S&amp;lt;/math&amp;gt; that are mapped to the same value by &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The first condition is easy to guarantee for 2-universal hash families. As shown by Carter-Wegman construction, a 2-universal hash function can be uniquely represented by two integers &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;, which can be stored in two entries (or just one, if the word length is sufficiently large) of the table.&lt;br /&gt;
&lt;br /&gt;
Our discussion is now focused on the second condition. We find that it relies on the &#039;&#039;perfectness&#039;&#039; of the hash function for a data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is &#039;&#039;&#039;perfect&#039;&#039;&#039; for a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of items if &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; maps all items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; to different values, i.e. there is no collision.&lt;br /&gt;
&lt;br /&gt;
We have shown by the birthday problem for 2-universal hashing that when &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are mapped to &amp;lt;math&amp;gt;n^2&amp;lt;/math&amp;gt; values, for an &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly from a 2-universal family of hash functions, the probability that a collision occurs is at most 1/2. Thus&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h\mbox{ is perfect for }S]\ge\frac{1}{2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
for a table of &amp;lt;math&amp;gt;n^2&amp;lt;/math&amp;gt; entries.&lt;br /&gt;
&lt;br /&gt;
The construction of perfect hashing is straightforward then:&lt;br /&gt;
:For a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements:&lt;br /&gt;
# uniformly choose an &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; from a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;; (for the Carter-Wegman construction, this means uniformly choosing two integers &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt; for a sufficiently large prime &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;.)&lt;br /&gt;
# check whether &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is perfect for &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;;&lt;br /&gt;
# if &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is NOT perfect for &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, start over again; otherwise, construct the table;&lt;br /&gt;
&lt;br /&gt;
This is a Las Vegas randomized algorithm, which constructs a perfect hashing for a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; within at most two trials in expectation (the number of trials follows a geometric distribution). The resulting data structure is an &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt;-size static dictionary of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements which answers every search in deterministic &amp;lt;math&amp;gt;O(1)&amp;lt;/math&amp;gt; time.&lt;br /&gt;
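&lt;br /&gt;
A minimal sketch of this Las Vegas construction in C, using the Carter-Wegman family with the fixed prime &amp;lt;math&amp;gt;2^{31}-1&amp;lt;/math&amp;gt;; the key set and the use of rand() are illustrative assumptions.&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define P 2147483647ULL        /* prime larger than any key below */&lt;br /&gt;
 #define N 100                  /* number of items */&lt;br /&gt;
 #define EMPTY UINT64_MAX&lt;br /&gt;
 &lt;br /&gt;
 static uint64_t table[N * N];  /* table of n^2 entries */&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     uint64_t keys[N], a, b;&lt;br /&gt;
     for (uint64_t i = 0; i &amp;lt; N; i++) keys[i] = 3 * i + 7;  /* sample data */&lt;br /&gt;
     for (;;) {                 /* expected at most 2 iterations */&lt;br /&gt;
         a = 1 + rand() % (P - 1);&lt;br /&gt;
         b = rand() % P;&lt;br /&gt;
         for (uint64_t i = 0; i &amp;lt; N * N; i++) table[i] = EMPTY;&lt;br /&gt;
         int perfect = 1;&lt;br /&gt;
         for (uint64_t i = 0; i &amp;lt; N &amp;amp;&amp;amp; perfect; i++) {&lt;br /&gt;
             uint64_t slot = ((a * keys[i] + b) % P) % (N * N);&lt;br /&gt;
             if (table[slot] != EMPTY) perfect = 0;   /* collision: retry */&lt;br /&gt;
             else table[slot] = keys[i];&lt;br /&gt;
         }&lt;br /&gt;
         if (perfect) break;&lt;br /&gt;
     }&lt;br /&gt;
     /* search: x is in S iff the entry h(x) stores x itself */&lt;br /&gt;
     uint64_t x = 10;&lt;br /&gt;
     printf(&amp;quot;%d\n&amp;quot;, table[((a * x + b) % P) % (N * N)] == x);&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;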
&lt;br /&gt;
== FKS perfect hashing ==&lt;br /&gt;
In the last section we saw how to answer searches over a set using &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt; space and constant time. Now we see how to do it with linear space and constant time, which solves the search problem asymptotically optimally in both time and space.&lt;br /&gt;
&lt;br /&gt;
This was once seemingly impossible, until Yao&#039;s seminal paper:&lt;br /&gt;
*Yao. Should tables be sorted? &#039;&#039;Journal of the ACM (JACM)&#039;&#039;, 1981.&lt;br /&gt;
&lt;br /&gt;
Yao&#039;s paper shows a possibility of achieving linear space and constant time at the same time by exploiting the power of hashing, but assumes an unrealistically large universe. &lt;br /&gt;
&lt;br /&gt;
Inspired by Yao&#039;s work, Fredman, Komlós, and Szemerédi discovered the first linear-space and constant-time static dictionary in a realistic setting: &lt;br /&gt;
* Fredman, Komlós, and Szemerédi. Storing a sparse table with O(1) worst case access time. &#039;&#039;Journal of the ACM (JACM)&#039;&#039;, 1984.&lt;br /&gt;
&lt;br /&gt;
The idea of FKS hashing is to arrange hash table in two levels:&lt;br /&gt;
* In the first level, &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are hashed to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;buckets&#039;&#039; by a 2-universal hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;. &lt;br /&gt;
: Let &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt; be the set of items hashed to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th bucket.&lt;br /&gt;
* In the second level, construct a &amp;lt;math&amp;gt;|B_i|^2&amp;lt;/math&amp;gt;-size perfect hashing for each bucket &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The data structure can be stored in a table. The first few entries are reserved to store the primary hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;. To help the searching algorithm locate a bucket, we use the next &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; entries of the table as &amp;quot;pointers&amp;quot; to the buckets: each such entry stores the address of the first entry of the space storing a bucket. In the rest of the table, the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; buckets are stored in order, each using &amp;lt;math&amp;gt;|B_i|^2&amp;lt;/math&amp;gt; space as required by perfect hashing.&lt;br /&gt;
&lt;br /&gt;
::[[File:FKS.png|600px]]&lt;br /&gt;
&lt;br /&gt;
It is easy to see that the search time is constant. To search for an item &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, the algorithm does the following:&lt;br /&gt;
* Retrieve &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Retrieve the address for bucket &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Search by perfect hashing within bucket &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
Each step takes constant time, so the worst-case search time is constant.&lt;br /&gt;
&lt;br /&gt;
We then need to guarantee that the space is linear in &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;. At first glance this seems impossible, because each instance of perfect hashing for a bucket costs space quadratic in the bucket size. We will prove that although the individual buckets use quadratic space, their sum is still linear.&lt;br /&gt;
&lt;br /&gt;
For a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items, for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly from a 2-universal family which maps the items to &amp;lt;math&amp;gt;[n]&amp;lt;/math&amp;gt;, called &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;buckets&#039;&#039;,  let &amp;lt;math&amp;gt;Y_i=|B_i|&amp;lt;/math&amp;gt; be the number of items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; mapped to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th bucket.&lt;br /&gt;
We are going to bound the following quantity:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
Y=\sum_{i=1}^n Y_i^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Since each bucket &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt; uses a space of &amp;lt;math&amp;gt;Y_i^2&amp;lt;/math&amp;gt; for perfect hashing, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; gives the total size of the space for storing the buckets. &lt;br /&gt;
&lt;br /&gt;
We will show that &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is related to the total number of collision pairs. (Indeed, the number of collision pairs can be computed by a degree-2 polynomial, just like &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
Note that a bucket of &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; items contributes &amp;lt;math&amp;gt;{Y_i\choose 2}&amp;lt;/math&amp;gt; collision pairs. Let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; be the total number of collision pairs.&lt;br /&gt;
&amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; can be computed by summing over the collision pairs in every bucket:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
X=\sum_{i=1}^n{Y_i\choose 2}=\sum_{i=1}^n\frac{Y_i(Y_i-1)}{2}=\frac{1}{2}\left(\sum_{i=1}^nY_i^2-\sum_{i=1}^nY_i\right)=\frac{1}{2}\left(\sum_{i=1}^nY_i^2-n\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, the sum of squares of the sizes of buckets is related to collision number by:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_{i=1}^nY_i^2=2X+n.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
By our analysis of the collision number, we know that for &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; buckets, the expected number of collision pairs is: &amp;lt;math&amp;gt;\mathbf{E}[X]\le \frac{n}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Thus,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbf{E}\left[\sum_{i=1}^nY_i^2\right]=\mathbf{E}[2X+n]\le 2n.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Due to Markov&#039;s inequality, &amp;lt;math&amp;gt;\sum_{i=1}^nY_i^2=O(n)&amp;lt;/math&amp;gt; with constant probability. For any set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, we can therefore find a suitable &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; within an expected constant number of trials, so the FKS data structure can be constructed with guaranteed (instead of expected) linear size, answering each search in constant time.&lt;br /&gt;
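&lt;br /&gt;
For concreteness, this Markov step is a one-line calculation using the bound &amp;lt;math&amp;gt;\mathbf{E}\left[\sum_{i=1}^nY_i^2\right]\le 2n&amp;lt;/math&amp;gt; proved above:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr\left[\sum_{i=1}^nY_i^2\ge 4n\right]\le\frac{\mathbf{E}\left[\sum_{i=1}^nY_i^2\right]}{4n}\le\frac{2n}{4n}=\frac{1}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;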
&lt;br /&gt;
== Bloom filter ==&lt;br /&gt;
Now we consider a lossy representation of the original data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, in order to further save space. Such a lossy data structure is sometimes called a &#039;&#039;&#039;&#039;&#039;sketch&#039;&#039;&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
The Bloom filter is such a lossy data structure. It is a space-efficient hash table that solves the &#039;&#039;&#039;approximate membership&#039;&#039;&#039; problem with one-sided error (&#039;&#039;false positive&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
Given a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, a Bloom filter consists of an array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits, and &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt; mapping &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[cn]&amp;lt;/math&amp;gt;, where both &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; are parameters to be optimized later.&lt;br /&gt;
&lt;br /&gt;
As before, we assume the &#039;&#039;&#039;Uniform Hash Assumption (UHA)&#039;&#039;&#039;: &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt; are mutually independent hash functions, where each &amp;lt;math&amp;gt;h_i&amp;lt;/math&amp;gt; is a uniform random hash function &amp;lt;math&amp;gt;h_i:U\to[cn]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Bloom filter works as follows:&lt;br /&gt;
{{Theorem|&#039;&#039;Bloom filter&#039;&#039; (Bloom 1970)|&lt;br /&gt;
:Suppose &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k:U\to[cn]&amp;lt;/math&amp;gt; are uniform and independent random hash functions.&lt;br /&gt;
-----&lt;br /&gt;
:&#039;&#039;&#039;Data structure construction:&#039;&#039;&#039; Given a set &amp;lt;math&amp;gt;S\subset U&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;n=|S|&amp;lt;/math&amp;gt;, the data structure is a Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits constructed as&lt;br /&gt;
:* initialize all &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits of the Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; to 0;&lt;br /&gt;
:* for each &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;.&lt;br /&gt;
----&lt;br /&gt;
:&#039;&#039;&#039;Query resolution:&#039;&#039;&#039; Upon each query of an arbitrary &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;,&lt;br /&gt;
:* answer &amp;quot;yes&amp;quot; if &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt; and &amp;quot;no&amp;quot; if otherwise.&lt;br /&gt;
}}&lt;br /&gt;
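A minimal Bloom filter sketch in C: the &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; ideal hash functions above are simulated by a seeded 64-bit mixer, which is an illustrative stand-in rather than part of the construction; the parameters correspond to &amp;lt;math&amp;gt;c=8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k=6\approx c\ln 2&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;n=10000&amp;lt;/math&amp;gt;.&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define CN 80000ULL   /* cn bits: c = 8 bits per element, n = 10000 */&lt;br /&gt;
 #define K 6           /* about c * ln2 hash functions */&lt;br /&gt;
 &lt;br /&gt;
 static unsigned char bits[CN / 8];&lt;br /&gt;
 &lt;br /&gt;
 /* seeded 64-bit mixer simulating a uniform hash into [cn] */&lt;br /&gt;
 static uint64_t h(uint64_t seed, uint64_t x) {&lt;br /&gt;
     x += seed * 0x9e3779b97f4a7c15ULL;&lt;br /&gt;
     x ^= x &amp;gt;&amp;gt; 33; x *= 0xff51afd7ed558ccdULL;&lt;br /&gt;
     x ^= x &amp;gt;&amp;gt; 33; x *= 0xc4ceb9fe1a85ec53ULL;&lt;br /&gt;
     return (x ^ (x &amp;gt;&amp;gt; 33)) % CN;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 static void insert(uint64_t x) {       /* set k bits */&lt;br /&gt;
     for (int i = 0; i &amp;lt; K; i++) {&lt;br /&gt;
         uint64_t p = h(i + 1, x);&lt;br /&gt;
         bits[p / 8] |= 1 &amp;lt;&amp;lt; (p % 8);&lt;br /&gt;
     }&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 static int query(uint64_t x) {         /* a no-answer is always correct */&lt;br /&gt;
     for (int i = 0; i &amp;lt; K; i++) {&lt;br /&gt;
         uint64_t p = h(i + 1, x);&lt;br /&gt;
         if (!(bits[p / 8] &amp;amp; (1 &amp;lt;&amp;lt; (p % 8)))) return 0;&lt;br /&gt;
     }&lt;br /&gt;
     return 1;                          /* a yes-answer may be a false positive */&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     insert(12345);&lt;br /&gt;
     printf(&amp;quot;%d %d\n&amp;quot;, query(12345), query(54321));  /* 1, almost surely 0 */&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;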
The Boolean array is our data structure, whose size is &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits. Under the Uniform Hash Assumption (UHA), the time cost of the data structure for answering each query is &amp;lt;math&amp;gt;O(k)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
When the answer returned by the algorithm is &amp;quot;no&amp;quot;, it holds that &amp;lt;math&amp;gt;A[h_i(x)]=0&amp;lt;/math&amp;gt; for some &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;, in which case the query &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; must not belong to the set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. Thus, the Bloom filter has no false negatives.&lt;br /&gt;
&lt;br /&gt;
On the other hand, when the answer returned by the algorithm is &amp;quot;yes&amp;quot;, &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;. It is still possible for some &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt; that all bits &amp;lt;math&amp;gt;A[h_i(x)]&amp;lt;/math&amp;gt; are set by elements in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. We want to bound the probability of such a false positive, that is, the following probability for an &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\,\forall 1\le i\le k, A[h_i(x)]=1\,]&amp;lt;/math&amp;gt;,&lt;br /&gt;
which by independence between different hash functions and by symmetry is equal to:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\, A[h_1(x)]=1\,]^k=(1-\Pr[\, A[h_1(x)]=0\,])^k&amp;lt;/math&amp;gt;.&lt;br /&gt;
For an element &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;, its hash value &amp;lt;math&amp;gt;h_1(x)&amp;lt;/math&amp;gt; is independent of all hash values &amp;lt;math&amp;gt;h_i(y)&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt; and all &amp;lt;math&amp;gt;y\in S&amp;lt;/math&amp;gt;, due to the Uniform Hash Assumption. The hash value &amp;lt;math&amp;gt;h_1(x)&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt; is then independent of the content of the array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;. Therefore, the probability that the position &amp;lt;math&amp;gt;A[h_1(x)]&amp;lt;/math&amp;gt; is missed by all &amp;lt;math&amp;gt;kn&amp;lt;/math&amp;gt; updates to the Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; caused by the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[\, A[h_1(x)]=0\,]=\left(1-\frac{1}{cn}\right)^{kn}\approx e^{-k/c}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Putting everything together, for any &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;, the false positive is bounded as:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\Pr[\,\text{wrongly answer &#039;&#039;yes&#039;&#039;}\,]&lt;br /&gt;
&amp;amp;=\Pr[\,\forall 1\le i\le k, A[h_i(x)]=1\,]\\&lt;br /&gt;
&amp;amp;=\Pr[\, A[h_1(x)]=1\,]^k=(1-\Pr[\, A[h_1(x)]=0\,])^k\\&lt;br /&gt;
&amp;amp;=\left(1-\left(1-\frac{1}{cn}\right)^{kn}\right)^k\\&lt;br /&gt;
&amp;amp;\approx \left(1- e^{-k/c}\right)^k&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which is minimized to &amp;lt;math&amp;gt;(0.6185)^c&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;k=c\ln 2&amp;lt;/math&amp;gt;. For example, &amp;lt;math&amp;gt;c=10&amp;lt;/math&amp;gt; bits per element (with &amp;lt;math&amp;gt;k=7&amp;lt;/math&amp;gt; hash functions) gives a false positive probability of about 0.8%.&lt;br /&gt;
&lt;br /&gt;
The Bloom filter thus solves the approximate membership problem, with a small constant probability of false positives, using a data structure of &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; bits which answers each query with &amp;lt;math&amp;gt;O(1)&amp;lt;/math&amp;gt; time cost (for constant &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
=Distinct Elements=&lt;br /&gt;
Consider the following problem of &#039;&#039;&#039;counting distinct elements&#039;&#039;&#039;: Suppose that &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is a sufficiently large universe.&lt;br /&gt;
*&#039;&#039;&#039;Input:&#039;&#039;&#039; a sequence of (not necessarily distinct) elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output:&#039;&#039;&#039; an estimation of the total number of distinct elements &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A straightforward way of solving this problem is to maintain a dictionary data structure, which costs at least linear (&amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt;) space. For &#039;&#039;big data&#039;&#039;, where &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is very large, this is too expensive. Moreover, by an information-theoretic argument, linear space is necessary if one wants to compute the &#039;&#039;exact&#039;&#039; value of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Our goal is to relax the problem a little bit to significantly reduce the space cost by tolerating &#039;&#039;approximate&#039;&#039; answers. The form of approximation we consider is &#039;&#039;&#039;&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator&#039;&#039;&#039;.&lt;br /&gt;
{{Theorem|&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator|&lt;br /&gt;
: A random variable &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is an &#039;&#039;&#039;&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator&#039;&#039;&#039; of a quantity &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; if&lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[\,(1-\epsilon)z\le \widehat{Z}\le (1+\epsilon)z\,]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
: &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is said to be an &#039;&#039;&#039;unbiased estimator&#039;&#039;&#039; of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\mathbb{E}[\widehat{Z}]=z&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
Usually &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; is called &#039;&#039;&#039;approximation error&#039;&#039;&#039; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt; is called &#039;&#039;&#039;confidence error&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
We now present an elegant algorithm. The algorithm can be implemented in the [https://en.wikipedia.org/wiki/Streaming_algorithm &#039;&#039;&#039;data stream model&#039;&#039;&#039;]: the input elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; are presented to the algorithm one at a time, where the size of the data &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is unknown to the algorithm. The algorithm maintains a value &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; which is an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the total number of distinct elements &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt;, using only a small amount of memory space to memorize (with loss) the data set &amp;lt;math&amp;gt;\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A famous quotation of Flajolet describes the performance of this algorithm as:&lt;br /&gt;
&lt;br /&gt;
 &amp;quot;Using only memory equivalent to 5 lines of printed text, you can estimate with a typical accuracy of 5% and in a single pass the total vocabulary of Shakespeare.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
== The &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch ==&lt;br /&gt;
Suppose that we have access to an idealized random hash function &amp;lt;math&amp;gt;h:U\to[0,1]&amp;lt;/math&amp;gt; which is uniformly distributed over all mappings from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Recall that the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; consists of &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt; distinct elements. These elements are mapped by the random function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; hash values uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. We could maintain these hash values instead of the original elements, but this would still be too expensive because in the worst case we still have up to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; distinct values to maintain. However, due to the idealized random hash function, the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt; will be partitioned into &amp;lt;math&amp;gt;z+1&amp;lt;/math&amp;gt; subintervals by these &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; uniform and independent hash values. The typical length of the subinterval gives an estimation of the number &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Proposition|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\min_{1\le i\le n}h(x_i)\right]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
The input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; consisting of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; distinct elements are mapped to &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; random hash values uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. These &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; hash values partition the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;z+1&amp;lt;/math&amp;gt; subintervals &amp;lt;math&amp;gt;[0,v_1],[v_1,v_2],[v_2,v_3]\ldots,[v_{z-1},v_z],[v_z,1]&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; denotes the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th smallest value among all hash values &amp;lt;math&amp;gt;\{h(x_1),h(x_2),\ldots,h(x_n)\}&amp;lt;/math&amp;gt;. Clearly we have &lt;br /&gt;
:&amp;lt;math&amp;gt;v_1=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt;. &lt;br /&gt;
Meanwhile, since all hash values are uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;, the lengths of all subintervals &amp;lt;math&amp;gt;v_1, v_2-v_1, v_3-v_2,\ldots, v_z-v_{z-1}, 1-v_z&amp;lt;/math&amp;gt; are identically distributed. By symmetry, they have the same expectation, therefore&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
(z+1)\mathbb{E}[v_1]=&lt;br /&gt;
\mathbb{E}[v_1]+\sum_{i=1}^{z-1}\mathbb{E}[v_{i+1}-v_i]+\mathbb{E}[1-v_z]&lt;br /&gt;
=\mathbb{E}\left[v_1+(v_2-v_1)+(v_3-v_2)+\cdots+(v_{z}-v_{z-1})+1-v_z\right]&lt;br /&gt;
=1,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which implies that&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\min_{1\le i\le n}h(x_i)\right]=\mathbb{E}[v_1]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The quantity &amp;lt;math&amp;gt;\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; can be computed with a small space cost (storing the current smallest hash value) by scanning the input sequence in a single pass. As we proved, its expectation is &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;, so the smallest hash value &amp;lt;math&amp;gt;Y=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; gives an unbiased estimator for &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;. However, &amp;lt;math&amp;gt;\frac{1}{Y}-1&amp;lt;/math&amp;gt; is not necessarily a good estimator for &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;. Actually, it is a rather poor estimator. Consider for example the case &amp;lt;math&amp;gt;z=1&amp;lt;/math&amp;gt;, where all input elements are the same. In this case, there is only one hash value and &amp;lt;math&amp;gt;Y=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; is distributed uniformly over &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;, thus &amp;lt;math&amp;gt;\frac{1}{Y}-1&amp;lt;/math&amp;gt; fails to be close enough to the correct answer 1 with high probability.&lt;br /&gt;
&lt;br /&gt;
==Apply the mean trick to the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch==&lt;br /&gt;
The reason that the above single-hash-function estimator performs poorly is that the unbiased estimator &amp;lt;math&amp;gt;\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; has large variance. A natural way to reduce this variance is to use multiple independent hash functions and take the average. This generic approach for reducing variance is called &#039;&#039;&#039;the mean trick&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
Suppose that we have access to &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; independent random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt;, where each &amp;lt;math&amp;gt;h_j: U\to[0,1]&amp;lt;/math&amp;gt; is uniformly and independently distributed over all functions mapping &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. Here &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is a parameter to be fixed by the desired approximation error &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and confidence error &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. The &#039;&#039;&amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm&#039;&#039; (using the mean trick) is given by the following pseudocode.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|The &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch|&lt;br /&gt;
:Suppose that &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[0,1]&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; uniform and independent random hash functions, where &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is a parameter to be fixed later.&lt;br /&gt;
-----&lt;br /&gt;
:Scan the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; in a single pass to compute:&lt;br /&gt;
::* &amp;lt;math&amp;gt;Y_j=\min_{1\le i\le n}h_j(x_i)&amp;lt;/math&amp;gt; for every &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;;&lt;br /&gt;
::* average value &amp;lt;math&amp;gt;\overline{Y}=\frac{1}{k}\sum_{j=1}^kY_j&amp;lt;/math&amp;gt;;&lt;br /&gt;
:return &amp;lt;math&amp;gt;\widehat{Z}=\frac{1}{\overline{Y}}-1&amp;lt;/math&amp;gt; as the estimator.&lt;br /&gt;
}}&lt;br /&gt;
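&lt;br /&gt;
A compact simulation of this algorithm in C. The idealized hash functions &amp;lt;math&amp;gt;h_j:U\to[0,1]&amp;lt;/math&amp;gt; are simulated by a seeded 64-bit mixer scaled into &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;; the mixer, the seeds, and the synthetic stream are illustrative assumptions, not part of the algorithm.&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define K 500   /* k = ceil(4/(eps^2 * delta)) for eps = delta = 0.2 */&lt;br /&gt;
 &lt;br /&gt;
 /* seeded mixer simulating a uniform hash U -&amp;gt; [0,1] */&lt;br /&gt;
 static double h(uint64_t seed, uint64_t x) {&lt;br /&gt;
     x += seed * 0x9e3779b97f4a7c15ULL;&lt;br /&gt;
     x ^= x &amp;gt;&amp;gt; 33; x *= 0xff51afd7ed558ccdULL;&lt;br /&gt;
     x ^= x &amp;gt;&amp;gt; 33; x *= 0xc4ceb9fe1a85ec53ULL;&lt;br /&gt;
     return (double)(x ^ (x &amp;gt;&amp;gt; 33)) / 18446744073709551616.0;  /* / 2^64 */&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     double Y[K];&lt;br /&gt;
     for (int j = 0; j &amp;lt; K; j++) Y[j] = 1.0;     /* running minima */&lt;br /&gt;
     /* a stream of 100000 elements with z = 1000 distinct values */&lt;br /&gt;
     for (uint64_t i = 0; i &amp;lt; 100000; i++) {&lt;br /&gt;
         uint64_t x = i % 1000;&lt;br /&gt;
         for (int j = 0; j &amp;lt; K; j++) {&lt;br /&gt;
             double v = h(j + 1, x);&lt;br /&gt;
             if (v &amp;lt; Y[j]) Y[j] = v;&lt;br /&gt;
         }&lt;br /&gt;
     }&lt;br /&gt;
     double avg = 0;                               /* the mean trick */&lt;br /&gt;
     for (int j = 0; j &amp;lt; K; j++) avg += Y[j] / K;&lt;br /&gt;
     printf(&amp;quot;estimate: %f (true z = 1000)\n&amp;quot;, 1.0 / avg - 1.0);&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;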
&lt;br /&gt;
The algorithm is easy to implement in the data stream model, with a space cost of storing &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; hash values. The following theorem guarantees that the algorithm returns an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the total number of distinct elements for a suitable &amp;lt;math&amp;gt;k=O\left(\frac{1}{\epsilon^2\delta}\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
:For any &amp;lt;math&amp;gt;\epsilon,\delta&amp;lt;1/2&amp;lt;/math&amp;gt;, if &amp;lt;math&amp;gt;k\ge\left\lceil\frac{4}{\epsilon^2\delta}\right\rceil&amp;lt;/math&amp;gt; then the output &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; always gives an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the correct answer &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
In the following we prove this main theorem for the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm. &lt;br /&gt;
&lt;br /&gt;
An obstacle to analyzing the estimator &amp;lt;math&amp;gt;\widehat{Z}=\frac{1}{\overline{Y}}-1&amp;lt;/math&amp;gt; is that it is a nonlinear function of &amp;lt;math&amp;gt;\overline{Y}&amp;lt;/math&amp;gt;, which itself is easier to analyze. Nevertheless, we observe that &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; as long as  &amp;lt;math&amp;gt;\overline{Y}&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(\epsilon/2,\delta)&amp;lt;/math&amp;gt;-estimator of &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;. This can be deduced by verifying the following:&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{1-\epsilon/2}{z+1}\le \overline{Y}\le \frac{1+\epsilon/2}{z+1} \implies (1-\epsilon)z\le\frac{1}{\overline{Y}}-1\le (1+\epsilon)z&amp;lt;/math&amp;gt;,&lt;br /&gt;
for &amp;lt;math&amp;gt;\epsilon&amp;lt;\frac{1}{2}&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,(1-\epsilon)z\le \widehat{Z} \le (1+\epsilon)z\,\right]\ge \Pr\left[\,\frac{1-\epsilon/2}{z+1}\le \overline{Y}\le \frac{1+\epsilon/2}{z+1}\,\right]&lt;br /&gt;
=\Pr\left[\,\left|\overline{Y}-\frac{1}{z+1}\right|\le \frac{\epsilon/2}{z+1}\,\right]&amp;lt;/math&amp;gt;.&lt;br /&gt;
It is then sufficient to show that &amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\frac{1}{z+1}\right|\le \frac{\epsilon/2}{z+1}\,\right]\ge 1-\delta&amp;lt;/math&amp;gt; in order to prove the main theorem above. We will see that this is equivalent to showing the concentration inequality &lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\mathbb{E}\left[\overline{Y}\right]\right|\le \frac{\epsilon/2}{z+1}\,\right]\ge 1-\delta\quad\qquad({\color{red}*})&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Lemma|&lt;br /&gt;
:The followings hold for each &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;j=1,2\ldots,k&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;\overline{Y}=\frac{1}{k}\sum_{j=1}^kY_j&amp;lt;/math&amp;gt;:&lt;br /&gt;
:*&amp;lt;math&amp;gt;\mathbb{E}\left[\overline{Y}\right]=\mathbb{E}\left[Y_j\right]=\frac{1}{z+1}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:*&amp;lt;math&amp;gt;\mathbf{Var}\left[Y_j\right]\le\frac{1}{(z+1)^2}&amp;lt;/math&amp;gt;, and consequently &amp;lt;math&amp;gt;\mathbf{Var}\left[\overline{Y}\right]\le\frac{1}{k(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
As in the case of a single hash function, by symmetry it holds that &amp;lt;math&amp;gt;\mathbb{E}[Y_j]=\frac{1}{z+1}&amp;lt;/math&amp;gt; for every &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\overline{Y}\right]=\frac{1}{k}\sum_{j=1}^k\mathbb{E}[Y_j]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Recall that each &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt; is the minimum of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; random hash values uniformly and independently distributed over &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. By geometric probability, it holds that for any &amp;lt;math&amp;gt;y\in[0,1]&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[Y_j&amp;gt;y]=(1-y)^z&amp;lt;/math&amp;gt;,&lt;br /&gt;
which means &amp;lt;math&amp;gt;\Pr[Y_j\le y]=1-(1-y)^z&amp;lt;/math&amp;gt;. Taking the derivative with respect to &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;, we obtain the probability density function of random variable &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;, which is &amp;lt;math&amp;gt;z(1-y)^{z-1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We then compute the second moment.&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[Y_j^2]=\int^{1}_0y^2z(1-y)^{z-1}\,\mathrm{d}y=\frac{2}{(z+1)(z+2)}&amp;lt;/math&amp;gt;.&lt;br /&gt;
The variance is bounded as&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{Var}\left[Y_j\right]=\mathbb{E}\left[Y_j^2\right]-\mathbb{E}\left[Y_j\right]^2=\frac{2}{(z+1)(z+2)}-\frac{1}{(z+1)^2}\le\frac{1}{(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to the (pairwise) independence between &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;&#039;s,&lt;br /&gt;
::&amp;lt;math&amp;gt;\mathbf{Var}\left[\overline{Y}\right]=\mathbf{Var}\left[\frac{1}{k}\sum_{j=1}^kY_j\right]=\frac{1}{k^2}\sum_{j=1}^k\mathbf{Var}\left[Y_j\right]\le \frac{1}{k(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
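&lt;br /&gt;
For completeness, the second moment integral above can be evaluated with the standard Beta integral &amp;lt;math&amp;gt;\int_0^1 y^{s-1}(1-y)^{t-1}\,\mathrm{d}y=\frac{(s-1)!\,(t-1)!}{(s+t-1)!}&amp;lt;/math&amp;gt; for positive integers &amp;lt;math&amp;gt;s,t&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\int^{1}_0y^2z(1-y)^{z-1}\,\mathrm{d}y=z\cdot\frac{2!\,(z-1)!}{(z+2)!}=\frac{2}{(z+1)(z+2)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;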
&lt;br /&gt;
We now return to proving the inequality &amp;lt;math&amp;gt;({\color{red}*})&amp;lt;/math&amp;gt;. By [[高级算法_(Fall 2023)/Basic_deviation_inequalities#Chebyshev.27s_inequality|Chebyshev&#039;s inequality]], it holds that &lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\mathbb{E}\left[\overline{Y}\right]\right|&amp;gt; \frac{\epsilon/2}{z+1}\,\right]&lt;br /&gt;
\le\frac{4}{\epsilon^2}(z+1)^2\mathbf{Var}\left[\overline{Y}\right]&lt;br /&gt;
\le\frac{4}{\epsilon^2k}&amp;lt;/math&amp;gt;.&lt;br /&gt;
When &amp;lt;math&amp;gt;k\ge\left\lceil\frac{4}{\epsilon^2\delta}\right\rceil&amp;lt;/math&amp;gt;, this probability is at most &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. The inequality &amp;lt;math&amp;gt;({\color{red}*})&amp;lt;/math&amp;gt; is proved. As discussed above, this proves the main theorem for the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm improved by the mean trick.&lt;br /&gt;
&lt;br /&gt;
= Frequency Estimation=&lt;br /&gt;
Suppose that &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is the data universe. The &#039;&#039;&#039;frequency estimation&#039;&#039;&#039; problem is defined as follows.&lt;br /&gt;
*&#039;&#039;&#039;Data:&#039;&#039;&#039; a sequence of (not necessarily distinct) elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Query:&#039;&#039;&#039; an element &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output:&#039;&#039;&#039; an estimation &amp;lt;math&amp;gt;\hat{f}_x&amp;lt;/math&amp;gt; of the frequency &amp;lt;math&amp;gt;f_x\triangleq|\{i\mid x_i=x\}|&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in input data.&lt;br /&gt;
&lt;br /&gt;
We still want to give an algorithm in the data stream model: the algorithm scans the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; to construct a succinct data structure, such that upon each query of &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the algorithm returns an estimation of the frequency &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Clearly this problem can always be solved by storing all distinct elements that have appeared, along with their frequencies. However, the space cost of this straightforward solution is rather high. Instead, we want to use a lossy representation (a &#039;&#039;sketch&#039;&#039;) of the input data which uses significantly less space but can still answer queries with tolerable accuracy. &lt;br /&gt;
&lt;br /&gt;
Formally, upon each query of &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the algorithm should return an answer &amp;lt;math&amp;gt;\hat{f}_x&amp;lt;/math&amp;gt; satisfying:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x-f_x\right|\le \epsilon n\,\right]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
Note that this notion of approximation is with bounded &#039;&#039;additive&#039;&#039; error which is weaker than the notion of &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator, whose error bound is &#039;&#039;multiplicative&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
With such a weak accuracy guarantee, it is possible to give a succinct data structure whose size is determined only by the error bounds &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt; but is independent of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, because only the frequencies of the &#039;&#039;&#039;heavy hitters&#039;&#039;&#039; (elements &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with high frequencies &amp;lt;math&amp;gt;f_x&amp;gt;\epsilon n&amp;lt;/math&amp;gt;) need to be memorized, and there are at most &amp;lt;math&amp;gt;1/\epsilon&amp;lt;/math&amp;gt; such heavy hitters.&lt;br /&gt;
&lt;br /&gt;
== Count-min sketch==&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Count–min_sketch count-min sketch] given by Cormode and Muthukrishnan is an elegant data structure for frequency estimation.&lt;br /&gt;
&lt;br /&gt;
The data structure is a two-dimensional &amp;lt;math&amp;gt;k\times m&amp;lt;/math&amp;gt; integer array, where &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; are two parameters to be determined by the error bounds &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. We still adopt the Uniform Hash Assumption to assume that we have access to &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually independent uniform random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[m]&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|&#039;&#039;Count-min sketch&#039;&#039; (Cormode and Muthukrishnan 2003)|&lt;br /&gt;
:Suppose &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[m]&amp;lt;/math&amp;gt; are uniform and independent random hash functions.&lt;br /&gt;
-----&lt;br /&gt;
:&#039;&#039;&#039;Data structure construction:&#039;&#039;&#039; Given a sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;, the data structure is a two-dimensional &amp;lt;math&amp;gt;k\times m&amp;lt;/math&amp;gt; integer array &amp;lt;math&amp;gt;CMS[k][m]&amp;lt;/math&amp;gt; constructed as&lt;br /&gt;
:*initialize all entries of &amp;lt;math&amp;gt;CMS[k][m]&amp;lt;/math&amp;gt; to 0;&lt;br /&gt;
:*for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;, upon receiving &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;:&lt;br /&gt;
::: for every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, evaluate &amp;lt;math&amp;gt;h_j(x_i)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;CMS[j][h_j(x_i)]++&amp;lt;/math&amp;gt;.&lt;br /&gt;
----&lt;br /&gt;
:&#039;&#039;&#039;Query resolution:&#039;&#039;&#039; Upon each query of an arbitrary &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;,&lt;br /&gt;
:* return &amp;lt;math&amp;gt;\hat{f}_x=\min_{1\le j\le k}CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
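&lt;br /&gt;
A minimal count-min sketch in C, again simulating the ideal hash functions with a seeded 64-bit mixer; the mixer, the parameters, and the synthetic stream are illustrative assumptions.&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define K 5        /* k = ceil(ln(1/delta)) for delta ~ 0.01 */&lt;br /&gt;
 #define M 2719     /* m = ceil(e/eps) for eps = 0.001 */&lt;br /&gt;
 &lt;br /&gt;
 static uint64_t CMS[K][M];&lt;br /&gt;
 &lt;br /&gt;
 /* seeded mixer simulating a uniform hash U -&amp;gt; [m] */&lt;br /&gt;
 static unsigned h(uint64_t seed, uint64_t x) {&lt;br /&gt;
     x += seed * 0x9e3779b97f4a7c15ULL;&lt;br /&gt;
     x ^= x &amp;gt;&amp;gt; 33; x *= 0xff51afd7ed558ccdULL;&lt;br /&gt;
     x ^= x &amp;gt;&amp;gt; 33; x *= 0xc4ceb9fe1a85ec53ULL;&lt;br /&gt;
     return (unsigned)((x ^ (x &amp;gt;&amp;gt; 33)) % M);&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 static void update(uint64_t x) {      /* process one stream element */&lt;br /&gt;
     for (int j = 0; j &amp;lt; K; j++) CMS[j][h(j + 1, x)]++;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 static uint64_t query(uint64_t x) {   /* estimated frequency of x */&lt;br /&gt;
     uint64_t f = UINT64_MAX;&lt;br /&gt;
     for (int j = 0; j &amp;lt; K; j++) {&lt;br /&gt;
         uint64_t c = CMS[j][h(j + 1, x)];&lt;br /&gt;
         if (c &amp;lt; f) f = c;&lt;br /&gt;
     }&lt;br /&gt;
     return f;                          /* never underestimates f_x */&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     for (uint64_t i = 0; i &amp;lt; 100000; i++) update(i % 1000);&lt;br /&gt;
     printf(&amp;quot;estimated f_42: %llu (true 100)\n&amp;quot;, (unsigned long long)query(42));&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;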
&lt;br /&gt;
It is easy to see that the space cost of the count-min sketch is &amp;lt;math&amp;gt;O(km)&amp;lt;/math&amp;gt; memory words, or &amp;lt;math&amp;gt;O(km\log n)&amp;lt;/math&amp;gt; bits. Each query is answered within time cost &amp;lt;math&amp;gt;O(k)&amp;lt;/math&amp;gt;, assuming that each evaluation of a hash function can be done in constant time. We then analyze the error bounds.&lt;br /&gt;
&lt;br /&gt;
First, it is easy to observe that for any query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every hash function &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, the corresponding entry of the count-min sketch always satisfies&lt;br /&gt;
:&amp;lt;math&amp;gt;CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt;,&lt;br /&gt;
because the &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt; appearances of element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in the input sequence contribute exactly &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt; to the value of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt;, and other elements can only increase it.&lt;br /&gt;
&lt;br /&gt;
Therefore, for any query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the answer always satisfies &amp;lt;math&amp;gt;\hat{f}_x=\min_{1\le j\le k}CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt;, which means&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\ge\epsilon n\,\right]=\Pr\left[\,\hat{f}_x- f_x\ge\epsilon n\,\right]=\prod_{j=1}^k\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,],\quad\qquad({\color{red}\diamondsuit})&amp;lt;/math&amp;gt;&lt;br /&gt;
where the second equation is due to the mutual independence of random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
It remains to upper bound the probability &amp;lt;math&amp;gt;\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,]&amp;lt;/math&amp;gt;, which can be done by calculating the expectation of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Proposition|&lt;br /&gt;
:For any &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, it holds that &amp;lt;math&amp;gt;\mathbb{E}\left[CMS[j][h_j(x)]\right]\le f_x+\frac{n}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
The value of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt; is constituted by the frequency &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and the frequencies &amp;lt;math&amp;gt;f_y&amp;lt;/math&amp;gt; of all other elements &amp;lt;math&amp;gt;y\neq x&amp;lt;/math&amp;gt; among &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt;, thus&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
CMS[j][h_j(x)]&lt;br /&gt;
&amp;amp;=f_x+\sum_{\scriptstyle y\in\{x_1,\ldots,x_n\}\setminus\{x\}\atop\scriptstyle h_j(y)=h_j(x)} f_y\\&lt;br /&gt;
&amp;amp;=f_x+\sum_{y\in\{x_1,\ldots,x_n\}\setminus\{x\}} f_y \cdot I[h_j(y)=h_j(x)]&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;I[h_j(y)=h_j(x)]&amp;lt;/math&amp;gt; denotes the Boolean random variable that indicates the occurrence of event &amp;lt;math&amp;gt;h_j(y)=h_j(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
By linearity of expectation,&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[CMS[j][h_j(x)]]=f_x+\sum_{y\in\{x_1,x_2,\ldots,x_n\}\setminus\{x\}} f_y \cdot \Pr[h_j(y)=h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to Uniform Hash Assumption (UHA), &amp;lt;math&amp;gt;h_j: U\to[m]&amp;lt;/math&amp;gt; is a uniform random function. For any &amp;lt;math&amp;gt;y\neq x&amp;lt;/math&amp;gt;, the probability of hash collision is&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[h_j(y)=h_j(x)]=\frac{1}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\mathbb{E}[CMS[j][h_j(x)]]&lt;br /&gt;
&amp;amp;=f_x+\frac{1}{m}\sum_{y\in\{x_1,\ldots,x_n\}\setminus\{x\}} f_y \\&lt;br /&gt;
&amp;amp;\le f_x+\frac{1}{m}\sum_{y\in\{x_1,\ldots,x_n\}} f_y\\&lt;br /&gt;
&amp;amp;=f_x+\frac{n}{m},&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where the last equation is due to the obvious identity &amp;lt;math&amp;gt;\sum_{y\in\{x_1,\ldots,x_n\}}f_y=n&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
The above proposition shows that for any &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[CMS[j][h_j(x)]-f_x\right]\le \frac{n}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt; always holds, thus &amp;lt;math&amp;gt;CMS[j][h_j(x)]-f_x&amp;lt;/math&amp;gt; is a nonnegative random variable. By Markov&#039;s inequality, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,]\le \frac{1}{\epsilon m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Combining this with the equation &amp;lt;math&amp;gt;({\color{red}\diamondsuit})&amp;lt;/math&amp;gt; above, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\ge\epsilon n\,\right]=(\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,])^k\le \frac{1}{(\epsilon m)^k}&amp;lt;/math&amp;gt;.&lt;br /&gt;
By setting &amp;lt;math&amp;gt;m=\left\lceil\frac{\mathrm{e}}{\epsilon}\right\rceil&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k=\left\lceil\ln\frac{1}{\delta}\right\rceil&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\epsilon m\ge \mathrm{e}&amp;lt;/math&amp;gt;, so the above error probability is bounded as &amp;lt;math&amp;gt;\frac{1}{(\epsilon m)^k}\le\mathrm{e}^{-k}\le\delta&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
For any positive &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;, the count-min sketch gives a data structure of size &amp;lt;math&amp;gt;O(km)=O\left(\frac{1}{\epsilon}\log\frac{1}{\delta}\right)&amp;lt;/math&amp;gt; (in memory words), answering each query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; in time &amp;lt;math&amp;gt;O(k)=O\left(\log\frac{1}{\delta}\right)&amp;lt;/math&amp;gt;, with the following accuracy guarantee:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\le\epsilon n\,\right]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Hashing_and_Sketching&amp;diff=12636</id>
		<title>高级算法 (Fall 2024)/Hashing and Sketching</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Hashing_and_Sketching&amp;diff=12636"/>
		<updated>2024-10-02T12:19:57Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Families of universal hash functions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Balls into Bins=&lt;br /&gt;
The following is the so-called balls into bins model.&lt;br /&gt;
Consider throwing &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; balls into &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; bins uniformly and independently at random. This is equivalent to a random mapping &amp;lt;math&amp;gt;f:[m]\to[n]&amp;lt;/math&amp;gt;. Needless to say, random mappings are an important random model with many applications in Computer Science, e.g. hashing.&lt;br /&gt;
&lt;br /&gt;
We are concerned with the following three questions regarding the balls into bins model:&lt;br /&gt;
* birthday problem: the probability that every bin contains at most one ball (the mapping is 1-1);&lt;br /&gt;
* coupon collector problem: the probability that every bin contains at least one ball (the mapping is onto);&lt;br /&gt;
* occupancy problem: the maximum load of bins.&lt;br /&gt;
&lt;br /&gt;
== Birthday Problem==&lt;br /&gt;
We now consider the &#039;&#039;&#039;birthday problem&#039;&#039;&#039;.&lt;br /&gt;
There are &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students in the class. Assume that each student&#039;s birthday is uniformly and independently distributed over the 365 days in a year. We ask for the probability that no two students share a birthday.&lt;br /&gt;
&lt;br /&gt;
Due to the [http://en.wikipedia.org/wiki/Pigeonhole_principle pigeonhole principle], it is obvious that for &amp;lt;math&amp;gt;m&amp;gt;365&amp;lt;/math&amp;gt;, there must be two students with the same birthday. Surprisingly, for any &amp;lt;math&amp;gt;m&amp;gt;57&amp;lt;/math&amp;gt; this event occurs with more than 99% probability. This is called the [http://en.wikipedia.org/wiki/Birthday_problem &#039;&#039;&#039;birthday paradox&#039;&#039;&#039;]. Despite the name, the birthday paradox is not a real paradox.&lt;br /&gt;
&lt;br /&gt;
We can model this problem as a balls-into-bins problem. &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; different balls (students) are uniformly and independently thrown into 365 bins (days). More generally, let &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; be the number of bins. We ask for the probability of the following event &amp;lt;math&amp;gt;\mathcal{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\mathcal{E}&amp;lt;/math&amp;gt;: there is no bin with more than one ball (i.e. no two students share a birthday).&lt;br /&gt;
&lt;br /&gt;
We first analyze this by counting. There are in total &amp;lt;math&amp;gt;n^m&amp;lt;/math&amp;gt; ways of assigning &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; balls to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; bins. The number of assignments in which no two balls share a bin is &amp;lt;math&amp;gt;{n\choose m}m!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Thus the probability is given by:&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]&lt;br /&gt;
=&lt;br /&gt;
\frac{{n\choose m}m!}{n^m}.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;{n\choose m}=\frac{n!}{(n-m)!m!}&amp;lt;/math&amp;gt;. Then &lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]&lt;br /&gt;
=&lt;br /&gt;
\frac{{n\choose m}m!}{n^m}&lt;br /&gt;
=&lt;br /&gt;
\frac{n!}{n^m(n-m)!}&lt;br /&gt;
=&lt;br /&gt;
\frac{n}{n}\cdot\frac{n-1}{n}\cdot\frac{n-2}{n}\cdots\frac{n-(m-1)}{n}&lt;br /&gt;
=&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right).&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is also a more &amp;quot;probabilistic&amp;quot; argument for the above equation. Consider again that &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students are mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; possible birthdays uniformly at random.&lt;br /&gt;
&lt;br /&gt;
The first student has a birthday for sure. The probability that the second student has a different birthday from the first student is &amp;lt;math&amp;gt;\left(1-\frac{1}{n}\right)&amp;lt;/math&amp;gt;. Given that the first two students have different birthdays, the probability that the third student has a different birthday from the first two students is &amp;lt;math&amp;gt;\left(1-\frac{2}{n}\right)&amp;lt;/math&amp;gt;. Continuing in this way, assuming that the first &amp;lt;math&amp;gt;k-1&amp;lt;/math&amp;gt; students all have different birthdays, the probability that the &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;th student has a different birthday from the first &amp;lt;math&amp;gt;k-1&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;\left(1-\frac{k-1}{n}\right)&amp;lt;/math&amp;gt;. By the chain rule, the probability that all &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students have different birthdays is:&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]=\left(1-\frac{1}{n}\right)\cdot \left(1-\frac{2}{n}\right)\cdots \left(1-\frac{m-1}{n}\right)&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right),&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which is the same as what we got by the counting argument.&lt;br /&gt;
&lt;br /&gt;
[[File:Birthday.png|border|450px|right]]&lt;br /&gt;
&lt;br /&gt;
There are several ways of analyzing this formula. Here is a convenient one: due to [http://en.wikipedia.org/wiki/Taylor_series Taylor&#039;s expansion], &amp;lt;math&amp;gt;e^{-k/n}\approx 1-k/n&amp;lt;/math&amp;gt;. Then&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right)&lt;br /&gt;
&amp;amp;\approx&lt;br /&gt;
\prod_{k=1}^{m-1}e^{-\frac{k}{n}}\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\exp\left(-\sum_{k=1}^{m-1}\frac{k}{n}\right)\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
e^{-m(m-1)/2n}\\&lt;br /&gt;
&amp;amp;\approx&lt;br /&gt;
e^{-m^2/2n}.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
The quality of this approximation is shown in the Figure.&lt;br /&gt;
&lt;br /&gt;
Therefore, for &amp;lt;math&amp;gt;m=\sqrt{2n\ln \frac{1}{\epsilon}}&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\Pr[\mathcal{E}]\approx\epsilon&amp;lt;/math&amp;gt;.&lt;br /&gt;
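&lt;br /&gt;
To see the quality of this approximation concretely, here is a minimal C sketch (the loop bounds are only for illustration) that compares the exact product with the approximation &amp;lt;math&amp;gt;e^{-m^2/2n}&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;n=365&amp;lt;/math&amp;gt;; for instance, at &amp;lt;math&amp;gt;m=23&amp;lt;/math&amp;gt; the exact probability is about 0.49 while the approximation gives about 0.48.&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;math.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     int n = 365;&lt;br /&gt;
     for (int m = 10; m &amp;lt;= 60; m += 10) {&lt;br /&gt;
         double exact = 1.0;&lt;br /&gt;
         for (int k = 1; k &amp;lt; m; k++)   /* prod_{k=1}^{m-1} (1 - k/n) */&lt;br /&gt;
             exact *= 1.0 - (double)k / n;&lt;br /&gt;
         double approx = exp(-(double)m * m / (2.0 * n));&lt;br /&gt;
         printf(&amp;quot;m=%d exact=%.4f approx=%.4f\n&amp;quot;, m, exact, approx);&lt;br /&gt;
     }&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;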
&lt;br /&gt;
==Universal Hashing ==&lt;br /&gt;
Hashing is one of the oldest tools in Computer Science. Knuth&#039;s memorandum in 1963 on analysis of hash tables is now considered to be the birth of the area of analysis of algorithms.&lt;br /&gt;
* Knuth. Notes on &amp;quot;open&amp;quot; addressing, July 22 1963. Unpublished memorandum.&lt;br /&gt;
&lt;br /&gt;
The idea of hashing is simple: an unknown set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; data &#039;&#039;&#039;items&#039;&#039;&#039; (or keys) is drawn from a large &#039;&#039;&#039;universe&#039;&#039;&#039; &amp;lt;math&amp;gt;U=[N]&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;N\gg n&amp;lt;/math&amp;gt;; in order to store &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; in a table of &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; entries (slots), we assume a consistent mapping (called a &#039;&#039;&#039;hash function&#039;&#039;&#039;) from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to a small range &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This idea seems clever: we use a consistent mapping to deal with an arbitrary unknown data set. However, there is a fundamental flaw in hashing.&lt;br /&gt;
* For a sufficiently large universe (&amp;lt;math&amp;gt;N&amp;gt; M(n-1)&amp;lt;/math&amp;gt;), for any fixed function, there exists a bad data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that all items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; are mapped to the same entry in the table.&lt;br /&gt;
&lt;br /&gt;
A simple application of the pigeonhole principle proves the above statement. &lt;br /&gt;
&lt;br /&gt;
To overcome this situation, randomization is introduced into hashing. We assume that the hash function is a random mapping from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;. In order to ease the analysis, the following ideal assumption is used:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simple Uniform Hash Assumption&#039;&#039;&#039; (&#039;&#039;&#039;SUHA&#039;&#039;&#039; or &#039;&#039;&#039;UHA&#039;&#039;&#039;, a.k.a. the random oracle model): &lt;br /&gt;
:A &#039;&#039;uniform&#039;&#039; random function &amp;lt;math&amp;gt;h:[N]\rightarrow[M]&amp;lt;/math&amp;gt; is available and the computation of &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is efficient.&lt;br /&gt;
&lt;br /&gt;
=== Families of universal hash functions ===&lt;br /&gt;
The assumption of a completely random function simplifies the analysis. However, in practice, a truly uniform random hash function is extremely expensive to compute and store. Thus, this simple assumption can hardly represent reality.&lt;br /&gt;
&lt;br /&gt;
There are two approaches to implementing practical hash functions. One is to use &#039;&#039;ad hoc&#039;&#039; implementations and hope that they work. The other is to construct classes of hash functions which are efficient to compute and store but come with weaker randomness guarantees, and then analyze the applications of hash functions under this weaker assumption of randomness.&lt;br /&gt;
&lt;br /&gt;
This route was taken by Carter and Wegman in 1977, when they introduced universal families of hash functions.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (universal hash families)|&lt;br /&gt;
:Let &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; be a universe with &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;. A family of hash functions &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; is said to be &#039;&#039;&#039;&amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;-universal&#039;&#039;&#039; if, for any distinct items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_k\in [N]&amp;lt;/math&amp;gt; and for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly at random from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)=\cdots=h(x_k)]\le\frac{1}{M^{k-1}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:A family of hash functions &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; is said to be &#039;&#039;&#039;strongly &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;-universal&#039;&#039;&#039; if, for any distinct items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_k\in [N]&amp;lt;/math&amp;gt;, any values &amp;lt;math&amp;gt;y_1,y_2,\ldots,y_k\in[M]&amp;lt;/math&amp;gt;, and for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly at random from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=y_1\wedge h(x_2)=y_2 \wedge \cdots \wedge h(x_k)=y_k]=\frac{1}{M^{k}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
In particular, for a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any distinct elements &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt;, a uniform random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; has&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
For a strongly 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any distinct elements &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; and any values &amp;lt;math&amp;gt;y_1,y_2\in[M]&amp;lt;/math&amp;gt;, a uniform random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; has&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=y_1\wedge h(x_2)=y_2]=\frac{1}{M^2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This behavior is exactly the same as that of uniform random hash functions on any distinct pair of inputs. For this reason, a strongly 2-universal hash family is also called a family of pairwise independent hash functions.&lt;br /&gt;
&lt;br /&gt;
=== 2-universal hash families ===&lt;br /&gt;
&lt;br /&gt;
The construction of pairwise independent random variables via modulo a prime introduced in Section 1 already provides a way of constructing a strongly 2-universal hash family.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; be a prime. The function &amp;lt;math&amp;gt;h_{a,b}:[p]\rightarrow [p]&amp;lt;/math&amp;gt; is defined by&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a,b}(x)=(ax+b)\bmod p,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family is&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a,b}\mid a,b\in[p]\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is strongly 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| In Section 1, we have proved the pairwise independence of the sequence of &amp;lt;math&amp;gt;(a i+b)\bmod p&amp;lt;/math&amp;gt;, for &amp;lt;math&amp;gt;i=0,1,\ldots, p-1&amp;lt;/math&amp;gt;, which directly implies that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is strongly 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
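&lt;br /&gt;
As a concrete illustration, a minimal C sketch of this construction (assuming the prime &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; fits in 32 bits, so the product never overflows 64-bit arithmetic) is:&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* h_{a,b}(x) = (a*x + b) mod p, with a, b, x in [p] */&lt;br /&gt;
 uint32_t hash_ab(uint32_t a, uint32_t b, uint32_t p, uint32_t x) {&lt;br /&gt;
     return (uint32_t)(((uint64_t)a * x + b) % p);&lt;br /&gt;
 }&lt;br /&gt;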
&lt;br /&gt;
;The original construction of Carter-Wegman&lt;br /&gt;
What if we want to have hash functions from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; for non-prime &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt;? Carter and Wegman developed the following method.&lt;br /&gt;
&lt;br /&gt;
Suppose that the universe is &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt;, and the functions map &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;. For some prime &amp;lt;math&amp;gt;p\ge N&amp;lt;/math&amp;gt;, let&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a,b}(x)=((ax+b)\bmod p)\bmod M,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a,b}\mid 1\le a\le p-1, b\in[p]\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Note that unlike the first construction, now &amp;lt;math&amp;gt;a\neq 0&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma (Carter-Wegman)|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| Due to the definition of &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, there are &amp;lt;math&amp;gt;p(p-1)&amp;lt;/math&amp;gt; many different hash functions in &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, because each hash function in &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; corresponds to a pair of &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt;. We only need to count, for any particular pair of distinct &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt;, the number of hash functions with &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We first note that for any &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a x_1+b\not\equiv a x_2+b \pmod p&amp;lt;/math&amp;gt;. This is because &amp;lt;math&amp;gt;a x_1+b\equiv a x_2+b \pmod p&amp;lt;/math&amp;gt; would imply that &amp;lt;math&amp;gt;a(x_1-x_2)\equiv 0\pmod p&amp;lt;/math&amp;gt;, which can never happen since &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt; (note that &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;N\le p&amp;lt;/math&amp;gt;). Therefore, we can assume that &amp;lt;math&amp;gt;(a x_1+b)\bmod p=u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(a x_2+b)\bmod p=v&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;u\neq v&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
By linear algebra (over a finite field), for any &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, for any &amp;lt;math&amp;gt;u,v\in[p]&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;u\neq v&amp;lt;/math&amp;gt;, there is exactly one solution &amp;lt;math&amp;gt;(a,b)&amp;lt;/math&amp;gt; satisfying:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{cases}&lt;br /&gt;
a x_1+b \equiv u \pmod p\\&lt;br /&gt;
a x_2+b \equiv v \pmod p.&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
After modulo &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt;, every &amp;lt;math&amp;gt;u\in[p]&amp;lt;/math&amp;gt; has at most &amp;lt;math&amp;gt;\lceil p/M\rceil -1&amp;lt;/math&amp;gt; many &amp;lt;math&amp;gt;v\in[p]&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;v\neq u&amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;v\equiv u\pmod M&amp;lt;/math&amp;gt;. Therefore, for every pair of &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, there exist at most &amp;lt;math&amp;gt;p(\lceil p/M\rceil -1)\le p(p-1)/M&amp;lt;/math&amp;gt; pairs of &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;((ax_1+b)\bmod p)\bmod M=((ax_2+b)\bmod p)\bmod M&amp;lt;/math&amp;gt;, which means there are at most &amp;lt;math&amp;gt; p(p-1)/M&amp;lt;/math&amp;gt; many hash functions &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; having &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;. For &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; uniformly chosen from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le \frac{p(p-1)/M}{p(p-1)}=\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This proves that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
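&lt;br /&gt;
In code, the Carter-Wegman construction differs from the previous sketch only in the final reduction modulo &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; (again a sketch assuming &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; fits in 32 bits, with parameters &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt;):&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Carter-Wegman: h_{a,b}(x) = ((a*x + b) mod p) mod M */&lt;br /&gt;
 uint32_t hash_cw(uint32_t a, uint32_t b, uint32_t p,&lt;br /&gt;
                  uint32_t M, uint32_t x) {&lt;br /&gt;
     return (uint32_t)((((uint64_t)a * x + b) % p) % M);&lt;br /&gt;
 }&lt;br /&gt;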
&lt;br /&gt;
;A construction used in practice&lt;br /&gt;
The main issue with the Carter-Wegman construction is efficiency: the mod operation is very slow, and has been so for more than 30 years.&lt;br /&gt;
&lt;br /&gt;
The following construction is due to Dietzfelbinger &#039;&#039;et al&#039;&#039;. It was published in 1997 and has since been used in practice in various applications of universal hashing.&lt;br /&gt;
&lt;br /&gt;
The family of hash functions is from &amp;lt;math&amp;gt;[2^u]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[2^v]&amp;lt;/math&amp;gt;. With a binary representation, the functions map binary strings of length &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; to binary strings of length &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;.&lt;br /&gt;
Let&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a}(x)=\left\lfloor\frac{a\cdot x\bmod 2^u}{2^{u-v}}\right\rfloor,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a}\mid a\in[2^u]\mbox{ and }a\mbox{ is odd}\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This family of hash functions does not exactly meet the requirement of a 2-universal family. However, Dietzfelbinger &#039;&#039;et al&#039;&#039; proved that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is close to 2-universal. Specifically, for any distinct input values &amp;lt;math&amp;gt;x_1,x_2\in[2^u]&amp;lt;/math&amp;gt;, for a uniformly random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le\frac{1}{2^{v-1}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
So &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is within an approximation ratio of 2 of being 2-universal. The proof uses the fact that odd numbers are relatively prime to any power of 2.&lt;br /&gt;
&lt;br /&gt;
The function is extremely simple to compute in the C language.&lt;br /&gt;
We exploit that C-multiplication (*) of unsigned u-bit numbers is done &amp;lt;math&amp;gt;\bmod 2^u&amp;lt;/math&amp;gt;, and have a one-line C-code for computing the hash function:&lt;br /&gt;
 h_a(x) = (a*x)&amp;gt;&amp;gt;(u-v)&lt;br /&gt;
The bit-wise shift is a lot faster than the modular operation, which explains the popularity of this scheme in practice over the original Carter-Wegman construction.&lt;br /&gt;
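&lt;br /&gt;
Spelled out with explicit types (a sketch for &amp;lt;math&amp;gt;u=64&amp;lt;/math&amp;gt;, exploiting that unsigned 64-bit multiplication in C is automatically reduced &amp;lt;math&amp;gt;\bmod 2^{64}&amp;lt;/math&amp;gt;):&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Multiply-shift: h_a(x) = (a*x mod 2^64) &amp;gt;&amp;gt; (64 - v),&lt;br /&gt;
    where a is a random odd 64-bit number and 1 &amp;lt;= v &amp;lt;= 63. */&lt;br /&gt;
 uint64_t hash_shift(uint64_t a, unsigned v, uint64_t x) {&lt;br /&gt;
     return (a * x) &amp;gt;&amp;gt; (64 - v);  /* overflow implements mod 2^64 */&lt;br /&gt;
 }&lt;br /&gt;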
&lt;br /&gt;
== Collision number ==&lt;br /&gt;
Consider a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; of hash functions from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;. Let &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; be a hash function chosen uniformly from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;. For a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; distinct elements from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt;, say &amp;lt;math&amp;gt;S=\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;, the elements are mapped to the hash values &amp;lt;math&amp;gt;h(x_1), h(x_2), \ldots, h(x_n)&amp;lt;/math&amp;gt;. This can be seen as throwing &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; balls to &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; bins, with pairwise independent choices of bins.&lt;br /&gt;
&lt;br /&gt;
As in the balls-into-bins model with full independence, we are curious about questions such as the birthday problem or the maximum load. These questions are interesting not only because they are natural to ask in a balls-into-bins setting, but also because, in the context of hashing, they are closely related to the performance of hash functions.&lt;br /&gt;
&lt;br /&gt;
The old techniques for analyzing balls-into-bins rely too much on the independence of the choice of a bin for each ball, and therefore can hardly be extended to the setting of 2-universal hash families. However, it turns out that several balls-into-bins questions can be answered by analyzing a very natural quantity: the number of &#039;&#039;&#039;collision pairs&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
A collision pair for hashing is a pair of elements &amp;lt;math&amp;gt;x_1,x_2\in S&amp;lt;/math&amp;gt; which are mapped to the same hash value, i.e. &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt;. Formally, for a fixed set of elements &amp;lt;math&amp;gt;S=\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;, for any &amp;lt;math&amp;gt;1\le i,j\le n&amp;lt;/math&amp;gt;, let the random variable&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
X_{ij}&lt;br /&gt;
=&lt;br /&gt;
\begin{cases}&lt;br /&gt;
1 &amp;amp; \text{if }h(x_i)=h(x_j),\\&lt;br /&gt;
0 &amp;amp; \text{otherwise.}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The total number of collision pairs among the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; is &lt;br /&gt;
:&amp;lt;math&amp;gt;X=\sum_{i&amp;lt;j} X_{ij}.\,&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal, for any &amp;lt;math&amp;gt;i\neq j&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[X_{ij}=1]=\Pr[h(x_i)=h(x_j)]\le\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected number of collision pairs is &lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{E}[X]=\mathbf{E}\left[\sum_{i&amp;lt;j}X_{ij}\right]=\sum_{i&amp;lt;j}\mathbf{E}[X_{ij}]=\sum_{i&amp;lt;j}\Pr[X_{ij}=1]\le{n\choose 2}\frac{1}{M}&amp;lt;\frac{n^2}{2M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In particular, for &amp;lt;math&amp;gt;n=M&amp;lt;/math&amp;gt;, i.e. &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; hash values by a pairwise independent hash function, the expected collision number is &amp;lt;math&amp;gt;\mathbf{E}[X]&amp;lt;\frac{n^2}{2M}=\frac{n}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The above analysis gives us an estimate of the expected number of collision pairs: &amp;lt;math&amp;gt;\mathbf{E}[X]&amp;lt;\frac{n^2}{2M}&amp;lt;/math&amp;gt;. Applying Markov&#039;s inequality, for &amp;lt;math&amp;gt;0&amp;lt;\epsilon&amp;lt;1&amp;lt;/math&amp;gt;, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr\left[X\ge \frac{n^2}{2\epsilon M}\right]\le\Pr\left[X\ge \frac{1}{\epsilon}\mathbf{E}[X]\right]\le\epsilon.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;n\le\sqrt{2\epsilon M}&amp;lt;/math&amp;gt;, the number of collision pairs satisfies &amp;lt;math&amp;gt;X\ge1&amp;lt;/math&amp;gt; with probability at most &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;, so with probability at least &amp;lt;math&amp;gt;1-\epsilon&amp;lt;/math&amp;gt; there is no collision at all. We thus have the following theorem.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Theorem|&lt;br /&gt;
:If &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is chosen uniformly from a 2-universal family of hash functions mapping the universe &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;, then for any set &amp;lt;math&amp;gt;S\subset [N]&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items, where &amp;lt;math&amp;gt;n\le\sqrt{2\epsilon M}&amp;lt;/math&amp;gt;, the probability that there exists a collision pair is&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[\mbox{collision occurs}]\le\epsilon.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Recall that for mutually independent choices of bins, for &amp;lt;math&amp;gt;n=\sqrt{2M\ln(1/\epsilon)}&amp;lt;/math&amp;gt;, the probability that a collision occurs is about &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;. For constant &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;, this gives essentially the same bound as the pairwise independent setting. Therefore, &lt;br /&gt;
the behavior of pairwise independent hash functions is essentially the same as that of uniform random hash functions for the birthday problem. This is easy to understand, because the birthday problem is about the behavior of collisions, and the definition of 2-universal hash functions can be interpreted as &amp;quot;functions for which the probability of collision is as low as for a uniform random function&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
= Set  Membership=&lt;br /&gt;
A basic question in Computer Science is:&lt;br /&gt;
:&amp;quot;&amp;lt;math&amp;gt;\mbox{Is }x\in S?&amp;lt;/math&amp;gt;&amp;quot;&lt;br /&gt;
for a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and an element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. This is the &#039;&#039;&#039;set membership&#039;&#039;&#039; problem.&lt;br /&gt;
&lt;br /&gt;
Formally, given an arbitrary set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, we want to use a succinct &#039;&#039;&#039;data structure&#039;&#039;&#039; to represent this set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, so that upon each &#039;&#039;&#039;query&#039;&#039;&#039; of any element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, the question of whether &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt; is efficiently answered. The complexity of such a data structure is measured in two respects:&lt;br /&gt;
* &#039;&#039;&#039;space cost&#039;&#039;&#039;: size of the data structure to represent a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;;&lt;br /&gt;
* &#039;&#039;&#039;time cost&#039;&#039;&#039;: time complexity of answering each query by accessing to the data structure.&lt;br /&gt;
&lt;br /&gt;
Suppose that the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is of size &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;. Clearly, the membership problem can be solved by a &#039;&#039;&#039;dictionary data structure&#039;&#039;&#039;, e.g.:&lt;br /&gt;
* &#039;&#039;&#039;sorted table / balanced search tree&#039;&#039;&#039;: with space cost &amp;lt;math&amp;gt;O(n\log N)&amp;lt;/math&amp;gt; bits and time cost &amp;lt;math&amp;gt;O(\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Note that &amp;lt;math&amp;gt;\log{N\choose n}=\Theta\left(n\log \frac{N}{n}\right)&amp;lt;/math&amp;gt; is the entropy of sets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;. Therefore it is necessary to use this many bits to represent a set without losing any information. &lt;br /&gt;
With hashing, we can solve this fundamental problem with asymptotic optimal space cost and time cost at the same time.&lt;br /&gt;
&lt;br /&gt;
== Perfect hashing using quadratic space==&lt;br /&gt;
The idea of perfect hashing is that we use a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; to map the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items to distinct entries of the table; store every item &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt; in the entry &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;; and also store the hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; in a fixed location in the table (usually the beginning of the table). The algorithm for searching for an item is as follows:&lt;br /&gt;
&lt;br /&gt;
:search for &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in table &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;:&lt;br /&gt;
# retrieve &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; from a fixed location in the table;&lt;br /&gt;
# if &amp;lt;math&amp;gt;x=T[h(x)]&amp;lt;/math&amp;gt; return &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;; else return NOT_FOUND;&lt;br /&gt;
&lt;br /&gt;
This scheme works as long as the hash function satisfies the following two conditions:&lt;br /&gt;
* The description of &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is sufficiently short, so that &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; can be stored in one entry (or in constant many entries) of the table.&lt;br /&gt;
* &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; has no collisions on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, i.e. there is no pair of items &amp;lt;math&amp;gt;x_1,x_2\in S&amp;lt;/math&amp;gt; that are mapped to the same value by &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The first condition is easy to guarantee for 2-universal hash families. As shown by Carter-Wegman construction, a 2-universal hash function can be uniquely represented by two integers &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;, which can be stored in two entries (or just one, if the word length is sufficiently large) of the table.&lt;br /&gt;
&lt;br /&gt;
Our discussion is now focused on the second condition. We find that it relies on the &#039;&#039;perfectness&#039;&#039; of the hash function for a data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is &#039;&#039;&#039;perfect&#039;&#039;&#039; for a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of items if &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; maps all items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; to different values, i.e. there is no collision.&lt;br /&gt;
&lt;br /&gt;
We have shown by the birthday problem for 2-universal hashing that when &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are mapped to &amp;lt;math&amp;gt;n^2&amp;lt;/math&amp;gt; values, for an &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly from a 2-universal family of hash functions, the probability that a collision occurs is at most 1/2. Thus&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h\mbox{ is perfect for }S]\ge\frac{1}{2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
for a table of &amp;lt;math&amp;gt;n^2&amp;lt;/math&amp;gt; entries.&lt;br /&gt;
&lt;br /&gt;
The construction of perfect hashing is then straightforward:&lt;br /&gt;
:For a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements:&lt;br /&gt;
# uniformly choose an &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; from a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;; (for Carter-Wegman&#039;s construction, this means uniformly choosing two integers &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt; for a sufficiently large prime &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;.)&lt;br /&gt;
# check whether &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is perfect for &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;;&lt;br /&gt;
# if &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is NOT perfect for &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, start over again; otherwise, construct the table;&lt;br /&gt;
&lt;br /&gt;
This is a Las Vegas randomized algorithm, which constructs a perfect hashing for a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; within at most two trials in expectation (due to the geometric distribution). The resulting data structure is an &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt;-size static dictionary of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements which answers every search in deterministic &amp;lt;math&amp;gt;O(1)&amp;lt;/math&amp;gt; time.&lt;br /&gt;
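&lt;br /&gt;
A sketch of this Las Vegas construction in C (a hypothetical helper; rand() stands in for a proper source of randomness, and p is a fixed prime larger than the universe size):&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;string.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* Retry random (a,b) from the Carter-Wegman family until h is&lt;br /&gt;
    perfect for the n items; the table has M = n*n entries and&lt;br /&gt;
    used[] is a scratch array of M bytes. Expected &amp;lt;= 2 rounds. */&lt;br /&gt;
 void build_perfect(const uint32_t *items, uint32_t n, uint32_t p,&lt;br /&gt;
                    uint32_t *a, uint32_t *b, uint8_t *used) {&lt;br /&gt;
     uint32_t M = n * n;&lt;br /&gt;
     for (;;) {&lt;br /&gt;
         *a = 1 + (uint32_t)rand() % (p - 1);  /* 1 &amp;lt;= a &amp;lt;= p-1 */&lt;br /&gt;
         *b = (uint32_t)rand() % p;            /* b in [p]       */&lt;br /&gt;
         memset(used, 0, M);&lt;br /&gt;
         int perfect = 1;&lt;br /&gt;
         for (uint32_t i = 0; i &amp;lt; n &amp;amp;&amp;amp; perfect; i++) {&lt;br /&gt;
             uint32_t v = (uint32_t)((((uint64_t)*a * items[i] + *b) % p) % M);&lt;br /&gt;
             if (used[v]) perfect = 0; else used[v] = 1;&lt;br /&gt;
         }&lt;br /&gt;
         if (perfect) return;&lt;br /&gt;
     }&lt;br /&gt;
 }&lt;br /&gt;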
&lt;br /&gt;
== FKS perfect hashing ==&lt;br /&gt;
In the last section we saw how to use &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt; space and constant time for answering searches in a set. Now we see how to do it with linear space and constant time, which solves the search problem with asymptotically optimal time and space.&lt;br /&gt;
&lt;br /&gt;
This was once seemingly impossible, until Yao&#039;s seminal paper:&lt;br /&gt;
*Yao. Should tables be sorted? &#039;&#039;Journal of the ACM (JACM)&#039;&#039;, 1981.&lt;br /&gt;
&lt;br /&gt;
Yao&#039;s paper shows a possibility of achieving linear space and constant time at the same time by exploiting the power of hashing, but assumes an unrealistically large universe. &lt;br /&gt;
&lt;br /&gt;
Inspired by Yao&#039;s work, Fredman, Komlós, and Szemerédi discovered the first linear-space and constant-time static dictionary in a realistic setting: &lt;br /&gt;
* Fredman, Komlós, and Szemerédi. Storing a sparse table with O(1) worst case access time. &#039;&#039;Journal of the ACM (JACM)&#039;&#039;, 1984.&lt;br /&gt;
&lt;br /&gt;
The idea of FKS hashing is to arrange hash table in two levels:&lt;br /&gt;
* In the first level, &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are hashed to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;buckets&#039;&#039; by a 2-universal hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;. &lt;br /&gt;
: Let &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt; be the set of items hashed to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th bucket.&lt;br /&gt;
* In the second level, construct a &amp;lt;math&amp;gt;|B_i|^2&amp;lt;/math&amp;gt;-size perfect hashing for each bucket &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The data structure can be stored in a table. The first few entries are reserved to store the primary hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;. To help the searching algorithm locate a bucket, we use the next &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; entries of the table as the &amp;quot;pointers&amp;quot; to the bucket: each entry stores the address of the first entry of the space to store a bucket. In the rest of table, the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; buckets are stored in order, each using a &amp;lt;math&amp;gt;|B_i|^2&amp;lt;/math&amp;gt; space as required by perfect hashing.&lt;br /&gt;
&lt;br /&gt;
::[[File:FKS.png|600px]]&lt;br /&gt;
&lt;br /&gt;
It is easy to see that the search time is constant. To search for an item &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, the algorithm does the followings:&lt;br /&gt;
* Retrieve &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Retrieve the address for bucket &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Search by perfect hashing within bucket &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
Each line takes constant time. So the worst-case search time is constant.&lt;br /&gt;
&lt;br /&gt;
We then need to guarantee that the space is linear in &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;. At first glance, this seems impossible because each instance of perfect hashing for a bucket costs a square-size space. We will prove that although the individual buckets use square-sized spaces, their sum is still linear.&lt;br /&gt;
&lt;br /&gt;
For a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items, for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly from a 2-universal family which maps the items to &amp;lt;math&amp;gt;[n]&amp;lt;/math&amp;gt;, called &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;buckets&#039;&#039;, let &amp;lt;math&amp;gt;Y_i=|B_i|&amp;lt;/math&amp;gt; be the number of items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; mapped to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th bucket.&lt;br /&gt;
We are going to bound the following quantity:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
Y=\sum_{i=1}^n Y_i^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Since each bucket &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt; uses a space of &amp;lt;math&amp;gt;Y_i^2&amp;lt;/math&amp;gt; for perfect hashing, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; gives the total size of the space for storing the buckets. &lt;br /&gt;
&lt;br /&gt;
We will show that &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is related to the total number of collision pairs. (Indeed, the number of collision pairs can be computed by a degree-2 polynomial, just like &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
Note that a bucket of &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; items contributes &amp;lt;math&amp;gt;{Y_i\choose 2}&amp;lt;/math&amp;gt; collision pairs. Let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; be the total number of collision pairs.&lt;br /&gt;
&amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; can be computed by summing over the collision pairs in every bucket:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
X=\sum_{i=1}^n{Y_i\choose 2}=\sum_{i=1}^n\frac{Y_i(Y_i-1)}{2}=\frac{1}{2}\left(\sum_{i=1}^nY_i^2-\sum_{i=1}^nY_i\right)=\frac{1}{2}\left(\sum_{i=1}^nY_i^2-n\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, the sum of squares of the sizes of buckets is related to collision number by:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_{i=1}^nY_i^2=2X+n.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
By our analysis of the collision number, we know that for &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; buckets, the expected number of collision pairs is: &amp;lt;math&amp;gt;\mathbf{E}[X]\le \frac{n}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Thus,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbf{E}\left[\sum_{i=1}^nY_i^2\right]=\mathbf{E}[2X+n]\le 2n.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Due to Markov&#039;s inequality, &amp;lt;math&amp;gt;\sum_{i=1}^nY_i^2=O(n)&amp;lt;/math&amp;gt; with constant probability, as the following calculation shows. For any set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, we can find a suitable &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; within a constant number of trials in expectation, and the FKS data structure can be constructed with guaranteed (instead of expected) linear size, answering each search in constant time.&lt;br /&gt;
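&lt;br /&gt;
To spell out the Markov step (a routine verification): since &amp;lt;math&amp;gt;\mathbf{E}\left[\sum_{i=1}^nY_i^2\right]\le 2n&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr\left[\sum_{i=1}^nY_i^2&amp;gt;4n\right]\le\frac{\mathbf{E}\left[\sum_{i=1}^nY_i^2\right]}{4n}\le\frac{1}{2},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
so a uniformly random &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; uses at most &amp;lt;math&amp;gt;4n&amp;lt;/math&amp;gt; space for the buckets with probability at least &amp;lt;math&amp;gt;1/2&amp;lt;/math&amp;gt;, and the expected number of trials before finding such an &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is at most 2.&lt;br /&gt;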
&lt;br /&gt;
== Bloom filter ==&lt;br /&gt;
Now we consider a lossy representation of the original data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, to further save the space usage. Such a lossy data structure is sometimes called a &#039;&#039;&#039;&#039;&#039;sketch&#039;&#039;&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
The Bloom filter is such a lossy data structure. It is a space-efficient hash table that solves the &#039;&#039;&#039;approximate membership&#039;&#039;&#039; problem with one-sided error (&#039;&#039;false positive&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
Given a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, a Bloom filter consists of an array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits, and &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt; that map &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[cn]&amp;lt;/math&amp;gt;, where both &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; are parameters that we can optimize later.&lt;br /&gt;
&lt;br /&gt;
As before, we assume the &#039;&#039;&#039;Uniform Hash Assumption (UHA)&#039;&#039;&#039;: &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt; are mutually independent hash functions where each &amp;lt;math&amp;gt;h_i&amp;lt;/math&amp;gt; is a uniform random hash function &amp;lt;math&amp;gt;h_i:U\to[cn]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Bloom filter works as follows:&lt;br /&gt;
{{Theorem|&#039;&#039;Bloom filter&#039;&#039; (Bloom 1970)|&lt;br /&gt;
:Suppose &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k:U\to[cn]&amp;lt;/math&amp;gt; are uniform and independent random hash functions.&lt;br /&gt;
-----&lt;br /&gt;
:&#039;&#039;&#039;Data structure construction:&#039;&#039;&#039; Given a set &amp;lt;math&amp;gt;S\subset U&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;n=|S|&amp;lt;/math&amp;gt;, the data structure is a Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits constructed as&lt;br /&gt;
:* initialize all &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits of the Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; to 0;&lt;br /&gt;
:* for each &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;.&lt;br /&gt;
----&lt;br /&gt;
:&#039;&#039;&#039;Query resolution:&#039;&#039;&#039; Upon each query of an arbitrary &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;,&lt;br /&gt;
:* answer &amp;quot;yes&amp;quot; if &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt; and &amp;quot;no&amp;quot; if otherwise.&lt;br /&gt;
}}&lt;br /&gt;
The Boolean array is our data structure, whose size is &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits. With the Uniform Hash Assumption (UHA), the time cost of the data structure for answering each query is &amp;lt;math&amp;gt;O(k)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
When the answer returned by the algorithm is &amp;quot;no&amp;quot;, it holds that &amp;lt;math&amp;gt;A[h_i(x)]=0&amp;lt;/math&amp;gt; for some &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;, in which case the query &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; must not belong to the set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. Thus, the Bloom filter has no false negatives.&lt;br /&gt;
&lt;br /&gt;
On the other hand, when the answer returned by the algorithm is &amp;quot;yes&amp;quot;, &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;. It is still possible for some &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt; that all bits &amp;lt;math&amp;gt;A[h_i(x)]&amp;lt;/math&amp;gt; are set by elements in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. We want to bound such false positives, that is, the following probability for an &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\,\forall 1\le i\le k, A[h_i(x)]=1\,]&amp;lt;/math&amp;gt;,&lt;br /&gt;
which by independence between different hash functions and by symmetry is equal to:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\, A[h_1(x)]=1\,]^k=(1-\Pr[\, A[h_1(x)]=0\,])^k&amp;lt;/math&amp;gt;.&lt;br /&gt;
For an element &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;, its hash value &amp;lt;math&amp;gt;h_1(x)&amp;lt;/math&amp;gt; is independent of all hash values &amp;lt;math&amp;gt;h_i(y)&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt; and all &amp;lt;math&amp;gt;y\in S&amp;lt;/math&amp;gt;. This is due to the Uniform Hash Assumption. The hash value &amp;lt;math&amp;gt;h_1(x)&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt; is then independent of the content of the array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;. Therefore, the probability that the position &amp;lt;math&amp;gt;A[h_1(x)]&amp;lt;/math&amp;gt; is missed by all &amp;lt;math&amp;gt;kn&amp;lt;/math&amp;gt; updates to the Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; caused by the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[\, A[h_1(x)]=0\,]=\left(1-\frac{1}{cn}\right)^{kn}\approx e^{-k/c}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Putting everything together, for any &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;, the false positive is bounded as:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\Pr[\,\text{wrongly answer &#039;&#039;yes&#039;&#039;}\,]&lt;br /&gt;
&amp;amp;=\Pr[\,\forall 1\le i\le k, A[h_i(x)]=1\,]\\&lt;br /&gt;
&amp;amp;=\Pr[\, A[h_1(x)]=1\,]^k=(1-\Pr[\, A[h_1(x)]=0\,])^k\\&lt;br /&gt;
&amp;amp;=\left(1-\left(1-\frac{1}{cn}\right)^{kn}\right)^k\\&lt;br /&gt;
&amp;amp;\approx \left(1- e^{-k/c}\right)^k&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which is minimized to &amp;lt;math&amp;gt;(0.6185)^c&amp;lt;/math&amp;gt; by choosing &amp;lt;math&amp;gt;k=c\ln 2&amp;lt;/math&amp;gt;.&lt;br /&gt;
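&lt;br /&gt;
To see where this constant comes from (a routine calculus step): choosing &amp;lt;math&amp;gt;k=c\ln 2&amp;lt;/math&amp;gt; makes &amp;lt;math&amp;gt;e^{-k/c}=1/2&amp;lt;/math&amp;gt;, so the bound becomes&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\left(1-e^{-k/c}\right)^k=\left(\frac{1}{2}\right)^{c\ln 2}=2^{-c\ln 2}\approx(0.6185)^c,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and this choice of &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; in fact minimizes &amp;lt;math&amp;gt;\left(1- e^{-k/c}\right)^k&amp;lt;/math&amp;gt; over &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;.&lt;br /&gt;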
&lt;br /&gt;
The Bloom filter thus solves the approximate membership problem with a small constant false-positive error, using a data structure of &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; bits which answers each query with &amp;lt;math&amp;gt;O(1)&amp;lt;/math&amp;gt; time cost.&lt;br /&gt;
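&lt;br /&gt;
The following is a minimal C sketch of a Bloom filter. The interface and the mixer standing in for the &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; hash functions are our own assumptions (a seeded integer mixer only approximates the UHA), and one byte per bit is used for simplicity:&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define K 5  /* number of hash functions, k = c ln 2 for c of about 7 */&lt;br /&gt;
 &lt;br /&gt;
 /* A simple mixer standing in for the i-th uniform hash function. */&lt;br /&gt;
 static uint64_t hash_i(int i, uint64_t x, uint64_t m) {&lt;br /&gt;
     uint64_t z = x + 0x9e3779b97f4a7c15ULL * (uint64_t)(i + 1);&lt;br /&gt;
     z = (z ^ (z &amp;gt;&amp;gt; 30)) * 0xbf58476d1ce4e5b9ULL;&lt;br /&gt;
     z = (z ^ (z &amp;gt;&amp;gt; 27)) * 0x94d049bb133111ebULL;&lt;br /&gt;
     return (z ^ (z &amp;gt;&amp;gt; 31)) % m;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 void bloom_insert(uint8_t *A, uint64_t m, uint64_t x) {&lt;br /&gt;
     for (int i = 0; i &amp;lt; K; i++) A[hash_i(i, x, m)] = 1;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int bloom_query(const uint8_t *A, uint64_t m, uint64_t x) {&lt;br /&gt;
     for (int i = 0; i &amp;lt; K; i++)&lt;br /&gt;
         if (!A[hash_i(i, x, m)]) return 0;  /* definitely not in S */&lt;br /&gt;
     return 1;                               /* probably in S       */&lt;br /&gt;
 }&lt;br /&gt;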
&lt;br /&gt;
=Distinct Elements=&lt;br /&gt;
Consider the following problem of &#039;&#039;&#039;counting distinct elements&#039;&#039;&#039;: Suppose that &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is a sufficiently large universe.&lt;br /&gt;
*&#039;&#039;&#039;Input:&#039;&#039;&#039; a sequence of (not necessarily distinct) elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output:&#039;&#039;&#039; an estimation of the total number of distinct elements &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A straightforward way of solving this problem is to maintain a dictionary data structure, which costs at least linear (&amp;lt;math&amp;gt;\Omega(n)&amp;lt;/math&amp;gt;) space. For &#039;&#039;big data&#039;&#039;, where &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is very large, this is still too expensive. However, due to an information-theoretical argument, linear space is necessary if you want to compute the &#039;&#039;exact&#039;&#039; value of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Our goal is to relax the problem a little bit to significantly reduce the space cost by tolerating &#039;&#039;approximate&#039;&#039; answers. The form of approximation we consider is &#039;&#039;&#039;&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator&#039;&#039;&#039;.&lt;br /&gt;
{{Theorem|&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator|&lt;br /&gt;
: A random variable &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is an &#039;&#039;&#039;&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator&#039;&#039;&#039; of a quantity &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; if&lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[\,(1-\epsilon)z\le \widehat{Z}\le (1+\epsilon)z\,]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
: &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is said to be an &#039;&#039;&#039;unbiased estimator&#039;&#039;&#039; of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\mathbb{E}[\widehat{Z}]=z&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
Usually &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; is called &#039;&#039;&#039;approximation error&#039;&#039;&#039; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt; is called &#039;&#039;&#039;confidence error&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
We now present an elegant algorithm. The algorithm can be implemented in the [https://en.wikipedia.org/wiki/Streaming_algorithm &#039;&#039;&#039;data stream model&#039;&#039;&#039;]: the input elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; are presented to the algorithm one at a time, where the size of data &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is unknown to the algorithm. The algorithm maintains a value &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; which is an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the total number of distinct elements &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt;, using only a small amount of memory space to memorize (with loss) the data set &amp;lt;math&amp;gt;\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A famous quotation of Flajolet describes the performance of this algorithm as:&lt;br /&gt;
&lt;br /&gt;
 &amp;quot;Using only memory equivalent to 5 lines of printed text, you can estimate with a typical accuracy of 5% and in a single pass the total vocabulary of Shakespeare.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
== The &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch ==&lt;br /&gt;
Suppose that we have access to an idealized random hash function &amp;lt;math&amp;gt;h:U\to[0,1]&amp;lt;/math&amp;gt; which is uniformly distributed over all mappings from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Recall that the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; consists of &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt; distinct elements. These elements are mapped by the random function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; hash values uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. We could maintain these hash values instead of the original elements, but this would still be too expensive because in the worst case we still have up to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; distinct values to maintain. However, due to the idealized random hash function, the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt; will be partitioned into &amp;lt;math&amp;gt;z+1&amp;lt;/math&amp;gt; subintervals by these &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; uniform and independent hash values. The typical length of the subinterval gives an estimation of the number &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Proposition|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\min_{1\le i\le n}h(x_i)\right]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
The input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; consisting of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; distinct elements are mapped to &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; random hash values uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. These &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; hash values partition the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;z+1&amp;lt;/math&amp;gt; subintervals &amp;lt;math&amp;gt;[0,v_1],[v_1,v_2],[v_2,v_3]\ldots,[v_{z-1},v_z],[v_z,1]&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; denotes the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th smallest value among all hash values &amp;lt;math&amp;gt;\{h(x_1),h(x_2),\ldots,h(x_n)\}&amp;lt;/math&amp;gt;. Clearly we have &lt;br /&gt;
:&amp;lt;math&amp;gt;v_1=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt;. &lt;br /&gt;
Meanwhile, since all hash values are uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;, the lengths of all subintervals &amp;lt;math&amp;gt;v_1, v_2-v_1, v_3-v_2,\ldots, v_z-v_{z-1}, 1-v_z&amp;lt;/math&amp;gt; are identically distributed. By symmetry, they have the same expectation, therefore&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
(z+1)\mathbb{E}[v_1]=&lt;br /&gt;
\mathbb{E}[v_1]+\sum_{i=1}^{z-1}\mathbb{E}[v_{i+1}-v_i]+\mathbb{E}[1-v_z]&lt;br /&gt;
=\mathbb{E}\left[v_1+(v_2-v_1)+(v_3-v_2)+\cdots+(v_{z}-v_{z-1})+1-v_z\right]&lt;br /&gt;
=1,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which implies that&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\min_{1\le i\le n}h(x_i)\right]=\mathbb{E}[v_1]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The quantity &amp;lt;math&amp;gt;\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; can be computed with a small space cost (for storing the current smallest hash value) by scanning the input sequence in a single pass. Since, as we proved, its expectation is &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;, the smallest hash value &amp;lt;math&amp;gt;Y=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; gives an unbiased estimator for &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;. However, &amp;lt;math&amp;gt;\frac{1}{Y}-1&amp;lt;/math&amp;gt; is not necessarily a good estimator for &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;. Actually, it is a rather poor estimator. Consider, for example, the case &amp;lt;math&amp;gt;z=1&amp;lt;/math&amp;gt;, where all input elements are the same: there is only one hash value and &amp;lt;math&amp;gt;Y=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; is distributed uniformly over &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{1}{Y}-1&amp;lt;/math&amp;gt; fails to be close enough to the correct answer 1 with high probability.&lt;br /&gt;
&lt;br /&gt;
==Apply the mean trick to the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch==&lt;br /&gt;
The reason that the above estimator of a single hash function performs poorly is that the unbiased estimator &amp;lt;math&amp;gt;\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; has large variance. So a natural way to reduce this variance is to have multiple independent hash functions and take the average. This generic approach for reducing the variance is called &#039;&#039;&#039;the mean trick&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
Suppose that we have access to &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; independent random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt;, where each &amp;lt;math&amp;gt;h_j: U\to[0,1]&amp;lt;/math&amp;gt; is uniformly and independently distributed over all functions mapping &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. Here &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is a parameter to be fixed by the desired approximation error &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and confidence error &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. The &#039;&#039;&amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm&#039;&#039; (using the mean trick) is given by the following pseudocode.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|The &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch|&lt;br /&gt;
:Suppose that &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[0,1]&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; uniform and independent random hash functions, where &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is a parameter to be fixed later.&lt;br /&gt;
-----&lt;br /&gt;
:Scan the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; in a single pass to compute:&lt;br /&gt;
::* &amp;lt;math&amp;gt;Y_j=\min_{1\le i\le n}h_j(x_i)&amp;lt;/math&amp;gt; for every &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;;&lt;br /&gt;
::* average value &amp;lt;math&amp;gt;\overline{Y}=\frac{1}{k}\sum_{j=1}^kY_j&amp;lt;/math&amp;gt;;&lt;br /&gt;
:return &amp;lt;math&amp;gt;\widehat{Z}=\frac{1}{\overline{Y}}-1&amp;lt;/math&amp;gt; as the estimator.&lt;br /&gt;
}}&lt;br /&gt;
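&lt;br /&gt;
A single-pass implementation is only a few lines of C. This is a sketch under the UHA; hash_u01 is a hypothetical hash function returning a uniform value in [0,1]:&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* One pass: maintain y[j] = min_i h_j(x_i) for j = 0..k-1,&lt;br /&gt;
    then return the estimator Z = 1/Y-bar - 1. */&lt;br /&gt;
 double min_sketch(const uint64_t *x, long n, int k, double *y,&lt;br /&gt;
                   double (*hash_u01)(int j, uint64_t v)) {&lt;br /&gt;
     for (int j = 0; j &amp;lt; k; j++) y[j] = 1.0;&lt;br /&gt;
     for (long i = 0; i &amp;lt; n; i++)        /* one pass over the stream */&lt;br /&gt;
         for (int j = 0; j &amp;lt; k; j++) {&lt;br /&gt;
             double h = hash_u01(j, x[i]);&lt;br /&gt;
             if (h &amp;lt; y[j]) y[j] = h;&lt;br /&gt;
         }&lt;br /&gt;
     double sum = 0.0;&lt;br /&gt;
     for (int j = 0; j &amp;lt; k; j++) sum += y[j];&lt;br /&gt;
     return (double)k / sum - 1.0;        /* 1/Y-bar - 1 */&lt;br /&gt;
 }&lt;br /&gt;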
&lt;br /&gt;
As the sketch above suggests, the algorithm is easy to implement in the data stream model, with a space cost of storing &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; hash values. The following theorem guarantees that the algorithm returns an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the total number of distinct elements for a suitable &amp;lt;math&amp;gt;k=O\left(\frac{1}{\epsilon^2\delta}\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
:For any &amp;lt;math&amp;gt;\epsilon,\delta&amp;lt;1/2&amp;lt;/math&amp;gt;, if &amp;lt;math&amp;gt;k\ge\left\lceil\frac{4}{\epsilon^2\delta}\right\rceil&amp;lt;/math&amp;gt; then the output &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; always gives an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the correct answer &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
In the following we prove this main theorem for the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm. &lt;br /&gt;
&lt;br /&gt;
An obstacle to analyzing the estimator &amp;lt;math&amp;gt;\widehat{Z}=\frac{1}{\overline{Y}}-1&amp;lt;/math&amp;gt; is that it is a nonlinear function of &amp;lt;math&amp;gt;\overline{Y}&amp;lt;/math&amp;gt;, which is the quantity that is easier to analyze. Nevertheless, we observe that &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; as long as &amp;lt;math&amp;gt;\overline{Y}&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(\epsilon/2,\delta)&amp;lt;/math&amp;gt;-estimator of &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;. This can be deduced by verifying the following:&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{1-\epsilon/2}{z+1}\le \overline{Y}\le \frac{1+\epsilon/2}{z+1} \implies (1-\epsilon)z\le\frac{1}{\overline{Y}}-1\le (1+\epsilon)z&amp;lt;/math&amp;gt;,&lt;br /&gt;
for &amp;lt;math&amp;gt;\epsilon&amp;lt;\frac{1}{2}&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,(1-\epsilon)z\le \widehat{Z} \le (1+\epsilon)z\,\right]\ge \Pr\left[\,\frac{1-\epsilon/2}{z+1}\le \overline{Y}\le \frac{1+\epsilon/2}{z+1}\,\right]&lt;br /&gt;
=\Pr\left[\,\left|\overline{Y}-\frac{1}{z+1}\right|\le \frac{\epsilon/2}{z+1}\,\right]&amp;lt;/math&amp;gt;.&lt;br /&gt;
It is then sufficient to show that &amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\frac{1}{z+1}\right|\le \frac{\epsilon/2}{z+1}\,\right]\ge 1-\delta&amp;lt;/math&amp;gt; in order to prove the main theorem above. We will see that this is equivalent to showing the concentration inequality &lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\mathbb{E}\left[\overline{Y}\right]\right|\le \frac{\epsilon/2}{z+1}\,\right]\ge 1-\delta\quad\qquad({\color{red}*})&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Lemma|&lt;br /&gt;
:The following hold for each &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;\overline{Y}=\frac{1}{k}\sum_{j=1}^kY_j&amp;lt;/math&amp;gt;:&lt;br /&gt;
:*&amp;lt;math&amp;gt;\mathbb{E}\left[\overline{Y}\right]=\mathbb{E}\left[Y_j\right]=\frac{1}{z+1}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:*&amp;lt;math&amp;gt;\mathbf{Var}\left[Y_j\right]\le\frac{1}{(z+1)^2}&amp;lt;/math&amp;gt;, and consequently &amp;lt;math&amp;gt;\mathbf{Var}\left[\overline{Y}\right]\le\frac{1}{k(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
As in the case of single hash function, by symmetry it holds that &amp;lt;math&amp;gt;\mathbb{E}[Y_j]=\frac{1}{z+1}&amp;lt;/math&amp;gt; for every &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\overline{Y}\right]=\frac{1}{k}\sum_{j=1}^k\mathbb{E}[Y_j]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Recall that each &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt; is the minimum of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; random hash values uniformly and independently distributed over &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. By geometric probability, it holds that for any &amp;lt;math&amp;gt;y\in[0,1]&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[Y_j&amp;gt;y]=(1-y)^z&amp;lt;/math&amp;gt;,&lt;br /&gt;
which means &amp;lt;math&amp;gt;\Pr[Y_j\le y]=1-(1-y)^z&amp;lt;/math&amp;gt;. Taking the derivative with respect to &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;, we obtain the probability density function of random variable &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;, which is &amp;lt;math&amp;gt;z(1-y)^{z-1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We then compute the second moment.&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[Y_j^2]=\int^{1}_0y^2z(1-y)^{z-1}\,\mathrm{d}y=\frac{2}{(z+1)(z+2)}&amp;lt;/math&amp;gt;.&lt;br /&gt;
The variance is bounded as&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{Var}\left[Y_j\right]=\mathbb{E}\left[Y_j^2\right]-\mathbb{E}\left[Y_j\right]^2=\frac{2}{(z+1)(z+2)}-\frac{1}{(z+1)^2}\le\frac{1}{(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to the (pairwise) independence between &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;&#039;s,&lt;br /&gt;
::&amp;lt;math&amp;gt;\mathbf{Var}\left[\overline{Y}\right]=\mathbf{Var}\left[\frac{1}{k}\sum_{j=1}^kY_j\right]=\frac{1}{k^2}\sum_{j=1}^k\mathbf{Var}\left[Y_j\right]\le \frac{1}{k(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
We now return to proving the inequality &amp;lt;math&amp;gt;({\color{red}*})&amp;lt;/math&amp;gt;. By [[高级算法_(Fall 2023)/Basic_deviation_inequalities#Chebyshev.27s_inequality|Chebyshev&#039;s inequality]], it holds that &lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\mathbb{E}\left[\overline{Y}\right]\right|&amp;gt; \frac{\epsilon/2}{z+1}\,\right]&lt;br /&gt;
\le\frac{4}{\epsilon^2}(z+1)^2\mathbf{Var}\left[\overline{Y}\right]&lt;br /&gt;
\le\frac{4}{\epsilon^2k}&amp;lt;/math&amp;gt;.&lt;br /&gt;
When &amp;lt;math&amp;gt;k\ge\left\lceil\frac{4}{\epsilon^2\delta}\right\rceil&amp;lt;/math&amp;gt;, this probability is at most &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. The inequality &amp;lt;math&amp;gt;({\color{red}*})&amp;lt;/math&amp;gt; is proved. As discussed above, this proves the main theorem for the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm, which improves the basic sketch by the mean trick.&lt;br /&gt;
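For example, for &amp;lt;math&amp;gt;\epsilon=\delta=0.1&amp;lt;/math&amp;gt;, the theorem asks for &amp;lt;math&amp;gt;k=\left\lceil\frac{4}{0.1^2\times 0.1}\right\rceil=4000&amp;lt;/math&amp;gt; independent hash functions, i.e. a sketch of 4000 stored hash values.&lt;br /&gt;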
&lt;br /&gt;
= Frequency Estimation=&lt;br /&gt;
Suppose that &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is the data universe. The &#039;&#039;&#039;frequency estimation&#039;&#039;&#039; problem is defined as follows.&lt;br /&gt;
*&#039;&#039;&#039;Data:&#039;&#039;&#039; a sequence of (not necessarily distinct) elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Query:&#039;&#039;&#039; an element &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output:&#039;&#039;&#039; an estimation &amp;lt;math&amp;gt;\hat{f}_x&amp;lt;/math&amp;gt; of the frequency &amp;lt;math&amp;gt;f_x\triangleq|\{i\mid x_i=x\}|&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in input data.&lt;br /&gt;
&lt;br /&gt;
We still want to give an algorithm in the data stream model: the algorithm scans the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; to construct a succinct data structure, such that upon each query of &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the algorithm returns an estimation of the frequency &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Clearly this problem can always be solved by storing all appeared distinct elements along with their frequencies. However, the space cost of this straightforward solution is rather high. Instead, we want to use a lossy representation (a &#039;&#039;sketch&#039;&#039;) of the input data which uses significantly less space but can still answer queries with tolerable accuracy. &lt;br /&gt;
&lt;br /&gt;
Formally, upon each query of &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the algorithm should return an answer &amp;lt;math&amp;gt;\hat{f}_x&amp;lt;/math&amp;gt; satisfying:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x-f_x\right|\le \epsilon n\,\right]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
Note that this notion of approximation is with bounded &#039;&#039;additive&#039;&#039; error which is weaker than the notion of &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator, whose error bound is &#039;&#039;multiplicative&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
With such a weak accuracy guarantee, it is possible to give a succinct data structure whose size is determined only by the error bounds &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt; but independent of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, because only the frequencies of those &#039;&#039;&#039;heavy hitters&#039;&#039;&#039; (elements &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with high frequencies &amp;lt;math&amp;gt;f_x&amp;gt;\epsilon n&amp;lt;/math&amp;gt;) need to be memorized, and there are at most &amp;lt;math&amp;gt;1/\epsilon&amp;lt;/math&amp;gt; many such heavy hitters.&lt;br /&gt;
&lt;br /&gt;
== Count-min sketch==&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Count–min_sketch count-min sketch] given by Cormode and Muthukrishnan is an elegant data structure for frequency estimation.&lt;br /&gt;
&lt;br /&gt;
The data structure is a two-dimensional &amp;lt;math&amp;gt;k\times m&amp;lt;/math&amp;gt; integer array, where &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; are two parameters to be determined by the error bounds &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. We still adopt the Uniform Hash Assumption to assume that we have access to &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually independent uniform random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[m]&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|&#039;&#039;Count-min sketch&#039;&#039; (Cormode and Muthukrishnan 2003)|&lt;br /&gt;
:Suppose &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[m]&amp;lt;/math&amp;gt; are uniform and independent random hash functions.&lt;br /&gt;
-----&lt;br /&gt;
:&#039;&#039;&#039;Data structure construction:&#039;&#039;&#039; Given a sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;, the data structure is a two-dimensional &amp;lt;math&amp;gt;k\times m&amp;lt;/math&amp;gt; integer array &amp;lt;math&amp;gt;CMS[k][m]&amp;lt;/math&amp;gt; constructed as&lt;br /&gt;
:*initialize all entries of &amp;lt;math&amp;gt;CMS[k][m]&amp;lt;/math&amp;gt; to 0;&lt;br /&gt;
:*for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;, upon receiving &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;:&lt;br /&gt;
::: for every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, evaluate &amp;lt;math&amp;gt;h_j(x_i)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;CMS[j][h_j(x_i)]++&amp;lt;/math&amp;gt;.&lt;br /&gt;
----&lt;br /&gt;
:&#039;&#039;&#039;Query resolution:&#039;&#039;&#039; Upon each query of an arbitrary &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;,&lt;br /&gt;
:* return &amp;lt;math&amp;gt;\hat{f}_x=\min_{1\le j\le k}CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
It is easy to see that the space cost of the count-min sketch is &amp;lt;math&amp;gt;O(km)&amp;lt;/math&amp;gt; memory words, or &amp;lt;math&amp;gt;O(km\log n)&amp;lt;/math&amp;gt; bits. Each query is answered within time cost &amp;lt;math&amp;gt;O(k)&amp;lt;/math&amp;gt;, assuming that an evaluation of a hash function can be done in unit or constant time. We then analyze the error bounds.&lt;br /&gt;
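&lt;br /&gt;
Before the analysis, here is a minimal C sketch of the two operations, again under the Uniform Hash Assumption: hash_j below is only a stand-in for the idealized hashes &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt;, and the concrete sizes are one possible choice of parameters.&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* count-min sketch under UHA: a K x M array of counters */&lt;br /&gt;
 enum { K = 5, M = 272 };          /* e.g. k = ceil(ln(1/delta)), m = ceil(e/eps) */&lt;br /&gt;
 static uint32_t CMS[K][M];&lt;br /&gt;
 &lt;br /&gt;
 static size_t hash_j(uint64_t x, int j) {   /* stand-in for h_j : U -&amp;gt; [m] */&lt;br /&gt;
     uint64_t h = (x + (uint64_t)(j + 1)) * 0x9E3779B97F4A7C15ULL;&lt;br /&gt;
     h ^= h &amp;gt;&amp;gt; 32;&lt;br /&gt;
     return (size_t)(h % M);&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 void cms_update(uint64_t x) {     /* process one stream element x_i */&lt;br /&gt;
     for (int j = 0; j &amp;lt; K; j++) CMS[j][hash_j(x, j)]++;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 uint32_t cms_query(uint64_t x) {  /* estimate f_x by the minimum counter */&lt;br /&gt;
     uint32_t min = UINT32_MAX;&lt;br /&gt;
     for (int j = 0; j &amp;lt; K; j++) {&lt;br /&gt;
         uint32_t c = CMS[j][hash_j(x, j)];&lt;br /&gt;
         if (c &amp;lt; min) min = c;&lt;br /&gt;
     }&lt;br /&gt;
     return min;&lt;br /&gt;
 }&lt;br /&gt;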
&lt;br /&gt;
First, it is easy to observe that for any query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every hash function &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, the corresponding entry in the count-min sketch always satisfies&lt;br /&gt;
:&amp;lt;math&amp;gt;CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt;,&lt;br /&gt;
because the appearances of element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in the input sequence contribute at least &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt; to the value of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Therefore, for any query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; the answer always satisfies &amp;lt;math&amp;gt;\hat{f}_x=\min_{1\le j\le k}CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt;, which means&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\ge\epsilon n\,\right]=\Pr\left[\,\hat{f}_x- f_x\ge\epsilon n\,\right]=\prod_{j=1}^k\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,],\quad\qquad({\color{red}\diamondsuit})&amp;lt;/math&amp;gt;&lt;br /&gt;
where the second equation is due to the mutual independence of random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
It remains to upper bound the probability &amp;lt;math&amp;gt;\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,]&amp;lt;/math&amp;gt;, which can be done by calculating the expectation of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Proposition|&lt;br /&gt;
:For any &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, it holds that &amp;lt;math&amp;gt;\mathbb{E}\left[CMS[j][h_j(x)]\right]\le f_x+\frac{n}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
The value of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt; is constituted by the frequency &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and the frequencies &amp;lt;math&amp;gt;f_y&amp;lt;/math&amp;gt; of all other elements &amp;lt;math&amp;gt;y\neq x&amp;lt;/math&amp;gt; among &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt;, thus&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
CMS[j][h_j(x)]&lt;br /&gt;
&amp;amp;=f_x+\sum_{\scriptstyle y\in\{x_1,\ldots,x_n\}\setminus\{x\}\atop\scriptstyle h_j(y)=h_j(x)} f_y\\&lt;br /&gt;
&amp;amp;=f_x+\sum_{y\in\{x_1,\ldots,x_n\}\setminus\{x\}} f_y \cdot I[h_j(y)=h_j(x)]&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;I[h_j(y)=h_j(x)]&amp;lt;/math&amp;gt; denotes the Boolean random variable that indicates the occurrence of event &amp;lt;math&amp;gt;h_j(y)=h_j(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
By linearity of expectation,&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[CMS[j][h_j(x)]]=f_x+\sum_{y\in\{x_1,x_2,\ldots,x_n\}\setminus\{x\}} f_y \cdot \Pr[h_j(y)=h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to Uniform Hash Assumption (UHA), &amp;lt;math&amp;gt;h_j: U\to[m]&amp;lt;/math&amp;gt; is a uniform random function. For any &amp;lt;math&amp;gt;y\neq x&amp;lt;/math&amp;gt;, the probability of hash collision is&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[h_j(y)=h_j(x)]=\frac{1}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\mathbb{E}[CMS[j][h_j(x)]]&lt;br /&gt;
&amp;amp;=f_x+\frac{1}{m}\sum_{y\in\{x_1,\ldots,x_n\}\setminus\{x\}} f_y \\&lt;br /&gt;
&amp;amp;\le f_x+\frac{1}{m}\sum_{y\in\{x_1,\ldots,x_n\}} f_y\\&lt;br /&gt;
&amp;amp;=f_x+\frac{n}{m},&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where the last equation is due to the obvious identity &amp;lt;math&amp;gt;\sum_{y\in\{x_1,\ldots,x_n\}}f_y=n&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
The above proposition shows that for any &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[CMS[j][h_j(x)]-f_x\right]\le \frac{n}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt; always holds, thus &amp;lt;math&amp;gt;CMS[j][h_j(x)]-f_x&amp;lt;/math&amp;gt; is a nonnegative random variable. By Markov&#039;s inequality, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,]\le \frac{1}{\epsilon m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Combining with the above equation &amp;lt;math&amp;gt;({\color{red}\diamondsuit})&amp;lt;/math&amp;gt;, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\ge\epsilon n\,\right]=(\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,])^k\le \frac{1}{(\epsilon m)^k}&amp;lt;/math&amp;gt;.&lt;br /&gt;
By setting &amp;lt;math&amp;gt;m=\left\lceil\frac{\mathrm{e}}{\epsilon}\right\rceil&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k=\left\lceil\ln\frac{1}{\delta}\right\rceil&amp;lt;/math&amp;gt;, the above error probability is bounded as &amp;lt;math&amp;gt;\frac{1}{(\epsilon m)^k}\le\delta&amp;lt;/math&amp;gt;. &lt;br /&gt;
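For example, for &amp;lt;math&amp;gt;\epsilon=\delta=0.01&amp;lt;/math&amp;gt; this gives &amp;lt;math&amp;gt;m=\left\lceil\frac{\mathrm{e}}{0.01}\right\rceil=272&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k=\lceil\ln 100\rceil=5&amp;lt;/math&amp;gt;, i.e. a &amp;lt;math&amp;gt;5\times 272&amp;lt;/math&amp;gt; array of only 1360 counters, regardless of the stream length &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;.&lt;br /&gt;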
&lt;br /&gt;
For any positive &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;, the count-min sketch gives a data structure of size &amp;lt;math&amp;gt;O(km)=O\left(\frac{1}{\epsilon}\log\frac{1}{\delta}\right)&amp;lt;/math&amp;gt; (in memory words) answering each query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; in time &amp;lt;math&amp;gt;O(k)=O\left(\log\frac{1}{\delta}\right)&amp;lt;/math&amp;gt; with the following accuracy guarantee:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\le\epsilon n\,\right]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Hashing_and_Sketching&amp;diff=12635</id>
		<title>高级算法 (Fall 2024)/Hashing and Sketching</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Hashing_and_Sketching&amp;diff=12635"/>
		<updated>2024-10-02T12:19:12Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Families of universal hash functions */  Add &amp;quot;distinct&amp;quot; elements in definition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Balls into Bins=&lt;br /&gt;
The following is the so-called balls into bins model.&lt;br /&gt;
Consider throwing &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; balls into &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; bins uniformly and independently at random. This is equivalent to a random mapping &amp;lt;math&amp;gt;f:[m]\to[n]&amp;lt;/math&amp;gt;. Needless to say, the random mapping is an important random model with many applications in Computer Science, e.g. hashing.&lt;br /&gt;
&lt;br /&gt;
We are concerned with the following three questions regarding the balls into bins model:&lt;br /&gt;
* birthday problem: the probability that every bin contains at most one ball (the mapping is 1-1);&lt;br /&gt;
* coupon collector problem: the probability that every bin contains at least one ball (the mapping is onto);&lt;br /&gt;
* occupancy problem: the maximum load of bins.&lt;br /&gt;
&lt;br /&gt;
== Birthday Problem==&lt;br /&gt;
We now consider the &#039;&#039;&#039;birthday problem&#039;&#039;&#039;.&lt;br /&gt;
There are &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students in the class. Assume that for each student, his/her birthday is uniformly and independently distributed over the 365 days in a year. We wonder what the probability is that no two students share a birthday.&lt;br /&gt;
&lt;br /&gt;
Due to the [http://en.wikipedia.org/wiki/Pigeonhole_principle pigeonhole principle], it is obvious that for &amp;lt;math&amp;gt;m&amp;gt;365&amp;lt;/math&amp;gt;, there must be two students with the same birthday. Surprisingly, for any &amp;lt;math&amp;gt;m&amp;gt;57&amp;lt;/math&amp;gt; this event occurs with more than 99% probability. This is called the [http://en.wikipedia.org/wiki/Birthday_problem &#039;&#039;&#039;birthday paradox&#039;&#039;&#039;]. Despite the name, the birthday paradox is not a real paradox.&lt;br /&gt;
&lt;br /&gt;
We can model this problem as a balls-into-bins problem. &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; different balls (students) are uniformly and independently thrown into 365 bins (days). More generally, let &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; be the number of bins. We ask for the probability of the following event &amp;lt;math&amp;gt;\mathcal{E}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;\mathcal{E}&amp;lt;/math&amp;gt;: there is no bin with more than one ball (i.e. no two students share a birthday).&lt;br /&gt;
&lt;br /&gt;
We first analyze this by counting. There are in total &amp;lt;math&amp;gt;n^m&amp;lt;/math&amp;gt; ways of assigning &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; balls to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; bins. The number of assignments in which no two balls share a bin is &amp;lt;math&amp;gt;{n\choose m}m!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Thus the probability is given by:&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]&lt;br /&gt;
=&lt;br /&gt;
\frac{{n\choose m}m!}{n^m}.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;{n\choose m}=\frac{n!}{(n-m)!m!}&amp;lt;/math&amp;gt;. Then &lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]&lt;br /&gt;
=&lt;br /&gt;
\frac{{n\choose m}m!}{n^m}&lt;br /&gt;
=&lt;br /&gt;
\frac{n!}{n^m(n-m)!}&lt;br /&gt;
=&lt;br /&gt;
\frac{n}{n}\cdot\frac{n-1}{n}\cdot\frac{n-2}{n}\cdots\frac{n-(m-1)}{n}&lt;br /&gt;
=&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right).&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is also a more &amp;quot;probabilistic&amp;quot; argument for the above equation. Consider again that &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students are mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; possible birthdays uniformly at random.&lt;br /&gt;
&lt;br /&gt;
The first student has a birthday for sure. The probability that the second student has a different birthday from the first student is &amp;lt;math&amp;gt;\left(1-\frac{1}{n}\right)&amp;lt;/math&amp;gt;. Given that the first two students have different birthdays, the probability that the third student has a different birthday from the first two students is &amp;lt;math&amp;gt;\left(1-\frac{2}{n}\right)&amp;lt;/math&amp;gt;. Continuing this on, assuming that the first &amp;lt;math&amp;gt;k-1&amp;lt;/math&amp;gt; students all have different birthdays, the probability that the &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;th student has a different birthday than the first &amp;lt;math&amp;gt;k-1&amp;lt;/math&amp;gt;, is given by &amp;lt;math&amp;gt;\left(1-\frac{k-1}{n}\right)&amp;lt;/math&amp;gt;. By the chain rule, the probability that all &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; students have different birthdays is:&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\Pr[\mathcal{E}]=\left(1-\frac{1}{n}\right)\cdot \left(1-\frac{2}{n}\right)\cdots \left(1-\frac{m-1}{n}\right)&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right),&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which is the same as what we got by the counting argument.&lt;br /&gt;
&lt;br /&gt;
[[File:Birthday.png|border|450px|right]]&lt;br /&gt;
&lt;br /&gt;
There are several ways of analyzing this formula. Here is a convenient one: due to [http://en.wikipedia.org/wiki/Taylor_series Taylor&#039;s expansion], &amp;lt;math&amp;gt;e^{-k/n}\approx 1-k/n&amp;lt;/math&amp;gt;. Then&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right)&lt;br /&gt;
&amp;amp;\approx&lt;br /&gt;
\prod_{k=1}^{m-1}e^{-\frac{k}{n}}\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
\exp\left(-\sum_{k=1}^{m-1}\frac{k}{n}\right)\\&lt;br /&gt;
&amp;amp;=&lt;br /&gt;
e^{-m(m-1)/2n}\\&lt;br /&gt;
&amp;amp;\approx&lt;br /&gt;
e^{-m^2/2n}.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
The quality of this approximation is shown in the Figure.&lt;br /&gt;
&lt;br /&gt;
Therefore, for &amp;lt;math&amp;gt;m=\sqrt{2n\ln \frac{1}{\epsilon}}&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\Pr[\mathcal{E}]\approx\epsilon&amp;lt;/math&amp;gt;.&lt;br /&gt;
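For example, for &amp;lt;math&amp;gt;n=365&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m=23&amp;lt;/math&amp;gt;, the approximation gives &amp;lt;math&amp;gt;e^{-m(m-1)/2n}=e^{-253/365}\approx 0.5&amp;lt;/math&amp;gt;, recovering the well-known fact that among 23 people, some two share a birthday with probability about 1/2.&lt;br /&gt;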
&lt;br /&gt;
==Universal Hashing ==&lt;br /&gt;
Hashing is one of the oldest tools in Computer Science. Knuth&#039;s memorandum in 1963 on analysis of hash tables is now considered to be the birth of the area of analysis of algorithms.&lt;br /&gt;
* Knuth. Notes on &amp;quot;open&amp;quot; addressing, July 22 1963. Unpublished memorandum.&lt;br /&gt;
&lt;br /&gt;
The idea of hashing is simple: an unknown set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; data &#039;&#039;&#039;items&#039;&#039;&#039; (or keys) is drawn from a large &#039;&#039;&#039;universe&#039;&#039;&#039; &amp;lt;math&amp;gt;U=[N]&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;N\gg n&amp;lt;/math&amp;gt;; in order to store &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; in a table of &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; entries (slots), we assume a consistent mapping (called a &#039;&#039;&#039;hash function&#039;&#039;&#039;) from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to a small range &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This idea seems clever: we use a consistent mapping to deal with an arbitrary unknown data set. However, there is a fundamental flaw of hashing.&lt;br /&gt;
* For sufficiently large universe (&amp;lt;math&amp;gt;N&amp;gt; M(n-1)&amp;lt;/math&amp;gt;), for any function, there exists a bad data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, such that all items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; are mapped to the same entry in the table.&lt;br /&gt;
&lt;br /&gt;
A simple use of pigeonhole principle can prove the above statement. &lt;br /&gt;
&lt;br /&gt;
To overcome this situation, randomization is introduced into hashing. We assume that the hash function is a random mapping from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;. In order to ease the analysis, the following ideal assumption is used:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simple Uniform Hash Assumption&#039;&#039;&#039; (&#039;&#039;&#039;SUHA&#039;&#039;&#039; or &#039;&#039;&#039;UHA&#039;&#039;&#039;, a.k.a. the random oracle model): &lt;br /&gt;
:A &#039;&#039;uniform&#039;&#039; random function &amp;lt;math&amp;gt;h:[N]\rightarrow[M]&amp;lt;/math&amp;gt; is available and the computation of &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is efficient.&lt;br /&gt;
&lt;br /&gt;
=== Families of universal hash functions ===&lt;br /&gt;
The assumption of a completely random function simplifies the analysis. However, in practice, a truly uniform random hash function is extremely expensive to compute and store, so this simple assumption can hardly represent reality.&lt;br /&gt;
&lt;br /&gt;
There are two approaches for implementing practical hash functions. One is to use &#039;&#039;ad hoc&#039;&#039; implementations and hope that they work. The other approach is to construct classes of hash functions which are efficient to compute and store but come with weaker randomness guarantees, and then analyze the applications of hash functions based on this weaker assumption of randomness.&lt;br /&gt;
&lt;br /&gt;
This route was taken by Carter and Wegman in 1977, when they introduced universal families of hash functions.&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Definition (universal hash families)|&lt;br /&gt;
:Let &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; be a universe with &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;. A family of hash functions &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; is said to be &#039;&#039;&#039;&amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;-universal&#039;&#039;&#039; if, for any distinct items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_k\in [N]&amp;lt;/math&amp;gt; and for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly at random from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)=\cdots=h(x_k)]\le\frac{1}{M^{k-1}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:A family of hash functions &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; is said to be &#039;&#039;&#039;strongly &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt;-universal&#039;&#039;&#039; if, for any distinct items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_k\in [N]&amp;lt;/math&amp;gt;, any values &amp;lt;math&amp;gt;y_1,y_2,\ldots,y_k\in[M]&amp;lt;/math&amp;gt;, and for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly at random from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, we have &lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=y_1\wedge h(x_2)=y_2 \wedge \cdots \wedge h(x_k)=y_k]=\frac{1}{M^{k}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
In particular, for a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any distinct elements &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt;, a uniform random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; has&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
For a strongly 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any distinct elements &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; and any values &amp;lt;math&amp;gt;y_1,y_2\in[M]&amp;lt;/math&amp;gt;, a uniform random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; has&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=y_1\wedge h(x_2)=y_2]=\frac{1}{M^2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This behavior is exactly the same as that of uniform random hash functions on any pair of inputs. For this reason, a strongly 2-universal hash family is also called a family of pairwise independent hash functions.&lt;br /&gt;
&lt;br /&gt;
=== 2-universal hash families ===&lt;br /&gt;
&lt;br /&gt;
The construction of pairwise independent random variables via modulo a prime introduced in Section 1 already provides a way of constructing a strongly 2-universal hash family.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; be a prime. The function &amp;lt;math&amp;gt;h_{a,b}:[p]\rightarrow [p]&amp;lt;/math&amp;gt; is defined by&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a,b}(x)=(ax+b)\bmod p,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family is&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a,b}\mid a,b\in[p]\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is strongly 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| In Section 1, we have proved the pairwise independence of the sequence of &amp;lt;math&amp;gt;(a i+b)\bmod p&amp;lt;/math&amp;gt;, for &amp;lt;math&amp;gt;i=0,1,\ldots, p-1&amp;lt;/math&amp;gt;, which directly implies that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is strongly 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
;The original construction of Carter-Wegman&lt;br /&gt;
What if we want to have hash functions from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; for non-prime &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt;? Carter and Wegman developed the following method.&lt;br /&gt;
&lt;br /&gt;
Suppose that the universe is &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt;, and the functions map &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;. For some prime &amp;lt;math&amp;gt;p\ge N&amp;lt;/math&amp;gt;, let&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a,b}(x)=((ax+b)\bmod p)\bmod M,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a,b}\mid 1\le a\le p-1, b\in[p]\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Note that unlike the first construction, now &amp;lt;math&amp;gt;a\neq 0&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Lemma (Carter-Wegman)|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof| Due to the definition of &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, there are &amp;lt;math&amp;gt;p(p-1)&amp;lt;/math&amp;gt; many different hash functions in &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, because each hash function in &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; corresponds to a pair of &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt;. We only need to count, for any particular pair of distinct &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt;, the number of hash functions with &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We first note that for any &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a x_1+b\not\equiv a x_2+b \pmod p&amp;lt;/math&amp;gt;. This is because &amp;lt;math&amp;gt;a x_1+b\equiv a x_2+b \pmod p&amp;lt;/math&amp;gt; would imply that &amp;lt;math&amp;gt;a(x_1-x_2)\equiv 0\pmod p&amp;lt;/math&amp;gt;, which can never happen since &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt; (note that &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; for an &amp;lt;math&amp;gt;N\le p&amp;lt;/math&amp;gt;).  Therefore, we can assume that &amp;lt;math&amp;gt;(a x_1+b)\bmod p=u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(a x_2+b)\bmod p=v&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;u\neq v&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
By linear algebra (over a finite field), for any &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt; and any &amp;lt;math&amp;gt;u,v\in[p]&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;u\neq v&amp;lt;/math&amp;gt;, there is exactly one solution &amp;lt;math&amp;gt;(a,b)&amp;lt;/math&amp;gt; satisfying:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{cases}&lt;br /&gt;
a x_1+b \equiv u \pmod p\\&lt;br /&gt;
a x_2+b \equiv v \pmod p.&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
After modulo &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt;, every &amp;lt;math&amp;gt;u\in[p]&amp;lt;/math&amp;gt; has at most &amp;lt;math&amp;gt;\lceil p/M\rceil -1&amp;lt;/math&amp;gt; many &amp;lt;math&amp;gt;v\in[p]&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;v\neq u&amp;lt;/math&amp;gt; but &amp;lt;math&amp;gt;v\equiv u\pmod M&amp;lt;/math&amp;gt;. Therefore, for every pair of &amp;lt;math&amp;gt;x_1,x_2\in[N]&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;, there exist at most &amp;lt;math&amp;gt;p(\lceil p/M\rceil -1)\le p(p-1)/M&amp;lt;/math&amp;gt; pairs of &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;((ax_1+b)\bmod p)\bmod M=((ax_2+b)\bmod p)\bmod M&amp;lt;/math&amp;gt;, which means there are at most &amp;lt;math&amp;gt; p(p-1)/M&amp;lt;/math&amp;gt; many hash functions &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt; having &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;. For &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; uniformly chosen from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;, for any &amp;lt;math&amp;gt;x_1\neq x_2&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le \frac{p(p-1)/M}{p(p-1)}=\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This proves that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal.&lt;br /&gt;
}}&lt;br /&gt;
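&lt;br /&gt;
For concreteness, a C sketch of the Carter-Wegman hash might read as follows; the choice of the prime &amp;lt;math&amp;gt;p=2^{61}-1&amp;lt;/math&amp;gt; and the 128-bit intermediate arithmetic are implementation assumptions, not part of the analysis.&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* h_{a,b}(x) = ((a*x + b) mod p) mod M, with p = 2^61 - 1 (a prime &amp;gt;= N).&lt;br /&gt;
    Requires 1 &amp;lt;= a &amp;lt;= p-1 and b in [p], chosen uniformly at random. */&lt;br /&gt;
 static const uint64_t P = 2305843009213693951ULL;&lt;br /&gt;
 &lt;br /&gt;
 uint64_t cw_hash(uint64_t a, uint64_t b, uint64_t x, uint64_t M) {&lt;br /&gt;
     unsigned __int128 v = (unsigned __int128)a * x + b;   /* fits: below 2^123 */&lt;br /&gt;
     return ((uint64_t)(v % P)) % M;&lt;br /&gt;
 }&lt;br /&gt;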
&lt;br /&gt;
;A construction used in practice&lt;br /&gt;
The main issue of the Carter-Wegman construction is efficiency: the mod operation is very slow, and has been so for more than 30 years.&lt;br /&gt;
&lt;br /&gt;
The following construction is due to Dietzfelbinger &#039;&#039;et al&#039;&#039;. It was published in 1997 and has been practically used in various applications of universal hashing.&lt;br /&gt;
&lt;br /&gt;
The family of hash functions is from &amp;lt;math&amp;gt;[2^u]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[2^v]&amp;lt;/math&amp;gt;. With a binary representation, the functions map binary strings of length &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; to binary strings of length &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt;.&lt;br /&gt;
Let&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
h_{a}(x)=\left\lfloor\frac{a\cdot x\bmod 2^u}{2^{u-v}}\right\rfloor,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the family&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{H}=\{h_{a}\mid a\in[2^u]\mbox{ and }a\mbox{ is odd}\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This family of hash functions does not exactly meet the requirement of a 2-universal family. However, Dietzfelbinger &#039;&#039;et al&#039;&#039; proved that &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is close to a 2-universal family. Specifically, for any distinct input values &amp;lt;math&amp;gt;x_1,x_2\in[2^u]&amp;lt;/math&amp;gt;, for a uniformly random &amp;lt;math&amp;gt;h\in\mathcal{H}&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h(x_1)=h(x_2)]\le\frac{1}{2^{v-1}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
So &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is within an approximation ratio of 2 to being 2-universal. The proof uses the fact that odd numbers are relatively prime to any power of 2.&lt;br /&gt;
&lt;br /&gt;
The function is extremely simple to compute in the C language.&lt;br /&gt;
We exploit the fact that C multiplication (*) of unsigned u-bit integers is done &amp;lt;math&amp;gt;\bmod 2^u&amp;lt;/math&amp;gt;, and obtain a one-line C implementation of the hash function:&lt;br /&gt;
 h_a(x) = (a*x)&amp;gt;&amp;gt;(u-v)&lt;br /&gt;
The bit-wise shift is a lot faster than the modulo operation, which explains why this scheme is more popular in practice than the original Carter-Wegman construction.&lt;br /&gt;
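Fleshed out for &amp;lt;math&amp;gt;u=64&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v=32&amp;lt;/math&amp;gt;, a self-contained version might read as follows; the concrete word widths are only an illustration.&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 /* multiply-shift hashing with u = 64, v = 32; unsigned multiplication in C&lt;br /&gt;
    wraps mod 2^64, so no explicit mod is needed. The multiplier a must be&lt;br /&gt;
    a uniformly random odd 64-bit integer. */&lt;br /&gt;
 static inline uint32_t hash_ms(uint64_t a, uint64_t x) {&lt;br /&gt;
     return (uint32_t)((a * x) &amp;gt;&amp;gt; 32);     /* (a*x mod 2^u) &amp;gt;&amp;gt; (u - v) */&lt;br /&gt;
 }&lt;br /&gt;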
&lt;br /&gt;
== Collision number ==&lt;br /&gt;
Consider a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; of hash functions from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt;. Let &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; be a hash function chosen uniformly from &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;. For a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; distinct elements from &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt;, say &amp;lt;math&amp;gt;S=\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;, the elements are mapped to the hash values &amp;lt;math&amp;gt;h(x_1), h(x_2), \ldots, h(x_n)&amp;lt;/math&amp;gt;. This can be seen as throwing &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; balls to &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; bins, with pairwise independent choices of bins.&lt;br /&gt;
&lt;br /&gt;
As in the balls-into-bins with full independence, we are curious about the questions such as the birthday problem or the maximum load. These questions are interesting not only because they are natural to ask in a balls-into-bins setting, but in the context of hashing, they are closely related to the performance of hash functions.&lt;br /&gt;
&lt;br /&gt;
The old techniques for analyzing balls-into-bins rely too much on the independence of the choice of bin for each ball, and therefore can hardly be extended to the setting of 2-universal hash families. However, it turns out that several balls-into-bins questions can somehow be answered by analyzing a very natural quantity: the number of &#039;&#039;&#039;collision pairs&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
A collision pair for hashing is a pair of elements &amp;lt;math&amp;gt;x_1,x_2\in S&amp;lt;/math&amp;gt; which are mapped to the same hash value, i.e. &amp;lt;math&amp;gt;h(x_1)=h(x_2)&amp;lt;/math&amp;gt;. Formally, for a fixed set of elements &amp;lt;math&amp;gt;S=\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;, for any &amp;lt;math&amp;gt;1\le i,j\le n&amp;lt;/math&amp;gt;, let the random variable&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
X_{ij}&lt;br /&gt;
=&lt;br /&gt;
\begin{cases}&lt;br /&gt;
1 &amp;amp; \text{if }h(x_i)=h(x_j),\\&lt;br /&gt;
0 &amp;amp; \text{otherwise.}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The total number of collision pairs among the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; is &lt;br /&gt;
:&amp;lt;math&amp;gt;X=\sum_{i&amp;lt;j} X_{ij}.\,&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt; is 2-universal, for any &amp;lt;math&amp;gt;i\neq j&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[X_{ij}=1]=\Pr[h(x_i)=h(x_j)]\le\frac{1}{M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected number of collision pairs is &lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{E}[X]=\mathbf{E}\left[\sum_{i&amp;lt;j}X_{ij}\right]=\sum_{i&amp;lt;j}\mathbf{E}[X_{ij}]=\sum_{i&amp;lt;j}\Pr[X_{ij}=1]\le{n\choose 2}\frac{1}{M}&amp;lt;\frac{n^2}{2M}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In particular, for &amp;lt;math&amp;gt;n=M&amp;lt;/math&amp;gt;, i.e. &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; hash values by a pairwise independent hash function, the expected collision number is &amp;lt;math&amp;gt;\mathbf{E}[X]&amp;lt;\frac{n^2}{2M}=\frac{n}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The above analysis gives us an estimate of the expected number of collision pairs: &amp;lt;math&amp;gt;\mathbf{E}[X]&amp;lt;\frac{n^2}{2M}&amp;lt;/math&amp;gt;. Applying Markov&#039;s inequality, for &amp;lt;math&amp;gt;0&amp;lt;\epsilon&amp;lt;1&amp;lt;/math&amp;gt;, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr\left[X\ge \frac{n^2}{2\epsilon M}\right]\le\Pr\left[X\ge \frac{1}{\epsilon}\mathbf{E}[X]\right]\le\epsilon.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;n\le\sqrt{2\epsilon M}&amp;lt;/math&amp;gt;, the probability that there is at least one collision pair (&amp;lt;math&amp;gt;X\ge1&amp;lt;/math&amp;gt;) is at most &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;, therefore with probability at least &amp;lt;math&amp;gt;1-\epsilon&amp;lt;/math&amp;gt;, there is no collision at all. Therefore, we have the following theorem.&lt;br /&gt;
{{Theorem&lt;br /&gt;
|Theorem|&lt;br /&gt;
:If &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is chosen uniformly from a 2-universal family of hash functions mapping the universe &amp;lt;math&amp;gt;[N]&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[M]&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;N\ge M&amp;lt;/math&amp;gt;, then for any set &amp;lt;math&amp;gt;S\subset [N]&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items, where &amp;lt;math&amp;gt;n\le\sqrt{2\epsilon M}&amp;lt;/math&amp;gt;, the probability that there exists a collision pair is&lt;br /&gt;
::&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[\mbox{collision occurs}]\le\epsilon.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
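For example, for &amp;lt;math&amp;gt;M=10^6&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\epsilon=0.01&amp;lt;/math&amp;gt;, any set of up to &amp;lt;math&amp;gt;n=\sqrt{2\times 0.01\times 10^6}\approx 141&amp;lt;/math&amp;gt; items is hashed with no collision at all with probability at least 99%.&lt;br /&gt;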
&lt;br /&gt;
Recall that for mutually independent choices of bins, for &amp;lt;math&amp;gt;n=\sqrt{2M\ln(1/\epsilon)}&amp;lt;/math&amp;gt;, the probability that &#039;&#039;no&#039;&#039; collision occurs is about &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;. For constant &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;, this threshold for &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is of the same order &amp;lt;math&amp;gt;\Theta(\sqrt{M})&amp;lt;/math&amp;gt; as in the pairwise independent setting. Therefore, &lt;br /&gt;
the behavior of pairwise independent hash functions is essentially the same as that of uniform random hash functions for the birthday problem. This is easy to understand, because the birthday problem is about the behavior of collisions, and the definition of 2-universal hash functions can be interpreted as &amp;quot;functions for which the probability of collision is as low as for a uniform random function&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
= Set  Membership=&lt;br /&gt;
A basic question in Computer Science is:&lt;br /&gt;
:&amp;quot;&amp;lt;math&amp;gt;\mbox{Is }x\in S?&amp;lt;/math&amp;gt;&amp;quot;&lt;br /&gt;
for a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; and an element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. This is the &#039;&#039;&#039;set membership&#039;&#039;&#039; problem.&lt;br /&gt;
&lt;br /&gt;
Formally, given an arbitrary set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, we want to use a succinct &#039;&#039;&#039;data structure&#039;&#039;&#039; to represent this set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, so that upon each &#039;&#039;&#039;query&#039;&#039;&#039; of any element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, the question of whether &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt; is efficiently answered. The complexity of such a data structure is measured in two respects:&lt;br /&gt;
* &#039;&#039;&#039;space cost&#039;&#039;&#039;: size of the data structure to represent a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;;&lt;br /&gt;
* &#039;&#039;&#039;time cost&#039;&#039;&#039;: time complexity of answering each query by accessing the data structure.&lt;br /&gt;
&lt;br /&gt;
Suppose that the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is of size &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;. Clearly, the membership problem can be solved by a &#039;&#039;&#039;dictionary data structure&#039;&#039;&#039;, e.g.:&lt;br /&gt;
* &#039;&#039;&#039;sorted table / balanced search tree&#039;&#039;&#039;: with space cost &amp;lt;math&amp;gt;O(n\log N)&amp;lt;/math&amp;gt; bits and time cost &amp;lt;math&amp;gt;O(\log n)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Note that &amp;lt;math&amp;gt;\log{N\choose n}=\Theta\left(n\log \frac{N}{n}\right)&amp;lt;/math&amp;gt; is the entropy of sets &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;. Therefore it is necessary to use this many bits to represent a set without losing any information. &lt;br /&gt;
With hashing, we can solve this fundamental problem with asymptotic optimal space cost and time cost at the same time.&lt;br /&gt;
&lt;br /&gt;
== Perfect hashing using quadratic space==&lt;br /&gt;
The idea of perfect hashing is that we use a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; to map the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items to distinct entries of the table; store every item &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt; in the entry &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;; and also store the hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; in a fixed location in the table (usually the beginning of the table). The algorithm for searching for an item is as follows:&lt;br /&gt;
&lt;br /&gt;
:search for &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in table &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;:&lt;br /&gt;
# retrieve &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; from a fixed location in the table;&lt;br /&gt;
# if &amp;lt;math&amp;gt;x=T[h(x)]&amp;lt;/math&amp;gt; return &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;; else return NOT_FOUND;&lt;br /&gt;
&lt;br /&gt;
This scheme works as long as the hash function satisfies the following two conditions:&lt;br /&gt;
* The description of &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is sufficiently short, so that &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; can be stored in one entry (or in constant many entries) of the table.&lt;br /&gt;
* &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; has no collisions on &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, i.e. there is no pair of items &amp;lt;math&amp;gt;x_1,x_2\in S&amp;lt;/math&amp;gt; that are mapped to the same value by &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The first condition is easy to guarantee for 2-universal hash families. As shown by Carter-Wegman construction, a 2-universal hash function can be uniquely represented by two integers &amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;, which can be stored in two entries (or just one, if the word length is sufficiently large) of the table.&lt;br /&gt;
&lt;br /&gt;
Our discussion is now focused on the second condition. We find that it relies on the &#039;&#039;perfectness&#039;&#039; of the hash function for a data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is &#039;&#039;&#039;perfect&#039;&#039;&#039; for a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of items if &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; maps all items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; to different values, i.e. there is no collision.&lt;br /&gt;
&lt;br /&gt;
We have shown by the birthday problem for 2-universal hashing that when &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are mapped to &amp;lt;math&amp;gt;n^2&amp;lt;/math&amp;gt; values, for an &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly from a 2-universal family of hash functions, the probability that a collision occurs is at most 1/2. Thus&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[h\mbox{ is perfect for }S]\ge\frac{1}{2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
for a table of &amp;lt;math&amp;gt;n^2&amp;lt;/math&amp;gt; entries.&lt;br /&gt;
&lt;br /&gt;
The construction of perfect hashing is straightforward then:&lt;br /&gt;
:For a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements:&lt;br /&gt;
# uniformly choose an &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; from a 2-universal family &amp;lt;math&amp;gt;\mathcal{H}&amp;lt;/math&amp;gt;; (for Carter-Wegman&#039;s construction, this means uniformly choosing two integers &amp;lt;math&amp;gt;1\le a\le p-1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;b\in[p]&amp;lt;/math&amp;gt; for a sufficiently large prime &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;.)&lt;br /&gt;
# check whether &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is perfect for &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;;&lt;br /&gt;
# if &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is NOT perfect for &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, start over again; otherwise, construct the table;&lt;br /&gt;
&lt;br /&gt;
This is a Las Vegas randomized algorithm, which constructs a perfect hash table for a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; within at most two trials in expectation (due to the geometric distribution). The resulting data structure is an &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt;-size static dictionary of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements which answers every search in deterministic &amp;lt;math&amp;gt;O(1)&amp;lt;/math&amp;gt; time.&lt;br /&gt;
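&lt;br /&gt;
The construction loop can be sketched in C as follows; the xorshift generator and the choice &amp;lt;math&amp;gt;p=2^{61}-1&amp;lt;/math&amp;gt; are crude implementation assumptions, and the table stores the keys themselves, with EMPTY marking unused slots.&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define EMPTY UINT64_MAX&lt;br /&gt;
 static const uint64_t P = 2305843009213693951ULL;   /* prime 2^61 - 1 */&lt;br /&gt;
 &lt;br /&gt;
 static uint64_t rng = 88172645463325252ULL;&lt;br /&gt;
 static uint64_t rand_u64(void) {      /* xorshift64: crude but compilable RNG */&lt;br /&gt;
     rng ^= rng &amp;lt;&amp;lt; 13;  rng ^= rng &amp;gt;&amp;gt; 7;  rng ^= rng &amp;lt;&amp;lt; 17;&lt;br /&gt;
     return rng;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 static size_t h_ab(uint64_t a, uint64_t b, uint64_t x, size_t m) {&lt;br /&gt;
     unsigned __int128 v = (unsigned __int128)a * x + b;&lt;br /&gt;
     return (size_t)(((uint64_t)(v % P)) % m);&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 /* Fill table[0..n*n-1] with the n keys of s[] so that h_{a,b} is perfect&lt;br /&gt;
    for s; on a collision, resample (a,b) (at most 2 trials in expectation). */&lt;br /&gt;
 void build_perfect(const uint64_t *s, size_t n, uint64_t *table,&lt;br /&gt;
                    uint64_t *a, uint64_t *b) {&lt;br /&gt;
     size_t m = n * n;&lt;br /&gt;
     for (;;) {&lt;br /&gt;
         *a = 1 + rand_u64() % (P - 1);&lt;br /&gt;
         *b = rand_u64() % P;&lt;br /&gt;
         for (size_t i = 0; i &amp;lt; m; i++) table[i] = EMPTY;&lt;br /&gt;
         size_t i = 0;&lt;br /&gt;
         for (; i &amp;lt; n; i++) {&lt;br /&gt;
             size_t j = h_ab(*a, *b, s[i], m);&lt;br /&gt;
             if (table[j] != EMPTY) break;   /* collision: start over */&lt;br /&gt;
             table[j] = s[i];&lt;br /&gt;
         }&lt;br /&gt;
         if (i == n) return;                 /* h is perfect for S */&lt;br /&gt;
     }&lt;br /&gt;
 }&lt;br /&gt;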
&lt;br /&gt;
== FKS perfect hashing ==&lt;br /&gt;
In the last section we saw how to use &amp;lt;math&amp;gt;O(n^2)&amp;lt;/math&amp;gt; space and constant time to answer searches in a set. Now we see how to do it with linear space and constant time, which solves the search problem asymptotically optimally in both time and space.&lt;br /&gt;
&lt;br /&gt;
This was once seemingly impossible, until Yao&#039;s seminal paper:&lt;br /&gt;
*Yao. Should tables be sorted? &#039;&#039;Journal of the ACM (JACM)&#039;&#039;, 1981.&lt;br /&gt;
&lt;br /&gt;
Yao&#039;s paper shows a possibility of achieving linear space and constant time at the same time by exploiting the power of hashing, but assumes an unrealistically large universe. &lt;br /&gt;
&lt;br /&gt;
Inspired by Yao&#039;s work, Fredman, Komlós, and Szemerédi discovered the first linear-space and constant-time static dictionary in a realistic setting: &lt;br /&gt;
* Fredman, Komlós, and Szemerédi. Storing a sparse table with O(1) worst case access time. &#039;&#039;Journal of the ACM (JACM)&#039;&#039;, 1984.&lt;br /&gt;
&lt;br /&gt;
The idea of FKS hashing is to arrange hash table in two levels:&lt;br /&gt;
* In the first level, &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items are hashed to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;buckets&#039;&#039; by a 2-universal hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;. &lt;br /&gt;
: Let &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt; be the set of items hashed to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th bucket.&lt;br /&gt;
* In the second level, construct a &amp;lt;math&amp;gt;|B_i|^2&amp;lt;/math&amp;gt;-size perfect hashing for each bucket &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The data structure can be stored in a table. The first few entries are reserved to store the primary hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;. To help the searching algorithm locate a bucket, we use the next &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; entries of the table as &amp;quot;pointers&amp;quot; to the buckets: each entry stores the address of the first entry of the space storing a bucket. In the rest of the table, the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; buckets are stored in order, each using &amp;lt;math&amp;gt;|B_i|^2&amp;lt;/math&amp;gt; space as required by perfect hashing.&lt;br /&gt;
&lt;br /&gt;
::[[File:FKS.png|600px]]&lt;br /&gt;
&lt;br /&gt;
It is easy to see that the search time is constant. To search for an item &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, the algorithm does the followings:&lt;br /&gt;
* Retrieve &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Retrieve the address for bucket &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
* Search by perfect hashing within bucket &amp;lt;math&amp;gt;h(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
Each line takes constant time. So the worst-case search time is constant.&lt;br /&gt;
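&lt;br /&gt;
Putting the two levels together, a lookup might be sketched in C as follows; the struct layout below is a hypothetical simplification of the flat table layout described above.&lt;br /&gt;
 #include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 #define EMPTY UINT64_MAX&lt;br /&gt;
 static const uint64_t P = 2305843009213693951ULL;   /* prime 2^61 - 1 */&lt;br /&gt;
 &lt;br /&gt;
 static size_t h_ab(uint64_t a, uint64_t b, uint64_t x, size_t m) {&lt;br /&gt;
     unsigned __int128 v = (unsigned __int128)a * x + b;&lt;br /&gt;
     return (size_t)(((uint64_t)(v % P)) % m);&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 typedef struct {           /* hypothetical layout of one bucket B_i */&lt;br /&gt;
     uint64_t a, b;         /* its perfect-hash parameters */&lt;br /&gt;
     size_t   size;         /* |B_i|^2 slots, 0 for an empty bucket */&lt;br /&gt;
     uint64_t *slot;        /* slot[j] holds a key of S, or EMPTY */&lt;br /&gt;
 } Bucket;&lt;br /&gt;
 &lt;br /&gt;
 typedef struct {           /* the whole two-level structure */&lt;br /&gt;
     uint64_t a, b;         /* primary 2-universal hash parameters */&lt;br /&gt;
     size_t   n;            /* number of buckets */&lt;br /&gt;
     Bucket  *bucket;&lt;br /&gt;
 } FKS;&lt;br /&gt;
 &lt;br /&gt;
 /* constant-time membership test: the primary hash picks the bucket,&lt;br /&gt;
    the bucket&#039;s own perfect hash resolves the query within it */&lt;br /&gt;
 int fks_member(const FKS *t, uint64_t x) {&lt;br /&gt;
     const Bucket *bi = &amp;amp;t-&amp;gt;bucket[h_ab(t-&amp;gt;a, t-&amp;gt;b, x, t-&amp;gt;n)];&lt;br /&gt;
     if (bi-&amp;gt;size == 0) return 0;&lt;br /&gt;
     return bi-&amp;gt;slot[h_ab(bi-&amp;gt;a, bi-&amp;gt;b, x, bi-&amp;gt;size)] == x;&lt;br /&gt;
 }&lt;br /&gt;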
&lt;br /&gt;
We then need to guarantee that the space is linear in &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;. At first glance, this seems impossible because each instance of perfect hashing for a bucket costs square-size space. We will prove that although the individual buckets use square-sized spaces, their sum is still linear.&lt;br /&gt;
&lt;br /&gt;
For a fixed set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items, for a hash function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; chosen uniformly from a 2-universal family which maps the items to &amp;lt;math&amp;gt;[n]&amp;lt;/math&amp;gt;, called &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;buckets&#039;&#039;,  let &amp;lt;math&amp;gt;Y_i=|B_i|&amp;lt;/math&amp;gt; be the number of items in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; mapped to the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th bucket.&lt;br /&gt;
We are going to bound the following quantity:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
Y=\sum_{i=1}^n Y_i^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Since each bucket &amp;lt;math&amp;gt;B_i&amp;lt;/math&amp;gt; uses space &amp;lt;math&amp;gt;Y_i^2&amp;lt;/math&amp;gt; for perfect hashing, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; gives the total size of the space for storing the buckets. &lt;br /&gt;
&lt;br /&gt;
We will show that &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is related to the total number of collision pairs. (Indeed, the number of collision pairs can be computed by a degree-2 polynomial, just like &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
Note that a bucket of &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; items contributes &amp;lt;math&amp;gt;{Y_i\choose 2}&amp;lt;/math&amp;gt; collision pairs. Let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; be the total number of collision pairs.&lt;br /&gt;
&amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; can be computed by summing over the collision pairs in every bucket:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
X=\sum_{i=1}^n{Y_i\choose 2}=\sum_{i=1}^n\frac{Y_i(Y_i-1)}{2}=\frac{1}{2}\left(\sum_{i=1}^nY_i^2-\sum_{i=1}^nY_i\right)=\frac{1}{2}\left(\sum_{i=1}^nY_i^2-n\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Therefore, the sum of squares of the sizes of buckets is related to collision number by:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_{i=1}^nY_i^2=2X+n.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
By our analysis of the collision number, we know that for &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; items mapped to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; buckets, the expected number of collision pairs is: &amp;lt;math&amp;gt;\mathbf{E}[X]\le \frac{n}{2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Thus,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\mathbf{E}\left[\sum_{i=1}^nY_i^2\right]=\mathbf{E}[2X+n]\le 2n.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Due to Markov&#039;s inequality, &amp;lt;math&amp;gt;\sum_{i=1}^nY_i^2=O(n)&amp;lt;/math&amp;gt; with constant probability. For any set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, we can find a suitable &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; after an expected constant number of trials, so the FKS data structure can be constructed with guaranteed (instead of expected) linear size, answering each search in constant time.&lt;br /&gt;
&lt;br /&gt;
== Bloom filter ==&lt;br /&gt;
Now we consider a lossy representation of the original data set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;, to further reduce the space usage. Such a lossy data structure is sometimes called a &#039;&#039;&#039;&#039;&#039;sketch&#039;&#039;&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
The Bloom filter is such a lossy data structure. It is a space-efficient hash table that solves the &#039;&#039;&#039;approximate membership&#039;&#039;&#039; problem with one-sided error (&#039;&#039;false positive&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
Given a set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements from a universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt;, a Bloom filter consists of an array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits, and &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt; mapping &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[cn]&amp;lt;/math&amp;gt;, where both &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; are parameters that we can optimize later.&lt;br /&gt;
&lt;br /&gt;
As before, we assume the &#039;&#039;&#039;Uniform Hash Assumption (UHA)&#039;&#039;&#039;: &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt; are mutually independent hash functions, where each &amp;lt;math&amp;gt;h_i&amp;lt;/math&amp;gt; is a uniform random hash function &amp;lt;math&amp;gt;h_i:U\to[cn]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Bloom filter works as follows:&lt;br /&gt;
{{Theorem|&#039;&#039;Bloom filter&#039;&#039; (Bloom 1970)|&lt;br /&gt;
:Suppose &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k:U\to[cn]&amp;lt;/math&amp;gt; are uniform and independent random hash functions.&lt;br /&gt;
-----&lt;br /&gt;
:&#039;&#039;&#039;Data structure construction:&#039;&#039;&#039; Given a set &amp;lt;math&amp;gt;S\subset U&amp;lt;/math&amp;gt; of size &amp;lt;math&amp;gt;n=|S|&amp;lt;/math&amp;gt;, the data structure is a Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits constructed as&lt;br /&gt;
:* initialize all &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits of the Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; to 0;&lt;br /&gt;
:* for each &amp;lt;math&amp;gt;x\in S&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;.&lt;br /&gt;
----&lt;br /&gt;
:&#039;&#039;&#039;Query resolution:&#039;&#039;&#039; Upon each query of an arbitrary &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;,&lt;br /&gt;
:* answer &amp;quot;yes&amp;quot; if &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt; and &amp;quot;no&amp;quot; if otherwise.&lt;br /&gt;
}}&lt;br /&gt;
The Boolean array is our data structure, whose size is &amp;lt;math&amp;gt;cn&amp;lt;/math&amp;gt; bits. With Uniform Hash Assumption (UHA), the time cost of the data structure for answering each query is &amp;lt;math&amp;gt;O(k)&amp;lt;/math&amp;gt;.&lt;br /&gt;
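&lt;br /&gt;
As a concrete illustration, here is a minimal Python sketch of the construction (ours; the &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; independent uniform hash functions of the UHA are again only simulated by salting Python&#039;s built-in hash):&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 class BloomFilter:&lt;br /&gt;
     def __init__(self, n, c=8, k=6, seed=0):&lt;br /&gt;
         self.m = c * n                    # cn bits in total&lt;br /&gt;
         self.bits = [False] * self.m&lt;br /&gt;
         rng = random.Random(seed)&lt;br /&gt;
         self.salts = [rng.getrandbits(64) for _ in range(k)]&lt;br /&gt;
 &lt;br /&gt;
     def _positions(self, x):              # h_1(x), ..., h_k(x)&lt;br /&gt;
         return [hash((s, x)) % self.m for s in self.salts]&lt;br /&gt;
 &lt;br /&gt;
     def add(self, x):                     # set A[h_i(x)] = 1 for all i&lt;br /&gt;
         for p in self._positions(x):&lt;br /&gt;
             self.bits[p] = True&lt;br /&gt;
 &lt;br /&gt;
     def query(self, x):                   # answer yes iff all k bits are 1&lt;br /&gt;
         return all(self.bits[p] for p in self._positions(x))&lt;br /&gt;
With the default parameters &amp;lt;math&amp;gt;c=8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k=6\approx 8\ln 2&amp;lt;/math&amp;gt;, the analysis below gives a false positive probability of roughly &amp;lt;math&amp;gt;(0.6185)^8\approx 2\%&amp;lt;/math&amp;gt;.&lt;br /&gt;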
&lt;br /&gt;
When the answer returned by the algorithm is &amp;quot;no&amp;quot;, it holds that &amp;lt;math&amp;gt;A[h_i(x)]=0&amp;lt;/math&amp;gt; for some &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;, in which case the query &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; must not belong to the set &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. Thus, the Bloom filter has no false negatives.&lt;br /&gt;
&lt;br /&gt;
On the other hand, when the answer returned by the algorithm is &amp;quot;yes&amp;quot;, &amp;lt;math&amp;gt;A[h_i(x)]=1&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt;. It is still possible for some &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt; that all bits &amp;lt;math&amp;gt;A[h_i(x)]&amp;lt;/math&amp;gt; were set by elements in &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt;. We want to bound the probability of such false positives, that is, the following probability for an &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\,\forall 1\le i\le k, A[h_i(x)]=1\,]&amp;lt;/math&amp;gt;,&lt;br /&gt;
which by independence between different hash functions and by symmetry is equal to:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\, A[h_1(x)]=1\,]^k=(1-\Pr[\, A[h_1(x)]=0\,])^k&amp;lt;/math&amp;gt;.&lt;br /&gt;
For an element &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;, its hash value &amp;lt;math&amp;gt;h_1(x)&amp;lt;/math&amp;gt; is independent of all hash values &amp;lt;math&amp;gt;h_i(y)&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;1\le i\le k&amp;lt;/math&amp;gt; and all &amp;lt;math&amp;gt;y\in S&amp;lt;/math&amp;gt;. This is due to the Uniform Hash Assumption. The hash value &amp;lt;math&amp;gt;h_1(x)&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt; is then independent of the content of the array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;. Therefore, the probability that this position &amp;lt;math&amp;gt;A[h_1(x)]&amp;lt;/math&amp;gt; is missed by all &amp;lt;math&amp;gt;kn&amp;lt;/math&amp;gt; updates to the Boolean array &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; caused by the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; elements of &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\Pr[\, A[h_1(x)]=0\,]=\left(1-\frac{1}{cn}\right)^{kn}\approx e^{-k/c}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Putting everything together, for any &amp;lt;math&amp;gt;x\not\in S&amp;lt;/math&amp;gt;, the false positive probability is bounded as:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\Pr[\,\text{wrongly answer &#039;&#039;yes&#039;&#039;}\,]&lt;br /&gt;
&amp;amp;=\Pr[\,\forall 1\le i\le k, A[h_i(x)]=1\,]\\&lt;br /&gt;
&amp;amp;=\Pr[\, A[h_1(x)]=1\,]^k=(1-\Pr[\, A[h_1(x)]=0\,])^k\\&lt;br /&gt;
&amp;amp;=\left(1-\left(1-\frac{1}{cn}\right)^{kn}\right)^k\\&lt;br /&gt;
&amp;amp;\approx \left(1- e^{-k/c}\right)^k&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which is minimized to &amp;lt;math&amp;gt;(0.6185)^c&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;k=c\ln 2&amp;lt;/math&amp;gt;.&lt;br /&gt;
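&lt;br /&gt;
This choice of &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; can be checked numerically. The short Python script below (ours) evaluates the approximate false positive rate &amp;lt;math&amp;gt;(1-e^{-k/c})^k&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;c=10&amp;lt;/math&amp;gt; and confirms that the best integer &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;7\approx c\ln 2&amp;lt;/math&amp;gt;, with error close to &amp;lt;math&amp;gt;(0.6185)^{10}\approx 0.008&amp;lt;/math&amp;gt;.&lt;br /&gt;
 import math&lt;br /&gt;
 &lt;br /&gt;
 def false_positive(c, k):&lt;br /&gt;
     # approximate false positive rate (1 - e^{-k/c})^k&lt;br /&gt;
     return (1.0 - math.exp(-k / c)) ** k&lt;br /&gt;
 &lt;br /&gt;
 c = 10&lt;br /&gt;
 best_k = min(range(1, 30), key=lambda k: false_positive(c, k))&lt;br /&gt;
 print(best_k, c * math.log(2))    # 7 and c*ln2 = 6.93...&lt;br /&gt;
 print(false_positive(c, best_k))  # about 0.0082&lt;br /&gt;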
&lt;br /&gt;
The Bloom filter thus solves the approximate membership problem, with a small constant probability of false positives, using a data structure of &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; bits that answers each query in &amp;lt;math&amp;gt;O(1)&amp;lt;/math&amp;gt; time.&lt;br /&gt;
&lt;br /&gt;
=Distinct Elements=&lt;br /&gt;
Consider the following problem of &#039;&#039;&#039;counting distinct elements&#039;&#039;&#039;: Suppose that &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is a sufficiently large universe.&lt;br /&gt;
*&#039;&#039;&#039;Input:&#039;&#039;&#039; a sequence of (not necessarily distinct) elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output:&#039;&#039;&#039; an estimation of the total number of distinct elements &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A straightforward way of solving this problem is to maintain a dictionary data structure, which costs at least linear (&amp;lt;math&amp;gt;\Omega(n)&amp;lt;/math&amp;gt;) space. For &#039;&#039;big data&#039;&#039;, where &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is very large, this is too expensive. Unfortunately, by an information-theoretical argument, linear space is necessary if one wants to compute the &#039;&#039;exact&#039;&#039; value of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Our goal is to relax the problem a little bit to significantly reduce the space cost by tolerating &#039;&#039;approximate&#039;&#039; answers. The form of approximation we consider is &#039;&#039;&#039;&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator&#039;&#039;&#039;.&lt;br /&gt;
{{Theorem|&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator|&lt;br /&gt;
: A random variable &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is an &#039;&#039;&#039;&amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator&#039;&#039;&#039; of a quantity &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; if&lt;br /&gt;
::&amp;lt;math&amp;gt;\Pr[\,(1-\epsilon)z\le \widehat{Z}\le (1+\epsilon)z\,]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
: &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is said to be an &#039;&#039;&#039;unbiased estimator&#039;&#039;&#039; of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\mathbb{E}[\widehat{Z}]=z&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
Usually &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; is called &#039;&#039;&#039;approximation error&#039;&#039;&#039; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt; is called &#039;&#039;&#039;confidence error&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
We now present an elegant algorithm. The algorithm can be implemented in the [https://en.wikipedia.org/wiki/Streaming_algorithm &#039;&#039;&#039;data stream model&#039;&#039;&#039;]: the input elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; are presented to the algorithm one at a time, where the size of the data &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is unknown to the algorithm. The algorithm maintains a value &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; which is an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the total number of distinct elements &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt;, using only a small amount of memory space to memorize (with loss) the data set &amp;lt;math&amp;gt;\{x_1,x_2,\ldots,x_n\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A famous quotation of Flajolet describes the performance of this algorithm as:&lt;br /&gt;
&lt;br /&gt;
 &amp;quot;Using only memory equivalent to 5 lines of printed text, you can estimate with a typical accuracy of 5% and in a single pass the total vocabulary of Shakespeare.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
== The &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch ==&lt;br /&gt;
Suppose that we have access to an idealized random hash function &amp;lt;math&amp;gt;h:U\to[0,1]&amp;lt;/math&amp;gt; which is uniformly distributed over all mappings from the universe &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Recall that the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; consists of &amp;lt;math&amp;gt;z=|\{x_1,x_2,\ldots,x_n\}|&amp;lt;/math&amp;gt; distinct elements. These elements are mapped by the random function &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; hash values uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. We could maintain these hash values instead of the original elements, but this would still be too expensive because in the worst case we still have up to &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; distinct values to maintain. However, due to the idealized random hash function, the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt; will be partitioned into &amp;lt;math&amp;gt;z+1&amp;lt;/math&amp;gt; subintervals by these &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; uniform and independent hash values. The typical length of the subinterval gives an estimation of the number &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Proposition|&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\min_{1\le i\le n}h(x_i)\right]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
The input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; consists of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; distinct elements, which are mapped to &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; random hash values uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. These &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; hash values partition the unit interval &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt; into &amp;lt;math&amp;gt;z+1&amp;lt;/math&amp;gt; subintervals &amp;lt;math&amp;gt;[0,v_1],[v_1,v_2],[v_2,v_3],\ldots,[v_{z-1},v_z],[v_z,1]&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;v_i&amp;lt;/math&amp;gt; denotes the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th smallest value among all hash values &amp;lt;math&amp;gt;\{h(x_1),h(x_2),\ldots,h(x_n)\}&amp;lt;/math&amp;gt;. Clearly we have &lt;br /&gt;
:&amp;lt;math&amp;gt;v_1=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt;. &lt;br /&gt;
Meanwhile, since all hash values are uniformly and independently distributed in &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;, the lengths of all subintervals &amp;lt;math&amp;gt;v_1, v_2-v_1, v_3-v_2,\ldots, v_z-v_{z-1}, 1-v_z&amp;lt;/math&amp;gt; are identically distributed. By symmetry, they have the same expectation, therefore&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
(z+1)\mathbb{E}[v_1]=&lt;br /&gt;
\mathbb{E}[v_1]+\sum_{i=1}^{z-1}\mathbb{E}[v_{i+1}-v_i]+\mathbb{E}[1-v_z]&lt;br /&gt;
=\mathbb{E}\left[v_1+(v_2-v_1)+(v_3-v_2)+\cdots+(v_{z}-v_{z-1})+1-v_z\right]&lt;br /&gt;
=1,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
which implies that&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\min_{1\le i\le n}h(x_i)\right]=\mathbb{E}[v_1]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
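&lt;br /&gt;
The proposition is also easy to check empirically. The following Monte Carlo experiment in Python (ours) samples the minimum of &amp;lt;math&amp;gt;z=9&amp;lt;/math&amp;gt; uniform hash values many times and observes an average close to &amp;lt;math&amp;gt;\frac{1}{z+1}=0.1&amp;lt;/math&amp;gt;.&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 z, trials = 9, 100000&lt;br /&gt;
 rng = random.Random(1)&lt;br /&gt;
 total = 0.0&lt;br /&gt;
 for _ in range(trials):&lt;br /&gt;
     total += min(rng.random() for _ in range(z))&lt;br /&gt;
 print(total / trials)   # close to 1/(z+1) = 0.1&lt;br /&gt;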
&lt;br /&gt;
The quantity &amp;lt;math&amp;gt;\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; can be computed with a small space cost (for storing the current smallest hash value) by scanning the input sequence in a single pass. Since, as we proved, its expectation is &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;, the smallest hash value &amp;lt;math&amp;gt;Y=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; gives an unbiased estimator for &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;. However, &amp;lt;math&amp;gt;\frac{1}{Y}-1&amp;lt;/math&amp;gt; is not necessarily a good estimator for &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;. Actually, it is a rather poor one. Consider, for example, the case &amp;lt;math&amp;gt;z=1&amp;lt;/math&amp;gt;, where all input elements are the same. Then there is only one hash value and &amp;lt;math&amp;gt;Y=\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; is distributed uniformly over &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{1}{Y}-1&amp;lt;/math&amp;gt; fails to be close enough to the correct answer 1 with high probability.&lt;br /&gt;
&lt;br /&gt;
==Apply the mean trick to the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch==&lt;br /&gt;
The reason the above single-hash-function estimator performs poorly is that the unbiased estimator &amp;lt;math&amp;gt;\min_{1\le i\le n}h(x_i)&amp;lt;/math&amp;gt; has a large variance. A natural way to reduce this variance is to use multiple independent hash functions and take the average. This generic approach to variance reduction is called &#039;&#039;&#039;the mean trick&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
Suppose that we have access to &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; independent random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt;, where each &amp;lt;math&amp;gt;h_j: U\to[0,1]&amp;lt;/math&amp;gt; is uniformly and independently distributed over all functions mapping &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. Here &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is a parameter to be fixed by the desired approximation error &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and confidence error &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. The &#039;&#039;&amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm&#039;&#039; (using the mean trick) is given by the following pseudocode.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|The &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch|&lt;br /&gt;
:Suppose that &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[0,1]&amp;lt;/math&amp;gt; are &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; uniform and independent random hash functions, where &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is a parameter to be fixed later.&lt;br /&gt;
-----&lt;br /&gt;
:Scan the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt; in a single pass to compute:&lt;br /&gt;
::* &amp;lt;math&amp;gt;Y_j=\min_{1\le i\le n}h_j(x_i)&amp;lt;/math&amp;gt; for every &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;;&lt;br /&gt;
::* average value &amp;lt;math&amp;gt;\overline{Y}=\frac{1}{k}\sum_{j=1}^kY_j&amp;lt;/math&amp;gt;;&lt;br /&gt;
:return &amp;lt;math&amp;gt;\widehat{Z}=\frac{1}{\overline{Y}}-1&amp;lt;/math&amp;gt; as the estimator.&lt;br /&gt;
}}&lt;br /&gt;
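&lt;br /&gt;
A direct Python rendering of this pseudocode is given below (ours; each idealized hash function into &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt; is simulated by seeding a pseudorandom generator per element, which only stands in for the UHA).&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 def min_sketch(stream, k, seed=0):&lt;br /&gt;
     # One-pass min-sketch with the mean trick, storing k hash values.&lt;br /&gt;
     rng = random.Random(seed)&lt;br /&gt;
     salts = [rng.getrandbits(64) for _ in range(k)]&lt;br /&gt;
     mins = [1.0] * k                 # Y_j, the running minima&lt;br /&gt;
     for x in stream:&lt;br /&gt;
         for j, s in enumerate(salts):&lt;br /&gt;
             h = random.Random(hash((s, x))).random()   # h_j(x) in [0,1]&lt;br /&gt;
             mins[j] = min(mins[j], h)&lt;br /&gt;
     avg = sum(mins) / k              # the average value of the Y_j&lt;br /&gt;
     return 1.0 / avg - 1.0           # the estimator Z = 1/avg - 1&lt;br /&gt;
For instance, min_sketch([i % 1000 for i in range(10000)], k=1600) estimates the number of distinct elements (here &amp;lt;math&amp;gt;z=1000&amp;lt;/math&amp;gt;) of a stream; the choice &amp;lt;math&amp;gt;k=1600&amp;lt;/math&amp;gt; corresponds to &amp;lt;math&amp;gt;\epsilon=0.1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta=0.25&amp;lt;/math&amp;gt; in the theorem below.&lt;br /&gt;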
&lt;br /&gt;
The algorithm is easy to implement in the data stream model, with a space cost of storing &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; hash values. The following theorem guarantees that the algorithm returns an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the total number of distinct elements for a suitable &amp;lt;math&amp;gt;k=O\left(\frac{1}{\epsilon^2\delta}\right)&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Theorem|&lt;br /&gt;
:For any &amp;lt;math&amp;gt;\epsilon,\delta&amp;lt;1/2&amp;lt;/math&amp;gt;, if &amp;lt;math&amp;gt;k\ge\left\lceil\frac{4}{\epsilon^2\delta}\right\rceil&amp;lt;/math&amp;gt; then the output &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; always gives an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of the correct answer &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
In the following, we prove this main theorem for the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm. &lt;br /&gt;
&lt;br /&gt;
An obstacle to analyzing the estimator &amp;lt;math&amp;gt;\widehat{Z}=\frac{1}{\overline{Y}}-1&amp;lt;/math&amp;gt; is that it is a nonlinear function of &amp;lt;math&amp;gt;\overline{Y}&amp;lt;/math&amp;gt;, which itself is much easier to analyze. Nevertheless, we observe that &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; as long as &amp;lt;math&amp;gt;\overline{Y}&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(\epsilon/2,\delta)&amp;lt;/math&amp;gt;-estimator of &amp;lt;math&amp;gt;\frac{1}{z+1}&amp;lt;/math&amp;gt;. This can be deduced by just verifying the following:&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{1-\epsilon/2}{z+1}\le \overline{Y}\le \frac{1+\epsilon/2}{z+1} \implies (1-\epsilon)z\le\frac{1}{\overline{Y}}-1\le (1+\epsilon)z&amp;lt;/math&amp;gt;,&lt;br /&gt;
for &amp;lt;math&amp;gt;\epsilon&amp;lt;\frac{1}{2}&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,(1-\epsilon)z\le \widehat{Z} \le (1+\epsilon)z\,\right]\ge \Pr\left[\,\frac{1-\epsilon/2}{z+1}\le \overline{Y}\le \frac{1+\epsilon/2}{z+1}\,\right]&lt;br /&gt;
=\Pr\left[\,\left|\overline{Y}-\frac{1}{z+1}\right|\le \frac{\epsilon/2}{z+1}\,\right]&amp;lt;/math&amp;gt;.&lt;br /&gt;
It is then sufficient to show that &amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\frac{1}{z+1}\right|\le \frac{\epsilon/2}{z+1}\,\right]\ge 1-\delta&amp;lt;/math&amp;gt; in order to prove the main theorem above. Since &amp;lt;math&amp;gt;\mathbb{E}\left[\overline{Y}\right]=\frac{1}{z+1}&amp;lt;/math&amp;gt; (as shown in the lemma below), this is equivalent to showing the concentration inequality &lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\mathbb{E}\left[\overline{Y}\right]\right|\le \frac{\epsilon/2}{z+1}\,\right]\ge 1-\delta\quad\qquad({\color{red}*})&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{{Theorem|Lemma|&lt;br /&gt;
:The following hold for each &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;\overline{Y}=\frac{1}{k}\sum_{j=1}^kY_j&amp;lt;/math&amp;gt;:&lt;br /&gt;
:*&amp;lt;math&amp;gt;\mathbb{E}\left[\overline{Y}\right]=\mathbb{E}\left[Y_j\right]=\frac{1}{z+1}&amp;lt;/math&amp;gt;;&lt;br /&gt;
:*&amp;lt;math&amp;gt;\mathbf{Var}\left[Y_j\right]\le\frac{1}{(z+1)^2}&amp;lt;/math&amp;gt;, and consequently &amp;lt;math&amp;gt;\mathbf{Var}\left[\overline{Y}\right]\le\frac{1}{k(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
As in the case of a single hash function, by symmetry it holds that &amp;lt;math&amp;gt;\mathbb{E}[Y_j]=\frac{1}{z+1}&amp;lt;/math&amp;gt; for every &amp;lt;math&amp;gt;j=1,2,\ldots,k&amp;lt;/math&amp;gt;. Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[\overline{Y}\right]=\frac{1}{k}\sum_{j=1}^k\mathbb{E}[Y_j]=\frac{1}{z+1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Recall that each &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt; is the minimum of &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; random hash values uniformly and independently distributed over &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;. By geometric probability, it holds that for any &amp;lt;math&amp;gt;y\in[0,1]&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[Y_j&amp;gt;y]=(1-y)^z&amp;lt;/math&amp;gt;,&lt;br /&gt;
which means &amp;lt;math&amp;gt;\Pr[Y_j\le y]=1-(1-y)^z&amp;lt;/math&amp;gt;. Taking the derivative with respect to &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;, we obtain the probability density function of random variable &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;, which is &amp;lt;math&amp;gt;z(1-y)^{z-1}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
We then compute the second moment.&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[Y_j^2]=\int^{1}_0y^2z(1-y)^{z-1}\,\mathrm{d}y=\frac{2}{(z+1)(z+2)}&amp;lt;/math&amp;gt;.&lt;br /&gt;
The variance is bounded as&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{Var}\left[Y_j\right]=\mathbb{E}\left[Y_j^2\right]-\mathbb{E}\left[Y_j\right]^2=\frac{2}{(z+1)(z+2)}-\frac{1}{(z+1)^2}\le\frac{1}{(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to the (pairwise) independence between &amp;lt;math&amp;gt;Y_j&amp;lt;/math&amp;gt;&#039;s,&lt;br /&gt;
::&amp;lt;math&amp;gt;\mathbf{Var}\left[\overline{Y}\right]=\mathbf{Var}\left[\frac{1}{k}\sum_{j=1}^kY_j\right]=\frac{1}{k^2}\sum_{j=1}^k\mathbf{Var}\left[Y_j\right]\le \frac{1}{k(z+1)^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
We now return to proving the inequality &amp;lt;math&amp;gt;({\color{red}*})&amp;lt;/math&amp;gt;. By [[高级算法_(Fall 2023)/Basic_deviation_inequalities#Chebyshev.27s_inequality|Chebyshev&#039;s inequality]], it holds that &lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\overline{Y}-\mathbb{E}\left[\overline{Y}\right]\right|&amp;gt; \frac{\epsilon/2}{z+1}\,\right]&lt;br /&gt;
\le\frac{4}{\epsilon^2}(z+1)^2\mathbf{Var}\left[\overline{Y}\right]&lt;br /&gt;
\le\frac{4}{\epsilon^2k}&amp;lt;/math&amp;gt;.&lt;br /&gt;
When &amp;lt;math&amp;gt;k\ge\left\lceil\frac{4}{\epsilon^2\delta}\right\rceil&amp;lt;/math&amp;gt;, this probability is at most &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;, which proves the inequality &amp;lt;math&amp;gt;({\color{red}*})&amp;lt;/math&amp;gt;. As discussed above, this proves the main theorem for the &amp;lt;math&amp;gt;\min&amp;lt;/math&amp;gt;-sketch algorithm improved by the mean trick.&lt;br /&gt;
&lt;br /&gt;
= Frequency Estimation=&lt;br /&gt;
Suppose that &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; is the data universe. The &#039;&#039;&#039;frequency estimation&#039;&#039;&#039; problem is defined as follows.&lt;br /&gt;
*&#039;&#039;&#039;Data:&#039;&#039;&#039; a sequence of (not necessarily distinct) elements &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Query:&#039;&#039;&#039; an element &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;;&lt;br /&gt;
*&#039;&#039;&#039;Output:&#039;&#039;&#039; an estimation &amp;lt;math&amp;gt;\hat{f}_x&amp;lt;/math&amp;gt; of the frequency &amp;lt;math&amp;gt;f_x\triangleq|\{i\mid x_i=x\}|&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in input data.&lt;br /&gt;
&lt;br /&gt;
We still want to give an algorithm in the data stream model: the algorithm scans the input sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt; to construct a succinct data structure, such that upon each query of &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the algorithm returns an estimation of the frequency &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Clearly this problem can always be solved by storing all distinct elements that have appeared, along with their frequencies. However, the space cost of this straightforward solution is rather high. Instead, we want to use a lossy representation (a &#039;&#039;sketch&#039;&#039;) of the input data which uses significantly less space but can still answer queries with tolerable accuracy. &lt;br /&gt;
&lt;br /&gt;
Formally, upon each query of &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the algorithm should return an answer &amp;lt;math&amp;gt;\hat{f}_x&amp;lt;/math&amp;gt; satisfying:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x-f_x\right|\le \epsilon n\,\right]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
Note that this notion of approximation is with bounded &#039;&#039;additive&#039;&#039; error which is weaker than the notion of &amp;lt;math&amp;gt;(\epsilon,\delta)&amp;lt;/math&amp;gt;-estimator, whose error bound is &#039;&#039;multiplicative&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
With such a weak accuracy guarantee, it is possible to give a succinct data structure whose size is determined only by the error bounds &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt; but is independent of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, because only the frequencies of the &#039;&#039;&#039;heavy hitters&#039;&#039;&#039; (elements &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with high frequencies &amp;lt;math&amp;gt;f_x&amp;gt;\epsilon n&amp;lt;/math&amp;gt;) need to be memorized, and there are at most &amp;lt;math&amp;gt;1/\epsilon&amp;lt;/math&amp;gt; such heavy hitters.&lt;br /&gt;
&lt;br /&gt;
== Count-min sketch==&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Count–min_sketch count-min sketch] given by Cormode and Muthukrishnan is an elegant data structure for frequency estimation.&lt;br /&gt;
&lt;br /&gt;
The data structure is a two-dimensional &amp;lt;math&amp;gt;k\times m&amp;lt;/math&amp;gt; integer array, where &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; are two parameters to be determined by the error bounds &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;. We still adopt the Uniform Hash Assumption to assume that we have access to &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; mutually independent uniform random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[m]&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|&#039;&#039;Count-min sketch&#039;&#039; (Cormode and Muthukrishnan 2003)|&lt;br /&gt;
:Suppose &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k: U\to[m]&amp;lt;/math&amp;gt; are uniform and independent random hash functions.&lt;br /&gt;
-----&lt;br /&gt;
:&#039;&#039;&#039;Data structure construction:&#039;&#039;&#039; Given a sequence &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n\in U&amp;lt;/math&amp;gt;, the data structure is a two-dimensional &amp;lt;math&amp;gt;k\times m&amp;lt;/math&amp;gt; integer array &amp;lt;math&amp;gt;CMS[k][m]&amp;lt;/math&amp;gt; constructed as&lt;br /&gt;
:*initialize all entries of &amp;lt;math&amp;gt;CMS[k][m]&amp;lt;/math&amp;gt; to 0;&lt;br /&gt;
:*for &amp;lt;math&amp;gt;i=1,2,\ldots,n&amp;lt;/math&amp;gt;, upon receiving &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;:&lt;br /&gt;
::: for every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, evaluate &amp;lt;math&amp;gt;h_j(x_i)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;CMS[j][h_j(x_i)]++&amp;lt;/math&amp;gt;.&lt;br /&gt;
----&lt;br /&gt;
:&#039;&#039;&#039;Query resolution:&#039;&#039;&#039; Upon each query of an arbitrary &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;,&lt;br /&gt;
:* return &amp;lt;math&amp;gt;\hat{f}_x=\min_{1\le j\le k}CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
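&lt;br /&gt;
In code, the sketch is only a few lines. The following Python version (ours, with the usual salted-hash simulation of the UHA) implements both the update and the query:&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 class CountMinSketch:&lt;br /&gt;
     def __init__(self, k, m, seed=0):&lt;br /&gt;
         self.m = m&lt;br /&gt;
         self.table = [[0] * m for _ in range(k)]   # k x m counters&lt;br /&gt;
         rng = random.Random(seed)&lt;br /&gt;
         self.salts = [rng.getrandbits(64) for _ in range(k)]&lt;br /&gt;
 &lt;br /&gt;
     def add(self, x):                # process one stream element&lt;br /&gt;
         for row, s in zip(self.table, self.salts):&lt;br /&gt;
             row[hash((s, x)) % self.m] += 1&lt;br /&gt;
 &lt;br /&gt;
     def estimate(self, x):           # the minimum over the k rows&lt;br /&gt;
         return min(row[hash((s, x)) % self.m]&lt;br /&gt;
                    for row, s in zip(self.table, self.salts))&lt;br /&gt;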
&lt;br /&gt;
It is easy to see that the space cost of the count-min sketch is &amp;lt;math&amp;gt;O(km)&amp;lt;/math&amp;gt; memory words, or &amp;lt;math&amp;gt;O(km\log n)&amp;lt;/math&amp;gt; bits. Each query is answered within time cost &amp;lt;math&amp;gt;O(k)&amp;lt;/math&amp;gt;, assuming that an evaluation of a hash function can be done in unit or constant time. We then analyze the error bounds.&lt;br /&gt;
&lt;br /&gt;
First, it is easy to observe that for any query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every hash function &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, the corresponding entry in the count-min sketch always satisfies&lt;br /&gt;
:&amp;lt;math&amp;gt;CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt;,&lt;br /&gt;
because the appearances of element &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in the input sequence contribute at least &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt; to the value of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Therefore, for any query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt;, the answer always satisfies &amp;lt;math&amp;gt;\hat{f}_x=\min_{1\le j\le k}CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt;, which means&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\ge\epsilon n\,\right]=\Pr\left[\,\hat{f}_x- f_x\ge\epsilon n\,\right]=\prod_{j=1}^k\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,],\quad\qquad({\color{red}\diamondsuit})&amp;lt;/math&amp;gt;&lt;br /&gt;
where the second equation is due to the mutual independence of random hash functions &amp;lt;math&amp;gt;h_1,h_2,\ldots,h_k&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
It remains to upper bound the probability &amp;lt;math&amp;gt;\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,]&amp;lt;/math&amp;gt;, which can be done by calculating the expectation of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
{{Theorem|Proposition|&lt;br /&gt;
:For any &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;, it holds that &amp;lt;math&amp;gt;\mathbb{E}\left[CMS[j][h_j(x)]\right]\le f_x+\frac{n}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
{{Proof|&lt;br /&gt;
The value of &amp;lt;math&amp;gt;CMS[j][h_j(x)]&amp;lt;/math&amp;gt; is constituted by the frequency &amp;lt;math&amp;gt;f_x&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and the frequencies &amp;lt;math&amp;gt;f_y&amp;lt;/math&amp;gt; of all other elements &amp;lt;math&amp;gt;y\neq x&amp;lt;/math&amp;gt; among &amp;lt;math&amp;gt;x_1,x_2,\ldots,x_n&amp;lt;/math&amp;gt;, thus&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
CMS[j][h_j(x)]&lt;br /&gt;
&amp;amp;=f_x+\sum_{\scriptstyle y\in\{x_1,\ldots,x_n\}\setminus\{x\}\atop\scriptstyle h_j(y)=h_j(x)} f_y\\&lt;br /&gt;
&amp;amp;=f_x+\sum_{y\in\{x_1,\ldots,x_n\}\setminus\{x\}} f_y \cdot I[h_j(y)=h_j(x)]&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;I[h_j(y)=h_j(x)]&amp;lt;/math&amp;gt; denotes the Boolean random variable that indicates the occurrence of event &amp;lt;math&amp;gt;h_j(y)=h_j(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
By linearity of expectation,&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}[CMS[j][h_j(x)]]=f_x+\sum_{y\in\{x_1,x_2,\ldots,x_n\}\setminus\{x\}} f_y \cdot \Pr[h_j(y)=h_j(x)]&amp;lt;/math&amp;gt;.&lt;br /&gt;
Due to Uniform Hash Assumption (UHA), &amp;lt;math&amp;gt;h_j: U\to[m]&amp;lt;/math&amp;gt; is a uniform random function. For any &amp;lt;math&amp;gt;y\neq x&amp;lt;/math&amp;gt;, the probability of hash collision is&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[h_j(y)=h_j(x)]=\frac{1}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Therefore,&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\mathbb{E}[CMS[j][h_j(x)]]&lt;br /&gt;
&amp;amp;=f_x+\frac{1}{m}\sum_{y\in\{x_1,\ldots,x_n\}\setminus\{x\}} f_y \\&lt;br /&gt;
&amp;amp;\le f_x+\frac{1}{m}\sum_{y\in\{x_1,\ldots,x_n\}} f_y\\&lt;br /&gt;
&amp;amp;=f_x+\frac{n}{m},&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where the last equation is due to the obvious identity &amp;lt;math&amp;gt;\sum_{y\in\{x_1,\ldots,x_n\}}f_y=n&amp;lt;/math&amp;gt;.&lt;br /&gt;
}}&lt;br /&gt;
The above proposition shows that for any &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; and every &amp;lt;math&amp;gt;1\le j\le k&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbb{E}\left[CMS[j][h_j(x)]-f_x\right]\le \frac{n}{m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Recall that &amp;lt;math&amp;gt;CMS[j][h_j(x)]\ge f_x&amp;lt;/math&amp;gt; always holds, thus &amp;lt;math&amp;gt;CMS[j][h_j(x)]-f_x&amp;lt;/math&amp;gt; is a nonnegative random variable. By Markov&#039;s inequality, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,]\le \frac{1}{\epsilon m}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Combining this with the above equation &amp;lt;math&amp;gt;({\color{red}\diamondsuit})&amp;lt;/math&amp;gt;, we have&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\ge\epsilon n\,\right]=(\Pr[\,CMS[j][h_j(x)]-f_x\ge\epsilon n\,])^k\le \frac{1}{(\epsilon m)^k}&amp;lt;/math&amp;gt;.&lt;br /&gt;
By setting &amp;lt;math&amp;gt;m=\left\lceil\frac{\mathrm{e}}{\epsilon}\right\rceil&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k=\left\lceil\ln\frac{1}{\delta}\right\rceil&amp;lt;/math&amp;gt;, the above error probability is bounded as &amp;lt;math&amp;gt;\frac{1}{(\epsilon m)^k}\le\delta&amp;lt;/math&amp;gt;. &lt;br /&gt;
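For example, with &amp;lt;math&amp;gt;\epsilon=0.001&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta=0.01&amp;lt;/math&amp;gt;, it suffices to take &amp;lt;math&amp;gt;m=\lceil\mathrm{e}/\epsilon\rceil=2719&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;k=\lceil\ln(1/\delta)\rceil=5&amp;lt;/math&amp;gt;, a &amp;lt;math&amp;gt;5\times 2719&amp;lt;/math&amp;gt; table of counters, since then &amp;lt;math&amp;gt;\frac{1}{(\epsilon m)^k}\le e^{-5}\approx 0.0067\le\delta&amp;lt;/math&amp;gt;.&lt;br /&gt;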
&lt;br /&gt;
For any positive &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;, the count-min sketch gives a data structure of size &amp;lt;math&amp;gt;O(km)=O\left(\frac{1}{\epsilon}\log\frac{1}{\delta}\right)&amp;lt;/math&amp;gt; (in memory words), answering each query &amp;lt;math&amp;gt;x\in U&amp;lt;/math&amp;gt; in time &amp;lt;math&amp;gt;O(k)=O\left(\log\frac{1}{\delta}\right)&amp;lt;/math&amp;gt; with the following accuracy guarantee:&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[\,\left|\hat{f}_x- f_x\right|\le\epsilon n\,\right]\ge 1-\delta&amp;lt;/math&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)&amp;diff=12634</id>
		<title>高级算法 (Fall 2024)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)&amp;diff=12634"/>
		<updated>2024-10-02T02:57:07Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;高级算法 &lt;br /&gt;
&amp;lt;br&amp;gt;Advanced Algorithms&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = &#039;&#039;&#039;尹一通&#039;&#039;&#039;&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = yinyt@nju.edu.cn &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= office&lt;br /&gt;
|data4= 计算机系 804&lt;br /&gt;
|header5 = &lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &#039;&#039;&#039;栗师&#039;&#039;&#039;&lt;br /&gt;
|header6 = &lt;br /&gt;
|label6  = Email&lt;br /&gt;
|data6   = shili@nju.edu.cn &lt;br /&gt;
|header7 =&lt;br /&gt;
|label7= office&lt;br /&gt;
|data7= 计算机系 605&lt;br /&gt;
|header8 = &lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &#039;&#039;&#039;刘景铖&#039;&#039;&#039;&lt;br /&gt;
|header9 = &lt;br /&gt;
|label9  = Email&lt;br /&gt;
|data9   = liu@nju.edu.cn &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10= office&lt;br /&gt;
|data10= 计算机系 516&lt;br /&gt;
|header11 = Class&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   = &lt;br /&gt;
|header12 =&lt;br /&gt;
|label12  = Class meetings&lt;br /&gt;
|data12   = Monday (odd weeks), 4pm-6pm &amp;lt;br&amp;gt; Thursday, 2pm-4pm &amp;lt;br&amp;gt;仙Ⅰ-206&lt;br /&gt;
|header13 =&lt;br /&gt;
|label13  = Place&lt;br /&gt;
|data13   = &lt;br /&gt;
|header14 =&lt;br /&gt;
|label14  = Office hours&lt;br /&gt;
|data14   = Monday, 2pm-4pm, &amp;lt;br&amp;gt;计算机系 804&amp;lt;br&amp;gt;&lt;br /&gt;
|header15 = Textbooks&lt;br /&gt;
|label15  = &lt;br /&gt;
|data15   = &lt;br /&gt;
|header16 =&lt;br /&gt;
|label16  = &lt;br /&gt;
|data16   = [[File:MR-randomized-algorithms.png|border|100px]]&lt;br /&gt;
|header17 =&lt;br /&gt;
|label17  = &lt;br /&gt;
|data17   = Motwani and Raghavan. &amp;lt;br&amp;gt;&#039;&#039;Randomized Algorithms&#039;&#039;.&amp;lt;br&amp;gt; Cambridge Univ Press, 1995.&lt;br /&gt;
|header18 =&lt;br /&gt;
|label18  = &lt;br /&gt;
|data18   = [[File:Approximation_Algorithms.jpg|border|100px]]&lt;br /&gt;
|header19 =&lt;br /&gt;
|label19  = &lt;br /&gt;
|data19   =  Vazirani. &amp;lt;br&amp;gt;&#039;&#039;Approximation Algorithms&#039;&#039;. &amp;lt;br&amp;gt; Springer-Verlag, 2001.&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
This is the webpage for the &#039;&#039;Advanced Algorithms&#039;&#039; class of fall 2024. Students who take this class should check this page periodically for content updates and new announcements. &lt;br /&gt;
&lt;br /&gt;
= Announcement =&lt;br /&gt;
&lt;br /&gt;
* TBA&lt;br /&gt;
&lt;br /&gt;
= Course info =&lt;br /&gt;
* &#039;&#039;&#039;Instructor &#039;&#039;&#039;: &lt;br /&gt;
:* [http://tcs.nju.edu.cn/yinyt/ 尹一通]：[mailto:yinyt@nju.edu.cn &amp;lt;yinyt@nju.edu.cn&amp;gt;]，计算机系 804 &lt;br /&gt;
:*[https://tcs.nju.edu.cn/shili/ 栗师]：[mailto:shili@nju.edu.cn &amp;lt;shili@nju.edu.cn&amp;gt;]，计算机系 605&lt;br /&gt;
:* [https://liuexp.github.io 刘景铖]：[mailto:liu@nju.edu.cn &amp;lt;liu@nju.edu.cn&amp;gt;]，计算机系 516 &lt;br /&gt;
* &#039;&#039;&#039;Teaching Assistant&#039;&#039;&#039;: &lt;br /&gt;
** 于逸潇：yixiaoyu@smail.nju.edu.cn&lt;br /&gt;
** 张弈垚：zhangyiyao@smail.nju.edu.cn&lt;br /&gt;
* &#039;&#039;&#039;Class meeting&#039;&#039;&#039;: &lt;br /&gt;
** Monday (odd weeks), 4pm-6pm, 仙Ⅰ-206&lt;br /&gt;
** Thursday, 2pm-4pm, 仙Ⅰ-206&lt;br /&gt;
* &#039;&#039;&#039;Office hour&#039;&#039;&#039;: Monday, 2pm-4pm, 计算机系 804&lt;br /&gt;
* &#039;&#039;&#039;QQ group&#039;&#039;&#039;: 757436140&lt;br /&gt;
&lt;br /&gt;
= Syllabus =&lt;br /&gt;
As the theory of computer algorithms develops, the design and analysis of modern algorithms make heavy use of non-elementary mathematical tools and non-traditional algorithmic ideas. The course &#039;&#039;Advanced Algorithms&#039;&#039; is set up in response to this trend: it systematically teaches advanced algorithm-design ideas and analysis tools that are not systematically covered in traditional algorithms courses, yet play important roles in research and practice across the fields of computer science.&lt;br /&gt;
&lt;br /&gt;
=== Prerequisites ===&lt;br /&gt;
* Required: discrete mathematics, probability theory, linear algebra.&lt;br /&gt;
* Recommended: design and analysis of algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Course materials ===&lt;br /&gt;
* [[高级算法 (Fall 2024) / Course materials|&amp;lt;font size=3&amp;gt;Textbooks and references&amp;lt;/font&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
=== Grades ===&lt;br /&gt;
* Course grade: there will be several problem sets and a final exam. The final grade will be a combination of the homework scores and the final-exam score.&lt;br /&gt;
* Late submissions: if, for special reasons, you cannot finish an assignment on time, contact the instructor in advance with a valid justification; otherwise, late assignments will not be accepted.&lt;br /&gt;
&lt;br /&gt;
=== &amp;lt;font color=red&amp;gt; Academic Integrity &amp;lt;/font&amp;gt;===&lt;br /&gt;
Academic integrity is the most basic professional and ethical baseline for all students and scholars engaged in academic work. This course will spare no effort to uphold the norms of academic integrity, and violations of this baseline will not be tolerated.&lt;br /&gt;
&lt;br /&gt;
The principle for completing assignments: work bearing your name must be your own contribution. Discussion is allowed while working on assignments, provided that all participants in the discussion are at a comparable stage of completion. However, the execution of key ideas and the writing of the submitted text must be done independently, and everyone who took part in the discussion must be acknowledged in the assignment. No other form of collaboration is allowed, in particular not &amp;quot;discussing&amp;quot; with classmates who have already finished the assignment.&lt;br /&gt;
&lt;br /&gt;
This course takes a zero-tolerance attitude toward plagiarism. When completing assignments, direct textual copying from others&#039; work (publications, Internet materials, other students&#039; assignments, etc.), as well as copying of key ideas and key elements, counts as plagiarism under the [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism]. Plagiarists will have their grades voided. If mutual copying is discovered, &amp;lt;font color=red&amp;gt; the grades of both the copier and the copied will be voided&amp;lt;/font&amp;gt;. Please therefore actively prevent your own assignments from being copied by others.&lt;br /&gt;
&lt;br /&gt;
Academic integrity concerns not only a student&#039;s personal character but also the proper functioning of the entire educational system. Committing academic misconduct for a few points not only turns you into a cheater, but also devalues the honest efforts of others. Let us work together to maintain an environment of integrity.&lt;br /&gt;
&lt;br /&gt;
= Assignments =&lt;br /&gt;
Late policy: in general, we will accommodate late submission requests ONLY IF you made such requests ahead of time. &lt;br /&gt;
&lt;br /&gt;
*[[高级算法 (Fall 2024)/Problem Set 1|Problem Set 1]]: please submit it to [mailto:njuadvalg24@163.com njuadvalg24@163.com] before class on 2024/10/14 (16:00 UTC+8), with the file named &#039;&amp;lt;font color=red &amp;gt;学号_姓名_A1.pdf&amp;lt;/font&amp;gt;&#039; (i.e. StudentID_Name_A1.pdf).&lt;br /&gt;
&lt;br /&gt;
= Lecture Notes =&lt;br /&gt;
# [[高级算法 (Fall 2024)/Min Cut, Max Cut, and Spectral Cut|Min Cut, Max Cut, and Spectral Cut]] ([http://tcs.nju.edu.cn/slides/aa2024/Cut.pdf slides])&lt;br /&gt;
#*  [[高级算法 (Fall 2024)/Probability Basics|Probability basics]]&lt;br /&gt;
#  [[高级算法 (Fall 2024)/Fingerprinting| Fingerprinting]] ([http://tcs.nju.edu.cn/slides/aa2024/Fingerprinting.pdf slides]) &lt;br /&gt;
#*  [[高级算法 (Fall 2024)/Finite Field Basics|Finite field basics]]&lt;br /&gt;
#  [[高级算法 (Fall 2024)/Hashing and Sketching|Hashing and Sketching]] ([http://tcs.nju.edu.cn/slides/aa2024/Hashing.pdf slides])   &lt;br /&gt;
#*  [[高级算法 (Fall 2024)/Limited independence|Limited independence]]&lt;br /&gt;
#*  [[高级算法 (Fall 2024)/Basic deviation inequalities|Basic deviation inequalities]]&lt;br /&gt;
# [[高级算法 (Fall 2024)/Concentration of measure|Concentration of measure]] ([http://tcs.nju.edu.cn/slides/aa2024/Concentration.pdf slides])&lt;br /&gt;
#*  [[高级算法 (Fall 2024)/Conditional expectations|Conditional expectations]]&lt;br /&gt;
# [[高级算法 (Fall 2024)/Dimension Reduction|Dimension Reduction]] ([http://tcs.nju.edu.cn/slides/aa2023/NNS.pdf slides]) &lt;br /&gt;
#* [https://www.cs.princeton.edu/~hy2/teaching/fall22-cos521/notes/JL.pdf Professor Huacheng Yu&#039;s note on Johnson-Lindenstrauss Theorem]&lt;br /&gt;
#* [http://people.csail.mit.edu/gregory/annbook/introduction.pdf An introduction of LSH]&lt;br /&gt;
&lt;br /&gt;
= Related Online Courses=&lt;br /&gt;
* [https://www.cs.cmu.edu/~15850/ Advanced Algorithms] by Anupam Gupta at CMU.&lt;br /&gt;
* [http://people.csail.mit.edu/moitra/854.html Advanced Algorithms] by Ankur Moitra at MIT.&lt;br /&gt;
* [http://courses.csail.mit.edu/6.854/current/ Advanced Algorithms] by David Karger and Aleksander Mądry at MIT.&lt;br /&gt;
* [http://web.stanford.edu/class/cs168/index.html The Modern Algorithmic Toolbox] by Tim Roughgarden and Gregory Valiant at Stanford.&lt;br /&gt;
* [https://www.cs.princeton.edu/courses/archive/fall18/cos521/ Advanced Algorithm Design] by Pravesh Kothari and Christopher Musco at Princeton.&lt;br /&gt;
* [http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/ Linear and Semidefinite Programming (Advanced Algorithms)] by Anupam Gupta and Ryan O&#039;Donnell at CMU.&lt;br /&gt;
* [https://www.cs.cmu.edu/~odonnell/papers/cs-theory-toolkit-lecture-notes.pdf CS Theory Toolkit] by Ryan O&#039;Donnell at CMU.&lt;br /&gt;
* [https://cs.uwaterloo.ca/~lapchi/cs860/index.html Eigenvalues and Polynomials] by Lap Chi Lau at University of Waterloo.&lt;br /&gt;
* The [https://www.cs.cornell.edu/jeh/book.pdf &amp;quot;Foundations of Data Science&amp;quot; book] by Avrim Blum, John Hopcroft, and Ravindran Kannan.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12615</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12615"/>
		<updated>2024-09-28T12:39:25Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Problem 3 (Hashing and Sketching) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Every problem must be answered with a complete solution process; either Chinese or English is acceptable.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your solutions with LaTeX, Markdown, etc.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
For any &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha \ge 1&amp;lt;/math&amp;gt;&#039;&#039;&#039;, a cut is called an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; if the number of edges in it is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039; times that of the min-cut.  Prove that the number of &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cuts in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;n^{2\alpha} / 2&amp;lt;/math&amp;gt;&#039;&#039;&#039;.  (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: Run Karger&#039;s algorithm until it has &#039;&#039;&#039;&amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt;&#039;&#039;&#039; supernodes. What is the chance that a particular &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)&lt;br /&gt;
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
Two rooted trees &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; are said to be isomorphic if there exists a one to one mapping &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from the nodes of &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; to those of &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; satisfying the following condition: &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;w&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;f(v)&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;f(w)&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt;. Observe that no ordering is assumed on the children of any vertex. Devise an efficient randomized algorithm for testing the isomorphism of rooted trees and analyze its performance. &#039;&#039;&#039;&#039;&#039;Hint:&#039;&#039;&#039;&#039;&#039; Recursively associate a polynomial &amp;lt;math&amp;gt;P_v&amp;lt;/math&amp;gt; with each vertex &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in a tree &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Problem 3 (Hashing and Sketching) ==&lt;br /&gt;
In class, we saw how to estimate the number of distinct elements in a data stream using the Flajolet-Martin algorithm. Consider the following alternative formulation of the distinct elements problem: given an &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; dimensional vector &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, we want to process a stream of arbitrary increments to entries in &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. In other words, if we see a number &amp;lt;math&amp;gt;i\in 1,\dots,N&amp;lt;/math&amp;gt; in the stream, we update entry &amp;lt;math&amp;gt;x_i\gets x_i + 1&amp;lt;/math&amp;gt;. Our goal is to estimate &amp;lt;math&amp;gt;\left \|x\right \|_0&amp;lt;/math&amp;gt;, which measures the number of non-zero entries in &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. With &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; viewed as a histogram that maintains counts for &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; potential elements, &amp;lt;math&amp;gt;\left \|x\right \|_0&amp;lt;/math&amp;gt; is exactly the number of distinct elements processed. In this problem we will develop an alternative algorithm for estimating &amp;lt;math&amp;gt;\left \|x\right \|_0&amp;lt;/math&amp;gt; that can also handle &#039;&#039;&#039;decrements&#039;&#039;&#039; to entries in &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. Specifically, instead of the stream containing just indices &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;, it contains pairs &amp;lt;math&amp;gt;(i, +)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(i, −)&amp;lt;/math&amp;gt;. On receiving &amp;lt;math&amp;gt;(i, +)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; should update so that &amp;lt;math&amp;gt;x_i\gets x_i + 1&amp;lt;/math&amp;gt; and on receiving &amp;lt;math&amp;gt;(i, −)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; should update so that &amp;lt;math&amp;gt;x_i\gets x_i - 1&amp;lt;/math&amp;gt;. For this problem we will assume that, at the end of our stream, each &amp;lt;math&amp;gt;x_i \ge 0&amp;lt;/math&amp;gt; (i.e. for a specific index we can’t receive more decrements than increments).&lt;br /&gt;
# Consider a simpler problem. For a given value &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;, let&#039;s design an algorithm that succeeds with probability &amp;lt;math&amp;gt;(1-\delta)&amp;lt;/math&amp;gt;, outputting &#039;&#039;&#039;LOW&#039;&#039;&#039; if &amp;lt;math&amp;gt;T &amp;lt; \frac{1}{2}\left \|x\right \|_0&amp;lt;/math&amp;gt; and &#039;&#039;&#039;HIGH&#039;&#039;&#039; if &amp;lt;math&amp;gt;T &amp;gt; 2\left \|x\right \|_0&amp;lt;/math&amp;gt;:&lt;br /&gt;
#* Assume we have access to a completely random hash function &amp;lt;math&amp;gt;h(\cdot)&amp;lt;/math&amp;gt; that maps each &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; to a random point in &amp;lt;math&amp;gt;[0, 1]&amp;lt;/math&amp;gt;. We maintain the estimator &amp;lt;math&amp;gt;s=\sum_{i:h(i)&amp;lt;\frac{1}{2T}}x_i&amp;lt;/math&amp;gt; as we receive increment and decrement updates. Show that, at the end of our stream: (i) if &amp;lt;math&amp;gt;T &amp;lt; \frac{1}{2}\left \|x\right \|_0&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr_h[s=0]&amp;lt;1/e\approx 0.37&amp;lt;/math&amp;gt;; and (ii) if &amp;lt;math&amp;gt;T &amp;gt; 2\left \|x\right \|_0&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Pr_h[s=0]&amp;gt;0.5&amp;lt;/math&amp;gt;.&lt;br /&gt;
#* Using this fact, show how to use &amp;lt;math&amp;gt;k=O(\log 1/\delta)&amp;lt;/math&amp;gt; independent random hash functions, and corresponding individual estimators &amp;lt;math&amp;gt;s_1, s_2, \ldots, s_k&amp;lt;/math&amp;gt;, to output &#039;&#039;&#039;LOW&#039;&#039;&#039; if &amp;lt;math&amp;gt;T &amp;lt; \frac{1}{2}\left \|x\right \|_0&amp;lt;/math&amp;gt; and &#039;&#039;&#039;HIGH&#039;&#039;&#039; if &amp;lt;math&amp;gt;T &amp;gt; 2\left \|x\right \|_0&amp;lt;/math&amp;gt;. If neither event occurs you can output either &#039;&#039;&#039;LOW&#039;&#039;&#039; or &#039;&#039;&#039;HIGH&#039;&#039;&#039;. Your algorithm should succeed with probability &amp;lt;math&amp;gt;(1 − \delta)&amp;lt;/math&amp;gt;.&lt;br /&gt;
# Using &amp;lt;math&amp;gt;O(\log N)&amp;lt;/math&amp;gt; repetitions of your algorithm for the above decision problem (with &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt; set appropriately), show how to obtain an estimate &amp;lt;math&amp;gt;F&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;\left \|x\right \|_0&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\frac{1}{4}\left \|x\right \|_0\le F\le 4\left \|x\right \|_0&amp;lt;/math&amp;gt; w.h.p. (i.e., with probability &amp;lt;math&amp;gt;1-O(1/N)&amp;lt;/math&amp;gt;). A runnable sketch of the full scheme is given below.&lt;br /&gt;
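&lt;br /&gt;
A Python sketch of the whole scheme (the names and the 0.44 majority threshold, chosen strictly between &amp;lt;math&amp;gt;1/e&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;0.5&amp;lt;/math&amp;gt;, are our choices for illustration; the fully random hash is stored as an explicit table of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; values only for clarity, whereas a genuine small-space sketch would use a seeded hash function). Each update is encoded as a pair (i, sign) with sign being +1 or -1, and indices are taken 0-based for convenience.&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 def looks_high(stream, N, T, k):&lt;br /&gt;
     # One experiment per hash function: s = 0 happens with probability&lt;br /&gt;
     # above 1/2 when T exceeds twice the true L0, and below 1/e when T is&lt;br /&gt;
     # below half of it, so a majority threshold between 1/e and 1/2 works.&lt;br /&gt;
     zeros = 0&lt;br /&gt;
     for _ in range(k):&lt;br /&gt;
         h = [random.random() for _ in range(N)]  # explicit table, clarity only&lt;br /&gt;
         s = sum(sign for i, sign in stream if h[i] &amp;lt; 1.0 / (2 * T))&lt;br /&gt;
         if s == 0:&lt;br /&gt;
             zeros += 1&lt;br /&gt;
     return zeros &amp;gt; 0.44 * k  # True plays the role of HIGH&lt;br /&gt;
 &lt;br /&gt;
 def estimate_l0(stream, N, k):&lt;br /&gt;
     # Doubling search over T = 1, 2, 4, ...: w.h.p. the first HIGH answer&lt;br /&gt;
     # follows a LOW one, which pins the estimate within a factor of 4.&lt;br /&gt;
     T = 1&lt;br /&gt;
     while T &amp;lt;= N and not looks_high(stream, N, T, k):&lt;br /&gt;
         T *= 2&lt;br /&gt;
     return T&lt;br /&gt;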
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
Consider the [[wikipedia:Erdős–Rényi_model#Definition|Erdős–Rényi random graph]] &amp;lt;math&amp;gt;G(n, p)&amp;lt;/math&amp;gt;, in which each pair of vertices is connected randomly and independently with probability &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;. We write &amp;lt;math&amp;gt;G \sim G(n, p)&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is generated in this way. Recall that &amp;lt;math&amp;gt;\chi(G)&amp;lt;/math&amp;gt; is the chromatic number of the graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(a.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;0 &amp;lt; p_1 &amp;lt; p_2 &amp;lt; 1&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;G_1 \sim G(n, p_1)&amp;lt;/math&amp;gt; and let &amp;lt;math&amp;gt;G_2 \sim G(n, p_2)&amp;lt;/math&amp;gt;. Compare &amp;lt;math&amp;gt;\mathbf{E}[\chi(G_1)]&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{E}[\chi(G_2)]&amp;lt;/math&amp;gt; and prove your answer.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(b.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;G \sim G(n, n^{-\alpha})&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\alpha &amp;gt; 5/6&amp;lt;/math&amp;gt; and any constant &amp;lt;math&amp;gt;C &amp;gt; 0&amp;lt;/math&amp;gt;, prove that every subgraph of &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;C\sqrt{n \log n}&amp;lt;/math&amp;gt; vertices is &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;-colorable with probability &amp;lt;math&amp;gt;1 - o(1)&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is large enough. (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: &amp;lt;math&amp;gt;\binom{n}{k} \leq (en/k)^k&amp;lt;/math&amp;gt;.)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(c.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;G \sim G(n, n^{-\alpha})&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\alpha &amp;gt; 5/6&amp;lt;/math&amp;gt;, show that &amp;lt;math&amp;gt;\chi(G)&amp;lt;/math&amp;gt; is concentrated on four values with probability &amp;lt;math&amp;gt;1 - o(1)&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is large enough. More precisely, show that there exists an integer &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u \leq \chi(G) \leq u+3&amp;lt;/math&amp;gt; with probability &amp;lt;math&amp;gt;1 - o(1)&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is large enough. (A small simulation illustrating the model appears below.)&lt;br /&gt;
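&lt;br /&gt;
Parts (a.)-(c.) ask for proofs, but the following small Python sketch (our code, for intuition only) samples &amp;lt;math&amp;gt;G(n,p)&amp;lt;/math&amp;gt; and computes a greedy upper bound on &amp;lt;math&amp;gt;\chi(G)&amp;lt;/math&amp;gt;; greedy coloring only upper-bounds the chromatic number, so this illustrates the model rather than verifying the concentration claim.&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 def sample_gnp(n, p):&lt;br /&gt;
     # Erdos-Renyi G(n, p): join each pair independently with probability p.&lt;br /&gt;
     adj = [set() for _ in range(n)]&lt;br /&gt;
     for u in range(n):&lt;br /&gt;
         for v in range(u + 1, n):&lt;br /&gt;
             if random.random() &amp;lt; p:&lt;br /&gt;
                 adj[u].add(v)&lt;br /&gt;
                 adj[v].add(u)&lt;br /&gt;
     return adj&lt;br /&gt;
 &lt;br /&gt;
 def greedy_upper_bound(adj):&lt;br /&gt;
     # Color vertices in index order with the smallest color not used by an&lt;br /&gt;
     # already-colored neighbor; the result upper-bounds chi(G).&lt;br /&gt;
     color = {}&lt;br /&gt;
     for v in range(len(adj)):&lt;br /&gt;
         used = {color[u] for u in adj[v] if u in color}&lt;br /&gt;
         c = 0&lt;br /&gt;
         while c in used:&lt;br /&gt;
             c += 1&lt;br /&gt;
         color[v] = c&lt;br /&gt;
     return 1 + max(color.values())&lt;br /&gt;
 &lt;br /&gt;
 # Example: greedy_upper_bound(sample_gnp(2000, 2000 ** -0.9)) is a small&lt;br /&gt;
 # constant across runs, consistent with the claimed concentration.&lt;br /&gt;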
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;br /&gt;
[&#039;&#039;&#039;Inner product&#039;&#039;&#039;] Fix parameters &amp;lt;math&amp;gt;d&amp;gt;0, \delta,\epsilon\in(0,1)&amp;lt;/math&amp;gt;. Let &amp;lt;math&amp;gt;A\in \mathbb{R}^{k\times d}&amp;lt;/math&amp;gt; be a random matrix with &amp;lt;math&amp;gt;k = O(\log(1/\delta)/\epsilon^2)&amp;lt;/math&amp;gt; rows, whose entries are chosen i.i.d. from a Gaussian distribution with mean &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt; and variance &amp;lt;math&amp;gt;1/k&amp;lt;/math&amp;gt;. Prove that for any &amp;lt;math&amp;gt;x,y\in \mathbb{R}^d&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;|x^\top y - (Ax)^\top(Ay)|\leq \epsilon(\|x\|_2^2 + \|y\|_2^2)&amp;lt;/math&amp;gt; with probability &amp;lt;math&amp;gt;\geq 1-\delta&amp;lt;/math&amp;gt;. (A numerical sanity check is sketched below.)&lt;br /&gt;
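&lt;br /&gt;
The following Python sketch (our code; it assumes NumPy is available, and the constant 8 in the choice of &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is a guess standing in for the unspecified constant in the &amp;lt;math&amp;gt;O(\cdot)&amp;lt;/math&amp;gt;) checks the claim empirically for one fixed pair &amp;lt;math&amp;gt;x,y&amp;lt;/math&amp;gt; by resampling &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and measuring how often the error bound is violated.&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def jl_inner_product_check(d=2000, eps=0.2, delta=0.05, trials=200, seed=0):&lt;br /&gt;
     rng = np.random.default_rng(seed)&lt;br /&gt;
     k = int(np.ceil(8 * np.log(1 / delta) / eps ** 2))  # k = O(log(1/delta)/eps^2)&lt;br /&gt;
     x = rng.standard_normal(d)&lt;br /&gt;
     y = rng.standard_normal(d)&lt;br /&gt;
     bound = eps * (x @ x + y @ y)  # eps * (norm(x)^2 + norm(y)^2)&lt;br /&gt;
     bad = 0&lt;br /&gt;
     for _ in range(trials):&lt;br /&gt;
         A = rng.standard_normal((k, d)) / np.sqrt(k)  # i.i.d. N(0, 1/k) entries&lt;br /&gt;
         if abs(x @ y - (A @ x) @ (A @ y)) &amp;gt; bound:&lt;br /&gt;
             bad += 1&lt;br /&gt;
     return bad / trials  # empirical failure rate; should fall below delta&lt;br /&gt;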
&lt;br /&gt;
[&#039;&#039;&#039;Linear separability&#039;&#039;&#039;] In machine learning, the goal of many classification methods is to separate data into classes using a hyperplane. A hyperplane in &amp;lt;math&amp;gt;\mathbb{R}^d&amp;lt;/math&amp;gt; is characterized by a unit vector &amp;lt;math&amp;gt;a\in \mathbb{R}^d (\|a\|_2 = 1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c\in \mathbb{R}&amp;lt;/math&amp;gt;. It contains all &amp;lt;math&amp;gt;z\in \mathbb{R}^d&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;a^\top z = c&amp;lt;/math&amp;gt;. Suppose our dataset consists of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;&#039;unit&#039;&#039;&#039; vectors in &amp;lt;math&amp;gt;\mathbb{R}^d&amp;lt;/math&amp;gt;. These points can be separated into two linearly separable sets &amp;lt;math&amp;gt;X,Y&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;|X|+|Y| = n&amp;lt;/math&amp;gt;. That is, for all &amp;lt;math&amp;gt;x\in X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top x&amp;gt;c&amp;lt;/math&amp;gt; and for all &amp;lt;math&amp;gt;y\in Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top y&amp;lt;c&amp;lt;/math&amp;gt; (or vice versa). Furthermore, suppose that the &amp;lt;math&amp;gt;\ell_2&amp;lt;/math&amp;gt; distance of each point in &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; to this separating hyperplane is at least &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;. When this is the case, the hyperplane is said to have margin &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;.&lt;br /&gt;
# Show that the statement that &amp;lt;math&amp;gt;X,Y&amp;lt;/math&amp;gt; can be separated with margin &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; by the hyperplane characterized by &amp;lt;math&amp;gt;a\in \mathbb{R}^d (\|a\|_2 = 1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c\in \mathbb{R}&amp;lt;/math&amp;gt; is equivalent to the following condition: for all &amp;lt;math&amp;gt;x\in X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top x \geq c+\epsilon&amp;lt;/math&amp;gt; and for all &amp;lt;math&amp;gt;y\in Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top y \leq c-\epsilon&amp;lt;/math&amp;gt; (or vice versa).&lt;br /&gt;
# Show that if we use a Johnson-Lindenstrauss map &amp;lt;math&amp;gt;A\in \mathbb{R}^{k\times d}&amp;lt;/math&amp;gt; (the scaled Gaussian matrix given in the lecture) to reduce our data points to &amp;lt;math&amp;gt;O(\log n/\epsilon^2)&amp;lt;/math&amp;gt; dimensions, then with probability at least &amp;lt;math&amp;gt;9/10&amp;lt;/math&amp;gt;, the dimension-reduced data can still be separated by a hyperplane with margin &amp;lt;math&amp;gt;\epsilon/4&amp;lt;/math&amp;gt;. (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: use the fact that the JLT preserves inner products.)&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12606</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12606"/>
		<updated>2024-09-28T03:20:37Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Problem 3 (Hashing) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Each problem must be answered with complete working; solutions may be written in either Chinese or English.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your homework with LaTeX, Markdown, or similar tools.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
For any &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha \ge 1&amp;lt;/math&amp;gt;&#039;&#039;&#039;, a cut is called an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; if the number of edges in it is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039; times that of the min-cut.  Prove that the number of &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cuts in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;n^{2\alpha} / 2&amp;lt;/math&amp;gt;&#039;&#039;&#039;.  (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: Run Karger&#039;s algorithm until it has &#039;&#039;&#039;&amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt;&#039;&#039;&#039; supernodes. What is the chance that a particular &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)&lt;br /&gt;
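&lt;br /&gt;
A minimal Python sketch of the contraction process in the hint (our code: the edge-list representation and the union-find bookkeeping are implementation choices, and a connected multigraph is assumed so that a non-loop edge always exists).&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 def contract_until(edges, n, target):&lt;br /&gt;
     # Karger contraction: repeatedly pick a uniformly random surviving edge&lt;br /&gt;
     # and merge its endpoints, until only target supernodes remain.&lt;br /&gt;
     # Resampling on self-loops keeps the pick uniform over non-loop edges.&lt;br /&gt;
     parent = list(range(n))&lt;br /&gt;
     def find(u):&lt;br /&gt;
         while parent[u] != u:&lt;br /&gt;
             parent[u] = parent[parent[u]]  # path halving&lt;br /&gt;
             u = parent[u]&lt;br /&gt;
         return u&lt;br /&gt;
     supernodes = n&lt;br /&gt;
     while supernodes &amp;gt; target:&lt;br /&gt;
         u, v = random.choice(edges)&lt;br /&gt;
         ru, rv = find(u), find(v)&lt;br /&gt;
         if ru == rv:&lt;br /&gt;
             continue  # the edge has become a self-loop; resample&lt;br /&gt;
         parent[ru] = rv&lt;br /&gt;
         supernodes -= 1&lt;br /&gt;
     return [(find(u), find(v)) for u, v in edges if find(u) != find(v)]&lt;br /&gt;
Stopping at &amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt; supernodes, lower-bounding the probability that a fixed &amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;-approximate min-cut survives, and counting the cuts of the collapsed graph is exactly the route the hint suggests.&lt;br /&gt;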
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
Two rooted trees &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; are said to be isomorphic if there exists a one-to-one mapping &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from the nodes of &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; to those of &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; satisfying the following condition: &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;w&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;f(v)&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;f(w)&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt;. Observe that no ordering is assumed on the children of any vertex. Devise an efficient randomized algorithm for testing the isomorphism of rooted trees and analyze its performance. &#039;&#039;&#039;&#039;&#039;Hint:&#039;&#039;&#039;&#039;&#039; Recursively associate a polynomial &amp;lt;math&amp;gt;P_v&amp;lt;/math&amp;gt; with each vertex &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in a tree &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Problem 3 (Hashing and Sketching) ==&lt;br /&gt;
Let &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_n&amp;lt;/math&amp;gt; be &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; random variables, where each &amp;lt;math&amp;gt;X_i \in \{0, 1\}&amp;lt;/math&amp;gt; follows the distribution &amp;lt;math&amp;gt;\mu_i&amp;lt;/math&amp;gt;. For each &amp;lt;math&amp;gt;1\leq i \leq n&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;\rho_i = \mathbb{E}[X_i]&amp;lt;/math&amp;gt; and assume &amp;lt;math&amp;gt;\rho_i \geq \frac{1}{2}&amp;lt;/math&amp;gt;. Consider the problem of estimating the value of &lt;br /&gt;
:&amp;lt;math&amp;gt;Z = \prod_{i = 1}^n \rho_i&amp;lt;/math&amp;gt;.	&lt;br /&gt;
For each &amp;lt;math&amp;gt;1\leq  i \leq n&amp;lt;/math&amp;gt;, the algorithm draws &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; random samples &amp;lt;math&amp;gt;X_i^{(1)},X_i^{(2)},\ldots,X_i^{(s)}&amp;lt;/math&amp;gt; independently from the distribution &amp;lt;math&amp;gt;\mu_i&amp;lt;/math&amp;gt;, and computes &lt;br /&gt;
:&amp;lt;math&amp;gt;\widehat{\rho}_{i}=\frac{1}{s}\sum_{j=1}^s X_i^{(j)}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Finally, the algorithm outputs the product of all the &amp;lt;math&amp;gt;\widehat{\rho}_{i}&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;\widehat{Z}=\prod_{i= 1}^n\widehat{\rho}_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
Express &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;n,\varepsilon,\delta&amp;lt;/math&amp;gt; so that the output &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; satisfies&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[(1 - \varepsilon) Z \leq \widehat{Z} \leq (1 + \varepsilon)Z\right] \geq 1- \delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
Try to make &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; as small as possible. (A simulation sketch of the estimator follows.)&lt;br /&gt;
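&lt;br /&gt;
In the sketch below (our code, for illustration only), the samplers are passed as zero-argument functions, a representation we chose for the sketch.&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 def estimate_Z(samplers, s):&lt;br /&gt;
     # samplers[i]() draws X_i in {0, 1} with mean rho_i; form each&lt;br /&gt;
     # empirical mean from s independent draws and return their product.&lt;br /&gt;
     z = 1.0&lt;br /&gt;
     for draw in samplers:&lt;br /&gt;
         z *= sum(draw() for _ in range(s)) / s&lt;br /&gt;
     return z&lt;br /&gt;
 &lt;br /&gt;
 # Example with rho_i = 0.9 for every i:&lt;br /&gt;
 # n = 50&lt;br /&gt;
 # samplers = [lambda: int(random.random() &amp;lt; 0.9) for _ in range(n)]&lt;br /&gt;
 # print(estimate_Z(samplers, 1000), 0.9 ** n)&lt;br /&gt;
Because the relative errors of the &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; empirical means compound in the product, &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; must grow with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;; quantifying the exact dependence on &amp;lt;math&amp;gt;n,\varepsilon,\delta&amp;lt;/math&amp;gt; is the point of the problem.&lt;br /&gt;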
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
Consider the [[wikipedia:Erdős–Rényi_model#Definition|Erdős–Rényi random graph]] &amp;lt;math&amp;gt;G(n, p)&amp;lt;/math&amp;gt;, in which each pair of vertices is connected randomly and independently with probability &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;. We write &amp;lt;math&amp;gt;G \sim G(n, p)&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is generated in this way. Recall that &amp;lt;math&amp;gt;\chi(G)&amp;lt;/math&amp;gt; is the chromatic number of the graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(a.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;0 &amp;lt; p_1 &amp;lt; p_2 &amp;lt; 1&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;G_1 \sim G(n, p_1)&amp;lt;/math&amp;gt; and let &amp;lt;math&amp;gt;G_2 \sim G(n, p_2)&amp;lt;/math&amp;gt;. Compare &amp;lt;math&amp;gt;\mathbf{E}[\chi(G_1)]&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{E}[\chi(G_2)]&amp;lt;/math&amp;gt; and prove your answer.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(b.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;G \sim G(n, n^{-\alpha})&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\alpha &amp;gt; 5/6&amp;lt;/math&amp;gt;, prove that there exists a constant &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; such that every subgraph of &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;C\sqrt{n}&amp;lt;/math&amp;gt; vertices is &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;-colorable with probability &amp;lt;math&amp;gt;1 - o(1)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(c.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;G \sim G(n, n^{-\alpha})&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\alpha &amp;gt; 5/6&amp;lt;/math&amp;gt;, show that &amp;lt;math&amp;gt;\chi(G)&amp;lt;/math&amp;gt; is concentrated on four values with probability at least &amp;lt;math&amp;gt;1 - o(1)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;br /&gt;
In machine learning, the goal of many classification methods is to separate data into classes using a hyperplane. A hyperplane in &amp;lt;math&amp;gt;\mathbb{R}^d&amp;lt;/math&amp;gt; is characterized by a unit vector &amp;lt;math&amp;gt;a\in \mathbb{R}^d (\|a\|_2 = 1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c\in \mathbb{R}&amp;lt;/math&amp;gt;. It contains all &amp;lt;math&amp;gt;z\in \mathbb{R}^d&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;a^\top z = c&amp;lt;/math&amp;gt;. Suppose our dataset consists of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;&#039;unit&#039;&#039;&#039; vectors in &amp;lt;math&amp;gt;\mathbb{R}^d&amp;lt;/math&amp;gt;. These points can be separated into two linearly separable sets &amp;lt;math&amp;gt;X,Y&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;|X|+|Y| = n&amp;lt;/math&amp;gt;. That is, for all &amp;lt;math&amp;gt;x\in X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top x&amp;gt;c&amp;lt;/math&amp;gt; and for all &amp;lt;math&amp;gt;y\in Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top y&amp;lt;c&amp;lt;/math&amp;gt; (or vice versa). Furthermore, suppose that the &amp;lt;math&amp;gt;\ell_2&amp;lt;/math&amp;gt; distance of each point in &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; to this separating hyperplane is at least &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;. When this is the case, the hyperplane is said to have margin &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;.&lt;br /&gt;
# Show that the statement that &amp;lt;math&amp;gt;X,Y&amp;lt;/math&amp;gt; can be separated with margin &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; by the hyperplane characterized by &amp;lt;math&amp;gt;a\in \mathbb{R}^d (\|a\|_2 = 1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c\in \mathbb{R}&amp;lt;/math&amp;gt; is equivalent to the following condition: for all &amp;lt;math&amp;gt;x\in X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top x &amp;gt;c+\epsilon&amp;lt;/math&amp;gt; and for all &amp;lt;math&amp;gt;y\in Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top y &amp;lt; c-\epsilon&amp;lt;/math&amp;gt; (or vice versa).&lt;br /&gt;
# Show that if we use a Johnson-Lindenstrauss map &amp;lt;math&amp;gt;A\in \mathbb{R}^{k\times d}&amp;lt;/math&amp;gt; (the scaled Gaussian matrix given in the lecture) to reduce our data points to &amp;lt;math&amp;gt;O(\log n/\epsilon^2)&amp;lt;/math&amp;gt; dimensions, then with probability at least &amp;lt;math&amp;gt;9/10&amp;lt;/math&amp;gt;, the dimension-reduced data can still be separated by a hyperplane with margin &amp;lt;math&amp;gt;\epsilon/4&amp;lt;/math&amp;gt;. (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: use the fact that the JLT preserves inner products.)&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12605</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12605"/>
		<updated>2024-09-28T03:15:20Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Problem 3 (Hashing) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Each problem must be answered with complete working; solutions may be written in either Chinese or English.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your homework with LaTeX, Markdown, or similar tools.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
For any &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha \ge 1&amp;lt;/math&amp;gt;&#039;&#039;&#039;, a cut is called an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; if the number of edges in it is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039; times that of the min-cut.  Prove that the number of &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cuts in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;n^{2\alpha} / 2&amp;lt;/math&amp;gt;&#039;&#039;&#039;.  (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: Run Karger&#039;s algorithm until it has &#039;&#039;&#039;&amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt;&#039;&#039;&#039; supernodes. What is the chance that a particular &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)&lt;br /&gt;
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
Two rooted trees &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; are said to be isomorphic if there exists a one-to-one mapping &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from the nodes of &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; to those of &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; satisfying the following condition: &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;w&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;f(v)&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;f(w)&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt;. Observe that no ordering is assumed on the children of any vertex. Devise an efficient randomized algorithm for testing the isomorphism of rooted trees and analyze its performance. &#039;&#039;&#039;&#039;&#039;Hint:&#039;&#039;&#039;&#039;&#039; Recursively associate a polynomial &amp;lt;math&amp;gt;P_v&amp;lt;/math&amp;gt; with each vertex &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in a tree &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Problem 3 (Hashing) ==&lt;br /&gt;
Let &amp;lt;math&amp;gt;X_1,X_2,\ldots,X_n&amp;lt;/math&amp;gt; be &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; random variables, where each &amp;lt;math&amp;gt;X_i \in \{0, 1\}&amp;lt;/math&amp;gt; follows the distribution &amp;lt;math&amp;gt;\mu_i&amp;lt;/math&amp;gt;. For each &amp;lt;math&amp;gt;1\leq i \leq n&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;\rho_i = \mathbb{E}[X_i]&amp;lt;/math&amp;gt; and assume &amp;lt;math&amp;gt;\rho_i \geq \frac{1}{2}&amp;lt;/math&amp;gt;. Consider the problem of estimating the value of &lt;br /&gt;
:&amp;lt;math&amp;gt;Z = \prod_{i = 1}^n \rho_i&amp;lt;/math&amp;gt;.	&lt;br /&gt;
For each &amp;lt;math&amp;gt;1\leq  i \leq n&amp;lt;/math&amp;gt;, the algorithm draws &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; random samples &amp;lt;math&amp;gt;X_i^{(1)},X_i^{(2)},\ldots,X_i^{(s)}&amp;lt;/math&amp;gt; independently from the distribution &amp;lt;math&amp;gt;\mu_i&amp;lt;/math&amp;gt;, and computes &lt;br /&gt;
:&amp;lt;math&amp;gt;\widehat{\rho}_{i}=\frac{1}{s}\sum_{j=1}^s X_i^{(j)}&amp;lt;/math&amp;gt;.&lt;br /&gt;
Finally, the algorithm outputs the product of all the &amp;lt;math&amp;gt;\widehat{\rho}_{i}&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;\widehat{Z}=\prod_{i= 1}^n\widehat{\rho}_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
Express &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;n,\varepsilon,\delta&amp;lt;/math&amp;gt; so that the output &amp;lt;math&amp;gt;\widehat{Z}&amp;lt;/math&amp;gt; satisfies&lt;br /&gt;
:&amp;lt;math&amp;gt;\Pr\left[(1 - \varepsilon) Z \leq \widehat{Z} \leq (1 + \varepsilon)Z\right] \geq 1- \delta&amp;lt;/math&amp;gt;.&lt;br /&gt;
Try to make &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; as small as possible.&lt;br /&gt;
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
Consider the [[wikipedia:Erdős–Rényi_model#Definition|Erdős–Rényi random graph]] &amp;lt;math&amp;gt;G(n, p)&amp;lt;/math&amp;gt;, in which each pair of vertices is connected randomly and independently with probability &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;. We write &amp;lt;math&amp;gt;G \sim G(n, p)&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; is generated in this way. Recall that &amp;lt;math&amp;gt;\chi(G)&amp;lt;/math&amp;gt; is the chromatic number of the graph &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(a.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;0 &amp;lt; p_1 &amp;lt; p_2 &amp;lt; 1&amp;lt;/math&amp;gt;, let &amp;lt;math&amp;gt;G_1 \sim G(n, p_1)&amp;lt;/math&amp;gt; and let &amp;lt;math&amp;gt;G_2 \sim G(n, p_2)&amp;lt;/math&amp;gt;. Compare &amp;lt;math&amp;gt;\mathbf{E}[\chi(G_1)]&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{E}[\chi(G_2)]&amp;lt;/math&amp;gt; and prove your answer.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(b.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;G \sim G(n, n^{-\alpha})&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\alpha &amp;gt; 5/6&amp;lt;/math&amp;gt;, prove that there exists a constant &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; such that every subgraph of &amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;C\sqrt{n}&amp;lt;/math&amp;gt; vertices is &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;-colorable with probability &amp;lt;math&amp;gt;1 - o(1)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;(c.)&#039;&#039;&#039; For &amp;lt;math&amp;gt;G \sim G(n, n^{-\alpha})&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\alpha &amp;gt; 5/6&amp;lt;/math&amp;gt;, show that &amp;lt;math&amp;gt;\chi(G)&amp;lt;/math&amp;gt; is concentrated on four values with probability at least &amp;lt;math&amp;gt;1 - o(1)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;br /&gt;
In machine learning, the goal of many classification methods is to separate data into classes using a hyperplane. A hyperplane in &amp;lt;math&amp;gt;\mathbb{R}^d&amp;lt;/math&amp;gt; is characterized by a unit vector &amp;lt;math&amp;gt;a\in \mathbb{R}^d (\|a\|_2 = 1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c\in \mathbb{R}&amp;lt;/math&amp;gt;. It contains all &amp;lt;math&amp;gt;z\in \mathbb{R}^d&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;a^\top z = c&amp;lt;/math&amp;gt;. Suppose our dataset consists of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; &#039;&#039;&#039;unit&#039;&#039;&#039; vectors in &amp;lt;math&amp;gt;\mathbb{R}^d&amp;lt;/math&amp;gt;. These points can be separated into two linearly separable sets &amp;lt;math&amp;gt;X,Y&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;|X|+|Y| = n&amp;lt;/math&amp;gt;. That is, for all &amp;lt;math&amp;gt;x\in X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top x&amp;gt;c&amp;lt;/math&amp;gt; and for all &amp;lt;math&amp;gt;y\in Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top y&amp;lt;c&amp;lt;/math&amp;gt; (or vice versa). Furthermore, suppose that the &amp;lt;math&amp;gt;\ell_2&amp;lt;/math&amp;gt; distance of each point in &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; to this separating hyperplane is at least &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;. When this is the case, the hyperplane is said to have margin &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt;.&lt;br /&gt;
# Show that the statement that &amp;lt;math&amp;gt;X,Y&amp;lt;/math&amp;gt; can be separated with margin &amp;lt;math&amp;gt;\epsilon&amp;lt;/math&amp;gt; by the hyperplane characterized by &amp;lt;math&amp;gt;a\in \mathbb{R}^d (\|a\|_2 = 1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c\in \mathbb{R}&amp;lt;/math&amp;gt; is equivalent to the following condition: for all &amp;lt;math&amp;gt;x\in X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top x &amp;gt;c+\epsilon&amp;lt;/math&amp;gt; and for all &amp;lt;math&amp;gt;y\in Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;a^\top y &amp;lt; c-\epsilon&amp;lt;/math&amp;gt; (or vice versa).&lt;br /&gt;
# Show that if we use a Johnson-Lindenstrauss map &amp;lt;math&amp;gt;A\in \mathbb{R}^{k\times d}&amp;lt;/math&amp;gt; (the scaled Gaussian matrix given in the lecture) to reduce our data points to &amp;lt;math&amp;gt;O(\log n/\epsilon^2)&amp;lt;/math&amp;gt; dimensions, then with probability at least &amp;lt;math&amp;gt;9/10&amp;lt;/math&amp;gt;, the dimension-reduced data can still be separated by a hyperplane with margin &amp;lt;math&amp;gt;\epsilon/4&amp;lt;/math&amp;gt;. (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: use the fact that the JLT preserves inner products.)&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12602</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12602"/>
		<updated>2024-09-28T03:07:03Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Problem 2 (Fingerprinting) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Each problem must be answered with complete working; solutions may be written in either Chinese or English.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your homework with LaTeX, Markdown, or similar tools.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
For any &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha \ge 1&amp;lt;/math&amp;gt;&#039;&#039;&#039;, a cut is called an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; if the number of edges in it is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039; times that of the min-cut.  Prove that the number of &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cuts in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;n^{2\alpha} / 2&amp;lt;/math&amp;gt;&#039;&#039;&#039;.  (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: Run Karger&#039;s algorithm until it has &#039;&#039;&#039;&amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt;&#039;&#039;&#039; supernodes. What is the chance that a particular &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)&lt;br /&gt;
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
Two rooted trees &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; are said to be isomorphic if there exists a one-to-one mapping &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from the nodes of &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; to those of &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt; satisfying the following condition: &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;w&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_1&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;f(v)&amp;lt;/math&amp;gt; is a child of &amp;lt;math&amp;gt;f(w)&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;T_2&amp;lt;/math&amp;gt;. Observe that no ordering is assumed on the children of any vertex. Devise an efficient randomized algorithm for testing the isomorphism of rooted trees and analyze its performance. &#039;&#039;&#039;&#039;&#039;Hint:&#039;&#039;&#039;&#039;&#039; Recursively associate a polynomial &amp;lt;math&amp;gt;P_v&amp;lt;/math&amp;gt; with each vertex &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; in a tree &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Problem 3 (Hashing) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12601</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12601"/>
		<updated>2024-09-28T03:01:18Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Problem 1 (Min-cut/Max-cut) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Each problem must be answered with complete working; solutions may be written in either Chinese or English.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your homework with LaTeX, Markdown, or similar tools.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
For any &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha \ge 1&amp;lt;/math&amp;gt;&#039;&#039;&#039;, a cut is called an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; if the number of edges in it is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039; times that of the min-cut.  Prove that the number of &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cuts in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;n^{2\alpha} / 2&amp;lt;/math&amp;gt;&#039;&#039;&#039;.  (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: Run Karger&#039;s algorithm until it has &#039;&#039;&#039;&amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt;&#039;&#039;&#039; supernodes. What is the chance that a particular &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)&lt;br /&gt;
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
Design a randomized algorithm to decide if an integer sequence &amp;lt;math&amp;gt;a_1,...,a_n&amp;lt;/math&amp;gt; is a permutation of another integer sequence &amp;lt;math&amp;gt;b_1,...,b_n&amp;lt;/math&amp;gt;. Give upper bounds on the time complexity and the error probability.&lt;br /&gt;
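&lt;br /&gt;
One natural fingerprinting solution, as a minimal Python sketch (our code; the particular prime is an arbitrary choice, and the entries are assumed to be non-negative integers below the modulus, since otherwise they should first be hashed into that range).&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 def probably_permutation(a, b):&lt;br /&gt;
     # a is a permutation of b iff the polynomials prod (x - a_i) and&lt;br /&gt;
     # prod (x - b_i) coincide; compare them at a random point modulo a&lt;br /&gt;
     # prime P, giving one-sided error at most n/P per run (Schwartz-Zippel).&lt;br /&gt;
     if len(a) != len(b):&lt;br /&gt;
         return False&lt;br /&gt;
     P = 2**61 - 1&lt;br /&gt;
     r = random.randrange(P)&lt;br /&gt;
     fa = fb = 1&lt;br /&gt;
     for ai, bi in zip(a, b):&lt;br /&gt;
         fa = fa * ((r - ai) % P) % P&lt;br /&gt;
         fb = fb * ((r - bi) % P) % P&lt;br /&gt;
     return fa == fb&lt;br /&gt;
Each run costs &amp;lt;math&amp;gt;O(n)&amp;lt;/math&amp;gt; arithmetic operations, and &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; independent repetitions drive the error probability below &amp;lt;math&amp;gt;(n/P)^t&amp;lt;/math&amp;gt;.&lt;br /&gt;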
&lt;br /&gt;
== Problem 3 (Hashing) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12600</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12600"/>
		<updated>2024-09-28T03:00:12Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Problem 2 (Fingerprinting) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Each problem must be answered with complete working; solutions may be written in either Chinese or English.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your homework with LaTeX, Markdown, or similar tools.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
[&#039;&#039;&#039;Counting &amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;-approximate min-cut&#039;&#039;&#039;] For any &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha \ge 1&amp;lt;/math&amp;gt;&#039;&#039;&#039;, a cut is called an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; if the number of edges in it is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039; times that of the min-cut.  Prove that the number of &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cuts in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;n^{2\alpha} / 2&amp;lt;/math&amp;gt;&#039;&#039;&#039;.  (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: Run Karger&#039;s algorithm until it has &#039;&#039;&#039;&amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt;&#039;&#039;&#039; supernodes. What is the chance that a particular &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)&lt;br /&gt;
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
Design a randomized algorithm to decide if an integer sequence &amp;lt;math&amp;gt;a_1,...,a_n&amp;lt;/math&amp;gt; is a permutation of another integer sequence &amp;lt;math&amp;gt;b_1,...,b_n&amp;lt;/math&amp;gt;. Give upper bounds on the time complexity and the error probability.&lt;br /&gt;
&lt;br /&gt;
== Problem 3 (Hashing) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12599</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12599"/>
		<updated>2024-09-28T02:08:08Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Problem 1 (Min-cut/Max-cut) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Each problem must be answered with complete working; solutions may be written in either Chinese or English.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your homework with LaTeX, Markdown, or similar tools.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
[&#039;&#039;&#039;Counting &amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;-approximate min-cut&#039;&#039;&#039;] For any &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha \ge 1&amp;lt;/math&amp;gt;&#039;&#039;&#039;, a cut is called an &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; if the number of edges in it is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039; times that of the min-cut.  Prove that the number of &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cuts in a multigraph &#039;&#039;&#039;&amp;lt;math&amp;gt;G&amp;lt;/math&amp;gt;&#039;&#039;&#039; is at most &#039;&#039;&#039;&amp;lt;math&amp;gt;n^{2\alpha} / 2&amp;lt;/math&amp;gt;&#039;&#039;&#039;.  (&#039;&#039;&#039;&#039;&#039;Hint&#039;&#039;&#039;&#039;&#039;: Run Karger&#039;s algorithm until it has &#039;&#039;&#039;&amp;lt;math&amp;gt;\lceil 2\alpha \rceil&amp;lt;/math&amp;gt;&#039;&#039;&#039; supernodes. What is the chance that a particular &#039;&#039;&#039;&amp;lt;math&amp;gt;\alpha&amp;lt;/math&amp;gt;&#039;&#039;&#039;-approximate min-cut is still available? How many possible cuts does this collapsed graph have?)&lt;br /&gt;
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 3 (Hashing) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12596</id>
		<title>高级算法 (Fall 2024)/Problem Set 1</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)/Problem_Set_1&amp;diff=12596"/>
		<updated>2024-09-27T06:38:21Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Created page with &amp;quot;*每道题目的解答都要有完整的解题过程，中英文不限。  *我们推荐大家使用LaTeX, markdown等对作业进行排版。  == Problem 1 (Min-cut/Max-cut) ==  == Problem 2 (Fingerprinting) ==  == Problem 3 (Hashing) ==  == Problem 4 (Concentration of measure) ==  == Problem 5 (Dimension reduction) ==&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Each problem must be answered with complete working; solutions may be written in either Chinese or English.&lt;br /&gt;
&lt;br /&gt;
*We recommend typesetting your homework with LaTeX, Markdown, or similar tools.&lt;br /&gt;
&lt;br /&gt;
== Problem 1 (Min-cut/Max-cut) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 2 (Fingerprinting) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 3 (Hashing) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 4 (Concentration of measure) ==&lt;br /&gt;
&lt;br /&gt;
== Problem 5 (Dimension reduction) ==&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)&amp;diff=12565</id>
		<title>高级算法 (Fall 2024)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E9%AB%98%E7%BA%A7%E7%AE%97%E6%B3%95_(Fall_2024)&amp;diff=12565"/>
		<updated>2024-09-07T09:07:39Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;高级算法 &lt;br /&gt;
&amp;lt;br&amp;gt;Advanced Algorithms&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = &#039;&#039;&#039;尹一通&#039;&#039;&#039;&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = yinyt@nju.edu.cn &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= office&lt;br /&gt;
|data4= 计算机系 804&lt;br /&gt;
|header5 = &lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &#039;&#039;&#039;栗师&#039;&#039;&#039;&lt;br /&gt;
|header6 = &lt;br /&gt;
|label6  = Email&lt;br /&gt;
|data6   = shili@nju.edu.cn &lt;br /&gt;
|header7 =&lt;br /&gt;
|label7= office&lt;br /&gt;
|data7= 计算机系 605&lt;br /&gt;
|header8 = &lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &#039;&#039;&#039;刘景铖&#039;&#039;&#039;&lt;br /&gt;
|header9 = &lt;br /&gt;
|label9  = Email&lt;br /&gt;
|data9   = liu@nju.edu.cn &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10= office&lt;br /&gt;
|data10= 计算机系 516&lt;br /&gt;
|header11 = Class&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   = &lt;br /&gt;
|header12 =&lt;br /&gt;
|label12  = Class meetings&lt;br /&gt;
|data12   = Monday (odd weeks), 4pm-6pm &amp;lt;br&amp;gt; Thursday, 2pm-4pm &amp;lt;br&amp;gt;仙Ⅰ-206&lt;br /&gt;
|header13 =&lt;br /&gt;
|label13  = Place&lt;br /&gt;
|data13   = &lt;br /&gt;
|header14 =&lt;br /&gt;
|label14  = Office hours&lt;br /&gt;
|data14   = Monday, 2pm-4pm, &amp;lt;br&amp;gt;计算机系 804&amp;lt;br&amp;gt;&lt;br /&gt;
|header15 = Textbooks&lt;br /&gt;
|label15  = &lt;br /&gt;
|data15   = &lt;br /&gt;
|header16 =&lt;br /&gt;
|label16  = &lt;br /&gt;
|data16   = [[File:MR-randomized-algorithms.png|border|100px]]&lt;br /&gt;
|header17 =&lt;br /&gt;
|label17  = &lt;br /&gt;
|data17   = Motwani and Raghavan. &amp;lt;br&amp;gt;&#039;&#039;Randomized Algorithms&#039;&#039;.&amp;lt;br&amp;gt; Cambridge Univ Press, 1995.&lt;br /&gt;
|header18 =&lt;br /&gt;
|label18  = &lt;br /&gt;
|data18   = [[File:Approximation_Algorithms.jpg|border|100px]]&lt;br /&gt;
|header19 =&lt;br /&gt;
|label19  = &lt;br /&gt;
|data19   =  Vazirani. &amp;lt;br&amp;gt;&#039;&#039;Approximation Algorithms&#039;&#039;. &amp;lt;br&amp;gt; Springer-Verlag, 2001.&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
This is the webpage for the &#039;&#039;Advanced Algorithms&#039;&#039; class of Fall 2024. Students who take this class should check this page periodically for content updates and new announcements. &lt;br /&gt;
&lt;br /&gt;
= Announcement =&lt;br /&gt;
&lt;br /&gt;
* TBA&lt;br /&gt;
&lt;br /&gt;
= Course info =&lt;br /&gt;
* &#039;&#039;&#039;Instructor &#039;&#039;&#039;: &lt;br /&gt;
:* [http://tcs.nju.edu.cn/yinyt/ 尹一通]：[mailto:yinyt@nju.edu.cn &amp;lt;yinyt@nju.edu.cn&amp;gt;]，计算机系 804 &lt;br /&gt;
:*[https://tcs.nju.edu.cn/shili/ 栗师]：[mailto:shili@nju.edu.cn &amp;lt;shili@nju.edu.cn&amp;gt;]，计算机系 605&lt;br /&gt;
:* [https://liuexp.github.io 刘景铖]：[mailto:liu@nju.edu.cn &amp;lt;liu@nju.edu.cn&amp;gt;]，计算机系 516 &lt;br /&gt;
* &#039;&#039;&#039;Teaching Assistant&#039;&#039;&#039;: &lt;br /&gt;
** 于逸潇：yixiaoyu@smail.nju.edu.cn&lt;br /&gt;
** 张弈垚：zhangyiyao@smail.nju.edu.cn&lt;br /&gt;
* &#039;&#039;&#039;Class meeting&#039;&#039;&#039;: &lt;br /&gt;
** Monday (odd weeks), 4pm-6pm, 仙Ⅰ-206&lt;br /&gt;
** Thursday, 2pm-4pm, 仙Ⅰ-206&lt;br /&gt;
* &#039;&#039;&#039;Office hour&#039;&#039;&#039;: Monday, 2pm-4pm, 计算机系 804&lt;br /&gt;
* &#039;&#039;&#039;QQ group&#039;&#039;&#039;: 757436140&lt;br /&gt;
&lt;br /&gt;
= Syllabus =&lt;br /&gt;
As the theory of computer algorithms keeps developing, the design and analysis of modern algorithms make heavy use of non-elementary mathematical tools and non-traditional algorithmic ideas. The course “Advanced Algorithms” was created in response to this trend: it gives a systematic treatment of advanced algorithm-design ideas and analysis tools that traditional algorithms courses do not cover systematically, yet which play an important role in research and practice across the subfields of computer science.&lt;br /&gt;
&lt;br /&gt;
=== 先修课程 Prerequisites ===&lt;br /&gt;
* Required: discrete mathematics, probability theory, and linear algebra.&lt;br /&gt;
* Recommended: design and analysis of algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Course materials ===&lt;br /&gt;
* [[高级算法 (Fall 2024) / Course materials|&amp;lt;font size=3&amp;gt;Textbooks and references&amp;lt;/font&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
=== 成绩 Grades ===&lt;br /&gt;
* Course grade: there will be several homework assignments and one final exam; the final grade combines the homework scores with the final-exam score.&lt;br /&gt;
* Late submissions: if special circumstances prevent you from finishing an assignment on time, contact the instructor in advance and give a valid reason; otherwise, late homework will not be accepted.&lt;br /&gt;
&lt;br /&gt;
=== &amp;lt;font color=red&amp;gt; 学术诚信 Academic Integrity &amp;lt;/font&amp;gt;===&lt;br /&gt;
Academic integrity is the most basic professional and ethical baseline for every student and scholar engaged in academic work. This course will spare no effort in upholding the norms of academic integrity, and behavior that crosses this line will not be tolerated.&lt;br /&gt;
&lt;br /&gt;
Principles for completing homework: work that bears your name must be your own contribution. Discussion is allowed while working on assignments, provided that all participants are at a comparable stage of completion; however, carrying out the key ideas and writing the submitted text must be done independently, and you must acknowledge in your submission everyone who took part in the discussion. No other form of collaboration is allowed, in particular no “discussion” with classmates who have already finished the assignment.&lt;br /&gt;
&lt;br /&gt;
This course takes a zero-tolerance stance on plagiarism. When completing assignments, direct textual copying from the work of others (publications, material from the internet, the homework of other students, and so on), as well as copying of key ideas or key elements, all count as plagiarism under the interpretation of the [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism]. Plagiarists will have their grades voided. If mutual copying is discovered, &amp;lt;font color=red&amp;gt; the grades of both the copying and the copied party will be voided&amp;lt;/font&amp;gt;. Please therefore take active steps to keep your homework from being copied by others.&lt;br /&gt;
&lt;br /&gt;
Academic integrity concerns both the personal character of each student and the healthy functioning of the entire educational system. Committing academic misconduct for the sake of a few points not only turns one into a cheater but also strips meaning from the honest efforts of others. Let us work together to maintain an environment of integrity.&lt;br /&gt;
&lt;br /&gt;
= Assignments =&lt;br /&gt;
* TBA&lt;br /&gt;
&lt;br /&gt;
= Lecture Notes =&lt;br /&gt;
# [[高级算法 (Fall 2024)/Min Cut, Max Cut, and Spectral Cut|Min Cut, Max Cut, and Spectral Cut]] ([http://tcs.nju.edu.cn/slides/aa2024/Cut.pdf slides])&lt;br /&gt;
#*  [[高级算法 (Fall 2024)/Probability Basics|Probability basics]]&lt;br /&gt;
&lt;br /&gt;
= Related Online Courses=&lt;br /&gt;
* [https://www.cs.cmu.edu/~15850/ Advanced Algorithms] by Anupam Gupta at CMU.&lt;br /&gt;
* [http://people.csail.mit.edu/moitra/854.html Advanced Algorithms] by Ankur Moitra at MIT.&lt;br /&gt;
* [http://courses.csail.mit.edu/6.854/current/ Advanced Algorithms] by David Karger and Aleksander Mądry at MIT.&lt;br /&gt;
* [http://web.stanford.edu/class/cs168/index.html The Modern Algorithmic Toolbox] by Tim Roughgarden and Gregory Valiant at Stanford.&lt;br /&gt;
* [https://www.cs.princeton.edu/courses/archive/fall18/cos521/ Advanced Algorithm Design] by Pravesh Kothari and Christopher Musco at Princeton.&lt;br /&gt;
* [http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/ Linear and Semidefinite Programming (Advanced Algorithms)] by Anupam Gupta and Ryan O&#039;Donnell at CMU.&lt;br /&gt;
* [https://www.cs.cmu.edu/~odonnell/papers/cs-theory-toolkit-lecture-notes.pdf CS Theory Toolkit] by Ryan O&#039;Donnell at CMU.&lt;br /&gt;
* [https://cs.uwaterloo.ca/~lapchi/cs860/index.html Eigenvalues and Polynomials] by Lap Chi Lau at University of Waterloo.&lt;br /&gt;
* The [https://www.cs.cornell.edu/jeh/book.pdf &amp;quot;Foundations of Data Science&amp;quot; book] by Avrim Blum, John Hopcroft, and Ravindran Kannan.&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)/Homework7_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=12519</id>
		<title>计算方法 Numerical method (Spring 2024)/Homework7 提交名单</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)/Homework7_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=12519"/>
		<updated>2024-06-12T02:34:07Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Created page with &amp;quot; 如有错漏请邮件联系助教. &amp;lt;center&amp;gt; {| class=&amp;quot;wikitable&amp;quot; |- ! 学号 !! 姓名 |- | 191240047 || 孙宇飞  |- | 201240036 || 钱儒凡  |- | 201240090 || 陈诺星  |- | 211240013 || 李昀芃  |- | 211240020 || 朱睿骐  |- | 211240035 || 王祉天  |- | 211294003 || 倪昀  |- | 221180133 || 黄可唯  |- | 221220002 || 沈均文  |- | 221220003 || 林涵坤  |- | 221220005 || 刘稼新  |- | 221220019 || 洪观澜  |- | 221220027 || 蒋宇阳  |- | 22122002...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt; If there are any errors or omissions, please contact the TAs by email.&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Student ID !! Name&lt;br /&gt;
|-&lt;br /&gt;
| 191240047 || 孙宇飞 &lt;br /&gt;
|-&lt;br /&gt;
| 201240036 || 钱儒凡 &lt;br /&gt;
|-&lt;br /&gt;
| 201240090 || 陈诺星 &lt;br /&gt;
|-&lt;br /&gt;
| 211240013 || 李昀芃 &lt;br /&gt;
|-&lt;br /&gt;
| 211240020 || 朱睿骐 &lt;br /&gt;
|-&lt;br /&gt;
| 211240035 || 王祉天 &lt;br /&gt;
|-&lt;br /&gt;
| 211294003 || 倪昀 &lt;br /&gt;
|-&lt;br /&gt;
| 221180133 || 黄可唯 &lt;br /&gt;
|-&lt;br /&gt;
| 221220002 || 沈均文 &lt;br /&gt;
|-&lt;br /&gt;
| 221220003 || 林涵坤 &lt;br /&gt;
|-&lt;br /&gt;
| 221220005 || 刘稼新 &lt;br /&gt;
|-&lt;br /&gt;
| 221220019 || 洪观澜 &lt;br /&gt;
|-&lt;br /&gt;
| 221220027 || 蒋宇阳 &lt;br /&gt;
|-&lt;br /&gt;
| 221220029 || 陈俊翰 &lt;br /&gt;
|-&lt;br /&gt;
| 221220033 || 孙一鸣 &lt;br /&gt;
|-&lt;br /&gt;
| 221220034 || 王旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221220035 || 朱一宝 &lt;br /&gt;
|-&lt;br /&gt;
| 221220054 || 董旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221220067 || 刘思远 &lt;br /&gt;
|-&lt;br /&gt;
| 221220073 || 任卫洪 &lt;br /&gt;
|-&lt;br /&gt;
| 221220076 || 落华栋 &lt;br /&gt;
|-&lt;br /&gt;
| 221220092 || 谷莘 &lt;br /&gt;
|-&lt;br /&gt;
| 221220118 || 但佳霖 &lt;br /&gt;
|-&lt;br /&gt;
| 221220125 || 饶博文 &lt;br /&gt;
|-&lt;br /&gt;
| 221220142 || 欧阳瑞泽 &lt;br /&gt;
|-&lt;br /&gt;
| 221220151 || 侯君瑜 &lt;br /&gt;
|-&lt;br /&gt;
| 221220156 || 陈伯昆 &lt;br /&gt;
|-&lt;br /&gt;
| 221220158 || 颜伟坤 &lt;br /&gt;
|-&lt;br /&gt;
| 221240001 || 王炳旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221240008 || 胡佳昕 &lt;br /&gt;
|-&lt;br /&gt;
| 221240022 || 韩瑞 &lt;br /&gt;
|-&lt;br /&gt;
| 221240023 || 蒋耀瑾 &lt;br /&gt;
|-&lt;br /&gt;
| 221240024 || 唐之尧 &lt;br /&gt;
|-&lt;br /&gt;
| 221240026 || 刘俨东 &lt;br /&gt;
|-&lt;br /&gt;
| 221240027 || 唐诗博 &lt;br /&gt;
|-&lt;br /&gt;
| 221240032 || 蔡坤志 &lt;br /&gt;
|-&lt;br /&gt;
| 221240035 || 李想 &lt;br /&gt;
|-&lt;br /&gt;
| 221240041 || 周越洋 &lt;br /&gt;
|-&lt;br /&gt;
| 221240047 || 孙梓洋 &lt;br /&gt;
|-&lt;br /&gt;
| 221240066 || 张植翔 &lt;br /&gt;
|-&lt;br /&gt;
| 221240074 || 曹任飞 &lt;br /&gt;
|-&lt;br /&gt;
| 221240089 || 杨周宇霄 &lt;br /&gt;
|-&lt;br /&gt;
| 221240092 || 杨煜申 &lt;br /&gt;
|-&lt;br /&gt;
| 221240093 || 陈力峥 &lt;br /&gt;
|-&lt;br /&gt;
| 221502001 || 赵子轩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502004 || 李梓荣 &lt;br /&gt;
|-&lt;br /&gt;
| 221502005 || 王昕浩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502006 || 张文权 &lt;br /&gt;
|-&lt;br /&gt;
| 221502008 || 梁今为 &lt;br /&gt;
|-&lt;br /&gt;
| 221502009 || 李嘉洲 &lt;br /&gt;
|-&lt;br /&gt;
| 221502010 || 梁志浩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502014 || 施翔 &lt;br /&gt;
|-&lt;br /&gt;
| 221502017 || 卢君和 &lt;br /&gt;
|-&lt;br /&gt;
| 221502018 || 陈正道 &lt;br /&gt;
|-&lt;br /&gt;
| 221502020 || 李维岩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502024 || 杨栋凯 &lt;br /&gt;
|-&lt;br /&gt;
| 221502025 || 宋相廷 &lt;br /&gt;
|-&lt;br /&gt;
| 221830206 || 李君羡 &lt;br /&gt;
|-&lt;br /&gt;
| 221840201 || 钟锦立 &lt;br /&gt;
|-&lt;br /&gt;
| 221840223 || 王逸飞 &lt;br /&gt;
|-&lt;br /&gt;
| 221840262 || 孙纯洁 &lt;br /&gt;
|-&lt;br /&gt;
| 221840315 || 凌枫 &lt;br /&gt;
|-&lt;br /&gt;
| 221850025 || 高维康 &lt;br /&gt;
|-&lt;br /&gt;
| 221870199 || 黄一凡 &lt;br /&gt;
|-&lt;br /&gt;
| 221870222 || 刘烨 &lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
65 students in total&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)&amp;diff=12518</id>
		<title>计算方法 Numerical method (Spring 2024)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)&amp;diff=12518"/>
		<updated>2024-06-12T02:33:42Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesday 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-303&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;Numerical Analysis (数值分析), 2nd edition.&amp;lt;br&amp;gt; 机械工业出版社 (China Machine Press).&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 傅心语，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 傅心语，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2024@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesday 14:00-16:00, 仙 Ⅱ-303&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesday 16:00-18:00, 计算机系 516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 855212527 (state your name, major, and student ID when joining)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*Numerical Analysis (数值分析), 2nd edition. Timothy Sauer. 机械工业出版社 (China Machine Press).&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx=b, Laplacian Solver and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, please contact the TAs. (English editions only.)&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches but your solution must be written by you and you only. You should acknowledge everyone whom you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use Github or another source control system to store your solutions electronically, you must ensure your account is configured so your solutions are not publicly visible. Many popular version control systems provide free repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late-submission requests ONLY IF you make such requests ahead of time. &lt;br /&gt;
&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments1 new.pdf| Homework1]] Please submit to nm_nju_2024@163.com by 23:59 on March 12, 2024 (file name: &#039;学号_姓名_A1.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework1 提交名单| Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2024@163.com by 23:59 on March 26, 2024 (file name: &#039;学号_姓名_A2.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework2 提交名单| Homework2 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 3.pdf| Homework3]] Please submit to nm_nju_2024@163.com by 23:59 on April 9, 2024 (file name: &#039;学号_姓名_A3.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework3 提交名单| Homework3 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments4.pdf| Homework4]] Please submit to nm_nju_2024@163.com by 23:59 on May 1, 2024 (file name: &#039;学号_姓名_A4.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework4 提交名单| Homework4 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 5.pdf| Homework5]] Please submit to nm_nju_2024@163.com by 23:59 on May 14, 2024 (file name: &#039;学号_姓名_A5.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework5 提交名单| Homework5 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 6.pdf| Homework6]] Please submit to nm_nju_2024@163.com by 23:59 on May 28, 2024 (file name: &#039;学号_姓名_A6.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework6 提交名单| Homework6 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 7.pdf| Homework7]] Please submit to nm_nju_2024@163.com by 23:59 on June 11, 2024 (file name: &#039;学号_姓名_A7.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework7 提交名单| Homework7 submission list]]&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have problems downloading the slides, please contact the TAs promptly.&lt;br /&gt;
	&lt;br /&gt;
# [[Media:计算方法1-课程简介-Spring2024.pdf| Course introduction, root finding]]&lt;br /&gt;
# [[Media:计算方法2-插值-Spring2024.pdf| Newton&#039;s method, interpolation, secret sharing, error-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-Chebyshev-Spring2024.pdf| Chebyshev interpolation and polynomials, norms]]&lt;br /&gt;
# [[Media:计算方法4-最小二乘法.pdf|Least squares, Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-高斯消元.pdf|FFT, Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-条件数和迭代法.pdf|Operator norms, condition numbers, and iterative methods]]&lt;br /&gt;
# [[Media:计算方法 7-特征值与幂迭代.pdf|Eigenvalues and power iteration]]&lt;br /&gt;
# [[Media:计算方法 8-特征值的其它迭代方法与SVD.pdf|Other eigenvalue iterations and the SVD]]&lt;br /&gt;
#* Further reading: [https://web.stanford.edu/class/cs168/l/l9.pdf lecture notes by Tim Roughgarden and Greg Valiant on matrix completion]&lt;br /&gt;
# [[Media:计算方法9-ConjugateGradient.pdf | Iterative methods for linear systems: gradient descent and conjugate gradient]]&lt;br /&gt;
# [[Media:计算方法10-2024.pdf | A special case of power iteration: random walks and Markov chains]]&lt;br /&gt;
# [[Media:计算方法11-谱图论-2024.pdf |Spectral graph theory]]&lt;br /&gt;
# [[Media:计算方法12-电阻电路网络-2024.pdf |Resistor networks]]&lt;br /&gt;
# [[Media:计算方法13-hitting time and LP-2024.pdf|Hitting times, cover times, and an introduction to linear programming]]&lt;br /&gt;
# [[Media:计算方法14-duality-2024.pdf|Duality: principle and applications]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)/Homework6_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=12474</id>
		<title>计算方法 Numerical method (Spring 2024)/Homework6 提交名单</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)/Homework6_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=12474"/>
		<updated>2024-05-29T06:18:04Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Created page with &amp;quot; 如有错漏请邮件联系助教. &amp;lt;center&amp;gt; {| class=&amp;quot;wikitable&amp;quot; |- ! 学号 !! 姓名 |- | 191240047 || 孙宇飞  |- | 201240036 || 钱儒凡  |- | 201240090 || 陈诺星  |- | 211240013 || 李昀芃  |- | 211240020 || 朱睿骐  |- | 211240035 || 王祉天  |- | 211294003 || 倪昀  |- | 221180133 || 黄可唯  |- | 221220002 || 沈均文  |- | 221220003 || 林涵坤  |- | 221220019 || 洪观澜  |- | 221220027 || 蒋宇阳  |- | 221220029 || 陈俊翰  |- | 22122003...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt; If you find any errors or omissions, please email the TAs.&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Student ID !! Name&lt;br /&gt;
|-&lt;br /&gt;
| 191240047 || 孙宇飞 &lt;br /&gt;
|-&lt;br /&gt;
| 201240036 || 钱儒凡 &lt;br /&gt;
|-&lt;br /&gt;
| 201240090 || 陈诺星 &lt;br /&gt;
|-&lt;br /&gt;
| 211240013 || 李昀芃 &lt;br /&gt;
|-&lt;br /&gt;
| 211240020 || 朱睿骐 &lt;br /&gt;
|-&lt;br /&gt;
| 211240035 || 王祉天 &lt;br /&gt;
|-&lt;br /&gt;
| 211294003 || 倪昀 &lt;br /&gt;
|-&lt;br /&gt;
| 221180133 || 黄可唯 &lt;br /&gt;
|-&lt;br /&gt;
| 221220002 || 沈均文 &lt;br /&gt;
|-&lt;br /&gt;
| 221220003 || 林涵坤 &lt;br /&gt;
|-&lt;br /&gt;
| 221220019 || 洪观澜 &lt;br /&gt;
|-&lt;br /&gt;
| 221220027 || 蒋宇阳 &lt;br /&gt;
|-&lt;br /&gt;
| 221220029 || 陈俊翰 &lt;br /&gt;
|-&lt;br /&gt;
| 221220033 || 孙一鸣 &lt;br /&gt;
|-&lt;br /&gt;
| 221220034 || 王旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221220035 || 朱一宝 &lt;br /&gt;
|-&lt;br /&gt;
| 221220067 || 刘思远 &lt;br /&gt;
|-&lt;br /&gt;
| 221220073 || 任卫洪 &lt;br /&gt;
|-&lt;br /&gt;
| 221220076 || 落华栋 &lt;br /&gt;
|-&lt;br /&gt;
| 221220092 || 谷莘 &lt;br /&gt;
|-&lt;br /&gt;
| 221220095 || 曾凡俊 &lt;br /&gt;
|-&lt;br /&gt;
| 221220106 || 黄伟贤 &lt;br /&gt;
|-&lt;br /&gt;
| 221220118 || 但佳霖 &lt;br /&gt;
|-&lt;br /&gt;
| 221220125 || 饶博文 &lt;br /&gt;
|-&lt;br /&gt;
| 221220142 || 欧阳瑞泽 &lt;br /&gt;
|-&lt;br /&gt;
| 221220151 || 侯君瑜 &lt;br /&gt;
|-&lt;br /&gt;
| 221220156 || 陈伯昆 &lt;br /&gt;
|-&lt;br /&gt;
| 221220158 || 颜伟坤 &lt;br /&gt;
|-&lt;br /&gt;
| 221240001 || 王炳旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221240008 || 胡佳昕 &lt;br /&gt;
|-&lt;br /&gt;
| 221240022 || 韩瑞 &lt;br /&gt;
|-&lt;br /&gt;
| 221240023 || 蒋耀瑾 &lt;br /&gt;
|-&lt;br /&gt;
| 221240024 || 唐之尧 &lt;br /&gt;
|-&lt;br /&gt;
| 221240026 || 刘俨东 &lt;br /&gt;
|-&lt;br /&gt;
| 221240027 || 唐诗博 &lt;br /&gt;
|-&lt;br /&gt;
| 221240032 || 蔡坤志 &lt;br /&gt;
|-&lt;br /&gt;
| 221240035 || 李想 &lt;br /&gt;
|-&lt;br /&gt;
| 221240041 || 周越洋 &lt;br /&gt;
|-&lt;br /&gt;
| 221240047 || 孙梓洋 &lt;br /&gt;
|-&lt;br /&gt;
| 221240066 || 张植翔 &lt;br /&gt;
|-&lt;br /&gt;
| 221240074 || 曹任飞 &lt;br /&gt;
|-&lt;br /&gt;
| 221240089 || 杨周宇霄 &lt;br /&gt;
|-&lt;br /&gt;
| 221240092 || 杨煜申 &lt;br /&gt;
|-&lt;br /&gt;
| 221240093 || 陈力峥 &lt;br /&gt;
|-&lt;br /&gt;
| 221502001 || 赵子轩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502004 || 李梓荣 &lt;br /&gt;
|-&lt;br /&gt;
| 221502005 || 王昕浩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502006 || 张文权 &lt;br /&gt;
|-&lt;br /&gt;
| 221502009 || 李嘉洲 &lt;br /&gt;
|-&lt;br /&gt;
| 221502010 || 梁志浩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502014 || 施翔 &lt;br /&gt;
|-&lt;br /&gt;
| 221502015 || 王天宇 &lt;br /&gt;
|-&lt;br /&gt;
| 221502017 || 卢君和 &lt;br /&gt;
|-&lt;br /&gt;
| 221502018 || 陈正道 &lt;br /&gt;
|-&lt;br /&gt;
| 221502020 || 李维岩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502025 || 宋相廷 &lt;br /&gt;
|-&lt;br /&gt;
| 221830012 || 茆弘之 &lt;br /&gt;
|-&lt;br /&gt;
| 221830206 || 李君羡 &lt;br /&gt;
|-&lt;br /&gt;
| 221840201 || 钟锦立 &lt;br /&gt;
|-&lt;br /&gt;
| 221840223 || 王逸飞 &lt;br /&gt;
|-&lt;br /&gt;
| 221840262 || 孙纯洁 &lt;br /&gt;
|-&lt;br /&gt;
| 221840315 || 凌枫 &lt;br /&gt;
|-&lt;br /&gt;
| 221850025 || 高维康 &lt;br /&gt;
|-&lt;br /&gt;
| 221870052 || 吴隽雨 &lt;br /&gt;
|-&lt;br /&gt;
| 221870199 || 黄一凡 &lt;br /&gt;
|-&lt;br /&gt;
| 221870222 || 刘烨 &lt;br /&gt;
|-&lt;br /&gt;
| 225102007 || 崔毓泽 &lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
Total: 67 students&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)&amp;diff=12473</id>
		<title>计算方法 Numerical method (Spring 2024)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)&amp;diff=12473"/>
		<updated>2024-05-29T05:59:21Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesdays 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-303&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;Numerical Analysis (数值分析), 2nd ed.&amp;lt;br&amp;gt; China Machine Press (机械工业出版社).&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 傅心语，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 傅心语，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2024@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesdays 14:00-16:00, 仙 Ⅱ-303&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesdays 16:00-18:00, 计算机系516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 855212527 (when joining, provide your name, major, and student ID)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*Numerical Analysis (数值分析), 2nd ed. Timothy Sauer. China Machine Press (机械工业出版社).&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx = b: Laplacian Solvers and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, you may contact the TAs (English editions only).&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches, but your solution must be written by you and only you. You should acknowledge everyone you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source-control system to store your solutions electronically, you must ensure that your account is configured so your solutions are not publicly visible. Many popular version-control services provide free private repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late-submission requests ONLY IF you make such requests ahead of time. &lt;br /&gt;
&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments1 new.pdf| Homework1]] Please submit to nm_nju_2024@163.com by 23:59 on March 12, 2024 (file name: &#039;学号_姓名_A1.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework1 提交名单| Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2024@163.com by 23:59 on March 26, 2024 (file name: &#039;学号_姓名_A2.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework2 提交名单| Homework2 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 3.pdf| Homework3]] Please submit to nm_nju_2024@163.com by 23:59 on April 9, 2024 (file name: &#039;学号_姓名_A3.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework3 提交名单| Homework3 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments4.pdf| Homework4]] Please submit to nm_nju_2024@163.com by 23:59 on May 1, 2024 (file name: &#039;学号_姓名_A4.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework4 提交名单| Homework4 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 5.pdf| Homework5]] Please submit to nm_nju_2024@163.com by 23:59 on May 14, 2024 (file name: &#039;学号_姓名_A5.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework5 提交名单| Homework5 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 6.pdf| Homework6]] Please submit to nm_nju_2024@163.com by 23:59 on May 28, 2024 (file name: &#039;学号_姓名_A6.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework6 提交名单| Homework6 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 7.pdf| Homework7]] Please submit to nm_nju_2024@163.com by 23:59 on June 11, 2024 (file name: &#039;学号_姓名_A7.pdf&#039;)&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have problems downloading the slides, please contact the TAs promptly.&lt;br /&gt;
	&lt;br /&gt;
# [[Media:计算方法1-课程简介-Spring2024.pdf| Course introduction, root finding]]&lt;br /&gt;
# [[Media:计算方法2-插值-Spring2024.pdf| Newton&#039;s method, interpolation, secret sharing, error-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-Chebyshev-Spring2024.pdf| Chebyshev interpolation and polynomials, norms]]&lt;br /&gt;
# [[Media:计算方法4-最小二乘法.pdf|Least squares, Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-高斯消元.pdf|FFT, Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-条件数和迭代法.pdf|Operator norms, condition numbers, and iterative methods]]&lt;br /&gt;
# [[Media:计算方法 7-特征值与幂迭代.pdf|Eigenvalues and power iteration]]&lt;br /&gt;
# [[Media:计算方法 8-特征值的其它迭代方法与SVD.pdf|Other eigenvalue iterations and the SVD]]&lt;br /&gt;
#* Further reading: [https://web.stanford.edu/class/cs168/l/l9.pdf lecture notes by Tim Roughgarden and Greg Valiant on matrix completion]&lt;br /&gt;
# [[Media:计算方法9-ConjugateGradient.pdf | Iterative methods for linear systems: gradient descent and conjugate gradient]]&lt;br /&gt;
# [[Media:计算方法10-2024.pdf | A special case of power iteration: random walks and Markov chains]]&lt;br /&gt;
# [[Media:计算方法11-谱图论-2024.pdf |Spectral graph theory]]&lt;br /&gt;
# [[Media:计算方法12-电阻电路网络-2024.pdf |Resistor networks]]&lt;br /&gt;
# [[Media:计算方法13-hitting time and LP-2024.pdf|Hitting times, cover times, and an introduction to linear programming]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12459</id>
		<title>File:Computational Method 2024 Assignments 6.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12459"/>
		<updated>2024-05-26T09:05:59Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks uploaded a new version of File:Computational Method 2024 Assignments 6.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12413</id>
		<title>File:Computational Method 2024 Assignments 6.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12413"/>
		<updated>2024-05-20T14:38:50Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks uploaded a new version of File:Computational Method 2024 Assignments 6.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12406</id>
		<title>File:Computational Method 2024 Assignments 6.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12406"/>
		<updated>2024-05-15T07:07:55Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks uploaded a new version of File:Computational Method 2024 Assignments 6.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)/Homework5_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=12399</id>
		<title>计算方法 Numerical method (Spring 2024)/Homework5 提交名单</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)/Homework5_%E6%8F%90%E4%BA%A4%E5%90%8D%E5%8D%95&amp;diff=12399"/>
		<updated>2024-05-14T16:06:29Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Created page with &amp;quot; 如有错漏请邮件联系助教. &amp;lt;center&amp;gt; {| class=&amp;quot;wikitable&amp;quot; |- ! 学号 !! 姓名 |- | 191240047 || 孙宇飞  |- | 201240036 || 钱儒凡  |- | 211240013 || 李昀芃  |- | 211240020 || 朱睿骐  |- | 211240035 || 王祉天  |- | 211294003 || 倪昀  |- | 221180133 || 黄可唯  |- | 221220002 || 沈均文  |- | 221220003 || 林涵坤  |- | 221220019 || 洪观澜  |- | 221220027 || 蒋宇阳  |- | 221220029 || 陈俊翰  |- | 221220033 || 孙一鸣  |- | 22122003...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt; If you find any errors or omissions, please email the TAs.&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Student ID !! Name&lt;br /&gt;
|-&lt;br /&gt;
| 191240047 || 孙宇飞 &lt;br /&gt;
|-&lt;br /&gt;
| 201240036 || 钱儒凡 &lt;br /&gt;
|-&lt;br /&gt;
| 211240013 || 李昀芃 &lt;br /&gt;
|-&lt;br /&gt;
| 211240020 || 朱睿骐 &lt;br /&gt;
|-&lt;br /&gt;
| 211240035 || 王祉天 &lt;br /&gt;
|-&lt;br /&gt;
| 211294003 || 倪昀 &lt;br /&gt;
|-&lt;br /&gt;
| 221180133 || 黄可唯 &lt;br /&gt;
|-&lt;br /&gt;
| 221220002 || 沈均文 &lt;br /&gt;
|-&lt;br /&gt;
| 221220003 || 林涵坤 &lt;br /&gt;
|-&lt;br /&gt;
| 221220019 || 洪观澜 &lt;br /&gt;
|-&lt;br /&gt;
| 221220027 || 蒋宇阳 &lt;br /&gt;
|-&lt;br /&gt;
| 221220029 || 陈俊翰 &lt;br /&gt;
|-&lt;br /&gt;
| 221220033 || 孙一鸣 &lt;br /&gt;
|-&lt;br /&gt;
| 221220034 || 王旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221220035 || 朱一宝 &lt;br /&gt;
|-&lt;br /&gt;
| 221220054 || 董旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221220067 || 刘思远 &lt;br /&gt;
|-&lt;br /&gt;
| 221220073 || 任卫洪 &lt;br /&gt;
|-&lt;br /&gt;
| 221220076 || 落华栋 &lt;br /&gt;
|-&lt;br /&gt;
| 221220092 || 谷莘 &lt;br /&gt;
|-&lt;br /&gt;
| 221220095 || 曾凡俊 &lt;br /&gt;
|-&lt;br /&gt;
| 221220106 || 黄伟贤 &lt;br /&gt;
|-&lt;br /&gt;
| 221220118 || 但佳霖 &lt;br /&gt;
|-&lt;br /&gt;
| 221220125 || 饶博文 &lt;br /&gt;
|-&lt;br /&gt;
| 221220142 || 欧阳瑞泽 &lt;br /&gt;
|-&lt;br /&gt;
| 221220151 || 侯君瑜 &lt;br /&gt;
|-&lt;br /&gt;
| 221220156 || 陈伯昆 &lt;br /&gt;
|-&lt;br /&gt;
| 221220158 || 颜伟坤 &lt;br /&gt;
|-&lt;br /&gt;
| 221240001 || 王炳旭 &lt;br /&gt;
|-&lt;br /&gt;
| 221240008 || 胡佳昕 &lt;br /&gt;
|-&lt;br /&gt;
| 221240022 || 韩瑞 &lt;br /&gt;
|-&lt;br /&gt;
| 221240023 || 蒋耀瑾 &lt;br /&gt;
|-&lt;br /&gt;
| 221240024 || 唐之尧 &lt;br /&gt;
|-&lt;br /&gt;
| 221240026 || 刘俨东 &lt;br /&gt;
|-&lt;br /&gt;
| 221240027 || 唐诗博 &lt;br /&gt;
|-&lt;br /&gt;
| 221240032 || 蔡坤志 &lt;br /&gt;
|-&lt;br /&gt;
| 221240035 || 李想 &lt;br /&gt;
|-&lt;br /&gt;
| 221240041 || 周越洋 &lt;br /&gt;
|-&lt;br /&gt;
| 221240043 || 袁汉峙 &lt;br /&gt;
|-&lt;br /&gt;
| 221240047 || 孙梓洋 &lt;br /&gt;
|-&lt;br /&gt;
| 221240066 || 张植翔 &lt;br /&gt;
|-&lt;br /&gt;
| 221240074 || 曹任飞 &lt;br /&gt;
|-&lt;br /&gt;
| 221240089 || 杨周宇霄 &lt;br /&gt;
|-&lt;br /&gt;
| 221240092 || 杨煜申 &lt;br /&gt;
|-&lt;br /&gt;
| 221240093 || 陈力峥 &lt;br /&gt;
|-&lt;br /&gt;
| 221502001 || 赵子轩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502004 || 李梓荣 &lt;br /&gt;
|-&lt;br /&gt;
| 221502005 || 王昕浩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502006 || 张文权 &lt;br /&gt;
|-&lt;br /&gt;
| 221502007 || 崔毓泽 &lt;br /&gt;
|-&lt;br /&gt;
| 221502008 || 梁今为 &lt;br /&gt;
|-&lt;br /&gt;
| 221502009 || 李嘉洲 &lt;br /&gt;
|-&lt;br /&gt;
| 221502010 || 梁志浩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502014 || 施翔 &lt;br /&gt;
|-&lt;br /&gt;
| 221502015 || 王天宇 &lt;br /&gt;
|-&lt;br /&gt;
| 221502017 || 卢君和 &lt;br /&gt;
|-&lt;br /&gt;
| 221502018 || 陈正道 &lt;br /&gt;
|-&lt;br /&gt;
| 221502019 || 黄诗雅 &lt;br /&gt;
|-&lt;br /&gt;
| 221502020 || 李维岩 &lt;br /&gt;
|-&lt;br /&gt;
| 221502024 || 杨栋凯 &lt;br /&gt;
|-&lt;br /&gt;
| 221502025 || 宋相廷 &lt;br /&gt;
|-&lt;br /&gt;
| 221830206 || 李君羡 &lt;br /&gt;
|-&lt;br /&gt;
| 221840201 || 钟锦立 &lt;br /&gt;
|-&lt;br /&gt;
| 221840262 || 孙纯洁 &lt;br /&gt;
|-&lt;br /&gt;
| 221840315 || 凌枫 &lt;br /&gt;
|-&lt;br /&gt;
| 221850025 || 高维康 &lt;br /&gt;
|-&lt;br /&gt;
| 221870052 || 吴隽雨 &lt;br /&gt;
|-&lt;br /&gt;
| 221870199 || 黄一凡 &lt;br /&gt;
|-&lt;br /&gt;
| 221870222 || 刘烨 &lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
Total: 69 students&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)&amp;diff=12398</id>
		<title>计算方法 Numerical method (Spring 2024)</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95_Numerical_method_(Spring_2024)&amp;diff=12398"/>
		<updated>2024-05-14T16:06:20Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: /* Assignments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox&lt;br /&gt;
|name         = Infobox&lt;br /&gt;
|bodystyle    = &lt;br /&gt;
|title        = &amp;lt;font size=3&amp;gt;计算方法&lt;br /&gt;
&amp;lt;br&amp;gt;Numerical method&amp;lt;/font&amp;gt;&lt;br /&gt;
|titlestyle   = &lt;br /&gt;
&lt;br /&gt;
|image        = &lt;br /&gt;
|imagestyle   = &lt;br /&gt;
|caption      = &lt;br /&gt;
|captionstyle = &lt;br /&gt;
|headerstyle  = background:#ccf;&lt;br /&gt;
|labelstyle   = background:#ddf;&lt;br /&gt;
|datastyle    = &lt;br /&gt;
&lt;br /&gt;
|header1 =Instructor&lt;br /&gt;
|label1  = &lt;br /&gt;
|data1   = &lt;br /&gt;
|header2 = &lt;br /&gt;
|label2  = &lt;br /&gt;
|data2   = 刘景铖&lt;br /&gt;
|header3 = &lt;br /&gt;
|label3  = Email&lt;br /&gt;
|data3   = liu [at] nju [dot] edu [dot] cn  &lt;br /&gt;
|header4 =&lt;br /&gt;
|label4= Office&lt;br /&gt;
|data4= 计算机系 516&lt;br /&gt;
|header5 = Class&lt;br /&gt;
|label5  = &lt;br /&gt;
|data5   = &lt;br /&gt;
|header6 =&lt;br /&gt;
|label6  = Class meetings&lt;br /&gt;
|data6   = Wednesdays 14:00-16:00 &amp;lt;br&amp;gt; 仙 Ⅱ-303&lt;br /&gt;
|header7 =&lt;br /&gt;
|label7  = Place&lt;br /&gt;
|data7   = &lt;br /&gt;
|header8 =&lt;br /&gt;
|label8  = &lt;br /&gt;
|data8   = &lt;br /&gt;
|header9 = Textbooks&lt;br /&gt;
|label9  = &lt;br /&gt;
|data9   = &lt;br /&gt;
|header10 =&lt;br /&gt;
|label10  = &lt;br /&gt;
|data10   = &lt;br /&gt;
|header11 =&lt;br /&gt;
|label11  = &lt;br /&gt;
|data11   =  Timothy Sauer &amp;lt;br&amp;gt;Numerical Analysis (数值分析), 2nd ed.&amp;lt;br&amp;gt; China Machine Press (机械工业出版社).&lt;br /&gt;
|header12 = Teaching Assistants&lt;br /&gt;
|data13= 傅心语，于逸潇&lt;br /&gt;
|label14= Email&lt;br /&gt;
|data14=  {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
|label15= Office&lt;br /&gt;
|data15=计算机系 410&lt;br /&gt;
|belowstyle = background:#ddf;&lt;br /&gt;
|below = &lt;br /&gt;
}}&lt;br /&gt;
=Announcement=&lt;br /&gt;
*Welcome&lt;br /&gt;
=Course info=&lt;br /&gt;
*&#039;&#039;&#039;Instructor&#039;&#039;&#039;: 刘景铖 ( liu [at] nju [dot] edu [dot] cn )&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;Teaching assistants&#039;&#039;&#039;: 傅心语，于逸潇&lt;br /&gt;
*&#039;&#039;&#039;TA email&#039;&#039;&#039;: {xyfu, yixiaoyu} [at] smail [dot] nju [dot] edu [dot] cn&lt;br /&gt;
*&#039;&#039;&#039;Homework email&#039;&#039;&#039;: nm_nju_2024@163.com&lt;br /&gt;
*&#039;&#039;&#039;Class meeting&#039;&#039;&#039;: Wednesdays 14:00-16:00, 仙 Ⅱ-303&lt;br /&gt;
*&#039;&#039;&#039;Office hour&#039;&#039;&#039;: Tuesdays 16:00-18:00, 计算机系516 (subject to change)&lt;br /&gt;
*&#039;&#039;&#039;QQ group&#039;&#039;&#039;: 855212527 (when joining, provide your name, major, and student ID)&lt;br /&gt;
&lt;br /&gt;
=Textbooks and Readings=&lt;br /&gt;
*Numerical Analysis (数值分析), 2nd ed. Timothy Sauer. China Machine Press (机械工业出版社).&lt;br /&gt;
*[https://people.csail.mit.edu/jsolomon/share/book/numerical_book.pdf Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics.  Justin Solomon. CRC Press]&lt;br /&gt;
*[https://www.cs.yale.edu/homes/vishnoi/Lxb-Web.pdf Lx = b: Laplacian Solvers and Their Algorithmic Applications.  Nisheeth K. Vishnoi.]&lt;br /&gt;
If you have difficulty obtaining the textbooks, you may contact the TAs (English editions only).&lt;br /&gt;
&lt;br /&gt;
= Collaboration on Homework =&lt;br /&gt;
You are welcome to work on homework problems in study groups of no more than 3 people; however, you must always write up the solutions on your own, listing all collaborators at the top. Similarly, you may use books or online resources to help solve homework problems, but you must always credit all such sources in your writeup and you must never copy material verbatim.&lt;br /&gt;
&lt;br /&gt;
We believe that most students can distinguish between helping other students and cheating. You may discuss approaches, but your solution must be written by you and only you. You should acknowledge everyone you have worked with or who has given you any significant ideas about the homework.&lt;br /&gt;
&lt;br /&gt;
Further, it is your responsibility to ensure that your solutions will not be visible to other students. If you use GitHub or another source-control system to store your solutions electronically, you must ensure that your account is configured so your solutions are not publicly visible. Many popular version-control services provide free private repositories to students.&lt;br /&gt;
&lt;br /&gt;
As a final note, we’d like to point out that collaboration on homework, while permitted, can be detrimental to your learning if misused. In particular, avoid collaborations where you do not contribute enough to your own satisfaction. Such a collaboration not only cheats you out of an opportunity to learn through homework, but can also affect your confidence. If you feel that you are not contributing enough to your group, then try to spend time thinking about the problems alone before working with your group. If you end up solving the problem all by yourself, that’s great! And if not, you’ll still be better prepared to contribute to your group.&lt;br /&gt;
&lt;br /&gt;
See also [http://www.acm.org/publications/policies/plagiarism_policy ACM Policy on Plagiarism].&lt;br /&gt;
&lt;br /&gt;
=Assignments=&lt;br /&gt;
Late policy: In general, we will accommodate late-submission requests ONLY IF you make such requests ahead of time. &lt;br /&gt;
&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments1 new.pdf| Homework1]] Please submit to nm_nju_2024@163.com by 23:59 on March 12, 2024 (file name: &#039;学号_姓名_A1.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework1 提交名单| Homework1 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 2.pdf| Homework2]] Please submit to nm_nju_2024@163.com by 23:59 on March 26, 2024 (file name: &#039;学号_姓名_A2.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework2 提交名单| Homework2 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 3.pdf| Homework3]] Please submit to nm_nju_2024@163.com by 23:59 on April 9, 2024 (file name: &#039;学号_姓名_A3.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework3 提交名单| Homework3 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments4.pdf| Homework4]] Please submit to nm_nju_2024@163.com by 23:59 on May 1, 2024 (file name: &#039;学号_姓名_A4.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework4 提交名单| Homework4 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 5.pdf| Homework5]] Please submit to nm_nju_2024@163.com by 23:59 on May 14, 2024 (file name: &#039;学号_姓名_A5.pdf&#039;). [[计算方法 Numerical method (Spring 2024)/Homework5 提交名单| Homework5 submission list]]&lt;br /&gt;
# [[Media:Computational Method 2024 Assignments 6.pdf| Homework6]] Please submit to nm_nju_2024@163.com by 23:59 on May 28, 2024 (file name: &#039;学号_姓名_A6.pdf&#039;)&lt;br /&gt;
&lt;br /&gt;
=Lecture Notes=&lt;br /&gt;
If you have problems downloading the slides, please contact the TAs promptly.&lt;br /&gt;
	&lt;br /&gt;
# [[Media:计算方法1-课程简介-Spring2024.pdf| Course introduction, root finding]]&lt;br /&gt;
# [[Media:计算方法2-插值-Spring2024.pdf| Newton&#039;s method, interpolation, secret sharing, error-correcting codes]]&lt;br /&gt;
# [[Media:计算方法3-Chebyshev-Spring2024.pdf| Chebyshev interpolation and polynomials, norms]]&lt;br /&gt;
# [[Media:计算方法4-最小二乘法.pdf|Least squares, Gram-Schmidt orthogonalization and QR decomposition]]&lt;br /&gt;
# [[Media:计算方法5-高斯消元.pdf|FFT, Gaussian elimination and LU decomposition]]&lt;br /&gt;
# [[Media:计算方法6-条件数和迭代法.pdf|Operator norms, condition numbers, and iterative methods]]&lt;br /&gt;
# [[Media:计算方法 7-特征值与幂迭代.pdf|Eigenvalues and power iteration]]&lt;br /&gt;
# [[Media:计算方法 8-特征值的其它迭代方法与SVD.pdf|Other eigenvalue iterations and the SVD]]&lt;br /&gt;
#* Further reading: [https://web.stanford.edu/class/cs168/l/l9.pdf lecture notes by Tim Roughgarden and Greg Valiant on matrix completion]&lt;br /&gt;
# [[Media:计算方法9-ConjugateGradient.pdf | Iterative methods for linear systems: gradient descent and conjugate gradient]]&lt;br /&gt;
# [[Media:计算方法10-2024.pdf | Random walks, Markov chains, and power iteration]]&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
	<entry>
		<id>https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12386</id>
		<title>File:Computational Method 2024 Assignments 6.pdf</title>
		<link rel="alternate" type="text/html" href="https://tcs.nju.edu.cn/wiki/index.php?title=File:Computational_Method_2024_Assignments_6.pdf&amp;diff=12386"/>
		<updated>2024-05-14T08:35:46Z</updated>

		<summary type="html">&lt;p&gt;Kvrmnks: Kvrmnks uploaded a new version of File:Computational Method 2024 Assignments 6.pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Kvrmnks</name></author>
	</entry>
</feed>