Gumbel distribution and 组合数学 (Fall 2017)/Problem Set 1: Difference between pages

From TCS Wiki
(Difference between pages)
Jump to navigation Jump to search
imported>KLBot2
m (Bot: Migrating 1 interwiki links, now provided by Wikidata on d:Q1096862)
 
imported>Etone
No edit summary
 
Line 1: Line 1:
[[File:Gumbel-Density.svg|thumb|260px|Gumbel probability distribution function (PDF)]]
== Problem 1 ==
[[File:Gumbel-Cumulative.svg|thumb|260px|Gumbel cumulative distribution function (CDF)]]
#有<math>k</math>种不同的明信片,每种明信片有无限多张,寄给<math>n</math>个人,每人一张,有多少种方法?
#有<math>k</math>种不同的明信片,每种明信片有无限多张,寄给<math>n</math>个人,每人一张,每个人必须收到不同种类的明信片,有多少种方法?
#有<math>k</math>种不同的明信片,每种明信片有无限多张,寄给<math>n</math>个人,每人收到<math>r</math>张不同的明信片(但不同的人可以收到相同的明信片),有多少种方法?
#只有一种明信片,共有<math>m</math>张,寄给<math>n</math>个人,全部寄完,每个人可以收多张明信片或者不收明信片,有多少种方法?
#有<math>k</math>种不同的明信片,其中第<math>i</math>种明信片有<math>m_i</math>张,寄给<math>n</math>个人,全部寄完,每个人可以收多张明信片或者不收明信片,有多少种方法?


The '''Gumbel distribution''' is a [[probability distribution]] of extreme values.
== Problem 2 ==
Find the number of ways to select <math>2n</math> balls from <math>n</math> identical blue balls, <math>n</math> identical red balls and <math>n</math> identical green balls.
* Give a combinatorial proof for the problem.
* Give an algebraic proof for the problem.


In [[probability theory]] and [[statistics]], the Gumbel distribution is used to model the distribution of the maximum (or the minimum) of a number of samples of various distributions. <ref>Gumbel, E.J. 1954. "Statistical theory of extreme values and some practical applications". ''Applied Mathematics Series'', 33. U.S. Department of Commerce, National Bureau of Standards.</ref>
== Problem 3 ==
*一个长度为<math>n</math>的“山峦”是如下由<math>n</math>个"/"和<math>n</math>"\"组成的,从坐标<math>(0,0)</math>到<math>(0,2n)</math>的折线,但任何时候都不允许低于<math>x</math>轴。例如下图:


Such a distribution might be used to represent the distribution of the maximum level of a river in a particular year if there was a list of maximum values for the past ten years. It is also useful in predicting the chance that an extreme earthquake, flood or other natural disaster will occur.
    /\
  /  \/\/\    /\/\
  /        \/\/    \/\/\
  ----------------------
:长度为<math>n</math>的“山峦”有多少?


== Properties ==
*一个长度为<math>n</math>的“地貌”是由<math>n</math>个"/"和<math>n</math>个"\"组成的,从坐标<math>(0,0)</math>到<math>(0,2n)</math>的折线,允许低于<math>x</math>轴。长度为<math>n</math>的“地貌”有多少?


The Gumbel distribution is a [[continuous]] probability distribution. Gumbel distributions are a family of distributions of the same general form. These distributions differ in their ''location'' and ''scale'' [[parameter]]s: the [[Mean (statistics)|mean]] ("[[average]]") of the distribution defines its location, and the [[standard deviation]] ("variability") defines the scale.
== Problem 4==
李雷和韩梅梅竞选学生会主席,韩梅梅获得选票 <math>p</math> 张,李雷获得选票 <math>q</math> 张,<math>p>q</math>。我们将总共的 <math>p+q</math> 张选票一张一张的点数,有多少种选票的排序方式使得在整个点票过程中,韩梅梅的票数一直高于李雷的票数?等价地,假设选票均匀分布的随机排列,以多大概率在整个点票过程中,韩梅梅的票数一直高于李雷的票数。


One recognizes the Gumbel ''probability density function'' (PDF) and the Gumbel ''cumulative distribution function'' (CDF).
==Problem 5==
A <math>2\times n</math> rectangle is to be paved with <math>1\times 2</math> identical blocks and <math>2\times 2</math> identical blocks. Let <math>f(n)</math> denote the number of ways that can be done. Find a recurrence relation for <math>f(n)</math>, solve the recurrence relation.


=== PDF ===
== Problem 6 ==
Let <math>a_n</math> be a sequence of numbers satisfying the recurrence relation:
:<math>p a_n+q a_{n-1}+r a_{n-2}=0</math>
with initial condition <math>a_0=s</math> and <math>a_1=t</math>, where <math>p,q,r,s,t</math> are constants such that <math>{p}+q+r=0</math>, <math>p\neq 0</math> and <math>s\neq t</math>. Solve the recurrence relation.


[[File:Standard deviation diagram.svg|thumb|260px|The [[Normal distribution|normal probability density function]] (PDF) is symmetric.]]
== Problem 7 ==
* 令<math>s_n</math>表示长度为<math>n</math>,没有2个连续的1的二进制串的数量,即
*:<math>s_n=|\{x\in\{0,1\}^n\mid \forall 1\le i\le n-1, x_ix_{i+1}\neq 11\}|</math>。
:求 <math>s_n</math>。


In the PDF, the probability P of a value V to occur between limits A and B, briefly written as P(A<V<B), is found by the area under the PDF curve between A and B.
*令<math>t_n</math>表示长度为<math>n</math>,没有3个连续的1的二进制串的数量,即
*:<math>t_n=|\{x\in\{0,1\}^n\mid \forall 1\le i\le n-2, x_ix_{i+1}x_{i+2}\neq 111\}|</math>。
*#给出计算<math>t_n</math>的递归式,并给出足够的初始值。
*#计算<math>t_n</math>的生成函数<math>T(x)=\sum_{n\ge 0}t_n x^n</math>,给出生成函数<math>T(x)</math>的闭合形式。


:{| class="wikitable"
注意:只需解生成函数的闭合形式,无需展开。
| bgcolor=#faf0e6 | Example of probability in the PDF
|-
| In the figure of the [[Normal distribution|normal probability density function]], the values on the horizontal axis should read: μ-3σ,  μ-2σ, μ-1σ, μ+1σ, μ+2σ, and μ+3σ respectively.<br>
μ = [[Mean (statistics)|mean]], σ = [[standard deviation]].<br>
The areas under the curve in the intervals, each with a width of one standard deviation, give the probability of occurrence in those intervals.<br>
Example: the probability of a value V to occur in the interval between A=μ+1σ and B=μ+2σ is P(μ+1σ<V<μ+2σ)=13.6% or 0.136
|}
 
Contrary to the normal distribution, the Gumbel PDF is a-symmetrical and skew to the right.
 
=== CDF ===
 
In the CDF, the probability that a value V is less than A is found directly as the CDF value at A:
 
:<math>P(V \leq A) = CDF(A) </math>.
 
:{| class="wikitable"
| bgcolor=#faf0e6 | Example of probability in the CDF
|-
| In the Gumbel CDF figure, the red curve indicates that the probability of V to be less than 5 is 0.9 (or 90%), whereas for the dark blue line this probability is 0.7 or 70%
|}
 
== Mathematics ==
 
=== The CDF ===
[[File:Comparison standard deviations.svg|thumb|250px|There are two data series: red and blue. Both have the same ''mean (average)'' : 100, but the blue group has a larger ''standard deviation'' (SD=σ=50) than the red group (SD=σ=10).]]
 
The mathematical expression of the CDF is:
 
:<math>CDF(A) = e^{-e^{-(A-\mu)/\beta}} ,</math>
 
where μ is the ''mode'' (the value where the probability density function reaches its peak), ''e'' is a [[e (mathematical constant)|mathematical constant]], about 2.718, and β is a value related to the ''[[standard deviation]]'' (σ) :
 
:<math> \beta = \sigma \sqrt{6}/ \pi , </math>
 
where π is the Greek symbol for [[Pi (math)|Pi]] whose value is close to 22/7 or 3.142, and the symbol <math>\sqrt{\,\,}</math> stands for the [[square root]].
 
=== Mode and median ===
 
The ''mode'' μ can be found from the ''[[median]]'' M, being the value of A where CDF(A)=0.5,  and β:
 
:<math>\mu = M+\beta \ln\left(\ln 2\right) ,</math>
 
where ''ln'' is the [[Logarithm#Natural logarithms|natural logarithm]].
 
=== Mean ===
 
The ''[[Mean (statistics)|mean]]'', E(x), given by:
 
:<math>\operatorname{E}(x)=\mu+c\beta ,</math>
 
where <math>c</math> = ''Euler constant'' <math>\approx</math> 0.5772.
 
== Estimation ==
 
In a data series, the parameters ''mode'' (μ) and β can be estimated from the [[average]], [[median]] and [[standard deviation]]. The calculation of the last three quantities is explained in the respective Wiki pages. Then, with the help of formulas given in the previous section, the factors μ and β can be calculated. In this way, the CDF of the Gumbel distribution belonging to the data can be determined and the probability of interesting data values can be found.
 
[[File:FitGumbelDistr.tif|thumb|260px|Fitted cumulative Gumbel distribution to maximum one-day
October rainfalls using CumFreq <ref> [http://www.waterlog.info/cumfreq.htm CumFreq software] for distribution fitting</ref> ]]
 
== Application ==
 
In [[hydrology]], the Gumbel distribution is used to analyze such variables as monthly and annual maximum values of daily rainfall and river discharge volumes,<ref>{{cite book|last=Ritzema (ed.)|first=H.P.|title=Frequency and Regression Analysis|year=1994|publisher=Chapter 6 in: Drainage Principles and Applications, Publication 16, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands|pages=175–224|url=http://www.waterlog.info/pdf/freqtxt.pdf|isbn=90-70754-33-9}}</ref> and also to describe droughts.<ref>Burke, E.J.; Perry R.H.J.;  Brown, S.J. (2010) "An extreme value analysis of UK drought and projections of change in the future", ''Journal of Hydrology'' </ref>
The blue picture illustrates an example of fitting the Gumbel distribution to ranked maximum one-day October rainfalls showing also the 90% ''confidence belt'' based on the [[binomial distribution]].
 
== References ==
{{reflist}}
 
[[Category:Probability distributions]]

Revision as of 13:09, 17 September 2017

Problem 1

  1. [math]\displaystyle{ k }[/math]种不同的明信片,每种明信片有无限多张,寄给[math]\displaystyle{ n }[/math]个人,每人一张,有多少种方法?
  2. [math]\displaystyle{ k }[/math]种不同的明信片,每种明信片有无限多张,寄给[math]\displaystyle{ n }[/math]个人,每人一张,每个人必须收到不同种类的明信片,有多少种方法?
  3. [math]\displaystyle{ k }[/math]种不同的明信片,每种明信片有无限多张,寄给[math]\displaystyle{ n }[/math]个人,每人收到[math]\displaystyle{ r }[/math]张不同的明信片(但不同的人可以收到相同的明信片),有多少种方法?
  4. 只有一种明信片,共有[math]\displaystyle{ m }[/math]张,寄给[math]\displaystyle{ n }[/math]个人,全部寄完,每个人可以收多张明信片或者不收明信片,有多少种方法?
  5. [math]\displaystyle{ k }[/math]种不同的明信片,其中第[math]\displaystyle{ i }[/math]种明信片有[math]\displaystyle{ m_i }[/math]张,寄给[math]\displaystyle{ n }[/math]个人,全部寄完,每个人可以收多张明信片或者不收明信片,有多少种方法?

Problem 2

Find the number of ways to select [math]\displaystyle{ 2n }[/math] balls from [math]\displaystyle{ n }[/math] identical blue balls, [math]\displaystyle{ n }[/math] identical red balls and [math]\displaystyle{ n }[/math] identical green balls.

  • Give a combinatorial proof for the problem.
  • Give an algebraic proof for the problem.

Problem 3

  • 一个长度为[math]\displaystyle{ n }[/math]的“山峦”是如下由[math]\displaystyle{ n }[/math]个"/"和[math]\displaystyle{ n }[/math]个"\"组成的,从坐标[math]\displaystyle{ (0,0) }[/math][math]\displaystyle{ (0,2n) }[/math]的折线,但任何时候都不允许低于[math]\displaystyle{ x }[/math]轴。例如下图:
   /\
  /  \/\/\    /\/\
 /        \/\/    \/\/\
 ----------------------
长度为[math]\displaystyle{ n }[/math]的“山峦”有多少?
  • 一个长度为[math]\displaystyle{ n }[/math]的“地貌”是由[math]\displaystyle{ n }[/math]个"/"和[math]\displaystyle{ n }[/math]个"\"组成的,从坐标[math]\displaystyle{ (0,0) }[/math][math]\displaystyle{ (0,2n) }[/math]的折线,允许低于[math]\displaystyle{ x }[/math]轴。长度为[math]\displaystyle{ n }[/math]的“地貌”有多少?

Problem 4

李雷和韩梅梅竞选学生会主席,韩梅梅获得选票 [math]\displaystyle{ p }[/math] 张,李雷获得选票 [math]\displaystyle{ q }[/math] 张,[math]\displaystyle{ p\gt q }[/math]。我们将总共的 [math]\displaystyle{ p+q }[/math] 张选票一张一张的点数,有多少种选票的排序方式使得在整个点票过程中,韩梅梅的票数一直高于李雷的票数?等价地,假设选票均匀分布的随机排列,以多大概率在整个点票过程中,韩梅梅的票数一直高于李雷的票数。

Problem 5

A [math]\displaystyle{ 2\times n }[/math] rectangle is to be paved with [math]\displaystyle{ 1\times 2 }[/math] identical blocks and [math]\displaystyle{ 2\times 2 }[/math] identical blocks. Let [math]\displaystyle{ f(n) }[/math] denote the number of ways that can be done. Find a recurrence relation for [math]\displaystyle{ f(n) }[/math], solve the recurrence relation.

Problem 6

Let [math]\displaystyle{ a_n }[/math] be a sequence of numbers satisfying the recurrence relation:

[math]\displaystyle{ p a_n+q a_{n-1}+r a_{n-2}=0 }[/math]

with initial condition [math]\displaystyle{ a_0=s }[/math] and [math]\displaystyle{ a_1=t }[/math], where [math]\displaystyle{ p,q,r,s,t }[/math] are constants such that [math]\displaystyle{ {p}+q+r=0 }[/math], [math]\displaystyle{ p\neq 0 }[/math] and [math]\displaystyle{ s\neq t }[/math]. Solve the recurrence relation.

Problem 7

  • [math]\displaystyle{ s_n }[/math]表示长度为[math]\displaystyle{ n }[/math],没有2个连续的1的二进制串的数量,即
    [math]\displaystyle{ s_n=|\{x\in\{0,1\}^n\mid \forall 1\le i\le n-1, x_ix_{i+1}\neq 11\}| }[/math]
[math]\displaystyle{ s_n }[/math]
  • [math]\displaystyle{ t_n }[/math]表示长度为[math]\displaystyle{ n }[/math],没有3个连续的1的二进制串的数量,即
    [math]\displaystyle{ t_n=|\{x\in\{0,1\}^n\mid \forall 1\le i\le n-2, x_ix_{i+1}x_{i+2}\neq 111\}| }[/math]
    1. 给出计算[math]\displaystyle{ t_n }[/math]的递归式,并给出足够的初始值。
    2. 计算[math]\displaystyle{ t_n }[/math]的生成函数[math]\displaystyle{ T(x)=\sum_{n\ge 0}t_n x^n }[/math],给出生成函数[math]\displaystyle{ T(x) }[/math]的闭合形式。

注意:只需解生成函数的闭合形式,无需展开。