利用线性同余方程构造两两独立的随机变量

令[math]\displaystyle{ p }[/math]为一质数。考虑模[math]\displaystyle{ p }[/math]余数构成的集合[math]\displaystyle{ [p]=\{0,1,\ldots,p-1\}=\mathbb{Z}_p }[/math]。众所周知，当[math]\displaystyle{ p }[/math]为质数时，[math]\displaystyle{ \mathbb{Z}_p }[/math]为对模[math]\displaystyle{ p }[/math]加法和乘法运算闭合的有限域。

我们现在构造一系列值域为[math]\displaystyle{ [p] }[/math]的两两独立（pairwise Independent）且均匀分布（uniformly distributed）的随机变量[math]\displaystyle{ X_{0},X_{1},\ldots,X_{p-1} }[/math]。构造如下：

令[math]\displaystyle{ \boldsymbol{a},\boldsymbol{b}\in[p] }[/math]为一对相互独立的取值在[math]\displaystyle{ [p] }[/math]上均匀分布的随机变量；
对于每个[math]\displaystyle{ i\in[p] }[/math]，定义 [math]\displaystyle{ X_i=(\boldsymbol{a}\cdot i+\boldsymbol{b})\bmod p }[/math]。

我们证明如此构造的随机变量序列[math]\displaystyle{ X_{0},X_{1},\ldots,X_{p-1} }[/math]的确两两独立且在[math]\displaystyle{ [p] }[/math]上均匀分布。

定理一

（均匀分布）对于任何[math]\displaystyle{ i\in[p] }[/math]，任意取值[math]\displaystyle{ c\in[p] }[/math]，有
[math]\displaystyle{ \Pr(X_i=c)=\frac{1}{p} }[/math]
（两两独立性）对于任何不同的[math]\displaystyle{ i,j\in[p] }[/math]，以及任意取值[math]\displaystyle{ c,d\in[p] }[/math]，有
[math]\displaystyle{ \Pr(X_i=c\cap X_j=d)=\frac{1}{p^2} }[/math]

Proof.

首先验证每个[math]\displaystyle{ X_i }[/math]在其值域[math]\displaystyle{ [p] }[/math]上都是均匀分布的。根据全概率法则：

[math]\displaystyle{ \begin{align} \Pr(X_i=c) &= \Pr\left((\boldsymbol{a}\cdot i+\boldsymbol{b})=c\pmod p\right)\\ &= \sum_{a\in[p]}\frac{1}{p}\Pr\left(\boldsymbol{b}\equiv c-a\cdot i\pmod{p}\right)\\ &= \frac{1}{p}. \end{align} }[/math]

其中，最后一个等式成立是因为对于任意的[math]\displaystyle{ i,a,c\in[p] }[/math]，仅有唯一的取值[math]\displaystyle{ b\in[p] }[/math]满足同余关系[math]\displaystyle{ b\equiv c-a\cdot i\pmod{p} }[/math]，因此 [math]\displaystyle{ \Pr\left(\boldsymbol{b}\equiv c-a\cdot i\pmod{p}\right)=\frac{1}{p} }[/math].

其次验证没对不同的[math]\displaystyle{ X_i,X_j }[/math]之间的两两独立性。注意到，对任意不同[math]\displaystyle{ i,j\in[p] }[/math]和任意取值[math]\displaystyle{ c,d\in[p] }[/math]，如下以[math]\displaystyle{ \boldsymbol{a} }[/math]和[math]\displaystyle{ \boldsymbol{b} }[/math]为未知量的线性同余方程组：

[math]\displaystyle{ \begin{cases} (\boldsymbol{a}\cdot i+\boldsymbol{b})\equiv c\pmod{p}\\ (\boldsymbol{a}\cdot j+\boldsymbol{b})\equiv d\pmod{p} \end{cases} }[/math]

有唯一解[math]\displaystyle{ a,b\in[p] }[/math]。这是因为[math]\displaystyle{ \mathbb{Z}_p }[/math]作为一个有限域是一个合法的数域，因此线性代数的理论均保持成立。于是有，

[math]\displaystyle{ \Pr(X_i=c\cap X_j=d)=\Pr(\boldsymbol{a}=a\cap \boldsymbol{b}=b) =\frac{1}{p^2} }[/math]

[math]\displaystyle{ \square }[/math]

两点采样（Two-point sampling）

令[math]\displaystyle{ f:\{0,1\}^*\to\{0,1\} }[/math]表示某个判定问题。令[math]\displaystyle{ \mathcal{A} }[/math]为解决该判定问题的某个有单侧错误的蒙特卡罗随机算法（Monte Carlo randomized algorithm with one-sided error）。令[math]\displaystyle{ 0\lt \epsilon\lt 1 }[/math]表示误差界。对于任意输入[math]\displaystyle{ x\in\{0,1\}^* }[/math]，令[math]\displaystyle{ p }[/math]为某足够大的质数，[math]\displaystyle{ r\in[p] }[/math]为在[math]\displaystyle{ [p] }[/math]上均匀分布的随机种子，假设算法[math]\displaystyle{ \mathcal{A} }[/math]总可以在有限时间内终止，且满足：

[math]\displaystyle{ f(x)=1\implies \Pr(\mathcal{A}(x,r)=1)\ge 1-\epsilon }[/math];
[math]\displaystyle{ f(x)=0\implies \Pr(\mathcal{A}(x,r)=0)=1 }[/math]，即对于所有随机种子[math]\displaystyle{ r\in[p] }[/math]总是有[math]\displaystyle{ \mathcal{A}(x,r)=0 }[/math].

在上述计算模型中，我们假设随机种子均匀分布在某个有限域[math]\displaystyle{ [p] }[/math]上，该有限域的大小[math]\displaystyle{ p }[/math]为质数，且可以与输入[math]\displaystyle{ x }[/math]的长度相关；同时，我们假设算法[math]\displaystyle{ \mathcal{A} }[/math]的随机性完全来自于随机种子[math]\displaystyle{ r\in[p] }[/math]，即：当判定问题的输入[math]\displaystyle{ x }[/math]和随机种子[math]\displaystyle{ r\in[p] }[/math]的取值都被确定之后，算法[math]\displaystyle{ \mathcal{A}(x,r) }[/math]的执行过程是确定的。

通过上述定义，算法[math]\displaystyle{ \mathcal{A} }[/math]有单侧的假阴性（false negative）概率误差，至多为[math]\displaystyle{ \epsilon }[/math]。如需减小这一误差，一个常规的办法是在同一个输入[math]\displaystyle{ x }[/math]上多次运行随机算法[math]\displaystyle{ \mathcal{A} }[/math]。

给定自然数[math]\displaystyle{ k\ge 1 }[/math]。对于问题输入[math]\displaystyle{ x\in\{0,1\}^* }[/math]，随机种子[math]\displaystyle{ r_0,r_1,\ldots,r_{k-1}\in[p] }[/math]，定义新算法：

[math]\displaystyle{ \mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1})=\bigvee_{i=0}^{k-1}\mathcal{A}(x,r_i) }[/math]。

即，新算法[math]\displaystyle{ \mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1}) }[/math]输出0当且仅当对于所有[math]\displaystyle{ i=0,1,\ldots,k-1 }[/math]，算法[math]\displaystyle{ \mathcal{A}(x,r_i) }[/math]都输出0；除此之外新算法[math]\displaystyle{ \mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1}) }[/math]输出1。

对于如此构造的算法[math]\displaystyle{ \mathcal{A}^k }[/math]，容易验证：无论我们如何选取随机种子[math]\displaystyle{ r_0,r_1,\ldots,r_{k-1}\in[p] }[/math]，始终有：

[math]\displaystyle{ f(x)=0\implies \mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1})=0 }[/math]，因为此时[math]\displaystyle{ \mathcal{A}(x,r_i)=0 }[/math]总是成立。

因此新算法[math]\displaystyle{ \mathcal{A}^k }[/math]仍旧没有假阳性错误。我们仅需分析其假阴性错误的概率即可：在此之后我们始终假设输入[math]\displaystyle{ x }[/math]满足[math]\displaystyle{ f(x)=1 }[/math]，分析算法[math]\displaystyle{ \mathcal{A}^k }[/math]输出0的概率上界。

给定自然数[math]\displaystyle{ k\le p }[/math]。考虑如下的两种方案来产生上述新算法[math]\displaystyle{ \mathcal{A}^k }[/math]中使用的随机种子[math]\displaystyle{ r_0,r_1,\ldots,r_{k-1}\in[p] }[/math]。

方案I（独立重试）：

令[math]\displaystyle{ r_0,r_1,\ldots,r_{k-1}\in[p] }[/math]为相互独立均匀分布的随机数。

对于该方案，在输入[math]\displaystyle{ x }[/math]满足[math]\displaystyle{ f(x)=1 }[/math]的前提下显然有：

[math]\displaystyle{ \Pr\left(\mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1})=0\right)=\Pr\left(\bigcap_{i=0}^{k-1}(\mathcal{A}(x,r_i)=0)\right)=\prod_{i=0}^{k-1}\Pr\left(\mathcal{A}(x,r_i)=0)\right)\le \epsilon^k }[/math]

因此，方案I将单侧概率误差[math]\displaystyle{ \epsilon }[/math]降为[math]\displaystyle{ \epsilon^k }[/math]，代价是使用[math]\displaystyle{ k }[/math]倍的时间复杂度和[math]\displaystyle{ k }[/math]个相互独立的随机种子。

下面我们考虑采用前述定理一中的两两独立的构造来产生随机种子[math]\displaystyle{ r_0,r_1,\ldots,r_{k-1}\in[p] }[/math]。

方案II（两点采样）：

令[math]\displaystyle{ \boldsymbol{a},\boldsymbol{b}\in[p] }[/math]为相互独立均匀分布的随机数；
对于[math]\displaystyle{ i=0,1,\ldots,k-1 }[/math]，令[math]\displaystyle{ r_i=(\boldsymbol{a}\cdot i+\boldsymbol{b})\bmod p }[/math]。

根据前述定理一可知：方案II中构造的[math]\displaystyle{ r_0,r_1,\ldots,r_{k-1}\in[p] }[/math]为[math]\displaystyle{ [p] }[/math]上均匀分布的两两独立的随机数。事实上仅使用了[math]\displaystyle{ \boldsymbol{a},\boldsymbol{b}\in[p] }[/math]这两个相互独立的随机种子。

在输入[math]\displaystyle{ x }[/math]满足[math]\displaystyle{ f(x)=1 }[/math]的假设下，对于每个[math]\displaystyle{ 0\le i\le k-1 }[/math]都有[math]\displaystyle{ \Pr(\mathcal{A}(x,r_i)=1)\ge 1-\epsilon }[/math]（因为[math]\displaystyle{ r_i\in[p] }[/math]均匀分布）。我们将分析概率[math]\displaystyle{ \Pr(\mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1})=0) }[/math]的上界。

对于每个[math]\displaystyle{ i=0,1,\ldots, k-1 }[/math]，定义伯努利随机变量[math]\displaystyle{ X_i\in\{0,1\} }[/math]为[math]\displaystyle{ X_i=\mathcal{A}(x,r_i) }[/math]，并且定义[math]\displaystyle{ X=\sum_{i=0}^{k-1}X_i }[/math]。因此有：

[math]\displaystyle{ \Pr\left(\mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1})=0\right)=\Pr\left(\bigcap_{i=0}^{k-1}(\mathcal{A}(x,r_i)=0)\right) =\Pr(X=0) }[/math]

而根据切比雪夫不等式：

[math]\displaystyle{ \Pr(X=0)= \Pr(\mathbb{E}[X]-X=\mathbb{E}[X])\le \Pr(|X-\mathbb{E}[X]|\ge\mathbb{E}[X])\le\frac{\mathbf{Var}[X]}{\mathbb{E}[X]^2} }[/math].

接下来，我们将具体计算[math]\displaystyle{ {\mathbf{Var}[X]}/{\mathbb{E}[X]^2} }[/math]的上界。

根据定义，[math]\displaystyle{ X_0,X_1,\ldots,X_{k-1} }[/math]是两两独立的伯努利变量，而且：

[math]\displaystyle{ \mathbb{E}[X_i] =\Pr(\mathcal{A}(x,r_i)=1)\ge 1-\epsilon }[/math].

于是根据期望的线性：

[math]\displaystyle{ \mathbb{E}[X] =\sum_{i=0}^{k-1}\mathbb{E}[X_i]\ge (1-\epsilon)k }[/math].

注意到作为伯努利变量[math]\displaystyle{ X_0,X_1,\ldots,X_{k-1}\in\{0,1\} }[/math]，有[math]\displaystyle{ X_i^2=X_i }[/math]；同时由于两两独立性：

[math]\displaystyle{ \mathbf{Var}[X] = \sum_{i=0}^{k-1}\mathbf{Var}[X_i] = \sum_{i=0}^{k-1}(\mathbb{E}[X_i^2]-\mathbb{E}[X_i]^2) \le \sum_{i=0}^{k-1}\mathbb{E}[X_i^2] = \sum_{i=0}^{k-1}\mathbb{E}[X_i] = \mathbb{E}[X] }[/math].

综上，

[math]\displaystyle{ \Pr\left(\mathcal{A}^k(x,r_0,r_1,\ldots,r_{k-1})=0\right)=\Pr(X=0)\le \frac{\mathbf{Var}[X]}{\mathbb{E}[X]^2}\le \frac{1}{\mathbb{E}[X]}\le\frac{1}{(1-\epsilon)k} }[/math].

因此，方案II仍花费[math]\displaystyle{ k }[/math]倍的计算时间代价，却将单侧概率误差从[math]\displaystyle{ \epsilon }[/math]降至[math]\displaystyle{ \frac{1}{(1-\epsilon)k} }[/math]。同方案I能达到的几何衰减的[math]\displaystyle{ \epsilon^k }[/math]误差相比，这似乎并不理想，但别忘了：与方案I中使用了[math]\displaystyle{ k }[/math]个相互独立的随机种子不同，方案II中采用的“两点采样”仅需两个随机种子，在随机性的开销上更有效率。

概率论与数理统计 (Spring 2023)/Two-point sampling

利用线性同余方程构造两两独立的随机变量

两点采样（Two-point sampling）

Navigation menu

概率论与数理统计 (Spring 2023)/Two-point sampling

利用线性同余方程构造两两独立的随机变量

两点采样（Two-point sampling）

Navigation menu

Search