R 编程/概率分布

R 编程

编辑此框

本页面回顾主要的概率分布并描述处理它们的R函数。

R有很多概率函数。

r是随机变量生成器的通用前缀，例如runif()，rnorm()。
d是概率密度函数的通用前缀，例如dunif()，dnorm()。
p是累积密度函数的通用前缀，例如punif()，pnorm()。
q是分位数函数的通用前缀，例如qunif()，qnorm()。

离散分布

本福德分布

The 本福德分布是数字首位数字的分布。它是由 Benford 1938 ^[1]和 Newcomb 1881 ^[2]提出的。

> library(VGAM)
> dbenf(c(1:9))
[1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679 0.05799195 0.05115252 0.04575749

伯努利

我们可以使用伯努利抽样sample(), runif()或者rbinom()使用size = 1.

> n <- 1000
> x <- sample(c(0,1), n, replace=T)
> x <- sample(c(0,1), n, replace=T, prob=c(0.3,0.7))
> x <- runif(n) > 0.3
> x <- rbinom(n, size=1, prob=0.2)

二项式

我们可以使用rbinom()函数从二项分布中抽样，其中参数n表示要抽取的样本数量，size定义试验次数，prob定义每次试验中成功的概率。

> x <- rbinom(n=100,size=10,prob=0.5)

超几何分布

我们可以使用rhyper()函数从超几何分布中抽取n次。

> x <- rhyper(n=1000, 15, 5, 5)

几何分布

The 几何分布.

> N <- 10000
> x <- rgeom(N, .5)
> x <- rgeom(N, .01)

多项式

The 多项式分布.

> sample(1:6, 100, replace=T, prob= rep(1/6,6))

负二项式分布

The 负二项式分布是一系列伯努利事件中，在 k 次成功之前失败次数的分布。

> N <- 100000
> x <- rnbinom(N, 10, .25)

泊松分布

我们可以从泊松分布中抽取n个值，平均值由参数lambda设置。

> x <- rpois(n=100, lambda=3)

齐夫定律

单词频率的分布被称为齐夫定律。它也是城市规模分布的良好描述 ^[3]。dzipf()和pzipf()(VGAM)

> library(VGAM)
> dzipf(x=2, N=1000, s=2)

连续分布

Beta 和 Dirichlet 分布

Beta 分布
Dirichlet 在 gtools 和 MCMCpack 中

>library(gtools)
>?rdirichlet
>library(bayesm)
>?rdirichlet
>library(MCMCpack)
>?Dirichlet

柯西

我们可以使用rcauchy()函数从给定location参数 $x_{0}$ （默认值为 0）和scale参数 $\gamma$ （默认值为 1）的柯西分布中抽取n个值。

> x <- rcauchy(n=100, location=0, scale=1)

卡方分布

The 卡方分布（ $\chi ^{2}$ 分布）的分位数

> qchisq(.95,1)
[1] 3.841459
> qchisq(.95,10)
[1] 18.30704
> qchisq(.95,100)
[1] 124.3421

指数

我们可以使用rexp()函数从给定rate（默认值为 1）的指数分布中抽取n个值。

> x <- rexp(n=100, rate=1)

费希尔-斯尼德科

我们可以绘制费希尔分布（F 分布）的密度。

> par(mar=c(3,3,1,1))
> x <- seq(0,5,len=1000)
> plot(range(x),c(0,2),type="n")
> grid()
> lines(x,df(x,df1=1,df2=1),col="black",lwd=3)
> lines(x,df(x,df1=2,df2=1),col="blue",lwd=3)
> lines(x,df(x,df1=5,df2=2),col="green",lwd=3)
> lines(x,df(x,df1=100,df2=1),col="red",lwd=3)
> lines(x,df(x,df1=100,df2=100),col="grey",lwd=3)
> legend(2,1.5,legend=c("n1=1, n2=1","n1=2, n2=1","n1=5, n2=2","n1=100, n2=1","n1=100, n2=100"),col=c("black","blue","green","red","grey"),lwd=3,bty="n")

伽马分布

我们可以使用 rgamma() 函数从给定 shape 参数和 scale 参数 $\theta$ 的伽马分布中采样 n 个值。或者，可以给出 shape 参数和 rate 参数 $\beta =1/\theta$ 。

> x <- rgamma(n=10, scale=1, shape=0.4)
> x <- rgamma(n=100, scale=1, rate=0.8)

莱维分布

我们可以使用 rlevy() 函数从给定位置参数 $\mu$ （由参数 m 定义，默认值为 0）和缩放参数（由参数 s 给出，默认值为 1）的莱维分布中采样 n 个值。

> x <- rlevy(n=100, m=0, s=1)

对数正态分布

我们可以使用 rlnorm() 函数从给定 meanlog（默认值为 0）和 sdlog（默认值为 1）的对数正态分布中采样 n 个值。

> x <- rlnorm(n=100, meanlog=0, sdlog=1)

正态分布及相关分布

我们可以使用 rnorm() 函数从给定 mean（默认值为 0）和 sd（默认值为 1）的正态分布或高斯分布中采样 n 个值。

> x <- rnorm(n=100, mean=0, sd=1)

正态分布的分位数

> qnorm(.95)
[1] 1.644854
> qnorm(.975)
[1] 1.959964
> qnorm(.99)
[1] 2.326348

mvtnorm 包含多元正态分布的函数。
- rmvnorm()生成多元正态分布。

> library(mvtnorm)
> sig <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
> r <- rmvnorm(1000, sigma = sig)
> cor(r) 
          [,1]      [,2]
[1,] 1.0000000 0.8172368
[2,] 0.8172368 1.0000000

帕累托分布

广义帕累托分布 dgpd()在 evd 中
dpareto(), ppareto(), rpareto(), qpareto() 在 actuar 中
VGAM 包也包含了帕累托分布的函数。

学生 t 分布

学生 t 分布的分位数

> qt(.975,30)
[1] 2.042272
> qt(.975,100)
[1] 1.983972
> qt(.975,1000)
[1] 1.962339

以下几行绘制了 t 分布的 0.975 分位数随自由度的变化情况。

curve(qt(.975,x), from = 2 , to = 100, ylab = "Quantile 0.975 ", xlab = "Degrees of freedom", main = "Student t distribution")
abline(h=qnorm(.975), col = 2)

均匀分布

我们可以使用 runif() 函数从两个值（默认值为 0 和 1）之间的均匀分布（也称为矩形分布）中采样 n 个值。

> runif(n=100, min=0, max=1)

威布尔分布

我们可以使用 rweibull() 函数从给定 shape 和 scale 参数 $\mu$ （默认值为 1）的威布尔分布中采样 n 个值。

> x <- rweibull(n=100, shape=0.5, scale=1)

极值分布及相关分布

Gumbel 分布
逻辑斯谛分布：两个 Gumbel 分布之差的分布。

plogis, qlogis, dlogis, rlogis

Fréchet dfrechet() evd
广义极值分布 dgev() evd
Gumbel dgumbel() evd
Burr, dburr, pburr, qburr, rburr 在 actuar 中

循环统计中的分布

CircStats 包包含循环统计的函数。
- dvm() 冯·米塞斯分布（也称为循环正态分布或 Tikhonov 分布）密度函数
- dtri() 三角形密度函数
- dmixedvm() 混合冯·米塞斯密度
- dwrpcauchy() 包裹的柯西密度
- dwrpnorm() 包裹的正态密度。

另请参阅

VGAM、SuppDists、actuar、fBasics、bayesm、MCMCpack 包

参考文献

↑ Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, 78, 551–572.
↑ Newcomb, S. (1881) Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics, 4, 39–40.
↑ Gabaix, Xavier (August 1999). "Zipf's Law for Cities: An Explanation". Quarterly Journal of Economics 114 (3): 739–67. doi:10.1162/003355399556133. ISSN 0033-5533. http://pages.stern.nyu.edu/~xgabaix/papers/zipf.pdf.

前一页：优化

索引

下一页：随机数生成

[benford-1] Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, 78, 551–572.

[newcomb-2] Newcomb, S. (1881) Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics, 4, 39–40.

[3] Gabaix, Xavier (August 1999). "Zipf's Law for Cities: An Explanation". Quarterly Journal of Economics 114 (3): 739–67. doi:10.1162/003355399556133. ISSN 0033-5533. http://pages.stern.nyu.edu/~xgabaix/papers/zipf.pdf.

[1]

[2]

[3]