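R Programming/Nonparametric Methods

This page covers a set of nonparametric methods: estimation of the cumulative distribution function (CDF), estimation of the probability density function (PDF) with histograms and kernel methods, and estimation of flexible regression models such as local regression and generalized additive models.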

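For an introduction to nonparametric methods, see the following books and lecture notes:

Nonparametric Econometrics: A Primer, by Jeffrey S. Racine^[1].
Nonparametric Econometrics, the handbook by Li and Racine^[2].
All of Nonparametric Statistics, by Larry Wasserman^[3].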

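Empirical distribution function

The simplest way to estimate the empirical CDF is to combine the rank() and length() functions.
ecdf() computes the empirical cumulative distribution function and returns it as a step function.
ecdf.ksCI() (sfsmisc) plots the empirical distribution function together with confidence bands.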

> N <- 1000
> x <- rnorm(N)
> edf <- rank(x)/length(x)
> plot(x,edf)
> plot(ecdf(x),xlab = "x",ylab = "Distribution of x")
> grid()
> library("sfsmisc")
> ecdf.ksCI(x)
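
Because ecdf() returns a function, the estimate can be evaluated at any point. The sketch below is illustrative only (the seed and the evaluation points are arbitrary choices); it also checks that, when there are no ties, rank(x)/length(x) gives the same values as the ECDF evaluated at the data.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(1000)

Fn <- ecdf(x)                     # ecdf() returns a step function
Fn(0)                             # estimated P(X <= 0); close to 0.5 for N(0, 1) data
quantile(Fn, c(0.25, 0.5, 0.75))  # quantiles of the empirical distribution

# Without ties, rank(x)/length(x) coincides with the ECDF evaluated at the data.
edf <- rank(x) / length(x)
all.equal(edf, Fn(x))
```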

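Density estimation

Histogram

hist() is the standard function for drawing histograms. If you store the result as an object, the computed quantities (breaks, counts, densities, midpoints) are returned in that object.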

> x <- rnorm(1000)
> hist(x, probability = TRUE) # The default uses Sturges' rule.
> # Sturges, H. A. (1926) The choice of a class interval.
> # Journal of the American Statistical Association 21, 65–66.
> hist(x, breaks = "Sturges", probability = TRUE)
> 
> # Freedman, D. and Diaconis, P. (1981) On the histogram as a density estimator: L_2 theory.
> # Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453–476.
> # Number of bins: ceiling(n^(1/3) * range / (2 * IQR)).
> hist(x, breaks = "FD", probability = TRUE)
> 
> # Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66, 605–610.
> # Number of bins: ceiling(n^(1/3) * range / (3.5 * s)).
> hist(x, breaks = "Scott", probability = TRUE)
> 
> # Wand, M. P. (1997). Data-based choice of histogram binwidth.
> # The American Statistician, 51, 59–64.
> library("KernSmooth")
> h <- dpih(x) # plug-in bin width
> bins <- seq(min(x) - h, max(x) + h, by = h)
> hist(x, breaks = bins, probability = TRUE)
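
To see what the named rules imply for a given sample, the number of classes each one suggests can be computed directly with nclass.Sturges(), nclass.FD() and nclass.scott() from grDevices. This is a small sketch with an arbitrary seed; note that hist() treats these numbers as suggestions and rounds the break points to "pretty" values.

```r
set.seed(1)
x <- rnorm(1000)

# Number of classes suggested by each rule
k <- c(Sturges = nclass.Sturges(x),
       FD      = nclass.FD(x),
       Scott   = nclass.scott(x))
k

# Sturges' rule by hand: ceiling(log2(n) + 1)
ceiling(log2(length(x)) + 1)
```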

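You can also choose the break points yourself. They must span the whole range of the data, otherwise hist() stops with an error.

> x <- rnorm(1000)
> hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = .1))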
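
The components stored in a histogram object can be inspected without drawing anything. The following sketch assumes break points from -5 to 5, which is wide enough for this simulated sample; it confirms that the counts add up to the sample size and that the density version integrates to one.

```r
set.seed(1)
x <- rnorm(1000)

h <- hist(x, breaks = seq(-5, 5, by = 0.5), plot = FALSE)  # compute only, do not draw
str(h[c("breaks", "counts", "density", "mids")])

sum(h$counts)                     # number of observations
sum(h$density * diff(h$breaks))   # area under the histogram
```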

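n.bins() (car package) implements several rules for computing the number of bins of a histogram.
histogram() (lattice)
truehist() (MASS)
hist.scott() (MASS) draws a histogram with the bin width chosen automatically by the Scott or Freedman–Diaconis formula.
The histogram package.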

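Kernel density estimation

density() computes a kernel density estimate of a vector.
- Choose the bandwidth selection method with the bw argument.
- Check how sensitive the estimate is to the bandwidth with the adjust argument, which multiplies the bandwidth. The default is 1; it is worth looking at adjust=.5 and adjust=2 as well.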

> x <- rnorm(10^3)
> plot(density(x,bw = "nrd0", adjust = 1, kernel = "gaussian"), col = 1)
> lines(density(x,bw = "nrd0", adjust = .5, kernel = "gaussian"), col = 2)
> lines(density(x,bw = "nrd0", adjust = 2, kernel = "gaussian"), col = 3)
> legend("topright", legend = c("adjust = 1", "adjust = .5", "adjust = 2"), col = 1:3, lty = 1)
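
The bw argument also accepts the names of the selectors available in the stats package. As a rough illustration (seed chosen arbitrarily), the sketch below computes the bandwidth each selector returns for the same sample and shows that adjust simply rescales the bandwidth stored in the fitted object.

```r
set.seed(1)
x <- rnorm(1000)

bws <- c(nrd0 = bw.nrd0(x),  # Silverman's rule of thumb (the default)
         nrd  = bw.nrd(x),   # Scott's variation of the rule of thumb
         ucv  = bw.ucv(x),   # unbiased cross-validation
         bcv  = bw.bcv(x),   # biased cross-validation
         SJ   = bw.SJ(x))    # Sheather-Jones
round(bws, 3)

# The bandwidth actually used is stored in the fitted object
d_sj <- density(x, bw = "SJ")
d_sj$bw
d_wide <- density(x, adjust = 2)  # twice the default bandwidth
d_wide$bw / bw.nrd0(x)
```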

- Choose the kernel function with the kernel argument: "gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine".

> x <- rnorm(10^3)
> plot(density(x,bw = "nrd0", adjust = 1, kernel = "gaussian"), col = 1)
> lines(density(x,bw = "nrd0", adjust = 1, kernel = "epanechnikov"), col = 2)
> lines(density(x,bw = "nrd0", adjust = 1, kernel = "rectangular"), col = 3)
> lines(density(x,bw = "nrd0", adjust = 1, kernel = "triangular"), col = 4)
> legend("topright", legend = c("gaussian", "epanechnikov", "rectangular",  "triangular"), col = 1:4, lty = 1)

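tkdensity() (sfsmisc) opens a graphical user interface in which you can change the kernel and the bandwidth interactively. It is a convenient way to check how sensitive the density estimate is to the choice of bandwidth and kernel.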

> x  <- rnorm(10^3)
> library("sfsmisc")
> tkdensity(x)

kde2d() (MASS) computes a bivariate kernel density estimate.

> library("MASS")
> N <- 1000
> x <- rnorm(N)
> y <- 1 + x^2 + rnorm(N)
> dd <- kde2d(x, y) # estimate the bivariate kernel density
> contour(dd) # contour plot of the bivariate density
> image(dd) # image plot of the same density

Examples
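
As a worked example, a Gaussian kernel density estimate can be written out by hand and compared with density(). This is only a sketch: kde_manual is a hypothetical helper introduced here for illustration, and the small discrepancy arises because density() works on a binned approximation of the data.

```r
set.seed(1)
x <- rnorm(200)
h <- bw.nrd0(x)                   # same default bandwidth as density()

# Kernel density estimate at the points in `at`, Gaussian kernel:
# f_hat(t) = 1 / (n * h) * sum_i K((t - x_i) / h)
kde_manual <- function(at, x, h) {
  sapply(at, function(t) mean(dnorm((t - x) / h)) / h)
}

d <- density(x, bw = h, kernel = "gaussian")
f_manual <- kde_manual(d$x, x, h)
max(abs(f_manual - d$y))          # small: the two estimates nearly coincide

plot(d, main = "density() vs. manual estimate")
lines(d$x, f_manual, col = 2, lty = 2)
```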

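Local regression

loess() is the standard function for local polynomial regression (locally quadratic by default; use degree = 1 for a local linear fit).
lowess() is similar to loess() but does not use the standard formula syntax y ~ x. It is the ancestor of loess() (with different defaults!).
ksmooth() (stats) computes the Nadaraya–Watson kernel regression estimate.
locpoly() (KernSmooth package)
npreg() (np package)
The locpol package computes local polynomial estimators.
The locfit package provides local regression, likelihood and density estimation.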

Examples
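
The following sketch fits the same simulated data with four of the functions listed above. The data-generating process, the span and the bandwidths are arbitrary choices made for illustration, not recommended values.

```r
library("KernSmooth")   # for locpoly(); a recommended package shipped with R

set.seed(1)
N <- 500
x <- runif(N, -2, 2)
y <- sin(2 * x) + rnorm(N, sd = 0.3)

# loess: formula interface, here with a local linear fit
fit_loess <- loess(y ~ x, span = 0.3, degree = 1)
grid <- data.frame(x = seq(-1.9, 1.9, length.out = 100))
pred_loess <- predict(fit_loess, newdata = grid)

# lowess, ksmooth and locpoly all return a list with components x and y
fit_lowess <- lowess(x, y, f = 0.3)
fit_nw <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5, x.points = grid$x)
fit_lp <- locpoly(x, y, degree = 1, bandwidth = 0.2)

plot(x, y, col = "grey", pch = 20)
lines(grid$x, pred_loess, col = 1, lwd = 2)
lines(fit_lowess, col = 2, lwd = 2)
lines(fit_nw, col = 3, lwd = 2)
lines(fit_lp, col = 4, lwd = 2)
legend("topright", legend = c("loess", "lowess", "ksmooth", "locpoly"),
       col = 1:4, lty = 1)
```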

Generalized additive models (GAM)

gam() (gam package)
gam() (mgcv package)

> N <- 10^3
> u <- rnorm(N)
> x1 <- rnorm(N)
> x2 <- rnorm(N) + x1
> y <- 1 + x1^2 + x2^3 + u
> 
> library(gam)
> g1 <- gam(y ~ x1 + x2) # Standard linear model
> par(mfrow = c(1, 2))
> plot(g1, se = TRUE)
> 
> g1 <- gam(y ~ s(x1) + x2) # smooth term in x1, linear in x2
> par(mfrow = c(1, 2))
> plot(g1, se = TRUE)
> 
> g1 <- gam(y ~ s(x1) + s(x2)) # smooth terms in x1 and x2
> par(mfrow = c(1, 2))
> plot(g1, se = TRUE)
> 
> detach("package:gam") # both packages define gam() and s(); avoid masking
> library(mgcv)
> g1 <- gam(y ~ s(x1) + s(x2)) # same model, fitted by mgcv
> par(mfrow = c(1, 2))
> plot(g1, se = TRUE)
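
Beyond plotting, a fitted model can be summarized and used for prediction. The sketch below assumes that only mgcv is attached (not the gam package) and reuses the same simulated design; the object names dat and newd are introduced here for illustration.

```r
library("mgcv")   # recommended package shipped with R

set.seed(1)
N  <- 1000
x1 <- rnorm(N)
x2 <- rnorm(N) + x1
y  <- 1 + x1^2 + x2^3 + rnorm(N)
dat <- data.frame(y, x1, x2)

g <- gam(y ~ s(x1) + s(x2), data = dat)
edf <- summary(g)$edf   # effective degrees of freedom per smooth; 1 would mean linear
edf

# Predictions at new covariate values; the true regression function
# 1 + x1^2 + x2^3 equals 1, 1 and 3 at these points.
newd <- data.frame(x1 = c(-1, 0, 1), x2 = c(-1, 0, 1))
pred <- predict(g, newdata = newd)
pred
```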

References

[1] Jeffrey S. Racine, Nonparametric Econometrics: A Primer, http://socserv.mcmaster.ca/racine/ECO0301.pdf, with R code examples at http://socserv.mcmaster.ca/racine/primer_code.zip
[2] Qi Li and Jeffrey S. Racine, Nonparametric Econometrics, Princeton University Press, 2007.
[3] Larry Wasserman, All of Nonparametric Statistics, Springer (2007) (ISBN: 0387251456).

