R Programming/Nonparametric Methods
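This page covers a set of nonparametric methods: estimation of the cumulative distribution function (CDF), estimation of the probability density function (PDF) with histograms and kernel methods, and estimation of flexible regression models such as local regressions and generalized additive models.
For an introduction to nonparametric methods, see the books and lecture notes listed in the references at the end of this page.
- The simplest way to estimate the empirical CDF is to use the rank() and length() functions.
- ecdf() computes the empirical cumulative distribution function.
- ecdf.ksCI() (sfsmisc) plots the empirical distribution function with confidence bands.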
> N <- 1000
> x <- rnorm(N)
> edf <- rank(x)/length(x)
> plot(x,edf)
> plot(ecdf(x),xlab = "x",ylab = "Distribution of x")
> grid()
> library("sfsmisc")
> ecdf.ksCI(x)
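As a quick check of the two approaches above, the following sketch compares the rank-based estimate with the step function returned by ecdf(). The seed and the small vector z are illustrative assumptions, not part of the original example.

```r
set.seed(1)                      # fixed seed, only for reproducibility
x <- rnorm(1000)

edf <- rank(x) / length(x)       # rank-based estimate at each observation
Fn  <- ecdf(x)                   # step function returned by ecdf()

all.equal(edf, Fn(x))            # the two estimates coincide when there are no ties
Fn(0)                            # estimated P(X <= 0)

# With ties, rank() averages the ranks by default;
# ties.method = "max" reproduces the values given by ecdf()
z <- c(1, 2, 2, 3)
rank(z, ties.method = "max") / length(z)
ecdf(z)(z)
```

Because ecdf() returns a function, it can be evaluated at any point, not only at the observed values.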
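- hist() is the standard function for plotting histograms. If you store the histogram as an object, the estimated parameters (breaks, counts, densities and so on) are returned in that object.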
> x <- rnorm(1000)
> hist(x, probability = TRUE) # The default uses Sturges' rule.
> # Sturges, H. A. (1926) The choice of a class interval.
> # Journal of the American Statistical Association 21, 65–66.
> hist(x, breaks = "Sturges", probability = TRUE)
>
> # Freedman, D. and Diaconis, P. (1981) On the histogram as a density estimator: L_2 theory.
> # Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453–476.
> # Number of bins: ceiling[n^(1/3) * range / (2 * IQR)].
> hist(x, breaks = "FD", probability = TRUE)
>
> # Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66, 605–610.
> # Number of bins: ceiling[n^(1/3) * range / (3.5 * s)].
> hist(x, breaks = "scott", probability = TRUE)
>
> # Wand, M. P. (1997). Data-based choice of histogram bin width.
> # The American Statistician, 51, 59–64.
> library("KernSmooth")
> h <- dpih(x) # plug-in estimate of the bin width
> bins <- seq(min(x)-h, max(x)+h, by=h)
> hist(x, breaks=bins, probability = TRUE)
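To illustrate what is stored when a histogram is saved as an object, here is a minimal sketch. The seed and the object name h are assumptions made for the example.

```r
set.seed(1)                      # fixed seed, only for reproducibility
x <- rnorm(1000)

h <- hist(x, plot = FALSE)       # compute the histogram without drawing it
names(h)                         # components such as breaks, counts, density, mids
h$breaks                         # bin boundaries actually used
sum(h$counts)                    # equals length(x)
sum(h$density * diff(h$breaks))  # the bar areas add up to 1

plot(h, freq = FALSE)            # draw the stored object on the density scale
```

Working from the stored object is convenient when the same bins are reused in several plots or computations.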
You can also choose the break points yourself.
> x <- rnorm(1000)
> hist(x, breaks = seq(-4,4,.1)) # the breaks must cover range(x), otherwise hist() stops with an error
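The rules named in the comments above can also be evaluated directly. The sketch below uses the nclass.* helpers from grDevices and writes two of the rules out by hand; the seed and the names k and h_scott are illustrative. Note that hist() treats the resulting number as a suggestion only, since the break points are set to "pretty" values.

```r
set.seed(1)                      # fixed seed, only for reproducibility
x <- rnorm(1000)
n <- length(x)

# Number of classes suggested by each rule (functions from grDevices)
k <- c(Sturges = nclass.Sturges(x),
       Scott   = nclass.scott(x),
       FD      = nclass.FD(x))
k

# Sturges' rule written out by hand
ceiling(log2(n) + 1)

# Scott's rule written out by hand: bin width h = 3.5 * s * n^(-1/3)
h_scott <- 3.5 * sd(x) * n^(-1/3)
ceiling(diff(range(x)) / h_scott)
```

Comparing the three counts on the same sample gives a feel for how much the choice of rule matters before plotting.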
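- n.bins() (car package) implements several methods for computing the number of bins of a histogram.
- histogram() (lattice)
- truehist() (MASS)
- hist.scott() (MASS) plots a histogram with the bin width chosen automatically by the Scott or Freedman–Diaconis formula.
- The histogram package.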
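- density() estimates the kernel density of a vector.
- Choose the bandwidth selection method with the bw argument.
- Check the sensitivity to the bandwidth choice with the adjust argument. The default value is 1. It is advisable to also look at adjust = .5 and adjust = 2.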
> x <- rnorm(10^3)
> plot(density(x,bw = "nrd0", adjust = 1, kernel = "gaussian"), col = 1)
> lines(density(x,bw = "nrd0", adjust = .5, kernel = "gaussian"), col = 2)
> lines(density(x,bw = "nrd0", adjust = 2, kernel = "gaussian"), col = 3)
> legend("topright", legend = c("adjust = 1", "adjust = .5", "adjust = 2"), col = 1:3, lty = 1)
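To make the roles of bw and adjust concrete, this sketch prints the bandwidths chosen by a few of the selectors in the stats package and confirms that adjust simply rescales the selected bandwidth. The seed and the object names d1 and d2 are assumptions.

```r
set.seed(1)                      # fixed seed, only for reproducibility
x <- rnorm(1000)

# Bandwidths chosen by some of the selectors available in stats
c(nrd0 = bw.nrd0(x), nrd = bw.nrd(x), SJ = bw.SJ(x))

d1 <- density(x, bw = "nrd0", adjust = 1)
d2 <- density(x, bw = "nrd0", adjust = 2)
d1$bw                            # same value as bw.nrd0(x)
d2$bw / d1$bw                    # adjust scales the bandwidth, so this ratio is 2
```

The bw argument also accepts a number, in which case that value (times adjust) is used directly.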
- Choose the kernel with the kernel argument: "gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine".
> x <- rnorm(10^3)
> plot(density(x,bw = "nrd0", adjust = 1, kernel = "gaussian"), col = 1)
> lines(density(x,bw = "nrd0", adjust = 1, kernel = "epanechnikov"), col = 2)
> lines(density(x,bw = "nrd0", adjust = 1, kernel = "rectangular"), col = 3)
> lines(density(x,bw = "nrd0", adjust = 1, kernel = "triangular"), col = 4)
> legend("topright", legend = c("gaussian", "epanechnikov", "rectangular", "triangular"), col = 1:4, lty = 1)
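For readers who want to see what the estimator computes, the sketch below evaluates a Gaussian kernel density estimate directly from its definition and compares it with the output of density(). The function kde_gauss is a hypothetical helper written for this illustration, not part of any package, and the seed is arbitrary.

```r
set.seed(1)                      # fixed seed, only for reproducibility
x <- rnorm(1000)
d <- density(x, bw = "nrd0", kernel = "gaussian")
h <- d$bw                        # bw is the standard deviation of the kernel

# Hypothetical helper: direct evaluation of the estimate at the points t,
# f_hat(t) = (1 / (n * h)) * sum_i K((t - x_i) / h) with K the normal density
kde_gauss <- function(t, x, h) {
  sapply(t, function(t0) mean(dnorm((t0 - x) / h)) / h)
}

f_direct <- kde_gauss(d$x, x, h)
max(abs(f_direct - d$y))         # small: density() uses a binned approximation

plot(d)
lines(d$x, f_direct, col = 2, lty = 2)
```

The two curves should be visually indistinguishable; the direct version is slower but makes the role of the kernel and the bandwidth explicit.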
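- tkdensity() (sfsmisc) lets you choose the kernel and the bandwidth interactively through a graphical user interface. This is a good way to check how sensitive the density estimate is to the choice of bandwidth and/or kernel.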
> x <- rnorm(10^3)
> library("sfsmisc")
> tkdensity(x)
- kde2d() (MASS) estimates a bivariate kernel density.
> library("MASS")
> N <- 1000
> x <- rnorm(N)
> y <- 1 + x^2 + rnorm(N)
> dd <- kde2d(x,y) # estimate the bivariate kernel density (x on the horizontal axis)
> contour(dd) # plot the bivariate density
> image(dd) # another plot of the bivariate density
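The object returned by kde2d() is a plain list, which the following sketch inspects and draws as a surface. The grid size n = 50, the viewing angles and the seed are illustrative choices.

```r
library(MASS)

set.seed(1)                      # fixed seed, only for reproducibility
N <- 1000
x <- rnorm(N)
y <- 1 + x^2 + rnorm(N)

dd <- kde2d(x, y, n = 50)        # n sets the number of grid points per axis
str(dd)                          # list with components x, y and z
dim(dd$z)                        # 50 x 50 matrix of density values

# Perspective plot of the estimated density surface
persp(dd$x, dd$y, dd$z, theta = 30, phi = 30,
      xlab = "x", ylab = "y", zlab = "density")
```

The same list can be passed to contour() and image(), as in the example above.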
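- loess() is the standard function for local polynomial regression (locally quadratic by default; set degree = 1 for a local linear fit).
- lowess() is similar to loess() but does not use the standard formula syntax y ~ x for regression. It is the ancestor of loess() (with different defaults!).
- ksmooth() (stats) computes the Nadaraya–Watson kernel regression estimate.
- locpoly() (KernSmooth package)
- npreg() (np package)
- locpol computes local polynomial estimators
- locfit: local regression, likelihood and density estimation
- gam() (gam)
- gam() (mgcv)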
> N <- 10^3
> u <- rnorm(N)
> x1 <- rnorm(N)
> x2 <- rnorm(N) + x1
> y <- 1 + x1^2 + x2^3 + u
>
> library(gam)
> g1 <- gam(y ~ x1 + x2 ) # Standard linear model
> par(mfrow=c(1,2))
> plot(g1, se = TRUE)
>
> g1 <- gam(y ~ s(x1) + x2 ) # x1 enters as a smooth term, x2 linearly
> par(mfrow=c(1,2))
> plot(g1, se = TRUE)
>
> g1 <- gam(y ~ s(x1) + s(x2) ) # x1 and x2 both enter as smooth terms
> par(mfrow=c(1,2))
> plot(g1, se = TRUE)
>
> library(mgcv) # from here on, gam() and s() from mgcv mask those of the gam package
> g1 <- gam(y ~ s(x1) + s(x2) ) # same model, fitted with mgcv
> par(mfrow=c(1,2))
> plot(g1, se = TRUE)
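The local regression functions from the stats package listed earlier (loess(), lowess() and ksmooth()) can be compared on simulated data. This is a minimal sketch: the data-generating process, the span, f and bandwidth values, and the object names are all assumptions chosen for illustration.

```r
set.seed(1)                      # fixed seed, only for reproducibility
N <- 500
x <- sort(runif(N, -2, 2))       # sorted so that lines() draws left to right
y <- sin(2 * x) + rnorm(N, sd = 0.3)

# Local linear fit with the formula interface
fit_loess  <- loess(y ~ x, span = 0.3, degree = 1)
# No formula interface; returns a list with components x and y
fit_lowess <- lowess(x, y, f = 0.3)
# Nadaraya-Watson estimate with a normal kernel
fit_nw     <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5)

plot(x, y, col = "grey")
lines(x, predict(fit_loess), col = 1)
lines(fit_lowess, col = 2)
lines(fit_nw, col = 3)
legend("topright", legend = c("loess", "lowess", "ksmooth"), col = 1:3, lty = 1)
```

Changing span, f or bandwidth shows the usual trade-off: smaller values follow the data more closely, larger values give smoother curves.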
- Jeffrey S. Racine, Nonparametric Econometrics: A Primer, http://socserv.mcmaster.ca/racine/ECO0301.pdf, with R code examples at http://socserv.mcmaster.ca/racine/primer_code.zip
- Qi Li and Jeffrey S. Racine, Nonparametric Econometrics, Princeton University Press, 2007
- Larry Wasserman, All of Nonparametric Statistics, Springer (2007) (ISBN: 0387251456)