R 编程/二项式模型

在本节中，我们介绍了二项式模型。我们有一个二进制的输出结果和一组解释变量。

这种模型可以使用线性概率模型进行分析。但是，这种模型对于伯努利分布参数的一个缺点是，除非对 $\beta$ 进行限制，否则估计的系数可能意味着概率超出单位区间 $[0,1]$ 。因此，更常使用诸如Logit 模型或Probit 模型之类的模型。如果你想估计线性概率模型，可以查看线性模型页面。

Logit 模型

模型的形式为： $y_{i}\sim Bernoulli(\pi _{i})$ ，逆链接函数为： $\pi _{i}={\frac {exp(x_{i}'\beta )}{(1+exp(x_{i}'\beta ))}}$ 。可以使用最大似然估计或贝叶斯方法进行估计。

虚假数据模拟

> x <- 1 + rnorm(1000,1) 
> xbeta <- -1  + (x* 1)
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> table(y)
> df <- data.frame(y,x)

最大似然估计

估计 Logit 模型的标准方法是glm()函数，使用 family 为binomial以及 link 为logit.
lrm()(Design) 是逻辑回归模型的另一种实现。
Zelig 包^[1]中也包含了实现。

在这个例子中，我们模拟了一个包含一个连续预测变量的模型，并使用glm()函数来估计该模型。

> res <- glm(y ~ x , family  = binomial(link=logit))
> summary(res) # results
> confint(res) # confindence intervals
> names(res) 
> exp(res$coefficients) # odds ratio
> exp(confint(res)) # Confidence intervals for odds ratio (delta method)
> predict(res) # prediction on a linear scale
> predict(res, type = "response") # predicted probabilities
> plot(x, predict(res, type = "response")) # plot the predicted probabilities

Zelig

Zelig' 包使计算所有感兴趣的量变得容易。

我们开发了一个新的例子。首先，我们模拟了一个包含两个连续解释变量的新数据集，然后使用zelig()函数，使用model = "logit"选项来估计模型。

然后，我们观察在 x1 和 x2 均值下的 y 的预测值。
然后，我们观察在 x1 = 0 和 x2 = 0 时的预测值。
我们还观察了当 x1 从第 3 个四分位数变为第 1 个四分位数时会发生什么。

> x1 <- 1 + rnorm(1000)
> x2 <- -1 + x1 + rnorm(1000)
> xbeta <- -1  + x1 + x2
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> mydat <- data.frame(y,x1,x2)
> table(y)
> 
> z.out <- zelig(y ~ x1 + x2, model = "logit", data = mydat) # estimating the model
> summary(z.out)
> x.out <- setx(z.out, x1 = mean(x1), x2 = mean(x2)) # setting values for the explanatory variables
> s.out <- sim(z.out, x = x.out) # simulating the quantities of interest
> summary(s.out)
> plot(s.out) # plot the quantities of interest

> # the same with other values
> x.out <- setx(z.out, x1 = 0, x2 = 0)
> s.out <- sim(z.out, x = x.out)
> summary(s.out)

> # What happens if x1 change from the 3rd quartile to the 1st quartile ? 
> x.high <- setx(z.out, x1 = quantile(mydat$x1,.75), x2 = mean(mydat$x2)) 
> x.low <- setx(z.out, x1 = quantile(mydat$x1,.25), x2 = mean(x2)) 
> s.out2<-sim(z.out, x=x.high, x1=x.low) 
> plot(s.out2)

verification 包中的 ROC 曲线。
Zelig 包含rocplot()函数来估计该模型。

请查看UCLA 统计计算示例

贝叶斯估计

bayesglm()函数，使用 arm 包。
MCMClogit()函数，使用 MCMCpack 包来进行 Logit 模型的贝叶斯估计。

> # Data generating process
> x <- 1 + rnorm(1000,1) 
> xbeta <- -1  + (x* 1)
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> table(y)
> 
> library(MCMCpack)
> res <- MCMClogit(y ~ x)
> summary(res)

> library("arm")
> res <- bayesglm(y ~ x, family = binomial(link=logit))
> summary(res)

Probit 模型

Probit 模型是一种二元模型，其中我们假设链接函数是正态分布的累积分布函数。

我们模拟虚假数据。首先，我们在任何分布中（这无关紧要）抽取两个随机变量 x1 和 x2。然后，我们将 xbeta 作为 x1 和 x2 的线性组合来创建向量。我们将链接函数应用于该向量，并将二元变量 y 作为伯努利随机变量来抽取。

> x1 <- 1 + rnorm(1000)
> x2 <- -1 + x1 + rnorm(1000)
> xbeta <- -1  + x1 + x2
> proba <- pnorm(xbeta)
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> mydat <- data.frame(y,x1,x2)
> table(y)

最大似然

我们可以使用glm()函数，使用family=binomial(link=probit)选项，或者使用 sampleSelection 包中的probit()函数，它是前者的包装器。

> res <- glm(y ~ x1 + x2 , family = binomial(link=probit), data = mydat)
> summary(res)
> 
> library("sampleSelection")
> probit(y ~ x1 + x2, data = mydat)
> summary(res)

贝叶斯估计

MCMCprobit()(MCMCpack)

> library("MCMCpack")
> post <- MCMCprobit(y ~ x1 + x2 , data = mydat)
> summary(post)
> plot(post)

另请参见

UCLA 统计计算网站^[2]上有一个关于使用 R 的 Probit 模型的示例。

半参数模型

Klein 和 Spady 估计器^[3]在 np 包^[4]中实现（请查看 npindex()，使用method = "kleinspady"选项）。

参考文献

↑ Kosuke Imai, Gary King 和 Olivia Lau。2008。“logit：二元因变量的逻辑回归”在 Kosuke Imai、Gary King 和 Olivia Lau，“Zelig：每个人都可以使用的统计软件”中，http://gking.harvard.edu/zelig
↑ UCLA 统计计算 Probit 示例 http://www.ats.ucla.edu/stat/R/dae/probit.htm
↑ Klein, R. W. 和 R. H. Spady（1993），“用于二元响应模型的有效半参数估计器”，Econometrica，61，387-421。
↑ Tristen Hayfield 和 Jeffrey S. Racine（2008）。非参数计量经济学：np 包。统计软件杂志 27(5)。URL http://www.jstatsoft.org/v27/i05/。

前一节：分位数回归

索引

下一节：多项式有序模型

[zeliglogit-1] Kosuke Imai, Gary King 和 Olivia Lau。2008。“logit：二元因变量的逻辑回归”在 Kosuke Imai、Gary King 和 Olivia Lau，“Zelig：每个人都可以使用的统计软件”中，http://gking.harvard.edu/zelig

[uclaprobit-2] UCLA 统计计算 Probit 示例 http://www.ats.ucla.edu/stat/R/dae/probit.htm

[3] Klein, R. W. 和 R. H. Spady（1993），“用于二元响应模型的有效半参数估计器”，Econometrica，61，387-421。

[4] Tristen Hayfield 和 Jeffrey S. Racine（2008）。非参数计量经济学：np 包。统计软件杂志 27(5)。URL http://www.jstatsoft.org/v27/i05/。

[1]

[2]

[3]

[4]