Stata/Linear Models
We generate a simple fake dataset:
    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen x = invnorm(uniform())
    gen y = 1 + x + u
    reg y x
    eret list            /* gives the list of all stored results */
    predict yhat         /* gives the predicted value of y */
    predict res, res     /* gives the residuals */
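The stored results can also be read directly after estimation. The sketch below is a minimal illustration under a few assumptions: it uses `rnormal()` in place of `invnorm(uniform())`, an arbitrary seed, and `double` storage for the predictions so that the check is not affected by float rounding. It displays a few stored results and confirms that the fitted values and residuals add up to y.

```stata
clear
set seed 12345                     // arbitrary seed, for reproducibility only
set obs 1000
gen u = rnormal()
gen x = rnormal()
gen y = 1 + x + u

quietly regress y x
* coefficients are available as _b[], other results as e()
display "slope = " _b[x] "   N = " e(N) "   R2 = " e(r2)

predict double yhat                // fitted values
predict double res, residuals      // residuals

* fitted value plus residual reproduces the dependent variable
assert reldif(yhat + res, y) < 1e-6
```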
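leanout is a prefix that simplifies estimation output [1]. It omits auxiliary statistics of little use and reports confidence intervals rather than null-hypothesis tests.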
    ssc install leanout
    leanout : reg y x
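Sometimes you want to run several regressions on the same subsample. This is not automatic, because observations with a missing value for any variable in a model are dropped from that model. One way to make sure the same subsample is used is the 'e(sample)' function, which equals 1 for every observation used in the last estimation and 0 otherwise. In the example below, we store 'e(sample)' in the variables 'samp1' and 'samp2' after each regression (run quietly with qui), then re-estimate both models conditional on 'samp1==1 & samp2 == 1'. This guarantees that both estimates use the same observations.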
    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen x = invnorm(uniform())
    gen y1 = 1 + x + u if uniform() < .8
    gen y2 = 1 + x + u if uniform() < .9
    qui reg y1 x
    gen samp1 = e(sample)
    ta samp1
    qui reg y2 x
    gen samp2 = e(sample)
    ta samp2
    eststo clear
    eststo : qui : reg y1 x if samp1 & samp2
    eststo : qui : reg y2 x if samp1 & samp2
    esttab , star(* 0.1 ** 0.05 *** 0.01) se
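An alternative that avoids running each regression twice is to flag complete cases directly with `missing()`. This is only a sketch: the variable name `common`, the seed, and the use of `rnormal()` and `runiform()` are assumptions made for the illustration, and it covers only samples that differ because of missing values.

```stata
clear
set seed 12345                     // arbitrary seed, for reproducibility only
set obs 1000
gen u = rnormal()
gen x = rnormal()
gen y1 = 1 + x + u if runiform() < .8
gen y2 = 1 + x + u if runiform() < .9

* 1 when none of the listed variables is missing, 0 otherwise
gen byte common = !missing(y1, y2, x)

quietly regress y1 x if common
scalar n1 = e(N)
quietly regress y2 x if common
scalar n2 = e(N)

* both regressions use the same number of observations
assert n1 == n2
```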
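The following is the data-generating process for an instrumental-variables setup. u is correlated with x, which causes endogeneity. z is independent of u and correlated with x, which makes it a valid instrument for x.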
    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen z = invnorm(uniform())
    gen x = invnorm(uniform()) + z + u
    gen y = 1 + 2*x + u
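It is easy to see that the ordinary least squares estimate is biased, whereas the IV estimate is consistent for the true coefficient of 2.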
    eststo clear
    eststo : reg y x
    eststo : ivreg y (x=z)
    esttab , se
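The size of the OLS bias in this simulation can be worked out from the data-generating process. In the derivation below, v is a label introduced here for the unnamed standard normal draw that enters x; everything else follows the simulated model, in which v, z and u are independent with unit variance.

```latex
\begin{align*}
x &= v + z + u, \qquad y = 1 + 2x + u \\
\operatorname{plim}\hat\beta_{OLS}
  &= 2 + \frac{\operatorname{Cov}(x,u)}{\operatorname{Var}(x)}
   = 2 + \frac{1}{3} \approx 2.33 \\
\operatorname{plim}\hat\beta_{IV}
  &= \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x)}
   = \frac{2\operatorname{Cov}(z,x) + \operatorname{Cov}(z,u)}{\operatorname{Cov}(z,x)}
   = 2
\end{align*}
```

With 1000 observations, the OLS coefficient should therefore land near 2.33 and the IV coefficient near 2.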
You can perform overidentification tests with overid or ivreg2:
    clear
    set obs 1000
    gen u = invnorm(uniform())
    gen z1 = invnorm(uniform())
    gen z2 = invnorm(uniform())
    gen x = invnorm(uniform()) + z1 - 2*z2 + u
    gen y = 2*x + u
    ivreg y (x=z1 z2)
    overid
    ivreg2 y (x=z1 z2)
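If the user-written overid and ivreg2 commands are not installed, a similar test is available with official commands. The sketch below assumes a Stata version that provides `ivregress` and `rnormal()`; the seed is arbitrary. It fits the same model by two-stage least squares and then runs `estat overid`, which reports the Sargan and Basmann overidentification tests.

```stata
clear
set seed 2024                      // arbitrary seed, for reproducibility only
set obs 1000
gen u  = rnormal()
gen z1 = rnormal()
gen z2 = rnormal()
gen x  = rnormal() + z1 - 2*z2 + u
gen y  = 2*x + u

* two instruments for one endogenous regressor: one overidentifying restriction
ivregress 2sls y (x = z1 z2)
estat overid
```

Because both instruments are valid by construction, the test should usually fail to reject the null hypothesis.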
    clear
    set obs 1000
    /* loadings that make the three error terms correlated */
    local s11 = 1
    local s12 = .5
    local s22 = 1
    local s13 = .5
    local s23 = .5
    local s33 = 1
    forvalues k = 1/3 {
        tempvar u`k'
        gen `u`k'' = invnorm(uniform())
    }
    gen eta1 = `s11' * `u1'
    gen eta2 = `s12' * `u1' + `s22' * `u2'
    gen eta3 = `s13' * `u1' + `s23' * `u2' + `s33' * `u3'
    gen x = invnorm(uniform())
    forvalues k = 1/3 {
        gen z`k' = invnorm(uniform())
    }
    gen y1 = 1 + 2*x + z1 + eta1
    gen y2 = - 1 + x + z2 + eta2
    gen y3 = 4 + z3 + eta3
    global eq1 = "y1 x z1"
    global eq2 = "y2 x z2"
    global eq3 = "y3 x z3"
    /* equation-by-equation OLS, then joint estimation with sureg */
    reg $eq1
    reg $eq2
    reg $eq3
    sureg (toto1 : $eq1) (toto2 : $eq2) (toto3 : $eq3)
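The error terms eta1, eta2 and eta3 above are linear combinations of three independent standard normal draws, so their covariance matrix follows from the loadings. The matrix names L and Σ are notation introduced here for the illustration.

```latex
\begin{align*}
\eta = L u, \qquad
L = \begin{pmatrix} 1 & 0 & 0 \\ 0.5 & 1 & 0 \\ 0.5 & 0.5 & 1 \end{pmatrix},
\qquad
\Sigma = \operatorname{Var}(\eta) = L L' =
\begin{pmatrix}
1   & 0.5  & 0.5  \\
0.5 & 1.25 & 0.75 \\
0.5 & 0.75 & 1.5
\end{pmatrix}
\end{align*}
```

The nonzero off-diagonal entries are what distinguish joint estimation from equation-by-equation estimation in this setup.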
- xtset
- xtreg
- xtabond
- xtabond2
- ivreg2
- xtivreg2
- ivendog
- ivhettest
- overid : overidentification test
- xtoverid : overidentification test
- xttest2
- ivgmm0
- xtarsim
- xtdpd
- xtdpdsys
We assume that $y_{it} = 1 + x_{it} + z_i + f_i + u_{it}$, where f is independent of x and z, and u is independent of x and z.
    clear
    set obs 1000
    gen id = _n
    gen f = invnorm(uniform())
    gen z = uniform()
    expand 10
    gen u = invnorm(uniform())
    gen x = uniform()
    gen y = 1 + x + z + f + u
    eststo clear
    eststo : qui : reg y x z
    eststo : qui : reg y x z, robust
    eststo : qui : reg y x z, cluster(id)
    eststo : qui : xtreg y x z, i(id) re
    eststo : qui : xtreg y x z, i(id) mle
    eststo : qui : xtmixed y x z || id : , mle
    esttab * , se
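The standard errors differ across these estimators because of the within-individual correlation of the composite error. In the simulation above, f is drawn once per id and u once per observation, both with unit variance. The symbols v and ρ are notation introduced here.

```latex
\begin{align*}
v_{it} &= f_i + u_{it}, \qquad \operatorname{Var}(v_{it}) = 1 + 1 = 2 \\
\operatorname{Cov}(v_{it}, v_{is}) &= \operatorname{Var}(f_i) = 1 \quad (t \neq s) \\
\rho &= \frac{\operatorname{Var}(f_i)}{\operatorname{Var}(f_i) + \operatorname{Var}(u_{it})} = 0.5
\end{align*}
```

Since f is independent of x and z, all the estimators target the same coefficients. The default OLS standard errors ignore this correlation, while the clustered, random-effects and mixed-model versions account for it.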
The Layard and Nickell unemployment dataset:
    use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta, clear
    (Layard & Nickell, Unemployment in Britain, Economica 53, 1986 from Ox dist)
You can also generate fake data:
    clear
    set obs 10000
    set seed 123456
    gen id = _n
    gen f = invnorm(uniform())
    forvalues t = 1/5 {
        gen u`t' = invnorm(uniform())
    }
    gen y1 = f/.3 + u1
    forvalues t = 2/5 {
        local z = `t' - 1
        gen y`t' = .7 * y`z' + f + u`t'
    }
    save wide, replace
    reshape long y, i(id) j(year)
    drop u* f
    /* declare the panel with the variables created above */
    tsset id year
    save long, replace
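It is easy to see that the standard random-effects and fixed-effects estimators are biased, whereas the instrumental-variable versions (random effects and first differences in the code below) are not.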
    eststo clear
    eststo : qui : xtreg y l.y, re
    eststo : qui : xtreg y l.y, fe
    eststo : qui : xtivreg y (l.y= l2.d.y) , re
    eststo : qui : xtivreg y (l.y= l2.y) , fd
    esttab , se
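The reason an instrument is needed can be seen by first-differencing the simulated model, in which y depends on its own lag with coefficient 0.7, an individual effect f and a serially uncorrelated error u with unit variance. The derivation below is a sketch under those assumptions.

```latex
\begin{align*}
y_{it} &= 0.7\, y_{i,t-1} + f_i + u_{it} \\
\Delta y_{it} &= 0.7\, \Delta y_{i,t-1} + \Delta u_{it} \\
\operatorname{Cov}(\Delta y_{i,t-1}, \Delta u_{it})
  &= \operatorname{Cov}(y_{i,t-1} - y_{i,t-2},\; u_{it} - u_{i,t-1})
   = -\operatorname{Var}(u_{i,t-1}) = -1 \\
\operatorname{Cov}(y_{i,t-2}, \Delta u_{it}) &= 0
\end{align*}
```

Differencing removes f but leaves the lagged difference correlated with the differenced error, while the second lag of y is uncorrelated with that error and still correlated with the lagged difference, which is why it can serve as an instrument.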
    eststo clear
    eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level)) nomata robust
    eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level)) ivstyle( , e(diff)) nomata robust
    eststo : qui : xi : xtabond2 y l.y, iv(l.y l2.y l3.y, equation(diff)) nomata robust
    esttab , se
[1] Nathaniel Beck, "leanout: A prefix to regress (and similar commands) to produce less output that is more useful", Stata Journal, forthcoming. http://politics.as.nyu.edu/docs/IO/2576/sj_driver.pdf