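Regression with the plm package and two-way effects when the data has NAs

I want to run a regression on panel data using two-way effects for time and stores. If the panel is perfectly balanced it works fine, but for some reason, if it is not, the code gets stuck (see: https://stat.ethz.ch/pipermail/r-help/2010-May/239272.html).

Specifically, my data is not unbalanced by nature, but it has some NAs, so my guess is that it becomes unbalanced when the plm function drops the rows containing NAs. I wrote some sample code to illustrate the data I have.

If I run this: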

library(plm)
set.seed(123)

number.of.days <- 1100
number.of.stores <- 1000
# One row per store-day pair, ordered by day and then by store
days <- rep(1:number.of.days, each = number.of.stores)
stores <- rep(1:number.of.stores, times = number.of.days)

# y and x1..x6 are independent standard normal draws
n.obs <- number.of.days * number.of.stores
data <- data.frame(stores, days, matrix(rnorm(n.obs * 7), nrow = n.obs, ncol = 7))
colnames(data)[3:9] <- c('y', paste0('x', 1:6))

# pdata.frame() supersedes the deprecated plm.data()
data <- pdata.frame(data, index = c("stores", "days"))
fit <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = data, index = c("stores", "days"), effect = "twoways", model = "within")

it works fine, because the panel is balanced. However, if I create some NA values:

# Parentheses matter: 1:a*b parses as (1:a)*b, which would sample only multiples of b
data$y[sample(1:(number.of.days*number.of.stores),150)] <- NA
data$x1[sample(1:(number.of.days*number.of.stores),150)] <- NA
data$x2[sample(1:(number.of.days*number.of.stores),150)] <- NA
data$x3[sample(1:(number.of.days*number.of.stores),150)] <- NA
data$x4[sample(1:(number.of.days*number.of.stores),150)] <- NA
data$x5[sample(1:(number.of.days*number.of.stores),150)] <- NA
data$x6[sample(1:(number.of.days*number.of.stores),150)] <- NA

and try to run the regression again:

fit <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = data, index = c("stores", "days"), effect = "twoways", model = "within")

it does not work (the code apparently never stops running).
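The guess that dropping NA rows unbalances the panel can be checked directly. Below is a minimal base-R sketch on a small hypothetical panel; the names toy, complete and obs_per_store are illustrative and not part of the original code. Equal observation counts per store are only a necessary condition for balance, which is enough to show the effect here.

```r
set.seed(1)

# Hypothetical toy panel: 5 stores x 4 days, fully balanced
toy <- expand.grid(stores = 1:5, days = 1:4)
toy$y  <- rnorm(nrow(toy))
toy$x1 <- rnorm(nrow(toy))

balanced_before <- length(unique(table(toy$stores))) == 1

# Blank out one value of y and one value of x1 (different rows)
toy$y[3]   <- NA
toy$x1[12] <- NA

# Model-fitting functions typically drop incomplete rows before estimating
complete <- toy[complete.cases(toy), ]

# Count the remaining observations per store
obs_per_store <- table(complete$stores)
balanced_after <- length(unique(obs_per_store)) == 1

print(obs_per_store)
print(c(before = balanced_before, after = balanced_after))
```

With plm loaded, is.pbalanced() applied to the panel data frame should give a similar check without the manual counting.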

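I tried using an "individual" effect for stores and adding a matrix of dummy variables for time, but since there are 1100 days it is just as slow.

I don't think this is an uncommon problem. Is there a known solution?

Thanks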

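Best answer

The felm function in the lfe package can handle this (and does so efficiently).

Running

library(lfe)
fit2 <- felm(y ~ x1 + x2 + x3 + x4 + x5 + x6 | stores + days | 0 | stores , data = data)

produces a result for the data with NAs.

Note the formula specification, in which you specify which factors are to be projected out (i.e., the fixed effects). The 0 in the third part means that no instrumental variables are used, and the last stores in the formula specifies the variable used for clustering the standard errors. For details, see the excellent felm help file and the lfe package documentation.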
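To make the four-part formula concrete, here is a small sketch on a hypothetical toy panel (toy, fit_felm and fit_lm are illustrative names, and the data-generating process is an assumption made only for this example). It fits the same two-way fixed-effects model with felm and with explicit dummy variables in lm, which should agree on the slope of x1 even though the NAs leave the panel unbalanced.

```r
library(lfe)

set.seed(42)

# Hypothetical toy panel: 30 stores x 20 days
toy <- expand.grid(stores = 1:30, days = 1:20)
toy$stores <- factor(toy$stores)
toy$days   <- factor(toy$days)
toy$x1 <- rnorm(nrow(toy))
toy$y  <- 2 * toy$x1 + rnorm(nrow(toy))

# Scatter some NAs so the panel is unbalanced once incomplete rows are dropped
toy$y[sample(nrow(toy), 15)]  <- NA
toy$x1[sample(nrow(toy), 15)] <- NA

# Formula parts, separated by "|":
#   1. y ~ x1         ordinary covariates
#   2. stores + days  factors to project out (the fixed effects)
#   3. 0              no instrumental variables
#   4. stores         cluster variable for the standard errors
fit_felm <- felm(y ~ x1 | stores + days | 0 | stores, data = toy)

# Same point estimate via explicit dummies; feasible here, slow with 1100 days
fit_lm <- lm(y ~ x1 + stores + days, data = toy)

print(coef(fit_felm))
print(coef(fit_lm)["x1"])
```

The dummy-variable fit is only there as a cross-check; on the full data set from the question it would hit the same slowdown described there, which is the case felm is designed to avoid.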

Source: the corresponding question on Stack Overflow, https://stackoverflow.com/questions/41045485/
