r - Merging plm fitted values into a dataset

I am using plm to fit a fixed-effects regression model.

The model looks like this:

FE.model <-plm(fml, data = data.reg2,
           index=c('Site.ID','date.hour'), # cross section ID and time series ID
           model='within', #coefficients are fixed
           effect='individual')
summary(FE.model)

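"fml" is a formula I defined earlier. I have many independent variables, so defining it once is more efficient.

What I want to do is take my fitted values (my yhats) and join them to my base dataset, data.reg2.

I was able to get the fitted values with this code: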

 Fe.model.fitted <- FE.model$model[[1]] - FE.model$residuals

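However, this only gives me a column vector containing the fitted values alone, with nothing I can use to join it to my base dataset.

Alternatively, I tried something like this: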

 Fe.model.fitted <- cbind(data.reg2, resid=resid(FE.model), fitted=fitted(FE.model))

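However, I got this error:

 Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "pseries" to a data.frame

Is there another way to get my fitted values into my base dataset? Or can someone explain the error I am getting and how to fix it?

I should note that I do not want to compute the yhats manually from my betas. I have too many independent variables for that, and the formula I defined (fml) may change, so that approach would not be practical.

Many thanks!!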

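Best answer

Merging plm fitted values back into the original dataset takes a few intermediate steps: plm drops any rows with missing data, and as far as I know the plm object does not contain the index information. The order of the data is not preserved either. See the comment by Giovanni Millo, one of the authors of plm, in this thread: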

"...the input order is not always preserved: observations are always reordered by (individual, time) internally, so that the output you get is ordered accordingly..."

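The reordering described in the quote can be checked directly. The following is only a minimal sketch on made-up data (the data frame `df`, its columns, and the shuffled copy are all hypothetical), and it assumes that `index()` can be called on the fitted model to see the order plm used internally:

```r
library(plm)

# Hypothetical toy panel: 3 individuals, 10 periods each
set.seed(42)
df <- data.frame(id = rep(c("a", "b", "c"), each = 10),
                 time = rep(1:10, times = 3))
df$x <- rnorm(nrow(df))
df$y <- 2 * df$x + rnorm(nrow(df)) / 10

# Deliberately scramble the row order before estimation
shuffled <- df[sample(nrow(df)), ]

fe <- plm(y ~ x, data = shuffled, index = c("id", "time"), model = "within")

# Index (individual, time) of the rows as plm stores them internally
idx <- index(fe)

# order() returns 1..n only if the rows are already sorted by (id, time)
input_sorted <- !is.unsorted(order(shuffled$id, shuffled$time))
model_sorted <- !is.unsorted(order(idx[[1]], idx[[2]]))
```

If the quote holds for your plm version, `input_sorted` is FALSE while `model_sorted` is TRUE, which is why binding the residuals or fitted values to the input data by row position is unsafe.
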
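Simple steps:

1. Get the fitted values from the estimated plm object. It is a single vector, but the entries are named. The names correspond to positions in the index.
2. Use the index() function to get the index. It can return both the individual and the time index. Note that the index may contain more rows than the fitted values, in case rows were dropped because of missing data. (It is also possible to generate the index directly from the original data, but I did not see any guarantee that what plm returns preserves the original order of the data.)
3. Merge into the original data, looking up the id and time values from the index.

Sample code is provided below. It is a bit long, but I have tried to comment it. The code is not optimized; my intention was to list the steps explicitly. Also, I am using data.tables rather than data.frames.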

library(data.table); library(plm)

### Generate dummy data. This way we know the "true" coefficients
set.seed(100)
n <- 500 # Run with more data if you want to get closer to the "true" coefficients
DT <- data.table(CJ(id = c("a","b","c","d","e"), time = c(1:(n / 5))))
DT[, x1 := rnorm(n)]
DT[, x2 := rnorm(n)]
DT[, y  := x1 + 2 * x2 + rnorm(n) / 10]

setkey(DT, id, time)
# Make it an unbalanced panel & put in some NAs
DT <- DT[!(id == "a" & time == 4)]
DT[.("a", 3), x2 := as.numeric(NA)]
DT[.("d", 2), x2 := as.numeric(NA)]

str(DT)

### Run the model -- both individual and time effects; "within" model
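# Note: the plm() argument is "index", not "id" (an unknown "id" argument would fall into "..." and the first two columns would be used by default)
PLM <- plm(formula = y ~ x1 + x2, data = DT, index = c("id", "time"), model = "within", effect = "twoways", na.action = "na.omit")
summary(PLM)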

### Merge the fitted values back into the data.table DT
# Note that PLM$model$y is shorter than the data, i.e. the row(s) with NA have been dropped
cat("\nRows omitted (due to NA): ", nrow(DT) - length(PLM$model$y))

# Since the objects returned by plm() do not contain the index, need to generate it from the data
# The object returned by plm(), i.e. PLM$model$y, has names that point to the place in the index
# Note: The index can also be done as INDEX <- DT[, j = .(id, time)], but use the longer way with index() in case plm does not preserve the order
INDEX <- data.table(index(x = pdata.frame(x = DT, index = c("id", "time")), which = NULL)) # which = NULL extracts both the individual and time indexes
INDEX[, id := as.character(id)]
INDEX[, time := as.integer(time)] # it is returned as a factor, convert back to integer to match the variable type in DT

# Generate the fitted values as the difference between the y values and the residuals
# Caveat: as.integer(names(...)) assumes the entries are named by row position; if your plm version
# names them differently (e.g. compound "id-time" labels), it returns NA and the lookup below fails
stopifnot(all(names(PLM$residuals) == names(PLM$model$y))) # should always hold; fail loudly rather than leave FIT undefined
FIT <- data.table(
    index   = as.integer(names(PLM$model$y)), # position in INDEX, from where we get the "id" and "time" below
    fit.plm = as.numeric(PLM$model$y) - as.numeric(PLM$residuals)
)

FIT[, id   := INDEX[index]$id]
FIT[, time := INDEX[index]$time]
# Now FIT has both the id and time variables, can match it back into the original dataset (i.e. we have the missing data accounted for)
DT <- merge(x = DT, y = FIT[, j = .(id, time, fit.plm)], by = c("id", "time"), all = TRUE) # Need all = TRUE, or some data from DT will be dropped!
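
For comparison, here is a shorter variant of the same three steps using plain data.frames. It is a sketch under assumptions, not the answer's method: it assumes your plm version provides an `index()` method for the fitted model (so the index of the rows actually used can be read off the model itself instead of being rebuilt from the data), and the toy data frame `df` and its columns are hypothetical.

```r
library(plm)

# Hypothetical toy panel with two missing x values
set.seed(1)
df <- data.frame(id = rep(c("a", "b", "c"), each = 20),
                 time = rep(1:20, times = 3),
                 stringsAsFactors = FALSE)
df$x <- rnorm(nrow(df))
df$y <- 2 * df$x + rnorm(nrow(df)) / 10
df$x[c(5, 27)] <- NA  # these two rows are dropped during estimation

fe <- plm(y ~ x, data = df, index = c("id", "time"), model = "within")

# Steps 1 and 2: fitted values (y minus residuals) next to the index of the
# rows that were used. as.numeric() drops the "pseries" class, which is the
# class the cbind() error in the question complains about.
idx <- index(fe)
fit <- data.frame(
  id      = as.character(idx[[1]]),
  time    = as.integer(as.character(idx[[2]])),  # factor -> integer
  fit.plm = as.numeric(fe$model[[1]]) - as.numeric(residuals(fe)),
  stringsAsFactors = FALSE
)

# Step 3: left join on (id, time); rows plm dropped get NA fitted values
out <- merge(df, fit, by = c("id", "time"), all.x = TRUE)
```

Joining on the (id, time) keys rather than on row position means neither the dropped rows nor any internal reordering can misalign the fitted values.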

Regarding r - Merging plm fitted values into a dataset, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23143428/
