r - Converting to balanced panel data

Tags: r data.table panel reshape2

I have an unbalanced panel, as in the following example:

test <- read.table(
text = "
A   2010-01-01  1   rdm
A   2010-01-10  2   dfg
A   2010-01-14  3   fdgfd
A   2010-02-15  4   fdgfd
A   2010-08-17  5   dg
A   2010-12-19  6   dfg
B   2009-01-01  1   dfg
B   2010-01-01  2   ydg
B   2010-01-10  3   fdgfd
B   2010-01-14  4   dfg
B   2010-02-15  5   dfg
", header = FALSE)
library(data.table)
setDT(test)
names(test) <-  c("ID", "date", "nr", "namecol")

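I would like to balance it with respect to dates, i.e. each individual (A, B, etc.) should have an NA row for every date on which it has no data. I do not know the minimum date per group or the minimum date across groups. The same holds for the maximum, although setting the maximum to a specific date might be faster than computing it across groups. The desired output is: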

out <- read.table(
text = "
A   2009-01-01  NA  NA
A   2010-01-01  1   rdm
A   2010-01-10  2   dfg
A   2010-01-14  3   fdgfd
A   2010-02-15  4   fdgfd
A   2010-08-17  5   dg
A   2010-12-19  6   dfg
B   2009-01-01  1   dfg
B   2010-01-01  2   ydg
B   2010-01-10  3   fdgfd
B   2010-01-14  4   dfg
B   2010-02-15  5   dfg
B   2010-08-17  NA  NA
B   2010-12-19  NA  NA
", header = FALSE)
setDT(out)
names(out) <-  c("ID", "date", "nr", "namecol")

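My dataset is very large, so I believe it is best to do this in data.table (or plyr/reshape2) or something similarly suitable.

Best answer

After setting the key columns to "ID" and "date", we build a cross join (CJ) of the unique "ID" and "date" values in the dataset, then join that grid back to the original dataset. Combinations with no match in the original data come back with NA in the remaining columns.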

setDT(test, key = c("ID", "date"))[CJ(ID, date, unique=TRUE)]
#    ID       date nr namecol
# 1:  A 2009-01-01 NA      NA
# 2:  A 2010-01-01  1     rdm
# 3:  A 2010-01-10  2     dfg
# 4:  A 2010-01-14  3   fdgfd
# 5:  A 2010-02-15  4   fdgfd
# 6:  A 2010-08-17  5      dg
# 7:  A 2010-12-19  6     dfg
# 8:  B 2009-01-01  1     dfg
# 9:  B 2010-01-01  2     ydg
#10:  B 2010-01-10  3   fdgfd
#11:  B 2010-01-14  4     dfg
#12:  B 2010-02-15  5     dfg
#13:  B 2010-08-17 NA      NA
#14:  B 2010-12-19 NA      NA
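
The same idea can be sketched without setting a key, by building the grid explicitly and joining with `on =`. This is only an illustration under stated assumptions: the names `grid` and `balanced` are hypothetical and not part of the original answer, and the `on = .(ID, date)` form assumes a reasonably recent data.table version.

```r
library(data.table)

# Example data from the question, with date stored as a Date column
test <- data.table(
  ID = c(rep("A", 6), rep("B", 5)),
  date = as.Date(c("2010-01-01", "2010-01-10", "2010-01-14", "2010-02-15",
                   "2010-08-17", "2010-12-19", "2009-01-01", "2010-01-01",
                   "2010-01-10", "2010-01-14", "2010-02-15")),
  nr = c(1:6, 1:5),
  namecol = c("rdm", "dfg", "fdgfd", "fdgfd", "dg", "dfg",
              "dfg", "ydg", "fdgfd", "dfg", "dfg")
)

# Every ID paired with every date observed anywhere in the data
# (grid is an illustrative name, not from the original answer)
grid <- CJ(ID = test$ID, date = test$date, unique = TRUE)

# Join test onto the grid: combinations absent from test
# get NA in nr and namecol
balanced <- test[grid, on = .(ID, date)]

# Check that every ID now has the same number of rows
balanced[, .N, by = ID]
```

Because the join is driven by `grid`, the result has one row per ID and date combination, and `test` itself is left unkeyed and unmodified.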

Data

test <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B"), date = structure(c(14610, 14619, 14623, 14655, 
14838, 14962, 14245, 14610, 14619, 14623, 14655), class = "Date"), 
nr = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L), namecol = c("rdm", 
"dfg", "fdgfd", "fdgfd", "dg", "dfg", "dfg", "ydg", "fdgfd", 
"dfg", "dfg")), .Names = c("ID", "date", "nr", "namecol"),
 row.names = c(NA, -11L), class = "data.frame")

Regarding "r - Converting to balanced panel data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39435623/
