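I have a table of factors and their corresponding values arranged by date, but I would also like to display the data by rank instead of by factor. In the first step, I determine the corresponding factor for each rank (the .fact columns in the output); in the second step, I use that factor to determine the corresponding value for each rank (the .val2 columns in the output). The actual data set contains many more dates and factors (the names and the count vary). Each row holds the model's predicted rank of each factor on a given date, and the numeric data are the realized values.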
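The code below works, but is there a more efficient (faster) way to do the row-wise lookups it performs? Much of the data.table advice I have read discourages using .SD[, with=FALSE], but I have not found another solution.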
library(data.table)
dt = data.table(Date = c("1/31/2013", "2/28/2013", "3/31/2013",
                         "4/30/2013", "5/31/2013"),
                A.rnk = c(5L, 2L, 2L, 3L, 3L),
                B.rnk = c(4L, 3L, 1L, 2L, 5L),
                C.rnk = c(3L, 1L, 4L, 1L, 1L),
                D.rnk = c(2L, 4L, 3L, 5L, 2L),
                E.rnk = c(1L, 5L, 5L, 4L, 4L),
                A.val = rnorm(5), B.val = rnorm(5),
                C.val = rnorm(5), D.val = rnorm(5),
                E.val = rnorm(5))
nms = c("A", "B", "C", "D", "E")
rnks = as.character(1:5)
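# determine the factor (A, B, C, ...) for each rank, by date
dt = dt[, c(rnks) := {
    cols = .SD[, paste0(nms, ".rnk"), with=FALSE]
    # unlist the one-row subset so order() sees a plain integer vector
    cols = names(cols)[order(unlist(cols))]
    # keep the single character in front of ".rnk", i.e. the factor name;
    # stringr's default regex engine handles the lookahead without perl()
    as.list(stringr::str_extract(cols, ".{1}(?=\\.rnk)"))
}, by=Date]
# determine the factor value (val) for each rank, by date
dt[, paste0(rnks, ".val2") :=
    .SD[, paste0(.SD[, rnks, with=FALSE], ".val"), with=FALSE], by=Date]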
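For a single date, the two lookup steps amount to inverting the rank permutation with order(). The following is a minimal base R sketch of that idea, not the asker's code: the vectors rnk and val are hypothetical, with ranks matching the second row of the table and made-up values.

```r
# hypothetical ranks and realized values for one date (values are made up)
rnk <- c(A = 2L, B = 3L, C = 1L, D = 4L, E = 5L)
val <- c(A = 0.1, B = 0.2, C = 0.3, D = 0.4, E = 0.5)

# positions of the factors that hold rank 1, 2, ..., 5
by_rank <- order(rnk)

fact_by_rank <- names(rnk)[by_rank]    # factor name for each rank
val_by_rank  <- unname(val[by_rank])   # realized value for each rank

fact_by_rank   # "C" "A" "B" "D" "E"
val_by_rank    # 0.3 0.1 0.2 0.4 0.5
```

Indexing by the ranks themselves, val[rnk], gives the inverse mapping instead; the two agree only when the rank permutation happens to be its own inverse.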
Best answer
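This is a complete rewrite. Given what you are asking for, it is better to work with the data in long format, as shown below. It uses the reshape2 package, whose melt and dcast functions convert data to long and wide format, respectively.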
Note that faster versions of melt and dcast are implemented (in C) in the current development version (1.8.11) of data.table. So, after the next release of data.table, you can use the same code, but you won't have to convert the result back to a data.table (done using as.data.table, shown below) after the melt and dcast steps, and it will be a lot faster.
Now on to the solution:
# loading packages
require(data.table)
require(reshape2)
# long format on just the .rnk columns
dt.m <- as.data.table(melt(dt, id="Date", measure=2:6))
setnames(dt.m, c("Date", "var1", "val1"))
dt.m[, c("var2", "val2") := as.data.table(melt(dt, id="Date",
                                measure=7:11))[, list(variable, value)]]
# sort by date column by reference
setkey(dt.m, Date)
# order rows by Date, then by rank; the original answer called the internal
# data.table:::fastorder here for speed, but internal functions are not a
# stable API, so plain `order` is used instead
oo <- dt.m[, order(Date, val1)]
dt.m[, val3 := rep(nms, length.out=length(oo))[oo]]
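# value of the factor holding each rank: index by order(val1), not by val1
# itself, which would give the inverse mapping
dt.m[, val4 := val2[order(val1)], by=Date]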
ans1 <- as.data.table(dcast(dt.m, Date ~ var1, value.var="val3"))[, Date := NULL]
ans2 <- as.data.table(dcast(dt.m, Date ~ var2, value.var="val4"))[, Date := NULL]
setnames(ans1, rnks)
setnames(ans2, paste(rnks, ".val2", sep=""))
cbind(dt, ans1, ans2)
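The same long-then-wide idea can be sketched with data.table's own melt and dcast. This is only an illustration under assumptions, not part of the original answer: it assumes a data.table release later than the one the note refers to, one that exports melt and dcast and supports patterns() in measure.vars and several value.var columns. The small three-factor table and the names long, wide and fac are hypothetical.

```r
library(data.table)

# hypothetical two-date, three-factor table in the same layout as dt
dt <- data.table(Date  = c("1/31/2013", "2/28/2013"),
                 A.rnk = c(3L, 2L), B.rnk = c(1L, 3L), C.rnk = c(2L, 1L),
                 A.val = c(0.1, 0.4), B.val = c(0.2, 0.5), C.val = c(0.3, 0.6))
nms <- c("A", "B", "C")

# melt the .rnk and .val columns together; "fac" holds the position (1, 2, 3)
# of each factor within the two column groups
long <- data.table::melt(dt, id.vars = "Date",
                         measure.vars = patterns("\\.rnk$", "\\.val$"),
                         variable.name = "fac",
                         value.name = c("rnk", "val"))

# replace the position by the factor name (assumes columns follow nms order)
long[, fac := nms[as.integer(fac)]]

# one row per date, one fac_<rank> and one val_<rank> column per rank
wide <- data.table::dcast(long, Date ~ rnk, value.var = c("fac", "val"))
wide
```

Calling data.table::melt and data.table::dcast explicitly avoids picking up the reshape2 functions of the same name when both packages are loaded.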
For more on row-wise lookups in R using data.table, see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/21385573/