I'd like to compute a per-group rolling weighted mean for a data.table like this:
DT <- data.table(group = rep(c(1,2), each = 5), value = 1:10, weight = 11:20)
    group value weight
 1:     1     1     11
 2:     1     2     12
 3:     1     3     13
 4:     1     4     14
 5:     1     5     15
 6:     2     6     16
 7:     2     7     17
 8:     2     8     18
 9:     2     9     19
10:     2    10     20
I found a working solution using the runner package in the question "Rolling over function with 2 vector arguments":
my_weighted_mean <- function(data) {
  weighted.mean(data[, 1], w = data[, 2])
}
DT[, weighted_mean := runner::runner(x = .SD, f = my_weighted_mean, k = 3, na_pad = TRUE),
   .SDcols = c("value", "weight"), by = list(group)]
But this code is slow.
I suspect it should be doable with frollapply, but the following doesn't work, because I don't understand how to use frollapply with a function of two columns:
DT[, weighted_mean := frollapply(value, FUN = weighted.mean, n = 3, w = weights), by = list(group)]
I'm looking for better performance (and a solution without runner).
Best answer
"frollapply with a two-column function": instead of rolling over the values, roll over the row indices. The inner function can then use as many columns as it needs.
DT[, weighted_mean := frollapply(seq_len(.N),
FUN = function(ind) weighted.mean(value[ind], weight[ind]),
n = 3),
by = .(group)]
#     group value weight weighted_mean
#     <num> <int>  <int>         <num>
#  1:     1     1     11            NA
#  2:     1     2     12            NA
#  3:     1     3     13      2.055556
#  4:     1     4     14      3.051282
#  5:     1     5     15      4.047619
#  6:     2     6     16            NA
#  7:     2     7     17            NA
#  8:     2     8     18      7.039216
#  9:     2     9     19      8.037037
# 10:     2    10     20      9.035088
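Because a weighted mean over a window is just sum(value * weight) / sum(weight) over that window, one possible alternative (not part of the answer above) is to compute two rolling sums and divide, instead of calling `weighted.mean` once per window. The sketch below shows the idea in base R only, so it runs without data.table. The helper names `rolling_sum` and `rolling_weighted_mean` are hypothetical, and it assumes a right-aligned window with NA for incomplete windows, matching the output above. It also reuses the answer's trick of passing row indices to the per-group function so that both columns are available.

```r
# Hypothetical helper: right-aligned rolling sum of width k.
# stats::filter with sides = 1 computes x[i] + x[i-1] + ... + x[i-k+1];
# the first k - 1 results are NA because the window is incomplete.
rolling_sum <- function(x, k) {
  as.numeric(stats::filter(x, rep(1, k), sides = 1))
}

# Hypothetical helper: rolling weighted mean as a ratio of two rolling sums.
rolling_weighted_mean <- function(x, w, k) {
  rolling_sum(x * w, k) / rolling_sum(w, k)
}

df <- data.frame(group = rep(c(1, 2), each = 5), value = 1:10, weight = 11:20)

# ave() applies FUN separately within each group; passing row indices
# lets the function look up both the value and the weight columns.
df$weighted_mean <- ave(seq_len(nrow(df)), df$group, FUN = function(ind)
  rolling_weighted_mean(df$value[ind], df$weight[ind], k = 3))

print(df)
```

With data.table, the same idea would presumably be written with its built-in `frollsum`, e.g. `DT[, weighted_mean := frollsum(value * weight, 3) / frollsum(weight, 3), by = group]`. This is expected to be faster than `frollapply` because no R function is called per window, although it has not been benchmarked here. One caveat: the ratio-of-sums form assumes there are no NA values in `weight`.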
A similar question about rolling weighted means with data.table can be found on Stack Overflow: https://stackoverflow.com/questions/68759855/