r - Cumulative sum with lag

Tags: r sum lag cumsum

I have a large data set that, in simplified form, looks like this:

row.    member_id   entry_id    comment_count   timestamp
1       1            a              4           2008-06-09 12:41:00
2       1            b              1           2008-07-14 18:41:00
3       1            c              3           2008-07-17 15:40:00
4       2            d              12          2008-06-09 12:41:00
5       2            e              50          2008-09-18 10:22:00
6       3            f              0           2008-10-03 13:36:00

I can compute a running total of the counts per member with the following code:
transform(df, aggregated_count = ave(comment_count, member_id, FUN = cumsum))
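To make the starting point concrete, here is a minimal sketch of what that call produces on the sample data. The data frame is rebuilt inline with only the two relevant columns so the snippet runs on its own; the name `res` is just for illustration.

```r
# Sample data reduced to the two columns the aggregation uses
df <- data.frame(
  member_id     = c(1L, 1L, 1L, 2L, 2L, 3L),
  comment_count = c(4L, 1L, 3L, 12L, 50L, 0L)
)

# Running total per member; each row's total includes its own comment_count
res <- transform(df, aggregated_count = ave(comment_count, member_id, FUN = cumsum))
res$aggregated_count
# 4 5 8 12 62 0
```

Each total includes the current row, which is exactly what needs to change.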

But I want the cumulative sum to be lagged by 1; in other words, I want cumsum to ignore the current row. The result should be:
row.    member_id   entry_id     comment_count  timestamp             previous_comments
1       1            a              4           2008-06-09 12:41:00        0
2       1            b              1           2008-07-14 18:41:00        4
3       1            c              3           2008-07-17 15:40:00        5
4       2            d              12          2008-06-09 12:41:00        0
5       2            e              50          2008-09-18 10:22:00        12
6       3            f              0           2008-10-03 13:36:00        0

Any idea how to do this in R? Ideally also for a lag greater than 1?

Reproducible data:
# dput(df)
structure(list(member_id = c(1L, 1L, 1L, 2L, 2L, 3L), entry_id = c("a", 
"b", "c", "d", "e", "f"), comment_count = c(4L, 1L, 3L, 12L, 
50L, 0L), timestamp = c("2008-06-09 12:41:00", "2008-07-14 18:41:00", 
"2008-07-17 15:40:00", "2008-06-09 12:41:00", "2008-09-18 10:22:00", 
"2008-10-03 13:36:00")), .Names = c("member_id", "entry_id", 
"comment_count", "timestamp"), row.names = c("1", "2", "3", "4", 
"5", "6"), class = "data.frame")

Best answer

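You can prepend 0 as the first element and drop the last element with head(x, -1), so that within each member_id group the cumulative sum covers only the preceding rows: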

transform(df, previous_comments=ave(comment_count, member_id, 
          FUN = function(x) cumsum(c(0, head(x, -1)))))
#  member_id entry_id comment_count           timestamp previous_comments
#1         1        a             4 2008-06-09 12:41:00                 0
#2         1        b             1 2008-07-14 18:41:00                 4
#3         1        c             3 2008-07-17 15:40:00                 5
#4         2        d            12 2008-06-09 12:41:00                 0
#5         2        e            50 2008-09-18 10:22:00                12
#6         3        f             0 2008-10-03 13:36:00                 0
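The question also asks about lags greater than 1, which the answer above does not cover. One possible generalization, as a sketch only: prepend k zeros and drop the last k values before accumulating. The helper name `lagged_cumsum` and its parameter `k` are hypothetical and not part of the original answer. The sketch assumes rows are already ordered by timestamp within each member_id, and it caps k at the group size so that short groups still return a vector of the right length for ave().

```r
# Sample data reduced to the two columns the calculation uses
df <- data.frame(
  member_id     = c(1L, 1L, 1L, 2L, 2L, 3L),
  comment_count = c(4L, 1L, 3L, 12L, 50L, 0L)
)

# Hypothetical helper: cumulative sum of x that skips the k most recent values
lagged_cumsum <- function(x, k = 1) {
  n <- length(x)
  k <- min(k, n)                        # groups shorter than k yield all zeros
  cumsum(c(rep(0, k), head(x, n - k)))  # prepend k zeros, drop the last k values
}

# Lag of 2: each row sums the comments from entries at least two rows back
res <- transform(df, previous_comments = ave(comment_count, member_id,
                 FUN = function(x) lagged_cumsum(x, k = 2)))
res$previous_comments
# 0 0 4 0 0 0
```

With k = 1 the helper reduces to the cumsum(c(0, head(x, -1))) expression from the answer.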

Regarding r - cumulative sum with lag, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27649206/
