r - data.table or dplyr: data manipulation

Tags: r data.table plyr data-manipulation dplyr

I have the following data:

Date           Col1       Col2
2014-01-01     123        12
2014-01-01     123        21
2014-01-01     124        32
2014-01-01     125        32
2014-01-02     123        34
2014-01-02     126        24
2014-01-02     127        23
2014-01-03     521        21
2014-01-03     123        13
2014-01-03     126        15

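Now, for each date, I want to count the unique values in Col1 that have not already appeared on an earlier date, and then add that number to the previous count. For example: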

Date           Count
2014-01-01       3 i.e. 123,124,125
2014-01-02       5 (2 + above 3) i.e. 126, 127
2014-01-03       6 (1 + above 5) i.e. 521 only
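The rule in the example above can be checked with a minimal base R sketch. It is not taken from the answer below; it only rebuilds the sample data under the assumed name df (the name the answer's code uses) and counts first appearances by hand. The names first_seen, new_per_date and result are illustrative.

```r
# Rebuild the sample data from the question; df is an assumed name.
df <- data.frame(
  Date = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-01",
                   "2014-01-02", "2014-01-02", "2014-01-02",
                   "2014-01-03", "2014-01-03", "2014-01-03")),
  Col1 = c(123, 123, 124, 125, 123, 126, 127, 521, 123, 126),
  Col2 = c(12, 21, 32, 32, 34, 24, 23, 21, 13, 15)
)

# Sort by date so that duplicated() flags every repeat after the first sighting.
df <- df[order(df$Date), ]
first_seen <- df[!duplicated(df$Col1), ]

# Count the newly seen values per date, then accumulate the counts.
new_per_date <- table(first_seen$Date)
result <- data.frame(
  Date  = as.Date(names(new_per_date)),
  Count = cumsum(as.vector(new_per_date))
)
print(result)
```

One caveat of this sketch: a date on which no new Col1 value appears has no row in first_seen, so it would be missing from result rather than repeating the previous count.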

Best answer

library(dplyr)
df %>%
  arrange(Date) %>%
  filter(!duplicated(Col1)) %>%   # keep only the first appearance of each Col1 value
  group_by(Date) %>%
  summarise(Count = n()) %>%      # n() <=> length(Date)
  mutate(Count = cumsum(Count))   # running total across dates
# Source: local data frame [3 x 2]
# 
#         Date Count
# 1 2014-01-01     3
# 2 2014-01-02     5
# 3 2014-01-03     6

library(data.table)
dt <- data.table(df, key="Date")
dt <- unique(dt, by="Col1")
(dt <- dt[, list(Count=.N), by=Date][, Count:=cumsum(Count)])
#          Date Count
# 1: 2014-01-01     3
# 2: 2014-01-02     5
# 3: 2014-01-03     6

Or:

dt <- data.table(df, key="Date")
dt <- unique(dt, by="Col1")
dt[, .N, by=Date][, Count:=cumsum(N)]

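For convenience in chained operations like this, .N is automatically named N (without the dot) in the result, so you can use both .N and N together in the next operation if needed.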
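A small sketch of that naming behaviour, using a made-up table (toy, Share and Rows are hypothetical names chosen for illustration, not part of the original answer):

```r
library(data.table)

# Tiny illustrative table; the values are made up for this sketch.
toy <- data.table(Date = c("a", "a", "b"), Col1 = c(1, 2, 3))

# Grouped row counts: the count column is named N, without the dot.
counts <- toy[, .N, by = Date]
print(names(counts))

# In the chained step, N refers to the column created above, while .N is
# the number of rows of the intermediate table itself.
counts[, Share := N / sum(N)][, Rows := .N]
print(counts)
```

Here N holds the per-date counts from the first step, and .N in the last step evaluates to the row count of counts, so both can appear in the same chain without clashing.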

A similar question on this topic (r - data.table or dplyr: data manipulation) can be found on Stack Overflow: https://stackoverflow.com/questions/21416918/
