I have a data.table that looks like this:
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
> a
color count include
[1,] Red 1 1
[2,] Blue 2 1
[3,] Red 6 1
[4,] Green 4 1
[5,] Red 2 0
[6,] Blue 1 0
[7,] Blue 1 1
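I want to create a new data.table that has only the unique color values and, for each one, the sum of the count column over the rows where include is 1, like this: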
colour total
[1,] Red 7
[2,] Blue 3
[3,] Green 4
I tried the following, an approach that has worked for me in the past:
> a[,include == 1,list(total=sum(count)),by=colour]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count)), :
Provide either 'by' or 'keyby' but not both
That is what I get when a has no key, and I get the same error message when it is keyed by colour. With the key set to colour, I also tried:
> a[,include == 1,list(quantity=sum(count))]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count))) :
Each item in the 'by' or 'keyby' list must be same length as rows in x (7): 1
I have not been able to find any other good solution. Any help is much appreciated.
Best answer
This should work:
library(data.table)
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
a[include == 1, list(total=sum(count)), keyby = color]
color total
1: Blue 3
2: Green 4
3: Red 7
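The difference from the attempts in the question comes down to argument positions in `[.data.table`, whose formals are x, i, j, by, keyby, in that order. In the first attempt i is left empty, so include == 1 lands in j and the unnamed list(...) is matched to keyby alongside the named by, hence the "either 'by' or 'keyby' but not both" error. In the second attempt the list(...) lands in by, where a length-1 item is rejected. A minimal annotated sketch of the working call, using the same table (the name res is only for illustration):

```r
library(data.table)

a <- data.table(color   = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count   = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, 1, 1, 0, 0, 1))

res <- a[include == 1,               # i: keep only rows where include is 1
         list(total = sum(count)),   # j: what to compute within each group
         keyby = color]              # keyby: group by color and sort the result by it

print(res)
```

Using by = color instead of keyby = color would return the groups in order of first appearance (Red, Blue, Green) rather than sorted.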
Edit from Matthew:
Or, if include only ever takes the values 0 and 1:
a[, list(total=sum(count*include)), keyby = color]
Or, if include can contain other values:
a[, list(total=sum(count*(include==1))), keyby = color]
NA values may also need to be taken into account.
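As a sketch of what the NA caveat can look like in practice, assume a hypothetical variant of the table (called b here, not part of the original question) in which one include value is missing. The product count*(include==1) is then NA for that row, and sum() propagates it unless na.rm = TRUE is passed:

```r
library(data.table)

# Hypothetical variant of the example table: the third include value is NA
b <- data.table(color   = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
                count   = c(1, 2, 6, 4, 2, 1, 1),
                include = c(1, 1, NA, 1, 0, 0, 1))

# Without na.rm, the NA product makes the whole Red total NA
with_na <- b[, list(total = sum(count * (include == 1))), keyby = color]

# With na.rm = TRUE, the row whose include is NA is simply skipped
no_na <- b[, list(total = sum(count * (include == 1), na.rm = TRUE)), keyby = color]

print(with_na)
print(no_na)
```

Whether a row with an unknown include value should be skipped or should invalidate the total is a decision about the data, so na.rm = TRUE is only one possible choice.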
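These may be more efficient because they avoid a vector scan in i, but that depends heavily on the size and properties of the data. They only need working memory as large as the largest group, whereas include==1 in i has to allocate at least one vector as long as nrow(a).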
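Since the outcome depends on the data, the simplest check is to time both forms on a table shaped like your own. The following is only a rough sketch on made-up data (the table big, its size n, and the value ranges are all assumptions for illustration). It makes no claim about which form wins:

```r
library(data.table)

set.seed(1)
n <- 1e6  # assumed size; adjust to resemble the real data

# Made-up table with the same three columns as the example
big <- data.table(color   = sample(c("Red", "Blue", "Green"), n, replace = TRUE),
                  count   = sample(1:10, n, replace = TRUE),
                  include = sample(0:1, n, replace = TRUE))

# Form 1: filter in i, then aggregate
t_filter <- system.time(
  r1 <- big[include == 1, list(total = sum(count)), keyby = color]
)

# Form 2: no filter in i, multiply by the 0/1 flag inside j
t_mult <- system.time(
  r2 <- big[, list(total = sum(count * include)), keyby = color]
)

print(t_filter)
print(t_mult)
```

One behavioral difference is worth keeping in mind when comparing the two: filtering in i drops any color that has no rows with include equal to 1, while the multiplication forms keep that color with a total of 0.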
On the topic of r - summing all values that meet a condition, for every possible condition, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11935728/