R data.table: How to sum variables by group based on a condition?

Tags: r dataframe data.table aggregate

Suppose I have the following R data.table (though I'm also happy to use base R with a data.frame):

library(data.table)

dt = data.table(Category=c("First","First","First","Second","Third", "Third", "Second"), Frequency=c(10,15,5,2,14,20,3), times = c(0, 0, 0, 3, 3, 1, 0))

> dt
   Category Frequency times
1:    First        10     0
2:    First        15     0
3:    First         5     0
4:   Second         2     3
5:    Third        14     3
6:    Third        20     1
7:   Second         3     0

If I wanted to sum Frequency by Category, I would use the following:

dt[, sum(Frequency), by = Category]

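But now suppose I want to sum Frequency by Category if and only if times is non-zero and not equal to NA.

How do I make this sum conditional on the values of a separate column?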

EDIT: Apologies for the obvious question. A quick addition: what if the elements of a column are strings?

For example:

> dt
   Category Frequency times
1:    First        ten    0
2:    First        ten    0
3:    First        five   0
4:   Second        five   3
5:    Third        five   3
6:    Third        five   1
7:   Second        ten    0

sum() will not calculate the frequency of ten versus five.
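One possible approach to this follow-up, as a minimal sketch under two different readings of the question. If the goal is to count how often each string occurs per category, `.N` gives the row count per group. If the strings instead stand for numbers, they can be mapped through a lookup vector first; `word_value`, `Value`, and `Total` below are hypothetical names introduced for illustration, not part of the original post.

```r
library(data.table)

# String-valued Frequency column, as in the example table above.
dt <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c("ten", "ten", "five", "five", "five", "five", "ten"),
  times     = c(0, 0, 0, 3, 3, 1, 0)
)

# Reading 1: count occurrences of each string within each Category,
# keeping only rows where times is non-zero and not NA.
counts <- dt[times != 0 & !is.na(times), .N, by = .(Category, Frequency)]
print(counts)

# Reading 2 (assumption): the strings are number words, so translate
# them with a hypothetical lookup vector and then sum as before.
word_value <- c(five = 5, ten = 10)
dt[, Value := unname(word_value[Frequency])]
totals <- dt[times != 0 & !is.na(times), .(Total = sum(Value)), by = Category]
print(totals)
```

A word missing from the lookup vector would map to NA and make the group sum NA, so the lookup needs an entry for every string that appears in the column.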

Best answer

Remember the logic of data.table: dt[i, j, by]. That is, take dt, subset its rows using i, then compute j grouped by by.

dt[times != 0 & !is.na(times), sum(Frequency), by = Category]
   Category V1
1:   Second  2
2:    Third 34
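Since the question also allows base R, here is a hedged sketch of the same conditional sum done both ways. The sample data in the question has no NA in times, so the sketch adds one hypothetical row with times = NA to exercise the is.na() check; that row and the column name `Total` are assumptions for illustration only.

```r
library(data.table)

# Data from the question plus one hypothetical row (Third, 100, NA)
# that should be excluded from the sum.
dt <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second", "Third"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3, 100),
  times     = c(0, 0, 0, 3, 3, 1, 0, NA)
)

# data.table: filter rows in i, aggregate in j, group with by.
# Naming the result inside .() avoids the default column name V1.
res_dt <- dt[times != 0 & !is.na(times), .(Total = sum(Frequency)), by = Category]
print(res_dt)

# Base R: subset() treats an NA condition as FALSE, so the NA row is
# dropped; aggregate() then sums Frequency within each Category.
df <- as.data.frame(dt)
res_base <- aggregate(Frequency ~ Category, data = subset(df, times != 0), FUN = sum)
print(res_base)
```

Both results contain only Second and Third, because every First row has times equal to 0.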

Source: a similar question on Stack Overflow, "R data.table: How to sum variables by group based on a condition?": https://stackoverflow.com/questions/45679883/
