I want to aggregate rows (and sum their values) in the following example data:
df <- data.frame(from=c("A" ,"A", "A", "C", "C", "D", "A"),
to=c("B", "B", "B", "A", "A", "B", "D"),
values=c(5,6,2,10,2,6,3),
product=c("x","x", "x", "y", "z", "w", "w"),
year=c(1990,1991,1991,1990,1990,1991,1992))
> df
from to values product year
1 A B 5 x 1990
2 A B 6 x 1991
3 A B 2 x 1991
4 C A 10 y 1990
5 C A 2 z 1990
6 D B 6 w 1991
7 A D 3 w 1992
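All rows that have identical entries in the from, to, product and year columns should be aggregated into a single row, and the numbers in the values column should be summed.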
I tried the following code:
aggregate(values~from+to+product+year, df, FUN=sum)
and
library(plyr)
ddply(df, c("from", "to", "product", "year"), numcolwise(sum))
Both work fine. However, both change the order of the rows (and, less importantly, of the columns); see below:
for aggregate:
from to product year values
1 A B x 1990 5
2 C A y 1990 10
3 C A z 1990 2
4 D B w 1991 6
5 A B x 1991 8
6 A D w 1992 3
and for ddply:
from to product year values
1 C A y 1990 10
2 C A z 1990 2
3 A B x 1990 5
4 A B x 1991 8
5 A D w 1992 3
6 D B w 1991 6
The expected result should look like this:
from to values product year
1 A B 5 x 1990
2 A B 8 x 1991
3 C A 10 y 1990
4 C A 2 z 1990
5 D B 6 w 1991
6 A D 3 w 1992
Any ideas on how to solve this ordering problem (at least for the rows)? Thanks.
Best answer
The data.table package keeps groups in their original order (the order in which each group first appears) by default:
library(data.table)
setDT(df)[, .(v = sum(values)), by=.(from,to,product,year)]
# from to product year v
# 1: A B x 1990 5
# 2: A B x 1991 8
# 3: C A y 1990 10
# 4: C A z 1990 2
# 5: D B w 1991 6
# 6: A D w 1992 3
Only keyby= (instead of by=) would sort the groups.
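To make the by= versus keyby= difference concrete, here is a minimal sketch that rebuilds the question's data as a data.table and runs both variants. The names dt, res_by and res_keyby are made up for this illustration, and the final setcolorder() call is only an optional extra for getting back the original column order, which the question mentions as a lesser concern.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
# (dt is a hypothetical name used only in this sketch)
dt <- data.table(
  from    = c("A", "A", "A", "C", "C", "D", "A"),
  to      = c("B", "B", "B", "A", "A", "B", "D"),
  values  = c(5, 6, 2, 10, 2, 6, 3),
  product = c("x", "x", "x", "y", "z", "w", "w"),
  year    = c(1990, 1991, 1991, 1990, 1990, 1991, 1992)
)

# by=: groups are returned in the order in which they first appear
res_by <- dt[, .(values = sum(values)), by = .(from, to, product, year)]

# keyby=: groups are sorted by the grouping columns and the result is keyed
res_keyby <- dt[, .(values = sum(values)), keyby = .(from, to, product, year)]

# Optional: put the columns back into the original order (modifies res_by by reference)
setcolorder(res_by, names(dt))

print(res_by)
print(res_keyby)
```

With by=, the rows of res_by follow the order wanted in the question; res_keyby holds the same sums but sorted by from, to, product and year.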
This question, "R: How to aggregate (and sum) rows in a df based on multiple column criteria while keeping the previous order?", comes from Stack Overflow: https://stackoverflow.com/questions/33046910/