r - Sum rows based on a specific combination of factors

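This may be a silly question, but I have read through Crawley's chapter on data frames and searched the internet, and I still have not been able to get anything to work.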

Here is a sample dataset similar to mine:

> data<-data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25))
> data
  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      1    45
2    A buttercup         1          1      2    67
3    A buttercup         2          2      1    32
4    A      rose         1          1      4    43
5    B buttercup         1          1      3    13
6    B      rose         1          2      2    25

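What I would like to do is sum "seeds" and "fruits" whenever rows share the same combination of site, plant, treatment, and plant_numb. Ideally this would reduce the number of rows while keeping the original columns (i.e. I need the example above to look like this:)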

  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      3   112
2    A buttercup         2          2      1    32
3    A      rose         1          1      4    43
4    B buttercup         1          1      3    13
5    B      rose         1          2      2    25

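This example is very basic (my dataset has ~5000 rows), and although here only two rows need to be summed, the number of rows to sum per combination varies, ranging from 1 to ~45.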

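So far I have tried rowsum() and tapply() with pretty dismal results (the errors tell me these functions are not meaningful for factors), so if you could point me in the right direction, I would greatly appreciate it!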

Thank you very much!

Best answer

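Hopefully the code below is self-explanatory. It uses the base function aggregate. Essentially, the formula says: for each unique combination of site, plant, treatment, and plant_numb, compute the sum of fruits and the sum of seeds.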

# Load your data
data <- data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25)) 

# Summarize your data
aggregate(cbind(fruits, seeds) ~ 
      site + plant + treatment + plant_numb, 
      sum, 
      data = data)
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    B buttercup         1          1      3    13
#3    A      rose         1          1      4    43
#4    B      rose         1          2      2    25
#5    A buttercup         2          2      1    32

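The order of the rows changes (aggregate sorts the result by the grouping variables, with the last one in the formula, plant_numb, varying slowest), but hopefully that is not too much of a problem.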
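
If the row order does matter, the aggregated result can be re-sorted afterwards. The following is a minimal sketch using base R's order(); the name totals is only an illustrative choice and does not appear in the original answer.

```r
# Rebuild the example data from the question
data <- data.frame(
  site = c("A", "A", "A", "A", "B", "B"),
  plant = c("buttercup", "buttercup", "buttercup", "rose", "buttercup", "rose"),
  treatment = c(1, 1, 2, 1, 1, 1),
  plant_numb = c(1, 1, 2, 1, 1, 2),
  fruits = c(1, 2, 1, 4, 3, 2),
  seeds = c(45, 67, 32, 43, 13, 25)
)

# Sum fruits and seeds within each combination of the grouping columns
totals <- aggregate(cbind(fruits, seeds) ~ site + plant + treatment + plant_numb,
                    data = data, FUN = sum)

# Sort by site, then plant, then treatment, then plant_numb
totals <- totals[order(totals$site, totals$plant,
                       totals$treatment, totals$plant_numb), ]

# Reset the row names so they run 1..n again after sorting
rownames(totals) <- NULL
totals
```

Sorted this way, the rows should appear in the same order as the desired output shown in the question.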

Another approach is to use ddply from the plyr package.

library(plyr)
ddply(data, .(site, plant, treatment, plant_numb), 
      summarize, 
      fruits = sum(fruits), 
      seeds = sum(seeds))
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    A buttercup         2          2      1    32
#3    A      rose         1          1      4    43
#4    B buttercup         1          1      3    13
#5    B      rose         1          2      2    25
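
As a variation that needs no extra package, aggregate also accepts a dot on the left-hand side of the formula, meaning "every column not named on the right". This is a sketch under the assumption that all remaining columns are numeric; the names explicit and all_cols are illustrative only. If the real dataset contains other factor or character columns, sum would fail on them (much like the error mentioned in the question), so those would have to be dropped or added to the grouping first.

```r
# Rebuild the example data from the question
data <- data.frame(
  site = c("A", "A", "A", "A", "B", "B"),
  plant = c("buttercup", "buttercup", "buttercup", "rose", "buttercup", "rose"),
  treatment = c(1, 1, 2, 1, 1, 1),
  plant_numb = c(1, 1, 2, 1, 1, 2),
  fruits = c(1, 2, 1, 4, 3, 2),
  seeds = c(45, 67, 32, 43, 13, 25)
)

# Explicit version: name the columns to sum with cbind()
explicit <- aggregate(cbind(fruits, seeds) ~ site + plant + treatment + plant_numb,
                      data = data, FUN = sum)

# Dot version: sum every column that is not a grouping variable
all_cols <- aggregate(. ~ site + plant + treatment + plant_numb,
                      data = data, FUN = sum)

# Both calls should give the same totals for this data
all.equal(explicit, all_cols)
```

The dot form saves typing when there are many numeric columns to total, at the cost of being less explicit about which columns are summed.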

Regarding "r - Sum rows based on a specific combination of factors", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10424608/
