r - dplyr: workflow of subsetting, summarizing, and mutating new functions

Tags: r function dplyr summarize

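I'm trying to work out the most efficient way to group data, summarize a column, and then mutate a new column based on that summary.

With the example data below, I want to:

  1. Mutate a new column "sum" that is the sum of "count", group_by(site, trmt, id, species).
  2. Calculate the relative abundance of each species, group_by(id).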
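To make the second goal concrete, here is a small worked example in base R. It assumes "relative abundance" means a species' summed count as a percentage of the total count for its id, and it uses the two species recorded for id 5 in the data below (the names `sums` and `relabund_id5` are only illustrative).

```r
# Summed counts for id 5 in the example data: species d = 2, species p = 30
sums <- c(d = 2, p = 30)

# Relative abundance: each species' share of the id total, as a percentage
relabund_id5 <- sums / sum(sums) * 100
relabund_id5
#     d     p
#  6.25 93.75
```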

This post almost gets me there, but I don't want to summarise(across()) multiple columns: dplyr: group_by, sum various columns, and apply a function based on grouped row sums?

How would you use pipes in dplyr to get from "df_have" to "df_want"?

Thanks!

site <- c("X", "Y", "Y", "X", "X", "X", "Y", "X", "Y", "X", "Y", "Y", "X", "X", "X", "Y", "X", "Y")
trmt <- c("yes", "yes", "no", "no", "yes", "no", "no", "yes", "yes", "yes", "yes", "no", "no", "yes", "no", "no", "yes", "yes")
id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9)
species <- c("a", "b", "a", "c", "d", "a", "e", "b", "d", "a", "b", "m", "c", "p", "a", "q", "r", "d")
count <- c(28, 17, 7, 8, 2, 9, 1, 5, 3, 12, 4, 18, 3, 30, 12, 21, 18, 6)
extra <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")


df_have <- cbind(site, trmt, id, species, count, extra) 
df_have <- as.data.frame(df_have)
df_have


site1 <- c("X", "Y", "Y", "Y", "X", "X", "X", "X", "Y", "Y", "X", "X", "Y")
trmt1 <- c("yes", "yes", "no", "no", "no", "yes", "yes", "no", "no", "no", "yes", "yes", "yes")
id1 <- c(1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8, 9)
species1 <- c("a", "b", "a", "m", "c", "d", "p", "a", "e", "q", "b", "r", "d")
sum <- c(40, 21, 7, 18, 11, 2, 30, 21, 1, 21, 5, 18, 9)
relabund <- c(100, 100, 28, 72, 100, 6.25, 93.75, 100, 4.55, 95.45, 21.74, 78.26, 100)

df_want <- cbind(site1, trmt1, id1, species1, sum, relabund) 
df_want <- as.data.frame(df_want)
df_want

Best answer

Here is a dplyr option:

library(dplyr)
df_have %>%
    group_by(site, trmt, id, species) %>%
    summarise(sum = sum(as.integer(count)), .groups = "drop") %>%
    group_by(id) %>%
    mutate(relabund = sum / sum(sum) * 100) %>%
    ungroup() %>%
    arrange(id, species)
# A tibble: 13 x 6
#   site  trmt  id    species   sum relabund
#   <chr> <chr> <chr> <chr>   <int>    <dbl>
# 1 X     yes   1     a          40   100   
# 2 Y     yes   2     b          21   100   
# 3 Y     no    3     a           7    28   
# 4 Y     no    3     m          18    72   
# 5 X     no    4     c          11   100   
# 6 X     yes   5     d           2     6.25
# 7 X     yes   5     p          30    93.8 
# 8 X     no    6     a          21   100   
# 9 Y     no    7     e           1     4.55
#10 Y     no    7     q          21    95.5 
#11 X     yes   8     b           5    21.7 
#12 X     yes   8     r          18    78.3 
#13 Y     yes   9     d           9   100   

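The final arrange() call is only there to match your expected output; you can omit it if the order doesn't matter. Also note that the values in the count column are character (cbind() on vectors of mixed types produces a character matrix), so we need to convert them to integer first; this should probably be fixed upstream.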
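One possible way to fix this upstream is to build the data frame with data.frame() instead of cbind(), so that count stays numeric and no as.integer() is needed. The sketch below is only an illustration under that assumption: `df_num` and `result` are hypothetical names, and the data is a four-row subset of the question's example (ids 1 and 5), not the full data set.

```r
library(dplyr)

# Subset of the question's data (ids 1 and 5), built with data.frame()
# so each column keeps its own type and count stays numeric
df_num <- data.frame(
  site    = c("X", "X", "X", "X"),
  trmt    = c("yes", "yes", "yes", "yes"),
  id      = c(1, 5, 1, 5),
  species = c("a", "d", "a", "p"),
  count   = c(28, 2, 12, 30)
)

# Same pipeline as above, without the as.integer() conversion
result <- df_num %>%
  group_by(site, trmt, id, species) %>%
  summarise(sum = sum(count), .groups = "drop") %>%
  group_by(id) %>%
  mutate(relabund = sum / sum(sum) * 100) %>%
  ungroup()

result
```

If the data frame already exists with character columns, converting the column once before the pipeline, for example with mutate(count = as.numeric(count)), should have the same effect.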

For "r - dplyr: workflow of subsetting, summarizing, and mutating new functions", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73315935/
