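r - Group by two columns and summarise multiple columns

Tags: r dataframe dplyr tidyverse

I have a data frame that I want to group by the "State" and "Date" columns, then sum the values of the other columns within each group, like this.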

df

State  Female  Male  Date
------------------------------
Texas   2      2     01/01/04
Texas   3      1     01/01/04
Texas   5      4     02/01/04
Cali    1      1     05/06/05
Cali    2      1     05/06/05
Cali    3      1     10/06/05
Cali    1      2     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05

Expected result

df

State  Female  Male  Date
------------------------------
Texas   5      3     01/01/04
Texas   5      4     02/01/04
Cali    3      2     05/06/05
Cali    4      3     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05

I tried using group_by and summarise, but I don't know how to do the same thing for 2 columns.

My attempt

df <- df_homicides %>% 
        group_by(state) %>% 
        summarise(Female = sum(Female))

Thanks for your help!

Best answer

We can use summarise with across from dplyr version >= 1.0.0

library(dplyr)
df %>%
   group_by(State, Date) %>%
   summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')
# A tibble: 6 x 4
#  State Date       Female  Male
#  <chr> <chr>       <int> <int>
#1 Cali  05/06/2005      3     2
#2 Cali  10/06/2005      4     3
#3 NY    11/06/2005     10     5
#4 NY    12/06/2005     11     6
#5 Texas 01/01/2004      5     3
#6 Texas 02/01/2004      5     4
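
For comparison with the attempt in the question, the across() call can also be spelled out as one sum per column. This is a minimal sketch, assuming the data frame from the "Data" section (rebuilt here with data.frame() so the block runs on its own); `res` is just an illustrative name.

```r
library(dplyr)

# Same values as the data frame in the "Data" section
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

# Group by both keys, then write one summary expression per column
res <- df %>%
  group_by(State, Date) %>%
  summarise(Female = sum(Female, na.rm = TRUE),
            Male   = sum(Male, na.rm = TRUE),
            .groups = "drop")   # return an ungrouped tibble
```

Writing the columns out by hand is fine for two columns; across() saves the repetition when there are many.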

Or use aggregate from base R

aggregate(.~ State + Date, df, sum, na.rm = TRUE)
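
The `.` on the left-hand side of the formula sums every column that is not a grouping variable. If the real data has other columns that should not be summed, a possible variant names the columns with cbind(). This is a sketch under that assumption, using the same values as the "Data" section; `res` is an illustrative name.

```r
# Same values as the data frame in the "Data" section
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

# cbind() on the left-hand side names the columns to sum explicitly,
# so only Female and Male are aggregated
res <- aggregate(cbind(Female, Male) ~ State + Date, data = df, FUN = sum)

# Reorder by State, then Date; Date is a character column here,
# so this ordering is alphabetical, not chronological
res <- res[order(res$State, res$Date), ]
```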

Data

df <-  structure(list(State = c("Texas", "Texas", "Texas", "Cali", "Cali", 
"Cali", "Cali", "NY", "NY"), Female = c(2L, 3L, 5L, 1L, 2L, 3L, 
1L, 10L, 11L), Male = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L), 
    Date = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005", 
    "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"
    )), class = "data.frame", row.names = c(NA, -9L))

For "r - Group by two columns and summarise multiple columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63837164/
