r - Count by multiple grouping variables using dplyr

Tags: r group-by count dplyr

I have a dataset with several categorical variables:

library(dplyr)

# tibble() replaces the deprecated data_frame()
data <- tibble(
    HomeTeam = c("Team1", "Team2", "Team3", "Team4", "Team2", "Team2", "Team4",
                 "Team3", "Team2", "Team1", "Team3", "Team2"),
    AwayTeam = c("Team2", "Team1", "Team4", "Team3", "Team1", "Team4", "Team1",
                 "Team2", "Team3", "Team3", "Team4", "Team1"),
    HomeScore = c(10, 5, 12, 18, 17, 19, 23, 17, 34, 19, 8, 3),
    AwayScore = c(4, 16, 9, 19, 16, 4, 8, 21, 6, 5, 9, 17),
    Venue = c("Ground1", "Ground2", "Ground3", "Ground3", "Ground1", "Ground2",
              "Ground1", "Ground3", "Ground2", "Ground3", "Ground4", "Ground2"))

I basically want to summarise "HomeTeam" and "AwayTeam" by count into a new table, like this:

  HomeTeam NumberOfGamesHome NumberOfGamesAway
  <chr>                <int>             <int>
1 Team1                    2                 4
2 Team2                    5                 2
3 Team3                    3                 3
4 Team4                    2                 3

My current approach needs two separate group-by summaries, followed by a join of the resulting tables:

HomeTeamCount <- data %>%
    group_by(HomeTeam) %>%
    summarise(NumberOfGamesHome = n())

AwayTeamCount <- data %>%
    group_by(AwayTeam) %>%
    summarise(NumberOfGamesAway = n())

Desired <- left_join(HomeTeamCount, AwayTeamCount,
                     by = c("HomeTeam" = "AwayTeam"))

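My actual dataset has a large number of categorical variables, and following the approach above seems laborious and inefficient.

Is there a way to group_by multiple categorical variables with dplyr to produce the desired output? Or perhaps with data.table?

I have looked at several other questions but have not been able to find an answer.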

Best answer

Here is one possibility: use gather to reshape the data from wide to long, group by team, and sum up the number of home and away games.

library(tidyverse)

# Assumes data contains only the HomeTeam and AwayTeam columns
data %>%
    gather(key, Team) %>%
    group_by(Team) %>%
    summarise(
        NumberOfGamesHome = sum(key == "HomeTeam"),
        NumberOfGamesAway = sum(key == "AwayTeam"))
## A tibble: 4 x 3
#  Team  NumberOfGamesHome NumberOfGamesAway
#  <chr>             <int>             <int>
#1 Team1                 2                 4
#2 Team2                 5                 2
#3 Team3                 3                 3
#4 Team4                 2                 3
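
To make the reshaping step more concrete, here is a small sketch of the long table that gather produces before any grouping happens. The object names `games` and `long` are made up for illustration, and the data is only the first four games from the question, reduced to the two team columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-column subset of the question's data (first four games)
games <- tibble(
  HomeTeam = c("Team1", "Team2", "Team3", "Team4"),
  AwayTeam = c("Team2", "Team1", "Team4", "Team3")
)

# gather() stacks both columns into key/Team pairs:
# 4 games x 2 columns gives 8 rows
long <- games %>% gather(key, Team)
long

# Summing a logical vector counts its TRUE values, so this is the
# number of rows for Team1 that came from the HomeTeam column
team1_home <- sum(long$key == "HomeTeam" & long$Team == "Team1")
team1_home
```

Each game contributes one row per team column, which is why `sum(key == "HomeTeam")` inside `summarise()` yields the home-game count once the rows are grouped by `Team`.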

Update

To reflect your updated sample data, which has additional columns, you can name the columns to gather explicitly:

data %>%
    gather(key, Team, HomeTeam, AwayTeam) %>%
    group_by(Team) %>%
    summarise(
        NumberOfGamesHome = sum(key == "HomeTeam"),
        NumberOfGamesAway = sum(key == "AwayTeam"))
## A tibble: 4 x 3
#  Team  NumberOfGamesHome NumberOfGamesAway
#  <chr>             <int>             <int>
#1 Team1                 2                 4
#2 Team2                 5                 2
#3 Team3                 3                 3
#4 Team4                 2                 3
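
Since tidyr 1.0.0, gather is superseded by pivot_longer, so the same idea can also be sketched with the newer verbs. This is an alternative, not the method from the answer above. It assumes tidyr 1.1.0 or later (for the scalar `values_fill`), and the names `games` and `team_counts` are made up for illustration. Instead of writing one `sum()` per column, it counts every team/column pair and spreads the counts back out, which may scale better when there are many categorical columns.

```r
library(dplyr)
library(tidyr)

# Team and venue columns from the question's sample data
games <- tibble(
  HomeTeam = c("Team1", "Team2", "Team3", "Team4", "Team2", "Team2", "Team4",
               "Team3", "Team2", "Team1", "Team3", "Team2"),
  AwayTeam = c("Team2", "Team1", "Team4", "Team3", "Team1", "Team4", "Team1",
               "Team2", "Team3", "Team3", "Team4", "Team1"),
  Venue = c("Ground1", "Ground2", "Ground3", "Ground3", "Ground1", "Ground2",
            "Ground1", "Ground3", "Ground2", "Ground3", "Ground4", "Ground2")
)

team_counts <- games %>%
  # Stack only the selected columns; Venue is left out of the reshape
  pivot_longer(c(HomeTeam, AwayTeam), names_to = "key", values_to = "Team") %>%
  # One row per (Team, key) pair with its number of occurrences in column n
  count(Team, key) %>%
  # One count column per original column; teams missing from a column get 0
  pivot_wider(names_from = key, values_from = n, values_fill = 0L)

team_counts
```

The result has one count column per reshaped column, named after the original column (`HomeTeam`, `AwayTeam`). Adding more columns to the `pivot_longer()` selection should extend the table without further changes to the pipeline.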

Regarding "r - Count by multiple grouping variables using dplyr", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53455485/
