r - Count occurrences across multiple columns and group by year

Tags: r dataframe dplyr

I have a movie dataset with one year column and three genre columns.

Here is an example:

genre_structure<-structure(
  list(
    year = c(
      "2008",
      "2003",
      "2010",
      "2001",
      "2002",
      "1999",
      "1980",
      "2020",
      "1977",
      "1991",
      "1954",
      "2022",
      "1962",
      "2000",
      "1994",
      "2019",
      "2019",
      "1981",
      "2012",
      "2003"
    ),
    genre1 = c(
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action"
    ),
    genre2 = c(
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Adventure",
      "SciFi",
      "Adventure",
      "Drama",
      "Adventure",
      "SciFi",
      "Drama",
      "Drama",
      "Drama",
      "Adventure",
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Drama",
      "Drama"
    ),
    genre3 = c(
      "Drama",
      "Drama",
      "SciFi",
      "Drama",
      "Drama",
      "",
      "Fantasy",
      "",
      "Fantasy",
      "",
      "",
      "Mystery",
      "Mystery",
      "Drama",
      "Drama",
      "Crime",
      "Drama",
      "",
      "",
      "Mystery"
    )
  ),
  row.names = c(NA,-20L),
  class =  "data.frame"
  )

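I am trying to count the occurrences of every genre across all three genre columns for each year. The expected result looks like this (illustrative values):

genre  | year | count
Action | 2008 | 1
Comedy | 2008 | 3
Drama  | 2008 | 4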
...

I tried:

library(dplyr)
genre_years_test <- genre_structure %>% 
  group_by(genre1, genre2, genre3, year) %>% 
  summarise(total = n(), .groups = "drop")

But the year is repeated in the output whenever a different genre combination was released in that year.
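To make the cause of the repetition concrete, here is a minimal sketch using only the two 2019 rows from the sample data (the object names `two_films` and `by_combination` are made up for illustration). Grouping by all three genre columns counts each distinct combination of `genre1`, `genre2`, `genre3` and `year`, not each individual genre, so a year gets one output row per combination.

```r
library(dplyr)

# The two 2019 films from the sample data (hypothetical object name)
two_films <- data.frame(
  year   = c("2019", "2019"),
  genre1 = c("Action", "Action"),
  genre2 = c("Adventure", "Adventure"),
  genre3 = c("Crime", "Drama")
)

# One group per distinct (genre1, genre2, genre3, year) combination
by_combination <- two_films %>%
  group_by(genre1, genre2, genre3, year) %>%
  summarise(total = n(), .groups = "drop")

# 2019 shows up once per combination, and "Action" is never counted
# on its own even though it appears in both films
by_combination
```

Because the genres live in three separate columns, no grouping on those columns as they are can produce one row per genre and year; the values have to end up in a single column first.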

Best answer

We could reshape the data to 'long' format and then get the counts:

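library(dplyr)
library(tidyr)
genre_structure %>% 
  # stack genre1, genre2 and genre3 into a single 'genre' column
  pivot_longer(cols = -year, values_to = 'genre') %>%
  # count the rows for each year/genre pair
  count(year, genre, name = 'count')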
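One detail worth checking: in the sample data `genre3` contains empty strings, and after reshaping these would be counted as a genre named `""`. The sketch below is a variant that drops them and puts `genre` first to match the column order in the question. It uses three rows taken from the sample data; the object names `movies` and `genre_counts` are made up for illustration, and dropping blanks is an assumption about what the asker wants.

```r
library(dplyr)
library(tidyr)

# Three rows from the sample data (hypothetical object name)
movies <- data.frame(
  year   = c("2019", "2019", "1999"),
  genre1 = c("Action", "Action", "Action"),
  genre2 = c("Adventure", "Adventure", "SciFi"),
  genre3 = c("Crime", "Drama", "")
)

genre_counts <- movies %>%
  # one row per film and genre slot, with the value in 'genre'
  pivot_longer(cols = -year, values_to = "genre") %>%
  # assumption: an empty string means "no third genre", so drop it
  filter(genre != "") %>%
  # count rows per genre/year pair
  count(genre, year, name = "count")

genre_counts
```

If the blanks were `NA` instead of empty strings, `pivot_longer()` has a `values_drop_na = TRUE` argument that would remove them during the reshape.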

Regarding "r - Count occurrences across multiple columns and group by year", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75366193/
