r - dplyr::count() on multiple columns

Tags: r dplyr

I have the following dataset:

dat = structure(list(C86_1981 = c("Outer London", "Buckinghamshire", 
NA, "Ross and Cromarty", "Cornwall and Isles of Scilly", NA, 
"Kirkcaldy", "Devon", "Kent", "Renfrew"), C96_1981 = c("Outer London", 
"Buckinghamshire", NA, "Ross and Cromarty", "Not known/missing", 
NA, "Kirkcaldy", NA, NA, NA), C00_1981 = c("Outer London", "Inner London", 
"Lancashire", "Ross and Cromarty", NA, "Humberside", "Kirkcaldy", 
NA, NA, NA), C04_1981 = c("Kent", NA, NA, "Ross and Cromarty", 
NA, "Humberside", "Not known/missing", NA, NA, "Renfrew"), C08_1981 = c("Kent", 
"Oxfordshire", NA, "Ross and Cromarty", "Cornwall and Isles of Scilly", 
"Humberside", "Dunfermline", NA, NA, "Renfrew"), C12_1981 = c("Kent", 
NA, NA, "Ross and Cromarty", "Cornwall and Isles of Scilly", 
"Humberside", "Dunfermline", NA, NA, "Renfrew")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("C86_1981", 
"C96_1981", "C00_1981", "C04_1981", "C08_1981", "C12_1981"))

I want to dplyr::count() each column. Expected output:
# A tibble: 10 x 3
                       C86_1981 dat86_n dat96_n ...
                          <chr>   <int>   <int>
 1              Buckinghamshire       1       1
 2 Cornwall and Isles of Scilly       1      NA
 3                        Devon       1      NA
 4                         Kent       1      NA
 5                    Kirkcaldy       1       1
 6                 Outer London       1       1
 7                      Renfrew       1      NA
 8            Ross and Cromarty       1       1
 9                         <NA>       2       5
10            Not known/missing      NA       1

At the moment I do this manually and then dplyr::full_join() the results:
library("tidyverse")

dat86_n = dat %>%
  count(C86_1981) %>%
  rename(dat86_n = n)
dat96_n = dat %>%
  count(C96_1981) %>%
  rename(dat96_n = n)
# ...

dat_counts = dat86_n %>%
  full_join(dat96_n, by = c("C86_1981" = "C96_1981"))
  # ...

This works, but it is not very robust if any of my data changes later. I had hoped to do this programmatically.

I tried a loop:
lapply(dat, count)
# Error in UseMethod("groups") : 
# no applicable method for 'groups' applied to an object of class "character"

(purrr::map() gives the same error.) I think this error occurs because count() expects a tbl and a variable as separate arguments, so I also tried:
lapply(dat, function(x) {
  count(dat, x)
})
# Error in grouped_df_impl(data, unname(vars), drop) : 
# Column `x` is unknown

Again, purrr::map() gives the same error. I also tried a variant with summarise_all():
dat %>% 
  summarise_all(count)
  # Error in summarise_impl(.data, dots) : 
  # Evaluation error: no applicable method for 'groups' applied to an object of class "character".

I feel like I am missing something obvious and that the solution should be simple. dplyr solutions are especially welcome, since that is what I use most.

Best answer

If you also use the tidyr package, the following code does the trick:

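dat %>%
  tidyr::gather(name, city) %>%          # reshape to long: one row per (column name, value)
  dplyr::group_by(name, city) %>%
  dplyr::count() %>%                     # count each value within each original column
  dplyr::ungroup() %>%
  tidyr::spread(name, n)                 # back to wide: one count column per original column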

Result:
# A tibble: 15 x 7
                           city C00_1981 C04_1981 C08_1981 C12_1981 C86_1981 C96_1981
 *                        <chr>    <int>    <int>    <int>    <int>    <int>    <int>
 1              Buckinghamshire       NA       NA       NA       NA        1        1
 2 Cornwall and Isles of Scilly       NA       NA        1        1        1       NA
 3                        Devon       NA       NA       NA       NA        1       NA
 4                  Dunfermline       NA       NA        1        1       NA       NA
 5                   Humberside        1        1        1        1       NA       NA
 6                 Inner London        1       NA       NA       NA       NA       NA
 7                         Kent       NA        1        1        1        1       NA
 8                    Kirkcaldy        1       NA       NA       NA        1        1
 9                   Lancashire        1       NA       NA       NA       NA       NA
10            Not known/missing       NA        1       NA       NA       NA        1
11                 Outer London        1       NA       NA       NA        1        1
12                  Oxfordshire       NA       NA        1       NA       NA       NA
13                      Renfrew       NA        1        1        1        1       NA
14            Ross and Cromarty        1        1        1        1        1        1
15                         <NA>        4        5        3        4        2        5
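
In recent tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape-count-reshape idea can be sketched with those functions as below. This is only a sketch, assuming tidyr 1.0.0 or later; `dat_small` is a hypothetical stand-in with made-up values in the same shape as `dat`, and `dat_counts` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `dat`: character columns containing some NAs
dat_small <- tibble(
  C86_1981 = c("Kent", "Devon", NA, "Kent"),
  C96_1981 = c("Kent", NA, NA, "Renfrew")
)

dat_counts <- dat_small %>%
  # Long format: one row per (original column name, value) pair
  pivot_longer(everything(), names_to = "name", values_to = "city") %>%
  # count() groups by both variables, so no explicit group_by()/ungroup() is needed
  count(name, city) %>%
  # Wide format again: one count column per original column
  pivot_wider(names_from = name, values_from = n)

dat_counts
```

Values that never occur in a given column come out as NA, as in the result above. If zeros are preferred, pivot_wider() has a `values_fill` argument that can supply them.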

Regarding r - dplyr::count() on multiple columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46339538/
