r - Count of categorical values grouped by another column in R with dplyr

I want to summarize df by location name. The data looks like this:

location <- c("NY", "NC", "KA", "TX", "AZ", "NC", "SC", "ND", "SD", "MN","WA","MA","VT","CA","OR","NJ","OH","MI","IL","GA","FL")
tree_type <- c("pine", "birch", "maple", "palm")
df <- data.frame(location = sample(location, 20, replace = TRUE), 
           tree_type = sample(tree_type, 20, replace = TRUE),
           density = runif(20, min = 24, max = 365), 
           income = runif(20, min = 37000, max = 62000))

What I want is this:

   location mean(density) mean(income) birch maple palm pine
1        AZ      38.44009     52032.95     0     0    1    0
2        CA     136.85112     42243.35     0     1    0    0
3        GA     101.24081     53405.60     2     0    0    0
4        IL     172.02651     46368.42     1     1    0    0
5        MA     198.69868     51117.18     0     0    0    1
6        MI     153.93358     60425.87     1     0    0    0
7        MN     185.05276     46468.68     0     0    1    0
8        NC     181.42187     46007.93     1     0    2    0
9        NJ     302.66541     59316.94     0     0    2    0
10       OR     303.88283     48497.03     0     0    0    2
11       SC      84.05136     50348.41     0     1    0    1
12       SD     158.47423     57894.27     0     0    1    0
13       VT     126.32967     42853.04     0     0    1    0

This is how I did it:

require(dplyr)
require(reshape2)
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income))
df_catvarslong <- as.data.frame(table(df[1:2]))
df_catvarswide <- dcast(df_catvarslong, location ~ tree_type, value.var = "Freq")
final_df <- left_join(df_quantvars, df_catvarswide, by = "location")

Is there no way to do this within dplyr, with some group_by idiom? At the risk of sounding silly, I tried this:
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income), table(df[1:2]))
What am I missing?
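
A likely reason the attempt above does not work is that table(df[1:2]) refers to the whole data frame rather than the current group, and it returns a full contingency table, while each summarise() expression is expected to produce a per-group summary value. As a minimal sketch of a dplyr-only alternative, assuming the four tree types are known in advance, each level can be counted with sum() over a logical comparison. The output column names (mean_density, mean_income) and the fixed seed are illustrative choices, not part of the original question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Same sample data as in the question
location <- c("NY", "NC", "KA", "TX", "AZ", "NC", "SC", "ND", "SD", "MN", "WA",
              "MA", "VT", "CA", "OR", "NJ", "OH", "MI", "IL", "GA", "FL")
tree_type <- c("pine", "birch", "maple", "palm")
df <- data.frame(location = sample(location, 20, replace = TRUE),
                 tree_type = sample(tree_type, 20, replace = TRUE),
                 density = runif(20, min = 24, max = 365),
                 income = runif(20, min = 37000, max = 62000))

# Inside summarise(), tree_type refers to the column of the current group,
# so sum(tree_type == "birch") counts the birch rows for that location.
result <- df %>%
  group_by(location) %>%
  summarise(mean_density = mean(density),
            mean_income = mean(income),
            birch = sum(tree_type == "birch"),
            maple = sum(tree_type == "maple"),
            palm = sum(tree_type == "palm"),
            pine = sum(tree_type == "pine"))

print(result)
```

This stays within a single group_by() and summarise() call, at the cost of hard-coding the levels; it does not scale well if the set of categories is large or unknown.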

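Best answer

This reply is a bit late, but I had already put some work into it. Doing it all in one go is a bit tricky. This seems to work:

First I use group_by(location, tree_type) to count all the trees, then I use group_by(location) to get the required means. Next I use select(-c(density, income)) to drop the original density and income columns, which leaves duplicated rows but the correct aggregated counts. Then I use distinct() to remove the duplicates, and finally spread() from the tidyr library to convert to wide format as you requested.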

library(dplyr)
library(tidyr)

df %>% 
  arrange(location)%>%
  group_by(location, tree_type)%>%
  mutate(Count = n())%>%
  group_by(location)%>%
  mutate(MeanDensity = mean(density), 
         MeanIncome = mean(income))%>%
  ungroup()%>%
  select(-c(density, income))%>%
  distinct()%>%
  spread(key = tree_type, value = Count, fill = 0)

This gives me:

  location MeanDensity MeanIncome birch maple  palm  pine
     (fctr)       (dbl)      (dbl) (dbl) (dbl) (dbl) (dbl)
1        AZ   244.18094   57474.94     0     0     1     0
2        FL    51.90693   42425.36     0     0     0     1
3        GA   341.18643   49385.44     0     0     0     2
4        IL   258.11124   37101.36     0     1     0     0
5        KA   267.92430   59699.20     1     0     0     0
6        MA    87.48623   60632.98     1     0     0     0
7        MI   197.18310   58837.00     0     0     0     1
8        NC   362.48531   50857.42     0     0     1     0
9        ND   315.57415   51465.06     0     0     1     0
10       NJ   233.72886   55877.40     0     0     1     1
11       NY   283.41522   49275.58     0     1     0     1
12       OH   350.23362   40901.73     0     0     1     0
13       OR   267.68415   38954.04     0     2     0     0
14       SC   260.12169   52837.10     0     1     0     0
15       SD    76.29782   54986.76     0     1     0     0
16       VT   341.80646   44547.77     1     0     0     0
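
For comparison, here is a sketch of the same result written with newer tidyr functions. It assumes tidyr 1.1.0 or later, where pivot_wider() accepts a scalar values_fill and a names_sort argument; spread() still works but is marked as superseded in current tidyr. The intermediate names (means, counts, final_df) and the fixed seed are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Same sample data as in the question
location <- c("NY", "NC", "KA", "TX", "AZ", "NC", "SC", "ND", "SD", "MN", "WA",
              "MA", "VT", "CA", "OR", "NJ", "OH", "MI", "IL", "GA", "FL")
tree_type <- c("pine", "birch", "maple", "palm")
df <- data.frame(location = sample(location, 20, replace = TRUE),
                 tree_type = sample(tree_type, 20, replace = TRUE),
                 density = runif(20, min = 24, max = 365),
                 income = runif(20, min = 37000, max = 62000))

# Per-location means of the numeric columns
means <- df %>%
  group_by(location) %>%
  summarise(MeanDensity = mean(density),
            MeanIncome = mean(income))

# Per-location counts of each tree type, widened to one column per type;
# combinations that never occur are filled with 0
counts <- df %>%
  count(location, tree_type) %>%
  pivot_wider(names_from = tree_type, values_from = n,
              values_fill = 0, names_sort = TRUE)

# Combine means and counts into one row per location
final_df <- left_join(means, counts, by = "location")

print(final_df)
```

Splitting the work into a means table and a counts table avoids the mutate() plus distinct() step, because each intermediate result already has one row per location (or per location and tree type).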

A similar question about counting categorical values grouped by another column in R with dplyr can be found on Stack Overflow: https://stackoverflow.com/questions/31956104/
