r - Count of categorical values grouped by another column with dplyr in R

Tags: r aggregate dplyr

I want to summarise df by location name. The data looks like this:

location <- c("NY", "NC", "KA", "TX", "AZ", "NC", "SC", "ND", "SD", "MN","WA","MA","VT","CA","OR","NJ","OH","MI","IL","GA","FL")
tree_type <- c("pine", "birch", "maple", "palm")
df <- data.frame(location = sample(location, 20, replace = TRUE), 
           tree_type = sample(tree_type, 20, replace = TRUE),
           density = runif(20, min = 24, max = 365), 
           income = runif(20, min = 37000, max = 62000))

What I want is this:
   location mean(density) mean(income) birch maple palm pine
1        AZ      38.44009     52032.95     0     0    1    0
2        CA     136.85112     42243.35     0     1    0    0
3        GA     101.24081     53405.60     2     0    0    0
4        IL     172.02651     46368.42     1     1    0    0
5        MA     198.69868     51117.18     0     0    0    1
6        MI     153.93358     60425.87     1     0    0    0
7        MN     185.05276     46468.68     0     0    1    0
8        NC     181.42187     46007.93     1     0    2    0
9        NJ     302.66541     59316.94     0     0    2    0
10       OR     303.88283     48497.03     0     0    0    2
11       SC      84.05136     50348.41     0     1    0    1
12       SD     158.47423     57894.27     0     0    1    0
13       VT     126.32967     42853.04     0     0    1    0

This is how I did it:
require(dplyr)
require(reshape2)
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income))
df_catvarslong <- as.data.frame(table(df[1:2]))
df_catvarswide <- dcast(df_catvarslong, location ~ tree_type, value.var = "Freq")
final_df <- left_join(df_quantvars, df_catvarswide, by = "location")

Is there no way to do this within dplyr, with some group_by idiom? At the risk of sounding silly, I tried this:
df_quantvars <- df %>% group_by(location) %>% summarise(mean(density), mean(income), table(df[1:2]))
What am I missing?

Best answer

This reply is a bit late, but I have put some work into it. Doing it all in one go is a bit tricky. This seems to work:

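First I use group_by(location, tree_type) to count the trees of each type, then I use group_by(location) to get the required means. Next I use select(-c(density, income)) to drop the original density and income columns, which leaves duplicated rows that carry the correct aggregated counts. Finally I remove the duplicates with distinct() and use spread() from the tidyr library to convert to the wide format you asked for.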

library(dplyr)
library(tidyr)

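df %>%
  arrange(location) %>%
  # count rows per location and tree type
  group_by(location, tree_type) %>%
  mutate(Count = n()) %>%
  # per-location means of the numeric columns
  group_by(location) %>%
  mutate(MeanDensity = mean(density),
         MeanIncome = mean(income)) %>%
  ungroup() %>%
  # drop the raw columns, then collapse the now-duplicated rows
  select(-c(density, income)) %>%
  distinct() %>%
  # one column per tree type; missing combinations become 0
  spread(key = tree_type, value = Count, fill = 0)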

This gives me:
  location MeanDensity MeanIncome birch maple  palm  pine
     (fctr)       (dbl)      (dbl) (dbl) (dbl) (dbl) (dbl)
1        AZ   244.18094   57474.94     0     0     1     0
2        FL    51.90693   42425.36     0     0     0     1
3        GA   341.18643   49385.44     0     0     0     2
4        IL   258.11124   37101.36     0     1     0     0
5        KA   267.92430   59699.20     1     0     0     0
6        MA    87.48623   60632.98     1     0     0     0
7        MI   197.18310   58837.00     0     0     0     1
8        NC   362.48531   50857.42     0     0     1     0
9        ND   315.57415   51465.06     0     0     1     0
10       NJ   233.72886   55877.40     0     0     1     1
11       NY   283.41522   49275.58     0     1     0     1
12       OH   350.23362   40901.73     0     0     1     0
13       OR   267.68415   38954.04     0     2     0     0
14       SC   260.12169   52837.10     0     1     0     0
15       SD    76.29782   54986.76     0     1     0     0
16       VT   341.80646   44547.77     1     0     0     0
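
The numbers differ from those in the question because sample() and runif() draw new random values on every run. As a possible alternative, here is a minimal sketch of the same result built from two small summaries that are joined at the end. It assumes dplyr 1.0 or later and tidyr 1.1 or later, uses pivot_wider() (the newer replacement for spread()), and the set.seed() call, the shortened location vector, and the names means, counts and result are my own additions for illustration, not part of the original post:

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
location  <- c("NY", "NC", "KA", "TX", "AZ")  # shortened for illustration
tree_type <- c("pine", "birch", "maple", "palm")
df <- data.frame(
  location  = sample(location, 20, replace = TRUE),
  tree_type = sample(tree_type, 20, replace = TRUE),
  density   = runif(20, min = 24, max = 365),
  income    = runif(20, min = 37000, max = 62000)
)

# Per-location means of the numeric columns
means <- df %>%
  group_by(location) %>%
  summarise(MeanDensity = mean(density), MeanIncome = mean(income))

# Per-location counts of each tree type, one column per type
counts <- df %>%
  count(location, tree_type) %>%
  pivot_wider(names_from = tree_type, values_from = n, values_fill = 0)

# One row per location: means followed by the tree-type counts
result <- left_join(means, counts, by = "location")
print(result)
```

This is the same idea as the question's own reshape2 approach, only with count() and pivot_wider() in place of table() and dcast(). The attempt in the question most likely fails because summarise() evaluates each expression per group, while table(df[1:2]) refers to the whole data frame and returns a full table instead of one value per group.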

Regarding "r - Count of categorical values grouped by another column with dplyr in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31956104/
