r - What is the base R equivalent of this dplyr group_by code?

The R4DS book has the following code block:

library(tidyverse)
by_age2 <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

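Is there a simple base R equivalent of this code? The filter can be replaced with gss_cat[!is.na(gss_cat$age),], but after that I ran into trouble. This is clearly a job for by, tapply, or aggregate, but I have not been able to find the right approach. by(gss_2, with(gss_2, list(age, marital)), length) is a step in the right direction, but the output is awful.

Best answer

We can use proportions on a table, after using subset to drop the rows where age is NA (complete.cases) and to select the two columns.

The data comes from the forcats package, so load the package and get the data: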

library(forcats)
data(gss_cat)

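Use table/proportions as described above. The margin argument 1 computes the proportions within each row, i.e. within each age: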

by_age2_base <- proportions(table(subset(gss_cat, complete.cases(age), 
       select = c(age, marital))), 1)
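
To make the effect of the margin argument concrete, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not gss_cat). With margin = 1, each cell is divided by its row total, which corresponds to group_by(age) followed by n / sum(n). On R versions before 4.0.1, where proportions is not available, prop.table is the same function under its older name.

```r
# Toy data standing in for gss_cat (hypothetical values)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = factor(c("Never married", "Never married", "Married",
                     "Never married", "Divorced", "Married"))
)

# Drop rows with missing age, keep the two columns, cross-tabulate
tab <- table(subset(toy, !is.na(age), select = c(age, marital)))

# margin = 1: each cell is divided by its row (age) total
prop <- proportions(tab, 1)

# Each age row should sum to 1
rowSums(prop)
```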

-output

head(by_age2_base, 3)
    marital
age    No answer Never married   Separated    Divorced     Widowed     Married
  18 0.000000000   0.978021978 0.000000000 0.000000000 0.000000000 0.021978022
  19 0.000000000   0.939759036 0.000000000 0.012048193 0.004016064 0.044176707
  20 0.000000000   0.904382470 0.003984064 0.007968127 0.000000000 0.083665339

-compare with the OP's output

head(by_age2, 3)
# A tibble: 3 x 4
# Groups:   age [2]
    age marital           n   prop
  <int> <fct>         <int>  <dbl>
1    18 Never married    89 0.978 
2    18 Married           2 0.0220
3    19 Never married   234 0.940

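If we need the output in "long" format, convert the table to a data.frame with as.data.frame. The proportions end up in the Freq column, and Freq > 0 drops the age/marital combinations that never occur: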

by_age2_base_long <- subset(as.data.frame(by_age2_base), Freq > 0)
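
The long table above holds the proportion in Freq but has no count column. As a sketch of one way to get both n and prop, mirroring the columns of the dplyr result, the count table can be converted to long form first and the proportion added with ave. The toy data frame is hypothetical sample data, not gss_cat.

```r
# Toy data standing in for gss_cat (hypothetical values)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19),
  marital = c("Never married", "Never married", "Married",
              "Never married", "Divorced")
)

# Long-format counts; responseName names the count column "n" instead of "Freq"
counts <- as.data.frame(table(toy), responseName = "n")

# Drop age/marital combinations that never occur, as count() does
counts <- subset(counts, n > 0)

# Within-age proportion: ave() applies FUN per group and returns a vector
# aligned with the original rows
counts$prop <- ave(counts$n, counts$age, FUN = function(x) x / sum(x))

# Sort by age for easier comparison with the dplyr output
counts[order(counts$age), ]
```

Note that as.data.frame on a table returns age as a factor; as.integer(as.character(counts$age)) would turn it back into numbers if that matters.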

Or another option is aggregate/ave (the |> pipe and the \(x) lambda syntax require R 4.1.0 or later)

subset(gss_cat, complete.cases(age), select = c(age, marital)) |> 
    {\(dat) aggregate(cbind(n = age) ~ age + marital, 
      data = dat, FUN = length)}() |> 
   transform(prop = ave(n, age, FUN = \(x) x/sum(x)))
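
For R versions before 4.1.0, the same aggregate/ave steps can be written with intermediate variables instead of the pipe. This is a minimal sketch on hypothetical toy data (toy, dat and counts are illustrative names), not a drop-in replacement tested against gss_cat.

```r
# Toy data standing in for gss_cat (hypothetical values)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = c("Never married", "Never married", "Married",
              "Never married", "Divorced", "Married")
)

# Step 1: drop rows with missing age and keep the two columns
dat <- toy[!is.na(toy$age), c("age", "marital")]

# Step 2: count rows per age/marital pair; cbind(n = age) only serves
# to name the result column "n", and length() counts the rows
counts <- aggregate(cbind(n = age) ~ age + marital, data = dat, FUN = length)

# Step 3: divide each count by the total for its age
counts$prop <- ave(counts$n, counts$age, FUN = function(x) x / sum(x))

counts
```

Unlike the table-based approach, aggregate only returns combinations that actually occur, so no filtering on zero counts is needed.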

The original question, "r - What is the base R equivalent of this dplyr group_by code?", can be found on Stack Overflow: https://stackoverflow.com/questions/67926231/
