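I'm trying to create new columns that are grouped by different columns, but I'm not sure my approach is the best way to use group_by. Is there a way to do the group_by inline?
I know this can be done with the data.table package, where the syntax has the form DT[i, j, by].
But since this is a small part of a larger codebase that uses the tidyverse and works fine as is, I'd rather not deviate from that.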
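For readers unfamiliar with it, the `by` argument in `DT[i, j, by]` is what makes the grouping inline there. A minimal sketch, assuming the data.table package is installed; the toy table and its values are made up for illustration:

```r
library(data.table)

# Toy data (hypothetical values, not the data from this question)
DT <- data.table(
  state     = c("OH", "OH", "IL", "IL"),
  customers = c(10, 20, 30, 40)
)

# `:=` adds a column by reference; `by` groups only this one expression
DT[, customerInState := sum(customers), by = state]

DT$customerInState
# 30 30 70 70
```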
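## Creating Sample Data Frame
library(tidyverse)
set.seed(1)  # arbitrary seed, added so the sample is reproducible
state <- rep(c("OH", "IL", "IN", "PA", "KY"), 10)
county <- sample(LETTERS[1:5], 50, replace = TRUE) %>% str_c(state, sep = "-")
# sample() draws from the given range; sample.int() expects a single number n
customers <- sample(50:100, 50)
sales <- sample(500:5000, 50)
df <- data.frame(state, county, customers, sales)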
## workflow
df2 <- df %>%
  group_by(state) %>%
  mutate(customerInState = sum(customers),
         saleInState = sum(sales)) %>%
  ungroup() %>%
  group_by(county) %>%
  mutate(customerInCounty = sum(customers),
         saleInCounty = sum(sales)) %>%
  ungroup() %>%
  mutate(salePerCountyPercent = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState) %>%
  group_by(state) %>%
  mutate(minSale = min(salePerCountyPercent)) %>%
  ungroup()
I'd like my code to look like this:
df3 <- df %>%
  mutate(customerInState = sum(customers, by = state),
         saleInState = sum(sales, by = state),
         customerInCounty = sum(customers, by = county),
         saleInCounty = sum(sales, by = county),
         salePerCountyPercent = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState,
         minSale = min(salePerCountyPercent, by = state))
It runs without error, but I know the output is not correct.
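A likely reason a call like this can run without grouping anything is that `sum()` has no `by` argument: its signature is `sum(..., na.rm = FALSE)`, so whatever is passed as `by =` falls into `...` and is treated as more values to add. A small base R illustration with made-up numbers:

```r
x <- c(1, 2, 3)

# `by` is not a parameter of sum(); the extra vector is simply added in
total <- sum(x, by = c(10, 20))
total
# 36, i.e. 1 + 2 + 3 + 10 + 20, with no grouping involved
```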
I know it's possible to juggle the mutates around so that fewer group_bys are needed to get what I want.
But the question is whether dplyr has a way to do the group by inline.
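One base R function that comes close to this inline style is `ave()`, which returns a vector the same length as its input, with each element replaced by its group's summary. This is a different technique from anything in the question, shown only as a sketch on made-up data; it needs no packages:

```r
# Toy data (hypothetical values)
toy <- data.frame(
  state     = c("OH", "OH", "IL", "IL"),
  county    = c("A-OH", "B-OH", "A-IL", "A-IL"),
  customers = c(10, 20, 30, 40)
)

# ave(x, grouping, FUN = f) applies f within each group and
# returns the result aligned with the original rows
toy$customerInState  <- ave(toy$customers, toy$state,  FUN = sum)
toy$customerInCounty <- ave(toy$customers, toy$county, FUN = sum)

toy
```

The same calls can also sit inside a single `mutate()`, for example `mutate(customerInState = ave(customers, state, FUN = sum))`.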
Best answer
You can create a wrapper to do what you want. This particular solution works if you have a single grouping variable. Good luck!
library(tidyverse)
# Group by a single column, apply mutate(), then drop the grouping again
mutate_by <- function(.data, group, ...) {
  group_by(.data, !!enquo(group)) %>%
    mutate(...) %>%
    ungroup()
}
df1 <- df %>%
  mutate_by(state,
            customerInState = sum(customers),
            saleInState = sum(sales)) %>%
  mutate_by(county,
            customerInCounty = sum(customers),
            saleInCounty = sum(sales)) %>%
  mutate(salePerCountyPercent = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState) %>%
  mutate_by(state,
            minSale = min(salePerCountyPercent))
identical(df2, df1)
[1] TRUE
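To lift the single-grouping-variable restriction, one option is to route the grouping columns through `across()`. This is a sketch rather than part of the original answer: `mutate_by2` is a hypothetical name, the data is made up, and it assumes dplyr 1.0.0 or later for `across()` and `{{ }}`.

```r
library(dplyr)

# Hypothetical variant of mutate_by that accepts one or several grouping columns
mutate_by2 <- function(.data, groups, ...) {
  .data %>%
    group_by(across({{ groups }})) %>%
    mutate(...) %>%
    ungroup()
}

# Toy data (made-up values)
toy <- data.frame(
  state = c("OH", "OH", "OH", "IL"),
  year  = c(2018, 2018, 2019, 2018),
  sales = c(1, 2, 3, 4)
)

# Group by state and year together for this one step
res <- toy %>%
  mutate_by2(c(state, year), saleInStateYear = sum(sales))

res$saleInStateYear
# 3 3 3 4
```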
Edit: or, more concise and closer to your code:
df %>%
  mutate_by(customerInState = sum(customers),
            saleInState = sum(sales), group = state) %>%
  mutate_by(customerInCounty = sum(customers),
            saleInCounty = sum(sales), group = county) %>%
  mutate(salePerCountyPercent = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState) %>%
  mutate_by(minSale = min(salePerCountyPercent), group = state)
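Newer dplyr releases offer something very close to the inline grouping asked about: since dplyr 1.1.0, `mutate()` accepts a `.by` argument that groups for that single call and returns ungrouped data. The answer above does not use it; the sketch below runs on made-up data and assumes dplyr >= 1.1.0.

```r
library(dplyr)

# Toy data (hypothetical values)
toy <- data.frame(
  state     = c("OH", "OH", "IL", "IL"),
  county    = c("A-OH", "B-OH", "A-IL", "A-IL"),
  customers = c(10, 20, 30, 40),
  sales     = c(100, 300, 200, 200)
)

res <- toy %>%
  mutate(customerInState = sum(customers),
         saleInState = sum(sales), .by = state) %>%
  mutate(customerInCounty = sum(customers),
         saleInCounty = sum(sales), .by = county) %>%
  mutate(salePerCountyPercent = saleInCounty / saleInState) %>%
  mutate(minSale = min(salePerCountyPercent), .by = state)

res$minSale
# 0.25 0.25 1.00 1.00
```

Grouping still applies per call, so columns grouped by different variables need separate `mutate()` calls, but no `group_by()`/`ungroup()` pairs or wrapper function are required.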