r - Group by inline in dplyr to mutate columns

Tags: r dplyr

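I am trying to create new columns that are aggregated over different grouping columns, but I am not sure that the way I am doing it is the best use of group_by. Is there a way to apply group_by inline, inside the mutate call itself?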

I know this can be done with the data.table package, where the syntax has the form
DT[i, j, by].
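
For reference, a minimal sketch of that data.table form. The toy table below is made up for illustration and is not the data from this question; it only shows how `by` is supplied inline with the computation in `j`.

```r
library(data.table)

# Hypothetical toy table (illustration only)
DT <- data.table(
  state     = c("OH", "OH", "IL", "IL", "IL"),
  customers = c(10L, 20L, 5L, 5L, 10L)
)

# i is left empty, j adds a column by reference with :=,
# and by = state makes sum() run once per state
DT[, customerInState := sum(customers), by = state]
```

Each row keeps its place and receives the total for its own state, which is the behaviour the group_by/mutate/ungroup pattern reproduces in dplyr.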

But this is a small part of a larger codebase that uses the tidyverse and works fine as it is, so I would rather not deviate from that.

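## Creating Sample Data Frame
library(tidyverse)  # provides %>%, str_c() and bind_cols()

state <- rep(c("OH", "IL", "IN", "PA", "KY"), 10)
county <- sample(LETTERS[1:5], 50, replace = TRUE) %>% str_c(state, sep = "-")
# sample() draws from the given range; sample.int() expects a single upper bound
customers <- sample(50:100, 50)
sales <- sample(500:5000, 50)

df <- bind_cols(data.frame(state, county, customers, sales))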

## workflow

df2 <- df %>%
  group_by(state) %>% 
  mutate(customerInState = sum(customers),
         saleInState = sum(sales)) %>% 
  ungroup %>% 
  group_by(county) %>% 
  mutate(customerInCounty = sum(customers),
         saleInCounty = sum(sales)) %>% 
  ungroup %>% 
  mutate(salePerCountyPercent  = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState) %>% 
  group_by(state) %>% 
  mutate(minSale = min(salePerCountyPercent)) %>%
  ungroup


I would like my code to look something like this:
df3 <- df %>%
  mutate(customerInState = sum(customers, by = state),
         saleInState = sum(sales, by = state),
         customerInCounty = sum(customers, by = county),
         saleInCounty = sum(sales, by = county),
         salePerCountyPercent  = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState,
         minSale = min(salePerCountyPercent, by = state))

It runs without errors, but I know the output is incorrect.

I know I could juggle the mutate calls around to get what I need with fewer group_by calls.
But the question is whether an inline group by is possible in dplyr.

Best answer

You can create a wrapper that does what you want. This particular solution works if you have a single grouping variable. Good luck!

library(tidyverse)

mutate_by <- function(.data, group, ...) {

  group_by(.data, !!enquo(group)) %>%
    mutate(...) %>%
    ungroup

}

df1 <- df %>%
  mutate_by(state, 
            customerInState = sum(customers),
            saleInState = sum(sales)) %>%
  mutate_by(county,
            customerInCounty = sum(customers),
            saleInCounty = sum(sales)) %>%
  mutate(salePerCountyPercent  = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState) %>% 
  mutate_by(state,
            minSale = min(salePerCountyPercent))

identical(df2, df1)
[1] TRUE
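
If more than one grouping variable is needed, the wrapper can be generalized. The following is only a sketch: `mutate_by_cols` and the toy data are hypothetical names introduced here, and it assumes dplyr >= 1.0.0 so that `across()` can be used inside `group_by()` (newer dplyr versions also offer `pick()` for the same purpose).

```r
library(dplyr)

# Hypothetical variant of mutate_by: `groups` may be a single column
# or several columns written as c(col1, col2)
mutate_by_cols <- function(.data, groups, ...) {
  .data %>%
    group_by(across({{ groups }})) %>%
    mutate(...) %>%
    ungroup()
}

# Made-up data for illustration
toy <- data.frame(
  state  = c("OH", "OH", "OH", "IL"),
  county = c("A", "A", "B", "A"),
  sales  = c(1, 2, 3, 4)
)

# Group by one column, then by the combination of two columns
res_state  <- toy %>% mutate_by_cols(state, saleInState = sum(sales))
res_county <- toy %>% mutate_by_cols(c(state, county), saleInCounty = sum(sales))
```
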

Edit: or, more concisely and closer to your code:
df %>%
  mutate_by(customerInState = sum(customers),
            saleInState = sum(sales), group = state) %>%
  mutate_by(customerInCounty = sum(customers),
            saleInCounty = sum(sales), group = county) %>%
  mutate(salePerCountyPercent  = saleInCounty/saleInState,
         customerPerCountyPercent = customerInCounty/customerInState) %>% 
  mutate_by(minSale = min(salePerCountyPercent), group = state)
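
For readers on a newer dplyr, a wrapper may not be necessary. Assuming dplyr >= 1.1.0, `mutate()` accepts a `.by` argument that groups for that one call only and returns an ungrouped result. The sketch below uses a small made-up data frame rather than the `df` from the question.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where mutate() gained .by

# Made-up data for illustration
toy <- data.frame(
  state     = c("OH", "OH", "IL"),
  county    = c("A-OH", "B-OH", "A-IL"),
  customers = c(10, 30, 5),
  sales     = c(100, 300, 50)
)

res <- toy %>%
  mutate(customerInState = sum(customers),
         saleInState = sum(sales), .by = state) %>%
  mutate(customerInCounty = sum(customers),
         saleInCounty = sum(sales), .by = county) %>%
  mutate(salePerCountyPercent = saleInCounty / saleInState,
         customerPerCountyPercent = customerInCounty / customerInState) %>%
  # min() is taken within each state because of .by = state
  mutate(minSale = min(salePerCountyPercent), .by = state)
```

Note that `.by` applies to the whole `mutate()` call, so expressions that need different groupings still go in separate calls, as with mutate_by.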

Regarding "r - Group by inline in dplyr to mutate columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57059875/
