r - Summing over all combinations of variables using dplyr in R

Tags: r dplyr tidyr

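I'm fairly new to R programming, so I apologize if this question is too basic. I have transactions showing the revenue earned from six different types of products, covering a three-year period. My goal is to find, for each year, the total product sales for every distinct combination of products, i.e. 2^6 - 1 = 64 - 1 = 63 combinations. That means I would have 63*3 = 189 combinations in total.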
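The count above can be checked with a small base R sketch. The product names are taken from the six-variable sample further down, and the variable names `products`, `patterns`, `n_patterns` and `n_cells` are only illustrative:

```r
products <- c("Car", "Tire", "Services", "Insurance", "Accessories", "Finance")

# Every on/off pattern across the six products: 2^6 = 64 rows
patterns <- expand.grid(rep(list(c(FALSE, TRUE)), length(products)))
names(patterns) <- products

# Drop the all-FALSE row (an order containing no product at all)
patterns <- patterns[rowSums(patterns) > 0, ]

n_patterns <- nrow(patterns)  # non-empty product combinations
n_cells <- n_patterns * 3     # combinations across the three fiscal years
```

In practice only the combinations that actually occur in the data will produce a row, so the real output is usually much smaller than this upper bound.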

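For simplicity, I created test data with only three variables, because I wrote the program for a single year using a while loop, which is ugly. My aim is to show what I'm trying to achieve. I have nevertheless also posted a random sample from the original file below.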

Here is the test data with only three variables, Car, Tire and Services, along with a while loop to show what I'm looking for:

    dput(Sample_File)
structure(list(Order.ID = c(171, 173, 132, 174, 132, 174, 132, 
174, 174), Fiscal.Year = c(2017, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2018), Car = c(2, 2, 3, 1, 0, 0, 0, 0, 1), Tire = c(0, 
0, 0, 1, 0, 1, 0, 1, 1), Services = c(3, 1, 4, 0, 4, 1, 4, 0, 
0)), .Names = c("Order.ID", "Fiscal.Year", "Car", "Tire", "Services"
), row.names = c(NA, 9L), class = "data.frame")

Here is my code:

    i<-1
    Csum <- matrix(rep(0,21),nrow = 7,ncol = 3) 
    # Row 1 is used when C is ON; T is ON ; S is ON
    # Row 2 is used when C is ON; T is ON ; S is OFF
    # Row 3 is used when C is ON; T is OFF ; S is ON
    # Row 4 is used when C is OFF; T is ON ; S is ON
    # Row 5 is used when C is ON; T is OFF ; S is OFF
    # Row 6 is used when C is OFF; T is ON ; S is OFF
    # Row 7 is used when C is OFF; T is OFF ; S is ON

    while (i <= length(Sample_File$Order.ID))
    {
      if (Sample_File$Fiscal.Year[i]!=2016)
        {
        i<-i+1
        next
      }
      if (Sample_File$Car[i]!=0 & Sample_File$Tire[i]!=0 & Sample_File$Services[i]!=0)#1 
      {
        Csum[1,1] <- Csum[1,1] + Sample_File$Car[i]
        Csum[1,2] <- Csum[1,2] + Sample_File$Tire[i]
        Csum[1,3] <- Csum[1,3] + Sample_File$Services[i]

      }
      else if (Sample_File$Car[i]!=0 & Sample_File$Tire[i]!=0 & Sample_File$Services[i]==0) #2
      {
        Csum[2,1] <- Csum[2,1] + Sample_File$Car[i]
        Csum[2,2] <- Csum[2,2] + Sample_File$Tire[i]
        Csum[2,3] <- Csum[2,3] + 0
      }
      else if(Sample_File$Car[i]!=0 & Sample_File$Tire[i]==0 & Sample_File$Services[i]!=0) #3
        {

        Csum[3,1] <- Csum[3,1] + Sample_File$Car[i]
        Csum[3,2] <- Csum[3,2] + 0
        Csum[3,3] <- Csum[3,3] + Sample_File$Services[i]
      }
      else if(Sample_File$Car[i]==0 & Sample_File$Tire[i]!=0 & Sample_File$Services[i]!=0) #4
      {
        Csum[4,1] <- Csum[4,1] + 0
        Csum[4,2] <- Csum[4,2] + Sample_File$Tire[i]
        Csum[4,3] <- Csum[4,3] + Sample_File$Services[i]
      }
      else if(Sample_File$Car[i]!=0 & Sample_File$Tire[i]==0 & Sample_File$Services[i]==0) #5
      {
        Csum[5,1] <- Csum[5,1] + Sample_File$Car[i]
        Csum[5,2] <- Csum[5,2] + 0
        Csum[5,3] <- Csum[5,3] + 0
      }
      else if(Sample_File$Car[i]==0 & Sample_File$Tire[i]!=0 & Sample_File$Services[i]==0)#6 
      {
        Csum[6,1] <- Csum[6,1] + 0
        Csum[6,2] <- Csum[6,2] + Sample_File$Tire[i]
        Csum[6,3] <- Csum[6,3] + 0
      }
      else #7
        {
          Csum[7,1] <- Csum[7,1] + 0
          Csum[7,2] <- Csum[7,2] + 0
          Csum[7,3] <- Csum[7,3] + Sample_File$Services[i]
        }
      i<-i+1
    }  

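The code I wrote handles only one year, because replicating it for three years would be very painful. I'm looking for a solution that creates a list of 3 data frames, one for each of the three years.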

Here is a random sample of size 10 from the original file, containing all six variables.

dput(Sample_File_Random)
structure(list(Order.ID = c(171, 173, 132, 174, 169, 175, 163, 
186, 178, 121), Fiscal.Year = c(2016, 2016, 2017, 2016, 2015, 
2016, 2015, 2015, 2015, 2017), Car = c(2, 0, 3, 0, 0, 0, 0, 5346.25, 
0, 0), Tire = c(0, 0, 0, 8691.55800460666, 3198, 5, 2, 0, 2, 
3282.18), Services = c(3, 0, 4, 0, 0, 0, 0, 0, 0, 0), Insurance = c(4, 
0, 0, 4, 0, 4, 0, 0, 0, 0), Accessories = c(94.3, 3749.8, 9308.65, 
0, 2, 0, 1, 633.75, 51.44, 0), Finance = c(0, 0, 0, 4, 0, 14800, 
0, 0, 0, 0)), .Names = c("Order.ID", "Fiscal.Year", "Car", "Tire", 
"Services", "Insurance", "Accessories", "Finance"), row.names = c(NA, 
10L), class = "data.frame")

I'm really stuck, so I'd sincerely appreciate any help with vectorizing this.


At @Ronak Shah's request: here is the expected output for Sample_File_Random

Output_File
  Fiscal.Year     Car     Tire Services Insurance Accessories Finance
1        2015    0.00 3202.000        0         0       54.44       0
2        2015 5346.25    0.000        0         0      633.75       0
3        2016    2.00    0.000        3         4       94.30       0
4        2016    0.00    0.000        0         0     3749.80       0
5        2016    0.00 8696.558        0         8        0.00   14804
6        2017    3.00    0.000        4         0     9308.65       0
7        2017    0.00 3282.180        0         0        0.00       0

Best answer

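Here is a compact and expressive dplyr solution, which works in three steps:

  1. Create indicators for whether each product is in the basket
  2. Group by year and by the combination of indicators
  3. Sum the product values within each group

Here is the code that does this: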

df_foo %>% 
  # 1. create the combinations of whether each of the 
  #   products is in the basket or not
  mutate_each(
    funs(In_Basket = . > 0), Car:Services
  ) %>% 
  # 2. group by the year and the basket service indicators
  group_by_(.dots = c("Fiscal.Year", grep("_In_Basket", names(.), value = TRUE))) %>% 
  # 3. sum the service values
  summarise_each(
    funs(sum(., na.rm = TRUE)), Car:Services
  )
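The verbs used above (`mutate_each`, `funs`, `group_by_`, `summarise_each`) are deprecated in current dplyr releases. Below is a hedged sketch of the same three steps written with `across()`, assuming dplyr 1.0.0 or later and assuming that `df_foo` is the three-variable `Sample_File` from the question (the posted output is consistent with that data). The name `result` is only illustrative.

```r
library(dplyr)

# Test data from the question, rebuilt as a plain data frame
Sample_File <- data.frame(
  Order.ID    = c(171, 173, 132, 174, 132, 174, 132, 174, 174),
  Fiscal.Year = c(2017, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2018),
  Car         = c(2, 2, 3, 1, 0, 0, 0, 0, 1),
  Tire        = c(0, 0, 0, 1, 0, 1, 0, 1, 1),
  Services    = c(3, 1, 4, 0, 4, 1, 4, 0, 0)
)

result <- Sample_File %>%
  # 1. one logical indicator column per product, e.g. Car_In_Basket
  mutate(across(Car:Services, ~ .x > 0, .names = "{.col}_In_Basket")) %>%
  # 2. group by the year and by every indicator column
  group_by(Fiscal.Year, across(ends_with("_In_Basket"))) %>%
  # 3. sum the product values within each group
  summarise(across(Car:Services, ~ sum(.x, na.rm = TRUE)), .groups = "drop")
```

Extending this to the six-product data should only require widening the column range, for example `Car:Finance`, provided the product columns are adjacent.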

This gives the output:

Source: local data frame [7 x 7]
Groups: Fiscal.Year, Car_In_Basket, Tire_In_Basket [?]

  Fiscal.Year Car_In_Basket Tire_In_Basket Services_In_Basket   Car  Tire Services
        <dbl>         <lgl>          <lgl>              <lgl> <dbl> <dbl>    <dbl>
1        2016         FALSE          FALSE               TRUE     0     0        8
2        2016         FALSE           TRUE              FALSE     0     1        0
3        2016         FALSE           TRUE               TRUE     0     1        1
4        2016          TRUE          FALSE               TRUE     5     0        5
5        2016          TRUE           TRUE              FALSE     1     1        0
6        2017          TRUE          FALSE               TRUE     2     0        3
7        2018          TRUE           TRUE              FALSE     1     1        0
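The question also asks for a list with one data frame per year, built from the six-variable data. This is not part of the answer above; it is a minimal base R sketch of the same indicator-then-group idea, where the `Basket` key and the names `products`, `sums` and `by_year` are assumptions made for illustration.

```r
# Six-variable sample from the question, rebuilt as a plain data frame
Sample_File_Random <- data.frame(
  Order.ID    = c(171, 173, 132, 174, 169, 175, 163, 186, 178, 121),
  Fiscal.Year = c(2016, 2016, 2017, 2016, 2015, 2016, 2015, 2015, 2015, 2017),
  Car         = c(2, 0, 3, 0, 0, 0, 0, 5346.25, 0, 0),
  Tire        = c(0, 0, 0, 8691.55800460666, 3198, 5, 2, 0, 2, 3282.18),
  Services    = c(3, 0, 4, 0, 0, 0, 0, 0, 0, 0),
  Insurance   = c(4, 0, 0, 4, 0, 4, 0, 0, 0, 0),
  Accessories = c(94.3, 3749.8, 9308.65, 0, 2, 0, 1, 633.75, 51.44, 0),
  Finance     = c(0, 0, 0, 4, 0, 14800, 0, 0, 0, 0)
)

products <- c("Car", "Tire", "Services", "Insurance", "Accessories", "Finance")

# One key per row encoding which products are present, e.g. "010010"
basket <- apply(Sample_File_Random[products] > 0, 1,
                function(flags) paste(as.integer(flags), collapse = ""))

# Sum every product column within each year / basket-pattern group
sums <- aggregate(Sample_File_Random[products],
                  by = list(Fiscal.Year = Sample_File_Random$Fiscal.Year,
                            Basket = basket),
                  FUN = sum)

# One data frame per fiscal year, named by year
by_year <- split(sums, sums$Fiscal.Year)
```

Here `by_year[["2016"]]` holds the rows for 2016, and so on. The row order within a year follows the `Basket` key, so it may differ from the order shown in the expected output.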

For "r - Summing over all combinations of variables using dplyr in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40457655/
