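I'm fairly new to R programming, so I apologize if this question is too basic. My transactions show the revenue earned from six different types of products over a period of three years. My goal is to find, for each year, the sum of the products sold for every combination of the different products, i.e. 2^6 - 1 = 64 - 1 = 63 combinations. That means I would end up with 63*3 = 189 combinations.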
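To make the count concrete, here is a small base-R sketch that enumerates every non-empty on/off pattern of the six products. It assumes the six product column names from the larger random sample posted further down; `products` and `patterns` are illustrative names only.

```r
# Assumed product column names, taken from Sample_File_Random
products <- c("Car", "Tire", "Services", "Insurance", "Accessories", "Finance")

# Every on/off pattern of the six products: 2^6 = 64 rows of TRUE/FALSE
patterns <- expand.grid(rep(list(c(FALSE, TRUE)), length(products)))
names(patterns) <- products

# Drop the all-FALSE row (nothing in the basket): 63 combinations remain
patterns <- patterns[rowSums(patterns) > 0, ]

nrow(patterns)      # 63
nrow(patterns) * 3  # 189 year-by-combination cells for three fiscal years
```

With only three products the same count gives 2^3 - 1 = 7, which is why the `Csum` matrix in the code below has 7 rows.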
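For simplicity, I created test data with only three variables, because I wrote the program for a single year using a while loop, which is awful. My goal here is just to show what I'm trying to achieve. I have also posted a random sample from the original file further below.
Here is the test data with only the three variables Car, Tire and Services, together with the while loop, to show what I'm looking for: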
dput(Sample_File)
structure(list(Order.ID = c(171, 173, 132, 174, 132, 174, 132,
174, 174), Fiscal.Year = c(2017, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2018), Car = c(2, 2, 3, 1, 0, 0, 0, 0, 1), Tire = c(0,
0, 0, 1, 0, 1, 0, 1, 1), Services = c(3, 1, 4, 0, 4, 1, 4, 0,
0)), .Names = c("Order.ID", "Fiscal.Year", "Car", "Tire", "Services"
), row.names = c(NA, 9L), class = "data.frame")
Here is my code:
i <- 1
Csum <- matrix(rep(0, 21), nrow = 7, ncol = 3)
# Row 1 is used when C is ON; T is ON ; S is ON
# Row 2 is used when C is ON; T is ON ; S is OFF
# Row 3 is used when C is ON; T is OFF ; S is ON
# Row 4 is used when C is OFF; T is ON ; S is ON
# Row 5 is used when C is ON; T is OFF ; S is OFF
# Row 6 is used when C is OFF; T is ON ; S is OFF
# Row 7 is used when C is OFF; T is OFF ; S is ON
while (i <= length(Sample_File$Order.ID))
{
  # Only fiscal year 2016 is handled; every other year is skipped
  if (Sample_File$Fiscal.Year[i] != 2016)
  {
    i <- i + 1
    next
  }
  if (Sample_File$Car[i] != 0 & Sample_File$Tire[i] != 0 & Sample_File$Services[i] != 0) #1
  {
    Csum[1,1] <- Csum[1,1] + Sample_File$Car[i]
    Csum[1,2] <- Csum[1,2] + Sample_File$Tire[i]
    Csum[1,3] <- Csum[1,3] + Sample_File$Services[i]
  }
  else if (Sample_File$Car[i] != 0 & Sample_File$Tire[i] != 0 & Sample_File$Services[i] == 0) #2
  {
    Csum[2,1] <- Csum[2,1] + Sample_File$Car[i]
    Csum[2,2] <- Csum[2,2] + Sample_File$Tire[i]
    Csum[2,3] <- Csum[2,3] + 0
  }
  else if (Sample_File$Car[i] != 0 & Sample_File$Tire[i] == 0 & Sample_File$Services[i] != 0) #3
  {
    Csum[3,1] <- Csum[3,1] + Sample_File$Car[i]
    Csum[3,2] <- Csum[3,2] + 0
    Csum[3,3] <- Csum[3,3] + Sample_File$Services[i]
  }
  else if (Sample_File$Car[i] == 0 & Sample_File$Tire[i] != 0 & Sample_File$Services[i] != 0) #4
  {
    Csum[4,1] <- Csum[4,1] + 0
    Csum[4,2] <- Csum[4,2] + Sample_File$Tire[i]
    Csum[4,3] <- Csum[4,3] + Sample_File$Services[i]
  }
  else if (Sample_File$Car[i] != 0 & Sample_File$Tire[i] == 0 & Sample_File$Services[i] == 0) #5
  {
    Csum[5,1] <- Csum[5,1] + Sample_File$Car[i]
    Csum[5,2] <- Csum[5,2] + 0
    Csum[5,3] <- Csum[5,3] + 0
  }
  else if (Sample_File$Car[i] == 0 & Sample_File$Tire[i] != 0 & Sample_File$Services[i] == 0) #6
  {
    Csum[6,1] <- Csum[6,1] + 0
    Csum[6,2] <- Csum[6,2] + Sample_File$Tire[i]
    Csum[6,3] <- Csum[6,3] + 0
  }
  else #7 (also reached when all three are 0, which adds nothing)
  {
    Csum[7,1] <- Csum[7,1] + 0
    Csum[7,2] <- Csum[7,2] + 0
    Csum[7,3] <- Csum[7,3] + Sample_File$Services[i]
  }
  i <- i + 1
}
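The code I wrote only handles one year, because copying it for all three years would be very painful. I'm looking for a solution that creates a list of 3 data frames, one for each of the three years.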
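For reference, a base-R sketch of that list-of-data-frames structure, assuming the three-product `Sample_File` above (re-created here from its `dput()` output): `split()` the data by `Fiscal.Year`, then run `aggregate()` on each piece, grouped by one logical "in basket" flag per product. The helper `sum_by_combination()`, the `*_In` column names and `Csum_by_year` are hypothetical names chosen for illustration.

```r
# Sample_File re-created from the dput() output above
Sample_File <- data.frame(
  Order.ID    = c(171, 173, 132, 174, 132, 174, 132, 174, 174),
  Fiscal.Year = c(2017, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2018),
  Car         = c(2, 2, 3, 1, 0, 0, 0, 0, 1),
  Tire        = c(0, 0, 0, 1, 0, 1, 0, 1, 1),
  Services    = c(3, 1, 4, 0, 4, 1, 4, 0, 0)
)

products <- c("Car", "Tire", "Services")

# Hypothetical helper: sum every product within each observed on/off pattern
sum_by_combination <- function(df, products) {
  # One logical flag per product, e.g. Car_In = Car > 0
  flags <- as.data.frame(df[products] > 0)
  names(flags) <- paste0(products, "_In")
  aggregate(df[products], by = flags, FUN = sum)
}

# split() yields one data frame per fiscal year; lapply() applies the same
# summary to each, so nothing has to be copied per year
Csum_by_year <- lapply(split(Sample_File, Sample_File$Fiscal.Year),
                       sum_by_combination, products = products)

Csum_by_year[["2016"]]
```

Unlike the fixed 7-row `Csum` matrix, each data frame only contains the combinations that actually occur in that year.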
Here is a random sample of size 10 from the original file, with all six product variables:
dput(Sample_File_Random)
structure(list(Order.ID = c(171, 173, 132, 174, 169, 175, 163,
186, 178, 121), Fiscal.Year = c(2016, 2016, 2017, 2016, 2015,
2016, 2015, 2015, 2015, 2017), Car = c(2, 0, 3, 0, 0, 0, 0, 5346.25,
0, 0), Tire = c(0, 0, 0, 8691.55800460666, 3198, 5, 2, 0, 2,
3282.18), Services = c(3, 0, 4, 0, 0, 0, 0, 0, 0, 0), Insurance = c(4,
0, 0, 4, 0, 4, 0, 0, 0, 0), Accessories = c(94.3, 3749.8, 9308.65,
0, 2, 0, 1, 633.75, 51.44, 0), Finance = c(0, 0, 0, 4, 0, 14800,
0, 0, 0, 0)), .Names = c("Order.ID", "Fiscal.Year", "Car", "Tire",
"Services", "Insurance", "Accessories", "Finance"), row.names = c(NA,
10L), class = "data.frame")
I'm really stuck, so I would sincerely appreciate any help with vectorizing this.
At @Ronak Shah's request, here is the expected output for Sample_File_Random:
Output_File
  Fiscal.Year     Car     Tire Services Insurance Accessories Finance
1        2015    0.00 3202.000        0         0       54.44       0
2        2015 5346.25    0.000        0         0      633.75       0
3        2016    2.00    0.000        3         4       94.30       0
4        2016    0.00    0.000        0         0     3749.80       0
5        2016    0.00 8696.558        0         8        0.00   14804
6        2017    3.00    0.000        4         0     9308.65       0
7        2017    0.00 3282.180        0         0        0.00       0
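One vectorized way to get these seven rows in base R, sketched here as an alternative technique (not the accepted answer below): encode each row's year and on/off pattern as a single string key, then let `rowsum()` add up all product columns per key in one call. The data is re-created from the `dput()` output above; `pattern`, `key` and `sums` are illustrative names.

```r
# Sample_File_Random re-created from the dput() output above
Sample_File_Random <- data.frame(
  Order.ID    = c(171, 173, 132, 174, 169, 175, 163, 186, 178, 121),
  Fiscal.Year = c(2016, 2016, 2017, 2016, 2015, 2016, 2015, 2015, 2015, 2017),
  Car         = c(2, 0, 3, 0, 0, 0, 0, 5346.25, 0, 0),
  Tire        = c(0, 0, 0, 8691.55800460666, 3198, 5, 2, 0, 2, 3282.18),
  Services    = c(3, 0, 4, 0, 0, 0, 0, 0, 0, 0),
  Insurance   = c(4, 0, 0, 4, 0, 4, 0, 0, 0, 0),
  Accessories = c(94.3, 3749.8, 9308.65, 0, 2, 0, 1, 633.75, 51.44, 0),
  Finance     = c(0, 0, 0, 4, 0, 14800, 0, 0, 0, 0)
)

products <- c("Car", "Tire", "Services", "Insurance", "Accessories", "Finance")

# Encode each row's on/off pattern as a 0/1 string, e.g. "101110"
pattern <- apply(Sample_File_Random[products] > 0, 1,
                 function(on) paste(as.integer(on), collapse = ""))
key <- paste(Sample_File_Random$Fiscal.Year, pattern, sep = "|")

# rowsum() sums every product column within each key in one vectorized call
sums <- rowsum(as.matrix(Sample_File_Random[products]), group = key)

# Recover the year from the key and drop the key row names
Output_File <- data.frame(
  Fiscal.Year = as.numeric(sub("\\|.*", "", rownames(sums))),
  sums, row.names = NULL
)
Output_File
```

`rowsum()` sorts its groups by key, so the seven rows match the table above but may appear in a different order within a year.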
Best answer
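Here is a compact and expressive dplyr solution that works in three steps:
- Create indicators for whether each service is in the basket
- Group by the year and by the combination of indicators
- Sum the service values within each group
Here is the code that does this: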
library(dplyr)

# Written for an older dplyr: mutate_each(), funs(), group_by_() and
# summarise_each() are deprecated in current releases
Sample_File %>%
  # 1. create the combinations of whether each of the
  #    products is in the basket or not
  mutate_each(
    funs(In_Basket = . > 0), Car:Services
  ) %>%
  # 2. group by the year and the basket service indicators
  group_by_(.dots = c("Fiscal.Year", grep("_In_Basket", names(.), value = TRUE))) %>%
  # 3. sum the service values
  summarise_each(
    funs(sum(., na.rm = TRUE)), Car:Services
  )
This gives the output:
Source: local data frame [7 x 7]
Groups: Fiscal.Year, Car_In_Basket, Tire_In_Basket [?]
  Fiscal.Year Car_In_Basket Tire_In_Basket Services_In_Basket   Car  Tire Services
        <dbl>         <lgl>          <lgl>              <lgl> <dbl> <dbl>    <dbl>
1        2016         FALSE          FALSE               TRUE     0     0        8
2        2016         FALSE           TRUE              FALSE     0     1        0
3        2016         FALSE           TRUE               TRUE     0     1        1
4        2016          TRUE          FALSE               TRUE     5     0        5
5        2016          TRUE           TRUE              FALSE     1     1        0
6        2017          TRUE          FALSE               TRUE     2     0        3
7        2018          TRUE           TRUE              FALSE     1     1        0
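The same three steps can be sketched with current dplyr verbs: `across()` with `.names` builds the indicator columns, and `summarise()`'s `.by` argument (which takes a tidy-select specification) groups by the year plus every indicator. This assumes dplyr 1.1.0 or later, where `.by` was introduced; `res` is an illustrative name.

```r
library(dplyr)

# Sample_File re-created from the dput() output in the question
Sample_File <- data.frame(
  Order.ID    = c(171, 173, 132, 174, 132, 174, 132, 174, 174),
  Fiscal.Year = c(2017, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2018),
  Car         = c(2, 2, 3, 1, 0, 0, 0, 0, 1),
  Tire        = c(0, 0, 0, 1, 0, 1, 0, 1, 1),
  Services    = c(3, 1, 4, 0, 4, 1, 4, 0, 0)
)

res <- Sample_File %>%
  # 1. one logical flag per product: Car_In_Basket, Tire_In_Basket, ...
  mutate(across(Car:Services, ~ .x > 0, .names = "{.col}_In_Basket")) %>%
  # 2. + 3. group by year and all flags, then sum each product per group
  summarise(across(Car:Services, ~ sum(.x, na.rm = TRUE)),
            .by = c(Fiscal.Year, ends_with("_In_Basket"))) %>%
  arrange(Fiscal.Year)

res
```

Because `Car:Services` and `ends_with("_In_Basket")` are selections, extending this to all six products only means changing the column range. Unlike `group_by()`, `.by` returns an ungrouped result and does not sort the groups, hence the final `arrange()`.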
Regarding "r - Sum over all combinations of variables using dplyr in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40457655/