我使用的 data.table 代码工作正常,但无法转换为包含 ifelse
语句。我使用以下表示:
set.seed(1645)
Place <- c(rep("Copenhagen",7),rep("Berlin",11),rep("Roma",12))
Year <- c(rep("2020",4),rep("2021",3),rep("2020",6),rep("2021",5),rep("2019",4),rep("2020",4),rep("2021",4))
Value1 <- c(runif(3),NA,runif(8),NA,runif(9),NA,runif(7))
Value2 <- c(runif(4),NA,runif(2),runif(6),NA,NA,runif(11),NA,NA,runif(2))
df <- data.frame(Place,Year,Value1,Value2)
> df
Place Year Value1 Value2
1 Copenhagen 2020 0.10517697 0.865935100
2 Copenhagen 2020 0.96597760 0.579956282
3 Copenhagen 2020 0.47262307 0.346569960
4 Copenhagen 2020 NA 0.478763951
5 Copenhagen 2021 0.90030423 NA
6 Copenhagen 2021 0.14444142 0.280377315
7 Copenhagen 2021 0.73801550 0.302816525
8 Berlin 2020 0.13961383 0.641314310
9 Berlin 2020 0.40221211 0.756374251
10 Berlin 2020 0.49613139 0.070459347
11 Berlin 2020 0.95190545 0.184497038
12 Berlin 2020 0.40182901 0.407892240
13 Berlin 2020 NA 0.002209376
14 Berlin 2021 0.38310025 NA
15 Berlin 2021 0.76417492 NA
16 Berlin 2021 0.29001287 0.632133629
17 Berlin 2021 0.84478784 0.365406326
18 Berlin 2021 0.55547323 0.493870653
19 Roma 2019 0.44198733 0.067744090
20 Roma 2019 0.50403809 0.847876518
21 Roma 2019 0.85358805 0.952393606
22 Roma 2019 0.74996137 0.887583928
23 Roma 2020 NA 0.631937527
24 Roma 2020 0.08303509 0.993400333
25 Roma 2020 0.74205719 0.589183185
26 Roma 2020 0.27552659 0.522451407
27 Roma 2021 0.39518410 NA
28 Roma 2021 0.38390124 NA
29 Roma 2021 0.36605674 0.942102065
30 Roma 2021 0.32014949 0.375689863
如果组中的 NA <= 25%,我想按地点和年份计算平均值。如果没有我的条件,这可以正常工作:
setDT(df)
df_means <- df[,.(Value1_mean = mean(Value1),Value2_mean = mean(Value2)), by = .(Place,Year)]
> df_means
Place Year Value1_mean Value2_mean
1: Copenhagen 2020 NA 0.4257258
2: Copenhagen 2021 0.3581245 NA
3: Berlin 2020 NA 0.3935807
4: Berlin 2021 0.3729461 NA
5: Roma 2019 0.4572996 0.3956536
6: Roma 2020 NA 0.6494491
7: Roma 2021 0.4142637 NA
我未能包含 ifelse
语句,这不起作用:
df_means2 <- df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>=0.25,NA,mean(Value1,na.rm=TRUE)),
Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>=0.25,NA,mean(Value2,na.rm=TRUE))),
by = .(Place,Year)]
我检查了这些帖子1 , 2 ,和3没有解决我的问题。我的预期结果应该是这样的:
> df_means2
Place Year Value1_mean Value2_mean
1 Copenhagen 2020 mean mean
2 Copenhagen 2021 mean <NA>
3 Berlin 2020 mean mean
4 Berlin 2021 mean <NA>
5 Roma 2019 mean mean
6 Roma 2020 mean mean
7 Roma 2021 mean <NA>
如何转换我的代码?
最佳答案
我们可以使用if/else
library(data.table)
df[, lapply(.SD, function(x) if(mean(is.na(x)) <= 0.25)
mean(x, na.rm = TRUE) else NA_real_), by = .(Place, Year)]
-输出
Place Year Value1 Value2
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537 NA
3: Berlin 2020 0.4783384 0.3437911
4: Berlin 2021 0.5675098 NA
5: Roma 2019 0.6373937 0.6888995
6: Roma 2020 0.3668730 0.6842431
7: Roma 2021 0.3663229 NA
在OP的代码中,使用NA_real_
而不是NA
,默认情况下是逻辑
,这会在类中产生冲突
df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>0.25,
NA_real_,mean(Value1,na.rm=TRUE)),
Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>0.25,
NA_real_,
mean(Value2,na.rm=TRUE))),
by = .(Place,Year)]
-输出
Place Year Value1_mean Value2_mean
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537 NA
3: Berlin 2020 0.4783384 0.3437911
4: Berlin 2021 0.5675098 NA
5: Roma 2019 0.6373937 0.6888995
6: Roma 2020 0.3668730 0.6842431
7: Roma 2021 0.3663229 NA
关于r - 如何使用 ifelse 语句通过 data.table 语法按组获取平均值?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/70173094/