r - How to get the mean by group with an ifelse statement using data.table syntax?

Tags: r if-statement data.table

My data.table code works fine, but I can't convert it to include an ifelse statement. I'm using the following reprex:

set.seed(1645)
Place <- c(rep("Copenhagen",7),rep("Berlin",11),rep("Roma",12))
Year <- c(rep("2020",4),rep("2021",3),rep("2020",6),rep("2021",5),rep("2019",4),rep("2020",4),rep("2021",4))
Value1 <- c(runif(3),NA,runif(8),NA,runif(9),NA,runif(7))
Value2 <- c(runif(4),NA,runif(2),runif(6),NA,NA,runif(11),NA,NA,runif(2))
df <- data.frame(Place,Year,Value1,Value2)

> df
        Place Year     Value1      Value2
1  Copenhagen 2020 0.10517697 0.865935100
2  Copenhagen 2020 0.96597760 0.579956282
3  Copenhagen 2020 0.47262307 0.346569960
4  Copenhagen 2020         NA 0.478763951
5  Copenhagen 2021 0.90030423          NA
6  Copenhagen 2021 0.14444142 0.280377315
7  Copenhagen 2021 0.73801550 0.302816525
8      Berlin 2020 0.13961383 0.641314310
9      Berlin 2020 0.40221211 0.756374251
10     Berlin 2020 0.49613139 0.070459347
11     Berlin 2020 0.95190545 0.184497038
12     Berlin 2020 0.40182901 0.407892240
13     Berlin 2020         NA 0.002209376
14     Berlin 2021 0.38310025          NA
15     Berlin 2021 0.76417492          NA
16     Berlin 2021 0.29001287 0.632133629
17     Berlin 2021 0.84478784 0.365406326
18     Berlin 2021 0.55547323 0.493870653
19       Roma 2019 0.44198733 0.067744090
20       Roma 2019 0.50403809 0.847876518
21       Roma 2019 0.85358805 0.952393606
22       Roma 2019 0.74996137 0.887583928
23       Roma 2020         NA 0.631937527
24       Roma 2020 0.08303509 0.993400333
25       Roma 2020 0.74205719 0.589183185
26       Roma 2020 0.27552659 0.522451407
27       Roma 2021 0.39518410          NA
28       Roma 2021 0.38390124          NA
29       Roma 2021 0.36605674 0.942102065
30       Roma 2021 0.32014949 0.375689863

I want to compute the mean by Place and Year, but only if 25% or fewer of the values in the group are NA. Without that condition, this works fine:

setDT(df)
df_means <- df[,.(Value1_mean = mean(Value1),Value2_mean = mean(Value2)), by = .(Place,Year)]

> df_means
        Place Year Value1_mean Value2_mean
1: Copenhagen 2020          NA   0.4257258
2: Copenhagen 2021   0.3581245          NA
3:     Berlin 2020          NA   0.3935807
4:     Berlin 2021   0.3729461          NA
5:       Roma 2019   0.4572996   0.3956536
6:       Roma 2020          NA   0.6494491
7:       Roma 2021   0.4142637          NA

My attempt to include an ifelse statement doesn't work:

df_means2 <- df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>=0.25,NA,mean(Value1,na.rm=TRUE)),
                   Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>=0.25,NA,mean(Value2,na.rm=TRUE))), 
                by = .(Place,Year)]

I checked these posts 1, 2, and 3, but they did not solve my problem. My expected result should look like this:

> df_means2
       Place Year Value1_mean Value2_mean
1 Copenhagen 2020        mean        mean
2 Copenhagen 2021        mean        <NA>
3     Berlin 2020        mean        mean
4     Berlin 2021        mean        <NA>
5       Roma 2019        mean        mean
6       Roma 2020        mean        mean
7       Roma 2021        mean        <NA>

How can I convert my code?

Best answer

We can use if/else

library(data.table)
# mean(is.na(x)) is the share of NAs in the group; NA_real_ keeps every group's result double
df[, lapply(.SD, function(x) if(mean(is.na(x)) <= 0.25) 
     mean(x, na.rm = TRUE) else NA_real_), by = .(Place, Year)]

-output

        Place Year    Value1    Value2
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537        NA
3:     Berlin 2020 0.4783384 0.3437911
4:     Berlin 2021 0.5675098        NA
5:       Roma 2019 0.6373937 0.6888995
6:       Roma 2020 0.3668730 0.6842431
7:       Roma 2021 0.3663229        NA
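A variant of the same if/else idea, sketched with a hypothetical helper mean_if_few_na() and a max_na threshold parameter (both names are illustrative, not part of data.table). It restricts the columns with .SDcols and adds the _mean suffix the question asked for. The small table below is made up so the example runs on its own; it is not the OP's data.

```r
library(data.table)

# Hypothetical helper: group mean if at most `max_na` of the values are missing,
# otherwise a double NA (NA_real_ keeps the column type the same in every group)
mean_if_few_na <- function(x, max_na = 0.25) {
  if (mean(is.na(x)) <= max_na) mean(x, na.rm = TRUE) else NA_real_
}

# Small made-up table (not the OP's data)
DT <- data.table(
  Place  = c("A", "A", "A", "A", "B", "B", "B"),
  Value1 = c(1, 2, 3, NA, 4, NA, NA),
  Value2 = c(2, 4, 6, 8, 1, 2, 3)
)

cols <- c("Value1", "Value2")
res <- DT[, lapply(.SD, mean_if_few_na), by = Place, .SDcols = cols]

# Rename Value1/Value2 to Value1_mean/Value2_mean
setnames(res, cols, paste0(cols, "_mean"))
print(res)
```

Group A has exactly 1 NA in 4 rows (share 0.25), so it keeps its Value1 mean; group B has 2 NAs in 3 rows and gets NA. A different cutoff can be passed through lapply(), e.g. lapply(.SD, mean_if_few_na, max_na = 0.5).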

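In the OP's code, use NA_real_ instead of NA. A plain NA is logical by default, so a group that returns NA has a different type from the groups that return a numeric mean, and data.table requires each result column to have the same type in every group. The condition is also changed from >= 0.25 to > 0.25, so that groups with exactly 25% NA still get a mean, as the question asks.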
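To see the type issue in isolation (base R only, no data.table needed), compare the types involved. Here 0.5 is just a stand-in for a group mean:

```r
# A bare NA is a logical constant; NA_real_ is the double (numeric) NA
na_types <- c(typeof(NA), typeof(NA_real_))
print(na_types)  # "logical" "double"

# With a single TRUE/FALSE test, the type of ifelse()'s result follows the
# branch that is taken: a group that hits the NA branch yields a logical,
# the other groups yield a double
per_group <- c(
  too_many_na = typeof(ifelse(TRUE,  NA, 0.5)),
  few_na      = typeof(ifelse(FALSE, NA, 0.5)),
  fixed       = typeof(ifelse(TRUE,  NA_real_, 0.5))
)
print(per_group)
```

With NA_real_ in the "too many NAs" branch, both branches return a double, so every group produces the same column type.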

df[, .(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1) > 0.25,
                            NA_real_, mean(Value1, na.rm = TRUE)),
       Value2_mean = ifelse(sum(is.na(Value2))/length(Value2) > 0.25,
                            NA_real_, mean(Value2, na.rm = TRUE))),
   by = .(Place, Year)]

-output

        Place Year Value1_mean Value2_mean
1: Copenhagen 2020   0.5145925   0.5678063
2: Copenhagen 2021   0.5942537          NA
3:     Berlin 2020   0.4783384   0.3437911
4:     Berlin 2021   0.5675098          NA
5:       Roma 2019   0.6373937   0.6888995
6:       Roma 2020   0.3668730   0.6842431
7:       Roma 2021   0.3663229          NA
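To check which groups fall on which side of the 25% cutoff, and where > versus >= makes a difference, one can compute the NA share per group. This is a minimal sketch that rebuilds the question's reprex; the random values do not matter here, only the NA positions, which are fixed by construction:

```r
library(data.table)

# Rebuild the question's reprex
set.seed(1645)
Place  <- c(rep("Copenhagen", 7), rep("Berlin", 11), rep("Roma", 12))
Year   <- c(rep("2020", 4), rep("2021", 3), rep("2020", 6), rep("2021", 5),
            rep("2019", 4), rep("2020", 4), rep("2021", 4))
Value1 <- c(runif(3), NA, runif(8), NA, runif(9), NA, runif(7))
Value2 <- c(runif(4), NA, runif(2), runif(6), NA, NA, runif(11), NA, NA, runif(2))
DT <- data.table(Place, Year, Value1, Value2)

# Fraction of missing values per column and group
na_share <- DT[, lapply(.SD, function(x) mean(is.na(x))), by = .(Place, Year)]
print(na_share)

# Groups whose Value1 share is exactly 0.25 (1 NA in 4 rows): they keep their
# mean under "> 0.25" but would be set to NA by the question's ">= 0.25"
boundary <- na_share[Value1 == 0.25, .(Place, Year)]
print(boundary)
```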

For the original version of this question (r - How to get the mean by group with an ifelse statement using data.table syntax?), see Stack Overflow: https://stackoverflow.com/questions/70173094/
