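Consider the following random MWE.
For each row, I am trying to determine which variable's value is closest to the constant reference_day from below (the nearest earlier date) and which variable's value is closest to it from above (the nearest later date).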
df1 <-
structure(
list(id = 1:5,
gender = c("female", "male", "male", "male", "male"),
reference_day = structure(c(18052, NA, 18052, 18052, 18052), class = "Date"),
var1 = structure(c(16505, 17144, 18139, NA, 16639), class = "Date"),
var2 = structure(c(NA, 18042, 16544, 16697, NA), class = "Date"),
var3 = structure(c(17845, 18070, 17152, 16571, NA), class = "Date")),
row.names = c(NA, -5L), class = "data.frame")
df1
id gender reference_day var1 var2 var3
1 1 female 2019-06-05 2015-03-11 <NA> 2018-11-10
2 2 male <NA> 2016-12-09 2019-05-26 2019-06-23
3 3 male 2019-06-05 2019-08-31 2015-04-19 2016-12-17
4 4 male 2019-06-05 <NA> 2015-09-19 2015-05-16
5 5 male 2019-06-05 2015-07-23 <NA> <NA>
The result I want looks like this:
id gender reference_day var1 var2 var3 closest_to_left closest_to_right
1 1 female 2019-06-05 2015-03-11 <NA> 2018-11-10 var3 <NA>
2 2 male <NA> 2016-12-09 2019-05-26 2019-06-23 <NA> <NA>
3 3 male 2019-06-05 2019-08-31 2015-04-19 2016-12-17 var3 var1
4 4 male 2019-06-05 <NA> 2015-09-19 2015-05-16 var2 <NA>
5 5 male 2019-06-05 2015-07-23 <NA> <NA> var1 <NA>
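To make the rule concrete: "closest from below" is the variable with the largest negative difference from reference_day, and "closest from above" is the one with the smallest positive difference. The sketch below checks this by hand for row 3, using the dates copied from the table above; the names `nm`, `diffs`, `left` and `right` are illustrative only.

```r
# Row 3 of df1, rebuilt by hand for illustration
nm   <- c("var1", "var2", "var3")
vars <- as.Date(c("2019-08-31", "2015-04-19", "2016-12-17"))
ref  <- as.Date("2019-06-05")

# Signed distance in days from the reference date: 87, -1508, -900
diffs <- as.numeric(vars - ref)

# Closest from below: the negative difference nearest to zero (the maximum)
left  <- nm[which(diffs == max(diffs[diffs < 0]))]
# Closest from above: the positive difference nearest to zero (the minimum)
right <- nm[which(diffs == min(diffs[diffs > 0]))]

left   # "var3"
right  # "var1"
```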
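After a lot of trial and error I did manage to find a solution using dplyr's case_when function, but it needs a large amount of boilerplate code, which makes me think there must be a smarter solution.
I personally prefer dplyr, but any help is greatly appreciated.
Best answer
Custom helper functions that do this: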
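library(dplyr)

# Names of the candidate columns; the helpers below look this vector up globally
cols <- df1 %>% select(starts_with('var')) %>% names

# Name of the variable with the smallest positive difference (nearest later date).
# which(...)[1] keeps only the first column if two are equally close.
closest_to_right <- function(x, y) {
  tmp <- y - x
  if (any(tmp > 0, na.rm = TRUE))
    cols[which(tmp == min(tmp[tmp > 0], na.rm = TRUE))[1]] else NA_character_
}

# Name of the variable with the largest negative difference (nearest earlier date)
closest_to_left <- function(x, y) {
  tmp <- y - x
  if (any(tmp < 0, na.rm = TRUE))
    cols[which(tmp == max(tmp[tmp < 0], na.rm = TRUE))[1]] else NA_character_
}

df1 %>%
  rowwise() %>%
  mutate(closest_to_left  = closest_to_left(reference_day, c_across(starts_with('var'))),
         closest_to_right = closest_to_right(reference_day, c_across(starts_with('var')))) %>%
  ungroup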
# id gender reference_day var1 var2 var3 closest_to_left closest_to_right
# <int> <chr> <date> <date> <date> <date> <chr> <chr>
#1 1 female 2019-06-05 2015-03-11 NA 2018-11-10 var3 NA
#2 2 male NA 2016-12-09 2019-05-26 2019-06-23 NA NA
#3 3 male 2019-06-05 2019-08-31 2015-04-19 2016-12-17 var3 var1
#4 4 male 2019-06-05 NA 2015-09-19 2015-05-16 var2 NA
#5 5 male 2019-06-05 2015-07-23 NA NA var1 NA
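A possible alternative that avoids rowwise() is to reshape to long format, keep the nearest date on each side per id, and join the results back. This is a sketch, assuming the tidyr package (for pivot_longer()) and dplyr 1.0 or later (for slice_min()/slice_max() with with_ties); like the helpers above, it ignores dates exactly equal to reference_day and keeps a single column when two are equally close. The intermediate names `long`, `below`, `above` and `result` are hypothetical.

```r
library(dplyr)
library(tidyr)

df1 <- structure(
  list(id = 1:5,
       gender = c("female", "male", "male", "male", "male"),
       reference_day = structure(c(18052, NA, 18052, 18052, 18052), class = "Date"),
       var1 = structure(c(16505, 17144, 18139, NA, 16639), class = "Date"),
       var2 = structure(c(NA, 18042, 16544, 16697, NA), class = "Date"),
       var3 = structure(c(17845, 18070, 17152, 16571, NA), class = "Date")),
  row.names = c(NA, -5L), class = "data.frame")

# One row per (id, variable) with the signed distance in days; rows where
# either date is missing drop out here
long <- df1 %>%
  pivot_longer(starts_with("var"), names_to = "var", values_to = "date") %>%
  mutate(diff = as.numeric(date - reference_day)) %>%
  filter(!is.na(diff))

# Nearest earlier date: largest negative difference per id
below <- long %>%
  filter(diff < 0) %>%
  group_by(id) %>%
  slice_max(diff, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(id, closest_to_left = var)

# Nearest later date: smallest positive difference per id
above <- long %>%
  filter(diff > 0) %>%
  group_by(id) %>%
  slice_min(diff, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(id, closest_to_right = var)

# Ids with no match on a side get NA from the left joins
result <- df1 %>%
  left_join(below, by = "id") %>%
  left_join(above, by = "id")

result
```

The reshape does the comparison for all rows at once, so it avoids calling an R function once per row, at the cost of two extra joins.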
Regarding "r - How to detect the closest values below and above a given reference variable in a data frame in R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/71006802/