Edited on 4 February 2023
Data:
library(dplyr)
DF <- data.frame(
  stringsAsFactors = FALSE,
  ID = c(1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L,
         4L, 4L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L),
  COLOR = c("BLUE", "RED", "BLUE", "RED",
            "RED", "BLUE", "RED", "BLUE", "BLUE", "BLUE", "BLUE",
            "BLACK", "GREEN", "GRAY", "GRAY", "RED", "BLUE", "BLUE",
            "BLUE", "BLUE"),
  COLOR_DATE = c("2001-01-01", "2001-01-01",
                 "2002-02-02", "2001-01-01", "2002-02-02", "2008-08-08",
                 "2001-01-01", "2002-02-02", "2009-09-09", "2009-09-09",
                 "2001-01-01", "2006-06-06", "2001-01-01", "2008-01-01",
                 "2008-01-01", "2001-01-01", "2002-02-02", "2003-03-03",
                 "2004-04-04", "2007-07-07")
)
Desired output:
   ID COLOR COLOR_DATE TRUE_COLOR
1   1  BLUE 2001-01-01       BLUE
2   2   RED 2001-01-01        MIX
3   2  BLUE 2002-02-02        MIX
4   3   RED 2001-01-01        MIX
5   3   RED 2002-02-02        MIX
6   3  BLUE 2008-08-08        MIX
7   4   RED 2001-01-01       BLUE
8   4  BLUE 2002-02-02       BLUE
9   4  BLUE 2009-09-09       BLUE
10  4  BLUE 2009-09-09       BLUE
11  5  BLUE 2001-01-01       BLUE
12  5 BLACK 2006-06-06       BLUE
13  6 GREEN 2001-01-01       <NA>
14  6  GRAY 2008-01-01       <NA>
15  6  GRAY 2008-01-01       <NA>
16  7   RED 2001-01-01       BLUE
17  7  BLUE 2002-02-02       BLUE
18  7  BLUE 2003-03-03       BLUE
19  7  BLUE 2004-04-04       BLUE
20  7  BLUE 2007-07-07       BLUE
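Logic:
If an ID has only RED COLORs, then TRUE_COLOR = RED.
If an ID has only BLUE COLORs, then TRUE_COLOR = BLUE.
If both RED and BLUE COLORs occur within the same ID, then TRUE_COLOR = MIX.
However, if the COLOR has stayed the same for at least the 5 years leading up to that ID's most recent COLOR_DATE, then TRUE_COLOR = RED or BLUE accordingly (as for IDs 4 and 7 in the example data).
COLORs other than RED or BLUE are ignored.
Finally, values such as RED123 and BLUE234 should be interpreted as RED and BLUE, respectively.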
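As a cross-check of the rules above, here is a minimal base-R sketch that classifies the records of a single ID. The function name `classify_true_color()` is hypothetical (it is not from the question or the answer), and the sketch rests on two assumptions the question does not state explicitly: the 5-year window runs from the most recent RED/BLUE switch to the ID's latest RED/BLUE record, and "5 years" means the same calendar date five years later.

```r
# Hypothetical helper: classify one ID's records under the rules above.
# ASSUMPTIONS: the 5-year window is measured from the last RED<->BLUE switch
# to the latest RED/BLUE record, and "5 years" is a calendar offset.
classify_true_color <- function(colors, dates) {
  dates <- as.Date(dates)

  # Normalise "RED123" -> "RED", "BLUE234" -> "BLUE"; anything else -> NA
  m <- regexpr("BLUE|RED", colors)
  norm <- rep(NA_character_, length(colors))
  norm[m > 0] <- regmatches(colors, m)

  # Ignore colours other than RED/BLUE, then sort chronologically
  keep <- !is.na(norm)
  norm <- norm[keep]
  d <- dates[keep]
  o <- order(d)
  norm <- norm[o]
  d <- d[o]

  if (length(norm) == 0) return(NA_character_)  # no RED/BLUE at all
  if (all(norm == norm[1])) return(norm[1])     # only RED or only BLUE

  # Position of the most recent RED<->BLUE switch
  last_change <- max(which(norm[-1] != norm[-length(norm)]) + 1)
  # Same calendar date five years after that switch
  five_years_later <- seq(d[last_change], by = "5 years", length.out = 2)[2]

  # Stable for >= 5 years -> keep the current colour, otherwise MIX
  if (d[length(d)] >= five_years_later) norm[length(norm)] else "MIX"
}
```

Applied to each ID of `DF` separately (for example with `split()` and `lapply()`), it yields BLUE, MIX, MIX, BLUE, BLUE, NA and BLUE for IDs 1 to 7, which matches the TRUE_COLOR column of the desired output.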
How can I solve this?
Best answer
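library(tidyverse)  # dplyr, stringr (str_extract)
library(lubridate)  # years()

# Quosure for "this row's normalised colour is BLUE or RED"
blu_red <- quo(COLOR2 %in% c("BLUE", "RED"))

DF %>%
  mutate(
    # "RED123" -> "RED", "BLUE234" -> "BLUE"; other colours become NA
    COLOR2 = str_extract(COLOR, "BLUE|RED"),
    COLOR_DATE = as.Date(COLOR_DATE)
  ) %>%
  arrange(COLOR_DATE) %>%
  group_by(ID) %>%
  mutate(
    TRUE_COLOR = case_when(
      # NA_character_ (not NA) keeps every branch character-typed,
      # which older dplyr versions of case_when() require
      isTRUE(all(!(!!blu_red))) ~ NA_character_,
      # A one-row ID takes its own colour if it is RED/BLUE, NA otherwise
      isTRUE(n() == 1 & !!blu_red) ~ COLOR2,
      isTRUE(n() == 1 & !(!!blu_red)) ~ NA_character_,
      # Last colour switch followed by >= 5 years up to the group's latest date
      isTRUE((last(COLOR_DATE) - COLOR_DATE[last(which(COLOR2 != lag(COLOR2)))]) >= years(5) &
        last(COLOR2) %in% c("BLUE", "RED")) ~ last(COLOR2),
      # Only BLUE (or only RED) among the RED/BLUE rows
      isTRUE(all(COLOR2[!!blu_red] == "BLUE")) ~ "BLUE",
      isTRUE(all(COLOR2[!!blu_red] == "RED")) ~ "RED",
      # Otherwise both colours occur
      TRUE ~ "MIX"
    )
  ) %>%
  ungroup() %>%
  select(-COLOR2) %>%
  arrange(ID)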
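The least obvious part of the answer is `COLOR_DATE[last(which(COLOR2 != lag(COLOR2)))]`, which picks the date of the most recent colour switch within an ID. Below is a rough base-R rendering of that expression for illustration only; `lag1()` and `last_change_date()` are hypothetical helpers standing in for `dplyr::lag()` and the inline indexing, and the inputs are assumed to be sorted by date, as `arrange(COLOR_DATE)` ensures in the answer.

```r
# Hypothetical stand-in for dplyr::lag(x): shift right by one, pad with NA
lag1 <- function(x) c(NA, x[-length(x)])

# Base-R version of COLOR_DATE[last(which(COLOR2 != lag(COLOR2)))]
last_change_date <- function(color2, dates) {
  idx <- which(color2 != lag1(color2))        # which() drops NA comparisons
  if (length(idx) == 0) return(as.Date(NA))   # no switch found -> NA date
  as.Date(dates)[max(idx)]                    # max() == last() for which()
}

# ID 4: RED, BLUE, BLUE, BLUE -> last switch on 2002-02-02
id4 <- last_change_date(c("RED", "BLUE", "BLUE", "BLUE"),
                        c("2001-01-01", "2002-02-02", "2009-09-09", "2009-09-09"))

# ID 3: RED, RED, BLUE -> last switch on 2008-08-08
id3 <- last_change_date(c("RED", "RED", "BLUE"),
                        c("2001-01-01", "2002-02-02", "2008-08-08"))

# ID 5: BLUE, then BLACK (NA after str_extract) -> every comparison is NA
id5 <- last_change_date(c("BLUE", NA), c("2001-01-01", "2006-06-06"))
```

Because the comparison with an ignored colour is NA and `which()` discards it, ID 5 has no switch date; the 5-year condition is then not TRUE, so `isTRUE()` returns FALSE and the case falls through to the "only BLUE" branch.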
Regarding "r - mutate a new column based on date intervals", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75896311/