r - Mutate a new column based on date intervals

Tags: r, dplyr

Edited February 4, 2023

Data:

library(dplyr)

DF<-data.frame(
  stringsAsFactors = FALSE,
                ID = c(1L,2L,2L,3L,3L,3L,4L,4L,
                       4L,4L,5L,5L,6L,6L,6L,7L,7L,7L,7L,7L),
             COLOR = c("BLUE","RED","BLUE","RED",
                       "RED","BLUE","RED","BLUE","BLUE","BLUE","BLUE",
                       "BLACK","GREEN","GRAY","GRAY","RED","BLUE","BLUE",
                       "BLUE","BLUE"),
        COLOR_DATE = c("2001-01-01","2001-01-01",
                       "2002-02-02","2001-01-01","2002-02-02","2008-08-08",
                       "2001-01-01","2002-02-02","2009-09-09","2009-09-09",
                       "2001-01-01","2006-06-06","2001-01-01","2008-01-01",
                       "2008-01-01","2001-01-01","2002-02-02","2003-03-03",
                       "2004-04-04","2007-07-07")
)

Desired output:

   ID COLOR COLOR_DATE TRUE_COLOR
1   1  BLUE 2001-01-01       BLUE
2   2   RED 2001-01-01        MIX
3   2  BLUE 2002-02-02        MIX
4   3   RED 2001-01-01        MIX
5   3   RED 2002-02-02        MIX
6   3  BLUE 2008-08-08        MIX
7   4   RED 2001-01-01       BLUE
8   4  BLUE 2002-02-02       BLUE
9   4  BLUE 2009-09-09       BLUE
10  4  BLUE 2009-09-09       BLUE
11  5  BLUE 2001-01-01       BLUE
12  5 BLACK 2006-06-06       BLUE
13  6 GREEN 2001-01-01       <NA>
14  6  GRAY 2008-01-01       <NA>
15  6  GRAY 2008-01-01       <NA>
16  7   RED 2001-01-01       BLUE
17  7  BLUE 2002-02-02       BLUE
18  7  BLUE 2003-03-03       BLUE
19  7  BLUE 2004-04-04       BLUE
20  7  BLUE 2007-07-07       BLUE

Logic:

When an ID contains only RED colors, TRUE_COLOR = RED. When an ID contains only BLUE colors, TRUE_COLOR = BLUE. When both RED and BLUE appear in the same ID, TRUE_COLOR = MIX.
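A minimal sketch of this basic rule in base R, applied to one ID's colors at a time and leaving out the 5-year exception. `classify_basic` is a hypothetical helper name, and it assumes colors have already been normalized to exactly "RED" or "BLUE"; any other value is simply ignored:

```r
# Hypothetical helper: basic rule only (no 5-year exception).
# Values other than "RED"/"BLUE" are ignored.
classify_basic <- function(colors) {
  rb <- unique(colors[colors %in% c("RED", "BLUE")])
  if (length(rb) == 0) {
    NA_character_   # no RED or BLUE at all
  } else if (length(rb) == 1) {
    rb              # only RED, or only BLUE
  } else {
    "MIX"           # both RED and BLUE present
  }
}

classify_basic(c("RED", "BLUE"))    # ID 2: "MIX"
classify_basic(c("BLUE", "BLACK"))  # ID 5: "BLUE"
classify_basic(c("GREEN", "GRAY"))  # ID 6: NA
```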

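However, if the COLOR has remained the same for at least the most recent 5 years (i.e., the ID's latest COLOR_DATE is at least 5 years after its last switch between RED and BLUE), then TRUE_COLOR is that color, RED or BLUE (see IDs 4 and 7 in the example data).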
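One way to test the 5-year condition, sketched in base R under an assumption the question does not spell out: the stable period runs from the first row of the ID's final RED/BLUE run to its latest RED/BLUE date, counted in calendar years with `seq()` on `Date`s. `stable_for_years` and its `min_years` parameter are hypothetical names:

```r
# Hypothetical helper: TRUE when the final RED/BLUE run of one ID spans at
# least `min_years` calendar years. Assumption: other colors are skipped and
# the run ends at the latest RED/BLUE date.
stable_for_years <- function(colors, dates, min_years = 5) {
  keep <- colors %in% c("RED", "BLUE")
  if (!any(keep)) return(FALSE)
  colors <- colors[keep]
  dates <- as.Date(dates[keep])
  ord <- order(dates)
  colors <- colors[ord]
  dates <- dates[ord]
  # Index of the first row of the final run of identical colors
  changes <- which(colors[-1] != colors[-length(colors)]) + 1
  run_start <- if (length(changes) == 0) 1 else max(changes)
  # Date exactly `min_years` calendar years after the run started
  cutoff <- seq(dates[run_start], by = paste(min_years, "years"), length.out = 2)[2]
  max(dates) >= cutoff
}

# ID 4: BLUE since 2002-02-02, last date 2009-09-09 -> TRUE
stable_for_years(c("RED", "BLUE", "BLUE", "BLUE"),
                 c("2001-01-01", "2002-02-02", "2009-09-09", "2009-09-09"))
# ID 3: switched to BLUE on its last date -> FALSE
stable_for_years(c("RED", "RED", "BLUE"),
                 c("2001-01-01", "2002-02-02", "2008-08-08"))
```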

Colors other than RED and BLUE are ignored.

最后,RED123BLUE234 应分别解释为 REDBLUE

How can I solve this?

Best answer

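library(tidyverse); library(lubridate)
# Quosure that flags rows whose normalized color is BLUE or RED
blu_red <- quo(COLOR2 %in% c("BLUE", "RED"))
DF %>%
  # Normalize RED123 -> RED and BLUE234 -> BLUE; other colors become NA
  mutate(COLOR2 = str_extract(COLOR, "BLUE|RED"), 
         COLOR_DATE = as.Date(COLOR_DATE)) %>% 
  arrange(COLOR_DATE) %>% 
  group_by(ID) %>%
  mutate(
    TRUE_COLOR = case_when(
      # No BLUE or RED anywhere in the ID
      isTRUE(all(!(!!blu_red))) ~ NA_character_,
      # Single-row ID: keep its color if it is BLUE/RED, otherwise NA
      isTRUE(n() == 1 & !!blu_red) ~ COLOR2,
      isTRUE(n() == 1 & !(!!blu_red)) ~ NA_character_,
      # Latest color unchanged for >= 5 years, counted from the last row
      # whose COLOR2 differs from the previous row's
      isTRUE((last(COLOR_DATE) - COLOR_DATE[last(which(COLOR2 != lag(COLOR2)))]) >= years(5) & 
        last(COLOR2) %in% c("BLUE", "RED")) ~ last(COLOR2),
      # Only BLUE, or only RED, among the BLUE/RED rows
      isTRUE(all(COLOR2[!!blu_red] == "BLUE")) ~ "BLUE",
      isTRUE(all(COLOR2[!!blu_red] == "RED")) ~ "RED",
      # Both colors present
      TRUE ~ "MIX")) %>% 
  ungroup() %>% 
  select(- COLOR2) %>% 
  arrange(ID)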
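For comparison, an alternative sketch that uses only base R instead of the answer's dplyr/lubridate pipeline. It is not the answerer's method; it assumes a similar reading of the 5-year rule (measured from the start of the final RED/BLUE run to the latest RED/BLUE date, in calendar years), and `true_color` is a hypothetical helper:

```r
DF <- data.frame(
  ID = c(1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L,
         7L, 7L, 7L, 7L, 7L),
  COLOR = c("BLUE", "RED", "BLUE", "RED", "RED", "BLUE", "RED", "BLUE",
            "BLUE", "BLUE", "BLUE", "BLACK", "GREEN", "GRAY", "GRAY",
            "RED", "BLUE", "BLUE", "BLUE", "BLUE"),
  COLOR_DATE = c("2001-01-01", "2001-01-01", "2002-02-02", "2001-01-01",
                 "2002-02-02", "2008-08-08", "2001-01-01", "2002-02-02",
                 "2009-09-09", "2009-09-09", "2001-01-01", "2006-06-06",
                 "2001-01-01", "2008-01-01", "2008-01-01", "2001-01-01",
                 "2002-02-02", "2003-03-03", "2004-04-04", "2007-07-07"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE_COLOR for the rows of a single ID
true_color <- function(colors, dates, min_years = 5) {
  # RED123 -> RED, BLUE234 -> BLUE, anything else -> NA (ignored)
  pos <- regexpr("BLUE|RED", colors)
  norm <- rep(NA_character_, length(colors))
  hit <- !is.na(pos) & pos > 0
  norm[hit] <- regmatches(colors, pos)
  keep <- !is.na(norm)
  if (!any(keep)) return(NA_character_)
  norm <- norm[keep]
  d <- as.Date(dates[keep])
  ord <- order(d)
  norm <- norm[ord]
  d <- d[ord]
  # First row of the final run of identical colors
  changes <- which(norm[-1] != norm[-length(norm)]) + 1
  run_start <- if (length(changes) == 0) 1 else max(changes)
  cutoff <- seq(d[run_start], by = paste(min_years, "years"), length.out = 2)[2]
  if (max(d) >= cutoff) return(norm[length(norm)])  # stable long enough
  rb <- unique(norm)
  if (length(rb) == 1) rb else "MIX"
}

# Apply per ID, writing the result back to that ID's rows
DF$TRUE_COLOR <- NA_character_
for (rows in split(seq_len(nrow(DF)), DF$ID)) {
  DF$TRUE_COLOR[rows] <- true_color(DF$COLOR[rows], DF$COLOR_DATE[rows])
}
print(DF)
```

Splitting row indices by `ID` leaves the original row order untouched, so no re-sorting is needed afterwards; with the question's data this should reproduce the desired TRUE_COLOR column.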

A similar question, "r - Mutate a new column based on date intervals", can be found on Stack Overflow: https://stackoverflow.com/questions/75896311/
