I have been trying to write this logic in R. Can someone help me write it?
Logic:
> (Priority1) For an Item, if Date1 == Calendar, select that row. E.g. Item B
> (Priority2) For an Item, if Priority1 does not apply, select the row whose Calendar is the closest date before Date1. E.g. Item C
> (Priority3) For an Item, if neither Priority1 nor Priority2 applies, select the row whose Calendar is the closest date after Date1. E.g. Item A
Input:
Item  Date1       Calendar
A     2021-01-08  2021-01-11
A     2021-01-08  2021-01-19
B     2021-02-05  2021-01-29
B     2021-02-05  2021-02-05
B     2021-02-05  2021-02-12
C     2021-02-15  2021-02-07
C     2021-02-15  2021-02-13
C     2021-02-15  2021-02-20
C     2021-02-15  2021-02-27
Here is the dput of the data:
input <- structure(list(
  Item = c("A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Date1 = c("2021-01-08", "2021-01-08", "2021-02-05", "2021-02-05",
            "2021-02-05", "2021-02-15", "2021-02-15", "2021-02-15",
            "2021-02-15"),
  Calendar = c("2021-01-11", "2021-01-19", "2021-01-29", "2021-02-05",
               "2021-02-12", "2021-02-07", "2021-02-13", "2021-02-20",
               "2021-02-27")),
  class = "data.frame", row.names = c(NA, -9L))
Output:
Item  Date1       Calendar
A     2021-01-08  2021-01-11
B     2021-02-05  2021-02-05
C     2021-02-15  2021-02-13
Here is the dput of the expected output:
output <- structure(list(
  Item = c("A", "B", "C"),
  Date1 = c("2021-01-08", "2021-02-05", "2021-02-15"),
  Calendar = c("2021-01-11", "2021-02-05", "2021-02-13")),
  class = "data.frame", row.names = c(NA, -3L))
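Best answer
Here is another tidyverse approach. Compute the absolute difference between Date1 and Calendar. Then assign a priority value (1, 2, or 3) according to the rules above. Next, sort by priority and by the difference in days. Finally, for each group, take the first row, which is the one with the highest priority under the rules and the nearest date.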
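library(tidyverse)

# Convert the character columns to Date so they can be compared and subtracted
input$Date1 <- as.Date(input$Date1)
input$Calendar <- as.Date(input$Calendar)

input %>%
  mutate(Diff = abs(Date1 - Calendar)) %>%  # absolute gap in days
  group_by(Item) %>%
  mutate(Priority = case_when(
    Diff == 0 ~ 1,         # Priority 1: exact match
    Date1 > Calendar ~ 2,  # Priority 2: Calendar falls before Date1
    TRUE ~ 3               # Priority 3: Calendar falls after Date1
  )) %>%
  arrange(Priority, Diff) %>%  # best priority first, then nearest date
  slice(1)                     # keep the first row of each Item group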
Output
  Item  Date1      Calendar   Diff   Priority
  <chr> <date>     <date>     <drtn>    <dbl>
1 A     2021-01-08 2021-01-11 3 days        3
2 B     2021-02-05 2021-02-05 0 days        1
3 C     2021-02-15 2021-02-13 2 days        2
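For readers who prefer not to load the tidyverse, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the names gap, priority, ord, sorted and result are made up here, and the sketch assumes "previous" and "next" mean the closest Calendar date before or after Date1.

```r
# Data from the question, with both date columns already converted to Date.
input <- data.frame(
  Item = c("A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Date1 = as.Date(c("2021-01-08", "2021-01-08", "2021-02-05", "2021-02-05",
                    "2021-02-05", "2021-02-15", "2021-02-15", "2021-02-15",
                    "2021-02-15")),
  Calendar = as.Date(c("2021-01-11", "2021-01-19", "2021-01-29", "2021-02-05",
                       "2021-02-12", "2021-02-07", "2021-02-13", "2021-02-20",
                       "2021-02-27"))
)

# Signed gap in days: 0 = same day, positive = Calendar is before Date1.
gap <- as.numeric(input$Date1 - input$Calendar)

# Priority 1: exact match; 2: Calendar before Date1; 3: Calendar after Date1.
priority <- ifelse(gap == 0, 1L, ifelse(gap > 0, 2L, 3L))

# Sort by item, then priority, then closeness, and keep the first row per item.
ord <- order(input$Item, priority, abs(gap))
sorted <- input[ord, ]
result <- sorted[!duplicated(sorted$Item), ]
rownames(result) <- NULL

print(result)
```

On the sample data this selects the same three rows as the expected output in the question. Because the helper values live in separate vectors, no Diff or Priority columns need to be dropped afterwards.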
Regarding "r - Logic for date selection", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65612638/