I have the following dataset:
id code date charge
1 AAA 01jan2016 23
1 BBB 20jan2016 45
1 CCC 19feb2018 23
1 DDD 20jan2019 123
1 EEE 02jan2016 43
1 FFF 12dec2015 12
2 AAA 07jan2017 12
2 BBB 08jan2017 32
2 CCC 06jan2017 12
2 DDD 10oct2019 12
3 AAA 12dec2014 12
3 BBB 18dec2014 12
3 CCC 01dec2014 13
How can I keep, for each id, all records dated from 30 days before to 90 days after the date of code AAA?
This is my expected output:
id code date charge
1 AAA 01jan2016 23
1 BBB 20jan2016 45
1 EEE 02jan2016 43
1 FFF 12dec2015 12
2 AAA 07jan2017 12
2 BBB 08jan2017 32
2 CCC 06jan2017 12
3 AAA 12dec2014 12
3 BBB 18dec2014 12
3 CCC 01dec2014 13
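I tried using a date filter, but the AAA date is different for every ID, so it did not work.
Best answer
One option is to first convert "Date" to the Date class (mdy, from lubridate), then group by "ID" and keep the rows whose "Date" lies between 30 days before and 90 days after the "Date" where "Code" is "AAA".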
library(dplyr)
library(lubridate)
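df1 %>%
  mutate(Date = mdy(Date)) %>%   # parse "m/d/Y" strings into Date
  group_by(ID) %>%
  # keep rows from 30 days before to 90 days after the ID's AAA date;
  # min() picks the earliest AAA date if an ID has several
  filter(between(Date, min(Date[Code == "AAA"]) - days(30),
                 min(Date[Code == "AAA"]) + days(90)))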
# A tibble: 10 x 4
# Groups: ID [3]
# ID Code Date Charge
# <int> <chr> <date> <dbl>
# 1 1 AAA 2016-01-01 23
# 2 1 BBB 2016-01-20 45
# 3 1 EEE 2016-01-02 43
# 4 1 FFF 2015-12-12 12
# 5 2 AAA 2017-01-07 12
# 6 2 BBB 2017-01-08 32
# 7 2 CCC 2017-01-06 12
# 8 3 AAA 2014-12-12 12
# 9 3 BBB 2014-12-18 12
#10 3 CCC 2014-12-01 13
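For readers who would rather avoid the extra packages, the same window logic can be sketched in base R. This is only an illustrative alternative, not the answer's method: the small `df` below is a made-up sample in the same layout, the names `aaa`, `anchor`, `offset` and `result` are hypothetical, and the sketch assumes each ID has exactly one AAA row.

```r
# Hypothetical sample in the same layout as df1 (not the full data)
df <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L),
  Code = c("AAA", "BBB", "CCC", "AAA", "DDD"),
  Date = as.Date(c("2016-01-01", "2016-01-20", "2018-02-19",
                   "2017-01-07", "2019-10-10")),
  stringsAsFactors = FALSE
)

# Look up each ID's AAA date (assumes exactly one AAA row per ID)
aaa    <- df[df$Code == "AAA", c("ID", "Date")]
anchor <- aaa$Date[match(df$ID, aaa$ID)]

# Signed number of days between each row and its ID's AAA date
offset <- as.numeric(df$Date - anchor)

# which() drops rows whose offset is NA (IDs without an AAA row)
result <- df[which(offset >= -30 & offset <= 90), ]
result
```

Using `match()` to align the AAA date with every row plays the role that `group_by(ID)` plays in the dplyr version.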
Data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L), Code = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF",
"AAA", "BBB", "CCC", "DDD", "AAA", "BBB", "CCC"), Date = c("1/1/2016",
"1/20/2016", "2/19/2018", "1/20/2019", "1/2/2016", "12/12/2015",
"1/7/2017", "1/8/2017", "1/6/2017", "10/10/2019", "12/12/2014",
"12/18/2014", "12/1/2014"), Charge = c(23, 45, 23, 123, 43, 12,
12, 32, 12, 12, 12, 12, 13)), class = "data.frame", row.names = c(NA,
-13L))
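Note that the question lists its dates as `01jan2016`, while the data above stores them as month/day/year strings. If the dates really arrive in the question's form, they would need a different parser than `mdy`. A minimal base R sketch, assuming an English locale for the month abbreviations (`raw` and `parsed` are hypothetical names):

```r
# Dates written as in the question's listing: day, abbreviated month, year
raw <- c("01jan2016", "20jan2016", "12dec2015")

# %d = day of month, %b = abbreviated month name, %Y = four-digit year.
# Month names are matched using the session locale (English assumed here).
parsed <- as.Date(raw, format = "%d%b%Y")
format(parsed, "%Y-%m-%d")
```

The resulting Date vector can then be used in the filter step in place of `mdy(Date)`.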
Regarding "r - filter data by code and date", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57981941/