My data looks like the following, and I need to fill in the missing (NA) values of DATE.
ID DAY TIME DATE
<dbl> <dbl> <dbl> <date>
1 1 1 1 NA
2 1 1 2 NA
3 1 1 3 NA
4 1 1 4 NA
5 1 1 5 NA
6 1 2 1 2021-09-25
7 1 2 2 2021-09-25
8 1 2 3 2021-09-25
9 1 2 4 2021-09-25
10 1 2 5 2021-09-25
11 1 3 1 NA
12 1 3 2 NA
13 1 3 3 NA
14 1 3 4 NA
15 1 3 5 NA
16 2 1 1 2022-02-26
17 2 1 2 2022-02-26
18 2 1 3 2022-02-26
19 2 1 4 2022-02-26
20 2 1 5 2022-02-26
21 2 2 1 NA
22 2 2 2 2022-02-27
23 2 2 3 2022-02-27
24 2 2 4 2022-02-27
25 2 2 5 2022-02-27
The DATE that corresponds to a given DAY is different for each ID. The final dataset should look like this:
# A tibble: 25 × 4
ID DAY TIME DATE
<dbl> <dbl> <dbl> <chr>
1 1 1 1 2021-09-24
2 1 1 2 2021-09-24
3 1 1 3 2021-09-24
4 1 1 4 2021-09-24
5 1 1 5 2021-09-24
6 1 2 1 2021-09-25
7 1 2 2 2021-09-25
8 1 2 3 2021-09-25
9 1 2 4 2021-09-25
10 1 2 5 2021-09-25
11 1 3 1 2021-09-26
12 1 3 2 2021-09-26
13 1 3 3 2021-09-26
14 1 3 4 2021-09-26
15 1 3 5 2021-09-26
16 2 1 1 2022-02-26
17 2 1 2 2022-02-26
18 2 1 3 2022-02-26
19 2 1 4 2022-02-26
20 2 1 5 2022-02-26
21 2 2 1 2022-02-27
22 2 2 2 2022-02-27
23 2 2 3 2022-02-27
24 2 2 4 2022-02-27
25 2 2 5 2022-02-27
Best answer
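One approach is to derive a "day zero" reference date from the known dates (DATE minus DAY) and use fill to propagate it within each ID, because all rows of a given ID share the same reference date. You can then add DAY to the reference date to get the final DATE.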
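library(tidyverse)

df %>%
  mutate(DATE = as.Date(DATE),              # convert character dates to Date
         DATE0 = DATE - DAY) %>%            # "day zero" reference date (NA where DATE is NA)
  group_by(ID) %>%
  fill(DATE0, .direction = "updown") %>%    # propagate the reference date within each ID
  mutate(DATE = DATE0 + DAY) %>%            # rebuild DATE as reference date + DAY
  select(-DATE0)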
Output
ID DAY TIME DATE
<int> <int> <int> <date>
1 1 1 1 2021-09-24
2 1 1 2 2021-09-24
3 1 1 3 2021-09-24
4 1 1 4 2021-09-24
5 1 1 5 2021-09-24
6 1 2 1 2021-09-25
7 1 2 2 2021-09-25
8 1 2 3 2021-09-25
9 1 2 4 2021-09-25
10 1 2 5 2021-09-25
11 1 3 1 2021-09-26
12 1 3 2 2021-09-26
13 1 3 3 2021-09-26
14 1 3 4 2021-09-26
15 1 3 5 2021-09-26
16 2 1 1 2022-02-26
17 2 1 2 2022-02-26
18 2 1 3 2022-02-26
19 2 1 4 2022-02-26
20 2 1 5 2022-02-26
21 2 2 1 2022-02-27
22 2 2 2 2022-02-27
23 2 2 3 2022-02-27
24 2 2 4 2022-02-27
25 2 2 5 2022-02-27
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
DAY = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), TIME = c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), DATE = c(NA, NA, NA,
NA, NA, "2021-09-25", "2021-09-25", "2021-09-25", "2021-09-25",
"2021-09-25", NA, NA, NA, NA, NA, "2022-02-26", "2022-02-26",
"2022-02-26", "2022-02-26", "2022-02-26", NA, "2022-02-27",
"2022-02-27", "2022-02-27", "2022-02-27")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25"))
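The answer relies on one assumption: within each ID, DATE minus DAY gives the same reference date on every row where DATE is known (that is, consecutive DAY values are consecutive calendar days). As a sketch only, here is a base-R version of the same "day zero" idea that also checks that assumption and stops if an ID implies conflicting reference dates. The function name `fill_dates` and the small data frames `toy` and `bad` are hypothetical, made up for illustration (`toy` is a trimmed version of the question's data with one row per DAY).

```r
# Hypothetical helper: fill missing DATE values from a per-ID "day zero" date
fill_dates <- function(d) {
  date0 <- as.Date(d$DATE) - d$DAY  # reference date; NA where DATE is NA
  known <- !is.na(date0)

  # Assumption check: all known rows of an ID must imply the same reference date
  same_ref <- tapply(as.numeric(date0[known]), d$ID[known],
                     function(x) length(unique(x)) == 1)
  stopifnot(all(same_ref))

  # For every row, take the reference date of the first known row with the same ID
  ref <- date0[known][match(d$ID, d$ID[known])]
  d$DATE <- ref + d$DAY  # IDs with no known DATE at all stay NA
  d
}

# Hypothetical toy data mirroring the question's layout
toy <- data.frame(
  ID   = c(1, 1, 1, 2, 2),
  DAY  = c(1, 2, 3, 1, 2),
  DATE = c(NA, "2021-09-25", NA, "2022-02-26", NA)
)

fill_dates(toy)
```

With this toy input, ID 1 gets 2021-09-24, 2021-09-25 and 2021-09-26, and ID 2 gets 2022-02-26 and 2022-02-27, matching the desired output in the question. Using `match()` instead of `fill()` avoids caring about row order within an ID, at the cost of relying on the explicit consistency check.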
Regarding "r - How to fill in dates using dplyr?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71850450/