So I have a data frame like this:
ID Date TIME var Data misc
1 1/3/2018 3:30 AM a string1 string1
1 4/23/2019 1:32 PM b string2 string1
1 1/3/2018 4:53 PM c string3 string1
2 1/4/2018 3:32 AM d string4 string2
2 3/3/2018 3:30 PM s string5 string2
2 3/3/2018 3:30 PM e string6 string2
3 4/23/2019 6:24 AM w
3 4/23/2019 1:32 PM s
3 4/24/2019 3:20 PM s
3 4/24/2019 3:20 PM a
There are many columns like Data and misc, and I want to fill them in by joining another df that holds the data for ID = 3.
ID3_data
DATE Time Data misc
4/23/2019 6:24 AM string7 stringA
4/23/2019 1:32 PM string8 stringB
4/24/2019 3:20 PM string9 stringC
4/24/2019 3:20 PM string10 stringC
So how can I join my DF with this ID3_data only on the rows where ID = 3?
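There is also another problem: my only identifiers are Date and TIME, but the same identifier has several different matches. Is there a way to say that the first instance goes to the first row, the second instance to the second row, and so on? In short, the final DF should look like this: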
ID Date TIME var Data misc
1 1/3/2018 3:30 AM a string1 string1
1 4/23/2019 1:32 PM b string2 string1
1 1/3/2018 4:53 PM c string3 string1
2 1/4/2018 3:32 AM d string4 string2
2 3/3/2018 3:30 PM s string5 string2
2 3/3/2018 3:30 PM e string6 string2
3 4/23/2019 6:24 AM w string7 stringA
3 4/23/2019 1:32 PM s string8 stringB
3 4/24/2019 3:20 PM s string9 stringC
3 4/24/2019 3:20 PM a string10 stringC
Again, the priority is joining the selected rows, but it would be great if the duplicate problem could also be handled with dplyr at the same time.
Best answer
We can do a left join and then fill the gaps with coalesce, assuming the missing values are NA:
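library(dplyr) # 1.0.0
left_join(DF, ID3_data %>% mutate(ID = 3),
          by = c('ID', 'Date' = 'DATE', 'TIME' = 'Time')) %>%
  mutate(Data = coalesce(Data.x, Data.y), misc = coalesce(misc.x, misc.y)) %>%
  # drop the suffixed helper columns created by the join
  select(-ends_with(".x"), -ends_with(".y"))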
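To see why repeated keys matter, here is a minimal, self-contained sketch of the join above on a trimmed-down copy of the data (only the Data column is kept; the toy rows are an assumption chosen for illustration). The two DF rows keyed 4/24/2019 3:20 PM each match both ID3_data rows with that key, so the join returns six rows instead of four. Recent dplyr versions (1.1.0 and later) may also warn about a many-to-many relationship here.

```r
library(dplyr)

# Trimmed toy copies of DF and ID3_data (illustrative subset only)
DF <- data.frame(
  ID   = c(1L, 3L, 3L, 3L),
  Date = c("1/3/2018", "4/23/2019", "4/24/2019", "4/24/2019"),
  TIME = c("3:30 AM", "6:24 AM", "3:20 PM", "3:20 PM"),
  Data = c("string1", NA, NA, NA),
  stringsAsFactors = FALSE
)
ID3_data <- data.frame(
  DATE = c("4/23/2019", "4/24/2019", "4/24/2019"),
  Time = c("6:24 AM", "3:20 PM", "3:20 PM"),
  Data = c("string7", "string9", "string10"),
  stringsAsFactors = FALSE
)

# 3L keeps ID an integer so both join keys have the same type
res <- left_join(DF, ID3_data %>% mutate(ID = 3L),
                 by = c("ID", "Date" = "DATE", "TIME" = "Time")) %>%
  mutate(Data = coalesce(Data.x, Data.y)) %>%
  select(-Data.x, -Data.y)

nrow(res)  # each duplicated key pairs with every matching row: 6 rows
```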
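Or, if there are duplicate keys (the join above returns every combination of rows that share an ID/Date/TIME key), an option is to bind the rows of the two datasets, group by ID, Date and TIME, and summarise to keep only the non-NA values (dplyr 1.0.0 allows summarise to return multiple rows per group). Renaming with names(DF) assumes DF has exactly the columns ID, Date, TIME, Data and misc, as in the data below.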
cbind(ID = 3, ID3_data) %>%
  setNames(names(DF)) %>% # base R; set_names() comes from rlang/purrr
  bind_rows(DF) %>%
  group_by(ID, Date, TIME) %>%
  summarise(across(everything(), ~ .[!is.na(.)]))
# A tibble: 10 x 5
# Groups: ID, Date, TIME [8]
# ID Date TIME Data misc
# <dbl> <chr> <chr> <chr> <chr>
# 1 1 1/3/2018 3:30 AM string1 string1
# 2 1 1/3/2018 4:53 PM string3 string1
# 3 1 4/23/2019 1:32 PM string2 string1
# 4 2 1/4/2018 3:32 AM string4 string2
# 5 2 3/3/2018 3:30 PM string5 string2
# 6 2 3/3/2018 3:30 PM string6 string2
# 7 3 4/23/2019 1:32 PM string8 stringB
# 8 3 4/23/2019 6:24 AM string7 stringA
# 9 3 4/24/2019 3:20 PM string9 stringC
#10 3 4/24/2019 3:20 PM string10 stringC
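In dplyr 1.1.0 and later, returning more than one row per group from summarise() is deprecated in favour of reframe(). A sketch of the same bind-and-collapse idea with reframe(), assuming dplyr >= 1.1.0 and using a trimmed copy of the data that keeps only the duplicated keys:

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where reframe() exists

# Trimmed toy copies: only the rows with duplicated Date/TIME keys
DF <- data.frame(
  ID   = c(2L, 2L, 3L, 3L),
  Date = c("3/3/2018", "3/3/2018", "4/24/2019", "4/24/2019"),
  TIME = c("3:30 PM", "3:30 PM", "3:20 PM", "3:20 PM"),
  Data = c("string5", "string6", NA, NA),
  misc = c("string2", "string2", NA, NA),
  stringsAsFactors = FALSE
)
ID3_data <- data.frame(
  DATE = c("4/24/2019", "4/24/2019"),
  Time = c("3:20 PM", "3:20 PM"),
  Data = c("string9", "string10"),
  misc = c("stringC", "stringC"),
  stringsAsFactors = FALSE
)

res <- cbind(ID = 3L, ID3_data) %>%
  setNames(names(DF)) %>%
  bind_rows(DF) %>%
  group_by(ID, Date, TIME) %>%
  # reframe() may return any number of rows per group and always ungroups
  reframe(across(everything(), ~ .x[!is.na(.x)]))

res
```

Because reframe() returns an ungrouped tibble, the "# Groups:" line seen in the summarise() output above no longer appears.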
Data
DF <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
Date = c("1/3/2018", "4/23/2019", "1/3/2018", "1/4/2018",
"3/3/2018", "3/3/2018", "4/23/2019", "4/23/2019", "4/24/2019",
"4/24/2019"), TIME = c("3:30 AM", "1:32 PM", "4:53 PM", "3:32 AM",
"3:30 PM", "3:30 PM", "6:24 AM", "1:32 PM", "3:20 PM", "3:20 PM"
), Data = c("string1", "string2", "string3", "string4", "string5",
"string6", NA, NA, NA, NA), misc = c("string1", "string1",
"string1", "string2", "string2", "string2", NA, NA, NA, NA
)), class = "data.frame", row.names = c(NA, -10L))
ID3_data <- structure(list(DATE = c("4/23/2019", "4/23/2019", "4/24/2019",
"4/24/2019"), Time = c("6:24 AM", "1:32 PM", "3:20 PM", "3:20 PM"
), Data = c("string7", "string8", "string9", "string10"), misc = c("stringA",
"stringB", "stringC", "stringC")), class = "data.frame",
row.names = c(NA,
-4L))
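A different technique, not the one used in the answer above, addresses the "first instance to the first row, second instance to the second" requirement directly: number the repeated (ID, Date, TIME) keys in each table with row_number() and include that counter in the join key. This sketch assumes the duplicated rows appear in the same relative order in both tables; `occ` and `lookup` are hypothetical helper names, and the data is a trimmed copy of the question's. Unlike the bind-rows version, it keeps the var column.

```r
library(dplyr)

# Trimmed toy copies of the question's data, including the var column
DF <- data.frame(
  ID   = c(1L, 3L, 3L, 3L, 3L),
  Date = c("1/3/2018", "4/23/2019", "4/23/2019", "4/24/2019", "4/24/2019"),
  TIME = c("3:30 AM", "6:24 AM", "1:32 PM", "3:20 PM", "3:20 PM"),
  var  = c("a", "w", "s", "s", "a"),
  Data = c("string1", NA, NA, NA, NA),
  misc = c("string1", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)
ID3_data <- data.frame(
  DATE = c("4/23/2019", "4/23/2019", "4/24/2019", "4/24/2019"),
  Time = c("6:24 AM", "1:32 PM", "3:20 PM", "3:20 PM"),
  Data = c("string7", "string8", "string9", "string10"),
  misc = c("stringA", "stringB", "stringC", "stringC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: occ numbers repeated keys 1, 2, ... in row order
lookup <- ID3_data %>%
  rename(Date = DATE, TIME = Time) %>%
  mutate(ID = 3L) %>%
  group_by(ID, Date, TIME) %>%
  mutate(occ = row_number()) %>%
  ungroup()

# The k-th duplicate in DF is joined to the k-th duplicate in lookup
res <- DF %>%
  group_by(ID, Date, TIME) %>%
  mutate(occ = row_number()) %>%
  ungroup() %>%
  left_join(lookup, by = c("ID", "Date", "TIME", "occ")) %>%
  mutate(Data = coalesce(Data.x, Data.y),
         misc = coalesce(misc.x, misc.y)) %>%
  select(ID, Date, TIME, var, Data, misc)

res
```

Since (ID, Date, TIME, occ) is unique in both tables, the join is one-to-one and DF keeps its original row count and order.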
Regarding r - How to join only certain rows using dplyr?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62801420/