I have a data frame "df1" that looks like this:
structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO",
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c(NA, NA, NA,
NA, NA, "51.23"), Long = c(NA, NA, NA, NA, NA, "-109.26")), row.names = c(NA, 6L), class = "data.frame")
The second data frame, "df2", looks like this:
structure(list(MAPS_code = c("SAFR", "SAGA", "ELPU", "ISLA",
"SABO", "SATE", "QUST", "SARI", "PANA", "COPA", "LOAN", "GAPA",
"MELI", "CAGO", "PINO", "GABO", "RIJA", "FILA", "AMIS"), Lat = c(8.765833,
8.751389, 8.768611, 8.835833, 8.801111, 8.808333, 8.815, 8.827778,
8.781667, 8.778333, 8.783333, 8.800833, 8.790278, 8.754444, 8.844444,
8.801389, 8.786667, 8.785278, 8.952222), Long = c(-82.94277,
-82.951111, -82.95, -82.963056, -82.917222, -82.924444, -82.923889,
-82.924167, -82.896944, -82.955833, -82.938611, -82.972222, -82.967222,
-82.925833, -82.97, -82.972222, -82.964722, -82.976111, -82.833333
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"
), Location = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-")), class = "data.frame", row.names = c(NA,
-19L))
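How can I fill each row's "Lat" and "Long" in df1 with the "Lat" and "Long" from df2 whenever that row's "Contact", "Location" and "MAPS_code" match between df1 and df2? (In df1 the location column is named "Location_code".) The resulting df1 should look like this: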
structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO",
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c("8.827778", "8.801111", "8.801111",
"8.801111", "8.835833", "51.23"), Long = c("-82.92417", "-82.91722", "-82.91722", "-82.91722", "-82.96306", "-109.26")), row.names = c(NA, 6L), class = "data.frame")
Note that if Lat and Long already contain data, I don't want it removed or overwritten with NA.
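For illustration, the same rule (fill a coordinate only where it is NA, keep any value that is already there) can be sketched in base R with match() on a combined key. This is a minimal sketch, not the accepted answer's method: it assumes the column names from the dput() output above, uses trimmed, hypothetical subsets of the rows to stay short, and assumes the separator "|" never occurs in the key columns.

```r
# Trimmed copies of the question's data: a subset of rows, values unchanged
df1 <- data.frame(
  MAPS_code     = c("SARI", "SABO", "ISLA", "TROP"),
  Location_code = "LCP-",
  Contact       = "Chase Mendenhall",
  Lat           = c(NA, NA, NA, "51.23"),
  Long          = c(NA, NA, NA, "-109.26"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  MAPS_code = c("ISLA", "SABO", "SARI"),
  Lat       = c(8.835833, 8.801111, 8.827778),
  Long      = c(-82.963056, -82.917222, -82.924167),
  Contact   = "Chase Mendenhall",
  Location  = "LCP-",
  stringsAsFactors = FALSE
)

# One composite key per row; assumes "|" never occurs in the key columns
key1 <- paste(df1$MAPS_code, df1$Location_code, df1$Contact, sep = "|")
key2 <- paste(df2$MAPS_code, df2$Location, df2$Contact, sep = "|")
idx  <- match(key1, key2)  # matching df2 row for each df1 row, NA if none

# Existing coordinates are character in df1; make them numeric first
lat  <- as.numeric(df1$Lat)
long <- as.numeric(df1$Long)

# Fill only the missing values; existing ones (e.g. TROP) are kept
df1$Lat  <- ifelse(is.na(lat),  df2$Lat[idx],  lat)
df1$Long <- ifelse(is.na(long), df2$Long[idx], long)
print(df1)
```

Rows with no match in df2 get NA from df2$Lat[idx], so they stay NA unless they already had a value.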
Best answer
Updated answer
We can use dplyr::coalesce to take, from each pair of Lat and Long columns, the first value that is not NA:
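library(dplyr)

df1 %>%
  # align the key column name with df2
  rename(Location = Location_code) %>%
  # df1's coordinates become Lat.x/Long.x, df2's become Lat.y/Long.y
  left_join(df2, by = c('MAPS_code', 'Contact', 'Location')) %>%
  # df1's Lat/Long are character; coalesce() needs a common type
  mutate(across(ends_with('.x'), as.double)) %>%
  # first non-NA of (Lat.x, Lat.y), so an existing df1 value wins
  mutate(Lat = coalesce(!!!(select(., starts_with('Lat')))),
         Long = coalesce(!!!select(., starts_with('Long')))) %>%
  # drop the helper columns Lat.x, Lat.y, Long.x, Long.y
  select(!contains('.'))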
MAPS_code Location Contact Lat Long
1 SARI LCP- Chase Mendenhall 8.827778 -82.92417
2 SABO LCP- Chase Mendenhall 8.801111 -82.91722
3 SABO LCP- Chase Mendenhall 8.801111 -82.91722
4 SABO LCP- Chase Mendenhall 8.801111 -82.91722
5 ISLA LCP- Chase Mendenhall 8.835833 -82.96306
6 TROP LCP- Chase Mendenhall 51.230000 -109.26000
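A note on the as.double() step, shown as a small sketch with made-up vectors shaped like the joined columns (lat_x and lat_y are hypothetical stand-ins for Lat.x and Lat.y): coalesce() returns, position by position, the first non-NA value among its arguments, and the arguments must share a compatible type. Because Lat and Long are character in the question's df1 but numeric in df2, coalescing them directly should fail with a type error, which is presumably why the answer converts the .x columns first.

```r
library(dplyr)

lat_x <- c(NA, "51.23")    # df1 side: character, like df1$Lat
lat_y <- c(8.827778, NA)   # df2 side: numeric, NA where no match

# Mixing character and double is rejected by coalesce()
mixed <- tryCatch(coalesce(lat_x, lat_y), error = function(e) "type error")

# Converting the character side first gives the intended result
filled <- coalesce(as.double(lat_x), lat_y)
print(filled)  # filled is c(8.827778, 51.23)
```

Argument order matters: because df1's column is passed first, an existing df1 coordinate is never replaced by df2's value.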
Regarding "R: matching multiple data frame columns to fill columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73911322/