R: matching on multiple data frame columns to fill in columns

Tags: r dplyr match left-join

I have a data frame "df1" that looks like this:

structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO", 
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c(NA, NA, NA, 
NA, NA, "51.23"), Long = c(NA, NA, NA, NA, NA, "-109.26")), row.names = c(NA, 6L), class = "data.frame")

A second data frame "df2" looks like this:

structure(list(MAPS_code = c("SAFR", "SAGA", "ELPU", "ISLA", 
"SABO", "SATE", "QUST", "SARI", "PANA", "COPA", "LOAN", "GAPA", 
"MELI", "CAGO", "PINO", "GABO", "RIJA", "FILA", "AMIS"), Lat = c(8.765833, 
8.751389, 8.768611, 8.835833, 8.801111, 8.808333, 8.815, 8.827778, 
8.781667, 8.778333, 8.783333, 8.800833, 8.790278, 8.754444, 8.844444, 
8.801389, 8.786667, 8.785278, 8.952222), Long = c(-82.94277, 
-82.951111, -82.95, -82.963056, -82.917222, -82.924444, -82.923889, 
-82.924167, -82.896944, -82.955833, -82.938611, -82.972222, -82.967222, 
-82.925833, -82.97, -82.972222, -82.964722, -82.976111, -82.833333
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"
), Location = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-")), class = "data.frame", row.names = c(NA, 
-19L))

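How can I fill each row of "Lat" and "Long" in df1 from "Lat" and "Long" in df2 when "Contact", "Location" and "MAPS_code" match between df1 and df2 for that row? Note that the location column is named "Location_code" in df1 and "Location" in df2. The result for df1 should look like this: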

structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO", 
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c("8.827778", "8.801111", "8.801111", 
"8.801111", "8.835833", "51.23"), Long = c("-82.92417", "-82.91722", "-82.91722", "-82.91722", "-82.96306", "-109.26")), row.names = c(NA, 6L), class = "data.frame")

Note that where Lat and Long already contain data, I do not want those values removed or overwritten with NA.

Best answer

Updated answer: we can use dplyr::coalesce to pick, from each pair of Lat and Long columns produced by the join, the first value that is not NA:

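library(dplyr)

df1 %>%
  # Align the key name with df2 so all three keys can be joined by name
  rename(Location = Location_code) %>%
  left_join(df2, by = c('MAPS_code', 'Contact', 'Location')) %>%
  # df1's Lat/Long (now Lat.x/Long.x) are character; df2's are double
  mutate(across(ends_with('.x'), as.double)) %>%
  # Take the first non-NA value of each .x/.y pair, so existing df1 values win
  mutate(Lat = coalesce(!!!(select(., starts_with('Lat')))), 
         Long = coalesce(!!!select(., starts_with('Long')))) %>%
  # Drop the suffixed helper columns (Lat.x, Lat.y, Long.x, Long.y)
  select(!contains('.'))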


  MAPS_code Location          Contact       Lat       Long
1      SARI     LCP- Chase Mendenhall  8.827778  -82.92417
2      SABO     LCP- Chase Mendenhall  8.801111  -82.91722
3      SABO     LCP- Chase Mendenhall  8.801111  -82.91722
4      SABO     LCP- Chase Mendenhall  8.801111  -82.91722
5      ISLA     LCP- Chase Mendenhall  8.835833  -82.96306
6      TROP     LCP- Chase Mendenhall 51.230000 -109.26000
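
For comparison, here is a minimal sketch of the same idea that avoids both the rename and the `!!!` splice. It rests on a few assumptions that are not part of the original answer: the small `df1` and `df2` below are cut-down stand-ins for the question's data, the `_df2` suffix and the name `filled` are arbitrary, and df2 is assumed to have at most one row per key combination (otherwise the join would duplicate rows of df1).

```r
library(dplyr)

# Cut-down stand-ins for the question's data (same column names and types)
df1 <- data.frame(
  MAPS_code     = c("SARI", "SABO", "TROP"),
  Location_code = "LCP-",
  Contact       = "Chase Mendenhall",
  Lat           = c(NA, NA, "51.23"),
  Long          = c(NA, NA, "-109.26"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  MAPS_code = c("SABO", "SARI"),
  Lat       = c(8.801111, 8.827778),
  Long      = c(-82.917222, -82.924167),
  Contact   = "Chase Mendenhall",
  Location  = "LCP-",
  stringsAsFactors = FALSE
)

filled <- df1 %>%
  # Lat/Long are character in df1; convert so they can be combined with doubles
  mutate(across(c(Lat, Long), as.numeric)) %>%
  # Match df1$Location_code against df2$Location without renaming first;
  # df2's Lat/Long arrive as Lat_df2/Long_df2 (suffix chosen for illustration)
  left_join(
    df2,
    by = c("MAPS_code", "Contact", "Location_code" = "Location"),
    suffix = c("", "_df2")
  ) %>%
  # Keep df1's value when present, otherwise fall back to df2's
  mutate(
    Lat  = coalesce(Lat, Lat_df2),
    Long = coalesce(Long, Long_df2)
  ) %>%
  select(-Lat_df2, -Long_df2)

filled
```

Because `coalesce()` returns the first non-missing value in argument order, putting df1's column first is what keeps the existing TROP coordinates from being replaced by NA. Rows with no match in df2 and no value of their own simply stay NA.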

This question about matching multiple data frame columns in R to fill in columns comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/73911322/
