r - Matching data frames based on multiple columns in R

I have two huge datasets that look like this.

One fruit in df2, PEACH, is missing from df1 for some reason. I want to add the missing fruit to df1.

library(tidyverse)

df1 <- tibble(central_fruit=c("ananas","apple"),
              fruits=c("ananas,anan,anannas","apple,appl,appless"),
              counts=c("100,10,1","50,20,2"))
df1
#> # A tibble: 2 × 3
#>   central_fruit fruits              counts  
#>   <chr>         <chr>               <chr>   
#> 1 ananas        ananas,anan,anannas 100,10,1
#> 2 apple         apple,appl,appless  50,20,2

df2 <- tibble(fruit=c("ananas","anan","anannas","apple","appl","appless","PEACH"),
              counts=c(100,10,1,50,20,2,1000))
df2
#> # A tibble: 7 × 2
#>   fruit   counts
#>   <chr>    <dbl>
#> 1 ananas     100
#> 2 anan        10
#> 3 anannas      1
#> 4 apple       50
#> 5 appl        20
#> 6 appless      2
#> 7 PEACH     1000

Created on 2022-03-20 by the reprex package (v2.0.1)

I would like my data to look like this:

df1 
   central_fruit fruits              counts  
   <chr>         <chr>               <chr>   
 1 ananas        ananas,anan,anannas 100,10,1
 2 apple         apple,appl,appless  50,20,2
 3 PEACH         NA                  1000

Any help or advice is greatly appreciated.

Best answer

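Please find below one possible data.table approach. It joins df1 onto df2 by fruit name, fills in the counts from df2 where df1 has none, and then keeps only the central fruits plus any fruit of df2 that is not listed anywhere in the fruits column of df1.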

Reprex

Code

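library(tidyverse) # for tibble(), used to build df1 and df2
library(data.table)

# Convert both tibbles to data.tables by reference
setDT(df1)
setDT(df2)

# Right join on the fruit name (df2's counts arrive as i.counts),
# fill counts from df2 where df1 has none, then keep the central fruits
# plus any df2 fruit not listed in df1$fruits.
# Note: df1$fruits is spelled out; df1$fruit only worked via partial matching.
df1[df2, on = .(central_fruit = fruit)
    ][, `:=` (counts = fcoalesce(counts, as.character(i.counts)), i.counts = NULL)
      ][central_fruit %chin% c(df1$central_fruit, setdiff(df2$fruit, unlist(strsplit(df1$fruits, ","))))][]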

Output

#>    central_fruit              fruits   counts
#> 1:        ananas ananas,anan,anannas 100,10,1
#> 2:         apple  apple,appl,appless  50,20,2
#> 3:         PEACH                <NA>     1000
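
The one-line chain can be hard to follow, so here is a sketch of the same logic split into separate steps. The intermediate names (joined, known, missing_fruits, result) are only illustrative and do not appear in the original answer, and the data is rebuilt directly as data.tables so the block runs on its own.

```r
library(data.table)

# Same example data as in the question, built directly as data.tables
df1 <- data.table(
  central_fruit = c("ananas", "apple"),
  fruits = c("ananas,anan,anannas", "apple,appl,appless"),
  counts = c("100,10,1", "50,20,2")
)
df2 <- data.table(
  fruit = c("ananas", "anan", "anannas", "apple", "appl", "appless", "PEACH"),
  counts = c(100, 10, 1, 50, 20, 2, 1000)
)

# Step 1: right join, one row per row of df2.
# Both tables have a counts column, so df2's version is named i.counts.
joined <- df1[df2, on = .(central_fruit = fruit)]

# Step 2: keep df1's counts where present, otherwise fall back to df2's
# value, then drop the helper column (same `:=` form as the original chain)
joined[, `:=`(counts = fcoalesce(counts, as.character(i.counts)),
              i.counts = NULL)]

# Step 3: fruits of df2 that are not listed anywhere in df1$fruits
known <- unlist(strsplit(df1$fruits, ",", fixed = TRUE))
missing_fruits <- setdiff(df2$fruit, known)

# Step 4: keep the original central fruits plus the missing ones
result <- joined[central_fruit %chin% c(df1$central_fruit, missing_fruits)]
print(result)
```

Because the join creates a new data.table, df1 itself is left unchanged; assign the result back to df1 if that is what you need.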

Created on 2022-03-20 by the reprex package (v2.0.1)
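
If you would rather stay in the tidyverse, something along the following lines might also work. This is a swapped-in dplyr alternative, not the method from the answer above; the names known, missing_rows and result are hypothetical, and it assumes the names inside fruits are separated by plain commas with no surrounding spaces.

```r
library(dplyr)

# Same example data as in the question
df1 <- tibble(
  central_fruit = c("ananas", "apple"),
  fruits = c("ananas,anan,anannas", "apple,appl,appless"),
  counts = c("100,10,1", "50,20,2")
)
df2 <- tibble(
  fruit = c("ananas", "anan", "anannas", "apple", "appl", "appless", "PEACH"),
  counts = c(100, 10, 1, 50, 20, 2, 1000)
)

# Every fruit name already covered by df1 (assumes comma-separated, no spaces)
known <- unlist(strsplit(df1$fruits, ",", fixed = TRUE))

# Rows of df2 whose fruit is not covered, reshaped to df1's columns;
# counts is converted to character so it matches df1's column type
missing_rows <- df2 %>%
  filter(!fruit %in% known) %>%
  transmute(
    central_fruit = fruit,
    fruits = NA_character_,
    counts = as.character(counts)
  )

# Append the missing fruits to the original table
result <- bind_rows(df1, missing_rows)
print(result)
```

Compared with the join, this only touches the rows that are actually missing, which may be easier to reason about on large tables, though I have not benchmarked either version.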

Regarding r - Matching data frames based on multiple columns in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71546056/
