r - Using a function to complete incompletely linked documents (document trees) via a lookup list

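I have stored linked documents (document trees) in a list (list).

Some document trees contain incomplete items (marked with search = 1). A single tree may contain more than one incomplete item marked with search = 1.

I want to use a lookup list (list_lookup) that contains document trees to extend/complete these incomplete trees. For each incomplete item there is always exactly one matching tree in list_lookup. The level of the matching tree should be adjusted so that it fits the document tree in list.

Example data and desired output: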

library(tidyverse)

# initial df1, aaa is incomplete (it is in fact linked to other documents, but this information is stored in the lookup list)
 
df1 <- tibble(id_from=c(NA_character_,"111","222","333","444","444","bbb"),
             id_to=c("111","222","333","444","aaa","bbb","ccc"),
             level=c(0,1,2,3,4,4,5),
             search=c(0,0,0,0,1,0,0))
df1
#> # A tibble: 7 × 4
#>   id_from id_to level search
#>   <chr>   <chr> <dbl>  <dbl>
#> 1 <NA>    111       0      0
#> 2 111     222       1      0
#> 3 222     333       2      0
#> 4 333     444       3      0
#> 5 444     aaa       4      1
#> 6 444     bbb       4      0
#> 7 bbb     ccc       5      0


# lookup dfs, df2 contains the further document links of aaa
df2 <- tibble(id_from=c(NA,"aaa","x","x"),
             id_to=c("aaa","x","x1","x2"),
             level=c(0,1,2,2))

df3 <- tibble(id_from=c(NA,"thank"),
                     id_to=c("thank","you"),
                     level=c(0,1))

#list with df
list <- list(df1)

#list with lookups
list_lookup <- list(df2,df3)

list_lookup
#> [[1]]
#> # A tibble: 4 × 3
#>   id_from id_to level
#>   <chr>   <chr> <dbl>
#> 1 <NA>    aaa       0
#> 2 aaa     x         1
#> 3 x       x1        2
#> 4 x       x2        2
#> 
#> [[2]]
#> # A tibble: 2 × 3
#>   id_from id_to level
#>   <chr>   <chr> <dbl>
#> 1 <NA>    thank     0
#> 2 thank   you       1

# what I need: an updated list of dfs that includes the information from the lookup list

df1_wanted <- tibble(id_from=c(NA_character_,"111","222","333","444","444","aaa","bbb","x","x"),
                     id_to=c("111","222","333","444","aaa","bbb","x","ccc","x1","x2"),
                     level=c(0,1,2,3,4,4,5,5,6,6))

list(df1_wanted)
#> [[1]]
#> # A tibble: 10 × 3
#>    id_from id_to level
#>    <chr>   <chr> <dbl>
#>  1 <NA>    111       0
#>  2 111     222       1
#>  3 222     333       2
#>  4 333     444       3
#>  5 444     aaa       4
#>  6 444     bbb       4
#>  7 aaa     x         5  <- added from df2, level adjusted
#>  8 bbb     ccc       5  
#>  9 x       x1        6  <- added from df2, level adjusted
#> 10 x       x2        6  <- added from df2, level adjusted

Created on 2023-04-01 with reprex v2.0.2

My approach:

I considered using purrr::map to map a function over each item of list; however, I am not sure what this function should look like.

Best answer

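In this solution:

I first define a recursive function, get_tree(), which takes a single id and a lookup table and returns the full tree below that id from the table.
Then I define a function, complete_trees(), which takes a data frame and a list of lookup tables. For each id_to where search == 1, it calls get_tree() on every lookup table, adds the item's level to the returned levels, and binds the result to the initial data frame.
Finally, I iterate complete_trees() over each element of list.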

library(dplyr)
library(purrr)

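# Recursively collect every row of `lookup` that descends from `id`.
get_tree <- function(id, lookup) {
  # direct children of `id`
  branch <- filter(lookup, id_from == id)
  # no children: stop recursing (bind_rows() ignores NULL)
  if (nrow(branch) == 0) return(NULL)
  bind_rows(
    branch, 
    map(branch$id_to, \(x) get_tree(x, lookup))
  )
}

# Extend one data frame with the subtrees of all rows where search == 1.
complete_trees <- function(data, lookups) {
  branches <- pmap(
    # one call per incomplete row; unused columns are absorbed by `...`
    filter(data, search == 1),
    \(id_to, level, ...) {
      # look the id up in every lookup table and stack the matches
      bind_rows(map(
          lookups, 
          \(lookup) get_tree(id_to, lookup)
        )) %>%
        # shift the subtree levels by the level of the incomplete row
        mutate(level = level + .env$level)
    }
  )
  bind_rows(data, branches) %>%
    select(!search) %>%
    arrange(level, id_from)
}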

map(list, \(x) complete_trees(x, lookups = list_lookup))

Result:

[[1]]
# A tibble: 10 × 3
   id_from id_to level
   <chr>   <chr> <dbl>
 1 <NA>    111       0
 2 111     222       1
 3 222     333       2
 4 333     444       3
 5 444     aaa       4
 6 444     bbb       4
 7 aaa     x         5
 8 bbb     ccc       5
 9 x       x1        6
10 x       x2        6
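
The level adjustment may be easier to follow on a single incomplete item. The following is a minimal sketch, not part of the original answer: it repeats get_tree() and the lookup tables from above so that it runs on its own, and it hard-codes the level of "aaa" in df1 in a variable named parent_level, which is introduced here purely for illustration.

```r
library(dplyr)
library(purrr)

# Same recursive helper as in the answer: collect all rows below `id`.
get_tree <- function(id, lookup) {
  branch <- filter(lookup, id_from == id)
  if (nrow(branch) == 0) return(NULL)
  bind_rows(branch, map(branch$id_to, \(x) get_tree(x, lookup)))
}

# Lookup trees from the question.
df2 <- tibble(id_from = c(NA, "aaa", "x", "x"),
              id_to   = c("aaa", "x", "x1", "x2"),
              level   = c(0, 1, 2, 2))
df3 <- tibble(id_from = c(NA, "thank"),
              id_to   = c("thank", "you"),
              level   = c(0, 1))

# "aaa" sits at level 4 in df1 (hard-coded here for illustration only).
parent_level <- 4

# Rows below "aaa" in df2 carry levels 1, 2, 2 relative to the root of df2.
subtree <- get_tree("aaa", df2)

# Adding the parent's level places them correctly inside df1.
adjusted <- mutate(subtree, level = level + parent_level)
adjusted$level
#> [1] 5 6 6

# A lookup table that does not contain "aaa" contributes nothing.
is.null(get_tree("aaa", df3))
#> [1] TRUE
```

Because non-matching lookup tables return NULL, complete_trees() can simply try every table in list_lookup and bind whatever comes back.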

Regarding r - Using a function to complete incompletely linked documents (document trees) via a lookup list, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/75908153/
