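I have saved linked documents (document trees) in a list (list). Some document trees contain incomplete items (marked with search=1), and a single tree may contain several incomplete subtrees marked with search=1.
I want to use a lookup list of document trees (list_lookup) to extend/complete these incomplete trees. For each incomplete item there is always exactly one matching tree in list_lookup. The level values of the matching document tree should be adjusted to fit the document tree in list.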
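The level adjustment amounts to adding the level of the incomplete node to every level of the matching lookup tree and dropping the lookup's root row, which already exists in the incomplete tree. The base-R sketch below handles a single node. It assumes, as in the example data, that the matching lookup table is rooted at the incomplete node (its level-0 row has id_to equal to that node). node_id and node_level are illustrative names only.

```r
# Lookup tree rooted at "aaa" (same values as df2 in the example data)
lookup <- data.frame(id_from = c(NA, "aaa", "x", "x"),
                     id_to   = c("aaa", "x", "x1", "x2"),
                     level   = c(0, 1, 2, 2),
                     stringsAsFactors = FALSE)

node_id    <- "aaa"  # id_to of the row marked search = 1 (illustrative)
node_level <- 4      # level of that row in the incomplete tree (illustrative)

# Assumption made explicit: the lookup tree must start at the incomplete node
stopifnot(lookup$id_to[lookup$level == 0] == node_id)

# Drop the root row (already present in the incomplete tree) and shift levels
added <- lookup[lookup$level > 0, ]
added$level <- added$level + node_level
added
```

This yields aaa→x at level 5 and x→x1, x→x2 at level 6, which are the three rows marked as added in the desired output.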
Sample data and desired output:
library(tidyverse)
# initial df1, aaa is incomplete (it is in fact linked to other documents, but this information is stored in the lookup list)
df1 <- tibble(id_from=c(NA_character_,"111","222","333","444","444","bbb"),
id_to=c("111","222","333","444","aaa","bbb","ccc"),
level=c(0,1,2,3,4,4,5),
search=c(0,0,0,0,1,0,0))
df1
#> # A tibble: 7 × 4
#> id_from id_to level search
#> <chr> <chr> <dbl> <dbl>
#> 1 <NA> 111 0 0
#> 2 111 222 1 0
#> 3 222 333 2 0
#> 4 333 444 3 0
#> 5 444 aaa 4 1
#> 6 444 bbb 4 0
#> 7 bbb ccc 5 0
# lookup dfs, df2 contains the further document links of aaa
df2 <- tibble(id_from=c(NA,"aaa","x","x"),
id_to=c("aaa","x","x1","x2"),
level=c(0,1,2,2))
df3 <- tibble(id_from=c(NA,"thank"),
id_to=c("thank","you"),
level=c(0,1))
#list with df
list <- list(df1)
#list with lookups
list_lookup <- list(df2,df3)
list_lookup
#> [[1]]
#> # A tibble: 4 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> aaa 0
#> 2 aaa x 1
#> 3 x x1 2
#> 4 x x2 2
#>
#> [[2]]
#> # A tibble: 2 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> thank 0
#> 2 thank you 1
# what I need: an updated list of dfs in which the information from the lookup list is included
df1_wanted <- tibble(id_from=c(NA_character_,"111","222","333","444","444","aaa","bbb","x","x"),
id_to=c("111","222","333","444","aaa","bbb","x","ccc","x1","x2"),
level=c(0,1,2,3,4,4,5,5,6,6))
list(df1_wanted)
#> [[1]]
#> # A tibble: 10 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> 111 0
#> 2 111 222 1
#> 3 222 333 2
#> 4 333 444 3
#> 5 444 aaa 4
#> 6 444 bbb 4
#> 7 aaa x 5 <- added from df2, level adjusted
#> 8 bbb ccc 5
#> 9 x x1 6 <- added from df2, level adjusted
#> 10 x x2 6 <- added from df2, level adjusted
Created on 2023-04-01 with reprex v2.0.2
My approach:
I have considered using purrr::map to map a function over each item of list, but I am not sure what that function should look like.
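One possible shape for such a function is sketched below in base R, with lapply standing in for purrr::map. It swaps in a simpler, non-recursive technique and assumes that each lookup table is rooted at its incomplete document (the level-0 row's id_to). Under that assumption the whole table below the root can be appended with shifted levels. complete_one() is a hypothetical name, not part of any package.

```r
# Hypothetical helper: complete one document tree from a list of lookup trees.
# Assumes each lookup table has one level-0 row naming its root document and
# that this root is the document flagged with search == 1.
complete_one <- function(df, lookups) {
  roots <- vapply(lookups, function(lk) lk$id_to[lk$level == 0][1], character(1))
  todo  <- df[df$search == 1, ]
  added <- lapply(seq_len(nrow(todo)), function(i) {
    j <- match(todo$id_to[i], roots)
    if (is.na(j)) return(NULL)                # no lookup tree for this document
    lk <- lookups[[j]]
    lk <- lk[lk$level > 0, ]                  # root row already exists in df
    lk$level <- lk$level + todo$level[i]      # shift to the flagged row's depth
    lk
  })
  out <- do.call(rbind, c(list(df[, c("id_from", "id_to", "level")]), added))
  out <- out[order(out$level, out$id_from), ]
  rownames(out) <- NULL
  out
}

df1 <- data.frame(id_from = c(NA, "111", "222", "333", "444", "444", "bbb"),
                  id_to   = c("111", "222", "333", "444", "aaa", "bbb", "ccc"),
                  level   = c(0, 1, 2, 3, 4, 4, 5),
                  search  = c(0, 0, 0, 0, 1, 0, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id_from = c(NA, "aaa", "x", "x"), id_to = c("aaa", "x", "x1", "x2"),
                  level = c(0, 1, 2, 2), stringsAsFactors = FALSE)
df3 <- data.frame(id_from = c(NA, "thank"), id_to = c("thank", "you"),
                  level = c(0, 1), stringsAsFactors = FALSE)

# Map the function over every tree, as purrr::map(list, ...) would
result <- lapply(list(df1), complete_one, lookups = list(df2, df3))
result[[1]]
```

Matching on the lookup root avoids recursion. However, it only works when the incomplete document is the root of its lookup tree. A recursive search by id_from does not need that assumption.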
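Best answer
In this solution:
- I first define a recursive function get_tree() that takes a single id and a lookup table and returns the complete tree for that id from the table.
- Then I define a function complete_trees() that takes a data frame and a list of lookup tables. For every id_to where search == 1, it iterates get_tree() over every lookup table, shifts level by the level of the flagged row, and binds the results to the initial data frame.
- Finally, I iterate complete_trees() over each element of list.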
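The recursion that get_tree() performs can also be illustrated without dplyr. In the base-R sketch below, get_tree_base() is a hypothetical name and not the answer's code. It collects the rows whose id_from equals the given id and then recurses into each child. Base subsetting keeps NA rows, unlike dplyr::filter(), so the NA id_from of the root row is excluded explicitly. Like the original, it assumes the lookup contains no cycles; otherwise it would not terminate.

```r
# Hypothetical base-R analogue of get_tree(): return every link below `id`.
get_tree_base <- function(id, lookup) {
  # Direct links leaving `id`; NA id_from values are treated as non-matching
  branch <- lookup[!is.na(lookup$id_from) & lookup$id_from == id, ]
  if (nrow(branch) == 0) return(NULL)   # leaf: nothing below this document
  # The direct links come first, followed by each child's subtree in turn
  children <- lapply(branch$id_to, get_tree_base, lookup = lookup)
  do.call(rbind, c(list(branch), children))
}

lookup <- data.frame(id_from = c(NA, "aaa", "x", "x"),
                     id_to   = c("aaa", "x", "x1", "x2"),
                     level   = c(0, 1, 2, 2),
                     stringsAsFactors = FALSE)
tree <- get_tree_base("aaa", lookup)
tree
```

The levels returned here are still the lookup's own levels. The caller then adds the level of the flagged row, which is what mutate(level = level + .env$level) does in complete_trees(). Because the search is by id_from, this also works when the incomplete document is not the root of the lookup tree.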
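library(dplyr)
library(purrr)
# note: the \(x) lambda syntax requires R >= 4.1
get_tree <- function(id, lookup) {
  # direct links leaving `id`
  branch <- filter(lookup, id_from == id)
  # leaf: nothing below this document
  if (nrow(branch) == 0) return()
  # the direct links, followed by the subtree of every child (recursively)
  bind_rows(
    branch,
    map(branch$id_to, \(x) get_tree(x, lookup))
  )
}
complete_trees <- function(data, lookups) {
  # one data frame of new links per row flagged with search == 1
  branches <- pmap(
    filter(data, search == 1),
    \(id_to, level, ...) {
      bind_rows(map(
        lookups,
        \(lookup) get_tree(id_to, lookup)
      )) %>%
        # `level` is the column, `.env$level` is the flagged row's level
        mutate(level = level + .env$level)
    }
  )
  bind_rows(data, branches) %>%
    select(!search) %>%
    arrange(level, id_from)
}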
map(list, \(x) complete_trees(x, lookups = list_lookup))
Result:
[[1]]
# A tibble: 10 × 3
id_from id_to level
<chr> <chr> <dbl>
1 <NA> 111 0
2 111 222 1
3 222 333 2
4 333 444 3
5 444 aaa 4
6 444 bbb 4
7 aaa x 5
8 bbb ccc 5
9 x x1 6
10 x x2 6
Regarding "r - Using a function to complete incompletely linked documents (document trees) with a lookup list", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/75908153/