我正在尝试了解有关网络图搜索算法的更多信息。为了说明这一点,我创建了以下示例。
第 1 步:假设有 100 个国家/地区 (country_1....country_100) 彼此随机连接
set.seed(123)
library(igraph)
countries <- paste0("country_", 1:100)
g <- make_empty_graph(100)
num_edges <- 200
edge_list <- sample(countries, size = num_edges * 2, replace = TRUE)
edge_list <- matrix(edge_list, ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edge_list, directed = FALSE)
V(g)$name <- countries
plot(g, vertex.label.cex = 0.7, vertex.label.color = "black", vertex.label.dist = 2)
第 2 步:现在,假设有 20 个人 (person_A...person_T) 居住在这些国家/地区(每个国家/地区最多只能有一个人 - 其中 80 个国家/地区将为空):
edge_list <- as_edgelist(g)
df <- as.data.frame(edge_list)
colnames(df) <- c("from", "to")
people <- paste0("person_", LETTERS[1:20])
assignment <- sample(countries, size = length(people), replace = FALSE)
names(assignment) <- people
df2 <- data.frame(country = countries)
df2$person <- ifelse(df2$country %in% assignment, names(assignment)[match(df2$country, assignment)], "empty")
第 3 步:作为可选步骤,我们可以可视化结果:
library(visNetwork)
df2$color <- ifelse(df2$person == "empty", "grey", "red")
df2$label <- ifelse(df2$person == "empty", df2$country, paste0(df2$person, "\n", df2$country))
nodes <- data.frame(id = df2$country, label = df2$label, color = df2$color)
edges <- df
visNetwork(nodes, edges) %>%
visInteraction(navigationButtons = TRUE)
我的问题:假设我们采用“person_A” - 我想找出距离“person_A”最近的人以及这个人住在哪个国家/地区。我有兴趣学习如何手动编写此问题的 BFS 算法 - 例如:以 person_A 并搜索以度为 1 的半径内的每个人 - 如果没有找到人,现在搜索以度为 2 的半径内的每个人...继续,直到你找到找到第一个人。
我知道如何使用该算法的预构建实现:
adj_matrix <- as_adjacency_matrix(g)
diag(adj_matrix) <- 0
shortest_paths <- shortest.paths(g)
df2_filtered <- subset(df2, person != "empty")
selected_countries <- intersect(rownames(shortest_paths), df2_filtered$country)
filtered_paths <- shortest_paths[selected_countries, selected_countries]
item = df2[df2$person %in% c("person_A"), ]
#answer (exclude distance = 0, i.e. the same country itself)
sort(filtered_paths[rownames(filtered_paths) == item$country, ])[2]
有人可以告诉我如何编写一个搜索算法(手动)来完成此任务,该任务从一个人的名字开始 - 然后在每一步打印搜索结果,直到找到一个人?
最佳答案
背景/概述
广度优先搜索的总体思路是从图中的一个点(我们称之为a
)开始,然后将所有邻居添加到未探索点的列表中(其中我们称之为前沿
)。然后,您逐一浏览列表,对于每个点,将该点的未见过的邻居添加到队列的末尾,依此类推,直到找到您正在寻找的点(这可能是一个特定点b
,或者只是满足您设置的特定条件的任何点),或者您已经没有地方了(因为您已经探索过所有地方)。
整理数据
首先,我要稍微清理一下数据。我创建了一个数据框,其中只有存在的人(没有空):
people_df <- df2 %>%
filter(person != "empty") %>%
select(person, country)
然后,我将国家/地区连接数据框 df
转换为数据框 neighbours_df
,它为我提供了每个点的邻居。数据帧的结构方式,它有(例如)一行:
from to country_31 country_79
但没有相反的情况,即
from to country_79 country_31
因此,我切换列,将反转的列添加到第一个列的末尾,并将每个点的邻居分组到一个列表中,以使其更整洁:
reversed_df <- df %>%
mutate(new_from = to, to = from, from = new_from) %>%
select(from, to)
neighbours_df <- df %>%
bind_rows(reversed_df) %>%
filter(from != to) %>%
group_by(from) %>%
summarise(to = list(to))
# from to
# 1 country_1 c("country_8", "country_92")
实现
breadth_first_search <- function(person, neighbours_df, people_df) {
# get the country of the person
starting_country <- people_df$country[people_df$person == person]
# initialise the visited list
visited <- c()
# initialise the frontier with the starting point
frontier <- list()
frontier[[starting_country]] <- 0
# initialise distance from start variable (so we can print how far we are from the start)
distance_from_start <- 0
# while the frontier is not empty
while (length(frontier) > 0) {
# get the first element of the frontier
current <- names(frontier)[1]
# get the distance from current to start
distance_from_start <- frontier[[1]]
print(paste0("Current point: ", current, " (", distance_from_start, " steps from start)"))
# remove the first element of the frontier (the one we just selected)
frontier <- frontier[-1]
# if the current point is in the country column of our `people_df` (i.e. someone lives there), and it's not the starting country, return the person who lives there
if (current %in% people_df$country && current != starting_country) {
found_person <- people_df$person[people_df$country == current]
print(paste0("Found person: ", found_person, ", " , distance_from_start , " steps from start, in country ", current))
return(found_person)
}
# add the current point to the visited list
visited <- c(visited, current)
# get the neighbors of the current point
neighbs <- neighbours_df$to[neighbours_df$from == current][[1]]
# add the neighbors to the frontier if they haven't been visited already
neighbs <- neighbs[!neighbs %in% visited]
frontier <- c(frontier, setNames(rep(distance_from_start + 1, length(neighbs)), neighbs))
}
# if we search through all the points, and didn't find anyone, return NA
return(NA)
}
print(breadth_first_search("person_R", neighbours_df, people_df))
# [1] "person_J"
引用资料/更多信息
我从this article中大量抄袭了由 Red Blob Games 编写(this other piece 的姊妹篇,它很好地介绍了广度优先搜索(以及其他类似的图搜索算法,如 A* 的工作原理)。如果您想更全面地了解它们的工作原理,请参阅BFS 的缺点和优点,和/或想要尝试一些交互式的东西,我建议您检查一下!
关于r - 手工编写 BFS 搜索算法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/76503515/