r - 手工编写 BFS 搜索算法

标签 r algorithm igraph breadth-first-search

我正在尝试了解有关网络图搜索算法的更多信息。为了说明这一点,我创建了以下示例。

第 1 步:假设有 100 个国家/地区 (country_1....country_100) 彼此随机连接

set.seed(123)


library(igraph)


countries <- paste0("country_", 1:100)


g <- make_empty_graph(100)


num_edges <- 200
edge_list <- sample(countries, size = num_edges * 2, replace = TRUE)
edge_list <- matrix(edge_list, ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edge_list, directed = FALSE)


V(g)$name <- countries

plot(g, vertex.label.cex = 0.7, vertex.label.color = "black", vertex.label.dist = 2)

第 2 步:现在,假设有 20 个人 (person_A...person_T) 居住在这些国家/地区(每个国家/地区最多只能有一个人 - 其中 80 个国家/地区将为空):

edge_list <- as_edgelist(g)

df <- as.data.frame(edge_list)

colnames(df) <- c("from", "to")

people <- paste0("person_", LETTERS[1:20])

assignment <- sample(countries, size = length(people), replace = FALSE)
names(assignment) <- people

df2 <- data.frame(country = countries)


df2$person <- ifelse(df2$country %in% assignment, names(assignment)[match(df2$country, assignment)], "empty")

第 3 步:作为可选步骤,我们可以可视化结果:

library(visNetwork)

df2$color <- ifelse(df2$person == "empty", "grey", "red")

df2$label <- ifelse(df2$person == "empty", df2$country, paste0(df2$person, "\n", df2$country))

nodes <- data.frame(id = df2$country, label = df2$label, color = df2$color)

edges <- df

visNetwork(nodes, edges) %>% 
    visInteraction(navigationButtons = TRUE)

enter image description here

我的问题:假设我们采用“person_A” - 我想找出距离“person_A”最近的人以及这个人住在哪个国家/地区。我有兴趣学习如何手动编写此问题的 BFS 算法 - 例如:以 person_A 并搜索以度为 1 的半径内的每个人 - 如果没有找到人,现在搜索以度为 2 的半径内的每个人...继续,直到你找到找到第一个人。

我知道如何使用该算法的预构建实现:

adj_matrix <- as_adjacency_matrix(g)

diag(adj_matrix) <- 0

shortest_paths <- shortest.paths(g)

df2_filtered <- subset(df2, person != "empty")
selected_countries <- intersect(rownames(shortest_paths), df2_filtered$country)

filtered_paths <- shortest_paths[selected_countries, selected_countries]

item = df2[df2$person %in% c("person_A"), ]

#answer (exclude distance = 0, i.e. the same country itself)
sort(filtered_paths[rownames(filtered_paths) == item$country, ])[2]

有人可以告诉我如何编写一个搜索算法(手动)来完成此任务,该任务从一个人的名字开始 - 然后在每一步打印搜索结果,直到找到一个人?

最佳答案

背景/概述

广度优先搜索的总体思路是从图中的一个点(我们称之为a)开始,然后将所有邻居添加到未探索点的列表中(其中我们称之为前沿)。然后,您逐一浏览列表,对于每个点,将该点的未见过的邻居添加到队列的末尾,依此类推,直到找到您正在寻找的点(这可能是一个特定点b,或者只是满足您设置的特定条件的任何点),或者您已经没有地方了(因为您已经探索过所有地方)。

整理数据

首先,我要稍微清理一下数据。我创建了一个数据框,其中只有存在的人(没有空):

people_df <- df2 %>% 
    filter(person != "empty") %>%
    select(person, country)

然后,我将国家/地区连接数据框 df 转换为数据框 neighbours_df,它为我提供了每个点的邻居。数据帧的结构方式,它有(例如)一行:

      from         to
country_31 country_79

但没有相反的情况,即

      from         to
country_79 country_31

因此,我切换列,将反转的列添加到第一个列的末尾,并将每个点的邻居分组到一个列表中,以使其更整洁:

reversed_df <- df %>% 
    mutate(new_from = to, to = from, from = new_from) %>% 
    select(from, to)

neighbours_df <- df %>% 
    bind_rows(reversed_df) %>% 
    filter(from != to) %>%
    group_by(from) %>%
    summarise(to = list(to))
    
#          from                            to
# 1    country_1   c("country_8", "country_92")

实现

breadth_first_search <- function(person, neighbours_df, people_df) {
  # get the country of the person
    starting_country <- people_df$country[people_df$person == person]

  # initialise the visited list 
    visited <- c()

  # initialise the frontier with the starting point
    frontier <- list()
    frontier[[starting_country]] <- 0

  # initialise distance from start variable (so we can print how far we are from the start)
    distance_from_start <- 0

  # while the frontier is not empty
    while (length(frontier) > 0) {

      # get the first element of the frontier
        current <- names(frontier)[1]

      # get the distance from current to start
        distance_from_start <- frontier[[1]]
      
      print(paste0("Current point: ", current, " (", distance_from_start, " steps from start)"))

      # remove the first element of the frontier (the one we just selected)
        frontier <- frontier[-1]

        # if the current point is in the country column of our `people_df` (i.e. someone lives there), and it's not the starting country, return the person who lives there
        if (current %in% people_df$country && current != starting_country) {
          found_person <- people_df$person[people_df$country == current]
          print(paste0("Found person: ", found_person, ", " , distance_from_start , " steps from start, in country ", current))
            return(found_person)
        }

        # add the current point to the visited list
        visited <- c(visited, current)

        # get the neighbors of the current point
        neighbs <- neighbours_df$to[neighbours_df$from == current][[1]]

        # add the neighbors to the frontier if they haven't been visited already
        neighbs <- neighbs[!neighbs %in% visited]
        frontier <- c(frontier, setNames(rep(distance_from_start + 1, length(neighbs)), neighbs))
        }
    # if we search through all the points, and didn't find anyone, return NA
    return(NA)
}

print(breadth_first_search("person_R", neighbours_df, people_df))
# [1] "person_J"

引用资料/更多信息

我从this article中大量抄袭了由 Red Blob Games 编写(this other piece 的姊妹篇,它很好地介绍了广度优先搜索(以及其他类似的图搜索算法,如 A* 的工作原理)。如果您想更全面地了解它们的工作原理,请参阅BFS 的缺点和优点,和/或想要尝试一些交互式的东西,我建议您检查一下!

关于r - 手工编写 BFS 搜索算法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/76503515/

相关文章:

r - 使用 2 个变量将数据帧从宽转换为长

r - 将值分配给特定的data.table列和行

c++ - 提高棋盘模式的时间复杂度

c - 强平衡树 - 改进

r - 二分图匹配以匹配两个集合

r - 我可以在 ggplot2 中获得箱线图凹口吗?

r - 在R中将十六进制转换为十进制

algorithm - 编程竞赛办法

r - 如何计算不同群体的共同值(value)?

python - 将非常大的 RDF 三元组加载到 iGraph -> 快速顶点查找?