python - Fast Pathfinder associative network algorithm (PFNET) in Python

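I have been trying to implement the "Fast Pathfinder" network pruning algorithm from https://doi.org/10.1016/j.ipm.2007.09.005 in Python/networkX, and have finally stumbled on something that returns results that look more or less right.

I'm not competent enough to test whether the results are consistently (or ever, for that matter) correct, though. I have doubts about directed graphs in particular, and I'm unsure whether the original was even intended to work on directed graphs. I have not found a Python implementation of any Pathfinder network algorithm, but if there is an existing alternative, I would also be interested in using it to compare results. I know there is an implementation in R (https://rdrr.io/cran/comato/src/R/pathfinder.r), from which I also took some inspiration.

Based on my best (read: poor) understanding, the algorithm described in the paper uses a distance matrix of shortest paths generated by the Floyd-Warshall algorithm and compares those distances to the weighted adjacency matrix, picking only the matches as links. The intuition for the expected result in the undirected case is the union of all edges in all possible minimum spanning trees.

This is what I am attempting to emulate with the function below: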
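A minimal sketch of that idea for the undirected case, assuming r = infinity and q = n - 1, where the weight of a path is its largest edge rather than the sum of its edges. The function name `pfnet_minimax` and the convention that an off-diagonal 0 means "no edge" are assumptions made for illustration; they do not come from the paper or from the R package.

```python
import math

def pfnet_minimax(weights):
    """Prune a symmetric distance matrix, assuming r = inf and q = n - 1.

    weights: square list of lists; an off-diagonal 0 is treated as "no edge".
    Returns the kept edges as (i, j, weight) tuples with i < j.
    """
    n = len(weights)
    # Direct distances: 0 on the diagonal, inf where there is no edge.
    direct = [[0 if i == j else (weights[i][j] if weights[i][j] else math.inf)
               for j in range(n)] for i in range(n)]
    dist = [row[:] for row in direct]

    # Floyd-Warshall, but a path costs the maximum of its two halves
    # (the r = inf Minkowski metric) instead of their sum.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(dist[i][k], dist[k][j])
                if via_k < dist[i][j]:
                    dist[i][j] = via_k

    # Keep an edge only if no indirect path beats the direct link.
    return [(i, j, direct[i][j])
            for i in range(n) for j in range(i + 1, n)
            if direct[i][j] != math.inf and direct[i][j] == dist[i][j]]

m = [[0, 1, 4, 2, 2],
     [1, 0, 2, 3, 0],
     [4, 2, 0, 3, 1],
     [2, 3, 3, 0, 3],
     [2, 0, 1, 3, 0]]
print(pfnet_minimax(m))
# [(0, 1, 1), (0, 3, 2), (0, 4, 2), (1, 2, 2), (2, 4, 1)]
```

With these weights read as distances, the five kept edges are exactly the edges that appear in at least one minimum spanning tree: the tree can be completed with either (0, 4) or (1, 2), so both survive.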

def minimal_pathfinder(G, r = float("inf")):
    """ 
    Args:
    -----
    G [networkX graph]:
        Graph to filter links from.
    r [float]:
        "r" parameter as in the paper.

    Returns:
    -----
    PFNET [networkX graph]:
        Graph containing only the PFNET links.
    """
    
    import networkx as nx
    from collections import defaultdict
    
    H = G.copy()
    
    # Initialize adjacency matrix W
    W = defaultdict(lambda: defaultdict(lambda: float("inf")))
    
    # Set diagonal to 0
    for u in H.nodes():
        W[u][u] = 0 
    
    # Get weights and set W values
    for i, j, d in H.edges(data=True):
        W[i][j] = d['weight'] # Add weights to W
        
    # Get shortest path distance matrix D
    dist = nx.floyd_warshall_predecessor_and_distance(H, weight='weight')[1]
    
    # Iterate over all triples to get values for D
    for k in H.nodes():
        for i in H.nodes():
            for j in H.nodes():
                if r == float("inf"): # adapted from the R-comato version which does a similar check
                # Discard non-shortest paths
                    dist[i][j] = min(dist[i][j], (dist[i][k] + dist[k][j]))
                else:
                    dist[i][j] = min(dist[i][j], (((dist[i][k]) ** r) + ((dist[k][j]) ** r )) ** (1/r))
                
    # Check for type; set placeholder for either case
    if not H.is_directed():
        PFNET = nx.Graph()
        PFNET.add_nodes_from(H.nodes(data=True))
    else:
        PFNET = nx.DiGraph()
        PFNET.add_nodes_from(H.nodes(data=True))
        
    # Add links D_ij only if == W_ij
    for i in H.nodes():
        for j in H.nodes():
            if dist[i][j] == W[i][j]: # If shortest path distance equals distance in adjacency
                if dist[i][j] == float("inf"): # Skip infinite path lengths
                    pass
                elif i == j: # Skip the diagonal
                    pass
                else: # Add link to PFNET
                    weight = dist[i][j]
                    PFNET.add_edge(i, j, weight=weight)
                    
    return PFNET

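I've tested this with a bunch of real networks (both directed and undirected) and randomly generated networks, ranging in both cases from about 20 nodes up to around 300 nodes, with at most a few thousand edges (e.g. complete graphs, connected caveman graphs). In all cases it returns something, but I have little confidence that the results are correct. As I have found no other implementation, I'm not sure how to verify that it works consistently (I don't really use any other languages at all).

I am fairly sure there is still something wrong with this, but I'm unsure what it might be.

Simple use case: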
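One way to cross-check results without a second implementation might be a brute-force oracle built on the spanning-tree intuition: in the undirected case with r = infinity and q = n - 1, an edge should survive only if its endpoints cannot be connected using strictly lighter edges. The sketch below assumes exactly that setting; the function name `edges_without_lighter_path` and the edge-list input format are hypothetical choices for illustration.

```python
from collections import deque

def edges_without_lighter_path(n, edges):
    """Brute-force oracle for undirected distance data (r = inf, q = n - 1).

    n: number of nodes, labelled 0 .. n-1.
    edges: list of (u, v, weight) tuples.
    An edge is kept when u and v are NOT connected by edges strictly lighter
    than it, which is the condition for belonging to some minimum spanning tree.
    """
    kept = []
    for u, v, w in edges:
        # Adjacency lists restricted to edges strictly lighter than this one.
        lighter = {node: [] for node in range(n)}
        for a, b, x in edges:
            if x < w:
                lighter[a].append(b)
                lighter[b].append(a)
        # Breadth-first search from u over the lighter edges only.
        seen = {u}
        queue = deque([u])
        while queue:
            node = queue.popleft()
            for nxt in lighter[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if v not in seen:
            kept.append((u, v, w))
    return kept

edges = [(0, 1, 1), (0, 2, 4), (0, 3, 2), (0, 4, 2), (1, 2, 2),
         (1, 3, 3), (2, 3, 3), (2, 4, 1), (3, 4, 3)]
print(edges_without_lighter_path(5, edges))
# [(0, 1, 1), (0, 3, 2), (0, 4, 2), (1, 2, 2), (2, 4, 1)]
```

This runs one search per edge, so it is only practical on small graphs, but it shares no logic with a Floyd-Warshall implementation. Agreement on small random graphs would be reasonable evidence, though not proof.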

import networkx as nx
import numpy as np

G = nx.complete_graph(50) # Generate a complete graph

# Add random weights
for (u,v,w) in G.edges(data=True):
    w['weight'] = np.random.randint(1,20)
    
PFNET = minimal_pathfinder(G)

# nx.info() was removed in NetworkX 3.0; printing the graph gives the same summary
print(G)
print(PFNET)

Output:

Graph with 50 nodes and 1225 edges
Graph with 50 nodes and 236 edges

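I would like to know two things:

1. Any idea what might be wrong with the implementation? Should I be confident in the results?

2. Any idea how to convert this to work with similarity data instead of distances?

For the second, I thought of normalizing the weights to the 0-1 range and converting all distances to similarities as 1 - distance. But I'm not sure whether this is theoretically valid, and would appreciate a second opinion.

Edit: I may have found a solution to Q2 in the original paper: change float("inf") to float("-inf") and replace min with max in the first loop. From the authors' footnote: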

Actually, using similarities or distances has no influence at all in our proposal. In case of using similarities, we would only need to replace MIN by MAX, ’>’ by ’<’, and use r = -inf to mimic the MIN function instead of the MAX function in the Fast Pathfinder algorithm.
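A minimal sketch of what that swap could look like for undirected similarity data, again assuming q = n - 1: a path is only as strong as its weakest link (MIN along the path), and the best path between two nodes is the strongest one (MAX across paths). The name `pfnet_similarity` and the assumption that similarities are non-negative with 0 meaning "no link" are illustrative, not taken from the paper.

```python
def pfnet_similarity(sim):
    """Prune a symmetric similarity matrix (higher = more related).

    Assumes non-negative similarities, where 0 means "no link".
    Returns the kept edges as (i, j, similarity) tuples with i < j.
    """
    n = len(sim)
    best = [row[:] for row in sim]

    # Floyd-Warshall with MIN along a path and MAX across alternative paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                via_k = min(best[i][k], best[k][j])
                if via_k > best[i][j]:
                    best[i][j] = via_k

    # Keep a link only if no indirect path is stronger than the direct one.
    return [(i, j, sim[i][j])
            for i in range(n) for j in range(i + 1, n)
            if sim[i][j] > 0 and sim[i][j] == best[i][j]]

m = [[0, 1, 4, 2, 2],
     [1, 0, 2, 3, 0],
     [4, 2, 0, 3, 1],
     [2, 3, 3, 0, 3],
     [2, 0, 1, 3, 0]]
print(pfnet_similarity(m))
# [(0, 2, 4), (1, 3, 3), (2, 3, 3), (3, 4, 3)]
```

Read as similarities, the same matrix keeps four edges, which here form the single maximum spanning tree.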

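Any input is much appreciated, thanks!

Edit per the comments, using the "Example from a data file" section (adding the erroneous example from here):

Adjacency in the starting graph: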

matrix([[0, 1, 4, 2, 2],
        [1, 0, 2, 3, 0],
        [4, 2, 0, 3, 1],
        [2, 3, 3, 0, 3],
        [2, 0, 1, 3, 0]], dtype=int32)

And after pruning with the function, having first converted it to a networkX undirected graph:

matrix([[0, 1, 0, 2, 2],
        [1, 0, 2, 3, 0],
        [0, 2, 0, 3, 1],
        [2, 3, 3, 0, 3],
        [2, 0, 1, 3, 0]], dtype=int32)

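It seems to have dropped only the highest-weight edge and kept all the others. Since the expected result in the linked example is given as an edge list, here is the edge list of the result I obtained as well: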

source  target  weight
1       2       1
1       4       2
1       5       2
2       3       2
2       4       3 
3       4       3
3       5       1
4       5       3

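Best answer

Below is a possible implementation of Fast-Pathfinder in Python using the networkx library. Notes:

The implementation follows the paper.
It is inspired by the C implementation on GitHub.
Only the maximum variant is implemented, where the input matrix is a similarity matrix rather than a distance matrix (the edges with the highest values are kept).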

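import networkx as nx
import numpy as np

def fast_pfnet(G, q, r):
    # G: undirected graph with integer node labels 0..s-1 and 'weight' similarities
    # q: maximum number of hops a path may use; r: Minkowski exponent (0 means infinity)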
    s = G.number_of_nodes()
    weights_init = np.zeros((s,s))
    weights = np.zeros((s,s))
    hops = np.zeros((s,s))
    pfnet = np.zeros((s,s))

    for i, j, d in G.edges(data=True):
        weights_init[i,j] = d['weight']
        weights_init[j,i] = d['weight']

    for i in range(s):
        for j in range(s):
            weights[i,j] = -weights_init[i,j]
            if i==j:
                hops[i,j] = 0
            else:
                hops[i,j] = 1

    def update_weight_maximum(i, j, k, wik, wkj, weights, hops, p):
        if p<=q:
            if r==0:
                # r == 0 is used as a sentinel meaning r = infinity
                dist = max(wik, wkj)
            else:
                dist = (wik**r + wkj**r) ** (1/r)

            if dist < weights[i,j]:
                weights[i,j] = dist
                weights[j,i] = dist
                hops[i,j] = p
                hops[j,i] = p
                
    def is_equal(a, b):
        return abs(a-b)<0.00001

    for k in range(s):
        for i in range(s):
            if i!=k:
                beg = i+1
                for j in range(beg, s):
                    if j!=k:
                        update_weight_maximum(i, j, k, weights_init[i,k], weights_init[k,j], weights, hops, 2)
                        update_weight_maximum(i, j, k, weights[i,k], weights[k,j], weights, hops, hops[i,k]+hops[k,j])

    for i in range(s):
        for j in range(s): # Possible optimisation: in case of symmetrical matrices, we do not need to go from 0 to s but from i+1 to s
            if not is_equal(weights_init[i,j], 0):
                if is_equal(weights[i,j], -weights_init[i,j]):
                    pfnet[i,j] = weights_init[i,j]
                else:
                    pfnet[i,j] = 0

    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it
    return nx.from_numpy_array(pfnet)

Usage:

# A plain ndarray is used here because np.matrix is discouraged by NumPy
m = np.array([[0, 1, 4, 2, 2],
        [1, 0, 2, 3, 0],
        [4, 2, 0, 3, 1],
        [2, 3, 3, 0, 3],
        [2, 0, 1, 3, 0]], dtype=np.int32)

G = nx.from_numpy_array(m)

# Fast-PFNET parameters set to emulate MST-PFNET
# This variant is OK for other parameters (q, r) but for the ones below
# it is faster to implement the MST-PFNET variant instead.
q = G.number_of_nodes()-1
r = 0

P = fast_pfnet(G, q, r)

list(P.edges(data=True))

This should return:

[(0, 2, {'weight': 4.0}),
 (1, 3, {'weight': 3.0}),
 (2, 3, {'weight': 3.0}),
 (3, 4, {'weight': 3.0})]

This is similar to what is shown on the website (see the example in the "After Pathfinder application" section).
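For reference, the r parameter controls how the weights of two path segments are combined. The sketch below is a stand-alone illustration for positive weights, not part of the implementation above: the helper name `minkowski_path_weight` is hypothetical, and it takes a true infinity rather than the r == 0 sentinel used in `fast_pfnet`.

```python
import math

def minkowski_path_weight(a, b, r):
    """Combine two positive segment weights with the Minkowski r-metric."""
    if math.isinf(r):
        # Limits of the formula below: max for r = +inf, min for r = -inf.
        return max(a, b) if r > 0 else min(a, b)
    return (a ** r + b ** r) ** (1 / r)

print(minkowski_path_weight(3, 4, 1))          # 7.0 (plain sum)
print(minkowski_path_weight(3, 4, 2))          # 5.0 (Euclidean)
print(minkowski_path_weight(3, 4, math.inf))   # 4 (largest segment)
print(minkowski_path_weight(3, 4, -math.inf))  # 3 (smallest segment)
```

With r = 1 this is the ordinary shortest-path sum, so a Floyd-Warshall routine that adds segment weights corresponds to r = 1, not to r = infinity.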

Regarding python - Fast Pathfinder associative network algorithm (PFNET) in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70262806/
