python - 为什么计算优先附着是昂贵的？

我有一个有 1034 个顶点和 53498 条边的无向图。我正在计算顶点的优先附着索引。两个顶点之间的优先附着相似度定义为第一个顶点的度乘以第二个顶点的度。我注意到我的计算速度非常慢。计算上述图表需要 2.7 分钟。我不确定是我的算法慢还是其他问题。如果有人能稍微研究一下我的代码，我将非常感激。

编辑:我刚刚意识到 S 是一个 1034_by_1034 矩阵。看看嵌套的 for 循环，它似乎是一个 O(n^2) 算法!我想这就是它慢的原因。你不同意吗？

def pa(graph):
    """
        Calculates Preferential Attachment index.

        Returns S the similarity matrix.
    """
    A = gts.adjacency(graph)
    S = np.zeros(A.shape)
    for i in xrange(S.shape[0]):
        for j in xrange(S.shape[0]):
            i_degree = graph.vertex(i).out_degree()
            j_degree = graph.vertex(j).out_degree()
            factor = i_degree * j_degree
            S[i,j] = factor
    return S

最佳答案

据我所知，这些是我可以建议的加速:

零加速:i_ Degree 不依赖于 j，因此将其向上移动一级

def pa(graph):
    A = gts.adjacency(graph)
    S = np.zeros(A.shape)
    for i in xrange(S.shape[0]):
        i_degree = graph.vertex(i).out_degree() # easy to see that this can be put here instead, since it does not depend on j 
        for j in xrange(S.shape[0]):
            j_degree = graph.vertex(j).out_degree()
            factor = i_degree * j_degree
            S[i,j] = factor
    return S

第一次加速:仅调用 out_ Degree() N 次，而不是 2N^2。

def pa2(graph):
    A = gts.adjacency(graph)
    i_degree = numpy.zeros(A.shape[0])
    for i in xrange(A.shape[0]):
        i_degree[i] = graph.vertex(i).out_degree()
    S = np.zeros(A.shape)
    for i in xrange(S.shape[0]):
        for j in xrange(S.shape[0]):
            S[i,j] = i_degree[i]*i_degree[j]
    return S

第二次加速:numpy 代替 python for 循环

def pa3(graph):
    A = gts.adjacency(graph)
    i_degree = numpy.zeros(A.shape[0])
    for i in xrange(A.shape[0]):
        i_degree[i] = graph.vertex(i).out_degree()
    S = i_degree[:,None]*i_degree[None,:]
    return S

这滥用了问题的对称性。

注意:[None,:] 的作用与使用 [numpy.newaxis,:] 相同。如果你想保留你的代码，你也可以在 out_ Degree() 方法上使用 @memoize 装饰器，但最好只在递归的东西上使用它，这不是其中之一。

关于python - 为什么计算优先附着是昂贵的？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/22553665/

python - 为什么计算优先附着是昂贵的？

上一篇：algorithm - 在范围系列中找到最频繁重复的整数

下一篇：algorithm - K-Means++ 的流程是什么，我想知道 K-Mean 和 K-Means++ 之间的确切区别？