I have this code, where distance is a lower-triangular matrix defined as follows:

import numpy as np
import scipy.spatial.distance

distance = np.tril(scipy.spatial.distance.cdist(points, points))

def make_them_touch(distance):
    """
    Return every distance at which two points touch each other, mapped to
    the (row, col) index pairs at that distance. See the example below.
    """
    thresholds = np.unique(distance)[1:]  # skip the leading 0; this step takes very little time
    result = dict()
    for t in thresholds:
        # one full scan of the matrix per distinct distance
        x, y = np.where(distance == t)
        result[t] = [i for i in zip(x, y)]
    return result

My problem is that np.where is very slow for large matrices (e.g. 2000*100).
How can I speed up this code, either by improving np.where or by changing the algorithm?

EDIT: As MaxU pointed out, the best optimization here is to not generate the square matrix at all and to use an iterator instead.

Example:

points = np.array([[0, 0, 0, 0],
                   [1, 1, 1, 1],
                   [3, 3, 3, 3],
                   [6, 6, 6, 6]])

In [106]: distance = np.tril(scipy.spatial.distance.cdist(points, points))

In [107]: distance
Out[107]: 
array([[ 0.,  0.,  0.,  0.],
       [ 2.,  0.,  0.,  0.],
       [ 6.,  4.,  0.,  0.],
       [12., 10.,  6.,  0.]])

In [108]: make_them_touch(distance)
Out[108]: 
{2.0: [(1, 0)],
 4.0: [(2, 1)],
 6.0: [(2, 0), (3, 2)],
 10.0: [(3, 1)],
 12.0: [(3, 0)]}
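One possible way to keep the lower-triangular input and still avoid the slowdown: the loop above rescans the whole matrix with np.where once per distinct distance, so the cost grows with (number of unique distances) × n². The sketch below is a swapped-in technique, not the one from the answer, and the name make_them_touch_vectorized is hypothetical. It reads the strict lower triangle once with np.tril_indices, orders it with a stable argsort, and splits it at the run boundaries reported by np.unique(..., return_index=True). Like the original, it drops zero distances.

```python
import numpy as np
from scipy.spatial.distance import cdist


def make_them_touch_vectorized(distance):
    """Hypothetical helper: group (row, col) pairs of a lower-triangular
    distance matrix by distance value, without one np.where per value."""
    n = distance.shape[0]
    # Indices of the strict lower triangle, in row-major order.
    rows, cols = np.tril_indices(n, k=-1)
    d = distance[rows, cols]
    # Drop zero distances, mirroring np.unique(distance)[1:] in the original.
    keep = d > 0
    rows, cols, d = rows[keep], cols[keep], d[keep]
    # A stable sort keeps pairs with equal distance in row-major order.
    order = np.argsort(d, kind="stable")
    rows, cols, d = rows[order], cols[order], d[order]
    # On a sorted array, return_index gives the start of each run of equal values.
    values, starts = np.unique(d, return_index=True)
    bounds = np.append(starts, len(d))
    result = {}
    for k, value in enumerate(values):
        lo, hi = bounds[k], bounds[k + 1]
        result[float(value)] = list(zip(rows[lo:hi].tolist(), cols[lo:hi].tolist()))
    return result


points = np.array([[0, 0, 0, 0],
                   [1, 1, 1, 1],
                   [3, 3, 3, 3],
                   [6, 6, 6, 6]])
distance = np.tril(cdist(points, points))
print(make_them_touch_vectorized(distance))
# {2.0: [(1, 0)], 4.0: [(2, 1)], 6.0: [(2, 0), (3, 2)], 10.0: [(3, 1)], 12.0: [(3, 0)]}
```

The remaining Python loop runs once per distinct distance but only touches that value's own pairs, so the dominant cost becomes a single O(n² log n) sort rather than one full-matrix scan per value.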

Best answer

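UPDATE 1: here is a snippet for the upper-triangle distance matrix (which shouldn't matter, because a distance matrix is always symmetric):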

from itertools import combinations
from scipy.spatial.distance import pdist

res = {tup[0]: tup[1] for tup in zip(pdist(points), list(combinations(range(len(points)), 2)))}

Result:

In [111]: res
Out[111]:
{1.4142135623730951: (0, 1),
 4.69041575982343: (0, 2),
 4.898979485566356: (1, 2)}
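The comprehension relies on pdist returning the condensed upper triangle in the same order as combinations(range(n), 2). A small check on the question's four points, assuming SciPy's squareform for the round trip, makes that ordering explicit and also shows the weakness that UPDATE 2 addresses: a dict keyed by distance keeps only the last pair seen for a repeated value.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[0, 0, 0, 0],
                   [1, 1, 1, 1],
                   [3, 3, 3, 3],
                   [6, 6, 6, 6]])
n = len(points)

condensed = pdist(points)                # distances for pairs i < j
pairs = list(combinations(range(n), 2))  # (0, 1), (0, 2), ..., (2, 3)

# Reading the upper triangle of the square matrix row by row gives back
# exactly the condensed vector, in the same order as `pairs`.
iu = np.triu_indices(n, k=1)
assert np.array_equal(squareform(condensed)[iu], condensed)
assert [(int(i), int(j)) for i, j in zip(*iu)] == pairs

# 6.0 occurs for both (0, 2) and (2, 3), but a plain dict keeps only the last one.
res = dict(zip(condensed, pairs))
print(res[6.0])  # (2, 3)
print(len(res))  # 5 keys for 6 pairs
```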
UPDATE 2: this version supports duplicate distances:

In [164]: import pandas as pd

First we build a pandas.Series:

In [165]: s = pd.Series(list(combinations(range(len(points)), 2)), index=pdist(points))

In [166]: s
Out[166]:
2.0     (0, 1)
6.0     (0, 2)
12.0    (0, 3)
4.0     (1, 2)
10.0    (1, 3)
6.0     (2, 3)
dtype: object

Now we can group by the index and build a list of coordinates for each distance:

In [167]: s.groupby(s.index).apply(list)
Out[167]:
2.0     [(0, 1)]
4.0     [(1, 2)]
6.0     [(0, 2), (2, 3)]
10.0    [(1, 3)]
12.0    [(0, 3)]
dtype: object
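If pandas is not otherwise needed, the same grouping can be sketched with collections.defaultdict over the pdist/combinations pairs. The helper name group_pairs_by_distance is hypothetical; it returns a plain dict sorted by distance, with pairs as (i, j) with i < j, like the Series above. It still loops in Python over every pair, so for large inputs the pandas or NumPy routes are likely to be faster.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist


def group_pairs_by_distance(points):
    """Hypothetical helper: map each distinct pairwise distance to its (i, j) pairs, i < j."""
    groups = defaultdict(list)
    # pdist yields distances in the same order as combinations(range(n), 2).
    for d, pair in zip(pdist(points), combinations(range(len(points)), 2)):
        groups[float(d)].append(pair)
    # Sort by distance so the result reads like the groupby output.
    return dict(sorted(groups.items()))


points = np.array([[0, 0, 0, 0],
                   [1, 1, 1, 1],
                   [3, 3, 3, 3],
                   [6, 6, 6, 6]])
print(group_pairs_by_distance(points))
# {2.0: [(0, 1)], 4.0: [(1, 2)], 6.0: [(0, 2), (2, 3)], 10.0: [(1, 3)], 12.0: [(0, 3)]}
```

With pandas, Series.to_dict() turns the groupby result into a similar dict.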

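PS: the main idea here is that you shouldn't build a square distance matrix if you are going to flatten it afterwards and eliminate duplicates.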
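To put numbers on that: for n points, cdist builds n² values while pdist stores only the n(n − 1)/2 distinct pairs. A minimal check, assuming hypothetical random data of 2000 points in 100 dimensions (float64 throughout):

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

# Hypothetical data: 2000 random points in 100 dimensions.
rng = np.random.default_rng(0)
points = rng.random((2000, 100))

square = cdist(points, points)  # n x n float64 matrix, mostly redundant
condensed = pdist(points)       # n * (n - 1) / 2 float64 values, each pair once

print(square.nbytes)     # 32000000  (2000 * 2000 * 8 bytes)
print(condensed.nbytes)  # 15992000  (1999000 * 8 bytes)
```

Besides halving memory, the condensed vector contains no diagonal zeros and no mirrored duplicates, so there is nothing to strip out before grouping.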

A similar question, "python - How to make np.where more efficient with triangular matrices?", can be found on Stack Overflow: https://stackoverflow.com/questions/50907049/
