我有一个算法可以计算每个点 p
(我的元组中表示的坐标值)到我的元组列表中的每个其他元组的距离。
点列表:
centerList = [(54, 2991),
(1717, 2989),
(1683, 2991),
(1604, 2991),
(114, 2991),
(919,222),
(930,233)]
距离函数:
def getDistance(p0, p1):
return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2)
用于计算点 p
到元组列表中每个其他点的距离的算法。
i = 0
distanceList = []
for p in range(len(centerList)):
while i < len(centerList):
print centerList[p], centerList[i], getDistance(centerList[p], centerList[i])
distance = getDistance(centerList[p], centerList[i])
if distance < 20:
distanceList.append(distance)
i += 1
i = p + 2
我当前的算法以一种非冗余的方式递增,但在其当前状态下对于实际应用来说过于粗暴。我的问题在于我的实际 centerList
包含数千个元组。
可以做些什么来提高这个元组比较算法的时间效率?
最佳答案
你可以结合sklearn.metrics.euclidean_distances
使用 numpy
的 bool 索引进行计算:
>>> from sklearn.metrics import euclidean_distances
>>> import numpy as np
>>> centerList = np.array(centerList)
>>> distances = euclidean_distances(centerList)
>>> distances[distances<20]
array([ 0. , 0. , 0. , 0. ,
0. , 0. , 15.55634919, 15.55634919, 0. ])
距离的计算使用了在高速 C 中开发的 numpy 矩阵代数。文档还强调了底层数学技术的效率:
For efficiency reasons, the euclidean distance between a pair of row vector x and y is computed as:
dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))
This formulation has two advantages over other ways of computing distances. First, it is computationally efficient when dealing with sparse data. Second, if one argument varies but the other remains unchanged, then dot(x, x) and/or dot(y, y) can be pre-computed.
关于python - 改进元组距离计算算法以提高时间效率,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44681117/