After some online research (1, 2, numpy, scipy, scikit, math), I found several ways to calculate Euclidean distance in Python:

# 1
numpy.linalg.norm(vector1 - vector2)

# 2
distance.euclidean(vector1, vector2)

# 3
sklearn.metrics.pairwise.euclidean_distances

# 4
math.sqrt((xa - xb)**2 + (ya - yb)**2 + (za - zb)**2)  # ** is exponentiation; ^ is bitwise XOR in Python

# 5
dist = [(a - b)**2 for a, b in zip(vector1, vector2)]
dist = math.sqrt(sum(dist))

# 6
math.hypot(x, y)
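
As written, method 6 only covers two coordinates. On Python 3.8 or later, math.hypot accepts any number of coordinates, so it can be applied to the coordinate-wise differences of longer vectors. A minimal sketch, assuming Python 3.8+ and using made-up sample vectors:

```python
import math

# Hypothetical sample vectors, chosen so the result is easy to check by hand.
vector1 = (1, 2, 3)
vector2 = (4, 6, 3)

# Coordinate-wise differences: (-3, -4, 0)
diffs = [a - b for a, b in zip(vector1, vector2)]

# Requires Python 3.8+, where math.hypot takes n coordinates.
dist = math.hypot(*diffs)
print(dist)  # a 3-4-5 triangle, so this should be 5
```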


The context I'm interested in is calculating the Euclidean distance between pairs of number tuples, e.g. the distance between (52, 106, 35, 12) and (33, 153, 75, 10).
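
For reference, the arithmetic for this particular pair can be spelled out step by step. The sketch below only restates the zip-based approach (method 5) with the intermediate values shown; the variable names are illustrative.

```python
import math

a = (52, 106, 35, 12)
b = (33, 153, 75, 10)

diffs = [p - q for p, q in zip(a, b)]  # [19, -47, -40, 2]
squares = [d ** 2 for d in diffs]      # [361, 2209, 1600, 4]
total = sum(squares)                   # 4174

dist = math.sqrt(total)
print(dist)  # roughly 64.6
```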




Fastest to slowest: Method 5 (zip, math.sqrt) > Method 1 (numpy.linalg.norm) > Method 2 (scipy.spatial.distance) > Method 3 (sklearn.metrics.pairwise.euclidean_distances)


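Of the methods tested, Method 5 is, quite surprisingly, the fastest. Method 1, which uses numpy and is, as we would expect, heavily optimized in C, is the second fastest. For vectors this short, per-call overhead likely outweighs the benefit of numpy's vectorized arithmetic.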

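As for scipy.spatial.distance, if you go to the function definition you will see that it actually uses numpy.linalg.norm, except that it validates the two input vectors before the actual numpy.linalg.norm call. That is why it is somewhat slower than numpy.linalg.norm.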
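
To illustrate the "validate first, then call norm" pattern described above, here is a minimal sketch. It is not scipy's source code: validated_euclidean is a hypothetical stand-in, and the specific checks are assumptions chosen only to show where the extra per-call work comes from.

```python
import numpy as np

def validated_euclidean(u, v):
    """Hypothetical stand-in: check the inputs, then defer to np.linalg.norm."""
    # Converting and checking the inputs is the extra work on every call.
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.ndim != 1 or v.ndim != 1:
        raise ValueError("both inputs must be 1-D vectors")
    if u.shape != v.shape:
        raise ValueError("both inputs must have the same length")
    # The distance itself is still a single norm call.
    return np.linalg.norm(u - v)

result = validated_euclidean((52, 106, 35, 12), (33, 153, 75, 10))
print(result)
```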

Finally, for sklearn, according to the documentation:

This formulation has two advantages over other ways of computing distances. First, it is computationally efficient when dealing with sparse data. Second, if one argument varies but the other remains unchanged, then dot(x, x) and/or dot(y, y) can be pre-computed. However, this is not the most precise way of doing this computation, and the distance matrix returned by this function may not be exactly symmetric as required
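
The formulation the documentation refers to expands the squared distance as dot(x, x) - 2 * dot(x, y) + dot(y, y). A rough pure-Python sketch of that idea follows; it is not sklearn's implementation, and the helper names and the optional xx / yy parameters are made up for illustration.

```python
import math

def dot(a, b):
    """Plain-Python dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def expanded_euclidean(x, y, xx=None, yy=None):
    """Distance via sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)).

    xx and yy are optional precomputed squared norms (hypothetical parameters).
    """
    xx = dot(x, x) if xx is None else xx
    yy = dot(y, y) if yy is None else yy
    squared = xx - 2 * dot(x, y) + yy
    # With floats, rounding can push this slightly below zero, so clamp it.
    return math.sqrt(max(squared, 0.0))

x = (52, 106, 35, 12)
y = (33, 153, 75, 10)

yy = dot(y, y)  # computed once; reusable for as long as y stays fixed
dist = expanded_euclidean(x, y, yy=yy)
print(dist)  # roughly 64.6
```

The subtraction of large, nearly equal terms is where the precision caveat in the quote comes from, and the extra bookkeeping is one plausible reason this route is slow for a single pair of short vectors.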




import numpy as np
from scipy.spatial import distance
from sklearn.metrics.pairwise import euclidean_distances
import math

# 1
def eudis1(v1, v2):
    return np.linalg.norm(v1-v2)

# 2
def eudis2(v1, v2):
    return distance.euclidean(v1, v2)

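# 3
def eudis3(v1, v2):
    # euclidean_distances expects 2-D input (one row per sample); recent
    # scikit-learn releases reject 1-D arrays, so reshape each vector to one row.
    return euclidean_distances(v1.reshape(1, -1), v2.reshape(1, -1))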

# 5
def eudis5(v1, v2):
    dist = [(a - b)**2 for a, b in zip(v1, v2)]
    dist = math.sqrt(sum(dist))
    return dist

dis1 = (52, 106, 35, 12)
dis2 = (33, 153, 75, 10)
v1, v2 = np.array(dis1), np.array(dis2)

import timeit

def wrapper(func, *args, **kwargs):
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

wrappered1 = wrapper(eudis1, v1, v2)
wrappered2 = wrapper(eudis2, v1, v2)
wrappered3 = wrapper(eudis3, v1, v2)
wrappered5 = wrapper(eudis5, v1, v2)
t1 = timeit.repeat(wrappered1, repeat=3, number=100000)
t2 = timeit.repeat(wrappered2, repeat=3, number=100000)
t3 = timeit.repeat(wrappered3, repeat=3, number=100000)
t5 = timeit.repeat(wrappered5, repeat=3, number=100000)

print('t1: ', sum(t1)/len(t1))
print('t2: ', sum(t2)/len(t2))
print('t3: ', sum(t3)/len(t3))
print('t5: ', sum(t5)/len(t5))


t1:  0.654838958307
t2:  1.53977598714
t3:  6.7898791732
t5:  0.422228400305
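
To put these numbers in perspective, they can be converted to time per call and to a ratio against the fastest method. The sketch below only post-processes the timings printed above (copied in by hand); on this run the slower methods come out at very roughly 1.5 to 16 times the cost of method 5.

```python
# Mean seconds per 100000 calls, copied from the output above.
timings = {
    "t1": 0.654838958307,
    "t2": 1.53977598714,
    "t3": 6.7898791732,
    "t5": 0.422228400305,
}
calls = 100000

fastest = min(timings.values())
ratios = {name: seconds / fastest for name, seconds in timings.items()}

# Print from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    per_call_us = seconds / calls * 1e6
    print(f"{name}: {per_call_us:.2f} us/call, {ratios[name]:.2f}x the fastest")
```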


In [8]: eudis1(v1,v2)
Out[8]: 64.60650122085238

In [9]: eudis2(v1,v2)
Out[9]: 64.60650122085238

In [10]: eudis3(v1,v2)
Out[10]: array([[ 64.60650122]])

In [11]: eudis5(v1,v2)
Out[11]: 64.60650122085238

