This question is a follow-up to my answer in "Efficient way to compute the Vandermonde matrix".
Here is the setup:
x = np.arange(5000) # an integer array
N = 4
Now, I compute the Vandermonde matrix in two different ways:
m1 = (x ** np.arange(N)[:, None]).T
and,
m2 = x[:, None] ** np.arange(N)
Sanity check:
np.array_equal(m1, m2)
True
Both methods produce the same result, but their performance differs:
%timeit m1 = (x ** np.arange(N)[:, None]).T
42.7 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit m2 = x[:, None] ** np.arange(N)
150 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So the first method, despite requiring a transposition at the end, is still about 3x faster than the second.
The only difference is that in the first case the smaller array is broadcast, whereas in the second case it is the larger array.
So, with a fairly decent understanding of how numpy works, I can guess that the answer would involve the cache. The first method is a lot more cache friendly than the second. However, I'd like an official word from someone with more experience than me.
What could be the reason for this stark contrast in timings?
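For reference, the comparison above can be collected into one self-contained script (absolute timings will vary by machine):

```python
import numpy as np
from timeit import timeit

x = np.arange(5000)
N = 4

# Method 1: broadcast the small arange over x, transpose at the end.
m1 = (x ** np.arange(N)[:, None]).T
# Method 2: broadcast x over the small arange.
m2 = x[:, None] ** np.arange(N)
assert np.array_equal(m1, m2)  # same result either way

t1 = timeit(lambda: (x ** np.arange(N)[:, None]).T, number=1000)
t2 = timeit(lambda: x[:, None] ** np.arange(N), number=1000)
print(f"method 1: {t1 * 1e3:.1f} us/loop, method 2: {t2 * 1e3:.1f} us/loop")
```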
Best answer
I also tried looking at broadcast_arrays:
In [121]: X,Y = np.broadcast_arrays(np.arange(4)[:,None], np.arange(1000))
In [122]: timeit X+Y
10.1 µs ± 31.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [123]: X,Y = np.broadcast_arrays(np.arange(1000)[:,None], np.arange(4))
In [124]: timeit X+Y
26.1 µs ± 30.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [125]: X.shape, X.strides
Out[125]: ((1000, 4), (4, 0))
In [126]: Y.shape, Y.strides
Out[126]: ((1000, 4), (0, 4))
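A 0 in strides means that stepping along that axis re-reads the same memory, so broadcasting produces views without copying any data. A small sketch, using int32 explicitly so the 4-byte strides match the session above:

```python
import numpy as np

# int32 gives 4-byte strides, matching the values shown above.
X, Y = np.broadcast_arrays(np.arange(1000, dtype=np.int32)[:, None],
                           np.arange(4, dtype=np.int32))
assert X.shape == Y.shape == (1000, 4)
assert X.strides == (4, 0)  # axis 1 re-reads the same 1000 values
assert Y.strides == (0, 4)  # axis 0 re-reads the same 4 values
# Neither view owns its data; both are backed by the tiny originals.
assert not X.flags.owndata and not Y.flags.owndata
```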
np.ascontiguousarray converts the 0-strided dimensions into full ones:
In [132]: Y1 = np.ascontiguousarray(Y)
In [134]: Y1.strides
Out[134]: (16, 4)
In [135]: X1 = np.ascontiguousarray(X)
In [136]: X1.shape
Out[136]: (1000, 4)
Operating on the full arrays is faster:
In [137]: timeit X1+Y1
4.66 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So there is some kind of time penalty for operating on 0-strided arrays, even though broadcasting never explicitly expands the arrays first. The cost appears to depend on the shapes, and possibly on which dimension is being expanded.
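Putting the pieces together, a short sketch verifying that materializing the broadcast views gives the same result while restoring ordinary, non-zero strides:

```python
import numpy as np

X, Y = np.broadcast_arrays(np.arange(1000)[:, None], np.arange(4))

# Materialize the broadcast views into full, contiguous buffers.
X1 = np.ascontiguousarray(X)
Y1 = np.ascontiguousarray(Y)

# Same values, but now every axis has a real (non-zero) stride,
# so the addition walks memory linearly instead of re-reading it.
assert np.array_equal(X + Y, X1 + Y1)
assert 0 in X.strides and 0 not in X1.strides
```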
Regarding "python - NumPy arithmetic with broadcasting - why is one method's performance so much better?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48253210/