python - Cumulative argmax of a NumPy array

Tags: python arrays numpy vectorization argmax

Consider the array a:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
a = np.random.randint(0, 10, (10, 2))
a

array([[0, 2],
       [7, 3],
       [8, 7],
       [0, 6],
       [8, 6],
       [0, 2],
       [0, 4],
       [9, 7],
       [3, 2],
       [4, 3]])

What is a vectorized way to get the cumulative argmax down each column? The expected result is:

array([[0, 0],  <-- both start off as max position
       [1, 1],  <-- 7 > 0 so 1st col = 1, 3 > 2 2nd col = 1
       [2, 2],  <-- 8 > 7 1st col = 2, 7 > 3 2nd col = 2
       [2, 2],  <-- 0 < 8 1st col stays the same, 6 < 7 2nd col stays the same
       [2, 2],  
       [2, 2],
       [2, 2],
       [7, 2],  <-- 9 is new max of 1st col, argmax is now 7; 7 only ties the 2nd col max, so it stays 2
       [7, 2],
       [7, 2]])

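Here is a non-vectorized way to do it.

Note that argmax is applied to an expanding window: the result for each row is computed over all rows up to and including that row.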

pd.DataFrame(a).expanding().apply(np.argmax).astype(int).values

array([[0, 0],
       [1, 1],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],
       [7, 2],
       [7, 2]])
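
To make the expanding-window definition concrete, here is a minimal sketch of the same computation as an explicit loop. The name cumargmax_loop is hypothetical (it is not part of NumPy or pandas), and the array is typed in by hand from the example above rather than regenerated from the seed.

```python
import numpy as np

def cumargmax_loop(a):
    """Reference (non-vectorized) cumulative argmax along axis 0."""
    a = np.asarray(a)
    out = np.empty(a.shape, dtype=np.intp)
    for i in range(a.shape[0]):
        # argmax over the expanding window a[:i + 1];
        # np.argmax resolves ties to the first occurrence
        out[i] = a[:i + 1].argmax(axis=0)
    return out

# The example array from the question, written out literally
a = np.array([[0, 2], [7, 3], [8, 7], [0, 6], [8, 6],
              [0, 2], [0, 4], [9, 7], [3, 2], [4, 3]])
print(cumargmax_loop(a))
```

This does quadratic work in the number of rows, so it is only useful as a correctness reference.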

Best answer

Here is a vectorized, pure NumPy solution that performs quite fast:

def cumargmax(a):
    m = np.maximum.accumulate(a)
    x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
    x[1:] *= m[:-1] < m[1:]
    np.maximum.accumulate(x, axis=0, out=x)
    return x

Then we have:

>>> cumargmax(a)
array([[0, 0],
       [1, 1],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],
       [7, 2],
       [7, 2]])
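
To see why this works, it may help to trace the intermediate arrays for a single column. The sketch below repeats the steps of cumargmax on the first column of a, typed in by hand; the variable names col and new_max are introduced here for illustration only.

```python
import numpy as np

# First column of the example array, kept 2-D so the same indexing applies
col = np.array([[0], [7], [8], [0], [8], [0], [0], [9], [3], [4]])

# Running maximum down the column: 0 7 8 8 8 8 8 9 9 9
m = np.maximum.accumulate(col)

# Each cell starts out holding its own row index: 0 1 2 ... 9
x = np.repeat(np.arange(col.shape[0])[:, None], col.shape[1], axis=1)

# True where the running maximum strictly increases, i.e. a new record is set
new_max = m[:-1] < m[1:]

# Zero out the indices of rows that did not set a record: 0 1 2 0 0 0 0 7 0 0
x[1:] *= new_max

# Carry the most recent record index forward: 0 1 2 2 2 2 2 7 7 7
np.maximum.accumulate(x, axis=0, out=x)
print(x.ravel())
```

Because the comparison is strict, a value that only ties the running maximum does not move the index, which matches np.argmax returning the first occurrence.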

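Some quick tests on arrays with thousands to millions of values suggest that this is 10 to 50 times faster than looping at the Python level (whether implicitly or explicitly).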
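
A rough way to try such a comparison yourself is sketched below. The baseline cumargmax_rowloop is a hypothetical explicit loop written for this illustration and is not the code used for the figures above. The array size and repeat count are arbitrary assumptions, and timings will vary by machine, so no expected numbers are given.

```python
import timeit
import numpy as np

def cumargmax(a):
    # Vectorized version, as in the answer
    m = np.maximum.accumulate(a)
    x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
    x[1:] *= m[:-1] < m[1:]
    np.maximum.accumulate(x, axis=0, out=x)
    return x

def cumargmax_rowloop(a):
    # Hypothetical baseline: explicit Python loop over rows,
    # tracking the best value and its index per column
    out = np.empty(a.shape, dtype=np.intp)
    best = a[0].copy()
    idx = np.zeros(a.shape[1], dtype=np.intp)
    for i in range(a.shape[0]):
        better = a[i] > best          # strict, so ties keep the earlier index
        best[better] = a[i][better]
        idx[better] = i
        out[i] = idx
    return out

rng = np.random.default_rng(0)
big = rng.integers(0, 1000, size=(20_000, 2))  # assumed test size

# Both versions should agree before timing means anything
assert np.array_equal(cumargmax(big), cumargmax_rowloop(big))

t_vec = timeit.timeit(lambda: cumargmax(big), number=5)
t_loop = timeit.timeit(lambda: cumargmax_rowloop(big), number=5)
print(f"vectorized: {t_vec:.4f}s  row loop: {t_loop:.4f}s")
```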

Regarding python - Cumulative argmax of a NumPy array, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40672186/