python - How to apply a cumulative sum in NumPy to slices conditioned on the previous value?

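I have a vector of signal values that are either 1 or -1. I want a second vector that holds the cumulative sum over consecutive signals with the same value, restarting the sum every time the signal changes. Here is an example: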

signal  = [1  1  1 -1 -1 -1 -1]

cum_sum = [1  2  3 -1 -2 -3 -4]

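I have a large amount of data to process and want to do it as efficiently as possible. My current code does the job, but it is slow and does not take advantage of numpy's efficiency: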

import numpy as np

# Signal values to be analyzed
signal = np.array([1,1,1,-1,-1,-1,-1], dtype=int)

# Vector with previous value of signal
signal_prev = signal[:-1]
signal_prev = np.pad(signal_prev,(1,0), mode='constant', constant_values=(0))

#Array with signal values in first column and previous values in second column 
arr = np.array([signal,signal_prev], dtype=int)
arr = np.transpose(arr)

print(arr)
""" Array with signal values and previous values
[[ 1  0]
 [ 1  1]
 [ 1  1]
 [-1  1]
 [-1 -1]
 [-1 -1]
 [-1 -1]]
"""

# Create an empty array to append the cumulative sum to
signal_sum = np.array([], dtype=int)

# Compute the cumulative sum iterating row by row
for x in arr:
    if np.sign(x[0]*x[1]) > 0:
        signal_sum = np.append(signal_sum, signal_sum[-1] + x[1])
    else:
        signal_sum = np.append(signal_sum, x[0])

arr_sum = np.array([signal, signal_sum])
arr_sum = np.transpose(arr_sum)
print(arr_sum)
""" Array with signal values and cumulative sum restarted with signal change
[[ 1  1]
 [ 1  2]
 [ 1  3]
 [-1 -1]
 [-1 -2]
 [-1 -3]
 [-1 -4]]
"""

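I believe this computation could be done more efficiently with numpy functions or lambda functions. I am not a programmer, and I am new to Python. I would like to know whether this can be made faster.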

Best answer

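For a fast, fully vectorized approach (no loops), you can use the regular np.cumsum(), but apply it to a copy of the array in which the total of the previous group has been subtracted at the start of each group: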

def group_cumsum(s):
    # make a copy and ensure np.array (in case list was given)
    s = np.array(s).copy()
    idx = np.nonzero(np.diff(s))[0]  # last of each group
    off = np.diff(np.concatenate(([0], np.cumsum(s)[idx])))
    s[idx + 1] -= off
    return np.cumsum(s)
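
To make the intermediate steps easier to follow, here is a small trace of the same operations on a short input. The input values and the names totals and adjusted are chosen only for illustration and are not part of the original answer.

```python
import numpy as np

# Illustrative input with three groups: [1 1 1], [-1 -1], [1 1]
s = np.array([1, 1, 1, -1, -1, 1, 1])

# Index of the last element of every group except the final one
idx = np.nonzero(np.diff(s))[0]
print(idx)        # [2 4]

# Plain running total of the signal at the end of each of those groups
totals = np.cumsum(s)[idx]
print(totals)     # [3 1]

# Contribution of each group on its own (difference of consecutive totals)
off = np.diff(np.concatenate(([0], totals)))
print(off)        # [ 3 -2]

# Subtract each group's contribution from the first element of the next group,
# so that the running total restarts from the new signal value there
adjusted = s.copy()
adjusted[idx + 1] -= off
print(adjusted)   # [ 1  1  1 -4 -1  3  1]

result = np.cumsum(adjusted)
print(result)     # [ 1  2  3 -1 -2  1  2]
```

Each correction cancels whatever the preceding group contributed, which is why a single np.cumsum() over the adjusted array produces the restarted sums.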

Example:

print(group_cumsum([1, 1, 1, -1, -1, -1, -1]))
# [ 1  2  3 -1 -2 -3 -4]

print(group_cumsum([1]*3 + [-1]*2 + [1]*4 + [-1]*5))
# [ 1  2  3 -1 -2  1  2  3  4 -1 -2 -3 -4 -5]

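For large arrays this saves a lot of time:

There are no loops in the Python code, all operations are vectorized, and
for k groups in an array of size n the complexity is O(n + k) (versus O(n * k)).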
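
One way to gain confidence in the vectorized version is to compare it against a straightforward loop on random data. The sketch below does that; group_cumsum_loop is a hypothetical reference implementation written for this check, not code from the question or the answer, and the seed and array size are arbitrary.

```python
import numpy as np

def group_cumsum(s):
    # Vectorized version from the answer above
    s = np.array(s).copy()
    idx = np.nonzero(np.diff(s))[0]  # last of each group
    off = np.diff(np.concatenate(([0], np.cumsum(s)[idx])))
    s[idx + 1] -= off
    return np.cumsum(s)

def group_cumsum_loop(s):
    # Hypothetical reference: plain Python loop, restart when the value changes
    s = np.asarray(s)
    out = np.empty(len(s), dtype=int)
    running = 0
    for i in range(len(s)):
        if i > 0 and s[i] == s[i - 1]:
            running += s[i]
        else:
            running = s[i]
        out[i] = running
    return out

rng = np.random.default_rng(0)          # arbitrary seed
s = rng.choice([1, -1], size=10_000)    # arbitrary size

same = np.array_equal(group_cumsum(s), group_cumsum_loop(s))
print("results match:", same)
```

The loop touches every element from Python, so it is only meant as a correctness check, not as a performance baseline.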

Try this:

s = np.random.choice([1, -1], size=(int(1e6)))

%%timeit
group_cumsum(s)

19.1 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
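
%%timeit is an IPython/Jupyter cell magic, so the snippet above will not run in a plain Python script. A rough equivalent using the standard timeit module might look like the sketch below; the number of runs (n_runs) and the seed are arbitrary choices, and the timing will differ from machine to machine.

```python
import timeit

import numpy as np

def group_cumsum(s):
    # Vectorized version from the answer above
    s = np.array(s).copy()
    idx = np.nonzero(np.diff(s))[0]  # last of each group
    off = np.diff(np.concatenate(([0], np.cumsum(s)[idx])))
    s[idx + 1] -= off
    return np.cumsum(s)

rng = np.random.default_rng(0)               # arbitrary seed
s = rng.choice([1, -1], size=int(1e6))

n_runs = 5                                   # arbitrary; increase for steadier numbers
total = timeit.timeit(lambda: group_cumsum(s), number=n_runs)
print(f"{total / n_runs * 1e3:.1f} ms per call (mean of {n_runs} runs)")
```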

Regarding "python - How to apply a cumulative sum in NumPy to slices conditioned on the previous value?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65264898/
