python - 在 NumPy 数组中搜索序列

假设我有以下数组:

 array([2, 0, 0, 1, 0, 1, 0, 0])

如何获取出现值序列的索引:[0,0]？因此，这种情况的预期输出为:[1,2,6,7]。

编辑:

1) 请注意[0,0] 只是一个序列。它可以是 [0,0,0] 或 [4,6,8,9] 或 [5,2,0],什么都行。

2) 如果我的数组被修改为:array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0])，预期结果与[0,0] 的序列将是 [1,2,3,4,8,9]。

我正在寻找一些 NumPy 快捷方式。

最佳答案

嗯，这基本上是一个 template-matching problem这在图像处理中经常出现。这篇文章中列出了两种方法:基于纯 NumPy 和基于 OpenCV (cv2)。

方法 #1:使用 NumPy，可以创建一个 2D 滑动索引数组，横跨输入数组的整个长度。因此，每一行都是元素的滑动窗口。接下来，将每一行与输入序列匹配，这将带来 broadcasting用于矢量化解决方案。我们寻找所有 True 行，表明这些行是完美匹配的，因此将是匹配的起始索引。最后，使用这些索引，创建一个延伸到序列长度的索引范围，为我们提供所需的输出。实现将是 -

def search_sequence_numpy(arr,seq):
    """ Find sequence in an array using NumPy only.

    Parameters
    ----------    
    arr    : input 1D array
    seq    : input 1D array

    Output
    ------    
    Output : 1D Array of indices in the input array that satisfy the 
    matching of input sequence in the input array.
    In case of no match, an empty list is returned.
    """

    # Store sizes of input array and sequence
    Na, Nseq = arr.size, seq.size

    # Range of sequence
    r_seq = np.arange(Nseq)

    # Create a 2D array of sliding indices across the entire length of input array.
    # Match up with the input sequence & get the matching starting indices.
    M = (arr[np.arange(Na-Nseq+1)[:,None] + r_seq] == seq).all(1)

    # Get the range of those indices as final output
    if M.any() >0:
        return np.where(np.convolve(M,np.ones((Nseq),dtype=int))>0)[0]
    else:
        return []         # No match found

方法#2:使用 OpenCV (cv2)，我们有一个用于模板匹配 的内置函数:cv2.matchTemplate .使用这个，我们将有起始匹配索引。其余步骤与之前的方法相同。下面是 cv2 的实现:

from cv2 import matchTemplate as cv2m

def search_sequence_cv2(arr,seq):
    """ Find sequence in an array using cv2.
    """

    # Run a template match with input sequence as the template across
    # the entire length of the input array and get scores.
    S = cv2m(arr.astype('uint8'),seq.astype('uint8'),cv2.TM_SQDIFF)

    # Now, with floating point array cases, the matching scores might not be 
    # exactly zeros, but would be very small numbers as compared to others.
    # So, for that use a very small to be used to threshold the scorees 
    # against and decide for matches.
    thresh = 1e-5 # Would depend on elements in seq. So, be careful setting this.

    # Find the matching indices
    idx = np.where(S.ravel() < thresh)[0]

    # Get the range of those indices as final output
    if len(idx)>0:
        return np.unique((idx[:,None] + np.arange(seq.size)).ravel())
    else:
        return []         # No match found

样本运行

In [512]: arr = np.array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0])

In [513]: seq = np.array([0,0])

In [514]: search_sequence_numpy(arr,seq)
Out[514]: array([1, 2, 3, 4, 8, 9])

In [515]: search_sequence_cv2(arr,seq)
Out[515]: array([1, 2, 3, 4, 8, 9])

运行时测试

In [477]: arr = np.random.randint(0,9,(100000))
     ...: seq = np.array([3,6,8,4])
     ...: 

In [478]: np.allclose(search_sequence_numpy(arr,seq),search_sequence_cv2(arr,seq))
Out[478]: True

In [479]: %timeit search_sequence_numpy(arr,seq)
100 loops, best of 3: 11.8 ms per loop

In [480]: %timeit search_sequence_cv2(arr,seq)
10 loops, best of 3: 20.6 ms per loop

似乎基于 Pure NumPy 的是最安全和最快的!

关于python - 在 NumPy 数组中搜索序列，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/36522220/

python - 在 NumPy 数组中搜索序列

上一篇：python - 了解 Matplotlib 的箭袋绘图

下一篇：python - 基于类的 View "has no attribute .as_view()"错误