python - Finding the indices of repeated sequences in a NumPy array

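This is a follow-up to a previous question. Given a NumPy array [0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2], is there a fast way, for each repeated sequence (starting at each index), to find all matches of that sequence and return their indices?

Here, the repeated sequences are [2, 2] and [5, 5] (note that the repeat length is specified by the user, is the same for every sequence, and can be much greater than 2). The repeats can be found at [2, 6, 8, 11, 13] with: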

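import numpy as np

def consec_repeat_starts(a, n):
    # Start indices of every run of n equal consecutive values (n >= 2).
    N = n-1
    m = a[:-1]==a[1:]  # m[i] is True where a[i] == a[i+1]
    # N consecutive True values in m mean n equal values in a.
    return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1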
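
As a quick check of the function above, the following sketch runs it on the array from the question with n = 2, and on a second, purely illustrative array (not from the original post) with n = 3. It assumes n is at least 2 and that the input is a one-dimensional NumPy array.

```python
import numpy as np

def consec_repeat_starts(a, n):
    # Start indices of every run of n equal consecutive values (n >= 2 assumed).
    N = n - 1
    m = a[:-1] == a[1:]  # m[i] is True where a[i] == a[i+1]
    return np.flatnonzero(np.convolve(m, np.ones(N, dtype=int)) == N) - N + 1

# Array from the question.
a = np.array([0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2])
starts_2 = consec_repeat_starts(a, 2)
print(starts_2.tolist())  # [2, 6, 8, 11, 13]

# Illustrative array (an assumption, not part of the question) for a longer repeat.
c = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
starts_3 = consec_repeat_starts(c, 3)
print(starts_3.tolist())  # [0, 5, 6]
```

Note that overlapping runs are reported separately: the four consecutive 3s in the second array contain two runs of length 3, starting at indices 5 and 6.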

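But for each distinct repeated sequence (i.e. [2, 2] and [5, 5]), I would like to return something like the sequence followed by the indices where it occurs: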

[([2, 2], [2, 6, 13]), ([5, 5], [8, 11])]
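
To pin down the expected result, here is a slow pure-Python reference that produces this output format. It is only a baseline sketch for checking faster solutions, not the vectorized approach being asked for, and the function name group_repeats_naive is hypothetical.

```python
from collections import defaultdict

def group_repeats_naive(values, n):
    # Slide a window of length n over the list and record the start index
    # of every window whose elements are all equal, grouped by value.
    groups = defaultdict(list)
    for start in range(len(values) - n + 1):
        window = values[start:start + n]
        if all(v == window[0] for v in window):
            groups[window[0]].append(start)
    # Expand each value to [val] * n and order the groups by value.
    return [([val] * n, starts) for val, starts in sorted(groups.items())]

a = [0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2]
result = group_repeats_naive(a, 2)
print(result)  # [([2, 2], [2, 6, 13]), ([5, 5], [8, 11])]
```

This checks every window element by element in Python, so it will be much slower than a NumPy-based solution on large arrays.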

Update

Also, given the repeated sequences, can you return the results from a second array? That is, look for [2, 2] and [5, 5] in:

[2, 2, 5, 5, 1, 4, 9, 2, 5, 5, 0, 2, 2, 2]

The function would return:

[([2, 2], [0, 11, 12]), ([5, 5], [2, 8])]

Best answer

Here's one approach:

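def group_consec(a, n):
    idx = consec_repeat_starts(a, n)
    b = a[idx]  # repeated value at each start index
    # A stable sort keeps the start indices ascending within each value.
    sidx = b.argsort(kind='stable')
    c = b[sidx]
    # Boundaries between groups of equal values in the sorted array.
    cut_idx = np.flatnonzero(np.r_[True, c[:-1]!=c[1:],True])
    idx_s = idx[sidx]
    indices = [idx_s[i:j] for (i,j) in zip(cut_idx[:-1],cut_idx[1:])]
    return c[cut_idx[:-1]], indices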

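# Perform lookup in another array, b
# (a and b are the first and second arrays from the question, as NumPy arrays)
n = 2
v_a,indices_a = group_consec(a, n)
v_b,indices_b = group_consec(b, n)

# Locate each value of v_b in the sorted v_a; out-of-range positions are reset to 0.
idx = np.searchsorted(v_a, v_b)
idx[idx==len(v_a)] = 0
valid_mask = v_a[idx]==v_b  # True where a value repeated in b also repeats in a
common_indices = [j for (i,j) in zip(valid_mask,indices_b) if i]
common_val = v_b[valid_mask]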

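Note that, for simplicity and ease of use, the first output of group_consec holds the unique value of each sequence. If you need them in (val, val, ..) format, simply replicate them at the end. The same applies to common_val.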
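
The replication step mentioned above could look like the following end-to-end sketch, run on the two arrays from the question. The helper as_pairs is a hypothetical name introduced here for illustration, and the sketch passes kind="stable" to argsort so that the indices within each group stay in ascending order.

```python
import numpy as np

def consec_repeat_starts(a, n):
    # Start indices of every run of n equal consecutive values (n >= 2 assumed).
    N = n - 1
    m = a[:-1] == a[1:]
    return np.flatnonzero(np.convolve(m, np.ones(N, dtype=int)) == N) - N + 1

def group_consec(a, n):
    # Group the start indices by the value that repeats at each of them.
    idx = consec_repeat_starts(a, n)
    vals = a[idx]
    sidx = vals.argsort(kind="stable")
    c = vals[sidx]
    cut_idx = np.flatnonzero(np.r_[True, c[:-1] != c[1:], True])
    idx_s = idx[sidx]
    indices = [idx_s[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]
    return c[cut_idx[:-1]], indices

def as_pairs(vals, indices, n):
    # Hypothetical helper: expand each unique value to [val] * n
    # and pair it with its start indices as plain Python lists.
    return [([int(v)] * n, ix.tolist()) for v, ix in zip(vals, indices)]

a = np.array([0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2])
b = np.array([2, 2, 5, 5, 1, 4, 9, 2, 5, 5, 0, 2, 2, 2])
n = 2

v_a, indices_a = group_consec(a, n)
v_b, indices_b = group_consec(b, n)

# Keep only the groups of b whose value also repeats in a.
idx = np.searchsorted(v_a, v_b)
idx[idx == len(v_a)] = 0
valid_mask = v_a[idx] == v_b
common_indices = [j for keep, j in zip(valid_mask, indices_b) if keep]
common_val = v_b[valid_mask]

pairs_a = as_pairs(v_a, indices_a, n)
pairs_b = as_pairs(common_val, common_indices, n)
print(pairs_a)  # [([2, 2], [2, 6, 13]), ([5, 5], [8, 11])]
print(pairs_b)  # [([2, 2], [0, 11, 12]), ([5, 5], [2, 8])]
```

This sketch assumes both arrays contain at least one repeat of length n; an array with no repeats would need separate handling before the grouping step.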

Regarding python - Finding the indices of repeated sequences in a NumPy array, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59667326/
