python - Detecting the first unique rows across multiple numpy 2-D arrays

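I have multiple numpy 2-D arrays that I want to compare row by row. The output of my function should be a numpy 2-D boolean array covering every row of the three input arrays. I want to detect the first occurrence of each row: the first time a row appears it should be marked True, and every second or third occurrence of the same row should be marked False. A single array cannot contain duplicate rows.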

If possible, I would like to avoid loops, because they slow down the computation.

Example:

import numpy as np

array1 = np.array([[444, 427],
                   [444, 428],
                   [444, 429],
                   [444, 430],
                   [445, 421]], dtype=np.uint64)

array2 = np.array([[446, 427],
                   [446, 440],
                   [444, 429],
                   [444, 432],
                   [445, 421]], dtype=np.uint64)

array3 = np.array([[447, 427],
                   [446, 441],
                   [444, 429],
                   [444, 432],
                   [445, 421]], dtype=np.uint64)

# output
array([[ True,  True,  True,  True,  True],
       [ True,  True, False,  True, False],
       [ True,  True, False, False, False]], dtype=bool)

Any ideas?

Best answer

Here is a fast vectorized approach:

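import numpy as np

def find_dupe_rows(*arrays):
    # Stack all inputs into one (total_rows, ncols) array
    A = np.vstack(arrays)
    # View each row as one opaque void element so np.unique compares whole rows
    rtype = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
    _, first_idx = np.unique(A.view(rtype), return_index=True)
    # Start with all False, then flag the first occurrence of each unique row
    out = np.zeros(A.shape[0], dtype=bool)
    out[first_idx] = True

    # One output row per input array (assumes equal row counts)
    return out.reshape(len(arrays), -1)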

Example usage:

print(find_dupe_rows(array1, array2, array3))
# [[ True  True  True  True  True]
#  [ True  True False  True False]
#  [ True  True False False False]]
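
For anyone who wants to reproduce this end to end, here is a self-contained sketch. It rebuilds the three example arrays from the question and, instead of the void-view trick used in the answer, passes `axis=0` to `np.unique`. That substitution is mine and assumes NumPy 1.13 or later, where the `axis` argument is available. The name `first_occurrence_mask` is made up for this illustration.

```python
import numpy as np

array1 = np.array([[444, 427], [444, 428], [444, 429], [444, 430], [445, 421]],
                  dtype=np.uint64)
array2 = np.array([[446, 427], [446, 440], [444, 429], [444, 432], [445, 421]],
                  dtype=np.uint64)
array3 = np.array([[447, 427], [446, 441], [444, 429], [444, 432], [445, 421]],
                  dtype=np.uint64)


def first_occurrence_mask(*arrays):
    """Hypothetical helper: True where a row appears for the first time."""
    stacked = np.vstack(arrays)
    # With axis=0, np.unique treats each row as one item;
    # return_index gives the index of the first occurrence of each unique row.
    _, first_idx = np.unique(stacked, axis=0, return_index=True)
    mask = np.zeros(stacked.shape[0], dtype=bool)
    mask[first_idx] = True
    # Assumes every input has the same number of rows
    return mask.reshape(len(arrays), -1)


result = first_occurrence_mask(array1, array2, array3)
print(result)
```

Under those assumptions the printed mask should match the expected output from the question.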

Breaking it down a bit:

Stack the three subarrays to produce a (15, 2) array:
```
A = np.vstack((array1, array2, array3))
```
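
To make the stacking step concrete, here is a small sketch with two made-up 3-row arrays (`a` and `b` are stand-ins, not the arrays from the question). It shows that row `i` of the `k`-th input ends up at index `k * nrows + i` in the stacked array, which is what lets the final mask be reshaped back to one row per input.

```python
import numpy as np

# Two tiny stand-in arrays (made-up values) with 3 rows each
a = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.uint64)
b = np.array([[3, 4], [7, 8], [1, 2]], dtype=np.uint64)

A = np.vstack((a, b))
nrows = a.shape[0]

print(A.shape)  # (6, 2)

# Row i of the k-th input sits at index k * nrows + i in the stack
k, i = 1, 2
print(np.array_equal(A[k * nrows + i], b[i]))  # True
```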

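Use np.unique together with this trick (viewing each row as a single np.void element, so that a whole row is compared as one item) to efficiently find the index at which each unique row first occurs in A: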

rtype = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
_, first_idx = np.unique(A.view(rtype), return_index=True)
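
To see what the view does on its own, the following sketch applies it to a small made-up array. It only illustrates the reinterpretation of the memory, not the `np.unique` call. The `np.ascontiguousarray` call is my addition: the view needs each row to be laid out contiguously, which `np.vstack` output already is.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [1, 2]], dtype=np.uint64)

# One opaque element per row: itemsize (8 bytes) * number of columns (2) = 16 bytes
rtype = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
rows = np.ascontiguousarray(A).view(rtype)

print(rows.dtype.itemsize)  # 16
print(rows.shape)           # (3, 1)

# Identical rows have identical raw bytes, distinct rows do not
same = rows[0].tobytes() == rows[2].tobytes()
different = rows[0].tobytes() != rows[1].tobytes()
print(same, different)      # True True
```

Because each row is now a single element, a 1-D routine such as `np.unique` can compare whole rows at once.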

Every row that is not the first occurrence of a unique row can be treated as a duplicate:

out = np.zeros(A.shape[0], dtype=bool)  # output is False by default
out[first_idx] = True                   # set first occurrences to True
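
As a small illustration of the masking step, the sketch below uses a hand-written `first_idx` (hypothetical values, not produced by `np.unique` here). Note that the indices do not need to be sorted: `np.unique` returns them in the order of the sorted unique values, not in positional order, and fancy-index assignment does not care.

```python
import numpy as np

n_stacked = 6                    # number of rows in the stacked array (made up)
first_idx = np.array([4, 0, 1])  # hypothetical first-occurrence indices, unsorted

out = np.zeros(n_stacked, dtype=bool)
out[first_idx] = True

print(out)  # [ True  True False False  True False]

# Everything left as False is a repeat of an earlier row
duplicates = np.flatnonzero(~out)
print(duplicates)  # [2 3 5]
```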

Finally, reshape this boolean vector to (narrays, nrows), matching your example output:
```
return out.reshape(len(arrays), -1)
```
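
One thing worth keeping in mind is that the reshape only works when every input array has the same number of rows. The sketch below shows the reshape on a made-up mask, and then one possible way to cut the mask back into per-array pieces when the row counts differ, using `np.split`. The `lengths` list is an assumption for illustration, and the unequal-length case is not something the original answer covers.

```python
import numpy as np

out = np.array([True, True, True, False, True, False])

# Equal-length inputs: 2 arrays of 3 rows each
print(out.reshape(2, -1))
# [[ True  True  True]
#  [False  True False]]

# Unequal lengths (assumed: 4 rows, then 2 rows):
# split the flat mask at the cumulative row counts
lengths = [4, 2]
parts = np.split(out, np.cumsum(lengths)[:-1])
print([p.tolist() for p in parts])
# [[True, True, True, False], [True, False]]
```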

Regarding python - Detecting the first unique rows across multiple numpy 2-D arrays, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37104013/
