Python: How to find the unique pattern of elements from 2 arrays?

I have two numpy arrays, A and B:

A = ([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = ([2, 3, 1, 2])

where B is a unique pattern within A.

I need the output to be all the elements of A that are not part of the pattern B.

Output = ([1, 2, 3, 1, 3])

Best answer

The simplest approach is to use Python's built-ins, namely the string type:

A = "123231213"
B = "2312"
result = A.replace(B, "")

To convert efficiently between numpy.array and a byte string, use these functions:

import numpy

x = numpy.frombuffer(b"3452353", dtype="|i1")  # needs bytes, not str, on Python 3
x
# array([51, 52, 53, 50, 51, 53, 51], dtype=int8)
x.tobytes()  # tostring() is the older, deprecated name
# b'3452353'

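(*) Note that this mixes up ASCII codes with values (1 != "1"), but the substring search still works. Your data type should fit in a single byte; otherwise a match can start in the middle of an element and you may get false matches.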
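To illustrate the false-match caveat, here is a minimal sketch with a two-byte dtype. The values are hypothetical, chosen so that the bytes of B straddle two neighbouring elements of A:

```python
import numpy

# Little-endian 16-bit values: A's bytes are 01 02 03 04, B's bytes are 02 03.
A = numpy.array([0x0201, 0x0403], dtype="<u2")
B = numpy.array([0x0302], dtype="<u2")

value_present = bool((A == B[0]).any())      # element-wise check
bytes_present = B.tobytes() in A.tobytes()   # byte-level substring check

print(value_present, bytes_present)  # False True
```

The value 0x0302 is not an element of A, yet the byte-level search reports a hit because it is not aware of element boundaries.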

To sum up, a quick hack looks like this:

import numpy

A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = numpy.array([2, 3, 1, 2])
# tobytes() replaces the deprecated tostring(); frombuffer() replaces fromstring()
numpy.frombuffer(A.tobytes().replace(B.tobytes(), b""), dtype=A.dtype)
# array([1, 2, 3, 1, 3])
# note, here dtype is some int, I'm relying on the fact that:
# "1 matches 1" is equivalent to "0001 matches 0001"
# this holds as long as values of B are typically non-zero.
#
# this trick can conceptually be used with floating point too,
# but beware of multiple floating point representations of same number
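If relying on non-zero values feels too fragile, one possible refinement is to accept only byte-level hits that start on an element boundary. The helper below is a sketch, not part of the original answer; the name `remove_pattern` is hypothetical, and it assumes `b` can be cast to the dtype of `a`:

```python
import numpy

def remove_pattern(a, b):
    """Remove every element-aligned, non-overlapping occurrence of b from a."""
    b = numpy.asarray(b, dtype=a.dtype)
    data, pattern = a.tobytes(), b.tobytes()
    size = a.dtype.itemsize
    if not pattern:
        return a.copy()
    out, pos = bytearray(), 0
    while True:
        hit = data.find(pattern, pos)
        # Skip hits that start in the middle of an element.
        while hit != -1 and hit % size:
            hit = data.find(pattern, hit + 1)
        if hit == -1:
            out += data[pos:]       # no more matches: keep the tail
            break
        out += data[pos:hit]        # keep everything before the match
        pos = hit + len(pattern)    # resume right after the match
    return numpy.frombuffer(bytes(out), dtype=a.dtype)

A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
print(remove_pattern(A, [2, 3, 1, 2]))  # [1 2 3 1 3]
```

Like the one-liner, this still delegates the search itself to the fast bytes routines; only the alignment check is done in Python.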

In-depth explanation:

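Assuming A and B are of arbitrary size, the naive approach runs in quadratic time, roughly len(A) * len(B) comparisons. Better algorithms exist, for example the probabilistic Rabin-Karp, which relies on a sliding-window hash.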
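As a rough illustration of the sliding-window hash idea, here is a minimal Rabin-Karp sketch over integer sequences. The function name and the `base` and `mod` parameters are illustrative choices, not taken from any library:

```python
def rabin_karp_find(seq, pat, base=257, mod=(1 << 61) - 1):
    """Return the index of the first occurrence of pat in seq, or -1."""
    # Work on plain Python ints so the hash arithmetic cannot overflow.
    seq, pat = [int(v) for v in seq], [int(v) for v in pat]
    n, m = len(seq), len(pat)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the element leaving the window
    h_pat = h_win = 0
    for i in range(m):
        h_pat = (h_pat * base + pat[i]) % mod
        h_win = (h_win * base + seq[i]) % mod
    for i in range(n - m + 1):
        # Hashes can collide, so confirm a hash hit with a direct comparison.
        if h_win == h_pat and seq[i:i + m] == pat:
            return i
        if i + m < n:
            # Roll the window: drop seq[i], append seq[i + m].
            h_win = ((h_win - seq[i] * high) * base + seq[i + m]) % mod
    return -1

print(rabin_karp_find([1, 2, 3, 2, 3, 1, 2, 1, 3], [2, 3, 1, 2]))  # 3
```

Each window hash is updated in constant time, so the expected cost is linear in len(A) + len(B) as long as hash collisions are rare.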

This is the main reason why text-oriented functions (such as xxx in str, str.replace, or re) are much faster than custom numpy code.

If you really need this function to be integrated with numpy, you can always write an extension, but it's not easy :)
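Short of writing an extension, a pure-NumPy variant is possible. The sketch below is not from the original answer and assumes NumPy 1.20 or newer, where `sliding_window_view` is available. It compares every window of A against B, so it remains a vectorised form of the naive quadratic approach:

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = numpy.array([2, 3, 1, 2])

# One row per window of len(B) consecutive elements of A.
windows = sliding_window_view(A, len(B))
starts = numpy.flatnonzero((windows == B).all(axis=1))

# Mask out every element covered by a match, then keep the rest.
keep = numpy.ones(len(A), dtype=bool)
for s in starts:
    keep[s:s + len(B)] = False
result = A[keep]

print(result)  # [1 2 3 1 3]
```

Unlike str.replace, this removes overlapping matches as well, which may or may not be what you want.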

Regarding "Python: How to find the unique pattern of elements from 2 arrays?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23056215/
