我的 list 很大,可能载有成千上万的条目。我设置了一个有限大小的窗口以在列表上滑动。我需要计算窗口中匹配的元素,并一次向前滑动窗口1位置来重复该过程。这是一个简单的 list 示例
L = [1 2 1 3 4 5 1 2 1 2 2 2 3 ]
假设窗口的长度为3,它将捕获因此,总共有1 +1 +1 +1 + 3 +1 = 8个匹配对。我发现了使用itertools查找窗口中所有元素的组合并开发代码以查找所有匹配对的想法
import itertools
L = [1,2,1,3,4,5,1,2,1,2,2,2,3]
winlen = 3
totalMatch = 0
for n in range(len(L)-winlen+1):
window = [L[n+i] for i in range(winlen)]
A = list(itertools.combinations(window, 2))
match = [a==b for a, b in A]
totalMatch += sum(match)
它适用于简短列表,但对于列表和窗口变大的情况,此代码是如此之慢。我已经使用C++多年了,因此决定改用python,如果有任何提高代码效率的提示,我将不胜感激。
最佳答案
在窗口中更有效地跟踪数据?这是O(| L |)而不是O(| L | * winlen ^ 2)。它将窗口的元素计数保留在ctr
中,并将窗口的匹配项保留在match
中。例如,当一个新值进入窗口并且在窗口中已经存在该值的两个实例时,您将获得两个新的匹配项。类似地,对于一个落在窗口外的值,它所进行的匹配与在窗口中其他实例所进行的匹配一样多。
from collections import Counter
L = [1,2,1,3,4,5,1,2,1,2,2,2,3]
winlen = 3
totalMatch = match = 0
ctr = Counter()
for i, x in enumerate(L):
# Remove old element falling out of window
if i >= winlen:
ctr[L[i-winlen]] -= 1
match -= ctr[L[i-winlen]]
# Add new element to window
match += ctr[x]
ctr[x] += 1
# Update the total (for complete windows)
if i >= winlen - 1:
totalMatch += match
print(totalMatch)
将L
和winlen
乘以20得出的基准结果: 38.75 ms original
0.18 ms Manuel
38.73 ms original
0.19 ms Manuel
38.87 ms original
0.18 ms Manuel
基准代码(还包括长度为0到9的数字1到3的所有列表的测试代码):from timeit import repeat
import itertools
from itertools import product
from collections import Counter
def original(L, winlen):
totalMatch = 0
for n in range(len(L)-winlen+1):
window = [L[n+i] for i in range(winlen)]
A = list(itertools.combinations(window, 2))
match = [a==b for a, b in A]
totalMatch += sum(match)
return totalMatch
def Manuel(L, winlen):
totalMatch = match = 0
ctr = Counter()
for i, x in enumerate(L):
if i >= winlen:
ctr[L[i-winlen]] -= 1
match -= ctr[L[i-winlen]]
match += ctr[x]
ctr[x] += 1
if i >= winlen - 1:
totalMatch += match
return totalMatch
def test():
for n in range(10):
print('testing', n)
for T in product([1, 2, 3], repeat=n):
L = list(T)
winlen = 3
expect = original(L, winlen)
result = Manuel(L, winlen)
assert result == expect, (L, expect, result)
def bench():
L = [1,2,1,3,4,5,1,2,1,2,2,2,3] * 20
winlen = 3 * 20
for _ in range(3):
for func in original, Manuel:
t = min(repeat(lambda: func(L, winlen), number=1))
print('%6.2f ms ' % (t * 1e3), func.__name__)
print()
test()
bench()
关于python - 如何在Python中改善列表上的模式匹配,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/66808465/