python - 如何在Python中改善列表上的模式匹配

我的 list 很大，可能载有成千上万的条目。我设置了一个有限大小的窗口以在列表上滑动。我需要计算窗口中匹配的元素，并一次向前滑动窗口1位置来重复该过程。这是一个简单的 list 示例

L = [1 2 1 3 4 5 1 2 1 2 2 2 3 ]

假设窗口的长度为3，它将捕获

[1 2 1]包含一对匹配元素(1和1)

将窗口向前移动1个位置=> [2 1 3]，没有匹配的元素

将窗口向前移动1个位置=> [1 3 4]，没有匹配的元素

将窗口向前移动1个位置=> [3 4 5]，没有匹配的元素

将窗口向前移动1个位置=> [4 5 1]，没有匹配的元素

将窗口向前移动1个位置=> [5 1 2]，没有匹配的元素

将窗口向前移动1个位置=> [1 2 1]，1个匹配元素(1＆1)

将窗口向前移动1个位置=> [2 1 2]，1个匹配元素(2＆2)

将窗口向前移动1个位置=> [1 2 2]，1个匹配元素(2＆2)

将窗口向前移动1个位置=> [2 2 2]，3个匹配元素([2 2-]，[2-2]，[-2 2])

将窗口向前移动1个位置=> [2 2 3]，1个匹配元素(2＆2)

因此，总共有1 +1 +1 +1 + 3 +1 = 8个匹配对。我发现了使用itertools查找窗口中所有元素的组合并开发代码以查找所有匹配对的想法

import itertools
L = [1,2,1,3,4,5,1,2,1,2,2,2,3]
winlen = 3
totalMatch = 0
for n in range(len(L)-winlen+1):
    window = [L[n+i] for i in range(winlen)]
    A = list(itertools.combinations(window, 2))
    match = [a==b for a, b in A]
    totalMatch += sum(match)

它适用于简短列表，但对于列表和窗口变大的情况，此代码是如此之慢。我已经使用C++多年了，因此决定改用python，如果有任何提高代码效率的提示，我将不胜感激。

最佳答案

在窗口中更有效地跟踪数据？这是O(| L |)而不是O(| L | * winlen ^ 2)。它将窗口的元素计数保留在ctr中，并将窗口的匹配项保留在match中。例如，当一个新值进入窗口并且在窗口中已经存在该值的两个实例时，您将获得两个新的匹配项。类似地，对于一个落在窗口外的值，它所进行的匹配与在窗口中其他实例所进行的匹配一样多。

from collections import Counter

L = [1,2,1,3,4,5,1,2,1,2,2,2,3]
winlen = 3

totalMatch = match = 0
ctr = Counter()
for i, x in enumerate(L):
    
    # Remove old element falling out of window
    if i >= winlen:
        ctr[L[i-winlen]] -= 1
        match -= ctr[L[i-winlen]]

    # Add new element to window
    match += ctr[x]
    ctr[x] += 1

    # Update the total (for complete windows)
    if i >= winlen - 1:
        totalMatch += match

print(totalMatch)

将L和winlen乘以20得出的基准结果:

 38.75 ms  original
  0.18 ms  Manuel

 38.73 ms  original
  0.19 ms  Manuel

 38.87 ms  original
  0.18 ms  Manuel

基准代码(还包括长度为0到9的数字1到3的所有列表的测试代码):

from timeit import repeat
import itertools
from itertools import product
from collections import Counter

def original(L, winlen):
    totalMatch = 0
    for n in range(len(L)-winlen+1):
        window = [L[n+i] for i in range(winlen)]
        A = list(itertools.combinations(window, 2))
        match = [a==b for a, b in A]
        totalMatch += sum(match)
    return totalMatch

def Manuel(L, winlen):
    totalMatch = match = 0
    ctr = Counter()
    for i, x in enumerate(L):
        if i >= winlen:
            ctr[L[i-winlen]] -= 1
            match -= ctr[L[i-winlen]]
        match += ctr[x]
        ctr[x] += 1
        if i >= winlen - 1:
            totalMatch += match
    return totalMatch

def test():
    for n in range(10):
        print('testing', n)
        for T in product([1, 2, 3], repeat=n):
            L = list(T)
            winlen = 3
            expect = original(L, winlen)
            result = Manuel(L, winlen)
            assert result == expect, (L, expect, result)

def bench():
    L = [1,2,1,3,4,5,1,2,1,2,2,2,3] * 20
    winlen = 3 * 20
    for _ in range(3):
        for func in original, Manuel:
            t = min(repeat(lambda: func(L, winlen), number=1))
            print('%6.2f ms ' % (t * 1e3), func.__name__)
        print()

test()
bench()

关于python - 如何在Python中改善列表上的模式匹配，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/66808465/

python - 如何在Python中改善列表上的模式匹配

上一篇：c++ - 为什么在Linux上双缓冲实现比Windows慢8倍？

下一篇：python - 如何从函数列表[f1，f2，f3，... fn]组成嵌套函数g = fn(...(f3(f2(f1()))...)