python - Finding the largest matching region in a text file

A.txt contains lines that look like this (a small part of it, anyway):

Green- Blue- 1
Red- Black- 3
Brown- Blue- 3
Black- Red- 1
Green- Blue- 1

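Essentially, the last string on each line is either 1 or 3. Assuming the sample above goes on for a long time, I need to find the longest run of consecutive lines ending in 1 while keeping the number of lines ending in 3 inside that run less than or equal to some number (say 2). For example, suppose the whole of A.txt looks like this: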

Green- Blue- 1
Red- Black- 3
Brown- Blue- 3
Black- Red- 3
Green- Blue- 1
Green- Purple- 1
Red- Black- 3
Brown- Blue- 3
Black- Red- 1
Blue- Blue- 3

The script would then write the following lines to another text file:

Green- Blue- 1
Green- Purple- 1
Red- Black- 3
Brown- Blue- 3
Black- Red- 1

How would I code this? Thanks in advance!

Best answer

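You really have no choice but to iterate over the whole file, keeping track of the largest sequence seen so far. Here is my take, wrapped in a function. It uses a stack and goes through the file line by line, so it should be memory-efficient for large input files.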

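def foo(in_file, out_file, max_count):
    # biggest: longest valid run found so far; stack: the run currently being built
    biggest, stack = [], []
    count = 0  # number of lines ending in '3' in the current run
    with open(in_file) as f:
        for line in f:
            # line[-2] is the last character before the trailing newline
            if line[-2] == '3':
                count += 1
            if count > max_count:
                if len(stack) > len(biggest):
                    biggest = list(stack)
                # drop everything up to and including the first element that ends with '3'
                stack = stack[stack.index(next(x for x in stack if x[-2] == '3')) + 1:]
                count = max_count
            stack.append(line)

    with open(out_file, 'w') as f:
        # compare by length; a plain max() would compare the lists lexicographically
        f.write(''.join(max(biggest, stack, key=len)))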

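Note: this only works as expected if the file ends with a newline, so that every line, including the last one, has its digit at line[-2]. It also assumes that max_count is always greater than 0 (otherwise the call to next raises an unhandled StopIteration).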
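Both caveats in the note can be sidestepped with a sliding window kept in a collections.deque. The following is a minimal sketch, not the original answerer's code: the names ends_with_three, longest_window and write_longest_window are made up for illustration, and it assumes max_threes is a non-negative integer and that trailing whitespace on a line can be ignored when checking the last character.

```python
from collections import deque


def ends_with_three(line):
    # Ignore the trailing newline (or its absence) when checking the last character.
    return line.rstrip().endswith('3')


def longest_window(lines, max_threes):
    """Return the longest run of consecutive lines that contains
    at most max_threes lines ending in '3' (assumes max_threes >= 0)."""
    window = deque()  # the run currently being built
    threes = 0        # lines ending in '3' inside the window
    best = []
    for line in lines:
        is_three = ends_with_three(line)
        if is_three and threes == max_threes:
            # The window cannot grow any further: keep it if it is the longest so far.
            if len(window) > len(best):
                best = list(window)
            # Drop lines from the left, up to and including the oldest '3' line.
            while window:
                if ends_with_three(window.popleft()):
                    threes -= 1
                    break
            if threes == max_threes:
                # Only reachable when max_threes == 0: this line can never be kept.
                continue
        window.append(line)
        threes += is_three
    return max(best, list(window), key=len)


def write_longest_window(in_file, out_file, max_threes):
    # Stream the input file; only the current window and the best run are held in memory.
    with open(in_file) as f:
        best = longest_window(f, max_threes)
    with open(out_file, 'w') as f:
        f.write(''.join(best))
```

Calling write_longest_window('A.txt', 'B.txt', 2) on the sample from the question (B.txt being an arbitrary output name) should write the five lines listed there. Ties are resolved in favour of the earliest run, since a later run replaces the stored one only when it is strictly longer.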

A similar question about finding the largest matching region in a text file with Python can be found on Stack Overflow: https://stackoverflow.com/questions/45549313/