Python: best way to loop over a file, match the two lines after a pattern, and join them

I have a text file that looks like this:

(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)

I also have a few simple lines of Python that loop over the file:

with open('some_file.txt') as in_file:
    for line in in_file:
        line = line.rstrip()
        if "word" in line:
            pass  # some processing
        # here I don't know how to process the two lines from the txt file

How can I match these two lines in the .txt file:

ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2

and then join them into a single string like this?

address_string = "ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2"

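Things to know:

:50: will always appear exactly like that, but any other words can follow it
:52A: will always appear exactly like that, but any other words can follow it
There are also lines that do not start with the : character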

Best answer

If you only want the two lines:

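def sections(fle):
    with open(fle) as f:
        # Python 3: map is lazy; on Python 2 use itertools.imap instead
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                # join the two lines that follow the :50: line
                yield next(stp, "") + next(stp, "")
                break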

Output:

In [4]: cat in.txt
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)

In [5]: list(sections("in.txt"))
Out[5]: ['ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2']
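
The two `next` calls can be generalised to "the N lines after the marker" with `itertools.islice`. The following is only a sketch under a few assumptions: the names `lines_after`, `marker`, and `count` are illustrative and not part of the original answer, and the function takes any iterable of lines (an open file works too) so it can be tried without a file on disk.

```python
from itertools import islice


def lines_after(lines, marker=":50:", count=2):
    """Join the `count` lines that follow the first line starting with `marker`.

    All names here are hypothetical; `lines` can be an open file or a list.
    """
    stripped = map(str.strip, lines)
    for line in stripped:
        if line.startswith(marker):
            # islice pulls at most `count` more items from the same iterator
            return "".join(islice(stripped, count))
    return None  # the marker was never found


sample = [
    "(...more lines here...)\n",
    ":50:SOME RANDOM WORDS\n",
    "ORDERING ADDRESS LINE1\n",
    "ORDERING ADDRESS LINE2\n",
    ":52A:OTHER RANDOM CHARS\n",
]
print(lines_after(sample))  # ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2
```

If fewer than `count` lines remain after the marker, `islice` simply stops early and the shorter result is returned.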

If you have an unknown number of lines and repeated sections:

from itertools import takewhile

def sections(fle):
    with open(fle) as f:
        # Python 3: map is lazy; on Python 2 use itertools.imap instead
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                # collect lines up to (not including) the next :52A: line
                yield "".join(takewhile(lambda x: not x.startswith(":52A:"), stp))

This grabs multiple sections:

In [7]: cat in.txt
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE3
ORDERING ADDRESS LINE4
:52A:OTHER RANDOM CHARS
(...more lines here...)

In [8]: list(sections("in.txt"))
Out[8]: 
['ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2',
 'ORDERING ADDRESS LINE3ORDERING ADDRESS LINE4']
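
The same `takewhile` idea can be tried without a file by passing the lines in directly and making the two tags parameters. This is a sketch, not the answer's own code: `iter_sections`, `start`, and `end` are assumed names introduced for illustration.

```python
from itertools import takewhile


def iter_sections(lines, start=":50:", end=":52A:"):
    """Yield the joined lines found between each `start` line and the next `end` line.

    Hypothetical helper: `lines` is any iterable of strings, e.g. an open file.
    """
    stripped = map(str.strip, lines)
    for line in stripped:
        if line.startswith(start):
            # takewhile consumes the shared iterator up to and including the `end` line
            yield "".join(takewhile(lambda x: not x.startswith(end), stripped))


sample = [
    "(...more lines here...)",
    ":50:SOME RANDOM WORDS",
    "ORDERING ADDRESS LINE1",
    "ORDERING ADDRESS LINE2",
    ":52A:OTHER RANDOM CHARS",
    "(...more lines here...)",
    ":50:SOME RANDOM WORDS",
    "ORDERING ADDRESS LINE3",
    "ORDERING ADDRESS LINE4",
    ":52A:OTHER RANDOM CHARS",
]
print(list(iter_sections(sample)))
```

Because the outer loop and `takewhile` share one iterator, each line is read exactly once and the outer loop resumes right after the `end` line.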

You can also use a nested loop:

def sections(fle):
    with open(fle) as f:
        # Python 3: map is lazy; on Python 2 use itertools.imap instead
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                tmp = []
                # collect lines until the closing :52A: line
                for line in stp:
                    if line.startswith(":52A:"):
                        yield "".join(tmp)
                        break
                    tmp.append(line)

For input in which every :50: section is closed by a :52A: line, it behaves the same as the previous function.
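
One edge case where the two versions can differ is a :50: section that is never closed by a :52A: line. The sketch below compares them on in-memory lines; the function names `with_takewhile` and `with_nested_loop` are illustrative only, and stripping is omitted because the sample lines carry no whitespace.

```python
from itertools import takewhile


def with_takewhile(lines):
    it = iter(lines)
    for line in it:
        if line.startswith(":50:"):
            # yields whatever was collected, even if :52A: never appears
            yield "".join(takewhile(lambda x: not x.startswith(":52A:"), it))


def with_nested_loop(lines):
    it = iter(lines)
    for line in it:
        if line.startswith(":50:"):
            tmp = []
            for inner in it:
                if inner.startswith(":52A:"):
                    # only yields once the closing tag is actually seen
                    yield "".join(tmp)
                    break
                tmp.append(inner)


unterminated = [
    ":50:SOME RANDOM WORDS",
    "ORDERING ADDRESS LINE1",
    "ORDERING ADDRESS LINE2",
]
print(list(with_takewhile(unterminated)))    # ['ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2']
print(list(with_nested_loop(unterminated)))  # []
```

Which behaviour is preferable depends on whether a truncated section should be reported or silently dropped.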

If you only want the first one, you can return instead of yield, or just call next on the generator:

In [25]: next(sections("in.txt"))
Out[25]: 'ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2'
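
Calling `next` on a generator that yields nothing raises `StopIteration`, so when the file may not contain a :50: line at all it can help to pass a default as the second argument. A minimal sketch, using an assumed in-memory variant named `two_lines_after` rather than the file-based function above:

```python
def two_lines_after(lines, marker=":50:"):
    """Hypothetical in-memory variant: yield the two lines after each marker line."""
    stripped = map(str.strip, lines)
    for line in stripped:
        if line.startswith(marker):
            yield next(stripped, "") + next(stripped, "")


no_match = ["FOO\n", "BAR\n"]

# The second argument to next() is returned instead of raising StopIteration
first = next(two_lines_after(no_match), None)
print(first)  # None
```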

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35793699/
