Python: best way to loop over a file, match the two lines after a pattern, and join them

Tags: python regex

I have a text file that looks like this:

(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)

I also have a few simple lines of Python that loop over the file:

with open('some_file.txt') as in_file:
    for line in in_file:
        line = line.rstrip()
        if "word" in line:
            pass  # some processing
        # here I don't know how to process the two lines from the txt file

How can I match these two lines in the .txt file:

ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2

and then join them into a single string like this?

address_string = "ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2"

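Things to know:

  • :50: always appears exactly like that, but can be followed by any other words
  • :52A: always appears exactly like that, but can be followed by any other words
  • there are also lines that do not start with the : character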

Best answer

If you only want the two lines:

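def sections(fle):
    with open(fle) as f:
        # Lazily strip each line (Python 3; on Python 2 use itertools.imap).
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                # Join the two lines that follow the :50: line. The "" default
                # avoids a RuntimeError (PEP 479) if the file ends early.
                yield next(stp, "") + next(stp, "")
                break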

Output:

In [4]: cat in.txt
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)

In [5]: list(sections("in.txt"))
Out[5]: ['ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2']
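
If the number of lines to take might change, the two next() calls can be generalised with itertools.islice. The following is only a sketch, not part of the original answer: the function name lines_after and its prefix and count parameters are made up for illustration, and it takes any iterable of lines (an open file works) instead of a filename so that it can be tried without a file on disk.

```python
from itertools import islice

def lines_after(lines, prefix=":50:", count=2):
    """Join the `count` lines that follow the first line starting with `prefix`.

    Hypothetical helper: returns None when no line starts with `prefix`.
    """
    stripped = map(str.strip, lines)
    for line in stripped:
        if line.startswith(prefix):
            # islice pulls from the same iterator, so it continues
            # right after the matching line.
            return "".join(islice(stripped, count))
    return None

sample = [
    "(...more lines here...)\n",
    ":50:SOME RANDOM WORDS\n",
    "ORDERING ADDRESS LINE1\n",
    "ORDERING ADDRESS LINE2\n",
    ":52A:OTHER RANDOM CHARS\n",
    "(...more lines here...)\n",
]
print(lines_after(sample))
```

If fewer than count lines remain after the match, islice simply stops early and the function returns whatever was available.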

If you have an unknown number of lines and repeated sections:

from itertools import takewhile

def sections(fle):
    with open(fle) as f:
        # Lazily strip each line (Python 3; on Python 2 use itertools.imap).
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                # Consume lines up to and including the next :52A: line
                # and join everything before it.
                yield "".join(takewhile(lambda x: not x.startswith(":52A:"), stp))

This grabs multiple sections:

In [7]: cat in.txt
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE3
ORDERING ADDRESS LINE4
:52A:OTHER RANDOM CHARS
(...more lines here...)

In [8]: list(sections("in.txt"))
Out[8]: 
['ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2',
 'ORDERING ADDRESS LINE3ORDERING ADDRESS LINE4']
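
Since the question is tagged regex, here is a rough regex-based alternative. It is a sketch rather than the answer's method: the names SECTION_RE and sections_from_text are invented for this example, and it assumes the whole file fits in memory and has already been read into a string (for example with f.read()).

```python
import re

# Assumption: a section starts at a line beginning with ":50:" and ends at
# the next line beginning with ":52A:", as described in the question.
SECTION_RE = re.compile(
    r"^:50:[^\n]*\n"   # the :50: header line itself (not captured)
    r"(.*?)"           # the lines in between, matched lazily
    r"^:52A:",         # start of the closing :52A: line
    re.MULTILINE | re.DOTALL,
)

def sections_from_text(text):
    """Return the joined body of every :50: ... :52A: section in text."""
    return [
        "".join(part.strip() for part in body.splitlines())
        for body in SECTION_RE.findall(text)
    ]

sample = """(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE3
ORDERING ADDRESS LINE4
:52A:OTHER RANDOM CHARS
"""
print(sections_from_text(sample))
```

The generator versions read the file line by line, so they are probably the better fit for large files; the regex version trades that for brevity.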

You can also use a nested loop:

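def sections(fle):
    with open(fle) as f:
        # Lazily strip each line (Python 3; on Python 2 use itertools.imap).
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                tmp = []
                # The inner loop advances the same iterator as the outer one.
                for line in stp:
                    if line.startswith(":52A:"):
                        yield "".join(tmp)
                        break
                    tmp.append(line)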

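It behaves the same as the previous function, with one difference: if the file ends before a closing :52A: line, this version yields nothing for that section, while the takewhile version yields the lines collected so far.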

If you only want the first section, you can return instead of yield, or just call next on the generator:

In [25]: next(sections("in.txt"))
Out[25]: 'ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2'
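
One caveat with a bare next(): it raises StopIteration when the generator yields nothing, for example when the file has no :50: line. Passing a default as the second argument avoids that. The sketch below is an adaptation, not the original code: this sections takes an iterable of lines instead of a filename so the example runs without a file, and the two sample lists are made up.

```python
def sections(lines):
    """Yield the joined lines between each :50: line and the next :52A: line."""
    stp = map(str.strip, lines)
    for line in stp:
        if line.startswith(":50:"):
            tmp = []
            for line in stp:
                if line.startswith(":52A:"):
                    yield "".join(tmp)
                    break
                tmp.append(line)

with_section = [
    ":50:SOME RANDOM WORDS\n",
    "ORDERING ADDRESS LINE1\n",
    "ORDERING ADDRESS LINE2\n",
    ":52A:OTHER RANDOM CHARS\n",
]
without_section = ["(...more lines here...)\n"]

# next(iterator, default) returns the default instead of raising StopIteration.
first = next(sections(with_section), None)
missing = next(sections(without_section), None)
print(first)
print(missing)
```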

For "Python: best way to loop over a file, match the two lines after a pattern, and join them", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35793699/
