I have a text file that looks like this:
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)
I also have a few simple lines of Python that loop over the file:
with open('some_file.txt') as in_file:
    for line in in_file:
        line = line.rstrip()
        if "word" in line:
            pass  # some processing
        # if ...: here I don't know how to process the two lines from the txt
How can I match these two lines in the .txt file:
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
and then join them into a single string like this?
address_string = "ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2"
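Things to know:
- :50: will always appear exactly as shown, but any other words may follow it
- :52A: will always appear exactly as shown, but any other words may follow it
- there are also lines that do not start with the : character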
Best answer
If you only want the two lines:
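def sections(fle):
    with open(fle) as f:
        # map is lazy in Python 3; on Python 2 use itertools.imap instead
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                # join the two lines that follow the :50: line
                # (the "" default avoids an error if the file ends early)
                yield next(stp, "") + next(stp, "")
                break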
Output:
In [4]: cat in.txt
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)
In [5]: list(sections("in.txt"))
Out[5]: ['ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2']
If you have an unknown number of lines and repeated sections:
from itertools import takewhile

def sections(fle):
    with open(fle) as f:
        # map is lazy in Python 3; on Python 2 use itertools.imap instead
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                # join every line up to (but not including) the next :52A: line
                yield "".join(takewhile(lambda x: not x.startswith(":52A:"), stp))
This will grab multiple sections:
In [7]: cat in.txt
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
(...more lines here...)
(...more lines here...)
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE3
ORDERING ADDRESS LINE4
:52A:OTHER RANDOM CHARS
(...more lines here...)
In [8]: list(sections("in.txt"))
Out[8]:
['ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2',
'ORDERING ADDRESS LINE3ORDERING ADDRESS LINE4']
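To try the takewhile approach without a file on disk, here is a minimal sketch that accepts any iterable of lines. The name sections_from_lines, the SAMPLE text and its :20:HEADER line are made up for illustration and are not part of the original answer.

```python
import io
from itertools import takewhile

# Made-up sample input, modelled on the file shown above.
SAMPLE = """\
:20:HEADER
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE1
ORDERING ADDRESS LINE2
:52A:OTHER RANDOM CHARS
:50:SOME RANDOM WORDS
ORDERING ADDRESS LINE3
ORDERING ADDRESS LINE4
:52A:OTHER RANDOM CHARS
"""


def sections_from_lines(lines):
    """Yield the joined lines found between each :50: line and the next :52A: line."""
    stripped = map(str.strip, lines)  # lazy iterator in Python 3
    for line in stripped:
        if line.startswith(":50:"):
            # takewhile also consumes the :52A: line, so the outer loop
            # resumes with the line after it.
            yield "".join(takewhile(lambda x: not x.startswith(":52A:"), stripped))


result = list(sections_from_lines(io.StringIO(SAMPLE)))
print(result)
```

Because the outer loop and takewhile share a single iterator, each line is read only once and the whole file never has to be held in memory.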
You can also use a nested loop:
def sections(fle):
    with open(fle) as f:
        # map is lazy in Python 3; on Python 2 use itertools.imap instead
        stp = map(str.strip, f)
        for line in stp:
            if line.startswith(":50:"):
                tmp = []
                # keep collecting lines until the closing :52A: line
                for line in stp:
                    if line.startswith(":52A:"):
                        yield "".join(tmp)
                        break
                    tmp.append(line)
As long as every :50: section is closed by a :52A: line, it behaves exactly like the previous function.
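The one edge case where the two variants can differ is a :50: section that is never closed by a :52A: line. The sketch below runs both on such an input; the function names and the sample list are assumptions made for this comparison, and both functions take an iterable of lines rather than a filename so the example is self-contained.

```python
from itertools import takewhile


def with_takewhile(lines):
    stripped = map(str.strip, lines)
    for line in stripped:
        if line.startswith(":50:"):
            # Yields whatever was collected, even if :52A: never shows up.
            yield "".join(takewhile(lambda x: not x.startswith(":52A:"), stripped))


def with_nested_loop(lines):
    stripped = map(str.strip, lines)
    for line in stripped:
        if line.startswith(":50:"):
            tmp = []
            for inner in stripped:
                if inner.startswith(":52A:"):
                    # Only yields once the closing :52A: line is seen.
                    yield "".join(tmp)
                    break
                tmp.append(inner)


# Made-up input: the section runs to the end of the data without a :52A: line.
unterminated = [
    ":50:SOME RANDOM WORDS\n",
    "ORDERING ADDRESS LINE1\n",
    "ORDERING ADDRESS LINE2\n",
]

takewhile_result = list(with_takewhile(unterminated))
nested_result = list(with_nested_loop(unterminated))
print(takewhile_result)
print(nested_result)
```

On this input the takewhile variant still yields the partial section, while the nested loop yields nothing. Which behaviour is preferable depends on whether a truncated section should count as a match.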
If you only want the first match, you can return instead of yield, or just call next on the generator:
In [25]: next(sections("in.txt"))
Out[25]: 'ORDERING ADDRESS LINE1ORDERING ADDRESS LINE2'
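Calling next on a generator that never yields raises StopIteration, which would happen here if the file contains no :50: line. A hedged sketch of one way to guard against that is to pass a default as the second argument to next. The helper sections_from_lines and both sample lists are made up for illustration.

```python
from itertools import takewhile


def sections_from_lines(lines):
    """Same idea as sections() above, but takes an iterable of lines."""
    stripped = map(str.strip, lines)
    for line in stripped:
        if line.startswith(":50:"):
            yield "".join(takewhile(lambda x: not x.startswith(":52A:"), stripped))


# Made-up inputs: one with a :50: section, one without.
has_match = [
    ":50:SOME RANDOM WORDS\n",
    "ORDERING ADDRESS LINE1\n",
    "ORDERING ADDRESS LINE2\n",
    ":52A:OTHER RANDOM CHARS\n",
]
no_match = [":20:SOMETHING ELSE\n", ":52A:OTHER RANDOM CHARS\n"]

# The second argument to next() is returned instead of raising StopIteration.
first = next(sections_from_lines(has_match), None)
missing = next(sections_from_lines(no_match), None)
print(first)
print(missing)
```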
The original question, "Python: best way to loop over a file, match the two lines after a pattern, then join them", is on Stack Overflow: https://stackoverflow.com/questions/35793699/