I run into this problem a lot. Suppose I have a text file that I have read in as a list using file.readlines().
Suppose the file looks like this:
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff # indeterminate number of lines
The text I want is set off by something distinctive
I want this
I want this
I want this
I want this # indeterminate number of lines
The end is also identifiable by something distinctive
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
The way I have been handling this is to do something like this:
themasterlist = []
for file in filelist:
    count = 0
    templist = []
    for line in file:
        if line == 'The text I want is set off by something distinctive':
            count = 1
        if line == 'The end is also identifiable by something distinctive':
            count = 0
        if count == 1:
            templist.append(line)
    themasterlist.append(templist)
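I have considered reading the file as a string (file.read()), splitting it on the end points, and then converting the result to a list, but I really want to use this kind of construct for many other types as well. For example, suppose I am iterating over the elements of lxml.fromstring(somefile) and want to process a subset of the elements depending on whether element.text contains some phrase, and so on.
Note that I may run through 200,000 to 300,000 files at a time.
My solution works, but it feels clumsy, as if I am missing something important about Python.
There were three very good answers, and I learned something useful from each of them. I have to pick one as the accepted answer, but I really appreciate every poster's reply; they were all very helpful.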
Best answer
I like something like this:
def findblock(lines, start, stop):
    it = iter(lines)
    for line in it:
        if start in line:
            # now we are in the block, so yield till we find the end
            for line in it:
                if stop in line:
                    # let's just look for one block
                    return  # leave this generator
                    # break  # would keep looking for the next block
                yield line

for line in findblock(lines, start="something distinctive",
                      stop="something distinctive"):
    print(line)
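If you need every block rather than only the first, the `break` variant mentioned in the comment can be sketched roughly as follows. The name `find_all_blocks` and the `BEGIN`/`END` markers are made up for illustration, and the sketch assumes blocks are not nested; a block whose stop marker never appears is still yielded with whatever was collected.

```python
def find_all_blocks(lines, start, stop):
    """Yield each block found between a start and a stop marker as a list."""
    it = iter(lines)
    for line in it:
        if start in line:
            block = []
            # The inner loop consumes the same iterator, so the outer loop
            # resumes right after the stop marker.
            for line in it:
                if stop in line:
                    break
                block.append(line)
            # Reached at the stop marker, or when the input ran out
            yield block


# Hypothetical sample data standing in for file.readlines()
sample = [
    "stuff", "BEGIN", "I want this", "and this", "END",
    "stuff", "BEGIN", "second block", "END", "stuff",
]
blocks = list(find_all_blocks(sample, start="BEGIN", stop="END"))
```

Because the generator yields one list per block, each file still only has to be scanned once.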
What you are missing is yield and list comprehensions. Here is your code, reworked:
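def findblock(lines, start='The text I want is set off by something distinctive',
              stop='The end is also identifiable by something distinctive'):
    # Initialise the flag once, before the loop; resetting it on every
    # iteration would drop everything except the start line itself.
    inblock = False
    for line in lines:
        # If the lines still carry a trailing newline (as readlines()
        # returns them), strip it before comparing.
        if line == start:
            inblock = True
        if line == stop:
            inblock = False  # or return, maybe?
        if inblock:
            yield line  # like the original loop, this includes the start line

themasterlist = [list(findblock(file)) for file in files]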
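Since the question also asks about non-string items such as lxml elements, the same idea can probably be generalised by passing in predicates instead of marker strings. This is only a sketch: `block_between`, `is_start` and `is_stop` are hypothetical names, and the dictionaries below merely stand in for elements with a text attribute.

```python
from itertools import dropwhile, islice, takewhile


def block_between(items, is_start, is_stop):
    """Lazily yield the items strictly between the first start and stop item."""
    # Skip everything before the first item that satisfies is_start
    from_start = dropwhile(lambda item: not is_start(item), items)
    # Drop the start item itself
    after_start = islice(from_start, 1, None)
    # Stop just before the first item that satisfies is_stop
    return takewhile(lambda item: not is_stop(item), after_start)


# Hypothetical records standing in for parsed elements
records = [
    {"text": "header"},
    {"text": "begin section"},
    {"text": "keep me"},
    {"text": "keep me too"},
    {"text": "end section"},
    {"text": "footer"},
]
wanted = list(block_between(
    records,
    is_start=lambda r: "begin" in r["text"],
    is_stop=lambda r: "end" in r["text"],
))
```

Everything here is lazy, so nothing past the stop item is read, which may matter when running over hundreds of thousands of files.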
Regarding "python - How can I focus on a subset of a list in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/5060575/