python - How can I focus on a subset of a list in Python

I run into this problem often. Suppose I have a text file that I have read in as a list using file.readlines().

Suppose the file looks like this:

stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff # indeterminate number of lines
The text I want is set off by something distinctive
I want this
I want this
I want this
I want this # indeterminate number of lines
The end is also identifiable by something distinctive
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff

The way I have been handling this is to do something like this:

themasterlist = []
for file in filelist:
    count = 0  # 1 while we are inside the wanted block, 0 otherwise
    templist = []
    for line in file:
        if line == 'The text I want is set off by something distinctive':
            count = 1
        if line == 'The end is also identifiable by something distinctive':
            count = 0
        if count == 1:
            templist.append(line)
    themasterlist.append(templist)

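I have considered reading the file as a single string (file.read()), splitting it on the endpoints, and then converting the result to a list, but I really want to use this kind of construct for many other types as well. For example, suppose I am iterating over the elements of lxml.fromstring(somefile) and want to process a subset of the elements based on whether element.text contains certain phrases.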
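For comparison, the string-based alternative described above might look roughly like the sketch below. The helper name block_between is hypothetical, and the sketch assumes the text uses "\n" line endings and that each marker appears in the text; it returns an empty list otherwise.

```python
def block_between(text, start, stop):
    """Return the lines strictly between the line containing `start`
    and the next line containing `stop` (hypothetical helper)."""
    # Drop everything up to and including the start marker.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return []
    # Keep only what precedes the stop marker.
    body, found_stop, _ = rest.partition(stop)
    if not found_stop:
        return []
    # body begins with the remainder of the start line and ends with the
    # beginning of the stop line, so discard the first and last pieces.
    return body.split("\n")[1:-1]


sample = (
    "stuff\n"
    "The text I want is set off by something distinctive\n"
    "I want this\n"
    "I want this\n"
    "The end is also identifiable by something distinctive\n"
    "stuff\n"
)
wanted = block_between(sample, "The text I want", "The end is also")
```

This only works on strings, which is why it does not carry over to other iterables such as parsed elements.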

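Note that I may run this over 200,000 to 300,000 files at a time.

My solution works, but it feels clumsy, as if I am missing something important about Python.

There were three very good answers, and I learned something useful from each of them. I have to pick one as the accepted answer, but I really appreciate every poster's response; they were all very helpful.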

Best answer

I like something like this:

def findblock(lines, start, stop):
    it = iter(lines)
    for line in it:
        if start in line:
            # Now we are inside the block, so yield lines until we find the end.
            for line in it:
                if stop in line:
                    # Only look for the first block.
                    return  # leave this generator
                    # break  # would instead keep looking for the next block
                yield line

for line in findblock(lines, start="something distinctive",
                             stop="something distinctive"):
    print(line)
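If a file can contain several such blocks, one possible variation is to collect each block separately rather than stopping at the first. The sketch below is not part of the original answer; the name findblocks and the BEGIN/END markers are made up for illustration. A block whose stop marker never appears is still yielded with whatever lines were collected.

```python
def findblocks(lines, start, stop):
    """Yield each block as a list of the lines between a start marker
    and the following stop marker (hypothetical variation)."""
    it = iter(lines)
    for line in it:
        if start in line:
            block = []
            # Keep consuming the same iterator until the stop marker.
            for line in it:
                if stop in line:
                    break
                block.append(line)
            yield block


demo = ["x", "BEGIN", "a", "b", "END", "y", "BEGIN", "c", "END"]
blocks = list(findblocks(demo, "BEGIN", "END"))
```

Because both loops share one iterator, the outer loop resumes right after the stop line, so no line is examined twice.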

What you are missing is yield and list comprehensions. Here is your code, revised:

def findblock(lines, start='The text I want is set off by something distinctive',
                     stop='The end is also identifiable by something distinctive'):
    inblock = False  # initialized once, before the loop, so the state persists
    for line in lines:
        if line == start:
            inblock = True
        if line == stop:
            inblock = False  # or return, if only the first block is wanted
        if inblock:
            yield line

# filelist is the list of files from the question
themasterlist = [list(findblock(file)) for file in filelist]
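Since the question also asks about applying the same construct to other kinds of sequences, here is a minimal sketch that takes predicates instead of marker strings, built on itertools.dropwhile and itertools.takewhile. The function name between and its parameters are hypothetical; unlike the revised code above, it excludes the start item itself.

```python
from itertools import dropwhile, takewhile


def between(items, is_start, is_stop):
    """Yield the items after the first one matching is_start, up to but
    not including the next one matching is_stop (hypothetical helper)."""
    # Skip everything before the first item that matches is_start.
    it = dropwhile(lambda item: not is_start(item), items)
    # Discard the start item itself; harmless if nothing matched.
    next(it, None)
    # Pass items through until the stop condition is met.
    yield from takewhile(lambda item: not is_stop(item), it)


lines = ["stuff", "START here", "a", "b", "END here", "stuff"]
picked = list(between(lines, lambda s: "START" in s, lambda s: "END" in s))
numbers = list(between(range(10), lambda n: n == 3, lambda n: n == 7))
```

The same function should work on any iterable, for example parsed elements with a predicate such as `lambda el: "phrase" in (el.text or "")`, and because it is a generator it stops pulling from the input once the stop item is seen.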

This page is based on a similar question on Stack Overflow, "python - How can I focus on a subset of a list in Python": https://stackoverflow.com/questions/5060575/
