python - Storing multiple lines from a file into a variable using a delimiter

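I'm using Python to build a filter that searches thousands of text files for specific queries. The text files consist of several sections, and their formatting is not always consistent. I want to check each section for specific conditions, so for the section of the text file called "DESCRIPTION OF RECORD", I'm doing something like this to store the string in a variable: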

with open(some_file, 'r') as r:
    for line in r:
        if "DESCRIPTION OF RECORD" in line:
            record = line

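This works well for most files, but some files have line breaks inside that section, so the whole section does not get stored in the variable. I'd like to know how to use a delimiter to control how many lines are stored in the variable. I could probably use the heading of the next section, "CORRELATION", as the delimiter. Any ideas?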

An example file might be structured like this:

CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info 
CLINICAL CORRELATION: The last bit of information

Best answer

You can use the built-in re module, like this:

import re

# I assume you have a list of all possible sections
sections = [
    'CLINICAL HISTORY',
    'MEDICATIONS',
    'INTRODUCTION',
    'DESCRIPTION OF THE RECORD',
    'IMPRESSION',
    'CLINICAL CORRELATION'
]

# Build a regexp that will match any of the section names
# (re.escape guards against regex special characters in a section name)
exp = '|'.join(re.escape(s) for s in sections)

with open(some_file, 'r') as r:
    contents_of_file = r.read()
    infos = re.split(exp, contents_of_file)  # infos is a list of what's between the section names
    infos = [info.strip('\n :') for info in infos]  # strip colons, spaces and newlines around each info
    print(infos)  # you don't have to print it :)


If I use your sample text instead of a file, it prints something like:

['', 'Some information.', 'Other information', 'Some more information.', 'Some information here....\nanother line of information', 'More info', 'The last bit of information']


The first element is empty, but you can simply remove it with:

infos = infos[1:]
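If you also need to know which section each piece of text came from (for example, to check only DESCRIPTION OF THE RECORD), one variation is to wrap the alternation in a capturing group: re.split then keeps the matched headings in its result, so headings and bodies can be paired into a dict and the empty first element is skipped automatically. This is a minimal sketch, assuming every heading starts a line and is followed by a colon, as in the sample; `split_sections` is an illustrative helper name, not part of the answer above.

```python
import re

# Section headings assumed to be known in advance (taken from the sample file)
sections = [
    'CLINICAL HISTORY',
    'MEDICATIONS',
    'INTRODUCTION',
    'DESCRIPTION OF THE RECORD',
    'IMPRESSION',
    'CLINICAL CORRELATION',
]

# The capturing group makes re.split return each heading as well as the text
# between headings; ^ with re.MULTILINE only matches at the start of a line,
# and the trailing ':' requires the colon that follows each heading.
pattern = re.compile(
    r'^(' + '|'.join(re.escape(s) for s in sections) + r'):',
    re.MULTILINE,
)

def split_sections(text):
    """Return a dict mapping each heading found in text to its (stripped) body."""
    parts = pattern.split(text)
    # parts == [text before first heading, heading1, body1, heading2, body2, ...]
    # parts[0] (anything before the first heading) is discarded here.
    return {
        heading: body.strip()
        for heading, body in zip(parts[1::2], parts[2::2])
    }

sample = """CLINICAL HISTORY: Some information.
MEDICATIONS: Other information
INTRODUCTION: Some more information.
DESCRIPTION OF THE RECORD: Some information here....
another line of information
IMPRESSION: More info
CLINICAL CORRELATION: The last bit of information
"""

records = split_sections(sample)
print(records['DESCRIPTION OF THE RECORD'])
```

Because the pattern is anchored to the start of a line, a heading word that appears mid-sentence does not start a new section. If a heading occurred twice in one file, the dict would keep only its last body.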


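By the way, if we merge the lines that process infos into one, the code becomes more compact, though perhaps a little harder to read (any performance difference is negligible):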

infos = [info.strip('\n :') for info in re.split(exp, contents_of_file)][1:]
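Alternatively, if you would rather keep the line-by-line loop from the question and use the next heading as the delimiter, as the question suggests, a flag can track whether you are inside the wanted section. Stopping at any known heading, rather than only at CORRELATION, matters because in the sample the section after DESCRIPTION OF THE RECORD is IMPRESSION. This is a minimal sketch, assuming each heading starts its line; `extract_section` and `SECTIONS` are illustrative names, and io.StringIO only stands in for an open file.

```python
import io

# Known headings; assumed to appear at the start of a line, as in the sample
SECTIONS = (
    'CLINICAL HISTORY',
    'MEDICATIONS',
    'INTRODUCTION',
    'DESCRIPTION OF THE RECORD',
    'IMPRESSION',
    'CLINICAL CORRELATION',
)

def extract_section(lines, start, stops=SECTIONS):
    """Return the text from the line starting with `start` up to, but not
    including, the next line that starts with one of `stops` (a tuple)."""
    collected = []
    inside = False
    for line in lines:
        if inside and line.startswith(stops):
            break  # reached the next section heading: stop reading
        if not inside and line.startswith(start):
            inside = True  # found the heading of the wanted section
        if inside:
            collected.append(line.rstrip('\n'))
    return '\n'.join(collected)

sample = io.StringIO(
    "CLINICAL HISTORY: Some information.\n"
    "MEDICATIONS: Other information\n"
    "INTRODUCTION: Some more information.\n"
    "DESCRIPTION OF THE RECORD: Some information here....\n"
    "another line of information\n"
    "IMPRESSION: More info\n"
    "CLINICAL CORRELATION: The last bit of information\n"
)

# With a real file, pass the open file object instead:
#     with open(some_file) as r:
#         record = extract_section(r, 'DESCRIPTION OF THE RECORD')
record = extract_section(sample, 'DESCRIPTION OF THE RECORD')
print(record)
```

Unlike the read() approach, this iterates over the file one line at a time and stops as soon as the wanted section ends, so the whole file never has to be held in memory. The heading line itself is kept, matching what the question's original loop stored.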

For a similar question about python - Storing multiple lines from a file into a variable using a delimiter, see Stack Overflow: https://stackoverflow.com/questions/36187884/
