python - A better way to split a file with different position markers in Python

I have a file of the following type:

--- part0 ---
some
strings
--- part1 ---
some other
strings
--- part2 ---
...

I would like to get any part of the file as a Python list:

x = get_part_of_file(part=0)
print x # => should print ['some', 'strings']
x = get_part_of_file(part=1)
print x # => should print ['some other', 'strings']

So my question is: what is the simplest way to implement the get_part_of_file method used above?

My (ugly) solution is as follows:

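import re

def get_part_of_file(part, separate_str="part"):
    def does_match_to_separate(line, part_num):
        # True if the line is the marker of the given part, e.g. "--- part0 ---"
        pattern = r"-+\s*{}{}\s*-+\s*$".format(re.escape(separate_str), part_num)
        return re.match(pattern, line)

    def get_first_line_num_appearing_separate_str(lines, part_num):
        # index of the first marker line for part_num, or len(lines) if there is none
        for line_num, line in enumerate(lines):
            if does_match_to_separate(line, part_num):
                return line_num
        return len(lines)

    with open("my_file.txt") as f:
        lines = f.read().splitlines()

    # get first line number of the required part (the line after its marker)
    first_line_num = get_first_line_num_appearing_separate_str(lines, part) + 1
    # get last line number of the required part (the next marker, or end of file)
    last_line_num = get_first_line_num_appearing_separate_str(lines, part + 1)
    return lines[first_line_num:last_line_num]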

Best answer

You can use a regular expression to parse the string. Have a look at the example below and try it out on regex101:

--- part(?P<part_number>\d+) ---\s(?P<part_value>[\w\s]*)

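This parses the given string into the following groups:

Match 1   part_number   [8-9]     0
          part_value    [14-27]   some strings
Match 2   part_number   [35-36]   1
          part_value    [41-60]   some other strings

In Python you can then get all groups with: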

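import re

# the pattern shown above; the text is read from the file in the question
your_regex_pattern = r"--- part(?P<part_number>\d+) ---\s(?P<part_value>[\w\s]*)"
with open("my_file.txt") as f:
    text = f.read()

parts = re.finditer(your_regex_pattern, text)

for p in parts:
    print("Part %s: %s" % (p.group('part_number'), p.group('part_value')))
    # or return the element with the part-number you want.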
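
To connect this back to the question, here is a minimal sketch of get_part_of_file built on the same pattern. The helper get_part_from_text, the PART_PATTERN name and the path parameter are assumptions made for illustration and are not part of the original answer.

```python
import re

# The pattern from the answer, compiled once.
PART_PATTERN = re.compile(r"--- part(?P<part_number>\d+) ---\s(?P<part_value>[\w\s]*)")

def get_part_from_text(text, part):
    """Return the lines of the requested part, or an empty list if it is absent."""
    for match in PART_PATTERN.finditer(text):
        if int(match.group("part_number")) == part:
            return match.group("part_value").splitlines()
    return []

def get_part_of_file(part, path="my_file.txt"):
    # path is a hypothetical parameter; the question hard-codes "my_file.txt"
    with open(path) as f:
        return get_part_from_text(f.read(), part)

sample = (
    "--- part0 ---\nsome\nstrings\n"
    "--- part1 ---\nsome other\nstrings\n"
    "--- part2 ---\n...\n"
)
print(get_part_from_text(sample, 0))  # ['some', 'strings']
print(get_part_from_text(sample, 1))  # ['some other', 'strings']
```

Keeping the parsing in a function that takes a string makes it easy to try out without touching the file system.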

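The only problem you might run into is that the regex pattern currently covers only word characters and whitespace, including newlines (\w\s). If your part values contain other characters, you have to extend the pattern to match more of them. For example, the "..." line under part2 in the sample file would not be captured.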
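
One possible way to extend the pattern, sketched here as an assumption rather than as part of the original answer, is to stop restricting the characters in the body and instead capture everything up to the next marker line or the end of the text. The names MARKER_PATTERN and split_parts are hypothetical.

```python
import re

# The body is matched lazily up to the next marker line or the end of the text.
# re.MULTILINE lets ^ and $ match at line boundaries; re.DOTALL lets . match newlines.
MARKER_PATTERN = re.compile(
    r"^--- part(?P<part_number>\d+) ---\n(?P<part_value>.*?)(?=^--- part\d+ ---$|\Z)",
    re.MULTILINE | re.DOTALL,
)

def split_parts(text):
    """Map each part number to the list of lines that follow its marker."""
    return {
        int(m.group("part_number")): m.group("part_value").splitlines()
        for m in MARKER_PATTERN.finditer(text)
    }

sample = (
    "--- part0 ---\nsome\nstrings\n"
    "--- part1 ---\nsome other, with punctuation!\nstrings\n"
    "--- part2 ---\n...\n"
)
parts = split_parts(sample)
print(parts[1])  # ['some other, with punctuation!', 'strings']
print(parts[2])  # ['...']
```

This sketch assumes that every marker sits on a line of its own and is followed by a newline.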

Source: a similar question on Stack Overflow, "python - A better way to split a file with different position markers in Python": https://stackoverflow.com/questions/32175092/
