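python - re.findall multiline python

re.findall with re.M does not find the multiline text I am searching for

I am trying to extract every multiline string that matches a pattern from a file.

Example from the file book.txt: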

Title: Le Morte D'Arthur, Volume I (of II)
       King Arthur and of his Noble Knights of the Round Table

Author: Thomas Malory

Editor: William Caxton

Release Date: March, 1998  [Etext #1251]
Posting Date: November 6, 2009

Language: English

Title: Pride and Prejudice

Author: Jane Austen

Posting Date: August 26, 2008 [EBook #1342]
Release Date: June, 1998
Last Updated: October 17, 2016

Language: English

The following code returns only the first line of the title, Le Morte D'Arthur, Volume I (of II):

re.findall(r'^Title:\s(.+)$', book, re.M)

The output I expect is

["Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table", 'Pride and Prejudice']

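To clarify:
- The second line is optional: it is present in some files but not in others. There is also more text after the second line that I do not want to read.
- re.findall(r'Title: (.+\n.+)$', text, flags=re.MULTILINE) works, but it fails when the second line is blank.
- I am running Python 3.7.
- I read the txt file into a string and then run re on that str.
- The following do not work either:
re.findall(r'^Title:\s(.+)$', text, re.S)
re.findall(r'^Title:\s(.+)$', text, re.DOTALL)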
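
For reference, here is a minimal sketch of why these attempts behave as described. It assumes a shortened two-book sample held in a variable named text; the sample is illustrative, not the full file. With re.M alone, "." still stops at a newline, so every match is limited to one line. With re.S (an alias of re.DOTALL) alone, "^" matches only at the very start of the string and the greedy ".+" runs on to the end of the text.

```python
import re

# Shortened sample standing in for the contents of book.txt (assumption).
text = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

pattern = r'^Title:\s(.+)$'

# re.M: ^ and $ match at every line boundary, but "." does not cross "\n",
# so each captured title is cut off at the end of its first line.
multiline_only = re.findall(pattern, text, re.M)

# re.S: "." now crosses "\n", but ^ matches only at the start of the string,
# so there is a single match that swallows everything after the first "Title: ".
dotall_only = re.findall(pattern, text, re.S)

print(multiline_only)
print(len(dotall_only))
```

Neither flag on its own expresses "one line, plus an optional continuation line", which is why the pattern itself has to change.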

Best answer

My guess is that this expression,

(?<=Title:\s)(.*?)\s*(?=Author)

might be close to what you want.
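
To make the expression easier to read, here is the same pattern written with re.VERBOSE and one comment per part. The names TITLE_BEFORE_AUTHOR and sample are only illustrative. Note that the pattern relies on an Author field following every title; if a record had no Author line, the lazy group would keep going until the next one.

```python
import re

# Same pattern as above, annotated. re.DOTALL lets "." cross newlines.
TITLE_BEFORE_AUTHOR = re.compile(
    r"""
    (?<=Title:\s)   # lookbehind: start right after 'Title:' and one whitespace char
    (.*?)           # group 1: as few characters as possible (lazy)
    \s*             # trailing whitespace, kept out of group 1
    (?=Author)      # lookahead: stop just before the next 'Author'
    """,
    re.DOTALL | re.VERBOSE,
)

sample = "Title: Pride and Prejudice\n\nAuthor: Jane Austen\n"
found = TITLE_BEFORE_AUTHOR.findall(sample)
print(found)
```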

Test

import re

regex = r"(?<=Title:\s)(.*?)\s*(?=Author)"

# Each title must be followed by an Author line for the lookahead to succeed.
test_str = ("Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n\n"
    "Author: Thomas Malory\n\n"
    "Title: Pride and Prejudice\n\n"
    "Author: Jane Austen\n")

print(re.findall(regex, test_str, re.DOTALL))

Output

["Le Morte D'Arthur, Volume I (of II)\n       King Arthur and of his Noble Knights of the Round Table", 'Pride and Prejudice']
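
If relying on the Author field is not an option, another possibility is to match the title line plus at most one continuation line. This is only a sketch, not part of the original answer. It assumes that a continuation line is always indented with spaces or tabs and contains some non-blank text. The names TITLE and book are illustrative, and the sample is a shortened stand-in for book.txt.

```python
import re

# Title line, optionally followed by ONE indented, non-blank continuation line.
# Assumption: continuation lines start with spaces or tabs.
TITLE = re.compile(r"^Title:[ \t]*(.+(?:\n[ \t]+\S.*)?)$", re.MULTILINE)

# Shortened sample standing in for the contents of book.txt (assumption).
book = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

titles = TITLE.findall(book)
print(titles)

# Optional: collapse the line break and indentation into single spaces.
flat_titles = [" ".join(t.split()) for t in titles]
print(flat_titles)
```

Because the optional group requires indentation followed by a non-whitespace character, a second line that is empty or contains only whitespace is simply skipped, and an unindented line such as Author: is never absorbed into the title.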

Regarding python - re.findall multiline python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57097534/
