python - How do I build a Python list from text using regular expressions?

I have the following chunk of text:

text = """SECTION 1. CHAPTER 1. Chapter title. Art. 1.- Lorem ipsum, blah, blah. Art 2.- More meaningless text. Art 3.- A little more text. CHAPTER 2. Another chapter. Art 4.- Lorem ipsum blah, blah, blah. Art. 5.- It's getting boring. SECTION 2. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 3. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 4. CHAPTER 1. Another chapter in another section. Art. 6.- The last text."""

I want to split it into sections like this:

import re

RE = r'(SECTION.*?SECTION)'
m = re.findall(RE, text, re.DOTALL)
sections = []
if m:
    for match in m:
        sections.append(match)

I expected this to produce a list with 4 elements, but I only get 2:

['SECTION 1. .....', 'SECTION 3. .....']  # only showing the first letters of each element
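Why only two: re.findall returns non-overlapping matches, and each match of the pattern above consumes the SECTION keyword that begins the next section. The search then resumes after that keyword, so sections 2 and 4 never become starting points and their text is lost (section 4 also has no following SECTION to end on). A minimal sketch of a regex-only fix, assuming the same text: replace the trailing SECTION with the lookahead (?=SECTION|\Z), which checks for the next keyword or the end of the string without consuming it. The name SECTION_RE is illustrative.

```python
import re

text = """SECTION 1. CHAPTER 1. Chapter title. Art. 1.- Lorem ipsum, blah, blah. Art 2.- More meaningless text. Art 3.- A little more text. CHAPTER 2. Another chapter. Art 4.- Lorem ipsum blah, blah, blah. Art. 5.- It's getting boring. SECTION 2. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 3. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 4. CHAPTER 1. Another chapter in another section. Art. 6.- The last text."""

# Lazy .*? stops at the first position where the lookahead succeeds:
# either the next "SECTION" or the end of the string (\Z).
# The lookahead is zero-width, so the next match can start at that "SECTION".
SECTION_RE = re.compile(r'SECTION.*?(?=SECTION|\Z)', re.DOTALL)

# strip() removes the space left between one section and the next keyword.
sections = [s.strip() for s in SECTION_RE.findall(text)]
```

This yields four elements, one per section, each starting with its own "SECTION n." heading.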

After that, I want to do the same with the chapters and the articles.

Any ideas?

Best answer

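Assuming the word SECTION only appears in the string where a new section begins, you can simply use the built-in str.split method, which is easier than using a regular expression.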

Here is an example:

text = """SECTION 1. CHAPTER 1. Chapter title. Art. 1.- Lorem ipsum, blah, blah. Art 2.- More meaningless text. Art 3.- A little more text. CHAPTER 2. Another chapter. Art 4.- Lorem ipsum blah, blah, blah. Art. 5.- It's getting boring. SECTION 2. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 3. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 4. CHAPTER 1. Another chapter in another section. Art. 6.- The last text."""

delimiter = 'SECTION'
sections = [delimiter + s for s in text.split(delimiter)[1:]]

The result is:

>>> sections
['SECTION 1. ...', 'SECTION 2. ...', 'SECTION 3. ...', 'SECTION 4. ...']
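The question also asks to do the same for chapters and articles. A minimal sketch that extends the splitting idea with re.split on zero-width lookaheads (supported since Python 3.7), assuming headings always look like "SECTION n.", "CHAPTER n.", and "Art. n.-" or "Art n.-" as in the sample text. The helpers split_before and parse are hypothetical names. Anchoring on the number also means the word SECTION appearing in body text would not trigger a split.

```python
import re

text = """SECTION 1. CHAPTER 1. Chapter title. Art. 1.- Lorem ipsum, blah, blah. Art 2.- More meaningless text. Art 3.- A little more text. CHAPTER 2. Another chapter. Art 4.- Lorem ipsum blah, blah, blah. Art. 5.- It's getting boring. SECTION 2. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 3. CHAPTER 1. Another chapter in another section. Art. 6.- The last text. SECTION 4. CHAPTER 1. Another chapter in another section. Art. 6.- The last text."""

# Heading patterns assumed from the sample text.
SECTION = r'SECTION \d+\.'
CHAPTER = r'CHAPTER \d+\.'
ARTICLE = r'Art\.? ?\d+\.-'  # matches both "Art. 1.-" and "Art 2.-"


def split_before(pattern, s):
    """Split s just before every match of pattern, keeping the heading
    at the start of each piece; empty pieces are dropped."""
    parts = re.split(f'(?={pattern})', s)
    return [p.strip() for p in parts if p.strip()]


def parse(text):
    """Build {section heading: {chapter heading: [article, ...]}}."""
    tree = {}
    for section in split_before(SECTION, text):
        # The first piece is the section heading, the rest are chapters.
        section_head, *chapters = split_before(CHAPTER, section)
        tree[section_head] = {}
        for chapter in chapters:
            # The first piece is the chapter heading and title, the rest are articles.
            chapter_head, *articles = split_before(ARTICLE, chapter)
            tree[section_head][chapter_head] = articles
    return tree


tree = parse(text)
```

Headings are used as dictionary keys, so this assumes they are unique within their parent; if they can repeat, a list of (heading, children) pairs would be a safer structure.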

Original question on Stack Overflow: https://stackoverflow.com/questions/33976539/
