python - How to split text by 1), 2)?

I want to split my text by subsections 1., 2., ...

import re

s = "1. First sentence. \n2. Second sentence. \n1. Another sentence. \n3. Third sentence."

l = re.compile('\n(?=[0-9].)').split(s)

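With my regex I get: ['1. First sentence. ', '2. Second sentence. ', '1. Another sentence. ', '3. Third sentence.']

But I only want to split when the number is greater than the previous one: ['1. First sentence. ', '2. Second sentence. 1. Another sentence. ', '3. Third sentence.']

For this example, I want a list with 3 elements instead of 4.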

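Best answer

You cannot do this with a regex alone, because the regex engine matches text as text; it cannot increment or decrement the numeric values it finds, nor compare them while matching. You can only do that after you have obtained all the matches.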
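To illustrate the point, here is a minimal sketch (the names `nums` and `increasing` are only for illustration): the regex extracts the item numbers as strings, and the numeric comparison has to happen in Python afterwards.

```python
import re

s = "1. First sentence. \n2. Second sentence. \n1. Another sentence. \n3. Third sentence."

# The regex can only find the digits as text; convert them to int ourselves.
nums = [int(n) for n in re.findall(r'^([0-9]+)\.', s, flags=re.M)]

# Compare each number with the one before it, outside the regex engine.
increasing = [b > a for a, b in zip(nums, nums[1:])]

print(nums)        # [1, 2, 1, 3]
print(increasing)  # [True, False, True]
```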

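I suggest using a regex to extract all the bullet points together with their numbers, then analyzing the results and rebuilding the final list: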

import re
s = "1. First sentence. \n2. Second sentence. \n1. Another sentence. \n3. Third sentence."
l = re.findall(r'(?:^|\n)(([0-9]+)\.[\s\S]*?)(?=\n[0-9]+\.|\Z)', s)
curr_num = 0                  # Init the current number to 0
result = []                   # The final bullet point list
for text, num in l:           # Iterate over the (item text, number) tuples
    if curr_num > int(num):   # If curr_num is greater than the number found
        if not result:        # If it is the first item,
            result = ['']     #    we need to add an empty item
        result[-1] += text    # Append the text to the last item
    else:                     # else
        result.append(text)   # Append the item to the resulting list
    curr_num = int(num)       # Assign the current number

print(result)
# => ['1. First sentence. ', '2. Second sentence. 1. Another sentence. ', '3. Third sentence.']

See the Python demo and the regex demo.
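The loop above compares each number with the number of the item immediately before it. If a nested list can restart and then count up again (for example 1., 2., 1., 2., 3.), one possible variant is to compare against the number that opened the current top-level item instead. The sketch below assumes that rule; the function name `split_on_increasing` and the variable `top_num` are hypothetical, and unlike the code above it also merges an item whose number equals the current one.

```python
import re

# Same pattern as in the answer: group 1 is the whole item, group 2 its number.
ITEM_RE = re.compile(r'(?:^|\n)(([0-9]+)\.[\s\S]*?)(?=\n[0-9]+\.|\Z)')

def split_on_increasing(text):
    """Start a new item only when the number exceeds the number that
    opened the current item (an assumed rule, not from the question)."""
    result = []
    top_num = 0  # number that opened the most recent top-level item
    for item, num in ITEM_RE.findall(text):
        n = int(num)
        if result and n <= top_num:
            result[-1] += item      # nested or restarted numbering: merge
        else:
            result.append(item)     # larger number: start a new item
            top_num = n
    return result

s = "1. First sentence. \n2. Second sentence. \n1. Another sentence. \n3. Third sentence."
print(split_on_increasing(s))
# => ['1. First sentence. ', '2. Second sentence. 1. Another sentence. ', '3. Third sentence.']
```

For the sample string this gives the same three-element list as the code above; the two approaches only differ on inputs where the nested numbering climbs again or repeats a number.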

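Details:

(?:^|\n) - start of the string or a newline
(([0-9]+)\.[\s\S]*?) - Group 1, which matches:
- ([0-9]+) - Group 2: one or more digits
- \. - a dot
- [\s\S]*? - any zero or more characters, as few as possible
(?=\n[0-9]+\.|\Z) - a positive lookahead: up to the leftmost newline followed by one or more digits and a dot (\n[0-9]+\.), or the end of the string (\Z).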
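As a quick check of the breakdown above, this small sketch prints what re.findall returns for the sample string. Because the pattern has two capturing groups, each match comes back as a (Group 1, Group 2) tuple; the names `pattern` and `matches` are only for illustration.

```python
import re

s = "1. First sentence. \n2. Second sentence. \n1. Another sentence. \n3. Third sentence."
pattern = re.compile(r'(?:^|\n)(([0-9]+)\.[\s\S]*?)(?=\n[0-9]+\.|\Z)')

# Each element is (whole item text, item number as a string).
matches = pattern.findall(s)
for text, num in matches:
    print(repr(text), num)
# '1. First sentence. ' 1
# '2. Second sentence. ' 2
# '1. Another sentence. ' 1
# '3. Third sentence.' 3
```

Note that the leading newline is consumed by (?:^|\n) outside Group 1, so it does not appear in the extracted item text.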

Regarding "python - How to split text by 1), 2)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65577485/