python - Grouping without creating new targets for re.findall()

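I'd like to generalize this question: is there a way to group parts of a pattern without those groups adding elements to the tuples returned by re.findall()?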

My example:

import re
line="(1 (2 (1 (1 (1 (2 You) (1 (2 (2 wo) (2 n't)) (2 (2 like) (2 Roger)))) (2 ,)) (2 but)) (2 (2 you) (3 (3 (2 will) (2 quickly)) (2 (2 recognize) (2 him))))) (2 .))\n"
numR=re.compile(r"\({1}(\d)? ((')*\w+|('|\.|,))\){1}")
re.findall(numR,line)
[('2', 'You', '', ''),
 ('2', 'wo', '', ''),
 ('2', 'like', '', ''),
 ('2', 'Roger', '', ''),
 ('2', ',', '', ','),
 ('2', 'but', '', ''),
 ('2', 'you', '', ''),
 ('2', 'will', '', ''),
 ('2', 'quickly', '', ''),
 ('2', 'recognize', '', ''),
 ('2', 'him', '', ''),
 ('2', '.', '', '.')]

As you can see, every tuple ends with 2 unnecessary elements.
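The extra elements come from how re.findall() reports capturing groups: with no groups it returns the whole match, with exactly one group it returns that group's text, and with two or more groups it returns a tuple containing one entry per group, using '' for groups that did not take part in the match. A small sketch on a shortened input (the string text is made up for illustration; the last pattern is the question's pattern unchanged):

```python
import re

text = "(2 You) (2 ,)"

# No capturing groups: each item is the whole matched text.
whole = re.findall(r"\(\d (?:[.',]|\w+)\)", text)

# Exactly one capturing group: each item is that group's text (a plain string).
one = re.findall(r"\(\d ([.',]|\w+)\)", text)

# Two or more capturing groups: each item is a tuple with one entry per group;
# groups that did not participate in the match are reported as ''.
many = re.findall(r"\({1}(\d)? ((')*\w+|('|\.|,))\){1}", text)

print(whole)  # ['(2 You)', '(2 ,)']
print(one)    # ['You', ',']
print(many)   # [('2', 'You', '', ''), ('2', ',', '', ',')]
```

The four-element tuples in the question are therefore just the four capturing groups in its pattern.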

Best answer

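Your pattern has 4 capturing groups. You can use a single alternation inside the second capturing group, which leaves only 2 capturing groups in total.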

\((\d) ([.',]|\w+)\)

Explanation

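- \( Match (
- (\d) Group 1, capture a single digit (use \d+ to match 1 or more digits)
- ( Group 2
  - [.',]|\w+ Match either one of the characters listed in the character class, or 1+ word characters
- ) Close group 2
- \) Match )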
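The breakdown above can also live inside the pattern itself. Below is a sketch that writes the same regex with re.VERBOSE so each piece carries its explanation as a comment (the name num_r is just for illustration). In verbose mode unescaped whitespace is ignored, so the literal space between the digit and the token has to be escaped.

```python
import re

line = "(1 (2 (1 (1 (1 (2 You) (1 (2 (2 wo) (2 n't)) (2 (2 like) (2 Roger)))) (2 ,)) (2 but)) (2 (2 you) (3 (3 (2 will) (2 quickly)) (2 (2 recognize) (2 him))))) (2 .))\n"

# Same pattern as \((\d) ([.',]|\w+)\), written with one comment per piece.
num_r = re.compile(
    r"""
    \(          # match a literal (
    (\d)        # group 1: a single digit (use \d+ for one or more digits)
    \           # a literal space, escaped because VERBOSE ignores bare whitespace
    (           # group 2:
        [.',]   #   one character from the class
        |\w+    #   or one or more word characters
    )           # close group 2
    \)          # match a literal )
    """,
    re.VERBOSE,
)

print(num_r.findall(line))  # the same 12 (digit, token) pairs as the compact pattern
```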


import re
line="(1 (2 (1 (1 (1 (2 You) (1 (2 (2 wo) (2 n't)) (2 (2 like) (2 Roger)))) (2 ,)) (2 but)) (2 (2 you) (3 (3 (2 will) (2 quickly)) (2 (2 recognize) (2 him))))) (2 .))\n"
# Only 2 capturing groups: (\d) for the number and ([.',]|\w+) for the token,
# so re.findall() returns 2-tuples
numR=re.compile(r"\((\d) ([.',]|\w+)\)")
print(re.findall(numR,line))

Result

[('2', 'You'), ('2', 'wo'), ('2', 'like'), ('2', 'Roger'), ('2', ','), ('2', 'but'), ('2', 'you'), ('2', 'will'), ('2', 'quickly'), ('2', 'recognize'), ('2', 'him'), ('2', '.')]

Note that you can omit the {1}, and the alternation ('|\.|,) can be replaced by the character class [.',]
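If you would rather keep the shape of the original pattern (optional digit, optional leading apostrophes), Python's non-capturing groups (?:...) answer the question in the title directly: they group for alternation and repetition but do not create a re.findall() target. A sketch of two variants (the names non_capturing and simplified are illustrative), the second also applying the note above:

```python
import re

line = "(1 (2 (1 (1 (1 (2 You) (1 (2 (2 wo) (2 n't)) (2 (2 like) (2 Roger)))) (2 ,)) (2 but)) (2 (2 you) (3 (3 (2 will) (2 quickly)) (2 (2 recognize) (2 him))))) (2 .))\n"

# The question's pattern with {1} dropped and the two inner groups turned into
# non-capturing groups, so only (\d)? and the outer token group are reported.
non_capturing = re.compile(r"\((\d)? ((?:')*\w+|(?:'|\.|,))\)")

# Same idea, simplified: a single character needs no group before a
# quantifier ('*), and the punctuation alternation becomes [.',].
simplified = re.compile(r"\((\d)? ('*\w+|[.',])\)")

print(non_capturing.findall(line))  # 2-tuples such as ('2', 'You')
print(simplified.findall(line))
```

On this input both give the same 2-tuples as the answer's pattern. On other inputs they differ slightly: because the digit stays optional and leading apostrophes are allowed, they would also match tokens such as (2 's) that \((\d) ([.',]|\w+)\) skips.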

A similar question about grouping without creating new targets for re.findall() can be found on Stack Overflow: https://stackoverflow.com/questions/56823816/
