python - Regex to capture multiple repetitions of a group

I have a sample text:

Lorem ipsum AB-CD-01 dolor sit amet, AB-CD-Foobar consectetur

I want to capture every phrase of the form AB-CD-*. I am trying something like this:

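import re
text = "Lorem ipsum AB-CD-01 dolor sit amet, AB-CD-Foobar consectetur"
pattern = re.compile(r"((AB-CD-\S+).*)*")  # raw string so \S reaches the regex engine unchanged
result = pattern.search(text)
print(result.groups()) # expected: ('AB-CD-01', 'AB-CD-Foobar')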

I know this is a fairly simple and basic regex question, but I could not find a good solution.

Best answer

You can use re.findall with a simpler pattern:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
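The return shapes described in the quote can be sketched as follows. This is a minimal illustration, not part of the original answer; the variable names (whole, suffixes, pairs, grouped) and the grouped variants of the pattern are assumptions made up for the demonstration.

```python
import re

text = "Lorem ipsum AB-CD-01 dolor sit amet, AB-CD-Foobar consectetur"

# No capturing groups: each item is the whole match.
whole = re.findall(r"AB-CD-\S+", text)

# One capturing group: each item is only that group's text.
suffixes = re.findall(r"AB-CD-(\S+)", text)

# Two capturing groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(AB-CD)-(\S+)", text)

# A non-capturing group (?:...) groups without changing the result shape.
grouped = re.findall(r"(?:AB-CD-)\S+", text)

print(whole)     # ['AB-CD-01', 'AB-CD-Foobar']
print(suffixes)  # ['01', 'Foobar']
print(pairs)     # [('AB-CD', '01'), ('AB-CD', 'Foobar')]
print(grouped)   # ['AB-CD-01', 'AB-CD-Foobar']
```

If grouping is needed only for alternation or quantifiers, the non-capturing form keeps re.findall returning whole matches.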

Sample code using the updated regex:

import re
p = re.compile(r'AB-CD-\S+')
test_str = "Lorem ipsum AB-CD-01 dolor sit amet, AB-CD-Foobar consectetur"
print(re.findall(p, test_str))
# => ['AB-CD-01', 'AB-CD-Foobar']

See the IDEONE demo and the regex demo.

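re.search finds only the first match, while re.findall returns a list of all matches. That list holds the full matched strings only when the pattern defines no capturing groups, which is why I suggest removing them.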
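As a rough sketch of why the attempt in the question cannot produce both phrases: the outer group there is optional, so re.search succeeds at once with an empty match, and even a pattern that does loop over both phrases keeps only the last repetition of a repeated capturing group. The second pattern below is a hypothetical variant written for this illustration, not something from the question or the answer.

```python
import re

text = "Lorem ipsum AB-CD-01 dolor sit amet, AB-CD-Foobar consectetur"

# Pattern from the question: the outer (...)* may repeat zero times, so the
# search succeeds at position 0 with an empty match and unset groups.
attempt = re.search(r"((AB-CD-\S+).*)*", text)
print(attempt.span())    # (0, 0)
print(attempt.groups())  # (None, None)

# Hypothetical variant that really iterates over both phrases: the repeated
# capturing group still holds only its last repetition.
repeated = re.search(r"(?:.*?(AB-CD-\S+))+", text)
print(repeated.groups())  # ('AB-CD-Foobar',)

# re.findall with no capturing groups collects every occurrence instead.
all_matches = re.findall(r"AB-CD-\S+", text)
print(all_matches)  # ['AB-CD-01', 'AB-CD-Foobar']
```

A single match object exposes one value per group, so collecting a variable number of occurrences takes one match per occurrence, which is what re.findall does.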

Regarding python - Regex to capture multiple repetitions of a group, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34895656/