Python regular expressions, subgroup repetition

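I'm currently trying to get to grips with regular expressions in Python, and I'm no expert in either. Maybe I'm using the wrong regex terminology, which would explain why I can't find an answer. If so, apologies.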

I created a test string and two different regular expressions with the following code:

import re

teststring = "This is just a string of literal text with some 0987654321 and an issue in it"

reg = re.compile(r"([0-9]{3})*", re.DEBUG)
outSearch = reg.search(teststring)

print("Test with ([0-9]{3})*")
if outSearch:
    print("groupSearch = " + outSearch.group())
    print("")

reg = re.compile(r"([0-9]{3})+", re.DEBUG)
outSearch = reg.search(teststring)

print("Test with ([0-9]{3})+")
if outSearch:
    print("groupSearch = " + outSearch.group())

This test code produces the following output:

max_repeat 0 4294967295
  subpattern 1
    max_repeat 3 3
      in
        range (48, 57)
Test with ([0-9]{3})*
groupSearch = 

max_repeat 1 4294967295
  subpattern 1
    max_repeat 3 3
      in
        range (48, 57)
Test with ([0-9]{3})+
groupSearch = 098765432

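Now for the interesting part: I expected both regexes to return the same result, namely the one I get with ([0-9]{3})+. When I use ([0-9]{3})*, the regex does match the test string, but outSearch.group() is empty. Can anyone explain why?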

By the way, neither regex has any practical use; I'm just trying to understand how regular expressions work in Python.

Best answer

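Your first pattern uses * for repetition, which means it matches zero or more occurrences of the preceding group. With +, at least one occurrence is required. re.search() tries the pattern at each position from left to right and returns the first position where it succeeds. A pattern that consists of nothing but one optional group always succeeds at index 0: if the group cannot match the first character, it simply matches zero times, which gives an empty match, and search() stops there without ever reaching the digits later in the string. This becomes clearer if you print start() and end() for each match: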

import re

teststring = "some 0987654321"
reg = re.compile(r"([0-9]{3})*", re.DEBUG)
outSearch = reg.search(teststring)

print("Test with ([0-9]{3})*")
if outSearch:
    print("groupSearch = " + outSearch.group() + ' , ' + str(outSearch.start()) + ' , ' + str(outSearch.end()))

reg = re.compile(r"([0-9]{3})+", re.DEBUG)
outSearch = reg.search(teststring)

print("Test with ([0-9]{3})+")
if outSearch:
    print("groupSearch = " + outSearch.group() + ' , ' + str(outSearch.start()) + ' , ' + str(outSearch.end()))

Output:

Test with ([0-9]{3})*
groupSearch =  , 0 , 0

Test with ([0-9]{3})+
groupSearch = 098765432 , 5 , 14

(The match of the first regex starts at index 0 and ends at index 0: an empty string.)
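
To see the left-to-right scan directly, here is a small sketch that lists every match finditer() finds for the * pattern on the same test string. The first entry should be the empty match at index 0, which is exactly what search() returns; the digits only appear later, at span 5 to 14. The last step, which skips empty matches, is just one possible workaround, assuming you want to keep the * pattern at all.

```python
import re

# Same test string and '*' pattern as in the answer.
teststring = "some 0987654321"
reg = re.compile(r"([0-9]{3})*")

# finditer() yields every match found while scanning left to right,
# including the empty ones; search() simply returns the first of them.
spans = [(m.start(), m.end(), m.group()) for m in reg.finditer(teststring)]
for start, end, text in spans:
    print(start, end, repr(text))

first = reg.search(teststring)
print("search() ->", first.span())  # (0, 0): the empty match at index 0

# Skipping empty matches recovers the run of digits even with '*'.
non_empty = next(m for m in reg.finditer(teststring) if m.group())
print("first non-empty ->", non_empty.span(), non_empty.group())  # (5, 14) 098765432
```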

This isn't specific to Python; it is the expected behaviour in practically every regex engine:

https://regex101.com/r/BwMWTq/1

(Switch to the other flavours there and you will see that all of them, not just Python, start and end the match at index 0.)
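
Since the question is about a repeated subgroup, one related detail may also help: group() (the same as group(0)) returns the whole match, whereas group(1) only holds what the last repetition of the group captured, and it is None when the group matched zero times. A minimal sketch using the answer's test string; the names plus and star are only illustrative:

```python
import re

teststring = "some 0987654321"

plus = re.search(r"([0-9]{3})+", teststring)
# group() is the whole match: all three repetitions together.
print(plus.group())   # 098765432
# group(1) keeps only the text captured by the LAST repetition of the group.
print(plus.group(1))  # 432

star = re.search(r"([0-9]{3})*", teststring)
# The '*' version matched zero repetitions at index 0,
# so group 1 never took part in the match.
print(repr(star.group()), star.group(1))  # '' None
```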

Original question on Stack Overflow: https://stackoverflow.com/questions/51762816/
