Python lookbehind regex "fixed-width pattern" error when finding consecutive repeated words

I have a text made of words separated by ., containing instances of 2 and 3 consecutive repeated words:

My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die-

I need to match them independently with a regex, excluding the duplicates from the triplicates.

Since there are at most 3 consecutive repeated words, this

r'\b(\w+)\.+\1\.+\1\b'

successfully captures

father.father.father

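However, to capture 2 consecutive repeated words, I need to make sure that neither the next word nor the previous word is the same. I can do a negative lookahead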

r'\b(\w+)\.+\1(?!\.+\1)\b'

but my attempts at a negative lookbehind, such as

r'(?<!(\w)\.)\b\1\.+\1\b(?!\.\1)'

either raise the fixed-width error (when I keep the +) or fail in some other way.

How should I correct the negative lookbehind?
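
To make the two symptoms concrete, here is a minimal reproduction sketch. The variable names (`text`, `lookahead_only`, `lookbehind_error`) and the variable-width lookbehind pattern are illustrative assumptions, not the asker's exact code. It shows that the lookahead alone still matches the tail of the triple, and that Python's built-in `re` module refuses to compile a lookbehind containing `+`.

```python
import re

text = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die-"

# Lookahead only: rejects a pair that is followed by a third copy,
# but cannot reject a pair that is preceded by one.
lookahead_only = re.compile(r'\b(\w+)\.+\1(?!\.+\1)\b')
pairs = [m.group(0) for m in lookahead_only.finditer(text)]
# The 'father.father' reported here is the 2nd and 3rd word of the triple.
print(pairs)  # ['name.name', 'father.father']

# A lookbehind containing + has no fixed width, so re rejects it at compile time.
try:
    re.compile(r'(?<!\w+\.)\b(\w+)\.+\1\b')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print(lookbehind_error)
```

The second `print` should show a message mentioning the fixed-width requirement, which is the error named in the title.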

Best answer

I think there may be a simpler way to capture what you want, without the negative lookbehind:

import re
t = 'My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die-'
r = re.compile(r'\b((\w+)\.+\2\.+\2?)\b')
r.findall(t)

> [('name.name.', 'name'), ('father.father.father', 'father')]

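This simply makes the third repetition optional. Note that, as the output shows, the two-word match keeps its trailing dot.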

A version that captures any number of repetitions of the same word could look like this:

r = re.compile(r'\b((\w+)(\.+\2)\3*)\b')
r.findall(t)
> [('name.name', 'name', '.name'), ('father.father.father', 'father', '.father')]
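
Since the original goal was to match doubles and triples independently, one possible follow-up is to match each run once and then sort the runs by how many words they contain. This is only a sketch under my own assumptions: the helper name `group_runs_by_length` and the variable names are hypothetical and not part of the answer above.

```python
import re
from collections import defaultdict

text = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die-"

# One word followed by one or more dot-separated copies of itself.
run_pattern = re.compile(r'\b(\w+)(?:\.+\1)+\b')

def group_runs_by_length(s):
    """Map run length (number of repeated words) to the runs of that length."""
    groups = defaultdict(list)
    for m in run_pattern.finditer(s):
        run = m.group(0)
        # Count the words in the run; the dots are only separators.
        groups[len(re.findall(r'\w+', run))].append(run)
    return dict(groups)

runs = group_runs_by_length(text)
print(runs.get(2, []))  # ['name.name']
print(runs.get(3, []))  # ['father.father.father']
```

Because each run is consumed in a single match, the pair inside the triple is never reported separately, so no lookbehind is needed.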

For more on the Python lookbehind regex "fixed-width pattern" error when finding consecutive repeated words, see the similar question on Stack Overflow: https://stackoverflow.com/questions/45334520/