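Python: Regex, finding repetitions in a string

I need to find repetitions in a text string. I have already found a very nice, elegant solution here, from @Tim Pietzcker.

I am happy with the existing solution, but I would like to know whether it can be extended to accept strings that contain spaces.

For example, "a bcab c" would return [(abc,2)]

I tried the regex pattern "([^\s]+?)\1+" but had no success. Any help is much appreciated.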

Best answer

You should first consider removing the whitespace from the text. You can do that with a regex as well.

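>>> import re
>>> def repetitions(s):
...    r = re.compile(r"(.+?)\1+")
...    # Strip all whitespace first, then look for consecutive repeats.
...    for match in r.finditer(re.sub(r'\s+', "", s)):
...        # Integer division keeps the repeat count an int on Python 3.
...        yield (match.group(1), len(match.group(0)) // len(match.group(1)))
...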

Output:

>>> list(repetitions("a bcab c"))
[('abc', 2)]
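
To see why stripping the whitespace first makes the difference, the intermediate values can be inspected one step at a time. The following is only an illustrative sketch; the variable names `text`, `compact` and `match` are not part of the original answer.

```python
import re

text = "a bcab c"

# The pattern from the question: the backreference \1 must repeat the
# captured text exactly, and the spaces break up every candidate repeat.
print(re.search(r"([^\s]+?)\1+", text))  # None

# Step 1: remove every run of whitespace.
compact = re.sub(r"\s+", "", text)
print(compact)  # abcabc

# Step 2: find the shortest unit that is immediately repeated.
match = re.search(r"(.+?)\1+", compact)
unit = match.group(1)
count = len(match.group(0)) // len(unit)
print(unit, count)  # abc 2
```

The repeat count is the length of the whole match divided by the length of the captured unit, which is why the function above yields `len(match.group(0)) // len(match.group(1))`.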

If you still want to keep the whitespace from the original text, try this regex: r"(\s*\S+\s*?\S*?)\1+". It has limitations, though.

>>> def repetitions(s):
...    r = re.compile(r"(\s*\S+\s*?\S*?)\1+")
...    for match in r.finditer(s):
...        # Integer division keeps the repeat count an int on Python 3.
...        yield (match.group(1), len(match.group(0)) // len(match.group(1)))
...

Results:

>>> list(repetitions(" abc abc "))
[(' abc', 2)]
>>> list(repetitions("abc abc "))
[('abc ', 2)]
>>> list(repetitions(" ab c ab c "))
[(' ab c', 2)]
>>> list(repetitions("ab cab c "))
[('ab c', 2)]
>>> list(repetitions("blablabla"))
[('bla', 3)]
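
One limitation follows directly from how backreferences work: `\1` has to repeat the captured text character for character, so the whitespace must sit at the same position in every repetition. The pattern also allows at most one run of whitespace inside a unit. The sketch below puts the two variants side by side on the input from the question; the names `repetitions_stripped` and `repetitions_keep_ws` are hypothetical and used here only to tell the two versions apart.

```python
import re

def repetitions_stripped(s):
    """Variant 1: remove all whitespace, then look for repeats."""
    r = re.compile(r"(.+?)\1+")
    for match in r.finditer(re.sub(r"\s+", "", s)):
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))

def repetitions_keep_ws(s):
    """Variant 2: keep whitespace; every repeat must match it exactly."""
    r = re.compile(r"(\s*\S+\s*?\S*?)\1+")
    for match in r.finditer(s):
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))

# The spaces fall at different offsets in "a bc" and "ab c",
# so only the whitespace-stripping variant finds the repeat.
print(list(repetitions_stripped("a bcab c")))  # [('abc', 2)]
print(list(repetitions_keep_ws("a bcab c")))   # []

# When the spacing is identical in each repeat, variant 2 works.
print(list(repetitions_keep_ws("ab cab c ")))  # [('ab c', 2)]
```

If the exact spacing does not matter for your use case, the whitespace-stripping variant is the more forgiving choice.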

A similar question about "Python: Regex, finding repetitions in a string" can be found on Stack Overflow: https://stackoverflow.com/questions/55272367/
