Python RegEx findall not responding

Tags: python regex python-3.x

I just ran into something strange. I am prototyping a text crawler, using Open ANC as the corpus.

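On some texts, the re module simply stops responding. I would appreciate it if someone could confirm whether the re module can handle a regular expression of this complexity.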

The regex is preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired

The text that triggers the problem:

My claim is that Lincoln’s address expresses the same idea that was then current in Europe. Each people of common history and language constitutes a nation, and the natural form for the nation’s survival was in a state structure. The idea that Americans constituted an organic national unit explained, implicitly, why the eleven Southern states could not go their own way. As he assumed the presidency, Lincoln still spoke of the Union rather than a nation; but in the course of the debates in the decades immediately preceding, the notion of union had acquired the metaphysical qualities of nationhood. In his first inaugural address, Lincoln invoked the “bonds of affection,” and even before shots were fired on Fort Sumter in Charleston Harbor, he stressed the unbreakable ties of historical struggle:

The Python code that reproduces the problem:

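import re

# txt stands for the passage quoted above ("post text here" is a placeholder)
txt = "post text here"
regex = r"preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired"
re.findall(regex, txt)  # with the passage above, this call never returns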

Best answer

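Your pattern suffers from catastrophic backtracking. Because both separator classes are optional (*), a single word such as "notion" can be split across iterations of the outer (...)+ in many ways ("notion", "not" + "ion", "n" + "o" + "tion", and so on). The greedy + first consumes as much as it can (here, to the end of the passage), and the engine then has to try every way of splitting the remaining text before it can back up to the position just before "acquired". The work grows exponentially with the length of that text, so the call hangs even though the passage does contain a match.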
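A rough way to see the blow-up is to time the question's pattern on a hypothetical synthetic input: the word "preceding" followed by n letters and no "acquired", so every attempt must fail. This is only a sketch; absolute timings depend on the machine and Python version, but each two extra letters should roughly quadruple the time.

```python
import re
import time

# The pattern from the question. With optional separators on both sides of \w+,
# a run of n letters can be split into iterations of (...)+ in 2**(n-1) ways.
SLOW = re.compile(r"preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired")


def time_failed_search(n: int) -> tuple[list[str], float]:
    """Run SLOW on a hypothetical input where 'acquired' never appears."""
    text = "preceding " + "a" * n  # synthetic input, not from the corpus
    start = time.perf_counter()
    matches = SLOW.findall(text)
    return matches, time.perf_counter() - start


for n in range(10, 19, 2):
    matches, elapsed = time_failed_search(n)
    print(f"n={n:2d}  matches={matches}  {elapsed:.4f} s")
```

Keep n small when trying this: the passage in the question has far more letters after "acquired", which is why the real call never finishes.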

Here is an alternative pattern that works for your input:

regex = r"preceding[^A-Za-z0-9\n\r]+(?:\w+[^A-Za-z0-9\n\r]+)+?acquired"

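This assumes there is always at least one non-word character separating words (otherwise the pattern would simply match one long, unbroken word).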
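A quick check of the answer's pattern, assuming an abbreviated excerpt of the passage stands in for the full text. Because separators are now mandatory (+) and the group repetition is lazy (+?), each word can be consumed in only one way, so the match is found directly and a search that cannot succeed (a hypothetical input with no "acquired") fails quickly instead of hanging.

```python
import re

# Pattern from the answer: at least one separator after each \w+ run, lazy repetition.
FAST = re.compile(r"preceding[^A-Za-z0-9\n\r]+(?:\w+[^A-Za-z0-9\n\r]+)+?acquired")

# Abbreviated excerpt of the passage quoted in the question.
txt = (
    "As he assumed the presidency, Lincoln still spoke of the Union rather than "
    "a nation; but in the course of the debates in the decades immediately "
    "preceding, the notion of union had acquired the metaphysical qualities "
    "of nationhood."
)

print(FAST.findall(txt))  # ['preceding, the notion of union had acquired']

# Hypothetical input where the search must fail: it returns promptly.
print(FAST.findall("preceding " + "word " * 1000))  # []
```

re.compile is used only so the pattern can be reused; re.findall(regex, txt) with the same string behaves the same way.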

(See also: How can I recognize an evil regex?)

A similar question, "Python RegEx findall not responding", can be found on Stack Overflow: https://stackoverflow.com/questions/59668935/
