Python hangs on long text when using re.match

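I have a text file containing a list of domains, and I want to use a Python regex to match a domain and any of its subdomains.

Example domains file: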

admin.happy.com
nothappy.com

I have the following regex:

import re

main_domain = 'happy.com'
mydomains = open('domains.txt','r').read().replace('\n',',')
matchobj = re.match(r'^(.*\.)*%s$' % main_domain,mydomains)

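The code works for short text, but when my domains file has more than 100 entries it hangs and gets stuck.

Is there a way to optimize the regex so it can handle the contents of the text file?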

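Best answer

(.*\.)* is very likely causing catastrophic backtracking. Because .* can itself match dots, a long string with many dots can be split between the inner and outer repetition in a huge number of ways, and the regex engine tries them all when the overall match fails. If the file contains one domain per line, the simplest fix is to run the regex on each line instead of on the whole file at once: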

import re

main_domain = 'happy.com'
with open('domains.txt', 'r') as f:
    for line in f:
        # match one domain at a time so the input stays short
        matchobj = re.match(r'^(.*\.)*%s$' % main_domain, line.strip())
        # do something with matchobj
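
As a complement to the per-line loop, the pattern itself can be made less prone to backtracking. The following is only a sketch, not part of the original answer: it assumes each input line holds a single domain, and the names pattern and find_matches are hypothetical. It replaces .* with [^.]+ so that a label can never cross a dot, and it escapes main_domain so that its dot is matched literally.

```python
import re

main_domain = 'happy.com'  # same example value as in the question

# [^.]+ cannot match a dot, so every label can be matched in only one way.
# re.escape keeps the '.' in main_domain from matching arbitrary characters.
pattern = re.compile(r'(?:[^.]+\.)*' + re.escape(main_domain))

def find_matches(lines):
    """Return the stripped lines that are main_domain or one of its subdomains."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if pattern.fullmatch(s)]

print(find_matches(['admin.happy.com\n', 'nothappy.com\n', 'happy.com\n']))
# ['admin.happy.com', 'happy.com']
```

With a file, find_matches can be passed the open file object directly, since it only iterates over lines.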

If your file only contains domains in the format you posted, you can simplify further and not use a regex at all:

main_domain = 'happy.com'
subdomains = []
with open('domains.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line.endswith(main_domain):
            # keep the part in front of main_domain, e.g. 'admin.'
            subdomains.append(line[:-len(main_domain)])
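
One thing to watch with a plain endswith check is that it also accepts nothappy.com, which ends with happy.com but is a different domain. A possible stricter variant is sketched below. It is an illustration rather than the answerer's code: the helper name split_subdomain is hypothetical, and lowercasing the input is an added assumption based on domain names being case-insensitive.

```python
def split_subdomain(host, main_domain):
    """Return the subdomain prefix of host, '' for the bare domain,
    or None when host is not main_domain or one of its subdomains."""
    host = host.strip().lower()        # assumption: compare case-insensitively
    main_domain = main_domain.lower()
    if host == main_domain:
        return ''
    suffix = '.' + main_domain         # require a dot right before the domain
    if host.endswith(suffix):
        return host[:-len(suffix)]
    return None

hosts = ['admin.happy.com', 'nothappy.com', 'happy.com']
print([split_subdomain(h, 'happy.com') for h in hosts])
# ['admin', None, '']
```

Requiring the leading dot is what separates a real subdomain from a name that merely shares the same ending.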

For more on Python hanging on long text with re.match, see the similar question on Stack Overflow: https://stackoverflow.com/questions/16581025/
