python - Which regex is more efficient?

Tags: python regex

s/(?P<head>\[\[foo[^\[]*)abc/\g<head>def/

s/(?=\[\[foo[^\[]*)abc/def/

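Which one is more efficient? Is there any other way to make this more efficient? Note that although I use Perl-style syntax here for illustration, I am actually using Python's re library, which does not support the \K (keep) escape.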
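Before comparing speed, it may be worth checking that the two substitutions actually do the same thing. The minimal sketch below runs both patterns on a short made-up input (`text` is a hypothetical sample, not from the question). As far as Python's re is concerned, the lookahead version asserts that "[[foo" starts at the same position where the literal "abc" must also start, which cannot both be true, so it should never match. A lookbehind would express the intent, but re only accepts fixed-width lookbehind.

```python
import re

text = "[[fooBAR_abc]]"  # hypothetical sample input

# Version 1: capture the prefix and put it back with \g<head>
head_sub = re.sub(r"(?P<head>\[\[foo[^\[]*)abc", r"\g<head>def", text)

# Version 2: zero-width lookahead, then "abc" at the SAME position.
# The lookahead needs "[" there while the literal needs "a",
# so the pattern cannot match and the text comes back unchanged.
lookahead_sub = re.sub(r"(?=\[\[foo[^\[]*)abc", "def", text)

# A variable-width lookbehind is rejected by Python's re at compile time
try:
    re.compile(r"(?<=\[\[foo[^\[]*)abc")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

print(head_sub)       # [[fooBAR_def]]
print(lookahead_sub)  # [[fooBAR_abc]]
print(lookbehind_ok)  # False
```

If this reasoning holds, the lookahead pattern never performs a replacement, so timing it mostly measures how quickly the engine can fail to match rather than an equivalent substitution.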

Best answer

With Python's re module, (?P<head>\[\[foo[^\[]*)abc is faster:

import re
import time

# Raw strings keep the regex backslashes from being treated as string escapes
rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')

total1, total2 = 0.0, 0.0

def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100)
    # perf_counter is a high-resolution clock intended for timing
    t1 = time.perf_counter()
    if ver == 1:  # compare values with ==, not identity with "is"
        rec1.sub("def", x)
    else:
        rec2.sub("def", x)
    return time.perf_counter() - t1

for _ in range(50000):
    total1 += timeRE(1)

for _ in range(50000):
    total2 += timeRE(2)

print(total1)
print(total2)

Output:

4.27380466461
16.9591507912
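The loops above sum wall-clock deltas by hand. A similar comparison can be sketched with the standard library's timeit module, which runs the loop and picks the timer itself. This is only an illustration: the repeat/number values are arbitrary choices, and the first pattern uses the corrected \g<head> replacement.

```python
import re
import timeit

rec1 = re.compile(r"(?P<head>\[\[foo[^\[]*)abc")
rec2 = re.compile(r"(?=\[\[foo[^\[]*)abc")

# Same input as the first benchmark (it contains no "[[foo" at all)
x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_"
     "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ") * 100


def best_of(func, repeat=5, number=200):
    """Return the fastest observed time per call, in seconds."""
    # timeit.repeat returns one total per repetition; keep the smallest
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


t1 = best_of(lambda: rec1.sub(r"\g<head>def", x))
t2 = best_of(lambda: rec2.sub("def", x))
print(f"capture + \\g<head>: {t1:.2e} s per call")
print(f"lookahead:           {t2:.2e} s per call")
```

Taking the minimum instead of a total follows the advice in the timeit documentation: the lowest value is a lower bound for how fast the code can run, and higher values are usually caused by other processes interfering.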

Edit (running both calls in the same loop):

for _ in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)

Output:

4.26199269295
17.2384319305

Edit (fixing the submatch problem by putting the captured head back with \g<head>):

import re
import time
rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0
def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100)
    t1 = time.perf_counter()
    if ver == 1:
        # re-insert the captured prefix so only "abc" is replaced
        rec1.sub(r"\g<head>def", x)
    else:
        rec2.sub("def", x)
    return time.perf_counter() - t1

for _ in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)
print(total1)
print(total2)

Output:

Run 1:
4.62282061577
17.8212277889

Run 2:    
4.6660721302
17.1630160809

Run 3:
4.62124109268
17.21393013
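The submatch problem comes from the fact that sub() replaces the entire match, including the captured "[[foo..." prefix. The sketch below (the input `s` is a hypothetical sample) shows the plain replacement dropping the prefix, \g<head> restoring it, and a callable replacement doing the same thing explicitly through the match object.

```python
import re

rec1 = re.compile(r"(?P<head>\[\[foo[^\[]*)abc")
s = "x [[foo_bar_abc]] y"  # hypothetical sample input

# sub() replaces the WHOLE match, so a plain "def" also deletes "[[foo_bar_"
dropped = rec1.sub("def", s)

# \g<head> puts the captured prefix back in front of the replacement
kept = rec1.sub(r"\g<head>def", s)

# A callable replacement builds the same result from the match object
kept_fn = rec1.sub(lambda m: m.group("head") + "def", s)

print(dropped)  # x def]] y
print(kept)     # x [[foo_bar_def]] y
print(kept_fn)  # x [[foo_bar_def]] y
```

The template form (\g<head>) is usually the more compact choice; the callable form is handy when the replacement depends on the matched text in ways a template cannot express.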

Edit (using a string that actually matches the regex):

import re
import time

rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0

def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_<head>[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz" * 100)
    t1 = time.perf_counter()
    if ver == 1:
        rec1.sub(r"\g<head>def", x)
    else:
        rec2.sub("def", x)
    return time.perf_counter() - t1

for _ in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)

print(total1)
print(total2)

Output:

23.4271130562
29.6934807301
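One way to read these numbers is to count how many replacements each pattern actually performs on this input. Pattern.subn() returns the new string together with the number of substitutions made. In the sketch below the input is the matching string from this edit, split across lines only for readability.

```python
import re

rec1 = re.compile(r"(?P<head>\[\[foo[^\[]*)abc")
rec2 = re.compile(r"(?=\[\[foo[^\[]*)abc")

# The matching input from this edit, split into pieces for readability
x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_"
     "<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_"
     "<head>[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz") * 100

# subn() returns (new_string, number_of_substitutions_made)
_, n1 = rec1.subn(r"\g<head>def", x)
_, n2 = rec2.subn("def", x)
print(n1, n2)  # 100 0
```

If those counts hold, the capture version does one real replacement per "[[foo" (100 per call) while the lookahead version does none, so the two timings are not measuring equivalent work.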

Final run:

import re
import time
rec1 = re.compile(r'(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile(r'(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0
def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_<head>[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz" * 100)
    t1 = time.perf_counter()
    if ver == 1:
        rec1.sub(r"\g<head>def", x)
    else:
        rec2.sub("def", x)
    return time.perf_counter() - t1
for _ in range(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)
# Average per call = total time / number of calls
print("Method 1: Avg run took: %+0.7f - With a total of: %+0.7f" % (total1 / 50000.0, total1))
print("Method 2: Avg run took: %+0.7f - With a total of: %+0.7f" % (total2 / 50000.0, total2))

Output:

Method 1: Avg run took: +0.0004924 - With a total of: +24.6196477
Method 2: Avg run took: +0.0005921 - With a total of: +29.6053855

For "python - Which regex is more efficient?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7423525/
