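Python equivalent of Perl's match and capture in an if block

I'm slowly moving from Perl to Python and trying to understand the best practices for working with regular expressions.

I have the following Perl code. It takes a string as input and, based on regex matches and captures, prints a rearranged string as output: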

#!/usr/bin/env perl

use strict;
use warnings;

my $str = $ARGV[0] || die "Arg?";

my $result;

if($str =~ m/^\d{12}$/) {
    $result = $str;
} elsif($str =~ m{^(\d{2})/(\d{2})/(\d{4})$}) {
    $result = "${1}${2}0000${3}";
} elsif($str =~ m{^(\d{4})$}) {
    $result = "01010000${1}";
} else {
    die "Invalid string";
}

print("Result: $result\n");

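What is a good equivalent in Python 3?

So far I've come up with the following, but matching twice in each elif branch seems inefficient. Compiling all the regexes up front also seems inefficient.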

#!/usr/bin/env python3

import re, sys

str = sys.argv[1]

p1 = re.compile(r'^\d{12}$')
p2 = re.compile(r'^(\d{2})/(\d{2})/(\d{4})$')
p3 = re.compile(r'^(\d{4})$')

if p1.match(str):
    result = str
elif p2.match(str):
    m = p2.match(str)
    result = '%s%s0000%s' % (m.group(1), m.group(2), m.group(3))
elif p3.match(str):
    m = p3.match(str)
    result = '01010000%s' % (m.group(1))
else:
    raise Exception('Invalid string')

print('Result: ' + result)

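Given Python's motto "There should be one-- and preferably only one --obvious way to do it", any thoughts or suggestions on the best approach here?

Thanks in advance for any suggestions.

Best regards, -Pavel

Best answer

A few notes on your code: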

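Precompiling regexes
There is no need to compile a regex explicitly if you don't plan to reuse it. The module-level functions give you cleaner code:
use m = re.match(pattern, text)
instead of p1 = re.compile(pattern) followed by m = p1.match(str)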
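A minimal sketch of the two styles side by side, assuming the anchored date pattern from the Perl code. Both produce equivalent match objects; the module-level call simply compiles (and caches) the pattern for you:

```python
import re

text = "12/34/5678"
pattern = r"^(\d{2})/(\d{2})/(\d{4})$"

# Module-level function: re compiles the pattern internally and caches it.
m1 = re.match(pattern, text)

# Explicitly compiled pattern object: same matching behaviour.
p = re.compile(pattern)
m2 = p.match(text)

print(m1.groups())                 # ('12', '34', '5678')
print(m1.groups() == m2.groups())  # True
```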
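Try to match and, if it matches, format the output from the match groups
Python's regex module provides a function that fits your case exactly: re.subn(). It performs a regex substitution and returns a tuple (new_string, number_of_substitutions_made).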
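A small sketch using the date rule (the replacement template is the same one used in the rules table of this answer). The count in the returned tuple tells you whether the rule applied, which is what replaces the separate "does it match?" test:

```python
import re

DATE = r"^(\d{2})/(\d{2})/(\d{4})$"
TEMPLATE = r"\1\g<2>0000\3"

# re.subn(pattern, repl, string) returns (new_string, number_of_substitutions)
new, n = re.subn(DATE, TEMPLATE, "12/34/5678")
print(new, n)    # 123400005678 1

# No match: the string comes back unchanged and the count is 0
new2, n2 = re.subn(DATE, TEMPLATE, "1234")
print(new2, n2)  # 1234 0
```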
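Performance considerations
- re.match() is called twice: it tries to match the same string twice and returns two separate match objects. That may cost you a few extra cycles.
- re.compile() (or a module-level matching function) being called repeatedly is fine, according to the docs:
  
  Note: The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.
- How to avoid precompiling regexes
  The code defines the order in which the regexes are tried against the input string. It only makes sense to compile a regex once we are 100% sure we need it. See the code below; it is much simpler than explaining it in words.
- Premature optimization
  You haven't actually run into any performance problems, have you? By optimizing this so early, you may spend time without any observable benefit.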
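Not part of the original answer: on Python 3.8 and later, an assignment expression (:=) is another way to avoid the double re.match() call from the first bullet while keeping the question's if/elif structure. A sketch, assuming Python 3.8+; rearrange() is a hypothetical name for a function wrapping the question's logic:

```python
import re


def rearrange(text: str) -> str:
    # Each branch calls re.match once and binds the result with :=,
    # so the body never has to repeat the match to get the groups.
    if re.match(r"^\d{12}$", text):
        return text
    elif m := re.match(r"^(\d{2})/(\d{2})/(\d{4})$", text):
        return f"{m[1]}{m[2]}0000{m[3]}"
    elif m := re.match(r"^(\d{4})$", text):
        return f"01010000{m[1]}"
    raise ValueError("Invalid string")


for s in ["123456789012", "12/34/5678", "1234"]:
    print("Result: " + rearrange(s))
```

This stays closest to the Perl shape (test and capture in the same condition); m[1] is shorthand for m.group(1).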

As for the motto:

import re

rules = ( (r'^\d{12}$', r'\g<0>')
        , (r'^(\d{2})/(\d{2})/(\d{4})$', r'\1\g<2>0000\3')
        # r'\1\20000\3' would NOT mean "group 2, then 0000": the digits after
        # the backslash get absorbed into the escape, hence \g<2>
        , (r'^(\d{4})$', r'01010000\1') )

def transform(text):
    for regex, repl in rules:
        # we're compiling only those regexes we really need
        result, n = re.subn(regex, repl, text)
        if n: return result
    raise ValueError('Invalid string')

tests = ['1234', r'12/34/5678', '123456789012']
for test in tests:
    print(transform(test))

transform('this line supposed to trigger exception')  # raises ValueError
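One caveat with re.subn(): it replaces wherever the pattern occurs, so unless a rule is anchored with ^...$, a pattern such as (\d{4}) would also rewrite the digits inside 'abc1234'. Below is a sketch of a variant (not the answer's code) that keeps the rules-table idea but uses re.fullmatch() (Python 3.4+) to require the whole string to match, as Perl's ^...$ does, and Match.expand() to fill the same template syntax from a single match object. RULES and transform_strict are hypothetical names; the patterns are compiled once here, though given the cache quoted above, passing pattern strings to re.fullmatch() would behave the same:

```python
import re

# Hypothetical rules table: compiled patterns paired with re.sub-style
# templates. fullmatch() anchors both ends, like ^...$ in the Perl code.
RULES = (
    (re.compile(r"\d{12}"), r"\g<0>"),
    (re.compile(r"(\d{2})/(\d{2})/(\d{4})"), r"\1\g<2>0000\3"),
    (re.compile(r"(\d{4})"), r"01010000\1"),
)


def transform_strict(text: str) -> str:
    for pattern, template in RULES:
        m = pattern.fullmatch(text)
        if m:
            # expand() applies the template to this one match object
            return m.expand(template)
    raise ValueError("Invalid string")


for s in ["1234", "12/34/5678", "123456789012", "abc1234"]:
    try:
        print(transform_strict(s))
    except ValueError as exc:
        print(f"{s!r}: {exc}")
```

This prints 010100001234, 123400005678 and 123456789012, then reports 'abc1234' as invalid instead of partially rewriting it.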

Hope this helps.

Source: the Stack Overflow question "Python equivalent of Perl's match and capture in an if block": https://stackoverflow.com/questions/28311708/
