regex - Perl RegEx non-capturing group with alternative captures inside the group

Tags: regex perl regex-group

I'm trying to parse some mail logs that can contain any of the following three relay formats.

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com. [0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=[0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com., tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com

Using this code:
my $topat    = '^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)';

foreach my $line(@i) {
  if($line =~ /$topat/){
    my ($month, $day, $time, $id, $addy, $relay, $stat) = ($line =~ m/$topat/);
     print $line;
     print "$addy $relay $stat\n";
  }
}

I get the following output and warnings:
Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com. [0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $stat in concatenation (.) or string at ./reg_test line 26.
email@company.com 0.0.0.0 

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=[0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $stat in concatenation (.) or string at ./reg_test line 26.
email@company.com 0.0.0.0 

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com., tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $relay in concatenation (.) or string at ./reg_test line 26.
email@company.com  mail-company.com

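In the first two cases, it correctly captures the address and the relay but not the status. In the third, it captures the address and the relay, but $relay comes out empty and $stat holds the relay.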

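I've tried many different configurations and groupings but can't seem to find the right solution. Any pointers would be greatly appreciated.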

Best answer

You have two alternatives in the relay field:

relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.)
                    ^    ----      $6         ----     ^  | ^$7^ 

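The pattern has eight capture groups, but you assign only seven variables, so $stat receives $7. When the first alternative matches, $7 is undef, so $stat is undefined. When the relay doesn't match the first alternative but matches the second, $6 is undef (so $relay is empty) and the hostname in $7 lands in $stat. $stat is never populated correctly because the status is captured in $8, not $7.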
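To see the numbering directly, the sketch below runs the question's unchanged pattern over the three sample lines and prints groups 6 to 8. It assumes Perl 5.10 or later for the // operator. The $tpl template, which rebuilds the three log lines from their differing relay= values, is a hypothetical convenience and not part of the question.

```perl
use strict;
use warnings;

# Pattern copied unchanged from the question: it has 8 capture groups.
my $topat = '^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)';

# Hypothetical helper: rebuild the question's three log lines from one
# template, since only the relay= field differs between them.
my $tpl = 'Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: '
        . 'to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, '
        . 'pri=459, relay=%s, tls=no, dsn=4.0.0, '
        . 'stat=Deferred: Connection reset by mail-company.com';
my @lines = map { sprintf($tpl, $_) }
    ('mail-company.com. [0.0.0.0]', '[0.0.0.0]', 'mail-company.com.');

for my $line (@lines) {
    # In list context the match returns one element per capture group,
    # with undef for the group in the alternative that did not match.
    my @caps = $line =~ /$topat/;
    next unless @caps;
    # $caps[5], $caps[6], $caps[7] hold $6 (IP), $7 (hostname), $8 (status).
    printf "%d captures: \$6=%s \$7=%s \$8=%s\n",
        scalar(@caps), map { $_ // 'undef' } @caps[5 .. 7];
}
```

Every line yields 8 captures, and exactly one of $6 and $7 is undefined. That is why a 7-variable list goes out of step.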

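You can use a branch reset group, (?|...), in which all alternatives share the same capture numbers (supported since Perl 5.10). Both alternatives then fill group 6, the status moves to group 7, and your original seven-variable assignment works unchanged: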
(?|(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.)
  ^
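Here is a minimal runnable sketch of the branch reset fix, assuming Perl 5.10 or later. The only change to the question's pattern is (?: becoming (?| around the relay alternatives. The $tpl template and the @parsed array are hypothetical conveniences for the demo.

```perl
use strict;
use warnings;

# The question's pattern with the relay group changed from (?:...) to the
# branch reset group (?|...): both alternatives now capture into group 6,
# so the status moves to group 7. Requires Perl 5.10 or later.
my $topat = '^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?|(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)';

# Hypothetical helper: rebuild the question's three log lines from one
# template, since only the relay= field differs between them.
my $tpl = 'Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: '
        . 'to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, '
        . 'pri=459, relay=%s, tls=no, dsn=4.0.0, '
        . 'stat=Deferred: Connection reset by mail-company.com';
my @lines = map { sprintf($tpl, $_) }
    ('mail-company.com. [0.0.0.0]', '[0.0.0.0]', 'mail-company.com.');

my @parsed;    # [address, relay, status] for each matching line
for my $line (@lines) {
    # Match once, in list context, inside the condition; the list
    # assignment evaluates to false when the match fails.
    if (my ($month, $day, $time, $id, $addy, $relay, $stat) = $line =~ /$topat/) {
        print "$addy $relay $stat\n";
        push @parsed, [$addy, $relay, $stat];
    }
}
```

Putting the list assignment in the if condition also avoids running the regex twice per line, as the question's loop does.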

Alternatively, keep the original regex, capture into two relay variables, and merge them with the defined-or operator:
    my ($month, $day, $time, $id, $addy, $relay, $relay_alt, $stat) = $line =~ m/$topat/;
    $relay //= $relay_alt;
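The two-variable approach can be sketched as a complete loop under the same assumptions: Perl 5.10 or later for //=, and a hypothetical $tpl template and @relays array used only for the demonstration.

```perl
use strict;
use warnings;

# Original pattern from the question (8 capture groups), unchanged.
my $topat = '^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)';

# Hypothetical helper: rebuild the question's three log lines from one
# template, since only the relay= field differs between them.
my $tpl = 'Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: '
        . 'to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, '
        . 'pri=459, relay=%s, tls=no, dsn=4.0.0, '
        . 'stat=Deferred: Connection reset by mail-company.com';
my @lines = map { sprintf($tpl, $_) }
    ('mail-company.com. [0.0.0.0]', '[0.0.0.0]', 'mail-company.com.');

my @relays;    # merged relay value for each matching line
for my $line (@lines) {
    # Eight variables for eight groups: $relay receives the IP ($6) and
    # $relay_alt the bare hostname ($7); exactly one is defined per line.
    my ($month, $day, $time, $id, $addy, $relay, $relay_alt, $stat) = $line =~ m/$topat/
        or next;
    # Defined-or assignment: keep the IP if it was captured,
    # otherwise fall back to the hostname.
    $relay //= $relay_alt;
    print "$addy $relay $stat\n";
    push @relays, $relay;
}
```

Each line now prints the address, the relay (IP or hostname) and the status without uninitialized-value warnings.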

Stack Overflow question on this topic (regex - Perl RegEx non-capturing group with alternative captures inside the group): https://stackoverflow.com/questions/58549275/

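Related articles:

perl - How do I convert a date to epoch time in Perl?

javascript - Match the start of a postal code with indexOf or a regex?

perl - Why does Perl complain "Useless use of a constant in void context", but only sometimes?

Perl error "selenium server did not return proper status" when starting PhantomJS

Accepting the lookahead part of a regex group

python - Regex to capture URLs

regex - Perl: difference between ($num) and $num

javascript - Javascript: How do I remove illegal URL characters from a file name?

Java : Extract numbers from a string

regex - How do I replace a string in Perl?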