regex - Is there anything like a counter variable in regular expression replace?

Tags: regex language-agnostic

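Suppose I have quite a number of matches, for example in multi-line mode, and I want to replace each of them with part of the match plus a counter number that increments.

I was wondering whether any regex flavor has such a variable. I couldn't find one, but I seem to remember that something like that exists...

I'm not talking about scripting languages in which you can use callbacks for the replacement. This is about being able to do it in tools like RegexBuddy, Sublime Text, gskinner.com/RegExr, ... in much the same way that you can refer to captured substrings with \1 or $1.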

Best answer

FMTEYEWTK about Fancy Regexes

OK, I'm going to go from the simple to the sublime. Enjoy!

Simple s///e Solution

Given this:

#!/usr/bin/perl

$_ = <<"End_of_G&S";
    This particularly rapid,
        unintelligible patter
    isn't generally heard,
        and if it is it doesn't matter!
End_of_G&S

my $count = 0;

Then this:
s{
    \b ( [\w']+ ) \b
}{
    sprintf "(%s)[%d]", $1, ++$count;
}gsex;

produces this:
(This)[1] (particularly)[2] (rapid)[3],
    (unintelligible)[4] (patter)[5]
(isn't)[6] (generally)[7] (heard)[8], 
    (and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]!

Interpolated Code in Anon Array Solution

Whereas this:
s/\b([\w']+)\b/#@{[++$count]}=$1/g;

produces this:
#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
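
In case the @{[ ... ]} idiom is unfamiliar, here is a minimal sketch of what it does, wrapped in a helper whose name (number_words) is made up for illustration and is not part of the original answer. The square brackets build an anonymous array holding the incremented count, @{ ... } dereferences it, and ordinary array interpolation drops the value into the replacement string, so no /e flag is needed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): prefix every word
# in $text with a running number, leaving the caller's string untouched.
sub number_words {
    my ($text) = @_;
    my $count = 0;

    # [ ++$count ] creates an anonymous array with one element, the new count.
    # @{ ... } dereferences it, and array interpolation inserts that element.
    # The replacement is re-interpolated for each match, so the count advances.
    (my $copy = $text) =~ s/\b([\w']+)\b/#@{[ ++$count ]}=$1/g;

    return $copy;
}

print number_words("isn't generally heard"), "\n";
# prints: #1=isn't #2=generally #3=heard
```

Because $count is declared inside the helper, the numbering restarts at 1 on every call.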

Solution with Code in the LHS Instead of the RHS

This puts the increment inside the match itself:
s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx;

produces this:
#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!

A Stuttering Stuttering Solution Solution Solution

This
s{ \b ( [\w'] + ) \b             }
 { join " " => ($1) x ++$count   }gsex;

generates this delightful answer:
This particularly particularly rapid rapid rapid,
    unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter
isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard, 
    and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter!

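Exploring Boundaries

There are more robust ways to delimit word boundaries that work for plural possessives (the previous approaches don't), but I suspect your mystery lay in getting the ++$count to fire, not in the subtleties of \b behavior.

I really wish people understood that \b isn't what they think it is. They always think it means there is white space or the edge of the string there. They never think of it as a \w\W or \W\w transition.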
# same as using a \b before:
(?(?=\w) (?<!\w)  | (?<!\W) )

# same as using a \b after:
(?(?<=\w) (?!\w)  | (?!\W)  )

As you can see, it's conditional, depending on what it's touching. That's what a (?(COND)THEN|ELSE) clause is for.
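
To make the transition idea concrete, here is a small sketch that lists every offset at which \b matches in a string. The helper name boundary_offsets is hypothetical, invented for this illustration. It simply tries \b after skipping exactly $i characters, for every possible $i.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): return every offset
# in $str where \b matches. Offset $i sits between character $i-1 and $i.
sub boundary_offsets {
    my ($str) = @_;
    my @offsets;
    for my $i (0 .. length $str) {
        # Skip exactly $i characters from the start, then require a boundary.
        push @offsets, $i if $str =~ /\A.{$i}\b/s;
    }
    return @offsets;
}

print join(",", boundary_offsets("Paul's parents'")), "\n";
# prints: 0,4,5,6,7,14
```

Offsets 4 and 5 fall on either side of the apostrophe inside "Paul's", so \b splits that word in two. There is no boundary at offset 15, the end of the string, because the trailing apostrophe of "parents'" is not a \w character.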

This becomes an issue with things like:
$_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
my $count = 0;

s{
    (?(?=[\-\w']) (?<![\-\w'])  | (?<![^\-\w']) )
    ( [\-\w'] + )
    (?(?<=[\-\w']) (?![\-\w'])  | (?![^\-\w'])  )
}{
    sprintf "(%s)[%d]", $1, ++$count
}gsex;

print;

which correctly prints
('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]?

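Worrying about Unicode

1960s-style ASCII is about 50 years out of date. Just as it is nearly always wrong whenever you see someone write [a-z], it turns out that things like dashes and quotation marks shouldn't show up as literals in patterns, either. While we're at it, you probably don't want to use \w, because that includes digits and underscores as well, not just alphabetics.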
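
As a rough illustration of that point, the sketch below reports which of these classes a single character falls into. The helper name classes_for is made up for this example; the property names are the same ones used in the pattern further down.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): list which of the
# classes discussed here match a single character.
sub classes_for {
    my ($char) = @_;
    my @hits;
    push @hits, '\w'             if $char =~ /\w/;
    push @hits, 'Alphabetic'     if $char =~ /\p{Alphabetic}/;
    push @hits, 'Quotation_Mark' if $char =~ /\p{Quotation_Mark}/;
    push @hits, 'Dash'           if $char =~ /\p{Dash}/;
    return join ',', @hits;
}

print "letter a:     ", classes_for("a"), "\n";         # \w,Alphabetic
print "digit 7:      ", classes_for("7"), "\n";         # \w
print "underscore:   ", classes_for("_"), "\n";         # \w
print "ASCII quote:  ", classes_for("'"), "\n";         # Quotation_Mark
print "U+2019 quote: ", classes_for("\x{2019}"), "\n";  # Quotation_Mark
print "U+2010 dash:  ", classes_for("\x{2010}"), "\n";  # Dash
```

Digits and the underscore match \w without being alphabetic, while the typographic apostrophe and hyphen are covered by properties rather than by any ASCII literal.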

Imagine this string:
$_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n);

which you could write as a literal with use utf8:
use utf8;
$_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);

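This time I'll go at the pattern a bit differently, separating the definition of terms from their execution in an attempt to make it more readable and therefore more maintainable: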
#!/usr/bin/perl -l
use 5.10.0;
use utf8;
use open qw< :std :utf8 >;
use strict;
use warnings qw< FATAL all >;
use autodie;

$_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);

my $count = 0;

s{ (?<WORD> (?&full_word)  )

   # the rest is just definition
   (?(DEFINE)

     (?<word_char>   [\p{Alphabetic}\p{Quotation_Mark}] )

     (?<full_word>

             # next line won't compile cause
             # fears variable-width lookbehind
             ####  (?<! (?&word_char) )   )
             # so must inline it

         (?<! [\p{Alphabetic}\p{Quotation_Mark}] )

         (?&word_char)
         (?:
             \p{Dash}
           | (?&word_char)
         ) *

         (?!  (?&word_char) )
     )

   )   # end DEFINE declaration block

}{
    sprintf "(%s)[%d]", $+{WORD}, ++$count;
}gsex;

print;

That code, when run, produces this:
(’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?

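OK, so that may have been about FMTEYEWTK about fancy regexes, but aren't you glad you asked? ☺

A similar question about whether regular expression replacement has anything like a counter variable can be found on Stack Overflow: https://stackoverflow.com/questions/4213800/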
