regex - Perl 6 regex: match all punctuation except . and "

I have read some threads on matching "X except Y", but none specific to Perl 6. I am trying to match and replace all punctuation except . and "

> my $a = ';# -+$12,678,93.45 "foo" *&';
;# -+$12,678,93.45 "foo" *&

> my $b = $a.subst(/<punct - [\.\"]>/, " ", :g);
===SORRY!===
Unrecognized regex metacharacter - (must be quoted to match literally)
------> my $b = $a.subst(/<punct⏏ - [\.\"]>/, " ", :g);
Unrecognized regex metacharacter   (must be quoted to match literally)
------> my $b = $a.subst(/<punct -⏏ [\.\"]>/, " ", :g);
Unable to parse expression in metachar:sym<assert>; couldn't find final '>' (corresponding starter was at line 1)
------> my $b = $a.subst(/<punct - ⏏[\.\"]>/, " ", :g);

> my $b = $a.subst(/<punct-[\.\"]>/, " ", :g);
===SORRY!=== Error while compiling:
Unable to parse expression in metachar:sym<assert>; couldn't find final '>' (corresponding starter was at line 1)
------> my $b = $a.subst(/<punct⏏-[\.\"]>/, " ", :g);
    expecting any of:
        argument list
        term

> my $b = $a.subst(/<punct>-<[\.\"]>/, " ", :g);
===SORRY!===
Unrecognized regex metacharacter - (must be quoted to match literally)
------> my $b = $a.subst(/<punct>⏏-<[\.\"]>/, " ", :g);
Unable to parse regex; couldn't find final '/'
------> my $b = $a.subst(/<punct>-⏏<[\.\"]>/, " ", :g);

> my $b = $a.subst(/<- [\.\"] + punct>/, " ", :g); # $b is all blank space, not what I want
                       
> my $b = $a.subst(/<[\W] - [\.\"]>/, " ", :g);
      12 678 93.45 "foo"   
# this works, but it is clumsy; I want to
# elegantly say: punctuation except \. and \"
# using the predefined class <punct>

What is the best way to do this?

Best answer

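I think the most natural solution is to use a "character class arithmetic expression". This involves putting + and - prefixes on any number of Unicode properties or [...] character classes: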

                            #;# -+$12,678,93.45 "foo" *&
<+:punct -[."]>             #    +$12 678 93.45 "foo"

This can be read as "the class of characters that have the Unicode property punct, minus the . and " characters".
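As a minimal sketch, the expression above can be dropped into the `subst` call from the question. The variable names `$input` and `$result` are illustrative only, and the square brackets in the output are there just to make the leading and trailing spaces visible:

```raku
# The sample string from the question (single quotes, so $12 is not interpolated).
my $input = ';# -+$12,678,93.45 "foo" *&';

# Replace every character with the Unicode punct property, except . and ", by a space.
my $result = $input.subst(/<+:punct -[."]>/, " ", :g);

say "[$result]";   # prints: [    +$12 678 93.45 "foo"   ]
```

Note that `+` and `$` survive, because Unicode classifies them as symbols rather than punctuation.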

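Your input string includes + and $. These are not considered "punctuation" characters. You can explicitly add them to the set of characters that get replaced by spaces: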

<:punct +[+$] -[."] >       #      12 678 93.45 "foo"

(I have dropped the initial + before :punct. If you don't write a + or - for the first item in a character class arithmetic expression, then + is assumed.)
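A quick way to check the implicit leading + is to compare the two spellings on the question's string. This is only a sketch; `$input`, `$explicit` and `$implicit` are hypothetical names:

```raku
my $input = ';# -+$12,678,93.45 "foo" *&';

# Same expression, with and without the leading + on the first item.
my $explicit = $input.subst(/<+:punct -[."]>/, " ", :g);
my $implicit = $input.subst(/<:punct -[."]>/,  " ", :g);

say $explicit eq $implicit;   # the two forms should agree
```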

There is a Unicode property that covers all "symbols", including + and $, so you can use that instead:

<:punct +:symbol -[."] >    #      12 678 93.45 "foo"
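The two variants can be sketched side by side as follows. The variable names are illustrative, and the enumerated class is written with backslash-escaped characters (`[\+\$]`) as a cautious spelling of the `[+$]` used above:

```raku
my $input = ';# -+$12,678,93.45 "foo" *&';

# Variant 1: add + and $ explicitly to the set being replaced.
my $listed = $input.subst(/<:punct +[\+\$] -[."]>/, " ", :g);

# Variant 2: add every character with the Unicode symbol property.
my $by-property = $input.subst(/<:punct +:symbol -[."]>/, " ", :g);

say "[$listed]";        # prints: [      12 678 93.45 "foo"   ]
say "[$by-property]";   # prints: [      12 678 93.45 "foo"   ]
```

For this particular input both variants give the same result. On other input they can differ, since :symbol covers many more characters than + and $.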

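To recap, you can combine any number of:

Unicode properties, like :punct, which start with a : and correspond to some character property specified by Unicode; or

[...] character classes, which enumerate specific characters, backslash character classes (e.g. \d), or character ranges (e.g. a..z).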

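If an overall <...> assertion is a character class arithmetic expression, then the first character after the opening < must be one of four characters:

: which introduces a Unicode property (e.g. <:punct ...>);

[ which introduces a [...] character class (e.g. <[abc ...>);

+ or -. This may be followed by whitespace. It must then be followed by either a Unicode property (:foo) or a [...] character class (e.g. <+ :punct ...>).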

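Thereafter, each additional property or character class in the same overall character class arithmetic expression must be preceded by + or -, with or without additional whitespace (e.g. <:punct - [."] ...>).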

You can group sub-expressions in parentheses.

I am not sure what the precise semantics of + and - are. I note this surprising result:

say $a.subst(/<-[."] +:punct>/, " ", :g); # substitutes ALL characters!?!

Built-ins of the form <...> are not accepted in character class arithmetic expressions.

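This is true even if they are called "character classes" in the doc. It includes ones that are nothing like a character class (e.g. <ident> is called a character class in the doc even though it matches a string of multiple characters that follows a particular pattern!) but also ones that look as if they are character classes, like <punct> or <digit>. (Many of the latter correspond directly to Unicode properties, so you can just use those instead.)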

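To use a backslash "character class" like \d in a character class arithmetic expression with + and - arithmetic, you must list it inside a [...] character class.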
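For illustration, here is a small sketch of a backslash class wrapped in [...] and then used in arithmetic. The digit string and the variable names are made up for the example:

```raku
my $digits = '0123456789';

# \d is listed inside [...] so that it can take part in the arithmetic;
# the odd ASCII digits are then subtracted, leaving the even ones to be replaced.
my $masked = $digits.subst(/<[\d] - [13579]>/, "_", :g);

say $masked;   # prints: _1_3_5_7_9
```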

Combining assertions

While <punct> can't be combined with other assertions using character class arithmetic, it can be combined with other regex constructs using the & regex conjunction operator:

<punct> & <-[."]>           #    +$12 678 93.45 "foo"

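Depending on the state of compiler optimization (as of 2019, almost no effort has been applied to the regex engine), this will generally be slower than using real character classes.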

Source: a similar question on Stack Overflow, "regex - Perl 6 regex: match all punctuation except . and "": https://stackoverflow.com/questions/57538843/
