RegEx in PowerShell, consolidating replace calls

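I've written my own CSS minifier for fun and profit (not much profit), and it works well. I'm now trying to streamline it, because I'm effectively filtering the file 10+ times. That's no big deal for small files, but the larger the file, the bigger the performance hit.

Is there a more elegant way to filter my input file? I assume regex has a way, but I'm no regex wizard...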

$a = (gc($path + $file) -Raw)
$a = $a -replace "\s{2,100}(?<!\S)", ""
$a = $a -replace " {",    "{"
$a = $a -replace "} ",    "}"
$a = $a -replace " \(",   "\("
$a = $a -replace "\) ",   "\)"
$a = $a -replace " \[",   "\["
$a = $a -replace "\] ",   "\]"
$a = $a -replace ": ",    ":"
$a = $a -replace "; ",    ";"
$a = $a -replace ", ",    ","
$a = $a -replace "\n",    ""
$a = $a -replace "\t",    ""

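To save you a bit of trouble: I basically use the first -replace to strip any run of consecutive whitespace that is 2-100 characters long. The remaining replace statements clean up single spaces in specific situations.

How can I combine this so that I'm not filtering the file 12 times?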

Best answer

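The negative lookbehind (?<!\S) is used like this: (?<!prefix)thing matches thing only where it is not preceded by prefix. Putting it at the end of your regex, with nothing after it, I think it does nothing. You might have meant to put it on the left, or you might have meant a negative lookahead; I won't try to guess, I've just removed it for this answer.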
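A quick way to check the lookbehind claim is to run both patterns on the same input and compare. This is only a sketch on a made-up string ($sample and the other variable names are assumptions for illustration, not from the question):

```powershell
# Hypothetical sample: whitespace runs of different lengths between CSS tokens
$sample = "a {`n    color:  red;`n}"

# The question's pattern, with the trailing lookbehind
$withLookbehind = $sample -replace '\s{2,100}(?<!\S)', ''

# The same pattern with the lookbehind removed
$withoutLookbehind = $sample -replace '\s{2,100}', ''

# The character before the lookbehind is always the whitespace that was
# just matched, so the lookbehind can never reject a match.
$withLookbehind -eq $withoutLookbehind   # True
```

Note that the single space before { and the single newline before } survive in both results, because the pattern only matches runs of two or more whitespace characters.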
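You aren't using character classes. abc looks for the text abc, but put the characters in square brackets and [abc] looks for any one of the characters a, b, c.
1. Using that, you can combine the last two lines into one: [\n\t] replaces a newline or a tab.
You can combine two separate (replace-with-nothing) rules into one match using the regex logical OR |: \s{2,100}|[\n\t] matches the whitespace run, or a newline, or a tab. (You could probably use OR twice instead of the character class, fwiw.)
Use regex capture groups, which let you refer to whatever the regex matched without knowing in advance what that was.
1. For example, "space bracket -> bracket", "space colon -> colon" and "space comma -> comma" all follow the general pattern "space (thing) -> (thing)". The same goes for trailing spaces: "(thing) space -> (thing)".
2. Combine capture groups with character classes to merge the rest of the lines into one.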
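The character-class and alternation points can be tried on a tiny string before touching a real stylesheet. A minimal sketch, assuming a made-up input ($sample, $classOnly and $withAlternation are hypothetical names):

```powershell
# Hypothetical input: a tab, a newline and a run of four spaces
$sample = "a`tb`nc    d"

# Character class: one rule that matches a newline OR a tab
$classOnly = $sample -replace '[\n\t]', ''
$classOnly          # abc    d   (the run of spaces is untouched)

# Alternation: a run of 2-100 whitespace characters, OR a single newline/tab
$withAlternation = $sample -replace '\s{2,100}|[\n\t]', ''
$withAlternation    # abcd
```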

For example:

$a -replace " (:)", '$1'    # capture the colon, replacement is not ':' 
                            # it is "whatever was in the capture group"

$a -replace " ([:,])", '$1' # capture the colon, or comma. Replacement  
                            # is "whatever was in the capture group"
                            # space colon -> colon, space comma -> comma

# make the space optional with \s{0,1} and put it at the start and end
\s{0,1}([:,])\s{0,1}  #now it will match "space (thing)" or "(thing) space"

# Add in the rest of the characters, with appropriate \ escapes
# gained from [regex]::Escape('those chars here')

# Your original:
$a = (gc D:\css\1.css -Raw)
$a = $a -replace "\s{2,100}(?<!\S)", ""
$a = $a -replace " {",    "{"
$a = $a -replace "} ",    "}"
$a = $a -replace " \(",   "\("
$a = $a -replace "\) ",   "\)"
$a = $a -replace " \[",   "\["
$a = $a -replace "\] ",   "\]"
$a = $a -replace ": ",    ":"
$a = $a -replace "; ",    ";"
$a = $a -replace ", ",    ","
$a = $a -replace "\n",    ""
$a = $a -replace "\t",    ""

# My version:
$b = gc d:\css\1.css -Raw
$b = $b -replace "\s{2,100}|[\n\t]", ""
$b = $b -replace '\s{0,1}([])}{([:;,])\s{0,1}', '$1'

# Test that they both do the same thing on my random downloaded sample file:
$b -eq $a

# Yep.

Do it again with another | to merge the two into one:

$c = gc d:\css\1.css -Raw
$c = $c -replace "\s{2,100}|[\n\t]|\s{0,1}([])}{([:;,])\s{0,1}", '$1'

$c -eq $a   # also same output as your original.

NB. the space, tab and newline alternatives capture nothing, so '$1' is empty
    for those matches, which removes them.
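The comparisons above depend on a file on disk. As a self-contained check, here is a sketch that runs the two-step and the single-step versions on a small made-up CSS string (the snippet and all variable names are assumptions for illustration):

```powershell
# Hypothetical CSS snippet standing in for d:\css\1.css
$css = "body {`n`tmargin: 0;`n`tfont-family: Arial, sans-serif;`n}"

# Two-step version
$twoStep = $css -replace '\s{2,100}|[\n\t]', ''
$twoStep = $twoStep -replace '\s{0,1}([])}{([:;,])\s{0,1}', '$1'

# Single-step version: when one of the first two alternatives matches,
# group 1 did not take part, so '$1' expands to an empty string.
$oneStep = $css -replace '\s{2,100}|[\n\t]|\s{0,1}([])}{([:;,])\s{0,1}', '$1'

$oneStep               # body{margin:0;font-family:Arial,sans-serif;}
$oneStep -eq $twoStep  # True on this sample
```

Agreement on one sample doesn't prove the two versions are equivalent for every input, so it is worth repeating the comparison on your own stylesheets.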

And you could spend a lot of time building your own unreadable regex that probably won't be noticeably faster in any real scenario. :)
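If you'd rather measure than guess, Measure-Command can time both variants. This is a rough sketch under stated assumptions: the input is a made-up rule repeated many times, not a real stylesheet, and a single run is not a rigorous benchmark.

```powershell
# Hypothetical input: one small rule repeated to build a large string
$css = "a {`n`tcolor: red;`n}`n" * 20000

# Time the two-pass version
$twoPass = Measure-Command {
    $t = $css -replace '\s{2,100}|[\n\t]', ''
    $t = $t -replace '\s{0,1}([])}{([:;,])\s{0,1}', '$1'
}

# Time the single combined pattern
$onePass = Measure-Command {
    $t = $css -replace '\s{2,100}|[\n\t]|\s{0,1}([])}{([:;,])\s{0,1}', '$1'
}

# Print both timings; the numbers will vary by machine and by run
'{0:N1} ms (two passes) vs {1:N1} ms (one pass)' -f $twoPass.TotalMilliseconds, $onePass.TotalMilliseconds
```

Running it a few times, and on a real file, gives a better idea of whether the merged pattern is worth the loss in readability.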

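NB. In the replacement '$1', the dollar sign is .NET regex engine syntax, not a PowerShell variable. If you use double quotes, PowerShell will do string interpolation from the variable $1 and will probably replace it with nothing.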
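The quoting difference is easy to see on a one-line input. A small sketch, assuming strict mode is off and no PowerShell variable named 1 has been set in the session ($s and the result names are hypothetical):

```powershell
$s = 'a :b'

# Single quotes: the regex engine receives $1 and inserts the captured colon
$single = $s -replace ' (:)', '$1'
$single     # a:b

# Double quotes: PowerShell expands $1 first (to an empty string here),
# so the whole match, colon included, is replaced with nothing
$double = $s -replace ' (:)', "$1"
$double     # ab

# Double quotes with a backtick-escaped dollar behave like single quotes
$escaped = $s -replace ' (:)', "`$1"
$escaped    # a:b
```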

Regarding RegEx in PowerShell, consolidating replace calls, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40430834/
