regex - How do I capture strings in a Haskell regular expression?

Using the Text.Regex.Posix module, I can check whether a string matches a regular expression, but I don't know how to capture elements of the string.

For example, I can capture 3 elements with fsharpx like this:

Match @"(?i:MAIL\s+FROM:\s*<([a-zA-Z0-9]+)@([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+)>\s*(SIZE=([0-9]+))*)" mailMatch ->

I can capture

([a-zA-Z0-9]+) by mailMatch.Groups.[0].ToString() 
([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+) by mailMatch.Groups.[1].ToString() 
([0-9]+))* by mailMatch.Groups.[2].ToString()

But I don't know how to do this in Haskell.

I need some examples, thanks!

Best answer

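First, as far as I know, the regular expression you show is not a POSIX regular expression: constructs such as (?i:...) and \s are PCRE extensions. You should therefore import Text.Regex.PCRE instead of Text.Regex.Posix.

Second, backslashes must themselves be escaped inside a Haskell string literal, so you should rewrite: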

regex = "(?i:MAIL\s+FROM:\s*<([a-zA-Z0-9]+)@([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+)>\s*(SIZE=([0-9]+))*)"

into:

regex = "(?i:MAIL\\s+FROM:\\s*<([a-zA-Z0-9]+)@([a-zA-Z0-9]+(\\.[a-zA-Z0-9]+)+)>\\s*(SIZE=([0-9]+))*)"

Now we can use the (=~) operator:

Prelude Text.Regex.PCRE> "MAIL  FROM: <foo@bar.com> SIZE=1" =~ regex :: [[String]]
[["MAIL  FROM: <foo@bar.com> SIZE=1","foo","bar.com",".com","SIZE=1","1"]]

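Here we specify that the result is a list of lists of strings, [[String]]. Each sub-list is one match of the regular expression, so if the text contains three matches, we get three sub-lists. Within each sub-list we see the captures: the first element is the full match, the second is capture group 1, and so on.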
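To make the shape of these results concrete, here is a minimal sketch using a small illustrative pattern (pairRegex and its input are made up for this example, not taken from the question). It also shows the 4-tuple result type, which, as far as I know, returns only the first match together with the text around it. It assumes the regex-pcre package is installed.

```haskell
import Text.Regex.PCRE ((=~))

-- Illustrative pattern (an assumption for this sketch): a letter followed by a digit.
pairRegex :: String
pairRegex = "([a-z])([0-9])"

-- One sub-list per match: the full match first, then each capture group.
-- For this input: [["a1","a","1"],["b2","b","2"],["c3","c","3"]]
allMatches :: [[String]]
allMatches = "a1 b2 c3" =~ pairRegex

-- First match only: (text before, matched text, text after, capture groups).
-- For this input: ("","a1"," b2 c3",["a","1"])
firstMatch :: (String, String, String, [String])
firstMatch = "a1 b2 c3" =~ pairRegex
```

The type annotation is what selects the result shape; the same (=~) call returns different things depending on the type you ask for.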

If you are sure there is exactly one match, you can use:

[[_,user,domain,topdomain,_,size]] = "MAIL  FROM: <foo@bar.com> SIZE=1" =~ regex :: [[String]]

The results are then:

Prelude Text.Regex.PCRE> user
"foo"
Prelude Text.Regex.PCRE> domain
"bar.com"
Prelude Text.Regex.PCRE> topdomain
".com"
Prelude Text.Regex.PCRE> size
"1"

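Note that this kind of pattern match is partial: it fails at runtime if the input yields no match or more than one. In a real program you are better off with a safer, total solution.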
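One possible total version is sketched below: it inspects the [[String]] result with a case expression and returns Nothing when the input does not match. The MailFrom record and the parseMailFrom name are hypothetical, introduced only for this illustration. The sketch also assumes that an absent optional SIZE group shows up as an empty string (or is missing from the list), and it treats both cases as Nothing.

```haskell
import Text.Regex.PCRE ((=~))

-- Same pattern as in the answer, with backslashes escaped for a Haskell string.
mailRegex :: String
mailRegex = "(?i:MAIL\\s+FROM:\\s*<([a-zA-Z0-9]+)@([a-zA-Z0-9]+(\\.[a-zA-Z0-9]+)+)>\\s*(SIZE=([0-9]+))*)"

-- Hypothetical record holding the captures we care about.
data MailFrom = MailFrom
  { mfUser   :: String
  , mfDomain :: String
  , mfSize   :: Maybe String  -- Nothing when no SIZE= value was captured
  } deriving (Show, Eq)

-- Total parser: Nothing instead of a runtime pattern-match failure.
parseMailFrom :: String -> Maybe MailFrom
parseMailFrom input =
  case (input =~ mailRegex :: [[String]]) of
    -- Use the first match; elements are: full match, user, domain, then the rest.
    ((_ : user : domain : rest) : _) ->
      Just (MailFrom user domain (sizeFrom rest))
    _ -> Nothing
  where
    -- rest is [top-level domain, "SIZE=n", "n"]; the size is its third element.
    sizeFrom rest = case drop 2 rest of
      (s : _) | not (null s) -> Just s
      _                      -> Nothing
```

For instance, parseMailFrom "MAIL  FROM: <foo@bar.com> SIZE=1" should give Just a record with user "foo", domain "bar.com" and size Just "1", while a non-matching string gives Nothing.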

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/46703780/
