java - Regular expression causes StackOverflowError

Tags: java regex scala parsing

I have the following regular expression:

(([^'])|(''))* 

It usually parses literals just fine.

But with the following text (you can try it yourself):

xxxxxxx_xxx_xxxxxxx=xxxxxx.xxxxxxxxxxx.xxxxxxx.xxxxxxxxxx`2[xxxxxx.xxxxxx,xxxxxx.xxxxxx]xxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxxxxxxxxxx=1455544499467&xxxxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xx=xxxxxxx &xxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxx=xxxxxxx_701454xx-x23x-4x31-xx75-xxx185x3xx26&xxxxxx_xxxxxxxx_xxx_xxxxxxxxxx=xxxxx&xxxxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxxx=xxxxxxx&xxxxxxxx_xxx_xx=xxxxxxx_2x542x7x-7x94-4867-8819-239x732xx3x1&xxxxxxxxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxxxxxxxx=xxxxx&xxxxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxxxxxxx=xx-xx&xxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxx=7&xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxxxxxx=xxx2012x2xx.xxx.xxx:82&xxxxxxx_xxxxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxx=11.0&xxxxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxx=xxxx&xxxxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxxxxx=xxxxxxxxx&xx_xxxxxxxxx_xxxxxxxxxxxxxxxx_xxxxxxxxxxx_xxxxx_xxx_xxxxxxx=xx 6.3&xxxxxxxxxxx_xxxxx_xxx_xxxxxxxxxxxxxxxxxxx=<xxxxxxx-xxxxx xxxxx="xxx">xxxxxx xxxxxx"></xxxxxxx><xxxxxxx-xxxx xxx-xxxxx="xx_xxxx" xxxxxxxx="xxxx" xxxx="xxxxxxxxxxx xxxx"><xxxx-xx-xxxxxx xxxxxxxxxxx="xxxx"><xxxxx>xxxxxxxxxx</xxxxx><xxxxx>xxxxx</xxxxx><xxxxx>xxx</xxxxx><xxxxx>xxxxxxxxx</xxxxx><xxxxx>xxxxxxx</xxxxx></xxxx-xx-xxxxxx></xxxxxxx-xxxx><xxxxxxx-xxxx xxx-xxxxx="xx_xxxxxxxx" xxxxxxxx="xxxxxxxx" xxxx="xx_xxxxxxxx"/><xxxxxxx-xxxx xxx-xxxxx="xx_xx_xxxxx" xxxxxxxx="xxxx" xxxx="xxxxxxx - xxxxxx xxxxxx?"><xxxx-xx-xxxxxx xxxxxxxxxxx="xxxxx"> 2</xxxxx></xxxx-xx-xxxxxx><xxxxx>xxxx</xxxxx><xxxxx>xxxxx</xxxxx></xxxx-xx-xxxxxx></xxxxxxx-xxxx><xxxxxxx-xxxx xxx-xxxxx="xx_xxxx_xxxx" xxxxxxxx="xxxx" xxxx="xx_xxxx_xxxx"></xxxxxxx><xxxxxxx-xxxx xxx-xxxxx="xx_xxxxxxx" xxxxxxxx="xxxxxx" xxxx="xx_xxxxxxx"></xxxxxxx><xxxxxxx-xxxx xxx-xxxxx="xx_xx_xxxxxxxxxxxxxx_xxxxxx" xxxxxxxx="xxxx"xxxx="xxxxxxx - xxxxxxxxxxxxxx xxxxxx"></xxxxxxx><xxxxxxx-xxxx xxx-xxxxx="xx_xxxxxx" 
xxxxxxxx="xxxx" xxxx="xxxxxxxxxxx xxxxxx">')

I get a StackOverflowError:

Exception in thread "main" java.lang.StackOverflowError
    at java.util.regex.Pattern$Branch.match(Pattern.java:4600)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4568)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3777)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)

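What could be the problem? Can this exception occur because the string is too long (2030 characters)? Or does the text contain some special characters that cause the error?

Any ideas are appreciated.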

Best answer

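First, you don't need the inner capturing groups. A repeated capturing group only keeps the text of its last iteration, so in Java they are of no use here.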
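A minimal sketch of that behaviour, using the question's own pattern (the class name `LastCaptureDemo` and the sample inputs are hypothetical, chosen for illustration): after a successful `matches()`, group 1 holds only the text matched by the final iteration, not everything the group matched along the way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // The pattern from the question, including its redundant capturing groups.
    static final Pattern ORIGINAL = Pattern.compile("(([^'])|(''))*");

    // Returns the content of group 1 after matching the whole input,
    // or null if the input does not match at all.
    static String lastCapture(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // A repeated group keeps only its last iteration, not the whole match.
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("abc"));  // prints c
        System.out.println(lastCapture("ab''")); // prints ''
    }
}
```

The whole-match text is still available through `m.group()`, which is why the inner groups add nothing but overhead.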

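Next, and this is what actually causes the error: Java's regex engine processes each iteration of a quantified group that contains an alternation through nested method calls (the Loop, GroupHead and Branch frames in your stack trace), so the stack depth grows with the length of the input. With 2030 characters the thread stack runs out. You can unroll this regex so that it matches linearly, without the costly alternation inside the repeated group: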

[^']*(?:''[^']*)*

See the updated regex demo.
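To check the length dependence directly, here is a hedged sketch (not from the original answer; `UnrollDemo`, `tryMatch` and the 100,000-character input are assumptions for illustration, and `String.repeat` needs Java 11+). It runs each pattern on a worker thread with a 512 KiB stack, a size the JVM treats only as a hint, and reports whether `matches()` returned or threw `StackOverflowError`.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Pattern;

public class UnrollDemo {
    static final Pattern ORIGINAL = Pattern.compile("(([^'])|(''))*");
    static final Pattern UNROLLED = Pattern.compile("[^']*(?:''[^']*)*");

    // Runs pattern.matcher(input).matches() on a thread with the requested stack size.
    // Returns "true" or "false", or "overflow" if a StackOverflowError was thrown.
    static String tryMatch(Pattern pattern, String input, long stackBytes) {
        AtomicReference<String> result = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                result.set(String.valueOf(pattern.matcher(input).matches()));
            } catch (StackOverflowError e) {
                result.set("overflow");
            }
        }, "regex-worker", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        // Far longer than the 2030-character input from the question, no apostrophes.
        String input = "x".repeat(100_000);
        long stack = 512 * 1024;
        System.out.println("original: " + tryMatch(ORIGINAL, input, stack));
        System.out.println("unrolled: " + tryMatch(UNROLLED, input, stack));
    }
}
```

On a typical HotSpot JVM the original pattern reports `overflow` while the unrolled one reports `true`: `[^']*` consumes a run of ordinary characters in a loop, and the repeated group is only re-entered once per `''` pair rather than once per character.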

Pattern details:

  • [^']* - matches zero or more characters other than '
  • (?:''[^']*)* - matches zero or more sequences of:
    • '' - two single apostrophes
    • [^']* - zero or more characters other than '
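The unrolled pattern should accept exactly the same strings as the original one: text in which every run of apostrophes has even length. As a sanity check (not part of the original answer; `EquivalenceCheck` and `firstMismatch` are hypothetical names), this sketch compares `matches()` for both patterns on every string over the two characters `a` and `'` up to length 12.

```java
import java.util.regex.Pattern;

public class EquivalenceCheck {
    static final Pattern ORIGINAL = Pattern.compile("(([^'])|(''))*");
    static final Pattern UNROLLED = Pattern.compile("[^']*(?:''[^']*)*");

    // Enumerates all strings over {a, '} of length 0..maxLen and returns the first
    // one on which the two patterns disagree, or null if they always agree.
    static String firstMismatch(int maxLen) {
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder(len);
                for (int i = 0; i < len; i++) {
                    // Bit i selects the character at position i.
                    sb.append(((bits >> i) & 1) == 0 ? 'a' : '\'');
                }
                String s = sb.toString();
                if (ORIGINAL.matcher(s).matches() != UNROLLED.matcher(s).matches()) {
                    return s;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch(12)); // prints null
    }
}
```

Exhaustive checks on short inputs are cheap here, and the original pattern is safe to run at these lengths because its recursion depth stays small.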

For java - Regular expression causes StackOverflowError, see the similar question on Stack Overflow: https://stackoverflow.com/questions/37390980/
