java - Using a regex to get the contents of the brackets in a list of values

Tags: java regex coldfusion

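I'm trying to find a regular expression (ColdFusion or Java) that reliably gets the content between the brackets of each (param \d+). I've tried dozens of different regexes, and the closest I've gotten is this: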

\(param \d+\) = \[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\]

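That would be perfect if the string CF returns escaped the single quotes in the value parameter. It doesn't, so this fails miserably. Going the negative-lookahead route, like this: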

\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\]

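works fine, except when, for some unnatural reason, a value actually contains , sqltype. I find it hard to believe that I can't simply tell the regex to dig out the contents of every opening and closing bracket it finds, but then again, I don't know enough about regexes to know their limitations.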
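To see the limitation concretely, here is a minimal Java sketch (class and method names are illustrative) that takes the naive approach of grabbing everything from each [ to the next ] in the sample string shown below. Params 1 and 2 come out intact, but param 3 is cut off at the ] inside Th[is], and the stray [] yields an extra empty match: a pattern that only looks at brackets cannot tell which ] closes the param and which belongs to the value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveBracketDemo {
    // The sample string from the question, on one line.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']"
        + " , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']"
        + " , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // "Everything from a [ up to the next ]": no notion of which ] really ends a param.
    static final Pattern NAIVE = Pattern.compile("\\[([^\\]]*)\\]");

    static List<String> bracketContents(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = NAIVE.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String c : bracketContents(SAMPLE)) {
            System.out.println("[" + c + "]");
        }
    }
}
```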

Here is a sample string I'm trying to parse:

(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
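As a rough check, here is a Java sketch (names are illustrative) that runs the two attempts above against this sample. Because the quotes inside the values are not escaped, the first pattern should match only param 1; the lookahead version should match params 1 and 2 but not param 3, whose value itself contains ', sqltype= '.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttemptsDemo {
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']"
        + " , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']"
        + " , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // First attempt: only works if quotes inside value are escaped as ''.
    static final Pattern ESCAPED_QUOTES = Pattern.compile(
        "\\(param \\d+\\) = \\[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\\]");

    // Second attempt: value runs up to the first "', sqltype" (negative lookahead).
    static final Pattern LOOKAHEAD = Pattern.compile(
        "\\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\\]");

    static int countMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("escaped-quote attempt: " + countMatches(ESCAPED_QUOTES, SAMPLE) + " match(es)");
        System.out.println("lookahead attempt: " + countMatches(LOOKAHEAD, SAMPLE) + " match(es)");
    }
}
```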

For the curious, this is a sub-problem of Copyable Coldfusion SQL Exception.

Edit

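Here is my attempt at implementing @Mena's answer in CF9.1. Sadly, it doesn't get through the whole string. I had to replace \\ with \ just to get it to run at all, but my implementation may still be at fault.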
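The backslash difference is most likely down to string-literal rules, assuming the answer's patterns were written as Java string literals. In Java source, "\\(" puts a single backslash followed by ( into the regex text, whereas a CFML string literal passes backslashes through unchanged, so a pattern copied verbatim ends up with doubled backslashes in CF. This small Java sketch (names are illustrative) shows that the doubled regex text no longer matches (param 1), because \\( in a regex means a literal backslash followed by the start of a group.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // Java literal "\\((.+?)\\)" gives the regex text \((.+?)\)
    // (the same text a CFML literal would be written as directly).
    static final String SINGLE = "\\((.+?)\\)";
    // Copying the Java literal verbatim into CFML gives the regex text \\((.+?)\\)
    static final String DOUBLED = "\\\\((.+?)\\\\)";

    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(SINGLE + " -> " + finds(SINGLE, "(param 1)"));
        // Doubled form expects a literal backslash, so it finds nothing here.
        System.out.println(DOUBLED + " -> " + finds(DOUBLED, "(param 1)"));
    }
}
```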

Here is the given string (the pipes just mark the boundaries):

| (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar'] | 

Here is my implementation:

    <cfset var outerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "\((.+?)\)\s?\=\s?\[(.+?)\](\s?,|$)"))>
    <cfset var innerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "(.+?)\s?\=\s?'(.+?)'\s?,\s?"))>
    <cfset var outerMatcher = outerPat.matcher(javaCast("string", arguments.params))>

    <cfdump var="Start"><br />
    <cfloop condition="outerMatcher.find()">     
        <cfdump var="#outerMatcher.group(1)#"> (<cfdump var="#outerMatcher.group(2)#">)<br />
        <cfset var innerMatcher = innerPat.matcher(javaCast("string", outerMatcher.group(2)))>
        <cfloop condition="innerMatcher.find()">
            <cfoutput>|__</cfoutput><cfdump var="#innerMatcher.group(1)#"> --> <cfdump var="#innerMatcher.group(2)#"><br />
        </cfloop>
        <br />
    </cfloop>
    <cfabort>

And this is what gets printed:

Start 
param 1 ( type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer' )
|__ type --> IN 
|__ class --> java.lang.Integer 
|__ value --> 47 

param 2 ( type='IN', class='java.lang.String', value='asf , O'Reilly )
|__ type --> IN 
|__ class --> java.lang.String 

End
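For comparison, here is a plain-Java port of the two-pass CFML loop above, as a sketch with illustrative names. It assumes the params string ends with whitespace (the space before the closing pipe). Under that assumption it reproduces the truncated output: the lazy \[(.+?)\] stops at the ] right after O'Reilly, the inner pattern demands a trailing comma so the last key/value pair is never captured, and param 3 is skipped because $ cannot match before the trailing space. With the string trimmed, param 3 is reached.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPassPort {
    // The string between the pipes; the trailing space is an assumption.
    static final String PARAMS =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']"
        + " , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar']"
        + " , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar'] ";

    // Same regex text as the CFML outerPat / innerPat.
    static final Pattern OUTER = Pattern.compile("\\((.+?)\\)\\s?\\=\\s?\\[(.+?)\\](\\s?,|$)");
    static final Pattern INNER = Pattern.compile("(.+?)\\s?\\=\\s?'(.+?)'\\s?,\\s?");

    // Maps each outer match's label (e.g. "param 1") to the "key --> value" pairs found in it.
    static Map<String, List<String>> parse(String params) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher outer = OUTER.matcher(params);
        while (outer.find()) {
            List<String> pairs = new ArrayList<>();
            Matcher inner = INNER.matcher(outer.group(2));
            while (inner.find()) {
                pairs.add(inner.group(1) + " --> " + inner.group(2));
            }
            result.put(outer.group(1), pairs);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Start");
        for (Map.Entry<String, List<String>> e : parse(PARAMS).entrySet()) {
            System.out.println(e.getKey());
            for (String pair : e.getValue()) {
                System.out.println("|__ " + pair);
            }
        }
        System.out.println("End");
    }
}
```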

Best answer

Here is a Java regex pattern that works for your sample input.

(?x)

# lookbehind to check for start of string or previous param
# java lookbehinds must have max length, so limits sqltype
(?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

# capture the full string for replacing in the orig sql
# and just the position to verify against the match position
(\(param\ (\d+)\))

\ =\ \[

# type and class won't contain quotes
   type='([^']++)'
,\ class='([^']++)'

# match any non-quote, then lazily keep going
,\ value='([^']++.*?)'

# sqltype is always alphanumeric
,\ sqltype='cf_sql_[a-z]+'

\]

# lookahead to check for end of string or next param
(?=$|\ ,\ \(param\ \d+\)\ =\ \[)

(The (?x) flag enables comments mode, which ignores unescaped whitespace as well as everything from a # to the end of the line.)
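A minimal Java sketch that compiles this pattern (comments shortened, regex otherwise as written, one array element per line so the # comments end at each newline) and runs it on the question's sample string; the class and method names are illustrative. It should find all three params in order, with param 3's value running up to the last ', sqltype='cf_sql_...'] before the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerPatternDemo {
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']"
        + " , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']"
        + " , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    static final Pattern PARAM = Pattern.compile(String.join("\n",
        "(?x)",
        "# start of string, or just after the previous param",
        "(?<=^|sqltype='cf_sql_[a-z]{1,16}']\\ ,\\ )",
        "(\\(param\\ (\\d+)\\))",
        "\\ =\\ \\[",
        "   type='([^']++)'",
        ",\\ class='([^']++)'",
        "# at least one non-quote, then lazily keep going",
        ",\\ value='([^']++.*?)'",
        ",\\ sqltype='cf_sql_[a-z]+'",
        "\\]",
        "# end of string, or the start of the next param",
        "(?=$|\\ ,\\ \\(param\\ \\d+\\)\\ =\\ \\[)"));

    // Returns {param number, type, class, value} for each match.
    static List<String[]> parse(String params) {
        List<String[]> out = new ArrayList<>();
        Matcher m = PARAM.matcher(params);
        while (m.find()) {
            out.add(new String[] {m.group(2), m.group(3), m.group(4), m.group(5)});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] p : parse(SAMPLE)) {
            System.out.println("param " + p[0] + ": value=" + p[3]);
        }
    }
}
```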

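Here is the pattern implemented in CFML (tested on CF9,0,1,274733). It uses cfRegex (a library that makes Java regexes easier to use from CFML) to get the results of the pattern, then does some checks to make sure the expected number of params was found.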

<cfsavecontent variable="Input">
(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']
 , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']
 , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
</cfsavecontent>
<cfset Input = trim(Input).replaceall('\n','')>

<cfset cfcatch = 
    { params = input
    , sql = 'SELECT stuff FROM wherever WHERE (param 3) is last param'
    }/>

<cfsavecontent variable="ParamRx">(?x)

    # lookbehind to check for start or previous param
    # java lookbehinds must have max length, so limits sqltype
    (?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

    # capture the full string for replacing in the orig sql
    # and just the position to verify against the match position
    (\(param\ (\d+)\))

    \ =\ \[

    # type and class won't contain quotes
       type='([^']++)'
    ,\ class='([^']++)'

    # match any non-quote, then lazily keep going if needed
    ,\ value='([^']++.*?)'

    # sqltype is always alphanumeric
    ,\ sqltype='cf_sql_[a-z]+'

    \]

    # lookahead to check for end or next param
    (?=$|\ ,\ \(param\ \d+\)\ =\ \[)

</cfsavecontent>

<cfset FoundParams = new Regex(ParamRx).match
    ( text = cfcatch.params
    , returntype = 'full'
    )/>

<cfset LastParamPos = cfcatch.sql.lastIndexOf('(param ') + 7 />
<cfset LastParam = ListFirst( Mid(cfcatch.sql,LastParamPos,3) , ')' ) />

<cfif LastParam NEQ ArrayLen(FoundParams) >
    <cfset ProblemsDetected = true />
<cfelse>
    <cfset ProblemsDetected = false />

    <cfloop index="i" from=1 to=#ArrayLen(FoundParams)# >

        <cfif i NEQ FoundParams[i].Groups[2] >
            <cfset ProblemsDetected = true />
        </cfif>

    </cfloop>
</cfif>

<cfif ProblemsDetected>
    <big>Something went wrong!</big>
<cfelse>
    <big>All seems fine</big>
</cfif>

<cfdump var=#FoundParams# />

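This will actually still work if you embed an entire param inside another param's value. If you do that twice (or more), it will fail, but at least the checks should detect the failure.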
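The consistency checks from the CFML code can be sketched in Java as follows. The names are hypothetical, and the SQL text and the param numbers found by the regex are passed in directly instead of coming from cfcatch and cfRegex. A problem is flagged when the number of the last (param N) in the SQL differs from the number of matches, or when the i-th match is not param i.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamCheck {
    private static final Pattern PARAM_REF = Pattern.compile("\\(param (\\d+)\\)");

    // Number of the last "(param N)" occurring in the SQL text, or 0 if there is none.
    static int lastParamInSql(String sql) {
        Matcher m = PARAM_REF.matcher(sql);
        int last = 0;
        while (m.find()) {
            last = Integer.parseInt(m.group(1));
        }
        return last;
    }

    // Mirrors the CFML checks: the count must match, and the i-th match must be param i.
    static boolean problemsDetected(String sql, List<Integer> foundParamNumbers) {
        if (lastParamInSql(sql) != foundParamNumbers.size()) {
            return true;
        }
        for (int i = 0; i < foundParamNumbers.size(); i++) {
            if (foundParamNumbers.get(i) != i + 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sql = "SELECT stuff FROM wherever WHERE (param 3) is last param";
        System.out.println(problemsDetected(sql, List.of(1, 2, 3)) ? "Something went wrong!" : "All seems fine");
    }
}
```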

The dump output should look like this:

[image: dump output]

Hopefully everything here makes sense; let me know if you have any questions.

For a similar question about using a regex to get the contents of the brackets in a list of values, see Stack Overflow: https://stackoverflow.com/questions/18152435/
