I'm parsing HTML code in a C# project.
Suppose we have this string:
<a href="javascript:func('data1','data2'...)">...</a>
Or, after the necessary .Substring() call:
func('data1','data2'...)
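What is the best Regex pattern to retrieve the arguments of func() without relying on the delimiters (' and ,), since they can sometimes appear inside an argument string?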
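Best answer
You should not use regular expressions to parse programming-language code, because it is not a regular language. This post explains why: Can regular expressions be used to match nested patterns?
To prove my point, allow me to share an actual regex solution that I believe matches what you want: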
^ # Start of string
[^()'""]+\( # matches `func(`
#
(?> # START - Iterator (match each parameter)
  (?(param)\s*,(?>\s*)) # if it's not the 1st parameter, start with a `,`
  (?'param' # opens 'param' (main group, captures each parameter)
    #
    (?> # Group: matches every char in parameter
        (?'qt'['""]) # ALTERNATIVE 1: strings (matches ""foo"",'ba\'r','g)o\'o')
        (?: # match anything inside quotes
            [^\\'""]+ # any char except quotes or escapes
            |(?!\k'qt')['""] # or the quotes not used here (ie ""double'quotes"")
            |\\. # or any escaped char
        )* # repeat: *
        \k'qt' # close quotes
      | (?'parens'\() # ALTERNATIVE 2: `(` open nested parens (nested func)
      | (?'-parens'\)) # ALTERNATIVE 3: `)` close nested parens
      | (?'braces'\{) # ALTERNATIVE 4: `{` open braces
      | (?'-braces'}) # ALTERNATIVE 5: `}` close braces
      | [^,(){}\\'""] # ALTERNATIVE 6: anything else (var, funcName, operator, etc)
      | (?(parens),) # ALTERNATIVE 7: `,` a comma if inside parens
      | (?(braces),) # ALTERNATIVE 8: `,` a comma if inside braces
    )* # Repeat: *
    # CONDITIONS:
    (?(parens)(?!)) # a. balanced parens
    (?(braces)(?!)) # b. balanced braces
    (?<!\s) # c. no trailing spaces
    #
  ) # closes 'param'
)* # Repeat the whole thing once for every parameter
#
\s*\)\s*(?:;\s*)? # matches `)` at the end of func(), maybe with a `;`
$ # END
As a one-liner:
^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$
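For reference, the captures could be read in C# roughly as follows. This is only a sketch: the class name FuncArgs and the method Extract are made up for illustration, and the one-liner is pasted unchanged into a verbatim string, on the assumption that the doubled "" in it are verbatim-string escapes for a single double quote. Every argument ends up as one entry in the Captures collection of the 'param' group, and string arguments keep their surrounding quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the one-line pattern; not part of the original answer.
public static class FuncArgs
{
    // The one-liner, copied as-is into a verbatim string ("" stands for one ").
    private static readonly Regex Pattern = new Regex(
        @"^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$");

    // Returns the raw text of each argument, or an empty list if the input does not match.
    public static List<string> Extract(string call)
    {
        var result = new List<string>();
        Match match = Pattern.Match(call);
        if (!match.Success)
            return result;

        // A group that sits inside a repeated construct keeps one Capture per iteration.
        foreach (Capture capture in match.Groups["param"].Captures)
            result.Add(capture.Value);
        return result;
    }
}
```

A call such as FuncArgs.Extract("func('data1','data2')") would then be the way to get at the individual arguments. The commented multi-line version should be usable the same way if the Regex is constructed with RegexOptions.IgnorePatternWhitespace.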
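As you can imagine by now (if you're still reading), even with an indented pattern and a comment on every construct, this regex is unreadable, hard to maintain, and nearly impossible to debug... and I would guess there are edge cases that make it fail.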
In case a stubborn mind is still interested, here is a link to the logic behind it: Matching Nested Constructs with Balancing Groups (regular-expressions.info)
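For comparison, the non-regex route recommended at the top of this answer might look something like the hand-written scanner below. It is a minimal sketch, not a JavaScript parser: ArgumentSplitter and Split are hypothetical names, and it assumes that the arguments sit between the first `(` and the last `)` of the input, that strings use ' or " with backslash escapes, and that brackets are balanced. It tracks only two pieces of state, the current quote character and the nesting depth, and splits on commas that are outside both.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, shown only to illustrate the scanning approach.
public static class ArgumentSplitter
{
    public static List<string> Split(string call)
    {
        int open = call.IndexOf('(');
        int close = call.LastIndexOf(')');
        if (open < 0 || close < open)
            throw new FormatException("Expected something like name(args).");

        var args = new List<string>();
        var current = new StringBuilder();
        int depth = 0;       // nesting level of (), [] and {}
        char quote = '\0';   // quote character we are currently inside, or '\0' if none

        for (int i = open + 1; i < close; i++)
        {
            char c = call[i];
            if (quote != '\0')
            {
                current.Append(c);
                if (c == '\\' && i + 1 < close)
                    current.Append(call[++i]);   // copy the escaped character untouched
                else if (c == quote)
                    quote = '\0';                // closing quote
            }
            else if (c == '\'' || c == '"')
            {
                quote = c;                       // opening quote
                current.Append(c);
            }
            else if (c == ',' && depth == 0)
            {
                args.Add(current.ToString().Trim());   // top-level comma ends an argument
                current.Clear();
            }
            else
            {
                if (c == '(' || c == '[' || c == '{') depth++;
                else if (c == ')' || c == ']' || c == '}') depth--;
                current.Append(c);
            }
        }

        string last = current.ToString().Trim();
        if (last.Length > 0 || args.Count > 0)
            args.Add(last);
        return args;
    }
}
```

With this sketch, ArgumentSplitter.Split("func('a,b', g(1, 2))") yields the two entries 'a,b' and g(1, 2), quotes included. Each rule is an ordinary if branch that can be stepped through in a debugger, which is the maintainability argument made above.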
For more on "c# - Best C# regex pattern to get function argument values in a raw string?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/32512994/