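I am trying to group two sub-sentences of reasonable length that are separated by a specific word ("AND" in the example), where the second one is optional. Some examples:
Case 1: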
foo sentence A AND foo sentence B
should give
"foo sentence A" --> matching group 1
"AND" --> matching group 2 (optionally)
"foo sentence B" --> matching group 3
Case 2:
foo sentence A
should give
"foo sentence A" --> matching group 1
"" --> matching group 2 (optionally)
"" --> matching group 3
I tried the following regular expression:
(.*) (AND (.*))?$
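It works, but in case 2 only if I put a space at the very end of the string; otherwise the pattern does not match. If I move the space before "AND" inside the parenthesized group, then in case 1 the matcher puts the whole string into the first group. I wondered about lookahead and lookbehind assertions, but I am not sure whether they would help. Any suggestions? Thanks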
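The behaviour described above can be reproduced with a small Java sketch. Only the two patterns come from the question; the class name `OriginalAttempt`, the helper `firstGroup`, and the use of `Matcher.find()` are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalAttempt {
    // Pattern from the question: the space after group 1 is mandatory.
    static final Pattern ORIGINAL = Pattern.compile("(.*) (AND (.*))?$");
    // Variant with the space moved inside the optional group.
    static final Pattern SPACE_INSIDE = Pattern.compile("(.*)( AND (.*))?$");

    // Hypothetical helper: group 1 of the first match, or null if nothing matches.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Case 1 works: prints "foo sentence A"
        System.out.println(firstGroup(ORIGINAL, "foo sentence A AND foo sentence B"));
        // Case 2 without a trailing space does not match: prints "null"
        System.out.println(firstGroup(ORIGINAL, "foo sentence A"));
        // Case 2 with a trailing space matches: prints "foo sentence A"
        System.out.println(firstGroup(ORIGINAL, "foo sentence A "));
        // Greedy group 1 swallows everything: prints the whole input
        System.out.println(firstGroup(SPACE_INSIDE, "foo sentence A AND foo sentence B"));
    }
}
```

The last call shows why moving the space is not enough on its own: the greedy `(.*)` consumes the entire string first, and because the following group is optional the engine never needs to backtrack.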
Best answer
I would use this regular expression:
^(.*?)(?: (AND) (.*))?$
Explanation:
The regular expression:
(?-imsx:^(.*?)(?: (AND) (.*))?$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      AND                      'AND'
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Regarding java - grouping sentences separated by a specific word, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16753986/