java - Wildcard in a regex should be greedy only up to a stop word

I am trying to build a "simple" regex (in Java) that matches sentences like these:

I want to cook something
I want to cook something with chicken and cheese
I want to cook something with chicken but without onions
I want to cook something without onions but with chicken and cheese
I want to cook something with candy but without nuts within 30 minutes

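Ideally, it should also match a variant of the last sentence in which the keyword phrases appear in a different order (for example, with the 30-minute limit stated before the candy and the nuts).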

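In these examples I want to capture the "included" ingredients, the "excluded" ingredients, and the maximum "duration" of the cooking process. As you can see, each of these three capture groups is optional in the pattern. Each one starts with a specific keyword (with, (but )?without, within), and its wildcard should match only up to the next such keyword. The ingredients can also consist of several words, so in the second and fourth examples "chicken and cheese" should be captured by the named group "include".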

Ideally, I would like to write a pattern similar to this one:

I want to cook something ((with (?<include>.+))|((but )?without (?<exclude>.+))|(within (?<duration>.+) minutes))*

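Obviously this does not work, because the wildcards can also match the keywords: once the first keyword has matched, everything that follows (including any further keywords) is consumed by the greedy wildcard of that named capture group.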
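To make the failure concrete, here is a minimal sketch that runs the pattern above, unchanged, against one of the sample sentences. The class name GreedyWildcardDemo and the helper capture are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyWildcardDemo {
    // The pattern from the question, unchanged.
    static final Pattern GREEDY = Pattern.compile(
        "I want to cook something ((with (?<include>.+))|((but )?without (?<exclude>.+))"
        + "|(within (?<duration>.+) minutes))*");

    /** Returns {include, exclude}, or null if the whole sentence does not match. */
    static String[] capture(String sentence) {
        Matcher m = GREEDY.matcher(sentence);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("include"), m.group("exclude") };
    }

    public static void main(String[] args) {
        String[] groups = capture("I want to cook something with chicken but without onions");
        // The greedy .+ after "with " runs to the end of the sentence,
        // so the "exclude" group never takes part in the match.
        System.out.println("include = " + groups[0]); // include = chicken but without onions
        System.out.println("exclude = " + groups[1]); // exclude = null
    }
}
```

The sentence still matches as a whole, but "but without onions" ends up inside the include group instead of in a group of its own.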

I tried using a lookahead, for example something like this:

something ((with (?<IncludedIngredients>.*(?=but)))|(but )?without (?<ExcludedIngredients>.+))+

That regex recognizes "something with chicken but without onions", but it does not match "something with chicken".
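The behaviour described above can be reproduced with a short sketch. The class name LookaheadAttemptDemo and the helper found are hypothetical. The lookahead (?=but) demands that a literal "but" follows the included ingredients, so a sentence without "but" cannot satisfy the first alternative.

```java
import java.util.regex.Pattern;

class LookaheadAttemptDemo {
    // The lookahead attempt from the question, unchanged.
    static final Pattern ATTEMPT = Pattern.compile(
        "something ((with (?<IncludedIngredients>.*(?=but)))"
        + "|(but )?without (?<ExcludedIngredients>.+))+");

    /** True if the pattern is found anywhere in the sentence. */
    static boolean found(String sentence) {
        return ATTEMPT.matcher(sentence).find();
    }

    public static void main(String[] args) {
        // "but" is present, so .*(?=but) can stop right in front of it.
        System.out.println(found("something with chicken but without onions")); // true
        // No "but" anywhere, so the lookahead can never succeed.
        System.out.println(found("something with chicken")); // false
    }
}
```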

Is there a simple solution for doing this in a regex?

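P.S. By a "simple" solution I mean one where I do not have to spell out every possible combination of these keywords in a sentence, nor order them by the number of keywords used in each combination.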

Best answer

It can probably be boiled down to the following construct.

https://regex101.com/r/RHfGnb/1

Expanded

 (?m)
 ^ I [ ] want [ ] to [ ] cook [ ] something
 (?= [ ] | $ )
 (?<Order>                      # (1 start)
      (?:
           (?<with>                      # (2 start)
                \b
                (?: but [ ] )?
                with [ ]
                (?:
                     (?!
                          (?:
                               \b
                               (?: but [ ] )?
                               with
                               (?: in | out )?
                               \b
                          )
                     )
                     .
                )*
           )                             # (2 end)
        |  (?<without>                   # (3 start)
                \b
                (?: but [ ] )?
                without [ ]
                (?:
                     (?!
                          (?:
                               \b
                               (?: but [ ] )?
                               with
                               (?: in | out )?
                               \b
                          )
                     )
                     .
                )*
           )                             # (3 end)
        |  (?<time>                      # (4 start)
                \b within [ ]
                (?<duration> .+ )             # (5)
                [ ] minutes [ ]? 
           )                             # (4 end)
        |  (?<unknown>                   # (6 start)
                (?:
                     (?!
                          (?:
                               \b
                               (?: but [ ] )?
                               with
                               (?: in | out )?
                               \b
                          )
                     )
                     .
                )+
           )                             # (6 end)
      )*
 )                             # (1 end)
 $
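The core idea in the expanded regex is that the wildcard consumes a character only when no stop word (with, without, within, each optionally preceded by "but ") starts at that position. Below is a minimal Java sketch of that idea, not the answer's exact regex. It makes several assumptions: the fixed prefix is checked separately, the clauses are collected with repeated find() calls instead of one anchored match, the duration uses a lazy .+?, and there is no "unknown" group. The names CookingRequestParser, PREFIX, STOP, BODY, CLAUSE and parse are made up for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CookingRequestParser {
    static final String PREFIX = "I want to cook something";

    // Stop words: with, without, within, each optionally preceded by "but ".
    static final String STOP = "\\b(?:but )?with(?:in|out)?\\b";

    // Tempered wildcard: consume one character only if no stop word starts here.
    static final String BODY = "(?:(?!" + STOP + ").)*";

    static final Pattern CLAUSE = Pattern.compile(
          "\\b(?:but )?with (?<include>" + BODY + ")"
        + "|\\b(?:but )?without (?<exclude>" + BODY + ")"
        + "|\\bwithin (?<duration>.+?) minutes\\b");

    /** Returns the captured parts, or null if the sentence lacks the fixed prefix. */
    static Map<String, String> parse(String sentence) {
        if (!sentence.startsWith(PREFIX)) {
            return null;
        }
        String rest = sentence.substring(PREFIX.length());
        // Like the (?= [ ] | $ ) lookahead: the prefix must end the sentence or be followed by a space.
        if (!rest.isEmpty() && rest.charAt(0) != ' ') {
            return null;
        }
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher m = CLAUSE.matcher(rest);
        while (m.find()) {
            for (String name : new String[] {"include", "exclude", "duration"}) {
                String value = m.group(name);
                // A later clause of the same kind overwrites an earlier one.
                if (value != null) {
                    parts.put(name, value.trim());
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "I want to cook something with candy but without nuts within 30 minutes"));
        // {include=candy, exclude=nuts, duration=30}
    }
}
```

Scanning with find() sidesteps the fact that a named group inside a repeated (...)* keeps only the text of its last iteration. The trim() call is needed because the tempered wildcard also consumes the space in front of the next stop word.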

Regarding "java - Wildcard in a regex should be greedy only up to a stop word", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58846410/
