regex - How to match n words in a regular expression?

After a lot of head-scratching and Googling, I still can't work this out.

I have this sample string:

test = "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."

I'm trying to determine whether the "true" sales are higher or lower. Using R and the "stringr" library, I tried the following:

test = "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."
positive.regex = "(sales).*?[0-9]{1,3}% higher"
negative.regex = "(sales).*?[0-9]{1,3}% lower"

This produces the following results:

str_extract(test,positive.regex)
[1] "sales are expected to be between 50% and 60% higher"
str_extract(test,negative.regex)
[1] "sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower"

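I'm trying to find a way to limit the number of words matched between (sales) and '% higher' or '% lower', so that the negative regex does not match. In other words, I know I need to replace '.*?' with something that matches whole words rather than characters, and to limit the number of those words to 3-5. How can I do that?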

Best answer

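You have to make sure that the words higher or lower cannot occur in the part of the regex currently covered by .*?. One way to do this is with a negative lookahead assertion: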

positive.regex = "sales(?:(?!higher|lower).)*[0-9]{1,3}% higher"
negative.regex = "sales(?:(?!higher|lower).)*[0-9]{1,3}% lower"

Explanation:

(?:      # Match...
 (?!     #  (unless we're at the start of the word
  higher #   "higher"
 |       #   or
  lower  #   "lower"
 )       #  )
 .       # any character
)*       # Repeat any number of times.
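To see how this behaves on the sample string, here is a minimal sketch in base R. It uses regexpr() with perl = TRUE instead of stringr so that it runs without extra packages. The helper name first_match and the max.words bound of 8 are assumptions made for this illustration and are not part of the original answer. Note that the lookahead version of the negative pattern still finds a match, but only in the later "fake sales" sentence; it no longer runs across the word higher. The second pair of patterns sketches the word-count limit the question asked about.

```r
# Sample text from the question.
test <- "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."

# Hypothetical helper: return the first match, or character(0) if there is none.
first_match <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

# Patterns from the answer: the lookahead stops the match from crossing
# the words "higher" or "lower".
positive.regex <- "sales(?:(?!higher|lower).)*[0-9]{1,3}% higher"
negative.regex <- "sales(?:(?!higher|lower).)*[0-9]{1,3}% lower"

pos <- first_match(positive.regex, test)
neg <- first_match(negative.regex, test)
print(pos)  # "sales are expected to be between 50% and 60% higher"
print(neg)  # "sales are expected to be in the region of between 25% and 35% lower"

# Alternative sketch: allow at most max.words whitespace-separated words
# between "sales" and the percentage. The bound of 8 is an assumption chosen
# for this sample (the first sentence has 7 words in between).
max.words <- 8
positive.words <- paste0("sales(?:\\s+\\S+){0,", max.words, "}\\s+[0-9]{1,3}% higher")
negative.words <- paste0("sales(?:\\s+\\S+){0,", max.words, "}\\s+[0-9]{1,3}% lower")

pos.words <- first_match(positive.words, test)
neg.words <- first_match(negative.words, test)
print(pos.words)          # "sales are expected to be between 50% and 60% higher"
print(length(neg.words))  # 0, because "fake sales" is 11 words away from "35% lower"
```

The word-count version only rejects the "fake sales" sentence here because that sentence happens to be longer than the bound, so the right value for max.words depends on the text being processed.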

For more on "regex - How to match n words in a regular expression?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/9158402/
