regex - How do I match n words in a regular expression?

Tags: regex r

After a lot of head-scratching and a lot of Googling, I still can't seem to solve this.

I have this sample string:

test = "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."

I am trying to determine whether the "true" sales are higher or lower. Using R and the 'stringr' library, I tried the following:

library(stringr)
test = "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."
positive.regex = "(sales).*?[0-9]{1,3}% higher"
negative.regex = "(sales).*?[0-9]{1,3}% lower"

This produces the following results:

str_extract(test, positive.regex)
[1] "sales are expected to be between 50% and 60% higher"
str_extract(test, negative.regex)
[1] "sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower"

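I am trying to find a way to limit the number of words matched between (sales) and '% higher' or '% lower', so that the negative regex does not match. That is, I know I need to replace '.*?' with something that matches whole words rather than characters, and limit the number of those words to 3-5. How can I do that?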

Best answer

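You have to make sure that the words higher and lower cannot occur in the stretch of text that the .*? part of your regex currently matches. One way to do this is with a negative lookahead assertion: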

positive.regex = "sales(?:(?!higher|lower).)*[0-9]{1,3}% higher"
negative.regex = "sales(?:(?!higher|lower).)*[0-9]{1,3}% lower"
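As a quick check, here is a minimal sketch that runs both patterns against the sample string from the question. It uses base R (regexpr and regmatches with perl = TRUE) rather than stringr so that it runs without extra packages; extract_first is a hypothetical helper written for this illustration, intended to mimic str_extract by returning NA when nothing matches.

```r
test <- "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."

positive.regex <- "sales(?:(?!higher|lower).)*[0-9]{1,3}% higher"
negative.regex <- "sales(?:(?!higher|lower).)*[0-9]{1,3}% lower"

# Hypothetical helper (not part of any library): return the first match,
# or NA if the pattern does not match. perl = TRUE is needed for lookaheads.
extract_first <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  if (m[1] == -1) NA_character_ else regmatches(text, m)
}

pos_match <- extract_first(positive.regex, test)
neg_match <- extract_first(negative.regex, test)

print(pos_match)
# [1] "sales are expected to be between 50% and 60% higher"
print(neg_match)
# [1] "sales are expected to be in the region of between 25% and 35% lower"
```

Note that the negative pattern no longer runs across "higher", but it still finds the later "fake sales" sentence, because both patterns start at any occurrence of "sales".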

Explanation:

(?:      # Match...
 (?!     #  (unless we're at the start of the word
  higher #   "higher"
 |       #   or
  lower  #   "lower"
 )       #  )
 .       # any character
)*       # Repeat any number of times.
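If a literal cap on the number of words is still wanted, as the question asks, one possible sketch is to replace the lookahead group with a bounded repetition of "whitespace followed by a non-whitespace token". This is an alternative to the lookahead approach above, not part of it. The names word_limited_regex and extract_first are hypothetical helpers, and the limits 8 and 12 are arbitrary values chosen for the sample string.

```r
test <- "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."

# Hypothetical helper: allow at most `max_words` whitespace-separated tokens
# between "sales" and the percentage that precedes `direction`.
word_limited_regex <- function(direction, max_words) {
  paste0("sales(?:\\s+\\S+){0,", max_words, "}?\\s+[0-9]{1,3}% ", direction)
}

# Hypothetical helper: first match, or NA when there is none.
extract_first <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  if (m[1] == -1) NA_character_ else regmatches(text, m)
}

higher_8 <- extract_first(word_limited_regex("higher", 8), test)
lower_8  <- extract_first(word_limited_regex("lower", 8), test)
lower_12 <- extract_first(word_limited_regex("lower", 12), test)

print(higher_8)
# [1] "sales are expected to be between 50% and 60% higher"
print(lower_8)
# [1] NA
print(lower_12)
# [1] "sales are expected to be in the region of between 25% and 35% lower"
```

In the sample string, seven tokens separate "sales" from "60%" and eleven separate the second "sales" from "35%", so a limit of 3-5 words would match neither sentence. The limit has to be tuned to the text, which is one reason the lookahead version may be the more robust choice.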

Regarding "regex - How do I match n words in a regular expression?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9158402/
