After a lot of head-scratching and Googling, I still can't solve this problem.
I have this example string:
test = "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."
I'm trying to determine whether the "true" sales are higher or lower. Using R and the "stringr" library, I tried the following:
test = "true sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower."
positive.regex = "(sales).*?[0-9]{1,3}% higher"
negative.regex = "(sales).*?[0-9]{1,3}% lower"
This produces the following:
> str_extract(test, positive.regex)
[1] "sales are expected to be between 50% and 60% higher"
> str_extract(test, negative.regex)
[1] "sales are expected to be between 50% and 60% higher than those reported for the previous corresponding year. the main reason is blah blah. the fake sales are expected to be in the region of between 25% and 35% lower"
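The lazy `.*?` does not prevent this, because the regex engine returns the leftmost match: it starts at the first "sales" and keeps extending `.*?` one character at a time until "% lower" can match, even across sentences. A minimal reproduction, assuming base R's `regexpr()`/`regmatches()` with `perl = TRUE` behave like `str_extract()` for these patterns (`first_match` is a hypothetical helper, not part of stringr):

```r
# Example string from the question
test <- paste(
  "true sales are expected to be between 50% and 60% higher than those",
  "reported for the previous corresponding year. the main reason is blah blah.",
  "the fake sales are expected to be in the region of between 25% and 35% lower."
)

# Hypothetical helper: base-R stand-in for stringr::str_extract() (first match only)
first_match <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

positive.regex <- "(sales).*?[0-9]{1,3}% higher"
negative.regex <- "(sales).*?[0-9]{1,3}% lower"

first_match(test, positive.regex)
# Matching starts at the first (leftmost) "sales"; the lazy .*? keeps growing
# until "% lower" fits, so the match spans from "true sales" to "35% lower".
first_match(test, negative.regex)
```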
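I'm trying to find a way to limit the number of words matched between (sales) and '% higher' or '% lower', so that the negative regex will not match. That is, I know I need to replace '.*?' with something that matches whole words rather than characters, and limit the number of those words to 3-5. How can I do that?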
Best answer
You need to make sure that the words higher or lower cannot occur inside the .*? part of the regex. One way to do that is with a negative lookahead assertion:
positive.regex = "sales(?:(?!higher|lower).)*[0-9]{1,3}% higher"
negative.regex = "sales(?:(?!higher|lower).)*[0-9]{1,3}% lower"
Explanation:
(?: # Match...
(?! # (unless we're at the start of the word
higher # "higher"
| # or
lower # "lower"
) # )
. # any character
)* # Repeat any number of times.
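A minimal sketch of the patterns above in action, again assuming base R with `perl = TRUE` (stringr's `str_extract()` uses the ICU engine, which also supports negative lookahead, so the same patterns should work there). `first_match` and `true.negative.regex` are hypothetical names added for illustration; anchoring on "true sales" is an extra step beyond the answer, showing how to ask only about the "true" figure.

```r
test <- paste(
  "true sales are expected to be between 50% and 60% higher than those",
  "reported for the previous corresponding year. the main reason is blah blah.",
  "the fake sales are expected to be in the region of between 25% and 35% lower."
)

# Hypothetical helper: base-R stand-in for stringr::str_extract() (first match only)
first_match <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# The repeated group may consume any character except the start of "higher"/"lower"
positive.regex <- "sales(?:(?!higher|lower).)*[0-9]{1,3}% higher"
negative.regex <- "sales(?:(?!higher|lower).)*[0-9]{1,3}% lower"

first_match(test, positive.regex)
# [1] "sales are expected to be between 50% and 60% higher"

# The first "sales" can no longer reach "lower", so the match now starts at "fake sales"
first_match(test, negative.regex)
# [1] "sales are expected to be in the region of between 25% and 35% lower"

# Illustrative extra: anchor on "true sales" so only that sentence is tested
true.negative.regex <- "true sales(?:(?!higher|lower).)*[0-9]{1,3}% lower"
first_match(test, true.negative.regex)
# character(0), i.e. no match
```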
The original question, "regex - How to match n words in a regular expression?", is on Stack Overflow: https://stackoverflow.com/questions/9158402/