I know nearly every regex question must have been asked and answered already, but here I am:
I want a regex that matches:
"alcohol abuse"
"etoh abuse"
"alcohol dependence"
"etoh dependence"
but does not match:
"denies alcohol dependence"
"denies smoking and etoh dependence"
"denies [anything at all] and etoh abuse"
A negative lookbehind is the obvious choice, but how do I avoid matching the last two examples?
So far, my regex looks like this:
re.compile(r"(?<!denies\s)(alcohol|etoh)\s*(abuse|dependence)")
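I can't put a greedy quantifier inside the negative lookbehind, because a lookbehind only works when the pattern it checks has a fixed width.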
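To make the limitation concrete, here is a minimal sketch using only the standard `re` module. The variable-width pattern `(?<!denies.*)` is an illustration of what one might try, not something from the original post.

```python
import re

# Fixed-width lookbehind: "denies" followed by exactly one whitespace character.
fixed = re.compile(r"(?<!denies\s)(alcohol|etoh)\s*(abuse|dependence)")

# The negation sits directly before the term, so the lookbehind blocks the match.
print(bool(fixed.search("denies alcohol dependence")))
# The negation is further back, so the lookbehind does not see it and the term matches.
print(bool(fixed.search("denies smoking and etoh dependence")))

# Variable-width lookbehind: the standard re module rejects it at compile time.
try:
    re.compile(r"(?<!denies.*)(alcohol|etoh)\s*(abuse|dependence)")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)
```

The first call shows the lookbehind doing its job, the second shows the false positive from the question, and the `try` block shows why simply adding `.*` inside the lookbehind is not an option with `re`.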
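I would prefer to do this in one step, because the regex is passed to a function that accepts a single regex as its argument.
Thanks for any pointers.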
Best answer
If you cannot install any modules, you can rewrite the expression and check whether group 1 is empty:
import re

rx = re.compile(r"(denies)?.*?(alcohol|etoh)\s*(abuse|dependence)")

sentences = ["alcohol abuse", "etoh abuse", "alcohol dependence", "etoh dependence",
             "denies alcohol dependence", "denies smoking and etoh dependence",
             "denies [anything at all] and etoh abuse"]

def filterSentences(sent):
    # Report the sentence only if the term matched and "denies" was not captured.
    m = rx.search(sent)
    if m and m.group(1) is None:
        print("Yup: " + sent)

for sent in sentences:
    filterSentences(sent)
This prints:
Yup: alcohol abuse
Yup: etoh abuse
Yup: alcohol dependence
Yup: etoh dependence
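One thing that may be worth checking: the optional `(denies)?` group can only capture when "denies" sits exactly where the search starts, so a sentence with text before the negation could slip through. Since the question asks for a single regex with no post-check, here is a hedged sketch of a different technique, a lookahead-guarded ("tempered") prefix anchored at the start of the string. The pattern name `one_step` and the two extra sample sentences are assumptions for illustration, not part of the answer.

```python
import re

# Hypothetical single-pattern variant (not the answer's regex):
# starting at the beginning of the string, consume one character at a time,
# but only while the word "denies" does not begin at the current position.
# Once "denies" is reached, the prefix cannot extend past it, so any term
# that appears after the negation is never matched.
one_step = re.compile(
    r"^(?:(?!\bdenies\b).)*?\b(?:alcohol|etoh)\s*(?:abuse|dependence)"
)

samples = [
    "alcohol abuse",
    "etoh abuse",
    "alcohol dependence",
    "etoh dependence",
    "denies alcohol dependence",
    "denies smoking and etoh dependence",
    "denies [anything at all] and etoh abuse",
    "patient denies etoh abuse",       # negation not at the start (assumed example)
    "history of etoh dependence",      # leading text without negation (assumed example)
]

matched = [s for s in samples if one_step.search(s)]
for s in matched:
    print("Yup: " + s)
```

Note that the match now spans from the start of the string through the term, so this only suits callers that test whether the regex matches rather than extracting the matched text.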
If you have more than one negation (e.g. does not like...), just change the first capturing group.
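As a rough sketch of that change, the first group can be built as an alternation of negation phrases. The phrases in `NEGATIONS` and the helper name `is_positive_mention` are made up for illustration; substitute whatever negations your data actually contains.

```python
import re

# Illustrative negation phrases (assumptions, not taken from the answer).
NEGATIONS = ["denies", "no history of", "negative for"]

# Build the first capturing group as an alternation of the escaped phrases.
negation_group = "|".join(re.escape(n) for n in NEGATIONS)
rx_multi = re.compile(
    r"(" + negation_group + r")?.*?(alcohol|etoh)\s*(abuse|dependence)"
)

def is_positive_mention(sent):
    """Return True when the term matches and no negation was captured in group 1."""
    m = rx_multi.search(sent)
    return bool(m) and m.group(1) is None

for sent in ["etoh dependence", "no history of alcohol abuse", "negative for etoh dependence"]:
    print(sent, "->", is_positive_mention(sent))
```

Using `re.escape` keeps any special characters in a phrase from being interpreted as regex syntax, which matters if the list is later extended with phrases containing punctuation.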
Regarding python - regex to ignore everything between a negative lookbehind and the match, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54409587/