Python Regex Negative Lookbehind Match without Fixed Width

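I want to find a better way to get this result. I use a regex pattern to match all text of the form (DD+ some text DDDD some other text) if, and only if, it is not preceded by any of several lookbehind terms, and those terms do not have a fixed width. How can I include these terms in my regex pattern?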

import re
import pandas as pd

aa = pd.DataFrame({"test": ["45 python 00222 sometext",
                            "python white 45 regex 00 222 somewhere",
                            "php noise 45 python 65000 sm",
                            "otherword 45 python 50000 sm"]})
pattern = re.compile(r"(((\d+)\s?([^\W\d_]+)\s?)?(\d{2}\s?\d{3})\s?(\w.+))")
# Step 1: keep the whole match (group 0), or None when the text does not match
aa["result"] = aa["test"].apply(lambda x: pattern.search(x)[0] if pattern.search(x) else None)
lookbehind = ['python', 'php']
# Step 2: blank the result if there is no match or a lookbehind word occurs outside the match
aa.apply(lambda x: "" if x["result"] is None or any(look in x["test"].replace(x["result"], "") for look in lookbehind) else x["result"], axis=1)

The output is what I expect:

0    45 python 00222 sometext
1                            
2                            
3          45 python 50000 sm
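For context, Python's built-in re module only accepts a lookbehind whose width is fixed, which is why the terms above cannot simply be placed in a (?<!...) group. A minimal sketch showing which lookbehinds re accepts; the candidate patterns and the compiles helper are illustrative, not from the question:

```python
import re

# Python's built-in re requires every lookbehind to have one fixed width.
candidates = [
    r"(?<!python.*)\d{2}\s?\d{3}",    # ".*" makes the width unbounded: rejected
    r"(?<!php|python)\d{2}\s?\d{3}",  # alternatives of width 3 and 6: rejected
    r"(?<!python )\d{2}\s?\d{3}",     # fixed width of 7 characters: accepted
]


def compiles(pattern: str) -> bool:
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        print(f"{pattern!r}: {exc}")
        return False


results = [compiles(p) for p in candidates]
print(results)  # [False, False, True]
```

Only the last form compiles, and it cannot express "python anywhere earlier in the string", which is what the question needs.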

Best answer

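You can use a trick: capture php or python before the expected match, and if that group is not empty (that is, if it matched), discard the current match; otherwise the match is valid.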


pattern = re.compile(r"(?:(php|python).*?)?((?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+)")

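The pattern contains 2 capturing groups:

(?:(php|python).*?)? - the final ? makes the group optional; it matches and captures php or python into Group 1, then matches any 0+ characters other than line breaks, as few as possible
((?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+) - this is Group 2, essentially your pattern without the redundant groups.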
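To see what each group holds, here is a small sketch that runs the answer's pattern over the four sample strings with plain re (no pandas) and records Group 1 and Group 2 for each; the samples and groups names are illustrative:

```python
import re

pattern = re.compile(r"(?:(php|python).*?)?((?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+)")

samples = [
    "45 python 00222 sometext",
    "python white 45 regex 00 222 somewhere",
    "php noise 45 python 65000 sm",
    "otherword 45 python 50000 sm",
]

# Group 1 holds the blocking word (or None); Group 2 holds the candidate text
groups = []
for s in samples:
    m = pattern.search(s)
    groups.append((m.group(1), m.group(2)) if m else None)
    print(repr(s), "->", groups[-1])
```

Because re.search scans left to right, a blocking word at the start of the string is consumed by Group 1 before Group 2 gets a chance to match on its own, so the second and third samples report "python" and "php" in Group 1, while the first and fourth report None.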

If Group 1 matched, we return an empty result; otherwise we return the Group 2 value:

def callback(v):
    m = pattern.search(v)
    # Group 1 is set only when php/python occurs before the candidate match
    if m and not m.group(1):
        return m.group(2)
    return ""

aa["test"].apply(callback)

Result:

0    45 python 00222 sometext
1                            
2                            
3          45 python 50000 sm
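The answer hard-codes php|python, while the question keeps the terms in a lookbehind list. A sketch of building the same capture-and-discard pattern from that list; make_filter is a hypothetical helper, and re.escape is used on the assumption that the terms should be matched literally:

```python
import re
from typing import Callable, Iterable


def make_filter(blockers: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical helper: build the capture-and-discard matcher from a word list."""
    # re.escape keeps regex metacharacters in the words literal
    alternation = "|".join(re.escape(w) for w in blockers)
    # Same pattern as the answer, with php|python replaced by the generated alternation
    pattern = re.compile(
        rf"(?:({alternation}).*?)?((?:\d+\s?[^\W\d_]+\s?)?\d{{2}}\s?\d{{3}}\s?\w.+)"
    )

    def callback(text: str) -> str:
        m = pattern.search(text)
        # Group 1 is None unless a blocking word occurred before the candidate
        if m and m.group(1) is None:
            return m.group(2)
        return ""

    return callback


lookbehind = ['python', 'php']
check = make_filter(lookbehind)
samples = [
    "45 python 00222 sometext",
    "python white 45 regex 00 222 somewhere",
    "php noise 45 python 65000 sm",
    "otherword 45 python 50000 sm",
]
results = [check(s) for s in samples]
print(results)  # ['45 python 00222 sometext', '', '', '45 python 50000 sm']
```

With the DataFrame from the question, the returned callable can be passed straight to aa["test"].apply(make_filter(lookbehind)).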

A similar question, "Python Regex Negative Lookbehind Match without Fixed Width", can be found on Stack Overflow: https://stackoverflow.com/questions/52724641/
