python - Lookahead captures unwanted characters

Tags: python regex

I am trying to capture the alert names from a firewall log. Each log line has the following format:

datetime alertname severity_level username endpoint_name domain

The regex I am currently using works for every log line except the third one. Any ideas on how to fix it?

import re

regex = []

text = """2023-05-27 / 23:06:31 Computer account added/changed/deleted. medium ANONYMOUS LOGON PC-CR5$ SRVDC2 ACME 1
2023-05-27 / 23:28:08 Computer account added/changed/deleted. medium ANONYMOUS LOGON SRVXAP02$ SRVDC2 ACME 1
2023-05-28 / 02:24:29 User account locked out multiple login errors high SRVDC2$ john.smith.admin SRVDC2 \\\\NECBROWSER 1
2023-05-28 / 05:01:48 Computer account added/changed/deleted. medium ANONYMOUS LOGON SRVNPS01$ SRVDC1 ACME 1
2023-05-28 / 06:38:57 Computer account added/changed/deleted. medium ANONYMOUS LOGON VD-OPERATOR1$ SRVDC1 ACME 1"""

pattern = r'(?:(?<=\d{2}:\d{2}:\d{2}))(.*)(?=\.)|(?=medium )|(?=high )|(?=low )|(?=critical )'
regex.append(re.findall(pattern,text,re.MULTILINE))
print(regex)

Current output

[[' Computer account added/changed/deleted', '', ' Computer account added/changed/deleted', '', ' User account locked out multiple login errors high SRVDC2$ john.smith', ' Computer account added/changed/deleted', '', ' Computer account added/changed/deleted', '']]

Expected output

[[' Computer account added/changed/deleted', '', ' Computer account added/changed/deleted', '', ' User account locked out multiple login errors', ' Computer account added/changed/deleted', '', ' Computer account added/changed/deleted', '']]

Best answer

You can use

\d{2}:\d{2}:\d{2}\s+
(.*?)
\s(?:medium|high|low|critical)

See a demo on regex101.com.

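In contrast to your original attempt, this one uses a non-capturing group instead of lookarounds (lookbehinds are "expensive"!) and a lazy quantifier that stops at the first severity keyword. Your greedy (.*) followed by (?=\.) backtracks to the last period on the line, which in the third log is inside john.smith.admin. Just use the first capturing group of the new pattern.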
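To make the difference concrete, here is a small sketch that runs both approaches against the third log line from the question. The variable names `line`, `greedy` and `lazy` are only illustrative, and the greedy pattern is just the first alternative of the original regex.

```python
import re

# Third log line from the question: there is no period after the alert name,
# but there are periods later on, inside "john.smith.admin".
line = r"2023-05-28 / 02:24:29 User account locked out multiple login errors high SRVDC2$ john.smith.admin SRVDC2 \\NECBROWSER 1"

# Greedy: .* runs to the end of the line, then backtracks to the LAST period.
greedy = re.search(r"(?<=\d{2}:\d{2}:\d{2})(.*)(?=\.)", line)

# Lazy: .*? grows one character at a time and stops at the FIRST severity keyword.
lazy = re.search(r"\d{2}:\d{2}:\d{2}\s+(.*?)\s(?:medium|high|low|critical)", line)

print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
```

The greedy capture runs on through `john.smith`, while the lazy capture ends right after `login errors`.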

In Python this could be

import re

text = """2023-05-27 / 23:06:31 Computer account added/changed/deleted. medium ANONYMOUS LOGON PC-CR5$ SRVDC2 ACME 1
2023-05-27 / 23:28:08 Computer account added/changed/deleted. medium ANONYMOUS LOGON SRVXAP02$ SRVDC2 ACME 1
2023-05-28 / 02:24:29 User account locked out multiple login errors high SRVDC2$ john.smith.admin SRVDC2 \\\\NECBROWSER 1
2023-05-28 / 05:01:48 Computer account added/changed/deleted. medium ANONYMOUS LOGON SRVNPS01$ SRVDC1 ACME 1
2023-05-28 / 06:38:57 Computer account added/changed/deleted. medium ANONYMOUS LOGON VD-OPERATOR1$ SRVDC1 ACME 1"""

pattern = re.compile(r'''
    \d{2}:\d{2}:\d{2}\s+              # time stamp, then whitespace
    (.*?)                             # group 1: the alert name (lazy)
    \s(?:medium|high|low|critical)    # stop at the first severity level
''', re.VERBOSE)

messages = [match.group(1) for match in pattern.finditer(text)]
print(messages)

which would yield

['Computer account added/changed/deleted.', 'Computer account added/changed/deleted.', 'User account locked out multiple login errors', 'Computer account added/changed/deleted.', 'Computer account added/changed/deleted.']

See a demo on ideone.com.
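If the severity level is needed alongside the alert name, one possible variation uses named groups. This is only a sketch: `LOG_LINE` and `parse_alerts` are hypothetical names, and it assumes that every line starts with the date and that the severity is always one of the four words from the question's pattern. The trailing `\b` is an added guard so that a word such as "lower" would not be mistaken for "low".

```python
import re

# Hypothetical pattern, not part of the original answer.
# Assumes each line starts with "YYYY-MM-DD / HH:MM:SS" and that the
# severity is one of: medium, high, low, critical.
LOG_LINE = re.compile(r'''
    ^(?P<date>\d{4}-\d{2}-\d{2})\s/\s          # date, then the " / " separator
    (?P<time>\d{2}:\d{2}:\d{2})\s+             # time stamp
    (?P<alert>.*?)\s+                          # alert name, matched lazily
    (?P<severity>medium|high|low|critical)\b   # first severity keyword
''', re.VERBOSE | re.MULTILINE)


def parse_alerts(text):
    """Return a list of (alert name, severity) tuples, one per matching line."""
    return [(m.group("alert"), m.group("severity")) for m in LOG_LINE.finditer(text)]


sample = """2023-05-27 / 23:06:31 Computer account added/changed/deleted. medium ANONYMOUS LOGON PC-CR5$ SRVDC2 ACME 1
2023-05-28 / 02:24:29 User account locked out multiple login errors high SRVDC2$ john.smith.admin SRVDC2 \\\\NECBROWSER 1"""

alerts = parse_alerts(sample)
print(alerts)
```

Anchoring with `^` under `re.MULTILINE` keeps each match on a single log line, so a malformed line is skipped rather than swallowed into the next one.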

This Q&A on "python - Lookahead captures unwanted characters" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/76350282/
