python - Reading a log with regular expressions

Tags: python regex logging

I have a .txt file containing request logs in the following format:

time_namelookup: 0,121668 
time_connect: 0,460643 
time_pretransfer: 0,460755 
time_redirect: 0,000000 
time_starttransfer: 0,811697 
time_total: 0,811813 
-------------
time_namelookup: 0,121665 
time_connect: 0,460643 
time_pretransfer: 0,460355 
time_redirect: 0,000000 
time_starttransfer: 0,813697 
time_total: 0,811853 
-------------
time_namelookup: 0,121558 
time_connect: 0,463243 
time_pretransfer: 0,460755 
time_redirect: 0,000000 
time_starttransfer: 0,911697 
time_total: 0,811413 

I want to create a list of values for each category, so I thought regular expressions might be relevant here.

import re

'''
In this example, I save only the 'time_namelookup' parameter.
The same logic is adapted for the other parameters.
'''

namelookup = []
with open('shaghai_if_config_test.txt', 'r') as fh:
     for line in fh.readlines():
         number_match = re.match('([+-]?([0-9]*[,])?[0-9]+)',line)
         namelookup_match = re.match('^time_namelookup:', line)
         if namelookup_match and number_match:
             num = number_match.group(0)
             namelookup.append(num)
             continue

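I find this logic quite convoluted, because I have to perform two regex matches. In addition, number_match does not match the line, whereas ^time_namelookup: ([+-]?([0-9]*[,])?[0-9]+ ) works fine.

I am looking for experienced advice on the procedure described. Any suggestions are appreciated.

Best answer

My guess is that you have designed a fine expression, which we might modify slightly to: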

(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)
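As for why number_match in the question never matches: re.match only tries the pattern at the very start of the string, and every log line starts with the letter t rather than a digit. A minimal sketch of the difference, using one sample line from the log (the variable names are illustrative, not from the question):

```python
import re

line = "time_namelookup: 0,121668 \n"
number_pattern = r"[+-]?(?:[0-9]*,)?[0-9]+"

# re.match only tries position 0, where the line has "t", so it fails.
anchored = re.match(number_pattern, line)

# re.search scans the whole line and finds the number.
scanned = re.search(number_pattern, line)

print(anchored)          # None
print(scanned.group(0))  # 0,121668
```

Putting the category name and the number into one expression, as above, avoids the problem because the match then legitimately starts at the beginning of the line.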

Test with re.findall:

import re

regex = r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)"

test_str = ("time_namelookup: 0,121668 \n"
    "time_connect: 0,460643 \n"
    "time_pretransfer: 0,460755 \n"
    "time_redirect: 0,000000 \n"
    "time_starttransfer: 0,811697 \n"
    "time_total: 0,811813 \n")

print(re.findall(regex, test_str))

Output

[('time_namelookup', '0,121668'), ('time_connect', '0,460643'), ('time_pretransfer', '0,460755'), ('time_redirect', '0,000000'), ('time_starttransfer', '0,811697'), ('time_total', '0,811813')]
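Since the question asks for one list of values per category, the (name, value) pairs returned by re.findall can be grouped with a dictionary. The following is a sketch rather than part of the original answer; the helper name parse_timings and the conversion of the decimal comma to a float are assumptions added for illustration:

```python
import re
from collections import defaultdict

PATTERN = re.compile(
    r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))"
    r"\s*:\s*([+-]?(?:\d*,)?\d+)"
)

def parse_timings(text):
    """Return {category: [float, ...]} for every timing line in text."""
    values = defaultdict(list)
    for name, number in PATTERN.findall(text):
        # The log uses a decimal comma; float() needs a decimal point.
        values[name].append(float(number.replace(",", ".")))
    return dict(values)

sample = (
    "time_namelookup: 0,121668 \n"
    "time_total: 0,811813 \n"
    "-------------\n"
    "time_namelookup: 0,121665 \n"
    "time_total: 0,811853 \n"
)

timings = parse_timings(sample)
print(timings["time_namelookup"])  # [0.121668, 0.121665]
print(timings["time_total"])       # [0.811813, 0.811853]
```

If the values should stay as strings exactly as they appear in the log, the float conversion can simply be dropped.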

Test with re.finditer:

import re

regex = r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)"

test_str = ("time_namelookup: 0,121668 \n"
    "time_connect: 0,460643 \n"
    "time_pretransfer: 0,460755 \n"
    "time_redirect: 0,000000 \n"
    "time_starttransfer: 0,811697 \n"
    "time_total: 0,811813 \n"
    "-------------\n"
    "time_namelookup: 0,121665 \n"
    "time_connect: 0,460643 \n"
    "time_pretransfer: 0,460355 \n"
    "time_redirect: 0,000000 \n"
    "time_starttransfer: 0,813697 \n"
    "time_total: 0,811853 \n"
    "-------------\n"
    "time_namelookup: 0,121558 \n"
    "time_connect: 0,463243 \n"
    "time_pretransfer: 0,460755 \n"
    "time_redirect: 0,000000 \n"
    "time_starttransfer: 0,911697 \n"
    "time_total: 0,811413 ")

# re.MULTILINE is not needed: the pattern has no ^ or $ anchors.
matches = re.finditer(regex, test_str)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num),
            match.group(group_num)))
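To apply the same idea to the log file from the question rather than to an in-memory string, the file can be scanned line by line with a single compiled pattern. This is a sketch under stated assumptions: the function name collect_from_file is hypothetical, the pattern is loosened to time_\w+ so that any time_* key is accepted, and a temporary file stands in for the real log so that the snippet runs on its own.

```python
import os
import re
import tempfile
from collections import defaultdict

# Assumption: every key of interest starts with "time_".
LINE_PATTERN = re.compile(r"(time_\w+)\s*:\s*([+-]?(?:\d*,)?\d+)")

def collect_from_file(path):
    """Collect the raw value strings of each time_* category in the file."""
    values = defaultdict(list)
    with open(path, "r") as fh:
        for line in fh:  # iterate lazily instead of calling readlines()
            match = LINE_PATTERN.match(line)
            if match:  # separator lines such as "-------------" are skipped
                values[match.group(1)].append(match.group(2))
    return dict(values)

# Temporary stand-in for the real log file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("time_namelookup: 0,121668 \ntime_connect: 0,460643 \n"
              "-------------\n"
              "time_namelookup: 0,121665 \ntime_connect: 0,460643 \n")

result = collect_from_file(tmp.name)
os.remove(tmp.name)
print(result["time_namelookup"])  # ['0,121668', '0,121665']
```

One match per line replaces the two separate re.match calls in the question, and every category is collected in a single pass over the file.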

The expression is explained in the top-right panel of this demo, if you wish to explore, simplify, or modify it.

RegEx Circuit

jex.im visualizes regular expressions.

[image: regex visualization]

Regarding python - Reading a log with regular expressions, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57024354/
