I have a .txt file containing request logs in the following format:
time_namelookup: 0,121668
time_connect: 0,460643
time_pretransfer: 0,460755
time_redirect: 0,000000
time_starttransfer: 0,811697
time_total: 0,811813
-------------
time_namelookup: 0,121665
time_connect: 0,460643
time_pretransfer: 0,460355
time_redirect: 0,000000
time_starttransfer: 0,813697
time_total: 0,811853
-------------
time_namelookup: 0,121558
time_connect: 0,463243
time_pretransfer: 0,460755
time_redirect: 0,000000
time_starttransfer: 0,911697
time_total: 0,811413
I want to create a list of values for each category, so I thought a regular expression might be relevant here.
import re
'''
In this example, I save only the 'time_namelookup' parameter.
The same logic can be adapted for other parameters.
'''
namelookup = []
with open('shaghai_if_config_test.txt', 'r') as fh:
    for line in fh.readlines():
        number_match = re.match('([+-]?([0-9]*[,])?[0-9]+)',line)
        namelookup_match = re.match('^time_namelookup:', line)
        if namelookup_match and number_match:
            num = number_match.group(0)
            namelookup.append(num)
            continue
I find this logic quite complicated because I have to perform two regex matches. Moreover, number_match does not match the line, whereas ^time_namelookup: ([+-]?([0-9]*[,])?[0-9]+ ) works fine.
I am looking for experienced advice on the procedure described. Any suggestions are appreciated.
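A likely explanation for the number_match behaviour described above is that re.match only attempts a match at the very beginning of the string, while re.search scans the whole line. The following is a minimal sketch of that difference, using one line from the log; the names line and number_pattern are introduced here only for illustration.

```python
import re

line = "time_namelookup: 0,121668"
number_pattern = r"[+-]?(?:[0-9]*,)?[0-9]+"

# re.match only tries position 0, where the line has "t" rather than a digit.
anchored = re.match(number_pattern, line)
print(anchored)  # None

# re.search scans forward and finds the first number anywhere in the line.
found = re.search(number_pattern, line)
print(found.group(0))  # 0,121668
```

This is also why the combined pattern starting with ^time_namelookup: works: the number is then matched right after the prefix instead of at the start of the line.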
Best answer
My guess is that you have designed a good expression, which we might modify slightly:
(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)
Test with re.findall:
import re
regex = r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)"
test_str = ("time_namelookup: 0,121668 \n"
            "time_connect: 0,460643 \n"
            "time_pretransfer: 0,460755 \n"
            "time_redirect: 0,000000 \n"
            "time_starttransfer: 0,811697 \n"
            "time_total: 0,811813 \n")
print(re.findall(regex, test_str))
Output
[('time_namelookup', '0,121668'), ('time_connect', '0,460643'), ('time_pretransfer', '0,460755'), ('time_redirect', '0,000000'), ('time_starttransfer', '0,811697'), ('time_total', '0,811813')]
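Since the goal in the question is one list of values per category, the (name, value) tuples returned by re.findall can be grouped by name. The sketch below is one possible way to do that, not the only one; the helper collect_timings and the shortened sample text are hypothetical, and converting the values to float assumes the comma is a decimal separator.

```python
import re
from collections import defaultdict

regex = r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)"


def collect_timings(text):
    """Group every matched value under its category name."""
    timings = defaultdict(list)
    for name, value in re.findall(regex, text):
        # Assumption: "0,121668" means 0.121668 (comma as decimal separator).
        timings[name].append(float(value.replace(",", ".")))
    return dict(timings)


# Shortened sample with two records; the separator lines simply do not match.
sample = (
    "time_namelookup: 0,121668\n"
    "time_total: 0,811813\n"
    "-------------\n"
    "time_namelookup: 0,121665\n"
    "time_total: 0,811853\n"
)

timings = collect_timings(sample)
print(timings["time_namelookup"])  # [0.121668, 0.121665]
```

For the log file from the question, the whole text could be read with fh.read() and passed to the same function, which avoids the line-by-line loop and the second regex match.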
Test with re.finditer:
import re
regex = r"(time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))\s*:\s*([+-]?(?:\d*,)?\d+)"
test_str = ("time_namelookup: 0,121668 \n"
            "time_connect: 0,460643 \n"
            "time_pretransfer: 0,460755 \n"
            "time_redirect: 0,000000 \n"
            "time_starttransfer: 0,811697 \n"
            "time_total: 0,811813 \n"
            "-------------\n"
            "time_namelookup: 0,121665 \n"
            "time_connect: 0,460643 \n"
            "time_pretransfer: 0,460355 \n"
            "time_redirect: 0,000000 \n"
            "time_starttransfer: 0,813697 \n"
            "time_total: 0,811853 \n"
            "-------------\n"
            "time_namelookup: 0,121558 \n"
            "time_connect: 0,463243 \n"
            "time_pretransfer: 0,460755 \n"
            "time_redirect: 0,000000 \n"
            "time_starttransfer: 0,911697 \n"
            "time_total: 0,811413 ")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    # Groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
The expression is explained in the top-right panel of this demo, if you wish to explore, simplify, or modify it.
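As an offline alternative, the same expression can be written with re.VERBOSE so that each part carries its own comment. This is only an annotated restatement of the pattern above; the name timing_regex and the two-line sample string are chosen here for illustration.

```python
import re

timing_regex = re.compile(r"""
    (time_(?:namelookup|connect|pretransfer|redirect|starttransfer|total))  # group 1: category name
    \s*:\s*            # colon with optional whitespace on either side
    (                  # group 2: the value
        [+-]?          #   optional sign
        (?:\d*,)?      #   optional integer part followed by a comma
        \d+            #   remaining digits
    )
""", re.VERBOSE)

pairs = timing_regex.findall("time_connect: 0,460643\ntime_total: 0,811813")
print(pairs)  # [('time_connect', '0,460643'), ('time_total', '0,811813')]
```

In verbose mode, whitespace inside the pattern is ignored and # starts a comment, so the layout does not change what the expression matches.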
RegEx Circuit
jex.im visualizes regular expressions.
Regarding "python - reading logs with regular expressions", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57024354/