The code below produces the following log lines (it prints every line that matches my pattern):
2019-01-30 08:34:46.463 -0800 INFO [626] - Program Ended: xxxx::xxxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] [linear_national] pid 626 user dexter after 00:26:15
2019-01-30 08:37:04.207 -0800 INFO [8749] - Program Ended: xxxxx::xxxxxx::xxxxxx::xxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxxxxxxx] [xxxxxxxxxxxxxxxx] pid 8749 user dexter after 00:01:33
2019-01-30 08:39:55.117 -0800 INFO [31467] - Program Ended: xxx::xxxxxx::xxxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] [linear_national] pid 31467 user dexter after 00:02:20
2019-01-30 08:45:09.752 -0800 INFO [32104] - Program Ended: RTK::xxxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxx] [xxxxxxxxxxxxxxxxxxxx] pid 32104 user dexter after 00:04:33
2019-01-30 08:46:20.511 -0800 INFO [15031] - Program Ended: xxx::xxxxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxx] [xxxxxxxxxxxxxxxx] pid 15031 user dexter after 00:00:45
2019-01-30 08:48:08.232 -0800 INFO [15224] - Program Ended: RTK::xxxxxxx::xxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxx] [linear_national] pid 15224 user dexter after 00:01:33
2019-01-30 08:50:52.541 -0800 INFO [15539] - Program Ended: RTK::xxxxxx::xxxxxxx::xxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxx.xxxxxxxxxxxxxxxxxx] [linear_national] pid 15539 user dexter after 00:02:16
2019-01-30 08:58:05.386 -0800 INFO [16168] - Program Ended: xxx:xxxxx::xxxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxx] [linear_national] pid 16168 user dexter after 00:06:29
2019-01-30 09:06:52.701 -0800 INFO [20374] - Program Ended: xxx::xxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxx] [xxxxxxxxx] pid 20374 user dexter after 00:08:16
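I want to get the timestamp value at the end of each line (the `after hh:mm:ss` duration) and then, still using the code below, compute the total and the average of those values. In other words, I need to do something extra with the pattern.
What pattern should I use to parse the files this way, and how do I compute the result over the whole file?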
import os
import re

src_dict = "/xxx/home/dexter/work/xxxxx/xxxxx/logs"
pattern = re.compile(r'(.*)for exports(.*)')
for passed_files in os.listdir(src_dict):
    files = os.path.join(src_dict, passed_files)
    # iterate the open file line by line; "with" closes it afterwards
    with open(files) as strng:
        for lines in strng:
            if re.search(pattern, lines):
                print(lines)
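Best answer
One option is to simply split each line and take its last part (which I assume contains the duration you are after).
Incorporated into the script you already have: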
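To make the split idea concrete, here is a minimal sketch on a single line copied from the log excerpt in the question. It assumes, as the answer does, that the duration is always the last whitespace-separated token of a matching line.

```python
# One sample line copied from the log excerpt in the question.
line = ("2019-01-30 08:37:04.207 -0800 INFO [8749] - Program Ended: "
        "xxxxx::xxxxxx::xxxxxx::xxxxxxxxxxxxxxxxxxxxxx for exports "
        "[xxxxxxxxxxxxxxxxxxxxxxxxxxxxx] [xxxxxxxxxxxxxxxx] "
        "pid 8749 user dexter after 00:01:33\n")

# Strip the trailing newline, split on whitespace, keep the last token.
duration_str = line.strip().split()[-1]

# Convert 'hh:mm:ss' into a number of seconds.
h, m, s = (int(x) for x in duration_str.split(':'))
seconds = h * 3600 + m * 60 + s
print(duration_str, seconds)  # 00:01:33 93
```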
import datetime
import os
import re

dir_path = "/xxx/home/dexter/work/xxxxx/xxxxx/logs"
pattern = re.compile(r'(.*)for exports(.*)')

n = 0
sum_seconds = 0
for filename in os.listdir(dir_path):
    with open(os.path.join(dir_path, filename)) as f:
        for line in f:
            if re.search(pattern, line):
                print(line.rstrip())
                # remove newline at end, split by spaces
                parts = line.strip().split()
                if len(parts) > 0:
                    n += 1
                    # this should be a string in the format 'hh:mm:ss'
                    duration_str = parts[-1]
                    print(duration_str)
                    h, m, s = duration_str.split(':')
                    sum_seconds += int(h) * 3600 + int(m) * 60 + int(s)

# report after all files have been read
print('Total (in seconds):', sum_seconds)
print('Total (formatted as hh:mm:ss):', str(datetime.timedelta(seconds=sum_seconds)))
if n > 0:
    avg_seconds = round(sum_seconds / n)
    print('Avg (in seconds):', avg_seconds)
    print('Avg (formatted as hh:mm:ss):', str(datetime.timedelta(seconds=avg_seconds)))
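Since the question also asks which pattern to use, an alternative is to let the regular expression capture the duration directly instead of splitting. This is a swapped-in technique, not the answer's method: the pattern `DURATION_RE` and the helper `parse_durations` are hypothetical names, and the sample lines are shortened copies of the log excerpt in the question.

```python
import datetime
import re

# Hypothetical pattern: match only "... for exports ..." lines and capture the
# trailing "after hh:mm:ss" duration as three groups.
DURATION_RE = re.compile(r'for exports .* after (\d+):(\d{2}):(\d{2})\s*$')

def parse_durations(lines):
    """Return the duration in seconds of every matching line (hypothetical helper)."""
    durations = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            h, m, s = (int(g) for g in match.groups())
            durations.append(h * 3600 + m * 60 + s)
    return durations

# Shortened sample lines; the last one does not match and is skipped.
sample = [
    "2019-01-30 08:34:46.463 -0800 INFO [626] - Program Ended: xxxx::xxxx for exports [xxxx] [linear_national] pid 626 user dexter after 00:26:15",
    "2019-01-30 08:37:04.207 -0800 INFO [8749] - Program Ended: xxxx::xxxx for exports [xxxx] [xxxx] pid 8749 user dexter after 00:01:33",
    "some unrelated line",
]

durations = parse_durations(sample)
total = sum(durations)
avg = round(total / len(durations)) if durations else 0
print(total, datetime.timedelta(seconds=total))  # 1668 0:27:48
print(avg, datetime.timedelta(seconds=avg))      # 834 0:13:54
```

Because iterating an open file yields its lines, the same helper could be given each file object `f` from the loop over `os.listdir(dir_path)`, accumulating the results across files before computing the total and average.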
You could also parse the duration strings into datetime.timedelta objects, but I don't think that is necessary for a case this simple.
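For completeness, the timedelta route mentioned above might look like the following sketch. The helper name `to_timedelta` is hypothetical, and the two durations are taken from the end of the first two log lines in the question.

```python
import datetime

def to_timedelta(duration_str):
    """Convert an 'hh:mm:ss' string such as '00:26:15' into a timedelta."""
    h, m, s = (int(x) for x in duration_str.split(':'))
    return datetime.timedelta(hours=h, minutes=m, seconds=s)

durations = [to_timedelta(d) for d in ['00:26:15', '00:01:33']]

# sum() needs a timedelta start value, because its default start is the int 0.
total = sum(durations, datetime.timedelta())
# Dividing a timedelta by an int gives another timedelta.
average = total / len(durations)
print(total)    # 0:27:48
print(average)  # 0:13:54
```

The trade-off is the one the answer notes: timedelta objects format themselves as `h:mm:ss` for free, while plain integer seconds are simpler when all you need is a sum and an average.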
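Regarding "python - Get all the timestamp values from the end of the lines of a file and do total and average operations on them", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54630391/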