python - Get all timestamp values from the end of each line in a file, then total and average them

I have log files whose lines follow the pattern below; the lines are printed by the code further down.

2019-01-30 08:34:46.463 -0800 INFO [626] - Program Ended: xxxx::xxxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] [linear_national] pid 626 user dexter after 00:26:15

2019-01-30 08:37:04.207 -0800 INFO [8749] - Program Ended: xxxxx::xxxxxx::xxxxxx::xxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxxxxxxx] [xxxxxxxxxxxxxxxx] pid 8749 user dexter after 00:01:33

2019-01-30 08:39:55.117 -0800 INFO [31467] - Program Ended: xxx::xxxxxx::xxxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] [linear_national] pid 31467 user dexter after 00:02:20

2019-01-30 08:45:09.752 -0800 INFO [32104] - Program Ended: RTK::xxxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxx] [xxxxxxxxxxxxxxxxxxxx] pid 32104 user dexter after 00:04:33

2019-01-30 08:46:20.511 -0800 INFO [15031] - Program Ended: xxx::xxxxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxx] [xxxxxxxxxxxxxxxx] pid 15031 user dexter after 00:00:45

2019-01-30 08:48:08.232 -0800 INFO [15224] - Program Ended: RTK::xxxxxxx::xxxxxx::xxxxxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxx] [linear_national] pid 15224 user dexter after 00:01:33

2019-01-30 08:50:52.541 -0800 INFO [15539] - Program Ended: RTK::xxxxxx::xxxxxxx::xxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxx.xxxxxxxxxxxxxxxxxx] [linear_national] pid 15539 user dexter after 00:02:16

2019-01-30 08:58:05.386 -0800 INFO [16168] - Program Ended: xxx:xxxxx::xxxxxxxxx::xxxxxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxxxxxxxxxxxxxx] [linear_national] pid 16168 user dexter after 00:06:29

2019-01-30 09:06:52.701 -0800 INFO [20374] - Program Ended: xxx::xxxxxx::xxxxxxxx::xxxxxxxxxxxxxxxxxxxxx for exports [xxxxxxxxxxxx] [xxxxxxxxx] pid 20374 user dexter after 00:08:16

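I want to take the timestamp value at the end of every line and then compute the sum and the average, building on the same code below. In other words, something more has to be done with the pattern.

What pattern should I use to parse the files this way, and how do I compute the totals over the whole file?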


import os
import re

src_dict = "/xxx/home/dexter/work/xxxxx/xxxxx/logs"
pattern = re.compile('(.*)for exports(.*)')

for passed_files in os.listdir(src_dict):
    files = os.path.join(src_dict, passed_files)
    with open(files) as strng:
        for lines in strng:
            if re.search(pattern, lines):
                print(lines)

Best answer

One option is simply to split each line and take its last part (which I believe contains the duration you are after).
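
To see the split idea on a single line first, here is a minimal sketch. The helper name duration_to_seconds is hypothetical, and the sample line is a shortened stand-in for the log lines in the question; it assumes the last whitespace-separated field is always in 'hh:mm:ss' form.

```python
# Shortened stand-in for one log line (the real program names are masked in the question)
line = ("2019-01-30 08:34:46.463 -0800 INFO [626] - Program Ended: xxxx "
        "for exports [xxxx] [linear_national] pid 626 user dexter after 00:26:15\n")


def duration_to_seconds(line):
    """Return the trailing 'hh:mm:ss' field of a log line as a number of seconds."""
    # strip() drops the newline; split() with no argument splits on any whitespace
    duration_str = line.strip().split()[-1]
    h, m, s = duration_str.split(":")
    return int(h) * 3600 + int(m) * 60 + int(s)


print(duration_to_seconds(line))  # 26 min 15 s -> 1575
```

A line that does not end in such a field would raise ValueError here, so the pattern check from the question is still worth keeping in front of it.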

Merged into the script you already have:

import datetime
import os
import re

dir_path = "/xxx/home/dexter/work/xxxxx/xxxxx/logs"
pattern = re.compile('(.*)for exports(.*)')
n = 0
sum_seconds = 0

for filename in os.listdir(dir_path):
    with open(os.path.join(dir_path, filename)) as f:
        for line in f:
            if re.search(pattern, line):
                print(line)

                # remove newline at end, split by spaces
                parts = line.strip().split()
                if len(parts) > 0:
                    n += 1

                    # this should be a string in the format 'hh:mm:ss'
                    duration_str = parts[-1]
                    print(duration_str)
                    h, m, s = duration_str.split(':')
                    sum_seconds += (int(h) * 3600 + int(m) * 60 + int(s))

print('Total (in seconds):', sum_seconds)
print('Total (formatted as hh:mm:ss):', str(datetime.timedelta(seconds=sum_seconds)))
if n > 0:
    # average over the matched lines, rounded to whole seconds
    avg_seconds = round(sum_seconds / n)
    print('Avg (in seconds):', avg_seconds)
    print('Avg (formatted as hh:mm:ss):', str(datetime.timedelta(seconds=avg_seconds)))


You could also parse the duration string and create datetime.timedelta objects, but I don't think that is necessary for this simple case.
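
For comparison, the timedelta variant could look roughly like the sketch below. It is not part of the original answer: the regular expression DURATION_RE and the helper parse_duration are assumptions made for illustration, and the sample lines are shortened stand-ins for real log lines.

```python
import datetime
import re

# Assumed pattern: an 'hh:mm:ss' duration following the word "after" at the end of a line
DURATION_RE = re.compile(r"after (\d+):(\d{2}):(\d{2})\s*$")


def parse_duration(line):
    """Return the trailing duration as a timedelta, or None if the line has none."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    h, m, s = (int(group) for group in match.groups())
    return datetime.timedelta(hours=h, minutes=m, seconds=s)


# Shortened stand-ins for log lines; only the trailing duration matters here
sample_lines = [
    "Program Ended: xxx for exports [a] [linear_national] pid 626 user dexter after 00:26:15",
    "Program Ended: xxx for exports [b] [c] pid 8749 user dexter after 00:01:33",
    "a line without a duration",
]

durations = [d for d in map(parse_duration, sample_lines) if d is not None]

# sum() needs a timedelta start value, because the default start of 0 is an int
total = sum(durations, datetime.timedelta())
average = total / len(durations) if durations else None

print("Total:", total)      # 0:27:48
print("Average:", average)  # 0:13:54
```

The gain over plain integer seconds is mostly that formatting comes for free; the arithmetic is the same, which is why the simpler version above is usually enough.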

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/54630391/
