python - Efficient way to parse a large journalctl file in Python to match keywords

When parsing the journalctl file, the keywords to look for are: error, boot, warning, traceback.

Whenever a keyword is found, I need to increment that keyword's counter and print the matching line.

So I tried the following: read the file and use a Counter object from the collections module together with re.findall to track the counts:

import re
from collections import Counter

# Keywords padded with spaces to avoid matching inside longer words
keywords = [" error ", " boot ", " warning ", " traceback "]

def journal_parser():
    for keyword in keywords:
        print(keyword)  # just for debugging
        # The whole file is re-read and lowercased once per keyword
        with open("/tmp/journal_slice.log") as f:
            word = re.findall(keyword, f.read().lower())
        count = dict(Counter(word))
        print(count)
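One side effect of padding the keywords with spaces is worth checking: a keyword at the very start or end of a line, or one followed by punctuation such as `error:`, is not matched. The minimal sketch below compares the padded pattern with a `\b` word-boundary pattern on a made-up sample string. The sample text and the helper name `count_keywords` are illustrative assumptions, not part of the original question.

```python
import re
from collections import Counter

# Hypothetical sample journal text, for illustration only
sample = "kernel: boot completed\nsystemd: error: unit failed\napp: an error occurred\n"

def count_keywords(pattern, text):
    # Count every match of the pattern (case-insensitive), lowercased
    return Counter(m.lower() for m in re.findall(pattern, text, re.I))

# The question's space-padded keywords joined into one alternation
padded = r"|".join([" error ", " boot ", " warning ", " traceback "])
# Word boundaries instead of spaces: also matches "error:" and words at line edges
bounded = r"\b(?:error|boot|warning|traceback)\b"

print(count_keywords(padded, sample))
print(count_keywords(bounded, sample))
```

On this sample the padded pattern misses the `error:` on the second line, while the word-boundary pattern counts it; the padded keys also keep their surrounding spaces.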

The solution above solves my problem, but I am looking for a more efficient approach.

Please advise.

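Best answer

Here is a more efficient approach, which scans the data once with a single combined pattern instead of once per keyword: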

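import re
from collections import Counter

keywords = [" error ", " boot ", " warning ", " traceback "]

def journal_parser():
    with open("/tmp/journal_slice.log") as f:
        data = f.read()
    # Single combined pattern, single pass; re.I ignores case
    words = re.findall(r"|".join(keywords), data, re.I)
    count = dict(Counter(w.lower() for w in words))  # merge "Error"/"error"
    print(count)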
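Both versions above load the whole file into memory with `read()`, and neither prints the matching lines, which the question also asks for. The following is a sketch of a streaming alternative, assuming the same log path and a UTF-8 encoded file: it iterates over the file line by line so memory use stays flat for large logs, uses one precompiled pattern with `\b` word boundaries instead of space padding, updates a `Counter`, and prints each matching line with its line number. The names `scan_lines`, `scan_journal`, `KEYWORDS` and `PATTERN` are hypothetical, not from the original answer.

```python
import re
from collections import Counter

KEYWORDS = ["error", "boot", "warning", "traceback"]
# One precompiled alternation; \b keeps whole words only, re.I ignores case
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.I)

def scan_lines(lines):
    """Count keyword hits and print each matching line; returns the Counter."""
    counts = Counter()
    for lineno, line in enumerate(lines, 1):
        hits = PATTERN.findall(line)
        if hits:
            counts.update(h.lower() for h in hits)  # "ERROR" and "error" share a key
            print(f"{lineno}: {line.rstrip()}")
    return counts

def scan_journal(path="/tmp/journal_slice.log"):
    # Iterating the file object reads one line at a time, so memory use does
    # not grow with file size; UTF-8 is an assumption, errors="replace"
    # tolerates stray undecodable bytes
    with open(path, encoding="utf-8", errors="replace") as f:
        return scan_lines(f)
```

Usage: `print(dict(scan_journal()))`. Matching lines are printed as they are found, and the returned counter has one lowercase key per keyword. Like the answer's `"|".join(keywords)`, a single alternation means each line is scanned once regardless of how many keywords there are.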

A similar question about efficiently parsing a large journalctl file in Python to match keywords can be found on Stack Overflow: https://stackoverflow.com/questions/49772156/
