java - Selectively parsing log files using Java

Tags: java regex bash shell text

I have to parse a whole bunch of log files in the following format.

SOME SQL STATEMENT/QUERY

DB20000I  The SQL command completed successfully.

SOME OTHER SQL STATEMENT/QUERY

DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.

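Edit 1: The first 3 lines (including the blank line) show an SQL statement that executed successfully, while the next three lines show a statement and the exception it caused. darioo's answer below suggests using grep instead of Java, which works very well for single-line SQL statements.

Edit 2: However, an SQL statement/query is not necessarily a single line. Sometimes it is a large CREATE PROCEDURE...END PROCEDURE block. Can this also be solved using only Unix commands?

Now I need to parse the whole log file, select every occurrence of a pair (SQL statement + error) and write them to a separate file.

Please tell me how to do this!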

Best answer

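My answer won't be Java-based, because this is a classic example of a problem that can be solved in a very, very simple way.

All you need is the grep tool. If you are on Windows, you can find it here.

Assuming your log is in the file log.txt, the solution to your problem is simply: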

grep -hE --before-context 1 "^DB2[0-9]+E" log.txt > filtered.txt

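Explanation:

  • -h - do not print file names
  • -E - interpret the pattern as an extended regular expression
  • --before-context 1 - also print the one line that comes before each matching error message (this works if every SQL query is on a single line)
  • ^DB2[0-9]+E - match lines that start with "DB2", followed by one or more digits and then "E", i.e. error codes such as DB21034E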

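The command above writes every line you need to a new file named filtered.txt (GNU grep also puts a "--" separator line between groups that are not adjacent).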
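For comparison with the original Java question, the grep command above can be sketched in Java roughly as follows. This is only an illustration under the same assumption as the grep version (each failing statement sits on the single line directly above its error message); the class and method names are hypothetical, and unlike GNU grep it does not emit "--" separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch mimicking `grep -hE --before-context 1 "^DB2[0-9]+E"`
 * on an in-memory list of log lines. Only single-line statements are recovered.
 */
final class GrepBeforeContext {
    // Same pattern as the grep command: "DB2", one or more digits, then "E", at line start.
    private static final Pattern ERROR_LINE = Pattern.compile("^DB2[0-9]+E");

    static List<String> errorsWithPreviousLine(List<String> lines) {
        List<String> out = new ArrayList<>();
        int lastPrinted = -1; // index of the last emitted line, so no line is printed twice
        for (int i = 0; i < lines.size(); i++) {
            if (ERROR_LINE.matcher(lines.get(i)).find()) {
                // One line of "before context", unless it was already emitted
                if (i > 0 && i - 1 > lastPrinted) {
                    out.add(lines.get(i - 1));
                }
                out.add(lines.get(i));
                lastPrinted = i;
            }
        }
        return out;
    }
}
```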

Update: after some fiddling, I managed to get what you need using only standard *nix utilities. Beware, it isn't pretty. The final expression:

grep -nE "^DB2[0-9]+" log.txt | cut -f 1 -d " " | gawk "/E$/{y=$0;print x, y};{x=$0}" | sed -e "s/:DB2[[:digit:]]\+[IE]//g" | gawk "{print \"sed -n \\\"\" $1+1 \",\" $2 \"p\\\" log.txt \"}" | sed -e "s/$/ >> filtered.txt/g" > run.bat

Explanation:

  • grep -nE "^DB2[0-9]+" log.txt - prints the lines that start with DB2..., each prefixed with its line number. Example:
6:DB20000I  The SQL command completed successfully.
12:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
19:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
26:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
34:DB20000I  The SQL command completed successfully.
41:DB20000I  The SQL command completed successfully.
47:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
54:DB20000I  The SQL command completed successfully.
  • cut -f 1 -d " " - prints only the first space-separated column, that is, removes everything after the message code. Example:
6:DB20000I
12:DB21034E
19:DB21034E
26:DB21034E
34:DB20000I
41:DB20000I
47:DB21034E
54:DB20000I
  • gawk "/E$/{y=$0;print x, y};{x=$0}" - for every line that ends with "E" (an error), prints the previous message line followed by the error line, on one line. Example:
6:DB20000I 12:DB21034E
12:DB21034E 19:DB21034E
19:DB21034E 26:DB21034E
41:DB20000I 47:DB21034E
  • sed -e "s/:DB2[[:digit:]]\+[IE]//g" - removes the colon and the message code, leaving only line numbers. Example:
6 12
12 19
19 26
41 47
  • gawk "{print \"sed -n \\\"\" $1+1 \",\" $2 \"p\\\" log.txt \"}" - turns each pair into a sed command, adding one to the first line number so that the range starts right after the previous message. Example:
sed -n "7,12p" log.txt 
sed -n "13,19p" log.txt 
sed -n "20,26p" log.txt 
sed -n "42,47p" log.txt 
  • sed -e "s/$/ >> filtered.txt/g" - appends >> filtered.txt to each command, so that its output is appended to the final output file. Example:
sed -n "7,12p" log.txt  >> filtered.txt
sed -n "13,19p" log.txt  >> filtered.txt
sed -n "20,26p" log.txt  >> filtered.txt
sed -n "42,47p" log.txt  >> filtered.txt
  • > run.bat - finally, writes the resulting commands to a batch file named run.bat

After you execute this file, the content you wanted will appear in filtered.txt.
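What the pipeline computes can also be sketched directly in Java, without generating and running a script: for each error message, the inclusive line range from the line after the previous DB2 message up to the error line itself. This is a hedged sketch, not the author's method; ErrorRanges and ranges are hypothetical names, and unlike the gawk step (where x is still empty) it starts the range at line 1 if the very first message is an error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of what the grep | cut | gawk | sed | gawk pipeline computes:
 * for every DB2 error message, a 1-based, inclusive {from, to} line range that starts
 * right after the previous DB2 message and ends at the error message.
 */
final class ErrorRanges {
    // Same filter as `grep -nE "^DB2[0-9]+"`: every DB2 message line is a block boundary.
    private static final Pattern MESSAGE = Pattern.compile("^DB2[0-9]+");

    static List<int[]> ranges(List<String> lines) {
        List<int[]> result = new ArrayList<>();
        int previousMessage = 0; // line number of the previous DB2 message; 0 = none yet
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!MESSAGE.matcher(line).find()) {
                continue;
            }
            int lineNo = i + 1;                  // grep -n numbers lines from 1
            String code = line.split(" ", 2)[0]; // like `cut -f 1 -d " "`
            if (code.endsWith("E")) {            // like gawk's /E$/
                // like `$1+1 "," $2`: from after the previous message up to this line
                result.add(new int[] {previousMessage + 1, lineNo});
            }
            previousMessage = lineNo;            // like gawk's {x=$0}
        }
        return result;
    }
}
```

Each {from, to} pair corresponds to one generated sed -n "from,top" command, and lines.subList(from - 1, to) returns the same lines. With message lines at the positions shown in the grep -n example, the ranges come out as 7-12, 13-19, 20-26 and 42-47.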

Update 2:

Here is another version that works on Ubuntu (the previous version was written on Windows):

grep -nE "^DB2[0-9]+" log.txt | cut -f 1 -d " " | gawk '/E/{y=$0;print x, y};{x=$0}' | sed -e "s/:DB2[[:digit:]]\+[IE]//g" | gawk '{print "sed -n \"" $1+1 "," $2 "p\" log.txt"}' | sed -e "s/$/ >> filtered.txt/g" > run.sh

Two things did not work with the previous version:

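  1. For some reason, gawk '/E$/' did not work properly (it did not recognize that E is at the end of the line), so I just used /E/, since E does not appear anywhere else.
  2. Quoting: " was changed to ' around the gawk programs, because inside double quotes the shell expands $0, $1 and $2 before gawk sees them; after that, the quoting inside the last gawk expression was adjusted accordingly.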
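Since the question originally asked for Java, the same selection can also be sketched as a single pass over the log, which keeps multi-line statements such as CREATE PROCEDURE...END PROCEDURE blocks intact without needing line numbers or a generated script. This is a minimal sketch, assuming every statement is followed by exactly one DB2 message line; SqlErrorExtractor is a hypothetical name. As with the sed ranges, if a message wraps onto a second line, that continuation line is treated as the start of the next block.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical single-pass sketch: lines are buffered until a DB2 message arrives.
 * If the message is an error (code ending in "E"), the buffered statement and the
 * error line are written out; either way the buffer is cleared for the next statement.
 */
final class SqlErrorExtractor {
    // Any line starting with "DB2" + digits ends the current statement.
    private static final Pattern MESSAGE = Pattern.compile("^DB2[0-9]+");

    static void extract(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        List<String> pending = new ArrayList<>(); // lines since the last DB2 message
        String line;
        while ((line = reader.readLine()) != null) {
            if (MESSAGE.matcher(line).find()) {
                String code = line.split(" ", 2)[0]; // e.g. "DB21034E"
                if (code.endsWith("E")) {
                    for (String s : pending) {
                        writeLine(out, s);
                    }
                    writeLine(out, line);
                }
                pending.clear(); // start collecting the next statement
            } else {
                pending.add(line);
            }
        }
        out.flush();
    }

    // Writes the text followed by a "\n" line terminator.
    private static void writeLine(Writer out, String s) throws IOException {
        out.write(s);
        out.write('\n');
    }
}
```

To process real files, the reader and writer could come from Files.newBufferedReader(Paths.get("log.txt")) and Files.newBufferedWriter(Paths.get("filtered.txt")), opened in a try-with-resources block.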

Regarding java - Selectively parsing log files using Java, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4645432/