java - Selectively parsing log files using Java

I have to parse a large number of log files that have the following format.

SOME SQL STATEMENT/QUERY

DB20000I  The SQL command completed successfully.

SOME OTHER SQL STATEMENT/QUERY

DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.

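Edit 1: The first three lines (including the blank line) show an SQL statement that executed successfully, while the next three lines show a statement and the exception it caused. darioo's reply below suggests using grep instead of Java, which works very well for single-line SQL statements.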

Edit 2: However, an SQL statement/query is not necessarily a single line. Sometimes it is a large CREATE PROCEDURE...END PROCEDURE block. Can this case also be handled using only Unix commands?

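Now I need to parse the whole log file, select every occurrence of a pair (SQL statement + error), and write those pairs to a separate file.

Please tell me how to do this!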

Best answer

My answer will not be Java-based, because this is a classic example of a problem that can be solved in a much, much simpler way.

All you need is the tool grep. If you are on Windows, you can find it here.

Assuming your log is in a file named log.txt, the solution to your problem is this:

grep -hE --before-context 1 "^DB2[0-9]+E" log.txt > filtered.txt

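Explanation:

-h - do not print file names
-E - use extended regular expressions
--before-context 1 - print one line before each matching error message (this works if every SQL query fits on a single line)
^DB2[0-9]+E - match lines that start with "DB2", followed by one or more digits and then "E"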

The command above writes every line you need to a new file named filtered.txt.
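
As a quick check of the command above, the sketch below builds a small sample log and runs the same grep over it. The function name extract_single_line_errors, the file names sample.log and sample_filtered.txt, and the SQL statements are all made up for illustration, and the sample assumes that each single-line statement sits directly above its DB2 message. GNU grep prints a "--" line between non-adjacent groups of context, so the sketch also drops those lines.

```shell
#!/bin/sh
# Hypothetical helper: print each DB2 error line together with the line before it.
# Assumes single-line SQL statements placed directly above the DB2 message.
extract_single_line_errors() {
    # -h: no file names, -E: extended regex, -B 1: one line of leading context
    # (short form of --before-context 1). The second grep removes the "--"
    # group separators that grep may insert between non-adjacent matches.
    grep -hE -B 1 '^DB2[0-9]+E' "$1" | grep -v '^--$'
}

# Build a tiny sample log (made-up statements): two successes, two errors.
printf '%s\n' \
    'SELECT 1 FROM t1' \
    'DB20000I  The SQL command completed successfully.' \
    'SELEKT 2 FROM t2' \
    'DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.' \
    'SELECT 3 FROM t3' \
    'DB20000I  The SQL command completed successfully.' \
    'SELEKT 4 FROM t4' \
    'DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.' \
    > sample.log

extract_single_line_errors sample.log > sample_filtered.txt
cat sample_filtered.txt
```

Note that the second grep would also remove a statement line consisting of nothing but "--", which is a trade-off of this simple approach.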

Update: after some tinkering, I managed to get what you need using only standard *nix utilities. Beware, it is not pretty. The final expression:

grep -nE "^DB2[0-9]+" log.txt | cut -f 1 -d " " | gawk "/E$/{y=$0;print x, y};{x=$0}" | sed -e "s/:DB2[[:digit:]]\+[IE]//g" | gawk "{print \"sed -n \\\"\" $1+1 \",\" $2 \"p\\\" log.txt \"}" | sed -e "s/$/ >> filtered.txt/g" > run.bat

Explanation:

grep -nE "^DB2[0-9]+" log.txt - prints the lines that start with DB2..., each prefixed with its line number. Example:

6:DB20000I  The SQL command completed successfully.
12:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
19:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
26:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
34:DB20000I  The SQL command completed successfully.
41:DB20000I  The SQL command completed successfully.
47:DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.
54:DB20000I  The SQL command completed successfully.

cut -f 1 -d " " - keeps only the first space-delimited column, that is, removes everything after the message code. Example:

6:DB20000I
12:DB21034E
19:DB21034E
26:DB21034E
34:DB20000I
41:DB20000I
47:DB21034E
54:DB20000I

gawk "/E$/{y=$0;print x, y};{x=$0}" - for every line that ends with "E" (an error line), print the line before it and then the error line. Example:

6:DB20000I 12:DB21034E
12:DB21034E 19:DB21034E
19:DB21034E 26:DB21034E
41:DB20000I 47:DB21034E

sed -e "s/:DB2[[:digit:]]\+[IE]//g" - removes the colon and the message code, leaving only line numbers. Example:

6 12
12 19
19 26
41 47

gawk "{print \"sed -n \\\"\" $1+1 \",\" $2 \"p\\\" log.txt \"}" - turns each pair of line numbers into a sed command and increments the first line number by one. Example:

sed -n "7,12p" log.txt 
sed -n "13,19p" log.txt 
sed -n "20,26p" log.txt 
sed -n "42,47p" log.txt

sed -e "s/$/ >> filtered.txt/g" - appends >> filtered.txt to lines, for appending to final output file. Example:

sed -n "7,12p" log.txt  >> filtered.txt
sed -n "13,19p" log.txt  >> filtered.txt
sed -n "20,26p" log.txt  >> filtered.txt
sed -n "42,47p" log.txt  >> filtered.txt

> run.bat - finally, prints the last lines to a batch file named run.bat

After you execute this file, the content you wanted will appear in filtered.txt.
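
The same selection can also be sketched as a single awk pass, without generating an intermediate script. This is not the method used above, only an alternative under the same assumptions: every statement is followed by a line starting with a DB2 message code, and a code ending in "E" marks an error. The function name extract_error_blocks, the file names and the SQL in the sample are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: one awk pass that prints every block of lines ending in
# a DB2 error message (code ending in "E"), mirroring the sed ranges above.
extract_error_blocks() {
    awk '
        /^DB2[0-9]+/ {
            # A DB2 message line closes the current block.
            if ($1 ~ /^DB2[0-9]+E$/) {
                # Error: emit the buffered lines, then the message line itself.
                printf "%s", buf
                print
            }
            buf = ""            # start collecting the next block
            next
        }
        { buf = buf $0 "\n" }   # any other line belongs to the pending block
    ' "$1"
}

# Made-up sample with a multi-line statement that fails.
printf '%s\n' \
    'CREATE TABLE t (id INT)' \
    '' \
    'DB20000I  The SQL command completed successfully.' \
    '' \
    'CREATE PROCEDURE p' \
    'BEGIN' \
    'END PROCEDURE' \
    '' \
    'DB21034E  The command was processed as an SQL statement because it was not a' \
    'valid Command Line Processor command.' \
    > multi.log

extract_error_blocks multi.log > multi_filtered.txt
cat multi_filtered.txt
```

As with the sed ranges, each block ends at the DB2 line itself, so when a message wraps onto a second line, that second line is counted as part of the following block.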

Update 2:

Here is another version that works on Ubuntu (previous version was written on Windows):

grep -nE "^DB2[0-9]+" log.txt | cut -f 1 -d " " | gawk '/E/{y=$0;print x, y};{x=$0}' | sed -e "s/:DB2[[:digit:]]\+[IE]//g" | gawk '{print "sed -n \""$1+1" ,"$2 "p\" log.txt" }' | sed -e "s/$/ >> filtered.txt/g" > run.sh

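Two things from the previous version did not work:

For some reason, gawk '/E$/' did not work properly (it did not recognize that E was at the end of the line), so I just used /E/, since E does not occur anywhere else.
The quotes: " was changed to ' for gawk, because it does not like double quotes (in a Unix shell, $0, $1 and $2 inside double quotes are expanded by the shell before gawk sees them); the quoting inside the last gawk expression was adjusted accordingly.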
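
The quoting change can be seen in isolation. In a POSIX shell, $0, $1 and $2 inside double quotes are expanded by the shell before the awk program is even started, whereas single quotes pass them through untouched. A minimal sketch, using plain awk and a made-up input line:

```shell
#!/bin/sh
# Single quotes: awk receives the program text {print $2} and prints field 2.
single=$(printf '6:DB20000I 12:DB21034E\n' | awk '{print $2}')

# Double quotes: the shell replaces $2 with its own second positional
# parameter before awk starts. With no such parameter set, awk receives
# {print } and prints the whole line instead.
set --      # clear the positional parameters so that $2 is empty here
double=$(printf '6:DB20000I 12:DB21034E\n' | awk "{print $2}")

echo "single-quoted: $single"
echo "double-quoted: $double"
```

On Windows, cmd.exe does not treat $ specially, which is consistent with the double-quoted version working there.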

Regarding "java - Selectively parsing log files using Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4645432/
