regex - Regular expression to parse Apache error log files

I need to use a regular expression in a Java program to parse Apache error log files, for example:

[Thu Sep 27 12:08:18 2012] [error] [client 151.10.158.10] File does not exist: /srv/www/htdocs/pad/favicon.ico
[Thu Oct 04 17:02:42 2012] [error] [client 151.10.1.10] File does not exist: > /srv/www/htdocs/pad/favicon.ico
[Wed Oct 17 10:16:40 2012] [error] [client 151.10.14.60] File does not exist: /srv/www/htdocs/pad/sites/all/modules/fckeditor/fckeditor/editor/userfiles, referer: http://pad.sta.uniroma1.it/sites/all/modules/fckeditor/fckeditor/editor/fckeditor.html?InstanceName=edit-body&Toolbar=DrupalFull

I have tried several solutions (some of them previously reported on Stack Overflow), and the one that seems to work best is:

^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s/.(")-]+[\-:]) ([\w/\s]+)$

However, it does not seem to match strings like this one:

[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1

How can I fix this?

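Edit: I checked all the suggested solutions. They increase the number of matched lines, but they still cannot handle cases like the following: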

[Fri Jul 15 00:24:41 2011] [error] [client 219.12.35.141] script '/srv/www/htdocs/pad2/scripts/setup.php' not found or unable to stat
[Mon May 28 18:43:25 2012] [error] [client 88.110.28.25] Invalid URI in request GET HTTP/1.1 HTTP/1.1

Note also that I am fine with receiving all the data after the square brackets in a single group, including the client keyword.

Best answer

If you are OK with receiving the information encoded in the first three [...] groups:

Match each [...] as the longest string that starts with [ and ends with ], with no other ] symbol in between: \[[^\]]+\]
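As a rough illustration of the bracket-group pattern on its own, the sketch below runs \[[^\]]+\] against one of the log lines from the question and collects every match. The class name BracketGroups and the helper bracketGroups are hypothetical names chosen for this example, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketGroups {
    // One [...] group: "[", then one or more characters that are not "]", then "]".
    private static final Pattern BRACKET_GROUP = Pattern.compile("\\[[^\\]]+\\]");

    // Hypothetical helper: returns every [...] group found in the line, in order.
    public static List<String> bracketGroups(String line) {
        List<String> groups = new ArrayList<>();
        Matcher m = BRACKET_GROUP.matcher(line);
        while (m.find()) {
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] "
                + "Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1";
        for (String group : bracketGroups(line)) {
            System.out.println(group);
        }
    }
}
```

Because the character class excludes ], each match stops at the first closing bracket, so the three groups come back separately instead of being merged into one long match.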

Capture the rest of the line as .*, which matches from the current position to the end of the line.

So your complete solution looks like this:

^(\[[^\]]+\]) (\[[^\]]+\]) (\[[^\]]+\]) (.*)$
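Since the question asks for Java, here is a minimal sketch of how this pattern might be used with java.util.regex. The class name ApacheErrorLogLine and the helper parse are hypothetical; the backslashes are doubled only because the regex sits inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApacheErrorLogLine {
    // The pattern from the answer: three [...] groups, then the rest of the line.
    private static final Pattern LINE = Pattern.compile(
            "^(\\[[^\\]]+\\]) (\\[[^\\]]+\\]) (\\[[^\\]]+\\]) (.*)$");

    // Hypothetical helper: returns {timestamp, level, client, message},
    // or null when the line does not have the expected shape.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "[Fri Jul 15 00:24:41 2011] [error] [client 219.12.35.141] "
                + "script '/srv/www/htdocs/pad2/scripts/setup.php' not found or unable to stat";
        String[] parts = parse(line);
        if (parts == null) {
            System.out.println("no match");
        } else {
            System.out.println("timestamp: " + parts[0]);
            System.out.println("level:     " + parts[1]);
            System.out.println("client:    " + parts[2]);
            System.out.println("message:   " + parts[3]);
        }
    }
}
```

The pattern requires exactly three bracket groups before the message, so in this sketch a line with fewer groups makes parse return null rather than a partial result.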


Regarding regex - Regular expression to parse Apache error log files, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26651224/
