java - Extract a directory from a string

Tags: java regex string

I need to extract the directory (the request path) from strings like the following:

222.77.201.211 - - [20/Sep/2013:00:10:23 +0800] "GET /mapreduce-nextgen/hadoop-internals-mapreduce-reference/ HTTP/1.1" 200 28664 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
220.181.89.164 - - [20/Sep/2013:00:10:25 +0800] "GET /mapreduce/hadoop-capacity-scheduler HTTP/1.1" 301 390 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
175.44.54.185 - - [20/Sep/2013:00:10:25 +0800] "GET /mapreduce-nextgen/apache-hadoop-2-0-3-published HTTP/1.1" 301 439 "http://dongxicheng.org/mapreduce-nextgen/apache-hadoop-2-0-3-published/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
175.44.54.185 - - [20/Sep/2013:00:10:25 +0800] "GET /search-engine/scribe-intro/ HTTP/1.1" 200 21578 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
112.111.174.38 - - [20/Sep/2013:00:10:30 +0800] "GET /structure/segment-tree HTTP/1.1" 301 414 "http://dongxicheng.org/structure/segment-tree/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
112.111.174.38 - - [20/Sep/2013:00:10:30 +0800] "GET /structure/segment-tree HTTP/1.1" 301 414 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
222.77.201.211 - - [20/Sep/2013:00:10:31 +0800] "GET /mapreduce-nextgen/apache-hadoop-2-0-3-published/ HTTP/1.1" 200 23438 "http://dongxicheng.org/mapreduce-nextgen/apache-hadoop-2-0-3-published/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"

The expected output is:

  • /mapreduce-nextgen/hadoop-internals-mapreduce-reference/
  • /mapreduce/hadoop-capacity-scheduler
  • /mapreduce-nextgen/apache-hadoop-2-0-3-published
  • and so on...

I think a regular expression may be needed. Thanks in advance!

Best answer

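If the path always sits between GET and HTTP, the simplest regular expression is this one, where the lazy group (.*?) captures the text between the two keywords: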

GET (.*?) HTTP

Demonstrated here: Regex101

In Java, the code should look like this:

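// Requires java.util.regex.Pattern and java.util.regex.Matcher; str holds one log line.
Pattern p = Pattern.compile("GET (.*?) HTTP");
Matcher m = p.matcher(str);
if (m.find()) System.out.println(m.group(1)); // group(1) is the captured path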

Edit: don't forget to put a \ before every " inside the Java string literal, otherwise the " will be interpreted as the end of the string.

String str = "222.77.201.211 - - [20/Sep/2013:00:10:23 +0800] \"GET /mapreduce-nextgen/hadoop-internals-mapreduce-reference/ HTTP/1.1\" 200 28664 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)\"";

For the string above, the output will be /mapreduce-nextgen/hadoop-internals-mapreduce-reference/
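To tie the pieces together, here is a minimal sketch of a complete program that applies the same pattern to several log lines. The class name PathExtractor and the method extractPaths are hypothetical names chosen for illustration and are not part of the original answer. The sketch also assumes each line contains at most one GET request and silently skips lines without one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
public class PathExtractor {
    // Compiled once and reused; the lazy group captures the text between "GET " and " HTTP".
    private static final Pattern REQUEST_PATH = Pattern.compile("GET (.*?) HTTP");

    // Returns the captured path of every line that contains a GET request.
    // Lines without a match are skipped (an assumption of this sketch).
    static List<String> extractPaths(List<String> logLines) {
        List<String> paths = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = REQUEST_PATH.matcher(line);
            if (m.find()) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Two of the sample lines from the question, with every " escaped as \".
        List<String> lines = Arrays.asList(
            "222.77.201.211 - - [20/Sep/2013:00:10:23 +0800] \"GET /mapreduce-nextgen/hadoop-internals-mapreduce-reference/ HTTP/1.1\" 200 28664 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)\"",
            "220.181.89.164 - - [20/Sep/2013:00:10:25 +0800] \"GET /mapreduce/hadoop-capacity-scheduler HTTP/1.1\" 301 390 \"-\" \"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)\""
        );
        for (String path : extractPaths(lines)) {
            System.out.println(path);
        }
    }
}
```

Running main prints /mapreduce-nextgen/hadoop-internals-mapreduce-reference/ followed by /mapreduce/hadoop-capacity-scheduler. When the lines are read from a file rather than written as literals, no escaping is needed, since the \" rule only applies to Java source code.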

Regarding java - Extract a directory from a string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35923477/
