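java - Regex pattern to match a specific URL

I have a large block of text and only want to use certain pieces of information from it. The text looks like this: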

Some random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_1_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_2_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_3_av.m3u8

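I only want the http text. There are several such URLs in the text, but I only need one of them. The regex should be "starts with http and ends with .m3u8".

I looked through the glossary of all the different expressions, but I find it very confusing. I tried "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{12,30})([\/\w\.-]*)*\/?$/" as my pattern. But is that enough?

All help is appreciated. Thanks.

Best answer

Assuming the text in your example is line-separated, with each item on its own line, here is a working snippet: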

// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String text =
"Some random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
"More random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
// removed some for brevity
"More random text here" +
System.getProperty("line.separator") +
// added counter-example ending with "NOPE"
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.NOPE";

// Multi-line pattern:
//                           ┌ line starts with http
//                           |    ┌ any 1+ character reluctantly quantified
//                           |    |  ┌ dot escape
//                           |    |  |  ┌ ending text
//                           |    |  |  |   ┌ end of line marker
//                           |    |  |  |   |
Pattern p = Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());
}

Output

http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
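
Since the question asks for only one of the URLs, the loop can be replaced by a single find() call. The following is a minimal sketch of that idea, not part of the original answer: the class name FirstM3u8Url, the method findFirst, and the example.invalid URLs are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstM3u8Url {
    // Same pattern as in the answer: a line that starts with "http" and ends with ".m3u8".
    private static final Pattern M3U8_LINE =
            Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);

    /** Returns the first matching line, or an empty Optional if no line matches. */
    static Optional<String> findFirst(String text) {
        Matcher m = M3U8_LINE.matcher(text);
        // A single find() stops at the first match instead of iterating over all of them.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the URLs are placeholders.
        String text = String.join("\n",
                "Some random text here",
                "http://example.invalid/a/index_0_av.m3u8",
                "More random text here",
                "http://example.invalid/a/index_1_av.m3u8");
        System.out.println(findFirst(text).orElse("no match"));
    }
}
```

Returning an Optional makes the "no URL found" case explicit, so the caller does not have to deal with null.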

Edit

To fine-tune the "filter" by the URL's "index_x" file, you just need to add it to the Pattern between the protocol and the end of line, for example:

Pattern.compile("^http.+?index_0.+?\\.m3u8$", Pattern.MULTILINE);
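
As a hedged sketch of how this filter could be parameterized, the index can be spliced into the pattern at runtime. The class IndexFilter and its methods are hypothetical names, and the version below assumes (based on the sample URLs) that the index is always followed by an underscore, which keeps index_1 from also matching index_10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexFilter {
    /** Builds a multi-line pattern for URLs whose file name contains "index_<n>_". */
    static Pattern forIndex(int n) {
        // Assumption: the index is followed by "_", as in "index_0_av.m3u8".
        return Pattern.compile("^http.+?index_" + n + "_.*?\\.m3u8$", Pattern.MULTILINE);
    }

    /** Collects every line of the text that matches the pattern for index n. */
    static List<String> findAll(String text, int n) {
        List<String> urls = new ArrayList<>();
        Matcher m = forIndex(n).matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the URLs are placeholders.
        String text = String.join("\n",
                "http://example.invalid/a/index_1_av.m3u8",
                "More random text here",
                "http://example.invalid/a/index_10_av.m3u8");
        System.out.println(findAll(text, 1));
    }
}
```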

This question and answer, "java - Regex pattern to match a specific URL", come from a similar question on Stack Overflow: https://stackoverflow.com/questions/29895620/
