I have a string that contains multiple URLs starting with http or https, and I need to extract all of them into a list.
I tried the code below.
    List<String> httpLinksList = new ArrayList<>();
    String hyperlinkRegex = "((http:\\/\\/|https:\\/\\/)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)";
    String synopsis = "This is http://stackoverflow.com/questions and https://test.com/method?param=wasd The code below catches all urls in text and returns urls in list";
    Pattern pattern = Pattern.compile(hyperlinkRegex);
    Matcher matcher = pattern.matcher(synopsis);
    while (matcher.find()) {
        System.out.println(matcher.find() + " " + matcher.group(1) + " " + matcher.groupCount() + " " + matcher.group(2));
        httpLinksList.add(matcher.group());
    }
    System.out.println(httpLinksList);
I need the following result: [http://stackoverflow.com/questions, https://test.com/method?param=wasd], but I get this output instead: [https://test.com/method?param=wasd]
Best answer
This regular expression matches URLs that use the http, https, ftp, gopher, telnet, or file scheme:
    String urlRegex = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
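One detail worth knowing about the character class in this pattern: the sequence `\\+-=` is parsed as a range from `+` to `=`, not as three separate characters. That range covers the digits plus `, - . / : ; <`, so a comma or `<` placed directly after a URL ends up inside the match. The sketch below checks this behavior; the class name `RangeCheck`, the helper `firstUrl`, and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeCheck {
    // Same pattern as in the answer, unchanged.
    static final Pattern URL = Pattern.compile(
        "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)",
        Pattern.CASE_INSENSITIVE);

    // Returns the first match in text, or null if there is none.
    static String firstUrl(String text) {
        Matcher m = URL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The trailing comma falls inside the '+'..'=' range, so it is kept.
        // prints http://example.com/a,
        System.out.println(firstUrl("see http://example.com/a, then continue"));
    }
}
```

If URLs in your text are often followed by punctuation, you may want to trim trailing commas or periods from each match after extraction.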
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class xmlValue {
        public static void main(String[] args) {
            String text = "This is http://stackoverflow.com/questions and https://test.com/method?param=wasd The code below catches all urls in text and returns urls in list";
            System.out.println(extractUrls(text));
        }

        // Returns every URL found in text, in order of appearance.
        public static List<String> extractUrls(String text) {
            List<String> containedUrls = new ArrayList<String>();
            String urlRegex = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
            Pattern pattern = Pattern.compile(urlRegex, Pattern.CASE_INSENSITIVE);
            Matcher urlMatcher = pattern.matcher(text);

            // Each find() call advances to the next match, so call it once per iteration.
            while (urlMatcher.find()) {
                containedUrls.add(text.substring(urlMatcher.start(0), urlMatcher.end(0)));
            }
            return containedUrls;
        }
    }
Output:
[http://stackoverflow.com/questions, https://test.com/method?param=wasd]
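As for why the code in the question keeps only the second URL: every call to `matcher.find()` advances to the next match, and the loop body calls it a second time inside `println`. The first URL is found by the `while` condition, then skipped by that extra call before `matcher.group()` is read. Here is a minimal sketch of the same loop with a single `find()` per iteration. The class name `FindOnce` and the method name `collect` are illustrative, and the pattern is the question's own with backslashes doubled so it compiles as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindOnce {
    // The question's pattern, with backslashes doubled for a Java string literal.
    static final Pattern LINK = Pattern.compile(
        "((http:\\/\\/|https:\\/\\/)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)");

    static List<String> collect(String text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(text);
        // find() is called exactly once per iteration, in the loop condition.
        while (matcher.find()) {
            links.add(matcher.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String synopsis = "This is http://stackoverflow.com/questions and "
            + "https://test.com/method?param=wasd The code below catches all urls";
        // prints [http://stackoverflow.com/questions, https://test.com/method?param=wasd]
        System.out.println(collect(synopsis));
    }
}
```

If the debug print is still wanted, it can read `matcher.group(1)` and `matcher.group(2)` inside the loop without calling `find()` again.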
Credit to @BullyWiiPlaza
Regarding java - regex to find http and https URLs in a string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57609873/