My input String is:
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";
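I want to convert this text to:
Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc&param2=xyz&myParam=pqr (URL Label) and some text after it
So here:
1) I want to replace the link tag with a plain URL. If the tag contains a label, the label should go in parentheses after the URL.
2) If the URL is relative, I want to prefix it with the base URL (http://www.google.com).
3) I want to append a parameter to the URL (&myParam=pqr).
I am having trouble retrieving the tag that contains the URL and the label, and replacing it.
I wrote something like this: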
public static void main(String[] args) {
    String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";
    // unescape HTML entities in case the input arrives escaped
    text = text.replaceAll("&lt;", "<");
    text = text.replaceAll("&gt;", ">");
    text = text.replaceAll("&amp;", "&");
    // this is not working
    Pattern p = Pattern.compile("href=\"(.*?)\"");
    Matcher m = p.matcher(text);
    String url = null;
    if (m.find()) {
        url = m.group(1);
    }
}
// helper method to append new query params once I have the url
public static URI appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
    URI oldUri = new URI(uriToUpdate);
    String newQueryParams = oldUri.getQuery();
    if (newQueryParams == null) {
        newQueryParams = queryParamsToAppend;
    } else {
        newQueryParams += "&" + queryParamsToAppend;
    }
    URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
            oldUri.getPath(), newQueryParams, oldUri.getFragment());
    return newUri;
}
Edit 1:
Pattern p = Pattern.compile("HREF=\"(.*?)\"");
This works, but I want it to be case-insensitive: Href, HRef, href, hrEF, etc. should all match.
Also, how do I handle text that contains multiple URLs?
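Both points in this edit can be illustrated with a minimal sketch. It assumes the href value is always wrapped in double quotes; the class name HrefExtractor and the method extractHrefs are hypothetical names chosen for illustration. Pattern.CASE_INSENSITIVE covers any capitalization of "href", and calling find() in a loop visits each match in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefExtractor {
    // CASE_INSENSITIVE lets "href" also match HREF, Href, hrEF, and so on.
    private static final Pattern HREF =
            Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);

    // Collects every double-quoted href value in order of appearance.
    static List<String> extractHrefs(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(text);
        while (m.find()) {            // each call to find() advances to the next match
            urls.add(m.group(1));     // group 1 is the text between the quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "<A HREF=\"/a.cgi?x=1\">A</A> and <a hRef=\"/b.cgi\">B</a>";
        System.out.println(extractHrefs(text)); // prints [/a.cgi?x=1, /b.cgi]
    }
}
```

The same flag can also be written inline as (?i) at the start of the pattern string.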
Edit 2:
Some progress.
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
    url = m.group(1);
    System.out.println(url);
}
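This handles the case of multiple URLs.
The last open question is how to get the label and replace the anchor tag in the original text with the URL and the label.
Edit 3:
By the multiple-URL case, I mean that the given text contains more than one URL.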
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
    url = m.group(1); // this variable should contain the link URL
    url = appendBaseURI(url); // helper that prefixes the base URL (not shown here)
    url = appendQueryParams(url, "license=ABCXYZ").toString(); // appendQueryParams returns a URI
    System.out.println(url);
}
Best answer
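import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumption: StringEscapeUtils from Apache Commons Text; Commons Lang 3 has a class of the same name
import org.apache.commons.text.StringEscapeUtils;

public static void main(String[] args) throws URISyntaxException {
    String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
    text = StringEscapeUtils.unescapeHtml4(text);
    // group 1 captures the URL, group 2 captures the label
    Pattern p = Pattern.compile("<a href=\"(.*?)\">(.*?)</a>", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(text);
    while (m.find()) {
        // replace the whole anchor tag with "URL (label)"
        text = text.replace(m.group(0), cleanUrlPart(m.group(1), m.group(2)));
    }
    System.out.println(text);
}

// appendQueryParams (from the question) throws URISyntaxException, so it is declared here too
private static String cleanUrlPart(String url, String label) throws URISyntaxException {
    // prefix the base URL when the link is relative
    if (!url.startsWith("http") && !url.startsWith("www")) {
        if (url.startsWith("/")) {
            url = "http://www.google.com" + url;
        } else {
            url = "http://www.google.com/" + url;
        }
    }
    url = appendQueryParams(url, "myParam=pqr").toString();
    if (label != null && !label.isEmpty()) url += " (" + label + ")";
    return url;
}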
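As an alternative to the startsWith checks, the base-URL step could be sketched with java.net.URI.resolve, which implements standard relative-reference resolution. This is only an illustration, not part of the answer above: the class name BaseUrlResolver and the method toAbsolute are hypothetical, and the base is the one from the question's example.

```java
import java.net.URI;

final class BaseUrlResolver {
    // Assumed base URL, taken from the question's example.
    static final URI BASE = URI.create("http://www.google.com/");

    // Relative references are resolved against BASE;
    // URLs that already carry a scheme are returned unchanged.
    static String toAbsolute(String href) {
        return BASE.resolve(href).toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("/relative-path/fruit.cgi?param1=abc&param2=xyz"));
        System.out.println(toAbsolute("fruit.cgi"));
        System.out.println(toAbsolute("https://example.com/x"));
    }
}
```

Two differences from the answer's approach are worth keeping in mind: a value such as "www.example.com/page" has no scheme, so resolve treats it as a relative path, and resolve(String) throws IllegalArgumentException when the href is not a syntactically valid URI.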
Output
Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc&param2=xyz&myParam=pqr (URL Label) and some text after it and another link http://www.google.com/relative-path/vegetables.cgi?param1=abc&param2=xyz&myParam=pqr (URL2 Label) and some more text
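The same transformation can also be done in a single pass with Matcher.appendReplacement, which avoids calling text.replace once per match. The following is a self-contained sketch under stated assumptions, not the answer's own method: LinkFlattener and flatten are hypothetical names, the anchor tag is assumed to carry only a double-quoted href attribute, and the URLs are assumed to have no #fragment (the extra parameter is simply appended to the end of the string).

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkFlattener {
    // Group 1 is the href value, group 2 is the label between the tags.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\s+href=\"(.*?)\"\\s*>(.*?)</a>", Pattern.CASE_INSENSITIVE);
    // Assumed base URL, taken from the question's example.
    private static final URI BASE = URI.create("http://www.google.com/");

    static String flatten(String text, String extraParam) {
        Matcher m = ANCHOR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = BASE.resolve(m.group(1)).toString();
            // Use '&' when a query string already exists, otherwise start one with '?'.
            url += (url.contains("?") ? "&" : "?") + extraParam;
            String label = m.group(2);
            String replacement = label.isEmpty() ? url : url + " (" + label + ")";
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "See <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> now";
        System.out.println(flatten(text, "myParam=pqr"));
    }
}
```

For anything beyond simple, well-formed anchors (extra attributes, single quotes, nested markup), an HTML parser is generally a safer choice than a regular expression.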
The original question, on retrieving links from text with a Java regular expression, is on Stack Overflow: https://stackoverflow.com/questions/53423132/