java - Replacing &amp; with & only within links in a partial HTML document

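I have tried a few approaches (jsoup, as shown below) to turn &amp; into & only within links. The trouble I'm having suggests I'm going about this all wrong. I suspect I'll facepalm when a solution is offered, but maybe a good old regex is the best answer (since I only need the replacement inside hrefs), unless the reader code gets modified?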

The parsing libraries (I also tried NekoHTML) want to convert every & to &amp;, so I'm having trouble using them to get the true link href to pass to String's replace method.

Input:

String toParse = "The <a href=\"http://example.com?key=val&amp;another_key=val.pdf&amp;action=edit&happy=good\">Link with an encoded ampersand (&amp;)</a> is challenging.";

Desired output:

The <a href=\"http://example.com?key=val&another_key=val.pdf&action=edit&happy=good\">Link with an encoded ampersand (&amp;)</a> is challenging.

I ran into this while trying to read an RSS feed that renders its <link> with &amp; instead of &.

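Update: I ended up using a regex to identify the links, then using replace to insert the decoded link in place of the links containing &amp;. Pattern.quote() turned out to be very handy, but I had to manually close and re-open the quoted section so that I could use a regex "or" condition for my ampersands: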

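// Declared elsewhere: link (the decoded href taken from the DOM), result
// (apparently a StringBuilder holding the raw HTML) and linkReplaced (a boolean).
// StringUtils is from Apache Commons Lang; Pattern and Matcher are in java.util.regex.
final String cleanLink = StringUtils.strip(link).replaceAll(" ", "%20").replaceAll("'", "%27");
String regex = Pattern.quote(link);
// end and re-start literal matching around my or condition
regex = regex.replaceAll("&", "\\\\E(&amp;|&)\\\\Q");
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(result);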

while (matcher.find()) {
    int index = result.indexOf(matcher.group());
    while (index != -1) {
        // this replaces the links with &amp; with the same links with &
        // because cleanLink is from the DOM and has been properly decoded
        result.replace(index, index + matcher.group().length(), cleanLink);
        index += cleanLink.length();
        index = result.indexOf(matcher.group(), index);
        linkReplaced = true;
    }
}

I'm not happy with this approach, but I would have to handle too many cases myself without using DOM tools to identify the links.
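
For readers who want to try the Pattern.quote() idea in isolation, here is a minimal, self-contained sketch. It is a variation on the code above, not the asker's exact code: instead of patching \E and \Q into an already quoted string, it quotes each piece between ampersands separately and joins the pieces with a group that accepts either & or &amp;. The class and method names (DecodedLinkReplacer, lenientAmpersandPattern, replaceEncodedLink) are made up for illustration, and the decoded link is assumed to come from a DOM parser as described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DecodedLinkReplacer {

    // Builds a regex that matches decodedLink literally, except that each '&'
    // in it may appear in the input either as a bare "&" or as "&amp;".
    static Pattern lenientAmpersandPattern(String decodedLink) {
        StringBuilder regex = new StringBuilder();
        // The -1 limit keeps trailing empty pieces, so a link ending in '&' still works.
        String[] parts = decodedLink.split("&", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("&(?:amp;)?");
            }
            // Quote each piece so '?', '.', and friends are matched literally.
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Replaces every encoded or unencoded occurrence of the link with the decoded form.
    static String replaceEncodedLink(String html, String decodedLink) {
        Matcher matcher = lenientAmpersandPattern(decodedLink).matcher(html);
        // quoteReplacement keeps '$' and '\' in the link from being read as group references.
        return matcher.replaceAll(Matcher.quoteReplacement(decodedLink));
    }
}
```

Letting Matcher.replaceAll() do the substitution avoids editing the buffer while the matcher is still scanning it. Like the original, this matches the link text wherever it occurs, not only inside href attributes.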

Best answer

Take a look at StringEscapeUtils. Try unescapeHtml() on your String.
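
Calling unescapeHtml() on the whole string would also decode the &amp; in the link text, which the question wants to keep. One possible way to combine the suggestion with the "only in href" requirement is sketched below, assuming double-quoted href attributes. To keep the example dependency-free, the unescape() helper is a hypothetical stand-in that decodes only &amp;; in a real project it could delegate to StringEscapeUtils.unescapeHtml() from Commons Lang 2.x (named unescapeHtml4() in Commons Lang 3 and Commons Text). The class name HrefUnescaper is also made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefUnescaper {

    // Group 1: href=" prefix, group 2: attribute value, group 3: closing quote.
    // Assumption: attributes are double-quoted; single-quoted or unquoted values are not handled.
    private static final Pattern HREF =
            Pattern.compile("(href\\s*=\\s*\")([^\"]*)(\")", Pattern.CASE_INSENSITIVE);

    // Hypothetical stand-in for StringEscapeUtils.unescapeHtml(): decodes only &amp;.
    static String unescape(String value) {
        return value.replace("&amp;", "&");
    }

    // Unescapes the value of every double-quoted href and leaves all other text untouched.
    static String unescapeHrefs(String html) {
        Matcher matcher = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(1) + unescape(matcher.group(2)) + matcher.group(3);
            // quoteReplacement keeps '$' and '\' in the URL from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Passing the toParse string from the question through HrefUnescaper.unescapeHrefs() should give the desired output shown there, with the (&amp;) in the link text left encoded. A regex like this is fragile against unusual markup, so it is only a fallback for cases where a DOM parser cannot be used.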

Regarding "java - Replacing &amp; with & only within links in a partial HTML document", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31016230/
