java - JSOUP: Issue with the HTML symbol ¨ (&uml;)

The following HTML string is passed to JSOUP:

<p id="pid">&uml;This is string using for testing</p>

Document doc = Jsoup.parse(htmlString);
String text = doc.getElementById("pid").text();

After extracting the element's text, the result is:

¨This is string using for testing

But the issue is that a hidden hex character is added to "¨This". If I view the string in the Notepad++ hex editor, ¨This is c2a854686973 (¨This).

Best answer

But issue is for ¨This hidden hex character is added to it.
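
For context, the bytes c2 a8 seen in the hex editor are how UTF-8 encodes the single character ¨ (U+00A8), which is what the entity &uml; decodes to. The following is a minimal sketch using only the standard library; the class name HexDump and the method toHex are illustrative and do not come from the original post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Encode the text with the given charset and render each byte as two hex digits.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00A8 is the character that the entity &uml; decodes to.
        String text = "\u00A8This";
        System.out.println(toHex(text, StandardCharsets.UTF_8));      // c2a854686973
        System.out.println(toHex(text, StandardCharsets.ISO_8859_1)); // a854686973
    }
}
```

The two-byte sequence appears only because the text was saved as UTF-8; in ISO-8859-1 the same character is the single byte a8.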

You can change the way the HTML is loaded. Jsoup accepts an InputStream to parse, as long as you provide a charset name.

Example

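// Needs java.io.ByteArrayInputStream, java.nio.charset.StandardCharsets, org.jsoup.Jsoup, org.jsoup.nodes.Document
String s = "<p id=\"pid\">&uml;This is string using for testing</p>";
// Parse from a stream with an explicit charset; this overload throws IOException.
Document doc = Jsoup.parse(new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII)), "ASCII", "");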
System.out.println(doc);

Output

<html>
 <head></head>
 <body>
  <p id="pid">&uml;This is string using for testing</p>
 </body>
</html>
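
The output above keeps &uml; because a document whose charset is ASCII cannot write ¨ directly, so the character is serialized as an entity instead. This concerns how the document is printed; text() presumably still returns the decoded character. The sketch below illustrates the general idea with the standard library only. It is not Jsoup's implementation: the class AsciiEscape and the method escapeNonAscii are hypothetical names, and it emits numeric character references rather than named entities such as &uml;.

```java
public class AsciiEscape {
    // Replace every code point outside the ASCII range with a numeric character reference.
    // Simplification: markup characters such as & and < are left untouched.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00A8 cannot be represented in ASCII, so it is written as &#xa8;
        System.out.println(escapeNonAscii("\u00A8This is string using for testing"));
    }
}
```

A browser renders &#xa8; and &uml; identically, so the choice between the two forms only affects how the serialized markup reads.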

Regarding "java - JSOUP: Issue with the HTML symbol ¨ (&uml;)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36770552/
