java - Getting HTML or XHTML text inside an XML element value

Tags: java xslt xml-parsing jsoup jdom

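My XML document has a projectDetails node that is created with the Java JDOM API; the data inside the node comes from a database.
The problem is the description field, which is stored in the database as HTML. When I add it to the <descriptionDetails /> element and transform the document with Java's Transformer class, all of the HTML tags are escaped.
Is it possible to get the HTML code as unescaped children of descriptionDetails, like the rest of the tags?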

  <projectDetails label="label.projectDetails">
    <descriptionDetails label="label.descriptionDetails">
    &lt;html&gt;
 &lt;head&gt;&lt;/head&gt;
 &lt;body&gt;
  &lt;strong&gt;&lt;strong&gt; Tiny MCE Bold&lt;br /&gt;&lt;em&gt;Tiny MCE Bold/Itellic&lt;/em&gt;&lt;br /&gt;&lt;span style="text-decoration: underline;"&gt;&lt;em&gt;Tiny MCE Bold/Itellic/Underlined&lt;/em&gt;&lt;/span&gt;&lt;br /&gt;&lt;/strong&gt;&lt;/strong&gt; 
  &lt;div&gt;
   Lorem Ipsum&amp;nbsp;is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown 
   &lt;br /&gt;
   &lt;br /&gt;
   &lt;span style="color: #ff0000;"&gt;printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset &lt;span style="color: #ffffff; background-color: #808000;"&gt;&lt;span style="background-color: #808000;"&gt;sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum,.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;/div&gt; 
  &lt;h1&gt;H1 heading&lt;/h1&gt; 
  &lt;h2&gt;H1 heading&lt;/h2&gt; 
  &lt;h3&gt;H1 heading&lt;/h3&gt; 
  &lt;h4&gt;H1 heading&lt;/h4&gt; 
  &lt;h5&gt;H1 heading&lt;/h5&gt; 
  &lt;h6&gt;H1 heading&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: 14pt;"&gt;font size 14&lt;/span&gt;&lt;/h6&gt;
 &lt;/body&gt;
&lt;/html&gt;
</descriptionDetails>
 </projectDetails>

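import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Serializes a W3C DOM document to an indented string with the default (identity) Transformer.
private static String xmlAsString(Document xml) throws Exception {
    Transformer tf = TransformerFactory.newInstance().newTransformer();

    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

    Writer out = new StringWriter();
    tf.transform(new DOMSource(xml), new StreamResult(out));
    return out.toString();
}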

Expected output:

<projectDetails label="label.projectDetails">
    <descriptionDetails label="label.descriptionDetails">
    <html>
 <head></head>
 <body>
  <strong><strong> Tiny MCE Bold<br /><em>Tiny MCE Bold/Itellic</em><br /><span style="text-decoration: underline;"><em>Tiny MCE Bold/Itellic/Underlined</em></span><br /></strong></strong> 
  <div>
   Lorem Ipsum&nbsp;is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown 
   <br />
   <br />
   <span style="color: #ff0000;">printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset <span style="color: #ffffff; background-color: #808000;"><span style="background-color: #808000;">sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum,.</span><br /></span></span>
  </div> 
  <h1>H1 heading</h1> 
  <h2>H1 heading</h2> 
  <h3>H1 heading</h3> 
  <h4>H1 heading</h4> 
  <h5>H1 heading</h5> 
  <h6>H1 heading<br /><br /><span style="font-size: 14pt;">font size 14</span></h6>
 </body>
</html>
</descriptionDetails>
 </projectDetails>

Best answer

Instead of using the default Transformer obtained with newInstance().newTransformer(), you can use https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/TransformerFactory.html#newTransformer-javax.xml.transform.Source- to create one from a stylesheet, for example

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="descriptionDetails/text()">
      <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>

used as the Source.

https://xsltfiddle.liberty-development.net/nc4NzR7
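
A minimal sketch of how the stylesheet above might be wired into the xmlAsString method from the question. The class name UnescapeDescription, the XSLT constant, and the shortened sample input are assumptions made for illustration; the stylesheet is embedded as a string here, though it could equally be loaded from a file or classpath resource.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class UnescapeDescription {

    // Identity transform plus one template that writes the text content of
    // descriptionDetails with output escaping disabled (the stylesheet from the answer).
    static final String XSLT =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:template match=\"@* | node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"descriptionDetails/text()\">"
        + "<xsl:value-of select=\".\" disable-output-escaping=\"yes\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String xmlAsString(Document xml) throws Exception {
        // newTransformer(Source) compiles the stylesheet instead of returning
        // the default identity Transformer.
        Transformer tf = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        // disable-output-escaping only takes effect when the result is serialized,
        // so the output goes to a StreamResult rather than a DOMResult.
        tf.transform(new DOMSource(xml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shortened, hypothetical stand-in for the document built in the question.
        String input = "<projectDetails label=\"label.projectDetails\">"
                + "<descriptionDetails label=\"label.descriptionDetails\">"
                + "&lt;strong&gt;Tiny MCE Bold&lt;/strong&gt;"
                + "</descriptionDetails></projectDetails>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(input)));
        System.out.println(xmlAsString(doc));
    }
}
```

Support for disable-output-escaping is optional in XSLT 1.0, so whether the tags come out unescaped depends on the processor in use and on the result being serialized to a stream.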

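Note, however, that the escaped HTML contains, for example, the entity reference &nbsp;, which will turn your output into malformed XML, because that entity is not predefined in XML.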
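
One possible workaround, sketched here as an assumption rather than as part of the original answer, is to replace the HTML-only entity with a numeric character reference before the description text is added to the document. The class NbspCheck and the helpers isWellFormed and replaceNbsp are hypothetical names; only &nbsp; is handled, and other HTML named entities would need the same treatment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NbspCheck {

    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        db.setErrorHandler(new DefaultHandler());
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Hypothetical helper: swap the HTML entity for the equivalent numeric
    // character reference, which XML accepts without a DTD.
    static String replaceNbsp(String html) {
        return html.replace("&nbsp;", "&#160;");
    }

    public static void main(String[] args) throws Exception {
        String html = "<div>Lorem Ipsum&nbsp;is simply dummy text</div>";
        System.out.println(isWellFormed(html));              // false
        System.out.println(isWellFormed(replaceNbsp(html))); // true
    }
}
```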

Regarding "java - Getting HTML or XHTML text inside an XML element value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52784991/
