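After reading the answers here: Normalization in DOM parsing with java - how does it work?
I understand that normalization merges adjacent text nodes and removes empty ones, so I tried the following XML: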
<company>hello
wor
ld
</company>
with the following code:
try {
    DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder();
    Document doc = dBuilder.parse(file);
    doc.getDocumentElement().normalize();
    System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
    System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    System.out.println(doc.getDocumentElement().getChildNodes().item(0).getTextContent());
} catch (Exception e) {
    e.printStackTrace();
}
With or without the normalize() call, I always get exactly 1 child node for the element "company". The result is:
Root element :company
1
hello
wor
ld
So what is going wrong here? Can anyone explain? Shouldn't I get hello world on a single line?
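Best answer
The parser already creates a normalized DOM tree, so calling normalize() on a freshly parsed document changes nothing. Normalization only merges adjacent text nodes and removes empty ones; it does not touch whitespace inside a text node, so the line breaks in your text are kept.
The normalize() method is useful when you build or modify the DOM yourself, which may leave the tree un-normalized; in that case the method normalizes it for you.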
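To make the point about whitespace concrete, here is a minimal sketch using the XML from the question. The class name WhitespaceDemo and the helper collapseWhitespace are made up for this illustration and are not part of the DOM API; collapsing whitespace is something you do yourself on the string, it is not what normalize() does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class WhitespaceDemo {

    // Parses an XML string into a DOM document (checked exceptions are wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: trims the text and collapses every whitespace run to one space.
    static String collapseWhitespace(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        Element root = parse("<company>hello\nwor\nld\n</company>").getDocumentElement();

        int before = root.getChildNodes().getLength();
        root.normalize(); // merges adjacent text nodes only; the text itself is untouched
        int after = root.getChildNodes().getLength();

        System.out.println(before + " child node(s) before, " + after + " after normalize()");
        System.out.println(collapseWhitespace(root.getTextContent()));
    }
}
```

Note that the line breaks in the file are real characters in the text node, so collapsing them gives "hello wor ld" rather than "hello world". If the words were split only for layout, the XML itself has to change.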
Common helper
private static void printDom(String indent, Node node) {
    System.out.println(indent + node);
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
        printDom(indent + " ", child);
}
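Example 1
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

public static void main(String[] args) throws Exception {
    String xml = "<Root>text 1<!-- test -->text 2</Root>";
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.parse(new InputSource(new StringReader(xml)));
    printDom("", doc);      // as parsed: text, comment, text
    deleteComments(doc);
    printDom("", doc);      // comment gone: two adjacent text nodes
    doc.normalizeDocument();
    printDom("", doc);      // adjacent text nodes merged into one
}
private static void deleteComments(Node node) {
    if (node.getNodeType() == Node.COMMENT_NODE)
        node.getParentNode().removeChild(node);
    else {
        NodeList children = node.getChildNodes();
        // NodeList is live: walk backwards so removing a child does not skip its sibling
        for (int i = children.getLength() - 1; i >= 0; i--)
            deleteComments(children.item(i));
    }
}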
Output
[#document: null]
 [Root: null]
  [#text: text 1]
  [#comment:  test ]
  [#text: text 2]
[#document: null]
 [Root: null]
  [#text: text 1]
  [#text: text 2]
[#document: null]
 [Root: null]
  [#text: text 1text 2]
Example 2
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public static void main(String[] args) throws Exception {
    DocumentBuilder domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = domBuilder.newDocument();
    Element root = doc.createElement("Root");
    doc.appendChild(root);
    // three separate text nodes, added one after another
    root.appendChild(doc.createTextNode("Hello"));
    root.appendChild(doc.createTextNode(" "));
    root.appendChild(doc.createTextNode("World"));
    printDom("", doc);
    doc.normalizeDocument();
    printDom("", doc);
}
Output
[#document: null]
 [Root: null]
  [#text: Hello]
  [#text:  ]
  [#text: World]
[#document: null]
 [Root: null]
  [#text: Hello World]
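The question calls normalize() on the root element, while the examples above call doc.normalizeDocument(). As a rough sketch of the former, the following builds the same kind of tree by hand and calls Node.normalize() on the element directly. The class name NormalizeDemo and the helper buildRoot are made up for this illustration; the extra empty text node is added only to show that normalize() removes it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {

    // Hypothetical helper: builds <Root> with each given string as its own text node.
    static Element buildRoot(String... parts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("Root");
            doc.appendChild(root);
            for (String part : parts) {
                root.appendChild(doc.createTextNode(part));
            }
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = buildRoot("Hello", "", " ", "World");
        System.out.println(root.getChildNodes().getLength());    // 4 separate text nodes

        root.normalize(); // merge adjacent text nodes, drop the empty one

        System.out.println(root.getChildNodes().getLength());    // 1 merged text node
        System.out.println(root.getFirstChild().getNodeValue()); // Hello World
    }
}
```

For merging text nodes in a subtree, normalize() on that subtree's root is enough; normalizeDocument() works on the whole document and is driven by the document's DOMConfiguration.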
Source: "java - Normalizing the DOM has the same effect as not normalizing" on Stack Overflow: https://stackoverflow.com/questions/53934878/