I have an XML document in the following format:
<root>
<sentence>
first part of the text
<a id="interpolation_1"> </a>
second part of the text
<a id="interpolation_2"> </a>
</sentence>
</root>
Essentially, the <sentence> tag represents a sentence, and the child <a> tags are the interpolated parts of that sentence.
The XPath expression
String sentence = xPath.evaluate("sentence", transUnitElement);
returns only the text:
first part of the text second part of the text
That is, it omits the interpolations.
The XPath expression
NodeList aList = (NodeList) xPath.evaluate("/sentence/a", transUnitElement, XPathConstants.NODESET);
returns a list of the <a> elements.
How can I parse this to get both the text of <sentence> and the <a> elements, without losing the order and position of the <a> elements?
Expected output:
the first part of the sentence {interpolation_1} second part of the text {interpolation_2}
Best Answer
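You can get the result you are looking for by iterating over the child nodes of sentence and building the target string step by step. For example: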
// imports needed by this snippet; xPath and transUnitElement come from the question's code
import javax.xml.xpath.XPathConstants;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// retrieve <sentence> as Node, not as text
Node sentence = (Node) xPath.evaluate("sentence", transUnitElement, XPathConstants.NODE);
StringBuilder resultBuilder = new StringBuilder();
NodeList children = sentence.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    short nodeType = child.getNodeType();
    switch (nodeType) {
        case Node.TEXT_NODE:
            // strip the line breaks that surround each text fragment
            String text = child.getTextContent().trim();
            resultBuilder.append(text);
            break;
        case Node.ELEMENT_NODE:
            // replace each <a> element with a {id} placeholder
            String id = ((Element) child).getAttribute("id");
            resultBuilder.append(" {").append(id).append("} ");
            break;
        default:
            throw new IllegalStateException("Unexpected node type: " + nodeType);
    }
}
// trim() drops the trailing space left after the last placeholder
// outputs "first part of the text {interpolation_1} second part of the text {interpolation_2}"
System.out.println(resultBuilder.toString().trim());
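For anyone who wants to try this without the surrounding application, here is a minimal self-contained sketch of the same child-node iteration. The class name MixedContentDemo and the helper render are hypothetical names chosen for illustration, and the XML is parsed from an inline string rather than from transUnitElement. It also assumes that a <sentence> element exists, that empty text fragments should be dropped, and that node types other than text and element (comments, for instance) can be skipped instead of raising an exception.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentDemo {

    // Hypothetical helper: renders /root/sentence as text with {id} placeholders.
    public static String render(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Fetch <sentence> as a Node so that its children stay accessible.
        Node sentence = (Node) xPath.evaluate("/root/sentence", doc, XPathConstants.NODE);

        StringBuilder result = new StringBuilder();
        NodeList children = sentence.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String piece;
            if (child.getNodeType() == Node.TEXT_NODE) {
                piece = child.getTextContent().trim();
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                piece = "{" + ((Element) child).getAttribute("id") + "}";
            } else {
                continue; // assumption: other node types are ignored
            }
            if (piece.isEmpty()) {
                continue; // whitespace-only text between elements
            }
            if (result.length() > 0) {
                result.append(' '); // single space between pieces
            }
            result.append(piece);
        }
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><sentence>\n"
                + "first part of the text\n"
                + "<a id=\"interpolation_1\"> </a>\n"
                + "second part of the text\n"
                + "<a id=\"interpolation_2\"> </a>\n"
                + "</sentence></root>";
        // prints "first part of the text {interpolation_1} second part of the text {interpolation_2}"
        System.out.println(render(xml));
    }
}
```

Joining the pieces with a single space, instead of padding each placeholder with spaces on both sides, avoids leading or trailing whitespace in the result.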
Regarding java - How to parse XML with mixed nodes and text in Java?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49183166/