I am loading an XML document in Java using org.apache.xerces.jaxp.DocumentBuilderImpl. The document to load is:
<?xml version="1.0" encoding="UTF-8"?>CRLF
<doc >CRLF
<e1 />CRLF
</doc>
I load the document in the usual way:
DocumentBuilder builderXml = null;
Document nodeXml = null;
ByteArrayInputStream inputStream = new ByteArrayInputStream(xmlByte);
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
builderXml = documentBuilderFactory.newDocumentBuilder();
nodeXml = builderXml.parse(inputStream);
The loaded document looks fine except for one thing: the CR at the end of each line is dropped.
If I call
nodeXml.getChildNodes().item(0).getChildNodes().item(0).getNodeValue()
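I get the string "\n".
Normally this is not a problem, but in combination with canonicalization I get a different result than I expect. Can someone help me figure out what happens to the CR at the end of the line?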
Java SDK 1.7_25 x86
Thanks in advance for your help
Vlado
Edit:
In .NET I can write this:
var xDoc = new XmlDocument();
xDoc.PreserveWhitespace = true;
using (var fs = new FileStream("file.xml", FileMode.Open))
{
xDoc.Load(fs);
}
var transform = new XmlDsigC14NTransform(false) { Algorithm = SignedXml.XmlDsigC14NTransformUrl };
transform.LoadInput(xDoc);
var output = (MemoryStream)transform.GetOutput();
File.WriteAllBytes("C:\\file1.xml", output.ToArray());
and the whitespace is preserved. Is this possible in Java?
Best answer
The XML standard states:
XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).
To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
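So what you are seeing is actually the expected behavior: a conforming parser hands "\r\n" to the application as a single "\n".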
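The rule quoted above can be checked with a small test. The following is only a minimal sketch: the class name LineBreakNormalizationDemo and the helpers rootText and escape are made up for illustration, and it assumes whatever JAXP parser DocumentBuilderFactory.newInstance() returns on your classpath. It parses the document from the question twice, once with literal CRLF line endings and once with an additional character reference &#13; in front of each line break. Character references are not subject to line-break normalization, so that CR should reach the DOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LineBreakNormalizationDemo {

    // Hypothetical helper: parses the XML the same way as in the question and
    // returns the concatenated text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    // Hypothetical helper: makes CR and LF visible when printed.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n";

        // Literal CRLF line endings, as in the question.
        String literalCrLf = header + "<doc>\r\n<e1/>\r\n</doc>";

        // The same document, with a CR written as a character reference
        // before each literal line break inside <doc>.
        String escapedCr = header + "<doc>&#13;\r\n<e1/>&#13;\r\n</doc>";

        System.out.println(escape(rootText(literalCrLf))); // expected: \n\n
        System.out.println(escape(rootText(escapedCr)));   // expected: \r\n\r\n
    }
}
```

So if the CR really has to survive parsing, it must be present in the file as &#13; (or &#xD;) rather than as a raw byte. Note that the same normalization applies to any conforming XML parser, so a raw CR in the input is not something a parser setting is expected to bring back.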
Regarding java - org.apache.xerces.jaxp.DocumentBuilderImpl missing CR at end of line, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20068960/