In my environment I use Java to serialize a result set to XML. It basically happens like this:
// for each column of each row
xmlHandler.startElement(uri, lname, "column", attributes);
String chars = rs.getString(i);
xmlHandler.characters(chars.toCharArray(), 0, chars.length());
xmlHandler.endElement(uri, lname, "column");
In Firefox the XML looks like this:
<row num="69004">
<column num="1">10069</column>
<column num="2">sd</column>
<column num="3">FCVolume </column>
</row>
But when I parse the XML, I get:
org.xml.sax.SAXParseException: Character reference "&#26" is an invalid XML character.
My question now is: which characters do I have to replace, or how do I have to encode my characters, so that they become valid XML?
Best answer
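I found an interesting list in the XML spec. According to that list, character #26 (hex: #x1A) is discouraged. (Strictly speaking, XML 1.0 leaves #x1A out of its Char production altogether, which is why the parser rejects it outright.)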
The characters defined in the following ranges are also discouraged. They are either control characters or permanently undefined Unicode characters
See the complete ranges.
This code strips every character that is not valid in XML from a string:
public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // Nothing to filter.
    StringBuilder out = new StringBuilder(in.length()); // Holds the filtered output.
    // Iterate by code point rather than by char: a char can never exceed 0xFFFF,
    // so a char-based loop would drop every supplementary character.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i); // Current code point.
        i += Character.charCount(current);
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
    }
    return out.toString();
}
Taken from "Invalid XML Characters: when valid UTF8 does not mean valid XML".
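A quick way to gain confidence in a filter of this kind is to run it on a few edge cases. The following is a minimal, self-contained sketch, not the code from the linked article: the class name `XmlCharFilterSketch` and the method names `isXml10Char` and `strip` are made up for illustration. It walks the string by code point, so a control character such as #x1A and an unpaired surrogate are dropped, while a supplementary character (stored as a surrogate pair) survives.

```java
final class XmlCharFilterSketch {

    /** True if the code point falls in one of the ranges allowed by the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point outside those ranges removed. */
    static String strip(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(in.length());
        // Walk by code point so that a surrogate pair is tested as one character.
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            i += Character.charCount(cp);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample values, built without escapes to keep the source plain ASCII.
        String withSub = "FCVolume" + (char) 0x1A;                  // trailing SUB (#26)
        String smiley = new String(Character.toChars(0x1F600));     // supplementary character
        String loneSurrogate = (char) 0xD83D + "x";                 // unpaired high surrogate

        System.out.println(strip(withSub).equals("FCVolume"));      // true
        System.out.println(strip(smiley).equals(smiley));           // true
        System.out.println(strip(loneSurrogate).equals("x"));       // true
    }
}
```

In the serialization loop from the question, such a filter would be applied to the value returned by `rs.getString(i)` before it is handed to `xmlHandler.characters(...)`.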
However, I still ran into a UTF-8 compatibility problem:
org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence
After reading "XML - returning XML as UTF-8 from a servlet", I simply tried what would happen if I set the content type like this:
response.setContentType("text/xml;charset=utf-8");
It worked...
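A plausible explanation, which is my reading rather than something confirmed in the thread: when no charset is set, a servlet response writer falls back to ISO-8859-1, while an XML parser assumes UTF-8 for a document that carries no encoding declaration. The sketch below reproduces that mismatch without a servlet container. The class name `EncodingMismatchSketch`, the method `parses`, and the sample text are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

final class EncodingMismatchSketch {

    /**
     * Encodes the XML text with the given charset and reports whether a SAX
     * parser accepts the bytes. The text has no XML declaration, so the parser
     * assumes UTF-8 regardless of how the bytes were produced.
     */
    static boolean parses(String xml, Charset writerCharset) {
        byte[] bytes = xml.getBytes(writerCharset);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // Either a SAXException or an IOException, depending on the parser.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical column value containing one non-ASCII character (u with diaeresis).
        String xml = "<column num=\"3\">" + (char) 0xFC + "</column>";

        System.out.println(parses(xml, StandardCharsets.UTF_8));       // true
        System.out.println(parses(xml, StandardCharsets.ISO_8859_1));  // false
    }
}
```

Setting `charset=utf-8` on the content type makes the writer and the parser agree, which would explain why the error went away.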
Original question on Stack Overflow: "java - How to encode characters from Oracle to XML?" https://stackoverflow.com/questions/156697/