I have a string like this, which arrives in the JSON data I am processing: call\\U007fabccomputers
When I try to parse it, Jackson throws an exception like this:
org.codehaus.jackson.JsonParseException: Unrecognized character escape 'U' (code 85)
at [Source: java.io.StringReader@1b43c429; line: 1, column: 361]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1292)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._handleUnrecognizedCharacterEscape(JsonParserMinimalBase.java:360)
at org.codehaus.jackson.impl.ReaderBasedParser._decodeEscaped(ReaderBasedParser.java:1064)
at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:785)
at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:762)
I think the problem is caused by \\U007f. It must mean something in UTF-8. Any idea how we can avoid this problem? Would JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER help here?
Best answer
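Your JSON data is malformed.
JSON uses \u escape sequences to encode UTF-16 code units.
In this case, your JSON data is trying to escape the Unicode code point U+007F DELETE (an ASCII control character that the JSON spec does not require to be escaped, but allows to be escaped), but it uses a \U escape sequence to do so. The JSON spec clearly states that \u must be used: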
A string is a sequence of Unicode code points wrapped with quotation marks (U+0022). All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F. There are two-character escape sequence representations of some characters.
...
Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point.
...
To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair.
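Although the last paragraph does not say so explicitly, the 12-character sequence for a UTF-16 surrogate pair consists of two 6-character sequences, each of which must follow the same escape format as a character in the BMP. This is enforced by the character encoding diagram:
(Source: json.org)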
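To make the six- and twelve-character forms quoted above concrete, here is a minimal sketch that builds the JSON escape for a given code point using only the Java standard library. The class name JsonUnicodeEscape and the method escape are hypothetical names chosen for this illustration; they are not part of Jackson or of the JSON spec.

```java
final class JsonUnicodeEscape {

    /**
     * Returns the JSON escape for a code point: one six-character sequence
     * for a BMP code point, two of them (a surrogate pair) otherwise.
     * Hypothetical helper, for illustration only.
     */
    static String escape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars returns one UTF-16 code unit for BMP code points
        // and a high/low surrogate pair for supplementary code points.
        for (char unit : Character.toChars(codePoint)) {
            // Backslash, lowercase letter u, then four hexadecimal digits.
            sb.append('\\').append('u').append(String.format("%04x", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape(0x7F));    // U+007F DELETE, inside the BMP
        System.out.println(escape(0x1F600)); // outside the BMP, needs a surrogate pair
    }
}
```

The first call yields a single six-character sequence; the second yields twelve characters, each half being an ordinary lowercase-u escape.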
No \U escape sequence is defined. That is what the parser's error message is telling you:
Unrecognized character escape 'U'
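The real fix is to correct whatever produces the data. If that is not possible, one possible workaround is to rewrite the invalid escape before handing the text to the parser. The following is only a sketch under a stated assumption: that every backslash followed by an uppercase U and four hexadecimal digits was meant to be a lowercase-u escape. The class name UppercaseEscapeRepair and its methods are hypothetical, not a Jackson API.

```java
final class UppercaseEscapeRepair {

    /**
     * Rewrites backslash + 'U' + four hex digits into backslash + 'u' + the
     * same digits. Escapes are consumed two characters at a time, so an
     * escaped backslash followed by a literal 'U' is left untouched.
     * Assumption: the uppercase form was always intended as a Unicode escape.
     */
    static String repair(String json) {
        StringBuilder out = new StringBuilder(json.length());
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c != '\\' || i + 1 >= json.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                i++;
                continue;
            }
            char next = json.charAt(i + 1);
            if (next == 'U' && hasFourHexDigits(json, i + 2)) {
                out.append('\\').append('u'); // lowercase the escape letter only
            } else {
                out.append(c).append(next);   // copy any other escape unchanged
            }
            i += 2;
        }
        return out.toString();
    }

    /** True if four ASCII hexadecimal digits start at the given index. */
    static boolean hasFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The Java literal below contains a single backslash before the U.
        System.out.println(repair("{\"name\":\"call\\U007fabccomputers\"}"));
    }
}
```

The repaired text can then be passed to the JSON parser as usual. Because the sketch rewrites the whole input, it relies on backslashes appearing only inside JSON strings, which holds for any otherwise well-formed document.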
Regarding "java - Unrecognized character when parsing JSON", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31015465/