java - Unrecognized character when parsing JSON

The JSON data I am processing contains a string like call\\U007fabccomputers. When I try to parse it, Jackson throws this exception:

org.codehaus.jackson.JsonParseException: Unrecognized character escape 'U' (code 85)
 at [Source: java.io.StringReader@1b43c429; line: 1, column: 361]
        at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1292)
        at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
        at org.codehaus.jackson.impl.JsonParserMinimalBase._handleUnrecognizedCharacterEscape(JsonParserMinimalBase.java:360)
        at org.codehaus.jackson.impl.ReaderBasedParser._decodeEscaped(ReaderBasedParser.java:1064)
        at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:785)
        at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:762)

I think the problem is caused by \\U007f. It must mean something in UTF-8. Any idea how we can avoid this? Would JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER help here?

Best answer

Your JSON data is malformed.

JSON uses \u escape sequences to encode UTF-16 code units.

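In this case, your JSON data is trying to escape the Unicode code point U+007F DELETE (an ASCII control character that the JSON spec does not require to be escaped, but allows to be escaped), and it uses a \U escape sequence to do so. The JSON spec explicitly states that \u must be used: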

A string is a sequence of Unicode code points wrapped with quotation marks (U+0022). All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F. There are two-character escape sequence representations of some characters.

...

Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point.

...

To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair.

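Although the last paragraph does not say so explicitly, the twelve-character sequence for a UTF-16 surrogate pair consists of two six-character sequences, each of which must follow the same escape format as a character in the BMP. This is enforced by the character encoding diagram: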

(Diagram of the JSON string grammar; source: json.org)
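To make the escape format concrete, here is a minimal Java sketch that produces the escape sequence for a single code point, using only the standard library. The class name JsonUnicodeEscaper and the method escape are hypothetical names chosen for illustration; they are not part of Jackson or any other library.

```java
final class JsonUnicodeEscaper {

    /**
     * Returns the JSON escape for one Unicode code point: one six-character
     * sequence for a BMP code point, two of them (a surrogate pair) otherwise.
     * Hypothetical helper, not a library API.
     */
    static String escape(int codePoint) {
        StringBuilder out = new StringBuilder();
        // Character.toChars yields one UTF-16 code unit for BMP code points
        // and a high/low surrogate pair for everything above U+FFFF.
        for (char unit : Character.toChars(codePoint)) {
            // Backslash, lowercase letter u, then exactly four hex digits.
            out.append(String.format("\\u%04x", (int) unit));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape(0x007F));   // U+007F DELETE
        System.out.println(escape(0x1F600));  // a code point outside the BMP
    }
}
```

For U+007F this yields \u007f (six characters), and for U+1F600 it yields \ud83d\ude00 (twelve characters). Both use the lowercase letter u.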

No \U escape sequence is defined. That is what the parser's error message is telling you:

Unrecognized character escape 'U'
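The real fix is to correct whatever produces the JSON. If that is not possible, one workaround is to repair the text before handing it to the parser. The following is only a sketch under stated assumptions: the producer always writes exactly four hex digits after \U, and the class name UppercaseEscapeFixer and method fix are hypothetical names, not part of Jackson. It walks the text one escape at a time, so an escaped backslash followed by a literal U is left alone.

```java
final class UppercaseEscapeFixer {

    /**
     * Rewrites a backslash-U escape followed by four hex digits into the
     * lowercase form that JSON defines. All other text is copied unchanged.
     * Hypothetical helper; assumes four hex digits always follow the U.
     */
    static String fix(String json) {
        StringBuilder out = new StringBuilder(json.length());
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c != '\\' || i + 1 >= json.length()) {
                out.append(c);
                i++;
                continue;
            }
            // A backslash always consumes the next character as well, so an
            // escaped backslash can never be mistaken for the start of an escape.
            char next = json.charAt(i + 1);
            if (next == 'U' && hasFourHexDigits(json, i + 2)) {
                out.append('\\').append('u');
            } else {
                out.append(c).append(next);
            }
            i += 2;
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'f')
                    || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Raw JSON text containing the invalid uppercase escape.
        String broken = "{\"name\":\"call\\U007fabccomputers\"}";
        System.out.println(fix(broken));
    }
}
```

The repaired string can then be passed to the parser as usual. Note that this would mangle C-style eight-digit \U escapes, so check what the producer actually emits before relying on it.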

Source: Stack Overflow, https://stackoverflow.com/questions/31015465/