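The java.net.URI constructor accepts most non-ASCII characters, but not the ideographic space (0x3000). The constructor fails with java.net.URISyntaxException: Illegal character in path ...
So my questions are:
- Why does the URI constructor reject 0x3000 but accept other non-ASCII characters?
- Which other characters does it reject?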
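A minimal reproduction of the behavior described in the question might look like the sketch below. The class name IdeographicSpaceDemo, the helper parses, and the example.com URLs are made up for illustration and are not from the original post.

```java
import java.net.URI;
import java.net.URISyntaxException;

class IdeographicSpaceDemo {
    // Hypothetical helper: returns true if the java.net.URI constructor accepts the string.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // CJK ideographs U+4E2D U+6587 in the path: accepted.
        System.out.println(parses("http://example.com/\u4e2d\u6587")); // true
        // Ideographic space U+3000 in the path: rejected with URISyntaxException.
        System.out.println(parses("http://example.com/a\u3000b"));     // false
    }
}
```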
Best answer
The JavaDoc documentation for java.net.URI describes the accepted character set in detail.
Character categories
RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used below to describe these constraints:
- alpha The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z'
- digit The US-ASCII decimal digit characters, '0' through '9'
- alphanum All alpha and digit characters
- unreserved All alphanum characters together with those in the string "_-!.~'()*"
- punct The characters in the string ",;:$&+="
- reserved All punct characters together with those in the string "?/[]@"
- escaped Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')
- other The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (Deviation from RFC 2396, which is limited to US-ASCII)
The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
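The definition of the "other" category quoted above can be written as a small predicate. This is only a sketch that follows the JavaDoc wording; it is not the actual java.net.URI source, and the names OtherCategory and isOther are hypothetical.

```java
class OtherCategory {
    // Mirrors the JavaDoc wording for "other": non-ASCII, not an ISO control
    // character, and not a space character. Hypothetical helper, not JDK code.
    static boolean isOther(int codePoint) {
        return codePoint > 0x7F
                && !Character.isISOControl(codePoint)
                && !Character.isSpaceChar(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(isOther(0x4E2D)); // CJK ideograph: true
        System.out.println(isOther(0x3000)); // ideographic space: false
        System.out.println(isOther(0x00A0)); // no-break space: false
        System.out.println(isOther(0x0085)); // C1 control character: false
        System.out.println(isOther('a'));    // US-ASCII letter: false
    }
}
```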
In particular, "other" excludes space characters, which Character.isSpaceChar defines as the characters in these Unicode general categories:
- SPACE_SEPARATOR
- LINE_SEPARATOR
- PARAGRAPH_SEPARATOR
According to the page you linked in your question, the ideographic space is indeed in one of these categories.
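To get a rough answer to the second question (which other characters are rejected), one option is to enumerate the non-ASCII code points that Character.isSpaceChar reports as space characters. The sketch below assumes the JavaDoc description is accurate for the running JDK; the class and method names are hypothetical, and the exact list depends on the Unicode version that JDK ships. Non-ASCII control characters (Character.isISOControl) are excluded from "other" as well but are not listed here.

```java
import java.util.ArrayList;
import java.util.List;

class ExcludedFromOther {
    // Names the three general categories that Character.isSpaceChar covers.
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:     return "SPACE_SEPARATOR";
            case Character.LINE_SEPARATOR:      return "LINE_SEPARATOR";
            case Character.PARAGRAPH_SEPARATOR: return "PARAGRAPH_SEPARATOR";
            default:                            return "not a space character";
        }
    }

    // Collects every non-ASCII code point that isSpaceChar treats as a space.
    static List<Integer> nonAsciiSpaceChars() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0x80; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isSpaceChar(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one line per code point, e.g. the code point and its category.
        for (int cp : nonAsciiSpaceChars()) {
            System.out.println(String.format("U+%04X %s", cp, categoryName(cp)));
        }
    }
}
```

The ideographic space U+3000 shows up in this list as SPACE_SEPARATOR, which matches the explanation above.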
Source: a similar question about illegal characters in a Java URI on Stack Overflow: https://stackoverflow.com/questions/28147818/