java - Illegal character in URI

Tags: java url encoding utf

The java.net.URI constructor accepts most non-ASCII characters, but not the ideographic space (0x3000). The constructor fails with java.net.URISyntaxException: Illegal character in path ...
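A minimal reproduction might look like the sketch below. The class name, the parses helper, and the http://example.com/ URIs are hypothetical placeholders. The single-argument constructor accepts a path containing é (U+00E9) but throws for a path containing U+3000.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriIdeographicSpace {
    // Returns true if the single-argument URI constructor accepts s;
    // otherwise prints the exception message and returns false.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // U+00E9 (é): non-ASCII, visible, accepted unescaped
        System.out.println(parses("http://example.com/caf\u00e9"));
        // U+3000 (ideographic space): rejected with URISyntaxException
        System.out.println(parses("http://example.com/a\u3000b"));
    }
}
```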

So my questions are:

  • Why does the URI constructor reject 0x3000 but accept other non-ASCII characters?
  • Which other characters does it reject?

Best answer

The set of acceptable characters is described in detail in the JavaDoc documentation for java.net.URI:

Character categories

RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used below to describe these constraints:

  • alpha The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z'
  • digit The US-ASCII decimal digit characters, '0' through '9'
  • alphanum All alpha and digit characters
  • unreserved All alphanum characters together with those in the string "_-!.~'()*"
  • punct The characters in the string ",;:$&+="
  • reserved All punct characters together with those in the string "?/[]@"
  • escaped Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')
  • other The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (Deviation from RFC 2396, which is limited to US-ASCII)

The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
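The categories quoted above can be sketched as character predicates. This is an illustrative reimplementation written from the JavaDoc text, not the JDK's actual parser. The class and method names are hypothetical, and the sketch checks only membership in the overall legal set, not the per-component rules that decide where each character may appear.

```java
public class UriCharCategories {
    // Marks that, together with alphanum, make up "unreserved"
    private static final String UNRESERVED_MARKS = "_-!.~'()*";
    // "punct" characters
    private static final String PUNCT = ",;:$&+=";
    // Characters that, together with punct, make up "reserved"
    private static final String RESERVED_EXTRA = "?/[]@";

    static boolean isAlphanum(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    static boolean isUnreserved(char c) {
        return isAlphanum(c) || UNRESERVED_MARKS.indexOf(c) >= 0;
    }

    static boolean isReserved(char c) {
        return PUNCT.indexOf(c) >= 0 || RESERVED_EXTRA.indexOf(c) >= 0;
    }

    // "other": not US-ASCII, not an ISO control character, not a space character
    static boolean isOther(char c) {
        return c > 0x7F && !Character.isISOControl(c) && !Character.isSpaceChar(c);
    }

    // Legal as a standalone character. '%' is deliberately left out: it is only
    // legal as the start of an escaped triplet such as "%20", which needs
    // look-ahead and is not handled here.
    static boolean isLegalSingleChar(char c) {
        return isUnreserved(c) || isReserved(c) || isOther(c);
    }

    public static void main(String[] args) {
        char[] samples = {'a', '~', '?', ' ', '"', '\u00e9', '\u00a0', '\u3000'};
        for (char c : samples) {
            System.out.printf("U+%04X legal=%b%n", (int) c, isLegalSingleChar(c));
        }
    }
}
```

With these samples, only a, ~, ? and é come out legal: the ASCII space and the double quote belong to no category, while U+00A0 and U+3000 are non-ASCII space characters and therefore fall outside "other".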

In particular, "other" excludes space characters, which are defined (by Character.isSpaceChar) as characters of the Unicode general category types

  • SPACE_SEPARATOR
  • LINE_SEPARATOR
  • PARAGRAPH_SEPARATOR

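According to the page you linked to in your question, the ideographic space character is indeed one of these types (SPACE_SEPARATOR, Unicode general category Zs), so it is not in "other" and the constructor rejects it. For the same reason, any non-ASCII character that is an ISO control character (U+0080 through U+009F) or a space character is rejected, as is any US-ASCII character outside the unreserved, reserved, and escaped categories.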
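To make the second question concrete, the sketch below enumerates the non-ASCII characters in the Basic Multilingual Plane that fall outside "other" (ISO control or space characters) and cross-checks each one against new URI(...). The class name and the example URI are hypothetical, and the exact list depends on the Unicode version supported by the running JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class RejectedNonAscii {
    // Non-ASCII BMP code points outside "other": ISO controls or space characters
    static List<Integer> rejectedNonAscii() {
        List<Integer> out = new ArrayList<>();
        for (int cp = 0x80; cp <= 0xFFFF; cp++) {
            if (Character.isISOControl(cp) || Character.isSpaceChar(cp)) {
                out.add(cp);
            }
        }
        return out;
    }

    // True if new URI(...) throws when c appears unescaped in the path
    static boolean uriRejects(char c) {
        try {
            new URI("http://example.com/a" + c + "b");
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // U+3000 IDEOGRAPHIC SPACE has general category SPACE_SEPARATOR (Zs)
        System.out.println(Character.getType('\u3000') == Character.SPACE_SEPARATOR); // true

        List<Integer> rejected = rejectedNonAscii();
        int mismatches = 0;
        for (int cp : rejected) {
            // List only the space characters; the rest are the C1 controls U+0080..U+009F
            if (Character.isSpaceChar(cp)) {
                System.out.printf("space char U+%04X%n", cp);
            }
            if (!uriRejects((char) cp)) {
                mismatches++;
            }
        }
        System.out.println(rejected.size() + " rejected code points, mismatches: " + mismatches);
    }
}
```

Supplementary characters are left out because the parser examines UTF-16 char values, and surrogate chars are neither control nor space characters.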

For "java - Illegal character in URI", see the similar question on Stack Overflow: https://stackoverflow.com/questions/28147818/
