According to https://blog.golang.org/strings and my own testing, it looks like when we range over a string, the characters we get are of type rune, but if we get them via str[index], they are of type byte. Why is that?
Best answer
At the first level, the "why" is simply that this is how the language is defined. The String type section of the spec tells us:
A string value is a (possibly empty) sequence of bytes. The number of bytes is called the length of the string and is never negative. Strings are immutable: once created, it is impossible to change the contents of a string.
and:
A string's bytes can be accessed by integer indices 0 through len(s)-1.
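As a small illustration of the two quoted rules, here is a sketch (the helper name firstByte and the sample string are made up for this example) showing that len counts bytes and that indexing yields a single byte, even when that byte is only part of a multi-byte character:

```go
package main

import "fmt"

// firstByte returns the byte at index 0 of s; s must be non-empty.
// (Hypothetical helper, used only for this illustration.)
func firstByte(s string) byte {
	return s[0]
}

func main() {
	s := "\u00e9" // U+00E9 (e with acute accent), two bytes in UTF-8: 0xC3 0xA9

	fmt.Println(len(s))                // 2: the length is in bytes, not characters
	fmt.Printf("%T %#x\n", s[0], s[0]) // uint8 0xc3 (byte is an alias for uint8)
	fmt.Printf("%#x\n", firstByte(s))  // 0xc3
}
```

Indexing never decodes UTF-8, so s[0] here is the first half of the encoded character, not the character itself.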
Meanwhile, range is a clause that can be inserted into a for statement, and the spec says:
The expression on the right in the "range" clause is called the range expression, which may be ... [a] string ...
and:
- For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
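The behaviour described in that passage can be observed with a short sketch. The helper decode and the sample string are assumptions made for this example; the string deliberately contains one two-byte character and one invalid byte (0xff):

```go
package main

import "fmt"

// decode ranges over s and records, for each iteration, the starting
// byte index and the rune that range produced.
// (Hypothetical helper, used only for this illustration.)
func decode(s string) (starts []int, runes []rune) {
	for i, r := range s {
		starts = append(starts, i)
		runes = append(runes, r)
	}
	return starts, runes
}

func main() {
	// 'a', U+00E9 (2 bytes), one invalid byte 0xff, 'b': 5 bytes in total.
	s := "a\u00e9\xffb"

	for i, r := range s {
		fmt.Printf("%d: %U (%T)\n", i, r, r)
	}
	// Output:
	// 0: U+0061 (int32)
	// 1: U+00E9 (int32)
	// 3: U+FFFD (int32)
	// 4: U+0062 (int32)
}
```

The index jumps from 1 to 3 because the second character occupies two bytes, and the invalid byte is reported as U+FFFD while advancing only one byte. %T prints int32 because rune is an alias for int32.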
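If you want to know why the language was defined this way, you would really have to ask the definers themselves. Note, however, that if for ranged only over bytes, you would need to construct your own fancier loops to range over runes. Given that for ... range does work through the runes, if you want to work through the bytes in string s instead, you can write: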
for i := 0; i < len(s); i++ {
...
}
and easily access s[i] inside the loop. You can also write:
for i, b := range []byte(s) {
}
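and access both the index i and the byte b inside the loop. (Conversion from string to []byte, or vice versa, can require a copy, since a []byte can be modified. In this case, though, the range does not modify it, and the compiler can optimize away the copy. See icza's comment below or this answer to golang: []byte(string) vs []byte(*string).) So you have not lost any ability, just perhaps a smidgen of concision.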
This question, "go - rune vs byte when ranging over a string", comes from Stack Overflow: https://stackoverflow.com/questions/58635507/