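I'm reading a string in C that contains some Unicode symbols (UTF-8). The ones I read take 3 bytes each, so they can't be stored in a single byte, and I'm worried about the byte order of these characters when I send them over a TCP socket with the write and read functions. Do I need to do anything special to make sure the machine reading from the stream interprets these Unicode characters correctly?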
Best answer
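Send it as a byte array. For a UTF-8 encoded string, byte order should not be an issue, because the encoding is byte-oriented and TCP delivers bytes in the order they were written. Byte order matters when, for example, you have two bytes that must be interpreted as a single value. If each byte is interpreted on its own, byte order is not an issue.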
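To make the distinction concrete, here is a minimal sketch comparing a 16-bit code unit with a UTF-8 sequence for the same character (U+20AC, the euro sign, chosen only as an example of a 3-byte character). The names `euro_utf8`, `code_unit_bytes`, and `host_is_little_endian` are made up for illustration and are not from the question.

```c
#include <stdint.h>
#include <string.h>

/* U+20AC as UTF-8: the same three bytes, in the same order, on every host. */
static const unsigned char euro_utf8[] = { 0xE2, 0x82, 0xAC };

/* Copy the in-memory bytes of one 16-bit code unit into out[0..1].
 * Their order depends on the host: AC 20 on little-endian,
 * 20 AC on big-endian, for the value 0x20AC. */
static void code_unit_bytes(uint16_t unit, unsigned char out[2])
{
    memcpy(out, &unit, sizeof unit);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    unsigned char b[2];
    code_unit_bytes(0x0001, b);
    return b[0] == 0x01;
}
```

Only the 16-bit value has a host-dependent layout; the `euro_utf8` array is already a fixed byte sequence, which is why it can be written to the socket as-is.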
More information: http://unicode.org/faq/utf_bom.html
Q: Is the UTF-8 encoding scheme the same irrespective of whether the underlying processor is little endian or big endian?
A: Yes. Since UTF-8 is interpreted as a sequence of bytes, there is no endian problem as there is for encoding forms that use 16-bit or 32-bit code units. Where a BOM is used with UTF-8, it is only used as an encoding signature to distinguish UTF-8 from other encodings — it has nothing to do with byte order. [AF]
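In practice this could look roughly like the sketch below, which sends a UTF-8 string over a connected descriptor using `write` and `read`. The 4-byte length prefix is an assumption added here so the receiver knows how many bytes to expect; it is not part of the original answer. The helper names (`write_all`, `read_all`, `send_utf8`, `recv_utf8`) are hypothetical. Note that the length prefix is the only multi-byte integer on the wire, so it is the only thing converted with `htonl`/`ntohl`; the string bytes are passed through untouched.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>      /* read, write */

/* Write exactly len bytes; write() may accept fewer than requested. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; read() may return fewer than requested. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;   /* error, or the peer closed early */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Assumed framing: 4-byte big-endian length, then the raw UTF-8 bytes. */
static int send_utf8(int fd, const char *s)
{
    size_t len = strlen(s);
    if (len > UINT32_MAX)
        return -1;
    uint32_t wire_len = htonl((uint32_t)len);  /* the integer needs a fixed order */
    if (write_all(fd, &wire_len, sizeof wire_len) != 0)
        return -1;
    return write_all(fd, s, len);              /* the string bytes do not */
}

/* Returns a malloc'd, NUL-terminated copy of the string, or NULL on failure. */
static char *recv_utf8(int fd)
{
    uint32_t wire_len;
    if (read_all(fd, &wire_len, sizeof wire_len) != 0)
        return NULL;
    uint32_t len = ntohl(wire_len);
    char *s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;
    if (read_all(fd, s, len) != 0) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

The loops are there because `write` and `read` on a stream socket can transfer fewer bytes than requested, so a 3-byte character may arrive split across two `read` calls. Decoding should happen only after the whole message has been collected. A real program would probably also want to retry on `EINTR` and cap the accepted length.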
Original question on Stack Overflow (c - Sending Unicode over a TCP socket, what about byte order): https://stackoverflow.com/questions/27014627/