c++ - What exactly can wchar_t represent?

According to cppreference.com's doc on wchar_t:

wchar_t - type for wide character representation (see wide strings). Required to be large enough to represent any supported character code point (32 bits on systems that support Unicode. A notable exception is Windows, where wchar_t is 16 bits and holds UTF-16 code units) It has the same size, signedness, and alignment as one of the integer types, but is a distinct type.

The standard says in [basic.fundamental]/5:

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales. Type wchar_t shall have the same size, signedness, and alignment requirements as one of the other integral types, called its underlying type. Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as uint_least16_t and uint_least32_t, respectively, in <cstdint>, called the underlying types.

So, if I want to deal with unicode characters, should I use wchar_t?

Also, how do I know whether a particular unicode character is "supported" by wchar_t?

Best answer

So, if I want to deal with unicode characters, should I use wchar_t?

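First of all, note that an encoding does not force you to use any particular type to represent a character. You can use char to represent Unicode characters just as well as wchar_t; you only have to remember that, depending on whether the encoding is UTF-8, UTF-16, or UTF-32, up to 4 chars together form one valid code point, whereas a wchar_t can hold a code point on its own (UTF-32, e.g. on Linux) or needs up to 2 working together (UTF-16 on Windows).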

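Next, there is no single Unicode encoding. Some Unicode encodings use a fixed width to represent code points (like UTF-32); others (like UTF-8 and UTF-16) are variable-length. In UTF-8, for example, the letter 'a' takes only 1 byte, but most characters outside the English alphabet need 2 to 4 bytes.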

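So you have to decide which kinds of characters you want to represent and then choose the encoding accordingly, because that choice determines how many bytes your data will occupy. For example, using UTF-32 to represent mostly English text leads to a lot of zero bytes. UTF-8 is a better choice for many Latin-based languages, while UTF-16 is often a better choice for East Asian languages.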

Once you have made that decision, keep the number of conversions to a minimum and stay consistent with your choice.

As a next step, you can decide which data type is appropriate for representing the data (or which conversions you may need).

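If you want to do text manipulation or interpretation on a code-point basis, char is not the way to go if you have, e.g., Japanese kanji. But if you only need to pass your data along and treat it as nothing more than a sequence of bytes, you may just go with char.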

The link to UTF-8 everywhere has already been posted as a comment, and I recommend having a look there too. Another good read is What every programmer should know about encodings.

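So far, C++ offers only basic Unicode language support (such as the char16_t and char32_t data types and the u8/u/U literal prefixes). Choosing a library to manage encodings (especially conversions) is therefore good advice.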

A similar question to c++ - What exactly can wchar_t represent? can be found on Stack Overflow: https://stackoverflow.com/questions/50413471/
