c++ - Why is CHAR_BIT usually 8?

Tags: c++

According to N4140 (a C++14 working draft):

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. (§6.6.1-1; p.48)

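I think only 8 bits are required to contain any of "the eight-bit code units of the Unicode UTF-8 encoding form". Are more bits required to contain every member of "the basic execution character set"? Why can CHAR_BIT be 8 in so many implementations?

Best answer

The basic execution character set is defined as follows: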

[lex.charset]/3

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.
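Some of the guarantees in this paragraph can be checked at compile time on whatever implementation is at hand. The following is only a sketch, assuming a C++14 or later compiler; `digits_are_contiguous` is a hypothetical helper name, not something the standard provides.

```cpp
// Compile-time checks of a few guarantees quoted from [lex.charset]/3.
// Requires C++14 or later (loop inside a constexpr function).

// Hypothetical helper: true if each decimal digit after '0' has a value
// one greater than the previous digit.
constexpr bool digits_are_contiguous() {
    const char digits[] = "0123456789";
    for (int i = 1; i < 10; ++i) {
        if (digits[i] != digits[i - 1] + 1) {
            return false;
        }
    }
    return true;
}

static_assert(digits_are_contiguous(),
              "decimal digits must have consecutive values");
static_assert('\0' == 0, "the null character must have value 0");
static_assert('a' >= 0 && '~' >= 0,
              "basic execution characters must be non-negative");
```

If any of these assertions failed, the implementation would not conform to the quoted paragraph, so on a conforming compiler the translation unit should simply compile.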

The basic source character set is this:

[lex.charset]/1

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

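Note the difference between the basic execution character set, which is defined by the standard, and the execution character set, which is implementation-defined. The former contains only 100 characters (the 96 basic source characters plus alert, backspace, carriage return, and the null character). Distinct non-negative values for 100 characters need only 7 bits, so their encoding, whichever it is, fits comfortably in 8 bits.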
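The bit count for the basic execution character set can be worked out mechanically. This is a minimal sketch assuming C++14 or later; `bits_needed` and the two constants are illustrative names introduced here, and the helper is only meant for small counts like the ones in this answer.

```cpp
// How many bits are needed to give N characters distinct non-negative values?
// Requires C++14 or later (loop inside a constexpr function).

// Hypothetical helper: smallest number of bits b such that 2^b >= distinct_values.
// Intended for small inputs only (well below 2^31).
constexpr int bits_needed(unsigned distinct_values) {
    int bits = 0;
    while ((1u << bits) < distinct_values) {
        ++bits;
    }
    return bits;
}

// 96 characters in the basic source character set ([lex.charset]/1).
constexpr unsigned basic_source_chars = 96;
// Alert, backspace, carriage return, and the null character ([lex.charset]/3).
constexpr unsigned extra_execution_chars = 4;

static_assert(bits_needed(basic_source_chars + extra_execution_chars) == 7,
              "100 distinct values fit in 7 bits");
static_assert(bits_needed(256) == 8,
              "all 256 possible eight-bit code unit values need 8 bits");
```

So the basic execution character set alone would be satisfied by 7 bits; it is the UTF-8 code unit requirement that pushes the minimum to 8.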

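The paragraph quoted in the question also has to be read carefully. A byte needs to be large enough to hold the encoding of the characters in the basic execution character set and the UTF-8 code units. The former encoding may be (and usually is) a subset of the latter, but even where it is not, 8 bits are enough for both.

Regarding "c++ - Why is CHAR_BIT usually 8?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49766777/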
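The lower bound on the byte size can be stated directly in code. The sketch below assumes only the standard headers `<climits>` and `<limits>`; `fits_in_byte` is a hypothetical helper name used for illustration.

```cpp
#include <climits>  // CHAR_BIT, UCHAR_MAX
#include <limits>   // std::numeric_limits

// A byte must be able to hold an eight-bit UTF-8 code unit,
// so a conforming implementation cannot have CHAR_BIT below 8.
static_assert(CHAR_BIT >= 8, "a byte must hold an eight-bit code unit");
static_assert(std::numeric_limits<unsigned char>::max() >= 0xFF,
              "unsigned char must represent every eight-bit value");

// Hypothetical helper: true if the value is representable in one byte
// on this implementation.
constexpr bool fits_in_byte(unsigned value) {
    return value <= UCHAR_MAX;
}

// 0xE2 0x82 0xAC is the UTF-8 encoding of U+20AC (euro sign);
// each code unit fits in a single byte.
static_assert(fits_in_byte(0xE2) && fits_in_byte(0x82) && fits_in_byte(0xAC),
              "every UTF-8 code unit fits in a byte");
```

Nothing here requires CHAR_BIT to be exactly 8; the checks only express that 8 is the minimum, which is why implementations with no reason to use wider bytes settle on it.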
