c++ - Is there any locale that affects wide character encoding?

Tags: c++ c character-encoding language-lawyer locale

I could not find anything in the C++ standard saying that codecvts are compatible with mbtowc. The C standard, for its part, specifies mbtowc as follows:

If the function determines that the next multibyte character is complete and valid, it determines the value of the corresponding wide character and then, if pwc is not a null pointer, stores that value in the object pointed to by pwc.



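But what does "the value of the corresponding wide character" mean? Is it affected by the locale?
The definition of wide character says: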

wide character
value representable by an object of type wchar_t, capable of representing any character in the current locale.



But later the standard "redefines" the "current locale" as an implementation-defined locale:

The value of a wide character constant containing a single multibyte character that maps to a single member of the extended execution character set is the wide character corresponding to that multibyte character, as defined by the mbtowc, mbrtoc16, or mbrtoc32 function as appropriate for its type, with an implementation-defined current locale.



This answer says that wide-exec-charset has nothing to do with the C library functions, although some C++ APIs such as filesystem::path still make use of it.

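Now I am really confused: what is the encoding used by multibyte/wide character conversion functions? Is it locale dependent or implementation defined? Or is it even somehow the same as codecvts' UCS-2 or UTF-32?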

Best answer

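Note: I barely know C++, so my answer is about C. It also assumes a glibc system (that is, a system that uses the GNU C Library). In addition, the body of your question is beyond my knowledge, so I will answer the title of your question and (most of) its last paragraph.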

According to the GNU implementation of the standard C library:

We already said above that the currently selected locale for the LC_CTYPE category decides the conversion that is performed by the functions we are about to describe. Each locale uses its own character set (given as an argument to localedef) and this is the one assumed as the external multibyte encoding. The wide character set is always UCS-4 in the GNU C Library.


To answer your questions:

Is there any locale that affects wide character encoding?


No, because locales do not specify the wide character encoding; they only specify the multibyte encoding.

what is the encoding used by multibyte/wide character conversion functions?


The conversion functions use the encoding defined by the locale as the multibyte encoding, and UCS-4 as the wide character encoding.
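As a rough illustration of that split, here is a minimal C sketch that selects a UTF-8 locale for LC_CTYPE and decodes one multibyte character with mbrtowc. The helper names select_utf8_locale and decode_first are hypothetical, and the locale names tried are assumptions about what is installed on the system; they are not guaranteed to exist.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Try a few common UTF-8 locale names (assumed, not guaranteed to be
   installed). Returns the name that was accepted, or NULL if none was. */
static const char *select_utf8_locale(void)
{
    static const char *const candidates[] = {
        "C.UTF-8", "en_US.UTF-8", "C.utf8", "en_US.utf8"
    };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}

/* Decode the first multibyte character of s using the multibyte encoding
   of the current LC_CTYPE locale. Returns what mbrtowc returns: the number
   of bytes consumed, 0 for the null character, or (size_t)-1 / (size_t)-2
   for an invalid or incomplete sequence. */
static size_t decode_first(const char *s, wchar_t *out)
{
    mbstate_t state;

    assert(s != NULL && out != NULL);
    memset(&state, 0, sizeof state);          /* start in the initial shift state */
    return mbrtowc(out, s, strlen(s) + 1, &state);
}
```

If select_utf8_locale() succeeds, calling decode_first("\xE2\x82\xAC", &wc) (the UTF-8 bytes of U+20AC) should consume 3 bytes, and on a glibc system wc should hold 0x20AC, since the wide value there is the UCS-4 code point. The locale only decides how the input bytes are interpreted; it does not change how the result is represented in wchar_t.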

Is it locale dependent or implementation defined?


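The multibyte encoding is locale dependent. The wide character encoding is implementation defined.
As for the -fwide-exec-charset compiler option, it only determines how wide character and string constants are encoded in the generated executable. As this linked answer says: it is useful when cross-compiling for a system whose C library implementation was built with a different wide (internal) character set than your machine's glibc implementation.
This is a good introduction to extended characters. It explains the rationale behind the internal (wide) and external (multibyte) encodings.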
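To see what a particular implementation actually defines for wchar_t, a small C sketch like the following may help. It relies on the standard macro __STDC_ISO_10646__, which an implementation defines when wchar_t values are ISO/IEC 10646 code points; the function names are hypothetical. The wide constant in it is fixed at compile time by the wide execution character set, so it is not affected by setlocale.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <wchar.h>

/* True when the implementation states that wchar_t values are
   ISO/IEC 10646 (Unicode) code points. */
static bool wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

/* Value the compiler assigned to a wide character constant. This is
   decided at compile time (wide execution character set), not at run
   time by the locale. */
static unsigned long euro_constant_value(void)
{
    return (unsigned long)L'\u20AC';
}

/* Write a one-line summary into buf; returns the snprintf result. */
static int describe_wchar(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "wchar_t: %zu bits, ISO 10646: %s, euro constant: 0x%lX",
                    sizeof(wchar_t) * CHAR_BIT,
                    wchar_is_iso10646() ? "yes" : "no",
                    euro_constant_value());
}
```

On a glibc system built with the compiler's default settings, the summary would be expected to report a 32-bit wchar_t that is ISO 10646 based. Other platforms may legitimately report something else, which is what "implementation defined" means here.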

Regarding "c++ - Is there any locale that affects wide character encoding?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59521032/
