winapi - Should the MB_ERR_INVALID_CHARS flag be used for UTF-8 conversions with MultiByteToWideChar?

Tags: winapi unicode utf-8

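When converting from Unicode UTF-8 to Unicode UTF-16 with the Win32 API MultiByteToWideChar(), should the MB_ERR_INVALID_CHARS flag be used?

In other words, when the input contains invalid UTF-8, which of the following is the better behavior, and why?

  • Make the MultiByteToWideChar() call fail (by passing the MB_ERR_INVALID_CHARS flag)
  • Simply replace the invalid UTF-8 input with the REPLACEMENT CHARACTER U+FFFD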

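Best answer

From a security standpoint, using MB_ERR_INVALID_CHARS when converting from UTF-8 to UTF-16 appears to be the best practice, particularly with regard to the problem of ill-formed UTF-8 subsequences, as described in "Unicode Technical Report #36: Unicode Security Considerations":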

    3.1.1 Ill-Formed Subsequences

    Suppose that a UTF-8 converter is iterating through input UTF-8 bytes, converting to an output character encoding. If the converter encounters an ill-formed UTF-8 sequence it can treat it as an error in a number of different ways, including substituting a character like U+FFFD, SUB, "?", or SPACE. However, it must not consume any valid successor bytes. For example, suppose we have the following sequence:

    X = <... 41 C2 3E 42 ... >

    This sequence overall is ill-formed, because it contains an ill-formed substring, namely the <C2> [...]

    The UTF-8 converter can stop at the C2 byte, or substitute a character or sequence like U+FFFD and continue. However, it must not consume the 3E byte if it continues. [...]

    Consuming a subsequent byte (such as 3E above) is not only non-conformant; it can lead to security breaches. [...]
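
To make the rule in the quoted passage concrete, here is a minimal, portable C++ sketch of the "substitute and continue" option. `DecodeUtf8Lenient` is a hypothetical helper written for illustration only; it is not part of the Win32 API and makes no claim about how MultiByteToWideChar() works internally. As a simplification it emits one U+FFFD per offending byte, and it never skips the byte that follows an ill-formed one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration: decode UTF-8 into code points, replacing every
// ill-formed byte with U+FFFD and resuming at the very next byte.
std::u32string DecodeUtf8Lenient(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;   // expected sequence length, 0 = invalid lead byte
        char32_t cp = 0;       // code point being assembled
        char32_t minCp = 0;    // smallest value allowed for this length
        if (b0 < 0x80)                     { len = 1; cp = b0; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0)      { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; minCp = 0x10000; }

        bool ok = (len != 0) && (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;        // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (ok && (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (ok) { out.push_back(cp); i += len; }
        else    { out.push_back(U'\uFFFD'); i += 1; }  // consume only the bad byte
    }
    return out;
}
```

For the report's sequence <41 C2 3E 42>, this sketch yields U+0041, U+FFFD, U+003E, U+0042: the 3E byte survives as ">" instead of being swallowed together with the C2 byte.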



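In fact, passing the MB_ERR_INVALID_CHARS flag makes the MultiByteToWideChar() API fail when the input contains an invalid UTF-8 sequence, so there is no risk that subsequent code (for example, the calling code) consumes the bytes that follow the invalid subsequence.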

Source: the corresponding question on Stack Overflow, https://stackoverflow.com/questions/22824537/
