c# - Encoding used in cast from char to byte

Take a look at the following C# code:

byte[] StringToBytesToBeHashed(string to_be_hashed) {
    byte[] to_be_hashed_byte_array = new byte[to_be_hashed.Length];
    int i = 0;
    foreach (char cur_char in to_be_hashed)
    {
        to_be_hashed_byte_array[i++] = (byte)cur_char;
    }
    return to_be_hashed_byte_array;
}

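(The function above was extracted from these lines of code from the WMSAuth github repo)

My question is: what does the cast from char to byte do in terms of encoding?

I guess it really does nothing in terms of encoding, but does that mean that Encoding.Default is the one being used, so that the bytes returned will depend on how the framework encodes the underlying string on the specific operating system?

Also, is a char actually bigger than a byte (I'm guessing 2 bytes), so that the cast actually drops the high byte?

I was thinking of replacing all of this with: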

Encoding.UTF8.GetBytes(stringToBeHashed)

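What do you think?

Best answer

The .NET Framework uses Unicode to represent all its characters and strings. The integer value of a char (which you can obtain by casting to int) is equivalent to its UTF-16 code unit. For characters in the Basic Multilingual Plane (which constitute the majority of characters you will ever encounter), this value is the Unicode code point.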

The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure. The value of a Char object is its 16-bit numeric (ordinal) value. — Char Structure
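To make the relationship between char, UTF-16 code units, and code points concrete, here is a minimal sketch. The class and method names (CharCodeUnits, CodeUnits) are made up for illustration and are not part of the original code or of the framework:

```csharp
using System;

static class CharCodeUnits
{
    // Hypothetical helper: returns the UTF-16 code units of a string as integers.
    public static int[] CodeUnits(string s)
    {
        var units = new int[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            units[i] = s[i]; // implicit char -> int conversion, no encoding involved
        }
        return units;
    }

    static void Main()
    {
        Console.WriteLine(sizeof(char));      // 2 (a char is one 16-bit code unit)
        Console.WriteLine((int)'D');          // 68
        Console.WriteLine((int)'\u0144');     // 324 (U+0144, the letter n with acute)

        // U+1D11E lies outside the Basic Multilingual Plane,
        // so it takes two chars (a surrogate pair) to represent.
        string clef = "\U0001D11E";
        Console.WriteLine(clef.Length);                       // 2
        Console.WriteLine(char.ConvertToUtf32(clef, 0));      // 119070 (0x1D11E)
    }
}
```

For BMP characters the char value and the code point coincide; only for supplementary characters such as the one above do they differ.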

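Casting a char to byte does not apply any encoding (Encoding.Default is not involved); it simply keeps the low 8 bits of the 16-bit value, so data is lost for any character whose value is greater than 255. Try running the following simple example to see why: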

char c1 = 'D';        // code point 68
byte b1 = (byte)c1;   // b1 is 68

char c2 = 'ń';        // code point 324
byte b2 = (byte)c2;   // b2 is 68 too!
                      // 324 % 256 == 68

So yes, you should definitely use Encoding.UTF8.GetBytes instead.
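As a rough comparison of the two approaches, the sketch below puts the cast-based conversion from the question next to Encoding.UTF8.GetBytes. The names HashInputBytes, CastToBytes and Utf8Bytes are hypothetical, chosen only for this example:

```csharp
using System;
using System.Text;

static class HashInputBytes
{
    // Same idea as the function in the question: keep only the low 8 bits
    // of each UTF-16 code unit. 'unchecked' makes the truncation explicit;
    // in a checked context the cast would throw OverflowException instead.
    public static byte[] CastToBytes(string s)
    {
        var bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            bytes[i] = unchecked((byte)s[i]);
        }
        return bytes;
    }

    // The suggested replacement: a real, lossless encoding.
    public static byte[] Utf8Bytes(string s) => Encoding.UTF8.GetBytes(s);

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(CastToBytes("D")));       // 44
        Console.WriteLine(BitConverter.ToString(CastToBytes("\u0144")));  // 44 (collides with "D")
        Console.WriteLine(BitConverter.ToString(Utf8Bytes("D")));         // 44
        Console.WriteLine(BitConverter.ToString(Utf8Bytes("\u0144")));    // C5-84
    }
}
```

For pure ASCII input both methods return the same bytes, so existing hashes of ASCII strings should be unaffected. For any other input the bytes differ, so if another party computes the same hash, it presumably has to use the same encoding.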

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/10708548/