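c++ - Swedish characters don't compare correctly

For some reason, if/else statements don't work properly in C++

The problem is that when the variable equals "höger" (right), the if branch is not taken and execution falls through to the else branch. If I replace the letter "ö" with "o", so the word becomes "hoger", the if statement works. So whenever I type "höger", the program skips the if branch and goes to the else branch, but if I compare the variable against "hoger" and then type "hoger", it works. How can I type "höger" and have the if statement recognize it? It looks as if Swedish letters don't work.

My code looks like this: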

#include <clocale>
#include <iostream>
#include <string>

using namespace std;

int main() {
    setlocale(LC_ALL, ""); // Use the user's default locale

    string test; // The user's answer
    cout << "Höger eller vänster" << endl; // Right or left
    cin >> test;

    if (test == "höger") { // If right, then output this.
        cout << "Du valde höger" << endl;
    } else if (test == "vänster") { // If left, then output this.
        cout << "Du valde vänster" << endl;
    } else {
        // Do this
    }
}

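Best answer

The problem is almost certainly one of encoding.

The C/C++ language specifications do not automatically handle anything beyond 7-bit ASCII. The o-umlaut character lies outside that range, so the exact behavior depends on the encoding of the source file.

The most likely candidates are ISO 8859-1, Windows ANSI-1252, UTF-8, and Windows OEM 850. The first two encode this character identically, but each of the others encodes it differently.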
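To make the difference concrete, here is a minimal sketch that spells out the byte sequences for "ö" in those four encodings. The helper `to_hex` and the `oe_*` names are hypothetical, introduced only for illustration. The bytes are written as hex escapes so that they do not depend on how the file containing the snippet is saved.

```cpp
#include <cstdio>
#include <string>

// Returns the bytes of s as space-separated uppercase hex, e.g. "68 F6".
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// The letter 'ö' in each encoding, written as explicit bytes.
const std::string oe_latin1 = "\xF6";      // ISO 8859-1
const std::string oe_cp1252 = "\xF6";      // Windows-1252 (same byte as ISO 8859-1)
const std::string oe_cp850  = "\x94";      // Windows OEM 850
const std::string oe_utf8   = "\xC3\xB6";  // UTF-8 (two bytes)
```

Passing the string read from `cin` through `to_hex` and printing the result is a quick way to see which of these encodings the console is actually delivering.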

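With more information about the encoding and toolset you are using, it would be possible to give a more specific diagnosis and recommendation.

[By the way, if/else statements in C/C++ work just fine, thanks.]

If we assume for the moment that this is Windows and Visual C++, then this is what you are dealing with.

Source code written in Visual Studio: code page 1252. The code point for the o-umlaut character is 0xf6.
Keyboard input read from the console: code page 850. The code point for the o-umlaut character is 0x94.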
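The effect on the comparison can be reproduced without a console. This is only a sketch: the two constants simulate the bytes described above, and `chose_right` is a hypothetical stand-in for the `test == "höger"` check in the question.

```cpp
#include <string>

// What the compiler stores for "höger" when the source is saved as cp1252.
const std::string literal_cp1252 = "h\xF6ger";
// What cin delivers when the user types "höger" in a cp850 console.
const std::string typed_cp850 = "h\x94ger";

// Same kind of test as in the question: a plain byte-for-byte comparison.
bool chose_right(const std::string& input) {
    return input == literal_cp1252;
}
```

`chose_right(typed_cp850)` returns false even though both strings show the same word on screen, because `std::string` equality compares bytes, not characters.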

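Obviously not a good match. However, Visual Studio will also quite happily edit source files in a number of encodings, including UTF-8 (with a byte order mark), UTF-16 (wide characters), and code page 850. So:

Source code written in Visual Studio: code page 850. The code point for the o-umlaut character is 0x94. Now it works.

You can also change the code page of the console with the CHCP command.

Switch the console with CHCP 1252 and it works.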
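The same switch can probably be made from inside the program instead of typing CHCP by hand. The sketch below assumes the Win32 console functions `SetConsoleCP` and `SetConsoleOutputCP`. The wrapper name `use_cp1252_console` is hypothetical, and the guard makes it a no-op on other platforms. Whether this fully fixes `cin` input may depend on the console and runtime in use, so treat it as something to try rather than a guaranteed fix.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Switches console input and output to code page 1252.
// Returns false if the switch fails or the platform is not Windows.
bool use_cp1252_console() {
#ifdef _WIN32
    return SetConsoleCP(1252) != 0 && SetConsoleOutputCP(1252) != 0;
#else
    return false;  // not applicable outside Windows
#endif
}
```

It would be called once at the start of `main`, before the first read from `cin`.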

The standard requires that the compiler's handling of the source it reads be consistent with the execution character set. See n3797 S2.2.5:

Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set

S2.3/3:

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.

n3797 S2.14.3/1:

A character literal that does not begin with u, U, or L is an ordinary character literal, also referred to as a narrow-character literal. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.

n3797 S2.14.5/6:

a string literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.
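In practice this means the bytes stored for a literal such as "ö" are decided at compile time, by the source encoding and the implementation-defined execution character set. A small sketch for checking what a particular build actually produced follows. `guess_literal_encoding` is a hypothetical helper, and its answer is only a guess based on the three byte patterns discussed here.

```cpp
#include <string>

// The literal below contains the single character 'ö'. Which bytes end up
// in the program depends on how this file is saved and on the compiler's
// execution character set.
const std::string oe_literal = "ö";

// Guesses the encoding of the literal from its bytes (hypothetical helper).
std::string guess_literal_encoding() {
    if (oe_literal == "\xC3\xB6") return "UTF-8";
    if (oe_literal == "\xF6")     return "ISO 8859-1 / Windows-1252";
    if (oe_literal == "\x94")     return "OEM 850";
    return "unknown";
}
```

If the result does not match the encoding of the console input, comparisons like the one in the question cannot succeed.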

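The execution character set is implementation-defined. Microsoft's statement on the implementation-defined behavior of its C compiler is here: http://msdn.microsoft.com/en-us/library/hx3yt8af.aspx . [I couldn't find a separate one for C++, so I assume it applies to both.]

The source character set is the set of legal characters that can appear in source files. For Microsoft C, the source character set is the standard ASCII character set.

Sorry about the language-lawyer stuff, but what this says is that the MSVC compiler is independent of locale and encoding and implements 8-bit ASCII, code page unspecified. Obviously, standard library functions may need to know the encoding for various purposes, but that is another matter entirely.

One final point: the Microsoft C compiler goes back about 30 years, predating Windows. It has always been possible to write source code in code page 850 and have it run correctly on the console, as long as extended (8-bit) characters are handled with care. Many people still do. The problem here is source code written in Windows-ANSI or Unicode combined with keyboard input from an OEM (cp850) console. Change either one to make it work.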
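If neither the source encoding nor the console code page can be changed, a third option, not part of the answer above, is to translate the input before comparing. The sketch below assumes cp850 input and cp1252 literals and covers only the six Swedish letters. The function name `cp850_to_cp1252_swedish` is hypothetical.

```cpp
#include <string>

// Maps the cp850 bytes of the Swedish letters to their cp1252 bytes.
// All other bytes are passed through unchanged.
std::string cp850_to_cp1252_swedish(std::string s) {
    for (char& ch : s) {
        switch (static_cast<unsigned char>(ch)) {
            case 0x86: ch = '\xE5'; break;  // å
            case 0x84: ch = '\xE4'; break;  // ä
            case 0x94: ch = '\xF6'; break;  // ö
            case 0x8F: ch = '\xC5'; break;  // Å
            case 0x8E: ch = '\xC4'; break;  // Ä
            case 0x99: ch = '\xD6'; break;  // Ö
            default: break;
        }
    }
    return s;
}
```

Applied to the string read from `cin`, this turns the console bytes for "höger" into the bytes a cp1252 literal contains, so the original comparison would then match. A general solution would need a full conversion table or a library, which is beyond this sketch.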

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/22893226/
