c++ - Bug in a C++ function that returns the most common character in a string. Multi-byte characters?

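While job hunting, I was asked to solve a problem on HackerRank.com: write a function that takes a string, counts the characters in it, and returns the most common character found. I wrote my solution and fixed the typos, and it works on my test cases and on theirs, except that it fails "Test 7". Because this is an interview protocol, HackerRank does not tell me any details about the failure, only that it failed.

I have spent far too much time trying to figure out why. I triple-checked for off-by-one errors, wrote the code for 8-bit characters, and then tried accepting 16-bit values, which did not change the result. Here is my code. I cannot give you the error message, only the fact that there is an error.

Could it be multi-byte characters?

How can I create a test case with 2-byte or 3-byte characters?

I added some code to dump the histogram, and the output is exactly what you would expect. I have the Mac Xcode IDE on my desktop. Any suggestions are welcome!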

/*
 * Complete the function below.
 */
char func(string theString) {

    //  I wonder what I'm doing wrong. 256 doesn't work any better here.
const int CHARS_RECOGED = 65536; // ie 0...65535 - even this isn't good enough to fix test case 7.

unsigned long alphaHisto[CHARS_RECOGED];
for (int count = 0; count < CHARS_RECOGED; count++ ) {
    alphaHisto[ count ] = 0;
} // for int count...

cout << "size: " << theString.size() << endl;

for (int count  = 0; count < theString.size(); count++) {
//        unsigned char uChar = theString.at(count);  // .at() better protected than [] - and this works no differently...
    unsigned int uChar = std::char_traits<char>::to_int_type(theString.at(count));  // .at() better protected than []
    alphaHisto[ uChar ]++;
} // for count...


unsigned char mostCommon = -1;
unsigned long totalMostCommon = 0;

for (int count = 0; count < CHARS_RECOGED; count++ ) {

    if (alphaHisto[ count ] > totalMostCommon){
        mostCommon = count;
        totalMostCommon = alphaHisto[ count ];
    } // if alphahisto

} // for int count...

for (int count = 0; count < CHARS_RECOGED; count++ ) {
    if (alphaHisto[ count ] > 0){
       cout << (char)count << "  " << count << " " << alphaHisto[ count ] << endl;
    } // if alphaHisto...
} // for int count...

return (char) mostCommon;
}
// Please provide additional test cases:
// Input         Return
// thequickbrownfoxjumpsoverthelazydog  o
// the quick brown fox jumps over the lazy dog " "
// theQuickbrownFoxjumpsoverthelazydog  o
// the Quick BroWn Fox JuMpS OVER  THe lazy dog " "
// the_Quick_BroWn_Fox.JuMpS.OVER..THe.LAZY.DOG "."

Best answer

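If the test were meant to be taken seriously, it should specify the character set. Since it does not, assuming that one byte is one character is probably safe. As a side note, supporting a character set with multi-byte characters takes far more than swapping 256 for 65536. Even without multi-byte characters, you could swap 256 for 1<<CHAR_BIT, because a "byte" may have more than 8 bits.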
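As a rough illustration of both points, the sketch below sizes the table from CHAR_BIT and builds two test inputs that contain multi-byte characters. It assumes the input is UTF-8, which the original problem does not state; the names kCharValues, twoByteSample and threeByteSample are made up for this example.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Number of distinct values a single char can take: 1 << CHAR_BIT
// (256 on the usual platforms where a byte has 8 bits).
constexpr std::size_t kCharValues = std::size_t{1} << CHAR_BIT;

// Hypothetical test inputs, assuming UTF-8. Hex escapes keep the bytes
// independent of the encoding of the source file itself.
inline std::string twoByteSample() {
    return "\xC3\xA9";      // U+00E9, stored as two bytes
}

inline std::string threeByteSample() {
    return "\xE2\x82\xAC";  // U+20AC, stored as three bytes
}
```

Calling size() on these strings reports bytes, not characters, so a per-byte histogram counts each byte of a multi-byte character separately no matter how large the table is.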

I see a more important problem in this line:
unsigned int uChar = std::char_traits<char>::to_int_type(theString.at(count));
First, it is unnecessarily complicated:
unsigned int uChar = theString.at(count);
should be enough.

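Now remember that std::string::at returns char, while your variable is unsigned int. Whether a plain char is signed or unsigned is not fixed; it depends on the compiler (that is, whether it behaves like signed char or unsigned char). char values between 0 and 127 are stored in the target variable unchanged, but that is only half of the value range. If char is unsigned, 128 to 255 work fine as well. If char is signed, the values between -128 and -1 do not map to unsigned 128 to 255 when the target variable is larger than char. With a 4-byte integer you get huge values that are not valid indexes into your array, and that is the problem. Solution: use char, not int.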

unsigned char uChar = theString.at(count);
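The difference between the two conversions can be shown with a small sketch. The function names indexViaUnsignedInt and indexViaUnsignedChar are hypothetical and exist only for this comparison.

```cpp
#include <cstddef>
#include <string>

// Converts the char directly to unsigned int. If plain char is signed,
// a byte such as 0xE9 holds a negative value and wraps around to a very
// large unsigned number, far outside a 256-entry table.
inline unsigned int indexViaUnsignedInt(const std::string& s, std::size_t i) {
    return static_cast<unsigned int>(s.at(i));
}

// Converts through unsigned char first, so the result is always in the
// range 0..UCHAR_MAX regardless of the signedness of plain char.
inline unsigned int indexViaUnsignedChar(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s.at(i));
}
```

On a platform where plain char is unsigned both functions return the same value, which is why this kind of bug can stay hidden until the code runs somewhere else or meets a byte above 127.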

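Another thing:
for (int count = 0; count < theString.size(); count++)
theString.size() returns size_t, which may differ from int in size and/or signedness, so a very long string could cause problems. For the same reason, the character counts could be size_t instead of unsigned long...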

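Finally, and this is the least likely source of trouble: if the code runs on a machine that does not use two's complement, it may fail (although I have not thought this through carefully).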
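Putting the points of this answer together, a minimal sketch of the function might look like the following. It is not known to pass the hidden test. The name mostCommonChar is hypothetical, and the code assumes that one byte is one character, that ties go to the smallest byte value, and that an empty string yields '\0'.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical rewrite of func(): one histogram slot per possible byte value.
// Assumptions: one byte is one character; ties go to the smallest byte
// value; an empty string yields '\0'.
char mostCommonChar(const std::string& text) {
    // Value-initialised, so every counter starts at zero.
    std::array<std::size_t, static_cast<std::size_t>(UCHAR_MAX) + 1> histogram{};

    // size_t matches the type returned by size(), so there is no
    // signed/unsigned comparison and no overflow for long strings.
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Convert through unsigned char so the index is never negative.
        ++histogram[static_cast<unsigned char>(text[i])];
    }

    // Strict > keeps the first (smallest) byte value among equal counts.
    std::size_t best = 0;
    for (std::size_t value = 1; value < histogram.size(); ++value) {
        if (histogram[value] > histogram[best]) {
            best = value;
        }
    }
    return static_cast<char>(static_cast<unsigned char>(best));
}
```

The tie-breaking rule is a guess, since the problem statement quoted in the question does not say which character to return when several are equally common.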

Regarding "c++ - Bug in a C++ function that returns the most common character in a string. Multi-byte characters?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30813855/
