c++ - 恒定大小字符串的高效 HashMap

我需要将仅包含字母数字值(A-Z、0-9，无小写字母)的固定大小的字符串映射到其他字符串。 unordered_map 变得非常大(数千万个键)，而映射值来自一组几千个字符串。在进行性能分析时，我发现大部分时间都花在了将新值插入 map (operator[]) 上，而且清除 map 也需要很长时间。

std::unordered_map<std::string, std::string> hashMap;
while (...){
    ...
    hashMap[key] = value;  // ~50% of program time is spent here
    ...
}

hashMap.clear();  // Takes a very long time, at this point hashMap.size() > 20,000,000

我的想法是字符串分配/取消分配非常慢，散列和插入映射也是如此。有什么优化建议吗？请记住， key 大小是恒定的，其内容限制为一组 36 个字符，并且映射值来自一个有限的集合。除了字符串和 unordered_map，我愿意使用不同的容器/数据类型。

更新

根据 Baum Mit Augen 的建议，我将我的 key 类型更改为 unsigned long long 并制作了一个将基数 36 转换为十进制的函数:

unsigned long long ConvertBase36(const char* num)
{
    unsigned long long retVal = 0;

    for (int i = 0; i < 12; i++)
    {
        unsigned int digit = 0;
        char currChar = num[i];
        if (currChar <= '9')
        {
            digit = currChar - '0';
        }
        else
        {
            digit = currChar - 'A' + 10;
        }

        retVal *= 36;
        retVal += digit;
    }

    return retVal;
}

这使我的整个程序运行时间提高了大约 10%。然后我再次尝试使用 unordered_map 保留函数来查看它是否有任何不同，但它没有。尝试使用 map 而不是 unordered_map 的效果差了大约 10%，因此我恢复了该更改。最后用 unsigned int 替换字符串值使事情变得更快一些。

最佳答案

两个不相关的建议，但都与 std::unordered_map::reserve 有关.

首先，由于您的无序映射包含 10 毫秒的元素，因此在您插入时可能会进行许多重新分配/重新散列。一开始，您可能希望保留 10 毫秒的条目。

自

the mapped values are from a set of a few thousand strings

您应该能够将值本身存储在辅助 unordered_set 中，您首先 reserved 到足够大的东西以确保没有迭代器在 insert 时失效s - 请参阅 invalidation guarantees for unordered associative containers .

您的(主要)unordered_map 然后可以将 string 映射到 std::unordered_set::const_iterator。

关于c++ - 恒定大小字符串的高效 HashMap ，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/39076241/

c++ - 恒定大小字符串的高效 HashMap

更新

上一篇：c++ - 无法在 TDM-GCC 的 typeid 中使用 declval

下一篇：c++ - 给定一个整数 N，按字典顺序打印从 1 到 N 的数字