c++ - std::tr1::unordered_map::operator[] 的时间效率

我正在 Visual Studio 2008 SP1 中优化一段代码。知道 unorder_map 在恒定时间插入/删除/查找方面非常棒，因此我通过使用 unordered_map 作为主要数据结构来优化代码。请看一下下面的代码。

....
    typedef std::tr1::unordered_map <__int64, int> umap_id;
    const int text1_length = text1.GetLength();
    const int text2_length = text2.GetLength();
    const int max_d = text1_length + text2_length - 1;
    const bool doubleEnd = (64 < max_d);
    vector<set<pair<int, int> > > v_map1;
    vector<set<pair<int, int> > > v_map2;
    vector<int> v1(2 *max_d, 0);
    vector<int> v2(2 *max_d, 0);

    int x, y;
    __int64 footstep;
    umap_id footsteps(max_d * 2);
    bool done = false;
    const bool forward = ((text1_length + text2_length) % 2 == 1);

    for (int d = 0; d < max_d; ++d)
    {
        // Walk forward path one step
        v_map1.push_back(set<pair<int, int> >());
        for (int k = -d; k <= d; k += 2)
        {
            if (k == -d || (k != d && v1[k - 1 + max_d] < v1[k + 1 + max_d]))
                x = v1[k + 1 + max_d];
            else
                x = v1[k - 1 + max_d] + 1;
            y = x - k;

            if (doubleEnd)
            {
                footstep = (__int64) ((__int64)x << 32 | y);
                if (!forward)
                    footsteps[footstep] = d;
                else if (footsteps.find(footstep) != footsteps.end())
                    done = true;
            }
            ....
        }
    }
....

但事实证明它仍然很慢。鉴于我的输入相对较小 (max_d=946)，它运行了 20 秒以上。

我对 release 版本进行了探查器分析，探查器显示该行:footsteps[footstep] = d; 是主要罪魁祸首，它运行了 447931 次并花了大约20秒。

请注意，同一循环体中还有另一行代码: else if (footsteps.find(footstep) != footsteps.end()) 执行了相同的次数(即447931 次)，但花费的秒数要少得多。

unordered_map 的 operator::[] 对我来说似乎是一个黑匣子。我不明白为什么要花这么长时间。它是一个 32 位应用程序。如有任何帮助，我们将不胜感激。

最佳答案

在没有 SP1 的 VS 2008 中(但带有为您提供 TR1 库的功能包)，默认哈希函数为 tr1::unordered_map<>只考虑 key 值的低 32 位。至少这是我读到的template<class _Kty> class hash::operator()实现于<functional> header 。

footstep作为关键的变量使用 y 的计算结果因为它的低 32 位 - y 是否有足够的变化，它可以单独产生一个好的哈希值(我无法判断正在计算 y 的代码在做什么)？如果没有，您可能会将比您想要的更多的项目放入特定的哈希桶中，并产生过多的冲突。

如果是这种情况，您可能需要考虑提供自己的哈希函数。

顺便说一下，VS 2010 似乎专门针对 hash::operator()当与 64 位整数一起使用时，它会散列所有 64 位 - 如果您使用 VS 2010，我的答案中的推测不应适用。

更新:

经过一些测试，我确信这就是问题所在(VS 2008 SP1 中也存在该问题)。您可以通过将编译器升级到 VS 2010 来解决此问题，VS 2010 对于 64 位类型具有更好的哈希函数，或者使用您自己的哈希函数自行处理此问题。下面是我在VS2008中快速测试的一个，似乎可以工作:

class hash_int64
    : public unary_function<__int64, size_t>
{
public:
    typedef __int64 key_t;
    typedef unsigned int half_t;

    size_t operator()(const key_t& key) const
    {   
        // split the 64-bit key into 32-bit halfs, hash each
        // then combine them
        half_t x = (half_t) (key & 0xffffffffUL);
        half_t y = (half_t) (((unsigned __int64) key) >> 32);

        return (hash<half_t>()(x) ^ hash<half_t>()(y));
    }
};

然后更改您的typedef至:

typedef std::tr1::unordered_map <__int64, int, hash_int64> umap_id;

关于c++ - std::tr1::unordered_map::operator[] 的时间效率，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/5983949/

c++ - std::tr1::unordered_map::operator[] 的时间效率

上一篇：c++ - 是否可以通过非托管 C++/MFC 代码向 SOAP Web 服务发送数据/从 SOAP Web 服务接收数据？

下一篇：c++ - 如何在编译时判断函数是否静态