c++ - std::unordered_map 是如何实现的

标签 c++ c++11 hashmap unordered-map

c++ unordered_map collision handling , resize and rehash

这是我之前提出的一个问题,我发现我对 unordered_map 的实现方式有很多困惑。我相信很多其他人也和我一样困惑。根据我没有阅读标准就知道的信息:

Every unordered_map implementation stores a linked list to external nodes in the array of buckets... No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small "oversight" in the specification of unordered_map all but requires this behavior. The required behavior is that iterators to elements must stay valid when inserting or deleting other elements

我希望有人可以解释实现以及它如何符合 C++ 标准定义(在性能要求方面),如果它真的不是实现 HashMap 数据结构的最有效方法,如何改进它?

最佳答案

该标准有效地要求 std::unordered_set std::unordered_map 的实现——以及它们的“多”兄弟——使用 open hashing aka separate chaining ,这意味着一个桶数组,每个桶都持有链表的头部†。这个要求很微妙:它是以下原因的结果:

如果没有链接,那将是不切实际的,因为随着 load_factor() ]( closed hashing aka open addressing ) 接近 1,与哈希表实现的另一个主要类别 - https://en.cppreference.com/w/cpp/container/unordered_map/load_factor - 的冲突变得势不可挡。

引用资料:

23.2.5/15: The insert and emplace members shall not affect the validity of iterators if (N+n) < z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container’s bucket count, and z is the container’s maximum load factor.

amongst the Effects of the constructor at 23.5.4.2/1: max_load_factor() returns 1.0.

† 为了在不越过任何空桶的情况下实现最优迭代,GCC 的实现用迭代器将桶填充到一个包含所有值的单链表中:迭代器立即指向之前的元素桶的元素,因此如果删除桶的最后一个值,可以重新连接那里的 next 指针。

关于你引用的文字:

No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small "oversight" in the specification of unordered_map all but requires this behavior. The required behavior is that iterators to elements must stay valid when inserting or deleting other elements

没有“疏忽”……所做的事情是经过深思熟虑的,并且是在充分意识到的情况下完成的。确实可以达成其他妥协,但是开放散列/链接方法对于一般用途来说是一个合理的妥协,它可以相当优雅地处理来自平庸散列函数的冲突,对于小或大的键/值类型不会太浪费,并处理任意多的 insert/erase 对,而不会像许多封闭哈希实现那样逐渐降低性能。

作为意识的证据,来自 Matthew Austern's proposal here :

I'm not aware of any satisfactory implementation of open addressing in a generic framework. Open addressing presents a number of problems:

• It's necessary to distinguish between a vacant position and an occupied one.

• It's necessary either to restrict the hash table to types with a default constructor, and to construct every array element ahead of time, or else to maintain an array some of whose elements are objects and others of which are raw memory.

• Open addressing makes collision management difficult: if you're inserting an element whose hash code maps to an already-occupied location, you need a policy that tells you where to try next. This is a solved problem, but the best known solutions are complicated.

• Collision management is especially complicated when erasing elements is allowed. (See Knuth for a discussion.) A container class for the standard library ought to allow erasure.

• Collision management schemes for open addressing tend to assume a fixed size array that can hold up to N elements. A container class for the standard library ought to be able to grow as necessary when new elements are inserted, up to the limit of available memory.

Solving these problems could be an interesting research project, but, in the absence of implementation experience in the context of C++, it would be inappropriate to standardize an open-addressing container class.

特别是对于数据小到可以直接存储在存储桶中的仅插入表、未使用的存储桶的方便标记值以及良好的散列函数,封闭散列方法可能会快大约一个数量级,并且使用量大大减少内存,但这不是通用的。

哈希表设计选项及其含义的完整比较和详细说明不在 S.O. 的主题范围内。因为它太宽泛了,无法在这里正确解决。

关于c++ - std::unordered_map 是如何实现的,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31112852/

相关文章:

c++ - 为设备编译 Boost 1.54 XCode5 - fatal error : error in backend: symbol '___umodsi3' can not be undefined in a subtraction expression

c++ - 有人知道将您自己的语言添加到不受管理的 visual studio 2010 的任何资源吗?

c++ - 使用 decltype() 和 SFINAE 的错误

java - HashMap 到数组 : throws java. lang.ArrayStoreException

go - 增加 map 最快的方法是什么?

java - LinkedHashMap 中的 "next"和 "after"条目有什么区别?

c++ - 在理解 C++ 中的友元函数时出错

c++ - 在使用 NT DDK 构建的用户模式程序中包含 C++ header

C++重载静态常量字符串与字符数组

c++ - 具有可变数量参数的函数,这些参数被转换为字符串