c++ - std::unordered_map 是如何实现的

标签 c++ c++11 hashmap unordered-map

c++ unordered_map collision handling , resize and rehash

这是我之前提出的一个问题,我看到我对 unordered_map 的实现方式有很多困惑。我相信很多其他人跟我一样有这种困惑。根据我在没有阅读标准的情况下所知道的信息:

Every unordered_map implementation stores a linked list to external nodes in the array of buckets... No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small "oversight" in the specification of unordered_map all but requires this behavior. The required behavior is that iterators to elements must stay valid when inserting or deleting other elements



我希望有人可以解释实现以及它如何符合 C++ 标准定义(在性能要求方面),如果它真的不是实现哈希映射数据结构的最有效方法,如何改进它?

最佳答案

该标准有效地要求 std::unordered_set std::unordered_map 的实现 - 以及它们的“多”兄弟 - 使用 open hashing aka separate chaining ,这意味着一个桶数组,每个桶都持有一个链表†的头部。这个要求很微妙:它是以下结果的结果:

  • 默认 max_load_factor() 为 1.0(这意味着只要 size() 超过 bucket_count() 的 1.0 倍,表就会调整大小,并且
  • 保证表不会被重新散列,除非增长超过该负载因子。

  • 如果没有链接,这将是不切实际的,因为随着 load_factor() ]( closed hashing aka open addressing ) 接近 1,与哈希表实现的另一个主要类别 - https://en.cppreference.com/w/cpp/container/unordered_map/load_factor 的冲突变得势不可挡。
    引用:

    23.2.5/15: The insert and emplace members shall not affect the validity of iterators if (N+n) < z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container’s bucket count, and z is the container’s maximum load factor.

    amongst the Effects of the constructor at 23.5.4.2/1: max_load_factor() returns 1.0.


    † 为了在不传递任何空桶的情况下允许最佳迭代,GCC 的实现将带有迭代器的桶填充到一个包含所有值的单向链表中:迭代器指向该桶元素之前的元素,因此 next 指针可以是如果删除存储桶的最后一个值,则重新连接。
    关于你引用的文字:

    No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small "oversight" in the specification of unordered_map all but requires this behavior. The required behavior is that iterators to elements must stay valid when inserting or deleting other elements


    没有“监督”......所做的事情是非常深思熟虑的,并且是在充分意识到的情况下完成的。确实,其他妥协可能已经被击中,但开放散列/链接方法对于一般用途来说是一个合理的妥协,它可以合理优雅地处理来自普通散列函数的冲突,对于小的或大的键/值类型不会太浪费,并处理任意多个 insert/erase 对,而不会像许多封闭散列实现那样逐渐降低性能。
    作为意识的证据,来自 Matthew Austern's proposal here :

    I'm not aware of any satisfactory implementation of open addressing in a generic framework. Open addressing presents a number of problems:

    • It's necessary to distinguish between a vacant position and an occupied one.

    • It's necessary either to restrict the hash table to types with a default constructor, and to construct every array element ahead of time, or else to maintain an array some of whose elements are objects and others of which are raw memory.

    • Open addressing makes collision management difficult: if you're inserting an element whose hash code maps to an already-occupied location, you need a policy that tells you where to try next. This is a solved problem, but the best known solutions are complicated.

    • Collision management is especially complicated when erasing elements is allowed. (See Knuth for a discussion.) A container class for the standard library ought to allow erasure.

    • Collision management schemes for open addressing tend to assume a fixed size array that can hold up to N elements. A container class for the standard library ought to be able to grow as necessary when new elements are inserted, up to the limit of available memory.

    Solving these problems could be an interesting research project, but, in the absence of implementation experience in the context of C++, it would be inappropriate to standardize an open-addressing container class.


    特别是对于数据小到可以直接存储在桶中的仅插入表、未使用桶的方便标记值和良好的散列函数,封闭散列方法可能大约快一个数量级并使用更少的内存,但是这不是通用目的。
    散列表设计选项及其含义的完整比较和阐述对于 S.O.因为它的范围太广而无法在这里正确解决。

    关于c++ - std::unordered_map 是如何实现的,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63115196/

    相关文章:

    c++ - 父类和子类之间的静态变量

    c++ - 如何保证long是4个字节

    c++ - 将标准元组解包为指针?

    java - 获取 HashMap 中的第一件事?

    c++ - 指针输出

    c++ - 下标运算符重载 : returning reference problems

    C++ 将非模板成员函数调用转发到模板函数

    c++ - 如何在运行时更改 .exe 的名称

    scala - 如何在 Scala 中单独存储 HashMap 值?

    java - 保存hashmap结构java