c# - 为什么我们在哈希算法中有一个额外的桶数组？

我一直在逐步完成 .net 框架的 Hashset 的实现，我对它的实现有点困惑。这是 Contains 方法:

    private int[] m_buckets;
    private Slot[] m_slots;

public bool Contains(T item) {
        if (m_buckets != null) {
            int hashCode = InternalGetHashCode(item);
            // see note at "HashSet" level describing why "- 1" appears in for loop
            for (int i = m_buckets[hashCode % m_buckets.Length] - 1; i >= 0; i = m_slots[i].next) {
                if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, item)) {
                    return true;
                }
            }
        }
        // either m_buckets is null or wasn't found
        return false;
    }


internal struct Slot {
        internal int hashCode;      // Lower 31 bits of hash code, -1 if unused
        internal T value;
        internal int next;          // Index of next entry, -1 if last
    }

我理解了第一部分，获取item的hash code。接下来开始循环并从哈希码生成合适的索引。但随后它使用此索引从整数数组中检索一个值，然后使用该值检查值的哈希码和值本身是否相同。为什么是这样？另外，我无法理解 .next 属性，为什么有必要存储此信息？

最佳答案

多个对象可能具有相同的 hashCode % m_buckets.Length 值，即使它们具有不同的 hashCode 值。不同的对象也可能具有相同的 hashCode 值(即使这不太可能)。

解决方法是将所有具有相同 hashCode % m_buckets.Length 值的对象存储在一个数组中，然后在该数组中搜索适当的元素。它比较 hashCode 值和对象本身的原因是 hashCode 的比较比对象本身的比较快。通过首先对哈希码进行廉价检查，我们可以避免对对象进行昂贵的检查。

存储下一个值，以便可以枚举散列为单个值的元素。

关于c# - 为什么我们在哈希算法中有一个额外的桶数组？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/28439590/

c# - 为什么我们在哈希算法中有一个额外的桶数组？

上一篇：c++ - 如何在 C++ 中将一点移向另一点？

下一篇：c++ - 尝试为 MFC 的 CMap 编写 for_each 算法