data-structures - 哈希表 - 使用二叉搜索树实现

来自破解编码面试，第 71 页:

Alternatively, we can implement hash table with a BST. We can then guarantee an O(log n) lookup time, since we can keep the tree balanced. Additionally we may use less space, since a large array no longer needs to be allocated in the very beginning.

我知道链表、哈希表和 BST 的基础知识，但我无法理解这些行。它实际上是什么意思？这个最终的数据结构会是一个 Trie 吗？

最佳答案

该部分的全文指出，最后一段是您询问的内容:

A hash table is a data structure that maps keys to values for highly efficient lookup. In a very simple implementation of a hash table, the hash table has an underlying array and a hash function. When you want to insert an object and its key, the hash function maps the key to an integer, which indicates the index in the array. The object is then stored at that index.

Typically, though, this won't quite work right. In the above implementation, the hash value of all possible keys must be unique, or we might accidentally overwrite data. The array would have to be extremely large—the size of all possible keys—to prevent such "collisions."

Instead of making an extremely large array and storing objects at index hash (key), we can make the array much smaller and store objects in a linked list at index hash (key) % array_length.To get the object with a particular key, we must search the linked list for this key.

Alternatively, we can implement the hash table with a binary search tree. We can then guarantee an 0(log n) lookup time, since we can keep the tree balanced. Additionally, we may use less space, since a large array no longer needs to be allocated in the very beginning.

所以他们在谈论使用 BST(二叉搜索树)来处理冲突。使用 BST 作为唯一的数据结构实际上没有意义，因为适当调整的散列的全部意义在于查找的顺序是 O(1) ，比O(log n)好多了来自 BST。最重要的是，使用 BST 来完全实现一个哈希表意味着它实际上不是一个哈希表 :-)

但是，请考虑到，当哈希表中发生冲突时，处理它们的常用方法是让每个桶包含其项目的链表。在退化的情况下(所有项目散列到同一个桶)，你最终只有一个链表和 O(1)变成 O(n) .

因此，您有一个 BST，而不是每个存储桶中的链表。那你就没有O(n)在单个存储桶具有多个项目(前面提到的冲突)的情况下的搜索复杂度。

您使用哈希函数在 O(1) 中查找桶然后在 O(log n) 中搜索 BST如果有碰撞。在最好的情况下(每个桶一个项目)，它仍然是 O(1) .最坏的情况变成 O(log n)而不是 O(n) .

最初让我担心该理论的唯一一件事是他们还讨论了不再需要大量分配的事实。如果是共享哈希/BST 组合，您仍然需要分配整个哈希表，这样看起来不协调。

但是，从上下文(“...因为不再需要分配大数组...”)，它们似乎意味着它们可以使对偶数据结构的哈希表部分更小因为碰撞处理更有效。换句话说，与包含冲突链接列表的 1000 元素哈希表不同，您可以使用 100 元素哈希表，因为如果使用 BST，冲突不会对搜索时间造成太大损害。

关于data-structures - 哈希表 - 使用二叉搜索树实现，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/27739241/

data-structures - 哈希表 - 使用二叉搜索树实现

上一篇：asp.net-mvc - ConfigurationManager 读取错误的文件 - Web.config 而不是 App.config

下一篇：r - 解决错误消息 "step halving factor reduced below minimum in NLS step": adjusting nlsTols not working