hash - 内部哈希和外部哈希之间的区别

标签 hash hashmap hashtable

摘自《数据库系统基础》一书:

内部哈希

For internal files, hashing is typically implemented as a hash table through the use of an array of records. Suppose that the array index range is from 0 to M – 1, as shown in Figure 17.8(a); then we have M slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and M − 1. One common hash function is the h(K) = K mod M function, which returns the remainder of an integer hash field value K after divi- sion by M; this value is then used for the record address. [...]

A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record.


外部哈希

Hashing for disk files is called external hashing. To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous disk blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts the bucket number into the corresponding disk block address, as illustrated in Figure 17.9. The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems.

我有以下问题:
1) 一条记录总是记录在一个区 block 内,那么内部哈希是否返回区 block 地址和记录在区 block 内的位置?
2)为什么使用外部哈希冲突问题不那么严重?我的意思是,假设每个 block 可以存储 10 条记录;我推测我将存储的文件包含 100 条记录,然后,使用外部哈希,我可能分配 11-12 个桶(我假设一个桶=1 block ),因此哈希函数将向桶返回 10-12 个地址。
如果我使用内部散列,因为散列函数返回一个直接地址,我会使用一个返回给我大约 100-120 个地址的函数。
那么有什么区别呢?这样,我认为使用这两种方法我有相同的碰撞概率。

最佳答案

内部散列是一个包含散列键地址的数组。因此每个数组索引只能包含哈希键的一个地址,因此如果另一个哈希键分配给数组的相同索引,这将导致冲突。另一方面,外部哈希主要是M个桶。每个桶可以占用多个哈希键。因此当桶装满时就会发生碰撞。

关于hash - 内部哈希和外部哈希之间的区别,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/23547519/

相关文章:

java - 关于 HashMap.get(key) 函数

java - HashTable的get方法打印所有元素

data-structures - Haskell 可变映射/树

c# - 如何为地址结构生成唯一标识符?

c++ - 为大于内存的数据生成哈希(不会被捕)

c - 中断c代码,md5文件

java - 如果 HashMap 中的两个键相同,如何打印错误消息? (JAVA)

powershell - 在powershell中迭代多维哈希表

powershell - 将两个哈希表连接成一个

algorithm - 基于集合的哈希(摘要)算法?