c - 如何设计一个不使用这么多内存的哈希表？

为了学习 C，我正在设计一个简单的对象系统，也许是为了类似 Forth 的语言。我设计的数据结构之一是哈希表，或 hash_t。

typedef struct {
  array_t* keys;     // intelligent array object
  assoc_t* vals;     // array<pair>

  ssize_t* idxs;     // raw C array
  size_t   idxs_len; // and its length

} hash_t;

我已经根据我对 Python 3.6's dictionaries 的描述的理解实现了它:

a hashtable consists of:
    non-sparse array_t of keys
    non-sparse associative array of pairs of values and their key's hashes
    sparse raw array of which values map to which actual entries.

  { 1: 'a', 2: 'b', 3: 'c' }

  is represented in this structure as:

  hash->keys = array{ 1, 2, 3 }
  hash->vals = assoc{
    pair{ 'a' 5 }
    pair{ 'b' 7 }
    pair{ 'c' 9 }
  }
  hash->idxs = array{ -1 -1 -1 -1 -1  0 -1  1 -1  2  -1 }
                                      ^     ^     ^ 
                                      5     7     9

  where 5, 7, and 9 are the hashes of 1, 2, and 3.

-1 取代了 Python 帖子中的 None 以指示不存在的值。

当我的 key 1(字符串化)散列为 0x340ca71c 或 873,244,444 时，问题就出现了。因此，键索引数组 (hash->idxs) 需要是 sizeof (ssize_t) * (873,244,444 + 1)，或者 8 * 873,244,444 = 6,985,955,552 字节，或约 7 GB，这比我的笔记本电脑的 RAM 还多，也比我希望一个哈希表占用的 RAM 多。

当然，我在 Python 中创建的每个字典都不会占用数百万甚至数十万字节的 RAM，但它似乎是在 C 中以这种方式实现的。我错过了什么？

最佳答案

根据哈希将具有的项数决定您希望哈希具有多少个桶，然后将哈希范围缩小到该范围。

因此，如果您希望一个散列能够容纳大约 100,000 个项目，每个桶大约有 10 个项目，那么您需要大约 10,000 个桶。因此，在计算哈希后，将哈希对 10,000 取模来决定将项目放入哪个桶中。

通常，桶计数使用质数往往效果最好，所以可能是 9,973。

关于c - 如何设计一个不使用这么多内存的哈希表？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/40267815/

c - 如何设计一个不使用这么多内存的哈希表？

上一篇：python - 查找列表中的三元组数

下一篇：python - 带有 "while"循环的 PYTHON 上的基本 FACTORIAL 算法