python - When is hash(n) == n in Python?

I've been playing with Python's hash function. For small integers, hash(n) == n always appears to hold. However, this does not extend to large numbers:

>>> hash(2**100) == 2**100
False

I'm not surprised; I understand that hash takes a finite range of values. What is that range?

I tried using binary search to find the smallest number n for which hash(n) != n:

>>> import codejamhelpers # pip install codejamhelpers
>>> help(codejamhelpers.binary_search)
Help on function binary_search in module codejamhelpers.binary_search:

binary_search(f, t)
    Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.

>>> f = lambda n: int(hash(n) != n)
>>> n = codejamhelpers.binary_search(f, 0)
>>> hash(n)
2305843009213693950
>>> hash(n+1)
0

What's special about 2305843009213693951? I note that it is less than sys.maxsize == 9223372036854775807.

Edit: I'm using Python 3. I ran the same binary search on Python 2 and got a different result, 2147483648, which I note is sys.maxint+1.

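I also used [hash(random.random()) for i in range(10**6)] to estimate the range of the hash function. The maximum is consistently below the n found above. Comparing the minimums, it seems that Python 3's hash is always positive, whereas Python 2's hash can be negative.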

Best answer

2305843009213693951 is 2^61 - 1. It is the largest Mersenne prime that fits into 64 bits.

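If you have to make a hash just by taking the value mod some number, then a large Mersenne prime is a good choice: it is easy to compute and ensures an even distribution of possibilities. (Although I personally would never make a hash this way.)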
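To make the "value mod some number" idea concrete, here is a minimal sketch that compares hash(n) with n reduced modulo the prime. It assumes CPython 3, where the modulus is exposed as sys.hash_info.modulus (2^61 - 1 on typical 64-bit builds). The helper name expected_int_hash is made up for illustration, and the sketch only covers non-negative integers.

```python
import sys

# Modulus CPython uses for numeric hashes (assumption: CPython 3;
# this is 2**61 - 1 on typical 64-bit builds).
P = sys.hash_info.modulus


def expected_int_hash(n: int) -> int:
    """Hypothetical helper: predicted hash of a non-negative int."""
    if n < 0:
        raise ValueError("this sketch only covers non-negative integers")
    return n % P


# hash(n) == n holds below P and stops holding from P onward.
for n in (0, 1, P - 1, P, P + 1, 2**100):
    print(n, hash(n) == n, hash(n) == expected_int_hash(n))
```

Reading the modulus from sys.hash_info rather than hard-coding 2**61 - 1 keeps the check valid on builds that use a different prime.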

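It is especially convenient to compute the modulus for floating-point numbers. They have an exponent component that multiplies the whole number by 2^x. Since 2^61 = 1 mod 2^61-1, you only need to consider (exponent) mod 61.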
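The exponent trick can be checked on floats that are exact powers of two. The sketch below again assumes CPython 3 and takes the exponent period from sys.hash_info.modulus (61 when the modulus is 2^61 - 1). The names BITS and expected_pow2_float_hash are made up for illustration.

```python
import sys

P = sys.hash_info.modulus   # assumption: CPython 3 numeric hash modulus
BITS = P.bit_length()       # 61 when P == 2**61 - 1, so 2**BITS % P == 1


def expected_pow2_float_hash(k: int) -> int:
    """Hypothetical helper: predicted hash of the float 2.0**k.

    Because 2**BITS is congruent to 1 modulo P, only k % BITS matters.
    Python's % returns a non-negative result, so negative k works too.
    """
    return 2 ** (k % BITS)


for k in (-1, 0, 1, BITS - 1, BITS, BITS + 1, 2 * BITS):
    print(k, hash(2.0 ** k), expected_pow2_float_hash(k))
```

Negative exponents wrap around as well: 0.5 is 2^-1, which is congruent to 2^(BITS-1) modulo the prime.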

See: https://en.wikipedia.org/wiki/Mersenne_prime

Original question on Stack Overflow: https://stackoverflow.com/questions/37612524/
