python - numpy ndarray hashability

I have some trouble understanding how the hashability of numpy objects is managed.

>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True

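How come

numpy objects define a __hash__ method but are not hashable,
while a class deriving from numpy.ndarray defines __hash__ and is hashable?

Am I missing something?

I'm using Python 2.7.1 and numpy 1.6.1.

Thanks for any help!

EDIT: added object ids

EDIT2: Following deinonychusaur's comment, and trying to figure out whether the hash is based on content, I played with numpy.ndarray.dtype and found something I find quite strange: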

>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]

I'm puzzled... is there some (type-independent) caching mechanism in numpy?

Best answer

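I get the same results with Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None) and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick google I found this post on the python.scientific.devel mailing list, which states that arrays were never intended to be hashable, so why ndarray.__hash__ is defined, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.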

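EDIT: Note that on my setup nparray.__hash__() returns the same value as id(nparray) (in the question's session the two differ, but for the plain arrays each hash is still exactly id // 16, so it depends only on the object's address). This is just the default implementation. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that would not propagate to subclasses and would not stop you from calling ndarray.__hash__ explicitly?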
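The link between the ids and hashes in the question can be checked by plain arithmetic. The sketch below assumes (this is not stated in the answer) that a 64-bit CPython build computes the default object hash by rotating the object's address right by 4 bits. The helper name rotate_right_4 is made up for illustration, and the numbers are copied from the question's session.

```python
# Assumption: 64-bit CPython hashes a plain object by rotating its address
# (the value id() returns) right by 4 bits and reading the result as signed.
BITS = 64


def rotate_right_4(addr, bits=BITS):
    """Rotate an unsigned address right by 4 bits, then reinterpret as signed."""
    mask = (1 << bits) - 1
    rotated = ((addr >> 4) | (addr << (bits - 4))) & mask
    if rotated >= 1 << (bits - 1):
        rotated -= 1 << bits
    return rotated


# (name, id, __hash__()) triples copied from the question's session.
session = [
    ("nparray", 4315346832, 269709177),
    ("ndarray", 4315234352, 269702147),
    ("vector", 4299616456, -9223372036586049780),
    ("EDIT2 temporaries", 4317742576, 269858911),
]

for name, addr, reported in session:
    # Compare the rotated address with the hash reported in the session.
    print(name, rotate_right_4(addr) == reported)
```

Every line prints True for these values, so each reported hash is determined by the object's address alone and not by its contents or dtype. Under the same assumption, the identical ids in EDIT2 are consistent with each temporary Vector being freed before the next one is allocated at the same address, which would explain the identical hashes without any caching in numpy.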

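Things are different in Python 3.2.2 with the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see the Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False, as expected. hash(vector) also fails.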
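The Python 3 checks described above can be reproduced with a short script. This is a minimal sketch that assumes a recent Python 3 and numpy. It uses collections.abc.Hashable because the collections.Hashable alias used in the answer is no longer available in newer Python versions.

```python
import collections.abc

import numpy as np


class Vector(np.ndarray):
    pass


nparray = np.array([0.])
vector = Vector(shape=(1,), buffer=nparray)

# ndarray.__hash__ exists as an attribute, but its value is None.
print(np.ndarray.__hash__ is None)

# Neither the array nor the subclass instance is reported as hashable.
print(isinstance(nparray, collections.abc.Hashable))
print(isinstance(vector, collections.abc.Hashable))

# hash() raises TypeError for both objects.
errors = []
for obj in (nparray, vector):
    try:
        hash(obj)
    except TypeError as exc:
        errors.append(type(obj).__name__)
        print(type(obj).__name__, "->", exc)
```

Because Vector does not define its own __hash__, it inherits the None value from ndarray, so the subclass is unhashable as well.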

Regarding python - numpy ndarray hashability, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9785514/
