I discovered the undocumented _md5 module while getting frustrated with the slow stdlib hashlib.md5 implementation.
On a MacBook:
>>> timeit hashlib.md5(b"hello world")
597 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"hello world")
224 ns ± 3.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> _md5
<module '_md5' from '/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_md5.cpython-37m-darwin.so'>
On a Windows box:
>>> timeit hashlib.md5(b"stonk overflow")
328 ns ± 21.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"stonk overflow")
110 ns ± 12.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> _md5
<module '_md5' (built-in)>
On a Linux box:
>>> timeit hashlib.md5(b"https://adventofcode.com/2016/day/5")
259 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"https://adventofcode.com/2016/day/5")
102 ns ± 0.0576 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> _md5
<module '_md5' from '/usr/local/lib/python3.8/lib-dynload/_md5.cpython-38-x86_64-linux-gnu.so'>
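For hashing short messages, _md5.md5 is much faster (roughly 2.5x to 3x in the timings above). For long messages, performance is similar.
Why is it hidden away in an underscore extension module, and why isn't this faster implementation used by default in hashlib? What is the _md5 module, and why doesn't it have a public API?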
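For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the stdlib timeit module. The helper name ns_per_call and the message sizes are my own choices, not part of the original measurements, and _md5 is a private module that may be missing on some builds, hence the guarded import. The lambda adds some call overhead, so absolute numbers will be higher than the %timeit figures above.

```python
import hashlib
import timeit

try:
    import _md5  # private CPython module; may be absent on some builds
except ImportError:
    _md5 = None


def ns_per_call(constructor, data, number=50_000):
    """Best-of-5 average time, in nanoseconds, of one constructor(data) call."""
    timer = timeit.Timer(lambda: constructor(data))
    best = min(timer.repeat(repeat=5, number=number))
    return best / number * 1e9


if __name__ == "__main__":
    cases = [("short (11 bytes)", b"hello world", 50_000),
             ("long (1 MiB)", b"x" * (1 << 20), 20)]
    for label, data, number in cases:
        print(label, "hashlib.md5:", round(ns_per_call(hashlib.md5, data, number)), "ns")
        if _md5 is not None:
            print(label, "_md5.md5:", round(ns_per_call(_md5.md5, data, number)), "ns")
```

Results depend heavily on the Python build and the OpenSSL version it links against, so treat the output as indicative only.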
Best answer
It is common for Python public modules to delegate to a hidden module.
For example, the full code of the collections.abc module is:
from _collections_abc import *
from _collections_abc import __all__
The functions of hashlib are dynamically created:
for __func_name in __always_supported:
    # try them all, some may not work due to the OpenSSL
    # version not supporting that algorithm.
    try:
        globals()[__func_name] = __get_hash(__func_name)
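The same "try each name, keep the ones that work" pattern can be sketched in isolation. Note the assumptions: a plain dict called namespace stands in for the module globals that hashlib writes into, and the name no_such_hash is made up to show the failure path.

```python
import hashlib

# Hypothetical stand-in for a module namespace; hashlib itself uses globals().
namespace = {}

for func_name in ("md5", "sha256", "no_such_hash"):
    try:
        hashlib.new(func_name)  # raises ValueError for unsupported algorithms
    except ValueError:
        continue  # skip names this build cannot provide
    # Bind the current name as a default argument so each lambda keeps its own.
    namespace[func_name] = lambda data=b"", _name=func_name: hashlib.new(_name, data)

print(sorted(namespace))
```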
The definition of __always_supported is:
__always_supported = ('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512',
                      'blake2b', 'blake2s',
                      'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512',
                      'shake_128', 'shake_256')
And __get_hash is either __get_openssl_constructor or __get_builtin_constructor:
try:
    import _hashlib
    new = __hash_new
    __get_hash = __get_openssl_constructor
    algorithms_available = algorithms_available.union(
            _hashlib.openssl_md_meth_names)
except ImportError:
    new = __py_new
    __get_hash = __get_builtin_constructor
__get_builtin_constructor is a fallback for the (again) hidden _hashlib module:
def __get_openssl_constructor(name):
    if name in __block_openssl_constructor:
        # Prefer our blake2 and sha3 implementation.
        return __get_builtin_constructor(name)
    try:
        f = getattr(_hashlib, 'openssl_' + name)
        # Allow the C module to raise ValueError.  The function will be
        # defined but the hash not actually available thanks to OpenSSL.
        f()
        # Use the C function directly (very fast)
        return f
    except (AttributeError, ValueError):
        return __get_builtin_constructor(name)
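Stripped of the hashlib specifics, the selection logic is "use the preferred backend unless the name is blocked, missing, or refuses to run". Below is a toy sketch of that idea. Everything in it (pick_constructor, the dict-based backends, the string-returning constructors) is hypothetical and only mirrors the shape of the code above.

```python
def pick_constructor(name, preferred, fallback, blocked=frozenset()):
    """Return preferred[name] if it exists and works, else fallback[name].

    `preferred` and `fallback` are hypothetical dicts mapping algorithm names
    to constructors; they stand in for _hashlib and the builtin modules.
    """
    if name in blocked:
        return fallback[name]
    try:
        constructor = preferred[name]
        constructor()  # probe: a backend may define the name but refuse to run it
        return constructor
    except (KeyError, ValueError):
        return fallback[name]


def disabled():
    raise ValueError("defined, but not available in this backend")


preferred = {"md5": lambda: "preferred md5", "sha1": disabled}
fallback = {"md5": lambda: "fallback md5",
            "sha1": lambda: "fallback sha1",
            "blake2b": lambda: "fallback blake2b"}

print(pick_constructor("md5", preferred, fallback)())      # preferred md5
print(pick_constructor("sha1", preferred, fallback)())     # fallback sha1
print(pick_constructor("blake2b", preferred, fallback,
                       blocked={"blake2b"})())             # fallback blake2b
```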
Above that in the hashlib code, you have this:
def __get_builtin_constructor(name):
    cache = __builtin_constructor_cache
    ...
    elif name in {'MD5', 'md5'}:
        import _md5
        cache['MD5'] = cache['md5'] = _md5.md5
But md5 is not in __block_openssl_constructor, so the _hashlib/openssl version is preferred over the _md5/builtin version. Confirmation in the REPL:
>>> hashlib.md5
<built-in function openssl_md5>
>>> _md5.md5
<built-in function md5>
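The same check can be scripted, along with a sanity test that both implementations produce the same digest. This is a sketch: the function name md5_backends_agree is mine, and the import of _md5 is guarded because a private module is not guaranteed to exist on every build.

```python
import hashlib

try:
    import _md5  # private; not guaranteed to exist
except ImportError:
    _md5 = None


def md5_backends_agree(data):
    """Return True if _md5.md5 and hashlib.md5 give the same digest.

    Returns None when the private _md5 module is unavailable.
    """
    if _md5 is None:
        return None
    return _md5.md5(data).hexdigest() == hashlib.md5(data).hexdigest()


# The name reveals the backend, e.g. "openssl_md5" when OpenSSL provides it.
print(hashlib.md5.__name__)
print(md5_backends_agree(b"hello world"))
```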
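Those functions are different implementations of the MD5 algorithm, and openssl_md5 makes a call to a dynamic system library. That's why you see some difference in performance. The first version is defined in https://github.com/python/cpython/blob/master/Modules/_hashopenssl.c and the other in https://github.com/python/cpython/blob/master/Modules/md5module.c, if you want to check the differences.
Then why is the _md5.md5 function defined but never used? I guess the idea is to ensure that some algorithms are always available, even if openssl is absent:
Constructors for hash algorithms that are always present in this module are sha1(), sha224(), sha256(), sha384(), sha512(), blake2b(), and blake2s(). (https://docs.python.org/3/library/hashlib.html)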
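hashlib exposes this distinction through two public attributes, so you can see what your own build guarantees versus what it merely happens to have available. A minimal sketch (the variable names are mine):

```python
import hashlib

guaranteed = hashlib.algorithms_guaranteed  # names hashlib promises on all platforms
available = hashlib.algorithms_available    # names usable in this interpreter

print(sorted(guaranteed))
# Extras that are only present because of the backend in this build:
print(sorted(available - guaranteed))
```

On an OpenSSL-backed build the second list is typically non-empty; its exact contents depend on the OpenSSL version.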
This entry is based on a similar question on Stack Overflow, "python - What is _md5.md5 and why is hashlib.md5 so much slower?": https://stackoverflow.com/questions/59955854/