我注意到它可以在速度方面产生相当大的差异, 如果您通过参数指定 pickle.dumps 中使用的协议(protocol)或者如果您 猴子补丁 pickle.DEFAULT_PROTOCOL 用于所需的协议(protocol)版本。
在 Python 3.6 上,pickle.DEFAULT_PROTOCOL
为 3 并且
pickle.HIGHEST_PROTOCOL
是 4。
对于达到一定长度的对象,设置似乎更快
DEFAULT_PROTOCOL 为 4,而不是传递 protocol=4
作为参数。
例如,在我的测试中,将 pickle.DEFAULT_PROTOCOL 设置为 4 并进行 pickle
通过调用 pickle.dumps(packet_list_1)
生成长度为 1 的列表需要 481 ns,而使用 pickle.dumps(packet_list_1, protocol=4)
调用则需要 733 ns,这是一个惊人的结果显式传递协议(protocol)而不是回退到默认值(之前设置为 4)的速度损失约为 52%。
"""
(stackoverflow insists this to be formatted as code:)
pickle.DEFAULT_PROTOCOL = 4
pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):
(stackoverflow insists this to be formatted as code:)
For a list with length 1 it's 481ns vs 733ns (~52% penalty).
For a list with length 10 it's 763ns vs 999ns (~30% penalty).
For a list with length 100 it's 2.99 µs vs 3.21 µs (~7% penalty).
For a list with length 1000 it's 25.8 µs vs 26.2 µs (~1.5% penalty).
For a list with length 1_000_000 it's 32 ms vs 32.4 ms (~1.13% penalty).
"""
我发现实例、列表、字典和数组都有这种行为,即 到目前为止我测试过的所有内容。效果随着对象大小而减弱。
对于听写,我注意到效果在某些时候变成了相反的效果,所以 对于长度为 10**6 的字典(具有唯一的整数值),显式地更快 传递协议(protocol) = 4 作为参数(269 毫秒),而不是依赖默认设置为 4(286 毫秒)。
"""
pickle.DEFAULT_PROTOCOL = 4
pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):
For a dict with length 1 it's 589 ns vs 811 ns (~38% penalty).
For a dict with length 10 it's 1.59 µs vs 1.81 µs (~14% penalty).
For a dict with length 100 it's 13.2 µs vs 12.9 µs (~2,3% penalty).
For a dict with length 1000 it's 128 µs vs 129 µs (~0.8% penalty).
For a dict with length 1_000_000 it's 306 ms vs 283 ms (~7.5% improvement).
"""
瞥见 pickle 源,没有什么引起我的注意 这样的变化。
如何解释这种意外行为?
设置 pickle.DEFAULT_PROTOCOL 而不是传递时是否有任何警告 协议(protocol)作为参数来利用改进的速度?
(在 Python 3.6.3、IPython 6.2.1、Windows 7 上使用 IPython 的 timeit magic 进行计时)
一些示例代码转储:
# instances -------------------------------------------------------------
class Dummy: pass
dummy = Dummy()
pickle.DEFAULT_PROTOCOL = 3
"""
>>> %timeit pickle.dumps(dummy)
5.8 µs ± 33.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(dummy, protocol=4)
6.18 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
%timeit pickle.dumps(dummy)
5.74 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit pickle.dumps(dummy, protocol=4)
6.24 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
# lists -------------------------------------------------------------
packet_list_1 = [*range(1)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>>%timeit pickle.dumps(packet_list_1)
476 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_1, protocol=4)
730 ns ± 2.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1)
481 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_1, protocol=4)
733 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_10 = [*range(10)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>>%timeit pickle.dumps(packet_list_10)
714 ns ± 3.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_10, protocol=4)
978 ns ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_10)
763 ns ± 3.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_10, protocol=4)
999 ns ± 8.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_100 = [*range(100)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>>%timeit pickle.dumps(packet_list_100)
2.96 µs ± 5.16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>>%timeit pickle.dumps(packet_list_100, protocol=4)
3.22 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_100)
2.99 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>>%timeit pickle.dumps(packet_list_100, protocol=4)
3.21 µs ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
# --------------------------
packet_list_1000 = [*range(1000)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>>%timeit pickle.dumps(packet_list_1000)
26 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>>%timeit pickle.dumps(packet_list_1000, protocol=4)
26.4 µs ± 93.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1000)
25.8 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>>%timeit pickle.dumps(packet_list_1000, protocol=4)
26.2 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
# --------------------------
packet_list_1m = [*range(10**6)]
pickle.DEFAULT_PROTOCOL = 3
"""
>>>%timeit pickle.dumps(packet_list_1m)
32 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>%timeit pickle.dumps(packet_list_1m, protocol=4)
32.3 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1m)
32 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>%timeit pickle.dumps(packet_list_1m, protocol=4)
32.4 ms ± 466 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
最佳答案
让我们按返回值重新组织 %timeit 结果:
| DEFAULT_PROTOCOL | call | %timeit | returns |
|------------------+-----------------------------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------|
| 3 | pickle.dumps(dummy) | 5.8 µs ± 33.5 ns | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.' |
| 4 | pickle.dumps(dummy) | 5.74 µs ± 18.8 ns | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.' |
| 3 | pickle.dumps(dummy, protocol=4) | 6.18 µs ± 10.4 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.' |
| 4 | pickle.dumps(dummy, protocol=4) | 6.24 µs ± 26.7 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.' |
| 3 | pickle.dumps(packet_list_1) | 476 ns ± 1.01 ns | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.' |
| 4 | pickle.dumps(packet_list_1) | 481 ns ± 2.12 ns | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.' |
| 3 | pickle.dumps(packet_list_1, protocol=4) | 730 ns ± 2.22 ns | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |
| 4 | pickle.dumps(packet_list_1, protocol=4) | 733 ns ± 2.94 ns | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |
请注意,当我们将给出相同返回值的调用配对时,%timeit
结果如何很好地对应。
如您所见,pickle.DEFAULT_PROTOCOL
的值对 pickle.dumps
返回的值没有影响。
如果不指定协议(protocol)参数,则无论 pickle.DEFAULT_PROTOCOL
的值是什么,默认协议(protocol)都是 3。
# Use the faster _pickle if possible
try:
from _pickle import (
PickleError,
PicklingError,
UnpicklingError,
Pickler,
Unpickler,
dump,
dumps,
load,
loads
)
except ImportError:
Pickler, Unpickler = _Pickler, _Unpickler
dump, dumps, load, loads = _dump, _dumps, _load, _loads
如果pickle
模块成功导入_pickle
,则将pickle.dumps
设置为_pickle.dumps
, pickle 模块的编译版本。
_pickle
模块默认使用 protocol=3
。仅当 Python 无法导入 _pickle
时,dumps
设置为 the Python version :
def _dumps(obj, protocol=None, *, fix_imports=True):
f = io.BytesIO()
_Pickler(f, protocol, fix_imports=fix_imports).dump(obj)
res = f.getvalue()
assert isinstance(res, bytes_types)
return res
只有 Python 版本 _dumps
受 pickle.DEFAULT_PROTOCOL
值的影响:
In [68]: pickle.DEFAULT_PROTOCOL = 3
In [70]: pickle._dumps(dummy)
Out[70]: b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'
In [71]: pickle.DEFAULT_PROTOCOL = 4
In [72]: pickle._dumps(dummy)
Out[72]: b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'
关于python - 通过猴子修补 DEFAULT_PROTOCOL 提高 pickle.dumps 的性能?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47892816/