Python pickle protocol choice?

Tags: python python-2.7 numpy pickle

I am using Python 2.7 and trying to pickle an object. I would like to know what the real differences between the pickle protocols are.

import numpy as np
import pickle

class Data(object):
    def __init__(self):
        self.a = np.zeros((100, 37000, 3), dtype=np.float32)

d = Data()
print("data size: ", d.a.nbytes / 1000000.0)
print("highest protocol: ", pickle.HIGHEST_PROTOCOL)
# Pickle data is binary, so the files are opened in "wb" mode.
pickle.dump(d, open("noProt", "wb"))
pickle.dump(d, open("prot0", "wb"), protocol=0)
pickle.dump(d, open("prot1", "wb"), protocol=1)
pickle.dump(d, open("prot2", "wb"), protocol=2)


out >> data size:  44.4
out >> highest protocol:  2

I then found that the saved files have different sizes on disk:

  • noProt: 177.6MB
  • prot0: 177.6MB
  • prot1: 44.4MB
  • prot2: 44.4MB

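I know that prot0 is a human-readable text file, so I do not want to use it. I guess protocol 0 is the one used by default.

I would like to know what the difference between protocols 1 and 2 is. Is there a reason to choose one over the other?

Which is better to use, pickle or cPickle?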

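Best answer

Use the latest protocol that is supported by the lowest Python version that needs to read the data. Newer protocol versions support new language features and include optimizations.

From the pickle module data format documentation: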

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

  • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
  • Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

And from the pickle.Pickler(...) class section:

The optional protocol argument, an integer, tells the pickler to use the given protocol; supported protocols are 0 to HIGHEST_PROTOCOL. If not specified, the default is DEFAULT_PROTOCOL. If a negative number is specified, HIGHEST_PROTOCOL is selected.
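The rules in the quoted paragraph can be checked with a short sketch. It assumes Python 3 (pickle.DEFAULT_PROTOCOL does not exist in Python 2), and both the payload dictionary and the protocol_of helper are hypothetical names introduced only for this illustration.

```python
import pickle

payload = {"name": "example", "values": [1, 2, 3]}  # hypothetical sample data

# A negative protocol number selects HIGHEST_PROTOCOL.
highest = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
negative = pickle.dumps(payload, protocol=-1)
print("negative protocol equals highest:", negative == highest)

# Omitting the argument uses DEFAULT_PROTOCOL.
default = pickle.dumps(payload)
explicit_default = pickle.dumps(payload, protocol=pickle.DEFAULT_PROTOCOL)
print("omitted protocol equals default:", default == explicit_default)


def protocol_of(data):
    """Return the protocol number recorded at the start of a pickle.

    Pickles written with protocol 2 or later begin with the PROTO opcode
    (byte 0x80) followed by the protocol number. Protocols 0 and 1 have no
    such header, so None is returned for them.
    """
    if data[:1] == b"\x80":
        return data[1]
    return None


print("recorded protocol:", protocol_of(highest))
```

Reading the header byte this way is only a quick check; the standard pickletools module can disassemble a pickle in full when more detail is needed.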

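So if you want to support loading the pickled data with Python 3.4 or later, choose protocol 4. If you still need to support Python 2.7, choose protocol 2, especially if you use custom classes derived from object (new-style classes), which any modern code does.

However, unless you are exchanging pickled data with other Python versions or need to maintain backward compatibility with older Python versions, the simplest approach is to stick with the highest protocol version available to you: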
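As a rough sketch of that selection rule, the version table below is copied from the documentation quoted above. The protocol_for helper and the INTRODUCED_IN name are hypothetical and not part of the pickle module.

```python
import pickle

# Python version that introduced each protocol, per the quoted documentation.
# Protocols 0 and 1 predate Python 2.3 and are treated as always readable.
INTRODUCED_IN = {2: (2, 3), 3: (3, 0), 4: (3, 4), 5: (3, 8)}


def protocol_for(oldest_reader):
    """Pick the newest protocol that the oldest reading Python understands.

    oldest_reader is a (major, minor) tuple such as (2, 7). The result is
    capped at the highest protocol that the writing interpreter supports.
    """
    best = 1  # oldest binary protocol, used when nothing newer qualifies
    for protocol, version in sorted(INTRODUCED_IN.items()):
        if oldest_reader >= version:
            best = protocol
    return min(best, pickle.HIGHEST_PROTOCOL)


print(protocol_for((2, 7)))  # protocol for data that Python 2.7 must read
print(protocol_for((3, 4)))  # protocol for data that Python 3.4 must read
```

Tuple comparison keeps every 2.x reader below protocol 3, which matches the note that protocol 3 cannot be unpickled by Python 2.x.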

with open("prot2", 'wb') as pfile:
    pickle.dump(d, pfile, protocol=pickle.HIGHEST_PROTOCOL)

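pickle.HIGHEST_PROTOCOL will always be the right version for the current Python version. Because this is a binary format, be sure to use 'wb' as the file mode!

Python 3 no longer distinguishes between cPickle and pickle; always use pickle when using Python 3. It uses a C implementation under the hood.

If you are still using Python 2, cPickle and pickle are mostly compatible; the differences lie in the API they provide. For most use cases, just stick with cPickle; it is faster. Quoting the documentation again: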
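A minimal round-trip sketch of the binary file modes follows, written for Python 3 (tempfile.TemporaryDirectory is not available in Python 2). The record dictionary is a hypothetical stand-in for the NumPy-backed object in the question, and the printed sizes will differ from the figures reported there.

```python
import os
import pickle
import tempfile

record = {"id": 7, "samples": list(range(1000))}  # hypothetical stand-in data

sizes = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        path = os.path.join(tmpdir, "prot%d" % protocol)
        # Write in binary mode ("wb") and read back in binary mode ("rb").
        with open(path, "wb") as pfile:
            pickle.dump(record, pfile, protocol=protocol)
        with open(path, "rb") as pfile:
            restored = pickle.load(pfile)
        assert restored == record
        sizes[protocol] = os.path.getsize(path)

for protocol, size in sorted(sizes.items()):
    print("protocol %d: %d bytes" % (protocol, size))
```

The protocol does not need to be passed to pickle.load; the reader detects it from the data.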

First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.
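For code that has to run on both Python 2 and Python 3, a common import fallback looks like the sketch below. The sample list is only an illustration.

```python
try:
    import cPickle as pickle  # Python 2: the faster C implementation
except ImportError:
    import pickle  # Python 3: cPickle no longer exists as a separate module

# Protocol 2 is the highest protocol that Python 2 can read and write.
data = pickle.dumps([1, 2, 3], protocol=2)
restored = pickle.loads(data)
print(restored)
```

Because cPickle exposes Pickler() and Unpickler() as functions, this fallback is not suitable for code that needs to subclass them, as the quoted paragraph points out.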

Regarding "Python pickle protocol choice?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23582489/
