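In terms of memory usage and CPU consumption, which is more efficient in Python - a dictionary or an object?
Background: I have to load a huge amount of data into Python. I created an object that is just a container for fields. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. Once the dictionary is ready, accessing it is a blink of an eye.
Example: To check the performance, I wrote two simple programs that do the same thing - one uses objects, the other uses dictionaries:
Object (execution time ~18sec):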
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
Dictionary (execution time ~12sec):
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
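Question: Am I doing something wrong, or is a dictionary just faster than an object? If dictionaries really do perform better, can somebody explain why?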
Best answer
Have you tried using __slots__?
From the documentation:
By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.
The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
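To make the quoted behavior concrete, here is a minimal sketch (the class names PlainObj and SlotsObj are illustrative, not from the question). It assumes a current CPython 3 interpreter, where every class is new-style. It shows that a slotted instance has no per-instance __dict__ and, as a consequence, refuses attributes that are not listed in __slots__:

```python
class PlainObj(object):
    """Ordinary class: each instance carries its own __dict__."""
    def __init__(self, i):
        self.i = i
        self.l = []


class SlotsObj(object):
    """Same fields, but storage is reserved via __slots__."""
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


plain = PlainObj(1)
slotted = SlotsObj(1)

# The plain instance has a __dict__; the slotted one does not.
has_dict_plain = hasattr(plain, '__dict__')
has_dict_slotted = hasattr(slotted, '__dict__')

# New attributes can be attached to a plain instance at any time...
plain.extra = 'ok'

# ...but a slotted instance only accepts the names listed in __slots__.
try:
    slotted.extra = 'fails'
    slots_rejects_new_attr = False
except AttributeError:
    slots_rejects_new_attr = True

print(has_dict_plain, has_dict_slotted, slots_rejects_new_attr)
```

The trade-off is flexibility: code that relies on adding attributes dynamically, or on reading obj.__dict__, will not work with the slotted class.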
So does this save time as well as memory?
Comparing the three approaches on my computer:
test_slots.py:
class Obj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_obj.py:
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_dict.py:
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
test_namedtuple.py (supported in 2.6):
import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
    all[i] = Obj(i, [])
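One thing to keep in mind before swapping a class for a named tuple: the tuple itself is immutable. The sketch below reuses the Obj definition from test_namedtuple.py and shows what that means in practice. Fields can be read by name or by index, a field cannot be rebound, but a mutable value stored in a field (the list in l) can still be changed in place:

```python
import collections

Obj = collections.namedtuple('Obj', 'i l')

o = Obj(1, [])

# Fields are readable by name and by position.
by_name = o.i
by_index = o[0]

# The list stored in the tuple is still mutable in place.
o.l.append('x')

# Rebinding a field is not allowed on a named tuple.
try:
    o.i = 2
    rebinding_allowed = True
except AttributeError:
    rebinding_allowed = False

# _replace builds a new tuple with the changed field;
# the untouched field still refers to the same list object.
o2 = o._replace(i=2)

print(by_name, by_index, rebinding_allowed, o2)
```

If the loaded records need their fields reassigned later, the __slots__ class is likely the closer drop-in replacement for the original object.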
Running the benchmark (using CPython 2.5):
$ lshw | grep product | head -n 1
product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py
real 0m27.398s (using 'normal' object)
real 0m16.747s (using __dict__)
real 0m11.777s (using __slots__)
Using CPython 2.6.2, including the named tuple test:
$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py
real 0m27.197s (using 'normal' object)
real 0m17.657s (using __dict__)
real 0m12.249s (using __slots__)
real 0m12.262s (using namedtuple)
So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.
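The timings above come from CPython 2.5 and 2.6 on a Pentium M, so they are best treated as a snapshot. To repeat the comparison on a current interpreter, something like the following sketch could be used. It is not the script from the answer: the names PlainObj, SlotsObj, TupleObj, FACTORIES, build and measure are made up for illustration, N is reduced to 100000 to keep the run short, and it uses timeit and tracemalloc from the standard library (Python 3) instead of the shell's time command, so it also reports peak allocated memory. No results are shown here because they depend on the machine and the Python version.

```python
import collections
import timeit
import tracemalloc

N = 100000  # assumption: smaller than the 1,000,000 used in the answer


class PlainObj(object):
    def __init__(self, i):
        self.i = i
        self.l = []


class SlotsObj(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


TupleObj = collections.namedtuple('TupleObj', 'i l')

# One record factory per approach compared in the answer.
FACTORIES = {
    'object': PlainObj,
    'dict': lambda i: {'i': i, 'l': []},
    'slots': SlotsObj,
    'namedtuple': lambda i: TupleObj(i, []),
}


def build(factory, n):
    """Fill a dict with n records, mirroring the loops in the test scripts."""
    all_items = {}
    for i in range(n):
        all_items[i] = factory(i)
    return all_items


def measure(factory, n=N):
    """Return (seconds, peak_bytes) for building n records."""
    # Time one build without tracing, since tracemalloc slows allocation.
    seconds = timeit.timeit(lambda: build(factory, n), number=1)
    # Build again under tracemalloc to record peak traced memory.
    tracemalloc.start()
    data = build(factory, n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return seconds, peak


if __name__ == '__main__':
    for name, factory in FACTORIES.items():
        seconds, peak = measure(factory)
        print('%-10s %.3fs  peak %.1f MiB' % (name, seconds, peak / 2.0 ** 20))
```

A single run per approach is noisy. Repeating each measurement a few times and taking the minimum gives a steadier comparison.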
This question, "python - Dictionary vs Object - which is more efficient and why?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/1336791/