python - Why is testing containment with a dictionary faster than with a set?

Tags: python python-3.x dictionary set

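I know that, at their core, Python sets and Python dictionaries are very similar. Reading their respective sources (dict, set), it is clear that their lookups are nearly identical. While reading this answer, I decided to test the author's claim that "set lookups are faster than dict lookups" by putting together the following mock-up: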

from timeit import timeit
import random

universe = range(1,100000)
keys = random.sample(universe, 50000)
lookups = random.sample(universe, 50000)
dict_set = dict((k,True) for k in keys)
set_set = set(keys)

def dict_lookup():
    for l in lookups:
        l in dict_set

def set_lookup():
    for l in lookups:
        l in set_set

if __name__ == '__main__':
    set_victories = 0
    dict_victories = 0
    for i in range(100):
        dict_time = timeit('dict_lookup()', setup="from __main__ import dict_lookup", number=10000)
        set_time = timeit('set_lookup()', setup="from __main__ import set_lookup", number=10000)
        print("dict time: {}".format(dict_time))
        print("set time:  {}".format(set_time))
        if set_time < dict_time:
            set_victories += 1
        else:
            dict_victories += 1
    print("Sets were faster in  {} trials".format(set_victories))
    print("Dicts were faster in {} trials".format(dict_victories))

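Given how set lookups and dictionary lookups are implemented, I expected no difference in performance. What I actually found was the following final result: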

$ python3 --version
Python 3.4.5
$ python3 sets-vs-dicts.py
<snip - see below for full output>
Sets were faster in  2 trials
Dicts were faster in 98 trials

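So dictionaries are in fact consistently faster than sets. Of course, I am not suggesting we should all abandon sets in favor of the faster dictionaries: sets make the programmer's intent much clearer, and given the scale of the test the difference is pitifully small. Still, I do find this result very odd. What is going on here?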
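To put "pitifully small" into numbers, here is a rough back-of-the-envelope sketch. It takes one typical pair of timings from the full output and divides the gap by the number of lookups performed (10000 calls of 50000 lookups each). The variable names are mine and purely illustrative; the result is only as representative as the pair of lines chosen.

```python
# One typical pair of timings copied from the full output (seconds).
dict_time = 49.95089349988848   # 10000 calls of dict_lookup()
set_time = 50.65370349958539    # 10000 calls of set_lookup()

calls = 10000              # the `number` argument passed to timeit
lookups_per_call = 50000   # len(lookups) in the script
total_lookups = calls * lookups_per_call

gap_seconds = set_time - dict_time
gap_ns_per_lookup = gap_seconds / total_lookups * 1e9
relative_gap = gap_seconds / dict_time

print("gap per lookup: {:.2f} ns".format(gap_ns_per_lookup))
print("relative gap:   {:.2%}".format(relative_gap))
```

For this pair the gap works out to roughly 1.4 ns per lookup, or about 1.4% of the dict time, and part of each measurement is loop overhead rather than the lookup itself.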

If you are curious, the full output follows:

$ python3 set-vs-dict.py
dict time: 57.754860900342464
set time:  56.8056653002277
dict time: 50.8890880998224
set time:  50.642351899296045
dict time: 49.936297399923205
set time:  50.66272980067879
dict time: 49.92973940074444
set time:  50.65518939960748
dict time: 49.949383799917996
set time:  50.66877659969032
dict time: 49.93578719999641
set time:  50.64872649963945
dict time: 49.96432110015303
set time:  50.676835800521076
dict time: 49.95099350064993
set time:  50.64867010060698
dict time: 49.98275039996952
set time:  50.648987299762666
dict time: 49.92164439987391
set time:  50.66931669972837
dict time: 49.98953749984503
set time:  50.652459900826216
dict time: 49.95234560035169
set time:  50.65124330017716
dict time: 49.98174169939011
set time:  50.6712632002309
dict time: 49.93824000004679
set time:  50.65437529981136
dict time: 49.95089349988848
set time:  50.65370349958539
dict time: 49.963413699530065
set time:  50.65550949983299
dict time: 49.955208600498736
set time:  50.66121090017259
dict time: 49.94347499962896
set time:  50.64449250046164
dict time: 49.95420549996197
set time:  50.66687630023807
dict time: 49.92143050022423
set time:  50.64667259994894
dict time: 50.05037229973823
set time:  50.67966340016574
dict time: 49.93846719991416
set time:  50.64651320036501
dict time: 49.921281000599265
set time:  50.67906459979713
dict time: 49.942994699813426
set time:  50.65166569966823
dict time: 49.94313340075314
set time:  50.656177499331534
dict time: 49.94610709976405
set time:  50.65122799947858
dict time: 49.93874369934201
set time:  50.661101600155234
dict time: 49.94996269978583
set time:  50.63938449975103
dict time: 49.9602530002594
set time:  50.65474760066718
dict time: 49.91891669947654
set time:  50.663624899461865
dict time: 49.959330099634826
set time:  50.653377699665725
dict time: 49.98555530048907
set time:  50.64655719976872
dict time: 49.945239200256765
set time:  50.65128379967064
dict time: 49.95342260040343
set time:  50.65899199992418
dict time: 49.92802210059017
set time:  50.67100259941071
dict time: 49.942902400158346
set time:  50.74889140017331
dict time: 49.994800799526274
set time:  50.731577299535275
dict time: 49.98310230020434
set time:  50.747778999619186
dict time: 49.99376400001347
set time:  50.73122859932482
dict time: 50.00640409998596
set time:  50.68737949989736
dict time: 49.94556000083685
set time:  50.722481600008905
dict time: 49.98192979954183
set time:  50.72525530029088
dict time: 49.99698970001191
set time:  50.736096899956465
dict time: 49.94320739991963
set time:  50.71096289996058
dict time: 49.972679699771106
set time:  50.71838010009378
dict time: 49.957800599746406
set time:  50.747396499849856
dict time: 49.97235369961709
set time:  50.69941039942205
dict time: 49.951399500481784
set time:  50.647985899820924
dict time: 49.94027389958501
set time:  50.66828709933907
dict time: 49.94174600020051
set time:  50.65279300045222
dict time: 49.96716000046581
set time:  50.64943030010909
dict time: 49.95117200072855
set time:  50.65525580011308
dict time: 49.962328700348735
set time:  50.66319840028882
dict time: 49.960031100548804
set time:  50.672181099653244
dict time: 49.93908840045333
set time:  50.651302699930966
dict time: 49.94130470044911
set time:  50.655242399312556
dict time: 50.04310019966215
set time:  50.67391949985176
dict time: 49.93010629992932
set time:  50.64970660023391
dict time: 49.991717299446464
set time:  50.65591560024768
dict time: 49.952454400248826
set time:  50.649492600001395
dict time: 49.92677689995617
set time:  50.635977199301124
dict time: 49.95432769972831
set time:  50.64075019955635
dict time: 49.94808299932629
set time:  50.664196100085974
dict time: 49.966013699769974
set time:  50.649582100100815
dict time: 49.9813024001196
set time:  50.64982909988612
dict time: 49.93897459935397
set time:  50.66509110014886
dict time: 49.95878900028765
set time:  50.649003400467336
dict time: 49.96674569975585
set time:  50.69693780038506
dict time: 49.91303739976138
set time:  50.675189800560474
dict time: 49.950330699793994
set time:  50.64532170072198
dict time: 49.95022019930184
set time:  50.65448010060936
dict time: 49.95197269972414
set time:  50.65391890052706
dict time: 49.94361769966781
set time:  50.67086180020124
dict time: 49.95455109979957
set time:  50.670443600043654
dict time: 49.94633509963751
set time:  50.65955980028957
dict time: 49.967472000047565
set time:  50.66301089990884
dict time: 49.95830660033971
set time:  50.67482869978994
dict time: 49.984512499533594
set time:  50.67321899998933
dict time: 50.01141999941319
set time:  50.84260869957507
dict time: 50.31206789985299
set time:  51.02959220018238
dict time: 50.28449110034853
set time:  51.03110689949244
dict time: 50.303432799875736
set time:  51.02032170072198
dict time: 50.281682999804616
set time:  51.05188430007547
dict time: 50.30898350011557
set time:  51.01742030028254
dict time: 50.3027657000348
set time:  51.02114639990032
dict time: 50.00038649979979
set time:  50.65360379964113
dict time: 49.93306410033256
set time:  50.63413709960878
dict time: 49.95266539976001
set time:  50.65499630011618
dict time: 49.94854210037738
set time:  50.703547400422394
dict time: 49.96691229939461
set time:  50.69470370002091
dict time: 49.95223430078477
set time:  50.70982529968023
dict time: 49.954243999905884
set time:  50.791720499284565
dict time: 49.97948960028589
set time:  50.69436000008136
dict time: 49.98102519940585
set time:  50.73820179980248
dict time: 49.96782180014998
set time:  50.722959300503135
dict time: 49.9863857999444
set time:  50.70789400022477
dict time: 49.9592831004411
set time:  50.707397900521755
dict time: 49.94034240022302
set time:  50.667025099508464
dict time: 49.96215169969946
set time:  50.72984409984201
dict time: 49.98776920046657
set time:  50.72097889985889
Sets were faster in  2 trials
Dicts were faster in 98 trials

Best answer

When I tested your code, I thought the numbers might be a bit small. So I increased them by a factor of 10 and had random.sample draw 1 out of every 100 numbers.

import random
from time import time


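def timeit(func):
    # Decorator: the wrapped call returns the elapsed wall-clock time in
    # seconds instead of the function's own return value.
    def wrap(*args):
        start = time()
        func(*args)
        return time() - start
    return wrap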


def get_set_and_dict():
    universe = range(1, 10**8)
    keys = random.sample(universe, 10**6)
    lookups = random.sample(universe,10**6)
    dict_set = dict((k,True) for k in keys)
    set_set = set(keys)
    return dict_set, set_set, lookups


@timeit
def test(container, lookups):

    for i in lookups:
        a = i in container


def main():
    dict_set, set_set, lookups = get_set_and_dict()
    acc_set = acc_dict = 0
    rounds = 100
    for _ in range(rounds):
        acc_dict += test(dict_set, lookups)
        acc_set += test(set_set, lookups)
    print("Set time: {:.4f}s\n Dict time: {:.4f}s".format(acc_set/rounds, acc_dict/rounds))

if __name__ == '__main__':
    main()

>> Set time: 0.1263s
>> Dict time: 0.1578s

But it makes sense for set and dict to differ somewhat: even though they are similar, they are not the same thing.

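Perhaps the conclusion simply depends on how the experiment is set up: in this run sets came out ahead, the opposite of your result.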
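One way to make such a comparison less sensitive to setup is sketched below. It is an illustration under my own assumptions, not the method used for the numbers above: it fixes the random seed, uses `timeit.repeat` from the standard library, and reports the minimum over several repeats, which tends to be the least noisy figure. The helper names `build` and `best_time` and the sizes are hypothetical and deliberately small so that it runs quickly.

```python
import random
from timeit import repeat


def build(n_keys=10000, universe_size=100000, seed=0):
    # Fixed seed so that every run sees the same keys and lookups.
    rng = random.Random(seed)
    universe = range(1, universe_size)
    keys = rng.sample(universe, n_keys)
    lookups = rng.sample(universe, n_keys)
    return dict.fromkeys(keys, True), set(keys), lookups


def best_time(container, lookups, number=20, repeats=5):
    # Time the same loop for any container; keep the fastest repeat.
    def run():
        for x in lookups:
            x in container
    return min(repeat(run, number=number, repeat=repeats))


dict_set, set_set, lookups = build()
dict_best = best_time(dict_set, lookups)
set_best = best_time(set_set, lookups)
print("dict best: {:.4f}s".format(dict_best))
print("set best:  {:.4f}s".format(set_best))
```

Which container wins can still vary with the Python version, the container sizes, and the hit rate of the lookups, so a single run of this sketch settles nothing on its own.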

Regarding "python - Why is testing containment with a dictionary faster than with a set?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40633006/
