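python - Strange performance results: loop vs. list comprehension and zip()

I ran into a very simple problem, and when I tried to work out which solution is faster I got some strange results.

Original problem: given two lists ListA, ListB and a constant k, remove all entries at which the two lists sum to k.

I solved it in two ways: first with a loop, then with a list comprehension and zip() to zip and unzip the two lists.

Version using a loop: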

def Remove_entries_simple(listA, listB, k):
    """ removes entries that sum to k """
    new_listA = []
    new_listB = []
    for index in range(len(listA)):
        if listA[index] + listB[index] == k:
            pass
        else:
            new_listA.append(listA[index])
            new_listB.append(listB[index])
    return(new_listA, new_listB)

Version using a list comprehension and zip():

def Remove_entries_zip(listA, listB, k):
    """ removes entries that sum to k using zip"""
    zip_lists = [(a, b) for (a, b) in zip(listA, listB) if not (a+b) == k]

    # unzip the lists
    new_listA, new_listB = zip(*zip_lists)
    return(list(new_listA), list(new_listB))
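As a quick sanity check before timing anything, the two approaches can be compared on a small input. This is only a sketch: the lowercase function names and the sample lists are made up for illustration, and the guard for an empty result is an addition that the original zip version does not have (unpacking zip(*[]) into two names raises a ValueError).

```python
def remove_entries_simple(list_a, list_b, k):
    """Loop version: keep only the positions where a + b != k."""
    new_a, new_b = [], []
    for a, b in zip(list_a, list_b):
        if a + b != k:
            new_a.append(a)
            new_b.append(b)
    return new_a, new_b


def remove_entries_zip(list_a, list_b, k):
    """Comprehension version: build the kept pairs, then unzip them."""
    pairs = [(a, b) for a, b in zip(list_a, list_b) if a + b != k]
    if not pairs:
        # Added guard (not in the original): zip(*[]) yields nothing to unpack.
        return [], []
    new_a, new_b = zip(*pairs)
    return list(new_a), list(new_b)


# Illustrative data: positions 0 and 2 sum to 10 and are dropped.
sample_a = [1, 5, 3, 7]
sample_b = [9, 2, 7, 1]
print(remove_entries_simple(sample_a, sample_b, 10))  # ([5, 7], [2, 1])
print(remove_entries_zip(sample_a, sample_b, 10))     # ([5, 7], [2, 1])
```

Both versions should agree on every input, so any timing difference comes from how the work is done rather than from what is computed.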

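Then I tried to determine which approach is faster, and I got the results shown in the plot (x-axis: size of the lists; y-axis: average time per run, 10**3 repetitions). For some reason the version using zip() always makes the same jump at the same position; I ran it several times and on different machines. Can someone explain what causes this strange behavior?

Update: the code I used to generate the plot. I used a function decorator to run each problem 1000 times.

Import statements: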

import random
import time
import matplotlib.pyplot as plt

Function decorator:

def Repetition_Decorator(fun, Rep=10**2):
    ''' returns the average over Rep repetitions'''
    def Return_function(*args, **kwargs):
        # time.clock() was removed in Python 3.8; use time.perf_counter() there
        Start_time = time.clock()
        for _ in range(Rep):
            fun(*args, **kwargs)
        return (time.clock() - Start_time)/Rep

    return Return_function

Code that creates the plot:

Zippedizip = []
Loops = []
The_Number = 10
Size_list = list(range(10, 1000, 10))

Repeated_remove_loop = Repetition_Decorator(Remove_entries_simple, Rep=10**3)
Repeated_remove_zip = Repetition_Decorator(Remove_entries_zip, Rep=10**3)

for size in Size_list:
    ListA = [random.choice(range(10)) for _ in range(size)]
    ListB = [random.choice(range(10)) for _ in range(size)]

    Loops.append(Repeated_remove_loop(ListA, ListB, The_Number))
    Zippedizip.append(Repeated_remove_zip(ListA, ListB, The_Number))

plt.xlabel('Size of List')
plt.ylabel('Averaged time in seconds')
plt.plot(Size_list, Loops, label="Using Loop")
plt.plot(Size_list, Zippedizip, label="Zip")
plt.legend(loc='upper left', shadow=False, fontsize='x-large')
plt.show()

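Update-Update: thanks to kaya3 for pointing out the timeit module.

To stay as close as possible to my original code while also using the timeit module, I created a new function decorator that times the code with timeit.

New decorator: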

import timeit

def Repetition_Decorator_timeit(fun, Rep=10**2):
    """returns average over Rep repetitions with timeit"""
    def Return_function(*args, **kwargs):
        partial_fun = lambda: fun(*args, **kwargs)
        return timeit.timeit(partial_fun, number=Rep) / Rep
    return Return_function

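With the new decorator, the version using a for loop is unaffected, but the zip version no longer jumps.

By now I am fairly sure the jump is a result of how I measure the functions rather than of the functions themselves. But the jump is so distinct, always at the same list size on different machines, that it cannot be a fluke. Any idea why this jump happens?

Update-Update-Update:

It has to do with the garbage collector: if I disable the garbage collector with gc.disable(), both measurement methods give the same results.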
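The gc.disable() experiment can be sketched roughly as follows. This is a minimal illustration, not the code actually used for the measurements: average_time, its disable_gc flag, the lowercase function name, and the test lists are all hypothetical, and time.perf_counter() stands in for time.clock(). One plausible reading, which the thread does not confirm, is that the zip version allocates one tuple per kept pair, and those container allocations can push the cyclic collector past its threshold in the middle of a call once the lists are large enough.

```python
import gc
import time


def remove_entries_zip(list_a, list_b, k):
    """Comprehension version from the question, with renamed identifiers."""
    pairs = [(a, b) for a, b in zip(list_a, list_b) if a + b != k]
    new_a, new_b = zip(*pairs)
    return list(new_a), list(new_b)


def average_time(fun, *args, reps=1000, disable_gc=False):
    """Average time of fun(*args) over reps calls, optionally with GC off."""
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        for _ in range(reps):
            fun(*args)
        return (time.perf_counter() - start) / reps
    finally:
        # Restore the collector only if it was running before the call.
        if was_enabled:
            gc.enable()


# Illustrative input: no pair sums to 10, so every pair is kept.
list_a = [1] * 800
list_b = [2] * 800
print("collector thresholds:", gc.get_threshold())
print("GC enabled :", average_time(remove_entries_zip, list_a, list_b, 10, reps=200))
print("GC disabled:", average_time(remove_entries_zip, list_a, list_b, 10, reps=200,
                                   disable_gc=True))
```

Whether the two numbers differ noticeably depends on the Python version, the list size, and the collector thresholds, so treat the output as something to experiment with rather than a fixed result.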

What I learned here: don't measure execution time by hand. Use the timeit module to measure the performance of code snippets.

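Best answer

This appears to be an artifact of the way you measure running time. I don't know what causes your timing code to produce this effect, but when I use timeit to measure the running time instead, the effect disappears. I am using Python 3.6.2.

I can consistently reproduce the effect using your timing code; the zip version's running time jumps at about the same threshold, although on my machine it is still slightly faster than the other version.

However, when I measure the time using timeit, the effect disappears completely.

Here is the code using timeit; I made as few changes as possible to your analysis code.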

import random
import timeit

Zippedizip = []
Loops = []
The_Number = 10
Size_list = list(range(10, 1000, 10))
Reps = 1000

for size in Size_list:
    ListA = [random.choice(range(10)) for _ in range(size)]
    ListB = [random.choice(range(10)) for _ in range(size)]

    remove_loop = lambda: Remove_entries_simple(ListA, ListB, The_Number)
    remove_zip = lambda: Remove_entries_zip(ListA, ListB, The_Number)

    Loops.append(timeit.timeit(remove_loop, number=Reps) / Reps)
    Zippedizip.append(timeit.timeit(remove_zip, number=Reps) / Reps)

# ...

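So I think this is a spurious result. That said, I don't understand what in your timing code causes it. I tried simplifying your timing code so that it uses no decorator or varargs, and I replaced time.clock() with time.perf_counter(), which is more accurate, but neither change made a difference.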
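One documented difference between the two measurement styles may be relevant here: timeit temporarily turns off garbage collection while it times, whereas a hand-written loop around time.clock() or time.perf_counter() leaves it on. The sketch below re-enables the collector inside timeit by passing gc.enable as the setup callable, which should make timeit behave more like the manual timing. The helper time_zip, its with_gc flag, the lowercase function name, and the list contents are hypothetical choices for illustration.

```python
import gc
import timeit


def remove_entries_zip(list_a, list_b, k):
    """Comprehension version from the question, with renamed identifiers."""
    pairs = [(a, b) for a, b in zip(list_a, list_b) if a + b != k]
    new_a, new_b = zip(*pairs)
    return list(new_a), list(new_b)


def time_zip(size, reps=200, with_gc=False):
    """Average time per call for lists of the given size, measured by timeit."""
    list_a = [1] * size
    list_b = [2] * size  # no pair sums to 10, so every pair is kept
    stmt = lambda: remove_entries_zip(list_a, list_b, 10)
    if with_gc:
        # The setup callable runs inside the timed section's scope, after
        # timeit has disabled the collector, so this turns it back on.
        return timeit.timeit(stmt, setup=gc.enable, number=reps) / reps
    return timeit.timeit(stmt, number=reps) / reps


for size in (500, 1000):
    print(size, "GC off (timeit default):", time_zip(size))
    print(size, "GC on  (setup=gc.enable):", time_zip(size, with_gc=True))
```

If the jump reappears only in the with_gc=True measurements, that would be consistent with the garbage collector explanation given in the question's last update.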

Regarding python - Strange performance results: loop vs. list comprehension and zip(), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58868056/
