python - 为什么 `for` 循环计算真值的速度如此之快？

我最近回答了 question on a sister site它要求一个计算一个数字的所有偶数位的函数。 other answers 之一包含两个函数(结果证明是迄今为止最快的):

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

此外，我还研究了使用列表推导和 list.count:

def count_even_digits_spyr03_list(n):
    return [c in "02468" for c in str(n)].count(True)

前两个函数基本相同，只是第一个使用显式计数循环，而第二个使用内置sum。我本来预计第二个会更快(基于例如 this answer )，如果被要求进行审查，我会建议将前者变成。但是，事实证明情况正好相反。用一些位数增加的随机数对其进行测试(因此任何单个数字为偶数的可能性约为 50%)我得到以下时间:

为什么手动 for 循环如此之快？它几乎比使用 sum 快两倍。而且由于内置的 sum 应该比手动对列表求和快大约五倍(根据 the linked answer )，这意味着它实际上快了十倍!只需将一半的值加到计数器中，因为另一半被丢弃，节省的成本是否足以解释这种差异？

使用 if 作为过滤器，如下所示:

def count_even_digits_spyr03_sum2(n):
    return sum(1 for c in str(n) if c in "02468")

仅将时序提高到与列表理解相同的水平。

当将时序扩展到更大的数字并归一化到 for 循环时序时，它们渐近收敛于非常大的数字(>10k 位)，可能是由于时间 str(n ) 需要:

最佳答案

sum 相当快，但 sum 并不是导致速度变慢的原因。三个主要因素导致放缓:

使用生成器表达式会导致不断暂停和恢复生成器的开销。
您的生成器版本无条件添加，而不是仅在数字为偶数时添加。当数字为奇数时，这会更昂贵。
添加 bool 值而不是整数会阻止 sum 使用其整数快速路径。

与列表推导相比，生成器有两个主要优势:它们占用的内存少得多，并且如果不需要所有元素，它们可以提前终止。它们不是旨在在需要所有元素的情况下提供时间优势。每个元素暂停和恢复一次生成器非常昂贵。

如果我们用列表推导替换 genexp:

In [66]: def f1(x):
   ....:     return sum(c in '02468' for c in str(x))
   ....: 
In [67]: def f2(x):
   ....:     return sum([c in '02468' for c in str(x)])
   ....: 
In [68]: x = int('1234567890'*50)
In [69]: %timeit f1(x)
10000 loops, best of 5: 52.2 µs per loop
In [70]: %timeit f2(x)
10000 loops, best of 5: 40.5 µs per loop

我们看到了立竿见影的加速，但代价是在列表上浪费了大量内存。

如果您查看您的 genexp 版本:

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

你会发现它没有if。它只是将 bool 值扔到 sum 中。相反，您的循环:

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

仅当数字为偶数时才添加任何内容。

如果我们将之前定义的 f2 更改为也包含 if，我们会看到另一个加速:

In [71]: def f3(x):
   ....:     return sum([True for c in str(x) if c in '02468'])
   ....: 
In [72]: %timeit f3(x)
10000 loops, best of 5: 34.9 µs per loop

f1，与您的原始代码相同，耗时 52.2 µs，而 f2，仅更改列表理解，耗时 40.5 µs。

在 f3 中使用 True 而不是 1 可能看起来很尴尬。我在那里写了 True ，因为将其更改为 1 会激活最终的加速。 sum 有一个用于整数的 fast path，但快速路径只对类型为 int 的对象激活。 bool 不算数。这是检查项目是否为 int 类型的行:

if (PyLong_CheckExact(item)) {

一旦我们做出最后的改变，将 True 改为 1:

In [73]: def f4(x):
   ....:     return sum([1 for c in str(x) if c in '02468'])
   ....: 
In [74]: %timeit f4(x)
10000 loops, best of 5: 33.3 µs per loop

我们看到了最后一个小幅加速。

那么，毕竟，我们是否击败了显式循环？

In [75]: def explicit_loop(x):
   ....:     count = 0
   ....:     for c in str(x):
   ....:         if c in '02468':
   ....:             count += 1
   ....:     return count
   ....: 
In [76]: %timeit explicit_loop(x)
10000 loops, best of 5: 32.7 µs per loop

不。我们大致收支平衡，但我们没有打败它。剩下的大问题是列表。构建它很昂贵，并且 sum 必须通过列表迭代器来检索元素，这有其自身的成本(尽管我认为这部分非常便宜)。不幸的是，只要我们通过 test-digits-and-call-sum 方法，我们就没有任何好的方法来摆脱列表。显式循环获胜。

我们还能走得更远吗？好吧，到目前为止，我们一直在尝试使 sum 更接近显式循环，但是如果我们被这个愚蠢的列表所困扰，我们可能会偏离显式循环并调用 len 而不是 sum:

def f5(x):
    return len([1 for c in str(x) if c in '02468'])

单独测试数字也不是我们可以尝试打破循环的唯一方法。进一步偏离显式循环，我们还可以尝试 str.count。 str.count 直接在 C 中迭代字符串的缓冲区，避免了很多包装对象和间接。我们需要调用它 5 次，对字符串进行 5 次传递，但仍然有返回:

def f6(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

不幸的是，这就是我用来计时的网站因为使用太多资源而让我陷入“tarpit”的时候，所以我不得不切换网站。以下时序无法与上述时序直接比较:

>>> import timeit
>>> def f(x):
...     return sum([1 for c in str(x) if c in '02468'])
... 
>>> def g(x):
...     return len([1 for c in str(x) if c in '02468'])
... 
>>> def h(x):
...     s = str(x)
...     return sum(s.count(c) for c in '02468')
... 
>>> x = int('1234567890'*50)
>>> timeit.timeit(lambda: f(x), number=10000)
0.331528635986615
>>> timeit.timeit(lambda: g(x), number=10000)
0.30292080697836354
>>> timeit.timeit(lambda: h(x), number=10000)
0.15950968803372234
>>> def explicit_loop(x):
...     count = 0
...     for c in str(x):
...         if c in '02468':
...             count += 1
...     return count
... 
>>> timeit.timeit(lambda: explicit_loop(x), number=10000)
0.3305045129964128

关于python - 为什么 `for` 循环计算真值的速度如此之快？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/56288015/

python - 为什么 `for` 循环计算真值的速度如此之快？

上一篇：python - 来自两个以上字符串的最长公共(public)子字符串

下一篇：python - 在 Python 中继承类 "object"的目的是什么？