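python - Fastest (most Pythonic) way to consume an iterator

I am curious what the fastest way to consume an iterator is, and also the most Pythonic way.

For example, say I want to create an iterator with the map built-in that accumulates something as a side effect. I don't actually care about the result of the map, only the side effect, so I want to run through the iteration with as little overhead or boilerplate as possible. Something like: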

my_set = set()
my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)

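In this example, I just want to run through the iterator to accumulate things in my_set, and my_set stays an empty set until I actually run through my_map. Something like: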

for _ in my_map:
    pass

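or the bare

[_ for _ in my_map]

Both work, but they both feel clunky. Is there a more Pythonic way to make sure an iterator is consumed quickly so that you can benefit from some side effect?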

Benchmark

I tested the two methods above on the following inputs:

import numpy as np

my_x = np.random.randint(100, size=int(1e6))
my_y = np.random.randint(100, size=int(1e6))

with my_set and my_map as defined above. I got the following results with timeit:

for _ in my_map:
    pass
468 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

[_ for _ in my_map]
476 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

There is no real difference between the two, and both feel clunky.

Note that I got similar performance with list(my_map), which was suggested in the comments.

Best answer

While you shouldn't create a map object just for its side effects, there is in fact a standard recipe for consuming iterators in the itertools docs:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
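
As a quick illustration of both branches of the recipe, here is a minimal sketch that repeats the definition so it runs on its own. The names it, first_after_skip and leftover are only for illustration and are not part of the itertools docs.

```python
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

it = iter(range(10))

consume(it, 3)                # skips 0, 1 and 2
first_after_skip = next(it)   # the next item is 3

consume(it)                   # exhausts whatever is left
leftover = list(it)           # nothing remains, so this is []

print(first_after_skip, leftover)
```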

For the "consume entirely" case, this can be simplified to

def consume(iterator):
    collections.deque(iterator, maxlen=0)

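Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, with no bytecode interpretation overhead. There is even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.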
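
Applied to the scenario in the question, this might look like the sketch below. The short lists are hypothetical stand-ins for the question's NumPy arrays, and direct_set is an illustrative name for the side-effect-free alternative, which builds the same set without a map object at all.

```python
import collections

def consume(iterator):
    collections.deque(iterator, maxlen=0)

# Hypothetical small inputs standing in for the question's my_x and my_y.
my_x = [1, 2, 3, 1]
my_y = [4, 5, 6, 4]

my_set = set()
my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)

# map is lazy: nothing has been added to my_set yet.
assert my_set == set()

# Driving the iterator to exhaustion triggers the side effects.
consume(my_map)

# The same result without side effects: build the set straight from zip.
direct_set = set(zip(my_x, my_y))

print(my_set == direct_set)
```

When the side effect is just "add every pair to a set", the zip version is usually the simpler choice; consume is more useful when you have been handed an iterator that you only need to exhaust.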

Timings:

In [1]: import collections

In [2]: x = range(1000)

In [3]: %%timeit
   ...: i = iter(x)
   ...: for _ in i:
   ...:     pass
   ...: 
16.5 µs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]: %%timeit
   ...: i = iter(x)
   ...: collections.deque(i, maxlen=0)
   ...: 
12 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
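
To reproduce the comparison outside IPython, a rough sketch with the standard timeit module could look like this. The helper names with_for_loop and with_deque and the repeat count are assumptions for illustration. Both measurements include the cost of one function call, and absolute numbers will vary by machine and Python version.

```python
import collections
import timeit

x = range(1000)

def with_for_loop():
    # Consume the iterator with an explicit Python-level loop.
    for _ in iter(x):
        pass

def with_deque():
    # Consume the iterator by feeding it into a zero-length deque.
    collections.deque(iter(x), maxlen=0)

number = 1000  # calls per measurement; an arbitrary choice
for_time = timeit.timeit(with_for_loop, number=number)
deque_time = timeit.timeit(with_deque, number=number)

print(f"for loop: {for_time / number * 1e6:.2f} us per call")
print(f"deque:    {deque_time / number * 1e6:.2f} us per call")
```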

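Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.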

Source: the Stack Overflow question "Fastest (most Pythonic) way to consume an iterator": https://stackoverflow.com/questions/50937966/
