python - Is there an alternative to zip(*iterable) when the iterable contains millions of elements?

Tags: python python-3.x optimization iterable-unpacking

I came across some code like this:

from random import randint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(10)]
xs = [point.x for point in points]
ys = [point.y for point in points]
I don't think this code is Pythonic, because it repeats itself. If another dimension were added to the Point class, a whole new loop would have to be written, like this:
zs = [point.z for point in points]
So I tried to make it more Pythonic by writing something like this:
xs, ys = zip(*[(point.x, point.y) for point in points])
With this, if a new dimension is added, there is no problem:
xs, ys, zs = zip(*[(point.x, point.y, point.z) for point in points])
However, when there are millions of points, this runs almost ten times slower than the other solution, even though it contains only one loop. I believe this is because the * operator has to unpack millions of elements as individual arguments to the zip function, which is horrible.
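To make the cost concrete, here is a tiny toy example (my own illustration, not taken from the code above) of what the * expansion does:

pairs = [(1, 4), (2, 5), (3, 6)]
# zip(*pairs) is equivalent to zip((1, 4), (2, 5), (3, 6)): every element
# of the list becomes a separate positional argument, so with a million
# points zip() receives a million arguments, all materialized up front
xs, ys = zip(*pairs)
print(xs, ys)  # (1, 2, 3) (4, 5, 6)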
So my question is: is there a way to change the code above so that it does the job as fast as the original while staying Pythonic (without using third-party libraries)?

Best Answer

I tested a few ways of zipping the Point coordinates and looked at how their performance evolves as the number of points grows.
Here are the functions I used in the test:

def hardcode(points):
    # a hand crafted comprehension for each coordinate
    return [point.x for point in points], [point.y for point in points]


def using_zip(points):
    # using the "problematic" zip function; in Python 3 this returns a lazy
    # zip object, but the expensive part -- unpacking every pair as a
    # separate positional argument -- still happens eagerly at call time
    return zip(*((point.x, point.y) for point in points))


def loop_and_comprehension(points):
    # making comprehension from a list of coordinate names
    zipped = []
    for coordinate in ('x', 'y'):
        zipped.append([getattr(point, coordinate) for point in points])
    return zipped


def nested_comprehension(points):
    # making comprehension from a list of coordinate names using nested
    # comprehensions
    return [
        [getattr(point, coordinate) for point in points]
        for coordinate in ('x', 'y')
    ]
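(Not part of the benchmark: another approach often suggested for this kind of problem is a single pass that appends to one list per coordinate. A minimal sketch, with no timings claimed for it here:)

def single_pass(points):
    # hypothetical extra candidate, not timed below: one pass over the
    # points, appending each coordinate to its own list as we go
    xs, ys = [], []
    for point in points:
        xs.append(point.x)
        ys.append(point.y)
    return xs, ys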
Using timeit, I timed each of the four benchmarked functions for various numbers of points; here are the results:
comparing processing times using 10 points and 10000000 iterations
hardcode................. 14.12024447 [+0%]
using_zip................ 16.84289724 [+19%]
loop_and_comprehension... 30.83631476 [+118%]
nested_comprehension..... 30.45758349 [+116%]

comparing processing times using 100 points and 1000000 iterations
hardcode................. 9.30594717 [+0%]
using_zip................ 13.74953714 [+48%]
loop_and_comprehension... 19.46766583 [+109%]
nested_comprehension..... 19.27818860 [+107%]

comparing processing times using 1000 points and 100000 iterations
hardcode................. 7.90372457 [+0%]
using_zip................ 12.51523594 [+58%]
loop_and_comprehension... 18.25679913 [+131%]
nested_comprehension..... 18.64352790 [+136%]

comparing processing times using 10000 points and 10000 iterations
hardcode................. 8.27348382 [+0%]
using_zip................ 18.23079485 [+120%]
loop_and_comprehension... 18.00183383 [+118%]
nested_comprehension..... 17.96230063 [+117%]

comparing processing times using 100000 points and 1000 iterations
hardcode................. 9.15848662 [+0%]
using_zip................ 22.70730675 [+148%]
loop_and_comprehension... 17.81126971 [+94%]
nested_comprehension..... 17.86892597 [+95%]

comparing processing times using 1000000 points and 100 iterations
hardcode................. 9.75002857 [+0%]
using_zip................ 23.13891725 [+137%]
loop_and_comprehension... 18.08724660 [+86%]
nested_comprehension..... 18.01269820 [+85%]

comparing processing times using 10000000 points and 10 iterations
hardcode................. 9.96045920 [+0%]
using_zip................ 23.11653558 [+132%]
loop_and_comprehension... 17.98296033 [+81%]
nested_comprehension..... 18.17317708 [+82%]

comparing processing times using 100000000 points and 1 iterations
hardcode................. 64.58698246 [+0%]
using_zip................ 92.53437881 [+43%]
loop_and_comprehension... 73.62493845 [+14%]
nested_comprehension..... 62.99444739 [-2%]

We can see that as the number of points grows, the gap between the "hardcoded" solution and the solutions built with getattr comprehensions seems to keep shrinking.
So, for a very large number of points, it could be a good idea to use a comprehension generated from a list of coordinate names:
[[getattr(point, coordinate) for point in points]
 for coordinate in ('x', 'y')]
However, for a small number of points it is the worst solution (of the ones I tested, at least).
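For example, the result unpacks directly into named lists, and adding a dimension only means extending the tuple of coordinate names (a usage sketch; the second line assumes Point has grown a z attribute):

xs, ys = [[getattr(point, coordinate) for point in points]
          for coordinate in ('x', 'y')]
xs, ys, zs = [[getattr(point, coordinate) for point in points]
              for coordinate in ('x', 'y', 'z')]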

For information, here is the code I used to run this benchmark:
import timeit


...  # elided: the Point class, the randint import, and the functions above


def compare(nb_points, nb_iterations):
    reference = None
    points = [Point(randint(1, 100), randint(1, 100))
              for _ in range(nb_points)]
    print("comparing processing times using {} points and {} iterations"
          .format(nb_points, nb_iterations))

    for func in (hardcode, using_zip, loop_and_comprehension, nested_comprehension):
        duration = timeit.timeit(lambda: func(points), number=nb_iterations)

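        # pad the function name with dots to 25 characters, print the
        # duration with 8 decimals, and the change relative to the first
        # (reference) function as a signed percentage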
        print('{:.<25} {:0=2.8f} [{:0>+.0%}]'
              .format(func.__name__, duration,
                      0 if reference is None else (duration / reference - 1)))

        if reference is None:
            reference = duration

    print("-" * 80)



compare(10, 10000000)
compare(100, 1000000)
compare(1000, 100000)
compare(10000, 10000)
compare(100000, 1000)
compare(1000000, 100)
compare(10000000, 10)
compare(100000000, 1)

About python - Is there an alternative to zip(*iterable) when the iterable contains millions of elements?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63450157/
