python - 直接调用和分配给变量之间的速度差异

标签 python performance python-3.x

情况:考虑以下两个 Python 代码片段:-

代码1:

for root, dirs, files in os.walk(top):
    for f in files:
        path = os.path.join(root, f)
        print(path)

代码2:

for root, dirs, files in os.walk(top):
    for f in files:
        print(os.path,join(root,f))

问题:如果我不将文件路径声明为变量,性能或速度方面是否会有任何差异(假设我只会使用它一次 - 如果使用多次声明变量更有意义)

最佳答案

除了使用 timeit对于简单的基准测试,您可以pytest-benchmark,这使得创建比较变得非常简单,只需:

import os

def f1(top):
    for root, dirs, files in os.walk(top):
        for f in files:
            path = os.path.join(root, f)
            print(path)

def f2(top):
    for root, dirs, files in os.walk(top):
        for f in files:
            print(os.path.join(root, f))

def test_f1(benchmark):
    benchmark(f1, '~/tmp')

def test_f2(benchmark):
    benchmark(f2, '~/tmp')

注意:~/tmp 包含 350 个文件/文件夹,YMMV。运行

python -m pytest test.py --benchmark-min-time=0.001 --benchmark-histogram=hist

为您提供良好的数据和直方图:

----------------------------------------------------------------------- benchmark: 2 tests ----------------------------------------------------------------------
Name (time in us)        Min               Max              Mean            StdDev            Median               IQR            Outliers(*)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
test_f1               4.4811 (1.0)      8.6253 (1.0)      4.7941 (1.00)     0.3531 (1.0)      4.7141 (1.01)     0.2762 (1.31)            15;7     216        1000
test_f2               4.4967 (1.00)     9.3009 (1.08)     4.7773 (1.0)      0.5242 (1.48)     4.6838 (1.0)      0.2113 (1.0)             6;13     215        1000
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

benchmark histogram

如您所见,考虑到高方差,差异并不显着。

现在,如果您仍然好奇,可以使用 dis显示 CPython 正在执行的字节码。这是 CPython 解释器的功能,这是运行 python 代码的最常见方式:

In [1]: import os, dis

In [2]: def f1(top):
   ...:     for root, dirs, files in os.walk(top):
   ...:         for f in files:
   ...:             path = os.path.join(root, f)
   ...:             print(path)
   ...:             

In [3]: def f2(top):
   ...:     for root, dirs, files, in os.walk(top):
   ...:         for f in files:
   ...:             print(os.path.join(root, f))
   ...:             

In [4]: dis.dis(f1)
  2           0 SETUP_LOOP              60 (to 62)
              2 LOAD_GLOBAL              0 (os)
              4 LOAD_ATTR                1 (walk)
              6 LOAD_FAST                0 (top)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                46 (to 60)
             14 UNPACK_SEQUENCE          3
             16 STORE_FAST               1 (root)
             18 STORE_FAST               2 (dirs)
             20 STORE_FAST               3 (files)

  3          22 SETUP_LOOP              34 (to 58)
             24 LOAD_FAST                3 (files)
             26 GET_ITER
        >>   28 FOR_ITER                26 (to 56)
             30 STORE_FAST               4 (f)

  4          32 LOAD_GLOBAL              0 (os)
             34 LOAD_ATTR                2 (path)
             36 LOAD_ATTR                3 (join)
             38 LOAD_FAST                1 (root)
             40 LOAD_FAST                4 (f)
             42 CALL_FUNCTION            2
             44 STORE_FAST               5 (path)

  5          46 LOAD_GLOBAL              4 (print)
             48 LOAD_FAST                5 (path)
             50 CALL_FUNCTION            1
             52 POP_TOP
             54 JUMP_ABSOLUTE           28
        >>   56 POP_BLOCK
        >>   58 JUMP_ABSOLUTE           12
        >>   60 POP_BLOCK
        >>   62 LOAD_CONST               0 (None)
             64 RETURN_VALUE

In [5]: dis.dis(f2)
  2           0 SETUP_LOOP              56 (to 58)
              2 LOAD_GLOBAL              0 (os)
              4 LOAD_ATTR                1 (walk)
              6 LOAD_FAST                0 (top)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                42 (to 56)
             14 UNPACK_SEQUENCE          3
             16 STORE_FAST               1 (root)
             18 STORE_FAST               2 (dirs)
             20 STORE_FAST               3 (files)

  3          22 SETUP_LOOP              30 (to 54)
             24 LOAD_FAST                3 (files)
             26 GET_ITER
        >>   28 FOR_ITER                22 (to 52)
             30 STORE_FAST               4 (f)

  4          32 LOAD_GLOBAL              2 (print)
             34 LOAD_GLOBAL              0 (os)
             36 LOAD_ATTR                3 (path)
             38 LOAD_ATTR                4 (join)
             40 LOAD_FAST                1 (root)
             42 LOAD_FAST                4 (f)
             44 CALL_FUNCTION            2
             46 CALL_FUNCTION            1
             48 POP_TOP
             50 JUMP_ABSOLUTE           28
        >>   52 POP_BLOCK
        >>   54 JUMP_ABSOLUTE           12
        >>   56 POP_BLOCK
        >>   58 LOAD_CONST               0 (None)
             60 RETURN_VALUE

所以第一个代码确实产生了更多的字节码指令。

无论如何,你应该考虑profiling - 确保您查看真正相关的代码部分,并避免盲目优化。

关于python - 直接调用和分配给变量之间的速度差异,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44104630/

相关文章:

python - 如何在模型中使用 django.contrib. humanize

python - 将 pandas 表写入 impala

java - java中迭代列表的各种方法的性能评估

c - 一种基于另一个变量对某个值执行 MPI 全部缩减的有效方法?

c - 是否像在 JavaScript 中一样,在 C 中创建函数会导致性能下降?

python - fastapi - 从 main.py 导入配置

python - 如何在父@classmethod 中引用子类

python - Pandas 重新采样 numpy 数组

不透明度的python matplotlib图例

python - (Python) 使用 UTF-8 编码将字符串写入 CSV