Python list comprehension vs. +=

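Today I was trying to find a way to do some processing on strings in Python. Some programmers more senior than me reportedly avoid += and use ''.join() instead; I also read this at, for example, http://wiki.python.org/moin/PythonSpeed/#Use_the_best_algorithms_and_fastest_tools. But I tested this myself and found somewhat strange results (it's not that I'm trying to second-guess them, I just want to understand). The idea is this: given a string "This is\"an example text\" containing spaces", the string should be converted to Thisis"an example text"containingspaces. Spaces are removed, but only outside the quotes.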

I measured the performance of two different versions of my algorithm, one using ''.join(list) and one using +=:

import time
import timeit

#uses '+=' operator
def strip_spaces ( s ):
    ret_val = ""
    quote_found = False
    for i in s:
        if i == '"':
            quote_found = not quote_found

        if i == ' ' and quote_found == True:
            ret_val += i

        if i != ' ':
            ret_val += i
    return ret_val

#uses "".join ()   
def strip_spaces_join ( s ):
    #ret_val = ""
    ret_val = []
    quote_found = False
    for i in s:
        if i == '"':
            quote_found = not quote_found

        if i == ' ' and quote_found == True:
            #ret_val = ''.join( (ret_val, i) )
            ret_val.append(i)

        if i != ' ':
            #ret_val = ''.join( (ret_val,i) )
            ret_val.append(i)
    return ''.join(ret_val)


def time_function ( function, data):
    time1 = time.time()
    function(data)
    time2 = time.time()
    print "it took about {0} seconds".format(time2-time1)

On my machine, time_function showed a small advantage for the algorithm using +=.

When timed with timeit:

print '#using += yields ', timeit.timeit('f(string)', 'from __main__ import string, strip_spaces as f', number=1000)
print '#using \'\'.join() yields ', timeit.timeit('f(string)', 'from __main__ import string, strip_spaces_join as f', number=1000)

the output was:

#using += yields  0.0130770206451
#using ''.join() yields  0.0108470916748

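The difference is really minor. But why does ''.join() not clearly outperform the function that uses +=? There only seems to be a small advantage for the ''.join() version. I tested this on Ubuntu 12.04 with python-2.7.3.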

Best answer

Always use the correct methodology when comparing algorithms; use the timeit module to eliminate fluctuations in CPU utilisation and swapping.
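A minimal sketch of that methodology, assuming Python 3 (the question itself used Python 2.7.3). timeit.repeat runs the whole measurement several times, and taking the minimum is a common way to reduce noise from other processes. The helper name best_time and the two sample functions are illustrative and do not come from the original post.

```python
import timeit


def concat_plus(chars):
    """Build a string with repeated += (illustrative workload)."""
    result = ""
    for c in chars:
        result += c
    return result


def concat_join(chars):
    """Build the same string with a single ''.join() call."""
    return "".join(chars)


def best_time(func, arg, repeat=5, number=100):
    """Return the fastest of `repeat` runs, each calling func(arg) `number` times."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings)


if __name__ == "__main__":
    data = list("some sample text " * 100)
    print("+=  :", best_time(concat_plus, data))
    print("join:", best_time(concat_join, data))
```

The absolute numbers depend on the machine and interpreter, so only the relative ordering between the two lines is meaningful.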

Using timeit shows there is little difference between the two approaches, but ''.join() is slightly faster:

>>> s = 1000 * string
>>> timeit.timeit('f(s)', 'from __main__ import s, strip_spaces as f', number=100)
1.3209099769592285
>>> timeit.timeit('f(s)', 'from __main__ import s, strip_spaces_join as f', number=100)
1.2893600463867188
>>> s = 10000 * string
>>> timeit.timeit('f(s)', 'from __main__ import s, strip_spaces as f', number=100)
14.545105934143066
>>> timeit.timeit('f(s)', 'from __main__ import s, strip_spaces_join as f', number=100)
14.43651008605957

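Most of the work in your functions is looping over each character and testing for quotes and spaces, not the string concatenation itself. Moreover, the ''.join() variant does more work: you first append the elements to a list (this replaces the += string concatenation), and then you concatenate these values at the end with ''.join(). Even so, that method is still slightly faster.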
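One way to see how much of the time goes into the loop and the tests is to time a variant that does the same scanning but never builds a string. The sketch below assumes Python 3; scan_only is a hypothetical helper added for illustration, and strip_spaces is the += version from the question with the same behaviour.

```python
import timeit


def strip_spaces(s):
    """The += version from the question, unchanged in behaviour."""
    ret_val = ""
    quote_found = False
    for i in s:
        if i == '"':
            quote_found = not quote_found
        if i == ' ' and quote_found:
            ret_val += i
        if i != ' ':
            ret_val += i
    return ret_val


def scan_only(s):
    """Same loop and tests, but only counts kept characters (no concatenation)."""
    kept = 0
    quote_found = False
    for i in s:
        if i == '"':
            quote_found = not quote_found
        if i == ' ' and quote_found:
            kept += 1
        if i != ' ':
            kept += 1
    return kept


if __name__ == "__main__":
    text = 1000 * 'This is "an example text" containing spaces'
    for func in (strip_spaces, scan_only):
        seconds = timeit.timeit(lambda: func(text), number=100)
        print(func.__name__, seconds)
```

If the two timings come out close to each other, the concatenation strategy cannot be the dominant cost in these functions.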

You may want to strip back the work being done to compare just the concatenation part:

def inplace_add_concatenation(s):
    res = ''
    for c in s:
        res += c

def str_join_concatenation(s):
    ''.join(s)

This shows:

>>> s = list(1000 * string)
>>> timeit.timeit('f(s)', 'from __main__ import s, inplace_add_concatenation as f', number=1000)
6.113742113113403
>>> timeit.timeit('f(s)', 'from __main__ import s, str_join_concatenation as f', number=1000)
0.6616439819335938

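This shows that ''.join() concatenation is still a heck of a lot faster than +=. The speed difference lies in the loop: s is a list in both cases, but ''.join() loops over the values in C, while the other version has to do all of its looping in Python. That makes all the difference here.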
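Following the same reasoning, the original task can be rewritten so that most of the per-character work happens inside built-in string methods. This is a different technique from the ones in the question and answer, sketched here under the assumption (shared with the original functions) that every double quote toggles the quoted state and that there is no escaping. The name strip_spaces_split is hypothetical. A Python-level loop remains, but it runs once per quote-delimited segment, not once per character.

```python
def strip_spaces_split(s):
    """Remove spaces outside double quotes using built-in string methods."""
    parts = s.split('"')
    # Even-indexed parts lie outside quotes, odd-indexed parts inside them.
    for index in range(0, len(parts), 2):
        parts[index] = parts[index].replace(" ", "")
    return '"'.join(parts)


if __name__ == "__main__":
    print(strip_spaces_split('This is "an example text" containing spaces'))
```

Whether this is actually faster for a given input should be checked with timeit, as in the measurements above.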

This question and answer about Python list comprehension vs. += come from Stack Overflow: https://stackoverflow.com/questions/16781032/
