python - Why is str.strip() so much faster than str.strip(' ')?

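Stripping on whitespace can be done in two ways with str.strip. You can either issue a call with no arguments, str.strip(), which defaults to stripping whitespace, or explicitly supply the argument yourself with str.strip(' ').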
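Strictly speaking, the two calls are not interchangeable. With no argument, str.strip() removes every character for which str.isspace() is true (tabs, newlines, non-ASCII spaces such as U+3000), whereas str.strip(' ') removes only U+0020. A small check, where `compare` is just a helper name chosen for this sketch:

```python
def compare(text):
    """Return (text.strip(), text.strip(' ')) for a side-by-side look."""
    return text.strip(), text.strip(" ")


samples = ["   a   ", "\t a \n", "\u3000a\u3000"]  # spaces, tab/newline, ideographic space
for sample in samples:
    no_arg, space_only = compare(sample)
    print(f"{sample!r}: strip() -> {no_arg!r}, strip(' ') -> {space_only!r}")
```

For the question's string, which contains only ordinary spaces, both forms give the same result, so the timing comparison is fair.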

But why do these perform so differently when timed?

Using a sample string with intentional whitespace:

s = " " * 100 + 'a' + " " * 100

The timings for s.strip() and s.strip(' ') are, respectively:

%timeit s.strip()
The slowest run took 32.74 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 396 ns per loop

%timeit s.strip(' ')
100000 loops, best of 3: 4.5 µs per loop

strip takes 396 ns while strip(' ') takes 4.5 µs; rstrip and lstrip behave the same way under the same conditions. Bytes objects seem to be affected too.
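The timings above come from IPython's %timeit. A rough stdlib-only equivalent using timeit.repeat is sketched below; it also covers lstrip/rstrip and the bytes case. The helper name `best_ns_per_call` and the loop counts are arbitrary choices, and absolute numbers will depend on the machine and the CPython version, so no particular ratio should be assumed.

```python
import timeit

s = " " * 100 + "a" + " " * 100
b = s.encode("ascii")  # the same content as a bytes object

STATEMENTS = [
    "s.strip()", "s.strip(' ')",
    "s.lstrip()", "s.lstrip(' ')",
    "s.rstrip()", "s.rstrip(' ')",
    "b.strip()", "b.strip(b' ')",
]


def best_ns_per_call(stmt, number=20_000, repeat=3):
    """Best-of-`repeat` time for one execution of `stmt`, in nanoseconds."""
    totals = timeit.repeat(stmt, globals={"s": s, "b": b}, number=number, repeat=repeat)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    for stmt in STATEMENTS:
        print(f"{stmt:<16}{best_ns_per_call(stmt):10.1f} ns")
```

Taking the minimum of the repeats follows the same "best of N" convention %timeit reports.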

The timings were taken on Python 3.5.2; on Python 2.7.1 the difference is less drastic. The docs on str.strip don't indicate anything useful, so why does this happen?

Best answer

In a tl;dr fashion:

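This happens because two functions exist for the two different cases, as can be seen in unicode_strip: do_strip and _PyUnicode_XStrip, the first of which executes much faster than the second.

The function do_strip handles the common case str.strip(), where no argument is given, while do_argstrip (which wraps _PyUnicode_XStrip) is called for str.strip(arg), i.e. when an argument is provided.

do_argstrip only checks the separator: if it is None, it calls do_strip; if it is a valid string, it calls _PyUnicode_XStrip.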
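The dispatch just described can be modeled in a few lines of Python. This is only a sketch: `unicode_strip_model`, `do_strip_model`, `xstrip_model` and `_strip` are hypothetical names, and the per-character tests use str.isspace and the `in` operator instead of CPython's C-level checks. Only the branching mirrors the description: no argument or None goes to the fast path, a string goes to the argument path, anything else is rejected.

```python
def _strip(s, is_sep):
    """Shared two-ended scan; `is_sep` decides whether a character is stripped."""
    i, j = 0, len(s)
    while i < j and is_sep(s[i]):
        i += 1
    while j > i and is_sep(s[j - 1]):
        j -= 1
    return s[i:j]


def do_strip_model(s):
    """Fast path: no argument, so strip whitespace."""
    return _strip(s, str.isspace)


def xstrip_model(s, sep):
    """Argument path: strip any character contained in `sep`."""
    return _strip(s, lambda ch: ch in sep)


def unicode_strip_model(s, sep=None):
    """Route the call like unicode_strip -> do_strip / do_argstrip."""
    if sep is None:
        # No argument, or an explicit None: do_argstrip falls back to do_strip.
        return do_strip_model(s)
    if not isinstance(sep, str):
        # CPython also raises TypeError here; this message is a paraphrase.
        raise TypeError("strip arg must be None or str")
    return xstrip_model(s, sep)
```

One practical consequence of this branching is that s.strip(None) should take the same fast path as s.strip().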

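Both do_strip and _PyUnicode_XStrip follow the same logic: two counters are used, one starting at 0 and the other at the length of the string.

In two while loops, the first counter is incremented until a character that is not a separator is reached, and the second counter is decremented until the same condition is met; the result is the substring between the two counters.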
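The two-counter scheme can be traced in Python. In this sketch `strip_bounds` is a hypothetical helper that returns the final values of both counters, with the second one used as an exclusive end so the result is simply s[i:j]; the separator test is passed in as a function so the same loop serves both cases.

```python
def strip_bounds(s, is_sep):
    """Run both loops and return the final (i, j); the stripped result is s[i:j]."""
    i = 0          # first counter: starts at the beginning
    j = len(s)     # second counter: starts at the length of the string
    while i < j and is_sep(s[i]):        # move right past leading separators
        i += 1
    while j > i and is_sep(s[j - 1]):    # move left past trailing separators
        j -= 1
    return i, j


s = " " * 100 + "a" + " " * 100
i, j = strip_bounds(s, lambda ch: ch == " ")
print(i, j, repr(s[i:j]))  # → 100 101 'a'
```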

The difference lies in how each function checks whether the current character is a separator.

For `do_strip`:

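In the most common case, where all characters of the string being stripped can be represented in ASCII, there is an additional small performance boost: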

while (i < len) {
    Py_UCS1 ch = data[i];
    if (!_Py_ascii_whitespace[ch])
        break;
    i++;
}

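The current character is read quickly by indexing the underlying array directly: Py_UCS1 ch = data[i];
Checking whether a character is whitespace is a simple array lookup into a table called _Py_ascii_whitespace, i.e. _Py_ascii_whitespace[ch].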

So, in short, it is quite efficient.
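As a rough Python analogue of that fast path, the sketch below builds a 128-entry table and strips by indexing into it. `ASCII_WHITESPACE` and `ascii_strip` are hypothetical names; the table is derived from str.isspace rather than copied from CPython's _Py_ascii_whitespace, though for ASCII the two should flag the same ten characters (\t, \n, \v, \f, \r, \x1c through \x1f, and the space). Pure-Python indexing is of course nowhere near as cheap as in C; the point is the shape of the loop.

```python
# Hypothetical stand-in for _Py_ascii_whitespace: one 0/1 flag per code point 0..127.
ASCII_WHITESPACE = bytes(1 if chr(c).isspace() else 0 for c in range(128))


def ascii_strip(s):
    """Strip whitespace from a pure-ASCII string via table lookups."""
    data = s.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    i, j = 0, len(data)
    while i < j and ASCII_WHITESPACE[data[i]]:       # like _Py_ascii_whitespace[ch]
        i += 1
    while j > i and ASCII_WHITESPACE[data[j - 1]]:
        j -= 1
    return s[i:j]
```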

If the characters are not in the ASCII range, the difference isn't that drastic, but it does slow the overall execution down:

while (i < len) {
    Py_UCS4 ch = PyUnicode_READ(kind, data, i);
    if (!Py_UNICODE_ISSPACE(ch))
        break;
    i++;
}

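Access is done with Py_UCS4 ch = PyUnicode_READ(kind, data, i);
Checking whether the character is whitespace is done by the Py_UNICODE_ISSPACE(ch) macro, which in CPython 3.x still consults _Py_ascii_whitespace for code points below 128 and calls _PyUnicode_IsWhitespace for everything else.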
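For strings that are not pure ASCII, a similar sketch reads each character as a full code point and asks str.isspace, standing in for PyUnicode_READ plus Py_UNICODE_ISSPACE. `ucs4_strip` and `which_loop` are hypothetical names. `which_loop` encodes an assumption: that do_strip picks its loop once per string (via an ASCII check on the whole object), so a single non-ASCII character anywhere moves every comparison onto the slower path.

```python
def ucs4_strip(s):
    """Strip Unicode whitespace, reading one code point at a time."""
    i, j = 0, len(s)
    while i < j and s[i].isspace():        # stand-in for Py_UNICODE_ISSPACE(ch)
        i += 1
    while j > i and s[j - 1].isspace():
        j -= 1
    return s[i:j]


def which_loop(s):
    """Assumed selection rule: the ASCII table loop only if the whole string is ASCII."""
    return "ascii" if s.isascii() else "ucs4"


sample = "\u3000\u00a0 caf\u00e9 \u2003"   # ideographic, no-break and em spaces
print(which_loop(sample), repr(ucs4_strip(sample)))  # → ucs4 'café'
```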

For `_PyUnicode_XStrip`:

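Here the underlying data is accessed with PyUnicode_READ, just as in the previous case; the check for whether the character is whitespace (or, really, any character we've provided) is, on the other hand, considerably more complex: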

while (i < len) {
    Py_UCS4 ch = PyUnicode_READ(kind, data, i);
    if (!BLOOM(sepmask, ch))
        break;
    if (PyUnicode_FindChar(sepobj, ch, 0, seplen, 1) < 0)
        break;
    i++;
}

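PyUnicode_FindChar is used here, and although it is efficient, it is far more complex and much slower than an array access. It is called for every character that passes the BLOOM(sepmask, ch) pre-check (when stripping spaces, that means every space being stripped) to see whether the character is contained in the separator(s) we provided. The longer the run of characters to strip, the more overhead these repeated calls add.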
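To see where the work goes, here is a Python model of the _PyUnicode_XStrip loop quoted above, instrumented to count how often the FindChar step runs. Everything here is a hypothetical stand-in: `make_bloom_mask`, `bloom`, `find_char` and `xstrip_counted` are illustrative names, and BLOOM_WIDTH = 64 assumes a 64-bit C unsigned long (typical on 64-bit Linux/macOS; Windows builds use 32). The bloom check can let non-separators through (false positives) but never rejects a real separator, which is why the search is still needed.

```python
BLOOM_WIDTH = 64  # assumption: bits in a C unsigned long on a typical 64-bit Unix build


def make_bloom_mask(sep):
    """Set one bit per separator character (bit index = code point mod BLOOM_WIDTH)."""
    mask = 0
    for c in sep:
        mask |= 1 << (ord(c) & (BLOOM_WIDTH - 1))
    return mask


def bloom(mask, ch):
    """Cheap pre-check: False means ch is certainly not a separator."""
    return bool(mask & (1 << (ord(ch) & (BLOOM_WIDTH - 1))))


def find_char(sep, ch):
    """Linear scan over the separator, like a short-input find_char."""
    for k, c in enumerate(sep):
        if c == ch:
            return k
    return -1


def xstrip_counted(s, sep):
    """Model of the two XStrip loops; returns (result, number of find_char calls)."""
    mask = make_bloom_mask(sep)
    calls = 0
    i, j = 0, len(s)
    while i < j:                      # left loop
        ch = s[i]
        if not bloom(mask, ch):
            break
        calls += 1
        if find_char(sep, ch) < 0:
            break
        i += 1
    while j > i:                      # right loop
        ch = s[j - 1]
        if not bloom(mask, ch):
            break
        calls += 1
        if find_char(sep, ch) < 0:
            break
        j -= 1
    return s[i:j], calls


s = " " * 100 + "a" + " " * 100
print(xstrip_counted(s, " "))    # → ('a', 200)
print(xstrip_counted("a", "!"))  # → ('a', 2): '!' (33) and 'a' (97) share bloom bit 33
```

With the question's string, each of the 200 stripped spaces costs a bloom test plus a separator search, which is the per-character overhead described above.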

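For those interested, PyUnicode_FindChar, after quite a few checks, eventually calls find_char inside stringlib, which, when the length of the separator is < 10, simply loops until it finds the character.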

On top of that, consider all the other function calls that are needed just to get to this point.

For lstrip and rstrip the situation is similar. A flag selects the stripping mode: RIGHTSTRIP for rstrip, LEFTSTRIP for lstrip and BOTHSTRIP for strip. The logic inside do_strip and _PyUnicode_XStrip is executed conditionally based on that flag.
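The flag handling can be sketched as follows. The constant names mirror the ones mentioned above, but their numeric values here are arbitrary, and `strip_with_flag` is a hypothetical helper, not a CPython function; the point is only that one loop body serves strip, lstrip and rstrip depending on which side the flag enables.

```python
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2  # illustrative values only


def strip_with_flag(s, striptype, is_sep=str.isspace):
    """Strip from the side(s) selected by `striptype`."""
    i, j = 0, len(s)
    if striptype != RIGHTSTRIP:            # lstrip and strip scan from the left
        while i < j and is_sep(s[i]):
            i += 1
    if striptype != LEFTSTRIP:             # rstrip and strip scan from the right
        while j > i and is_sep(s[j - 1]):
            j -= 1
    return s[i:j]
```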

Regarding "python - Why is str.strip() so much faster than str.strip(' ')?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38285654/
