Python字符串 'in'算子实现算法和时间复杂度

标签 python string algorithm cpython

我在想 in 运算符是如何实现的,例如

>>> s1 = 'abcdef'
>>> s2 = 'bcd'
>>> s2 in s1
True

在CPython中,用什么算法来实现字符串匹配,时间复杂度是多少?有没有关于这个的官方文档或维基?

最佳答案

它是 Boyer-Moore 的组合和 Horspool .

可以查看C代码here :

Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.

从上面的链接:

When designing the new algorithm, I used the following constraints:

  • should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
  • small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
  • sublinear search behaviour in good cases (O(n/m))
  • no worse than the current algorithm in worst case (O(nm))
  • should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
  • many real-life searches should be good, very few should be worst case
  • reasonably simple implementation

关于Python字符串 'in'算子实现算法和时间复杂度,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/18139660/

相关文章:

python - Pandas 使用日期作为索引加入/合并 2 个数据帧

python - 在 Python 中,避免对 __init__ 参数和实例变量使用相同名称的最佳方法是什么?

python - 读取 CSV 中的特定数据

c - 查找 2 个大号(每个 10000 位)的乘积的程序

Python Pandas - 将两列按不同方向分组

string - Excel - 从逗号分隔列表中查找确切的字符串

C++ 不从数据转换字符串

java - 为什么字符串类根据给定的字符串值创建哈希码?

algorithm - 如何找到所有可能的单词?

javascript - 动态地向 JSON 子数组添加值?