python - Matching a large number of strings containing spaces in pyparsing

Tags: python performance parsing case-insensitive pyparsing

With pyparsing, I need to write a matcher for an expression like

a + names + c

with

a = pp.OneOrMore(pp.Word(pp.alphas))
c = pp.OneOrMore(pp.Word(pp.nums))

and names matching one of the entries in the list of strings names_list.

The complications are:

Entries in names_list can contain spaces.

Matching needs to be case-insensitive.

names_list is quite large (about 20,000 entries).

I tried

names_kw_list = [pp.Keyword(name, caseless=True) for name in names_list]
names = pp.Or(names_kw_list)

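This does not work for entries that contain spaces, and I am concerned that it is not a very efficient way to write it.

Any ideas on how to make it work for entries with spaces, and possibly make it run faster?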

Best answer

A partial answer:

The whitespace problem can be solved with a suitable stopOn expression:

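import pyparsing as pp

def last_occurrence_of(expr):
    # Match expr only when no further occurrence of expr follows it.
    return expr + ~pp.FollowedBy(pp.SkipTo(expr))

# One caseless keyword per entry in names_list.
names_kw_list = [pp.Keyword(word, caseless=True) for word in names_list]
names = pp.Or(names_kw_list)("names")
# stopOn keeps a from consuming the words that make up the final name.
a = pp.OneOrMore(pp.Word(pp.alphas), stopOn=last_occurrence_of(names))("A")
c = pp.OneOrMore(pp.Word(pp.nums))("C")

expr = a + names + c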

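This tells a not to consume the words that belong to the names string.

Performance will suffer, however, because the long list of names is now also evaluated inside the stopOn expression.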
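One possible way to soften that cost, which is not part of the original answer, is to collapse the whole list into a single case-insensitive regular expression instead of building one keyword object per entry. The sketch below uses only the standard re module. The helper build_names_pattern and the sample names_list are hypothetical, and the sketch assumes every entry starts and ends with a word character so that \b behaves as intended.

```python
import re

def build_names_pattern(names_list):
    """Return one regex pattern string that matches any entry of names_list."""
    # Longest entries first, so "New York City" is tried before "New York".
    ordered = sorted(set(names_list), key=lambda name: (-len(name), name))
    alternatives = []
    for name in ordered:
        # Escape each word and allow any run of whitespace between words.
        words = [re.escape(word) for word in name.split()]
        if words:  # skip empty or whitespace-only entries
            alternatives.append(r"\s+".join(words))
    # \b keeps matches on word boundaries, similar in spirit to pp.Keyword.
    return r"\b(?:" + "|".join(alternatives) + r")\b"

# Hypothetical sample data standing in for the real 20,000-entry list.
names_list = ["New York", "New York City", "Paris"]
names_re = re.compile(build_names_pattern(names_list), re.IGNORECASE)

match = names_re.search("visit new   york city 42")
print(match.group() if match else None)
```

The resulting pattern string could then be wrapped as pp.Regex(pattern, flags=re.IGNORECASE) and used in place of the pp.Or of keywords. Whether this is actually faster for a given list should be measured rather than assumed.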

This question about matching a large number of strings containing spaces in pyparsing comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/41736402/
