python - A more elegant way to implement regex-like quantifiers

Tags: python regex python-itertools

I'm writing a simple string parser that allows regex-like quantifiers. An input string might look like this:

s = "x y{1,2} z"

My parser function converts this string into a list of tuples:

list_of_tuples = [("x", 1, 1), ("y", 1, 2), ("z", 1, 1)]
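
The parser itself is not shown in the question. As a rough illustration of the conversion described above, here is a minimal sketch using the standard `re` module. The name `parse_quantified` and the token pattern are assumptions made for this example, not the asker's actual code; tokens are assumed to be whitespace-separated, and a missing quantifier is taken to mean exactly one repetition.

```python
import re

# Hypothetical token pattern: a run of word characters, optionally followed
# by {n} or {min,max}. This is an assumption, not the asker's real parser.
TOKEN = re.compile(r"(\w+)(?:\{(\d+)(?:,(\d+))?\})?")

def parse_quantified(s):
    """Turn 'x y{1,2} z' into [('x', 1, 1), ('y', 1, 2), ('z', 1, 1)]."""
    result = []
    for part in s.split():
        match = TOKEN.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse token: {part!r}")
        value, low, high = match.groups()
        # No quantifier means exactly one repetition.
        low = int(low) if low is not None else 1
        # {n} without an upper bound means exactly n repetitions.
        high = int(high) if high is not None else low
        result.append((value, low, high))
    return result

print(parse_quantified("x y{1,2} z"))
# [('x', 1, 1), ('y', 1, 2), ('z', 1, 1)]
```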

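Now, the tricky part is that I need a list of all valid combinations allowed by the quantifiers. All combinations must have the same number of elements, with the value None used for padding. For the given example, the expected output is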

[["x", "y", None, "z"], ["x", "y", "y", "z"]]

I do have a working solution, but I'm not very happy with it: it uses two nested for loops, and I find the code somewhat obscure, so overall it feels rather clumsy:

import itertools

def permute_input(lot):
    outer = []
    # is there something that replaces these nested loops?
    for val, start, end in lot:
        inner = []
        # For each tuple, create a list of constant length
        # Each element contains a different number of 
        # repetitions of the value of the tuple, padded
        # by the value None if needed.
        for i in range(start, end + 1):
            x = [val] * i + [None] * (end - i)
            inner.append(x)
        outer.append(inner)
    # Outer is now a list of lists.

    final = []
    # use itertools.product to combine the elements in the
    # list of lists:
    for combination in itertools.product(*outer):
        # flatten the elements in the current combination,
        # and append them to the final list:
        final.append([x for x 
                    in itertools.chain.from_iterable(combination)])
    return final

print(permute_input([("x", 1, 1), ("y", 1, 2), ("z", 1, 1)]))
[['x', 'y', None, 'z'], ['x', 'y', 'y', 'z']]

I suspect there is a more elegant way to do this, possibly hidden somewhere in the itertools module?

Best answer

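Another way to approach the problem is to use pyparsing together with its example regex parser, which expands a regular expression into the strings it could match. For your x y{1,2} z example string, it generates the two possible strings with the quantifier expanded: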

$ python -i regex_invert.py 
>>> s = "x y{1,2} z"
>>> for item in invert(s):
...     print(item)
... 
x y z
x yy z

The repetition itself supports both open-ended and closed ranges and is defined as:

repetition = (
    (lbrace + Word(nums).setResultsName("count") + rbrace) |
    (lbrace + Word(nums).setResultsName("minCount") + "," + Word(nums).setResultsName("maxCount") + rbrace) |
    oneOf(list("*+?"))
)

To get the desired result, we should change how results are yielded from the recurseList generator so that it returns lists instead of strings:

for s in elist[0].makeGenerator()():
    for s2 in recurseList(elist[1:]):
        yield [s] + [s2]  # instead of yield s + s2
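
To see what this change does to the shape of the output, here is a standalone sketch of the same recursion. It uses plain lists of alternatives in place of the emitter objects from regex_invert.py, so `recurse_list` and `options` are hypothetical stand-ins, and it assumes `options` is non-empty.

```python
def recurse_list(options):
    # options is a hypothetical stand-in: one list of alternatives per
    # position, replacing the emitter objects used in regex_invert.py.
    if len(options) == 1:
        yield from options[0]
    else:
        for s in options[0]:
            for s2 in recurse_list(options[1:]):
                yield [s] + [s2]  # lists nest one level deeper per position

options = [["x"], [["y", None], ["y", "y"]], ["z"]]
for item in recurse_list(options):
    print(item)
# ['x', [['y', None], 'z']]
# ['x', [['y', 'y'], 'z']]
```

Each result is nested rather than flat, because every recursion level wraps the remainder in its own list.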

Then we only need to flatten the result:

$ ipython3 -i regex_invert.py 

In [1]: import collections.abc

In [2]: def flatten(l):
   ...:     for el in l:
   ...:         if isinstance(el, collections.abc.Iterable) and not isinstance(el, (str, bytes)):
   ...:             yield from flatten(el)
   ...:         else:
   ...:             yield el
   ...:             

In [3]: s = "x y{1,2} z"

In [4]: for option in invert(s):
   ...:     print(list(flatten(option)))
   ...: 
['x', ' ', 'y', None, ' ', 'z']
['x', ' ', 'y', 'y', ' ', 'z']

Then, if needed, you can filter out the whitespace characters:

In [5]: for option in invert(s):
   ...:     print([item for item in flatten(option) if item != ' '])
   ...:     
['x', 'y', None, 'z']
['x', 'y', 'y', 'z']
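
For comparison, if you would rather stay within itertools as the question suggests, the two nested loops can be folded into a comprehension. This is only a sketch of an alternative and not part of the pyparsing approach above. It keeps the question's `permute_input` name and its (value, start, end) input format.

```python
from itertools import chain, product

def permute_input(lot):
    # One list of padded alternatives per (value, start, end) tuple.
    choices = [
        [[val] * n + [None] * (end - n) for n in range(start, end + 1)]
        for val, start, end in lot
    ]
    # product picks one alternative per tuple; chain flattens each pick.
    return [list(chain.from_iterable(combo)) for combo in product(*choices)]

print(permute_input([("x", 1, 1), ("y", 1, 2), ("z", 1, 1)]))
# [['x', 'y', None, 'z'], ['x', 'y', 'y', 'z']]
```

The logic is the same as in the question; only the bookkeeping with `outer`, `inner` and `final` goes away.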

Regarding python - a more elegant way to implement regex-like quantifiers, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41790538/
