python - How can I make Python's re treat consecutive matches of pattern alternatives as a single occurrence of the pattern?

Tags: python regex

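I'm developing a chatbot and want to embed certain rules in it. One of them is to parse questions like these:

"How much is twelve thousand three hundred four plus two hundred fifty six?" or "What is five hundred eighty nine divided by 89?"

I have the following code: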

import re

pat_num = re.compile(r'((\b(zero|one|two|three|four|five|'
                     r'six|seven|eight|nine|ten|eleven|'
                     r'twelve|thirteen|fourteen|fifteen|sixteen|'
                     r'seventeen|eighteen|nineteen|twenty|thirty|'
                     r'forty|fifty|sixty|seventy|eighty|'
                     r'ninety|hundred|thousand|million|billion|'
                     r'trillion)\b)+|\d+)')
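sentence = "How much is twelve thousand three hundred four plus two hundred fifty six?"
# (start, end) indices of every match in the sentence
ind_list = [(m.start(0), m.end(0)) for m in re.finditer(pat_num, sentence)]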

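I want both sentences to yield two numbers. For the first sentence, for example, it should return the indices of the numbers "twelve thousand three hundred four" and "two hundred fifty six".

Instead, it returns 9 numbers/matches for the first sentence: twelve, thousand, three, hundred, four, two, hundred, fifty, six.

How can I change the regex so that it returns 2 numbers?

Many thanks for your help!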

Best answer

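In case you want the actual indices rather than the matched text itself, a little lookahead makes this fairly simple. The original pattern splits every word into its own match because the repeated group (\b(...)\b)+ leaves no room for the whitespace between words. Allowing \s* inside the repeated group lets one match span a whole run of number words, and the lookahead (?=\s|$) makes the match end just before whitespace or at the end of the string: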

import re

# easier to manage as a list
numerals = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
            "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty", "fifty",
            "sixty", "seventy", "eighty", "ninety", "hundred", "thousand", "million",
            "billion", "trillion"]

# all together: a run of number words separated by optional whitespace, or plain digits
pattern = re.compile(r"((({})\s*)+)(?=\s|$)|\d+".format("|".join(numerals)))

You can then test it as:

sentence = "How much is twelve thousand three hundred four plus two hundred fifty six?"
print([(m.start(0), m.end(0)) for m in re.finditer(pattern, sentence)])
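# [(12, 46), (52, 69)]
# Note: (52, 69) covers only "two hundred fifty"; the final "six" is dropped
# because the "?" right after it fails the (?=\s|$) lookahead.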

sentence = "What is five hundred eighty nine divided by 89?"
print([(m.start(0), m.end(0)) for m in re.finditer(pattern, sentence)])
# [(8, 32), (44, 46)]
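
One caveat with the lookahead approach: (?=\s|$) requires whitespace or the end of the string right after a number, so a number immediately followed by punctuation (such as "six?") loses its last word, and without a leading \b a numeral can also match at the end of a longer word. The following is a minimal sketch of a word-boundary variant, not part of the original answer; the names NUMERALS, NUMBER_RE and number_spans are made up for illustration.

```python
import re

# Same word list as in the answer (hypothetical constant name).
NUMERALS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
            "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
            "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
            "hundred", "thousand", "million", "billion", "trillion"]

WORD = "|".join(NUMERALS)

# One numeral word, then any further numeral words separated by whitespace.
# \b on both ends keeps "ten" from matching inside "often" or "tents" and
# lets a number end right before punctuation. Plain digits are the fallback.
NUMBER_RE = re.compile(r"\b(?:{0})(?:\s+(?:{0}))*\b|\d+".format(WORD))


def number_spans(sentence):
    """Return (start, end) index pairs of spelled-out or digit numbers."""
    return [m.span() for m in NUMBER_RE.finditer(sentence)]


sentence = "How much is twelve thousand three hundred four plus two hundred fifty six?"
spans = number_spans(sentence)
print(spans)
# [(12, 46), (52, 73)]
print([sentence[start:end] for start, end in spans])
# ['twelve thousand three hundred four', 'two hundred fifty six']
```

Slicing the sentence with each (start, end) pair recovers the matched text, so the same indices serve both purposes.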

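Regarding "python - How can I make Python's re treat consecutive matches of pattern alternatives as a single occurrence of the pattern?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45069015/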