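I am developing a chatbot and want to build some rules into it. One of them is to parse questions like:
"How much is twelve thousand three hundred four plus two hundred fifty six?" or "What is five hundred eighty nine divided by 89?"
I have the following code: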
import re
pat_num = re.compile(r'((\b(zero|one|two|three|four|five|'
r'six|seven|eight|nine|ten|eleven|'
r'twelve|thirteen|fourteen|fifteen|sixteen|'
r'seventeen|eighteen|nineteen|twenty|thirty|'
r'forty|fifty|sixty|seventy|eighty|'
r'ninety|hundred|thousand|million|billion|'
r'trillion)\b)+|\d+)')
sentence = "How much is twelve thousand three hundred four plus two hundred fifty six?"
ind_list = [(m.start(0), m.end(0)) for m in re.finditer(pat_num, sentence)]
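I want both sentences to return two numbers. For the first sentence, for example, it should return the indices of the two numbers: twelve thousand three hundred four and two hundred fifty six.
However, it returns 9 numbers/matches for the first sentence: twelve, thousand, three, hundred, four, two, hundred, fifty, six.
How can I change the regex so that it returns 2 numbers?
Thanks a lot for your help!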
Best answer
In case you want the actual indices rather than the matched text itself, a little bit of lookahead makes it quite simple:
import re

# easier to manage as a list
numerals = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
"seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty", "fifty",
"sixty", "seventy", "eighty", "ninety", "hundred", "thousand", "million",
"billion", "trillion"]
# one or more numeral words, each optionally followed by whitespace; the lookahead
# makes the match end right before whitespace or the end of the string; digit runs also match
pattern = re.compile(r"((({})\s*)+)(?=\s|$)|\d+".format("|".join(numerals))) # all together
You can then test it as:
sentence = "How much is twelve thousand three hundred four plus two hundred fifty six?"
print([(m.start(0), m.end(0)) for m in re.finditer(pattern, sentence)])
# [(12, 46), (52, 69)]
sentence = "What is five hundred eighty nine divided by 89?"
print([(m.start(0), m.end(0)) for m in re.finditer(pattern, sentence)])
# [(8, 32), (44, 46)]
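Note that in the first test the second span ends at 69, so "six" is left out: it is followed by "?", and the lookahead only accepts whitespace or the end of the string. If punctuation directly after a number should be tolerated, one possible variant is to require word boundaries instead of the lookahead. The following is only a sketch under that assumption; the names NUMERALS, NUMBER_PATTERN and find_number_spans are made up for illustration and are not part of the answer above.

```python
import re

# Same numeral vocabulary as in the answer above.
NUMERALS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred",
    "thousand", "million", "billion", "trillion",
]

# A single numeral word as a non-capturing alternation.
WORD = r"(?:{})".format("|".join(NUMERALS))

# One numeral word followed by any number of whitespace-separated numeral
# words, each delimited by word boundaries, or a plain run of digits.
NUMBER_PATTERN = re.compile(
    r"\b{w}\b(?:\s+{w}\b)*|\d+".format(w=WORD),
    re.IGNORECASE,
)


def find_number_spans(sentence):
    """Return a (start, end) index pair for every number phrase found."""
    return [m.span() for m in NUMBER_PATTERN.finditer(sentence)]


first = "How much is twelve thousand three hundred four plus two hundred fifty six?"
second = "What is five hundred eighty nine divided by 89?"

print(find_number_spans(first))   # [(12, 46), (52, 73)]
print(find_number_spans(second))  # [(8, 32), (44, 46)]

# The matched text can be recovered by slicing with the spans.
print([first[start:end] for start, end in find_number_spans(first)])
```

Because every word must end at a word boundary, "four" cannot match the start of "fourteen", and a trailing "?" or "," no longer cuts the last word off the span.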
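Regarding "python - How to make Python re treat consecutive matches of pattern alternatives as a single occurrence of the pattern?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45069015/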