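python - How can I extract all number-like values from a string?

I have a string containing a mix of numeric and non-numeric values, and I want to extract the values from the text. I can't work out how to cover all (or most) of the possible cases. I have a partially working solution: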

import re

def extract_values(sentence):
    #sentence = normalizeString(sentence)
    matches = re.findall(r'((\d*\.?\d+(?:\/\d*\.?\d+)?)(?:\s+and\s+(\d*\.?\d+(?:\/\d*\.?\d+)?))?)', sentence)    
    # (\d\sto\s\d\s(and\s\d\/\d)*) << for adding 9 to 11, couldn't fix

    result = []
    for x,y,z in matches:
        if '/' in x:
            result.append(x)
        else:
            result.extend(filter(lambda x: x!="", [y,z]))
    return result

Driver code:

extract_values("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20")

Actual (wrong) output:

['1 and 1/2', '5', '5', '9', '11', '9', '9 and 1/2', '11/12', '20']

Expected output:

['1 and 1/2', '.5', '5', '9 to 11', '9 to 9 and 1/2', '11/12/20']

Note the difference between 5 and .5, and between 'x to y' and 'x to y and z'.

Any help would be appreciated. Thanks.

Best answer

You can use

import re

def extract_values(sentence):
    num = r'\d*\.?\d+(?:/\d*\.?\d+)*'
    return re.findall(fr'{num}(?:\s+(?:and|to)\s+{num})*', sentence)

print(extract_values("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20"))
# => ['1 and 1/2', '.5', '5', '9 to 11', '9 to 9 and 1/2', '11/12/20']

See the Python demo and the regex demo.
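
One thing that may be worth spelling out: the pattern above uses only non-capturing groups, (?:...), so re.findall returns each whole match as a string. With capturing groups it returns tuples of the groups instead, which is why the code in the question needs a loop to reassemble the pieces. A minimal sketch of the difference (the variable names grouped and plain are just for illustration):

```python
import re

num = r'\d*\.?\d+(?:/\d*\.?\d+)*'
text = "9 to 11"

# Capturing groups: findall returns one tuple of groups per match.
grouped = re.findall(fr'({num})\s+(?:and|to)\s+({num})', text)

# Only non-capturing groups: findall returns the full matched text.
plain = re.findall(fr'{num}\s+(?:and|to)\s+{num}', text)

print(grouped)  # [('9', '11')]
print(plain)    # ['9 to 11']
```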

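Details:

\d*\.?\d+(?:/\d*\.?\d+)* - an integer or float (the integer part is optional, as in .5), followed by zero or more occurrences of / and another integer or float
(?:\s+(?:and|to)\s+\d*\.?\d+(?:/\d*\.?\d+)*)* - zero or more occurrences of:
- \s+(?:and|to)\s+ - and or to, with one or more whitespace characters on each side
- \d*\.?\d+(?:/\d*\.?\d+)* - an integer or float, followed by zero or more occurrences of / and another integer or float.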
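
The breakdown above can also be written into the pattern itself. The following is a sketch of the same regex compiled with re.VERBOSE, so each piece carries its own comment; the names NUM and PATTERN are illustrative and not part of the original answer.

```python
import re

# One number: an integer or a float whose integer part is optional
# (5, 1.5, .5), optionally followed by /number parts (1/2, 11/12/20).
NUM = r"\d*\.?\d+(?:/\d*\.?\d+)*"

# Same pattern as in the answer. In re.VERBOSE mode, whitespace inside
# the pattern is ignored and everything after # on a line is a comment.
PATTERN = re.compile(
    rf"""
    {NUM}                 # first number
    (?:                   # then, zero or more times:
        \s+(?:and|to)\s+  #   'and' or 'to' surrounded by whitespace
        {NUM}             #   another number
    )*
    """,
    re.VERBOSE,
)

def extract_values(sentence):
    # No capturing groups, so findall returns the full matches.
    return PATTERN.findall(sentence)

print(extract_values("He is between 9 to 11 or 9 to 9 and 1/2."))
# ['9 to 11', '9 to 9 and 1/2']
```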

This question, python - How can I extract all number-like values from a string?, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/70462846/
