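I have a string containing a mix of values (numeric and non-numeric). I want to extract the values from the text, but I don't know how to cover all (or most) of the possible cases. I have a partially working solution like this: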
import re

def extract_values(sentence):
    #sentence = normalizeString(sentence)
    matches = re.findall(r'((\d*\.?\d+(?:\/\d*\.?\d+)?)(?:\s+and\s+(\d*\.?\d+(?:\/\d*\.?\d+)?))?)', sentence)
    # (\d\sto\s\d\s(and\s\d\/\d)*) << for adding 9 to 11, couldn't fix
    result = []
    for x, y, z in matches:
        if '/' in x:
            result.append(x)
        else:
            result.extend(filter(lambda x: x != "", [y, z]))
    return result
Driver code:
extract_values("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20")
Actual (incorrect) output:
['1 and 1/2', '5', '5', '9', '11', '9', '9 and 1/2', '11/12', '20']
Expected output:
['1 and 1/2', '.5', '5', '9 to 11', '9 to 9 and 1/2', '11/12/20']
Note the difference between 5 and .5, and between 'x to y' and 'x to y and z'.
Any help would be appreciated. Thanks.
Best answer
You can use
import re

def extract_values(sentence):
    num = r'\d*\.?\d+(?:/\d*\.?\d+)*'
    return re.findall(fr'{num}(?:\s+(?:and|to)\s+{num})*', sentence)

print(extract_values("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20"))
# => ['1 and 1/2', '.5', '5', '9 to 11', '9 to 9 and 1/2', '11/12/20']
See the Python demo and the regex demo.
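One reason this pattern is easy to use with re.findall is that it contains only non-capturing groups, so each result is the full matched text instead of a tuple of groups. The short sketch below illustrates that behavior on a simplified pattern; the sample text and variable names are made up for illustration and are not part of the original answer.

```python
import re

text = "9 to 11"

# With capturing groups, findall returns one tuple of groups per match.
with_groups = re.findall(r'(\d+)\s+to\s+(\d+)', text)

# With only non-capturing groups, findall returns the whole matched text.
without_groups = re.findall(r'\d+(?:\s+to\s+\d+)*', text)

print(with_groups)     # [('9', '11')]
print(without_groups)  # ['9 to 11']
```

This is why the tuple unpacking and filtering loop from the question is no longer needed.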
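Details:
\d*\.?\d+(?:/\d*\.?\d+)* - a float/integer, followed by zero or more occurrences of / and a float/integer
(?:\s+(?:and|to)\s+\d*\.?\d+(?:/\d*\.?\d+)*)* - zero or more occurrences of:
  \s+(?:and|to)\s+ - and or to, surrounded by one or more whitespace characters
  \d*\.?\d+(?:/\d*\.?\d+)* - a float/integer, followed by zero or more occurrences of / and a float/integer.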
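The breakdown above can also be written directly into the pattern using re.VERBOSE, which ignores whitespace and allows inline comments. This is a minimal sketch of the same regex, not a different technique; the names NUM, PATTERN and extract_values_verbose are hypothetical and chosen only for this example.

```python
import re

# One "number" as described above.
NUM = r"""
    \d*\.?\d+            # a float/integer such as 5, .5 or 1.25
    (?:/\d*\.?\d+)*      # zero or more '/number' parts, e.g. 1/2 or 11/12/20
"""

PATTERN = re.compile(
    rf"""
    {NUM}
    (?:                  # zero or more continuation chunks:
        \s+(?:and|to)\s+ #   'and' or 'to' surrounded by whitespace
        {NUM}
    )*
    """,
    re.VERBOSE,
)

def extract_values_verbose(sentence):
    # No capturing groups, so findall returns the full matched strings.
    return PATTERN.findall(sentence)

sentence = ("He is 1 and 1/2 years old. He is .5 years old and he is 5 years old. "
            "He is between 9 to 11 or 9 to 9 and 1/2. He was born 11/12/20")
print(extract_values_verbose(sentence))
```

The verbose form should behave the same as the compact one-liner; it only trades brevity for readability.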
Regarding python - How to extract all number-like values from a string?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70462846/