I have a collection of documents such as:
{'Age': '27 yo'},
{'Age': '81.0'},
{'Age': '15 male'},
{'Age': '25 years old'},
{'Age': 'unknown'}
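I want to use a MongoDB aggregation pipeline (in PyMongo) to strip all the irrelevant string content, in preparation for converting the age to an integer-like value. The desired output is: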
{'Age': '27 yo', 'Age_Standardized': 27},
{'Age': '81.0', 'Age_Standardized': 81},
{'Age': '15 male', 'Age_Standardized': 15},
{'Age': '25 years old', 'Age_Standardized': 25},
{'Age': 'unknown', 'Age_Standardized': None}
Is there a simple way to perform multiple replacements within an aggregation? Something like this pseudocode:
cursor = db.aggregate([
    {'$match':
        {match query}
    },
    {'$set':
        {'Age_Standardized':
            {replace 'male' with ''},
            {replace 'yo' with ''},
            {replace 'years old' with ''},
        }
    }
])
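One way the chained replacements in the pseudocode could be expressed is by nesting `$replaceAll` expressions, one per substring. The sketch below builds that nesting in Python; it assumes MongoDB 4.4 or later (where `$replaceAll` is available) and that `Age` is always a string. The helper name `strip_substrings` and the `IGNORED` list are illustrative, not part of the original post.

```python
from functools import reduce

# Substrings to remove, taken from the pseudocode above (illustrative list).
IGNORED = ["years old", "male", "yo"]


def strip_substrings(field, substrings):
    """Build a nested $replaceAll expression that removes each substring in turn.

    `field` is a field path such as "$Age"; the first substring is applied
    innermost, the last one outermost.
    """
    return reduce(
        lambda expr, sub: {
            "$replaceAll": {"input": expr, "find": sub, "replacement": ""}
        },
        substrings,
        field,
    )


pipeline = [
    {
        "$set": {
            # $trim removes the whitespace left behind by the replacements.
            "Age_Standardized": {
                "$trim": {"input": strip_substrings("$Age", IGNORED)}
            }
        }
    },
]

# Hypothetical usage with a PyMongo collection handle:
# cursor = db.collection.aggregate(pipeline)
```

Order matters with this approach: a longer substring that contains a shorter one (for example 'female' and 'male') has to be listed first, otherwise a fragment is left behind. The result is still a string, so a conversion step is needed afterwards.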
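*Edit: the strings are more convoluted than what a simple digit extraction with $regexFind can handle. I need to run some exclusions first, equivalent to the Python function below: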
import re

def clean_age(self, s):
    # No digits at all: nothing to convert.
    if not any(char.isdigit() for char in s):
        return None
    ign = ['year', 'years', 'yr', 'yrs', 'old', 'yo', 'y.o.', 'y.o', 'male', 'female', '`', '=']
    for word in ign:
        s = s.replace(word, '')
    # Reject values containing a slash and date-like values.
    if '/' in s or re.match(r'\d+-\d+-\d+', s):
        return None
    # Clean up leftovers such as '25 s' (from 'years') or '15 fe' (from 'female').
    if re.match(r'\d{2}\ss', s):
        s = s.replace('s', '')
    if re.match(r'\d{2}\sfe', s):
        s = s.replace('fe', '')
    s = s.strip('-').strip()
    try:
        return int(float(s))
    except (ValueError, OverflowError):
        return None
Best answer
Demo - https://mongoplayground.net/p/iLEI2lVhpgF
Use $regexFind to extract the digits from the text:
db.collection.aggregate([
  {
    $set: {
      Age_Standardized: {
        $regexFind: { input: "$Age", regex: "[0-9]+" }
      }
    }
  },
  {
    $set: {
      Age_Standardized: {
        $ifNull: [ "$Age_Standardized.match", "None" ]
      }
    }
  }
])
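The pipeline above leaves the match as a string and uses the string "None" for misses, whereas the question asks for integers and a null. A possible PyMongo variant is sketched below. It assumes MongoDB 4.2 or later (for `$regexFind` and `$regexMatch`) and that `Age` is a string. The `EXCLUDE` pattern is only a loose approximation of the exclusions in the question's edit, and the name `age_to_int` is illustrative.

```python
# Values to reject outright: anything containing a slash or a
# digits-dash-digits-dash-digits date. An approximation of the exclusions in
# the question's edit, not an exact port of clean_age.
EXCLUDE = r"/|\d+-\d+-\d+"


def age_to_int(field="$Age"):
    """Build an expression yielding the first run of digits as an int, or null."""
    first_digits = {"$regexFind": {"input": field, "regex": "[0-9]+"}}
    to_int = {
        "$convert": {
            "input": "$$m.match",  # null when $regexFind found nothing
            "to": "int",
            "onError": None,  # e.g. a digit run too large for a 32-bit int
            "onNull": None,
        }
    }
    return {
        "$cond": [
            {"$regexMatch": {"input": field, "regex": EXCLUDE}},
            None,
            {"$let": {"vars": {"m": first_digits}, "in": to_int}},
        ]
    }


pipeline = [{"$set": {"Age_Standardized": age_to_int()}}]

# Hypothetical usage with a PyMongo collection handle:
# for doc in db.collection.aggregate(pipeline):
#     print(doc)
```

Because only the first run of digits is taken, '81.0' becomes 81 without a separate float conversion. Inputs where the first digits are not the age would still need extra exclusions.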
Regarding "python - Mongo replace multiple substrings in a field in an aggregation", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67593623/