I'd like to know whether there is a way in MongoDB or SQL to do something similar to this Pandas operation:
df['Category'] = df.product.str.extract('(pork|chicken|tofu)')
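I want to do this in order to create a new grouping inside an aggregation pipeline and then get the average protein content of each group.
For example: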
db.test.insert(
[
{ "my_id": {"Product": "Pork Soup", "Protein": 100.0 }},
{ "my_id": {"Product": "Duck Sandwich", "Protein": 1000.1 }},
{ "my_id": {"Product": "Chicken Roll", "Protein": 100.69 }},
{ "my_id": {"Product": "Disgusting Tofu", "Protein": 0.1 }},
{ "my_id": {"Product": "Cardboard Casserole", "Protein": 50.0 }},
])
Desired result:
{Category: "Pork", "Product": "Pork Soup", "Protein": 100.0 }
{Category: NA, "Product": "Duck Sandwich", "Protein": 1000.1 }
{Category: "Chicken", "Product": "Chicken Roll", "Protein": 100.69}
{Category: "Tofu", "Product": "Disgusting Tofu", "Protein": 0.1 }
{Category: NA , "Product": "Cardboard Casserole", "Protein": 50.0 }
I've been looking at conditional statements and case statements in posts such as this one, but I couldn't find a way to do this with RegEx.
Best answer
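It's possible, but painful to write. You can use $switch in a $project stage for this. The first stage splits the product name into words, and the second checks those words against each keyword.
Here is the query: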
db.collection.aggregate([
  {
    "$project": {
      "w": {
        "$split": ["$my_id.Product", " "]
      },
      "Product": "$my_id.Product",
      "Protein": "$my_id.Protein"
    }
  },
  {
    "$project": {
      "_id": 0,
      "Product": 1,
      "Protein": 1,
      "Category": {
        "$switch": {
          "branches": [
            {
              "case": {
                "$in": ["Pork", "$w"]
              },
              "then": "Pork"
            },
            {
              "case": {
                "$in": ["Chicken", "$w"]
              },
              "then": "Chicken"
            },
            {
              "case": {
                "$in": ["Tofu", "$w"]
              },
              "then": "Tofu"
            }
          ],
          "default": "NA"
        }
      }
    }
  }
])
Result:
[
{
"Category": "Pork", "Product": "Pork Soup", "Protein": 100
},
{
"Category": "NA", "Product": "Duck Sandwich", "Protein": 1000.1
},
{
"Category": "Chicken", "Product": "Chicken Roll", "Protein": 100.69
},
{
"Category": "Tofu", "Product": "Disgusting Tofu", "Protein": 0.1
},
{
"Category": "NA", "Product": "Cardboard Casserole", "Protein": 50
}
]
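To make the logic of the two $project stages easier to follow, here is a rough plain-Python sketch of what they do to each document. It does not talk to MongoDB at all; the names `docs`, `KEYWORDS`, and `categorize` are made up for illustration, and the sample data is copied from the question.

```python
# Plain-Python imitation of the two $project stages (illustrative only).
docs = [
    {"my_id": {"Product": "Pork Soup", "Protein": 100.0}},
    {"my_id": {"Product": "Duck Sandwich", "Protein": 1000.1}},
    {"my_id": {"Product": "Chicken Roll", "Protein": 100.69}},
    {"my_id": {"Product": "Disgusting Tofu", "Protein": 0.1}},
    {"my_id": {"Product": "Cardboard Casserole", "Protein": 50.0}},
]

# Hypothetical helper list: same keywords, same order as the $switch branches.
KEYWORDS = ["Pork", "Chicken", "Tofu"]


def categorize(doc):
    """Mimic stage 1 ($split) and stage 2 ($switch with $in) for one document."""
    product = doc["my_id"]["Product"]
    words = product.split(" ")  # stage 1: split the name on single spaces
    # stage 2: the first keyword found in the word list wins, otherwise "NA"
    category = next((k for k in KEYWORDS if k in words), "NA")
    return {
        "Category": category,
        "Product": product,
        "Protein": doc["my_id"]["Protein"],
    }


results = [categorize(d) for d in docs]
for row in results:
    print(row)
```

One thing this sketch makes visible: the check is a case-sensitive, whole-word comparison, not a real regular expression. A product named "pork soup" or "Porkchop Stew" would fall through to "NA".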
Here is a link where you can try the query: mongoplayground.net/p/7M0oS_ZdmIq
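The question's end goal was the average protein per category, which the query above does not compute. In MongoDB that would presumably be a further $group stage using $avg on the new Category field. As a hedged illustration of the whole idea, here is a small standard-library Python sketch that extracts the category with an actual regular expression (closer to the Pandas `str.extract` call in the question) and then averages per group. The names `rows`, `PATTERN`, and `extract_category` are hypothetical, and the case-insensitive flag is my assumption, since the question's lowercase pattern would not match the capitalized product names as written.

```python
import re
from collections import defaultdict
from statistics import mean

# Sample data from the question, flattened to (product, protein) pairs.
rows = [
    ("Pork Soup", 100.0),
    ("Duck Sandwich", 1000.1),
    ("Chicken Roll", 100.69),
    ("Disgusting Tofu", 0.1),
    ("Cardboard Casserole", 50.0),
]

# Assumption: match case-insensitively so "pork" also finds "Pork".
PATTERN = re.compile(r"(pork|chicken|tofu)", re.IGNORECASE)


def extract_category(product):
    """Return the first matching keyword, capitalized, or "NA" if none match."""
    match = PATTERN.search(product)
    return match.group(1).capitalize() if match else "NA"


# Group protein values by extracted category.
groups = defaultdict(list)
for product, protein in rows:
    groups[extract_category(product)].append(protein)

# Average protein per category.
averages = {category: mean(values) for category, values in groups.items()}
for category, avg in averages.items():
    print(category, avg)
```

Unlike the $split plus $in approach, this regex matches substrings, so "Porkchop" would be categorized as "Pork". Which behavior is preferable depends on the data.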
Regarding "mysql - MongoDB or SQL - creating a new grouping using RegEx", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49443939/