我的 mongodb 集合中有以下文档:
{'name' : 'abc-1','parent':'abc', 'price': 10}
{'name' : 'abc-2','parent':'abc', 'price': 5}
{'name' : 'abc-3','parent':'abc', 'price': 9}
{'name' : 'abc-4','parent':'abc', 'price': 11}
{'name' : 'efg', 'parent':'', 'price': 10}
{'name' : 'efg-1','parent':'efg', 'price': 5}
{'name' : 'abc-2','parent':'efg','price': 9}
{'name' : 'abc-3','parent':'efg','price': 11}
我想执行以下操作:
a. Group By distinct parent
b. Sort all the groups based on price
c. For each group select a document with minimum price
i. check each record's parent sku exists as a record in name field
ii. If the name exists, do nothing
iii. If the record does not exists, insert a document with parent as empty and other values as the value of the record selected previously (minimum value).
我厌倦了按如下方式使用每个:
db.file.find().sort([("price", 1)]).forEach(function(doc){
cnt = db.file.count({"sku": {"$eq": doc.parent}});
if (cnt < 1){
newdoc = doc;
newdoc.name = doc.parent;
newdoc.parent = "";
delete newdoc["_id"];
db.file.insertOne(newdoc);
}
});
问题是它需要太多时间。这里有什么问题吗?如何对其进行优化?聚合管道是否是一个好的解决方案,如果是,该怎么做?
最佳答案
检索一组产品名称✔
def product_names(): for product in db.file.aggregate([{$group: {_id: "$name"}}]): yield product['_id']<p></p> <p>product_names = set(product_names()) </p>
检索具有最小值的产品 团体价格 ✔
result_set = db.file.aggregate([ { '$sort': { 'price': 1, } }, { '$group': { '_id': '$parent', 'name': { '$first': '$name', }, 'price': { '$min': '$price', } } }, { '$sort': { 'price': 1, } } ])
如果名称不在集合中,则插入在 2 中检索到的产品 1. 中检索到的产品名称数量。✔
from pymongo.operations import InsertOne def insert_request(product): return InsertOne({ name: product['name'], price: product['price'], parent: '' }) requests = ( insert_request(product) for product in result_set if product['name'] not in product_names ) db.file.bulk_write(list(requests))
步骤 2 和 3 可以在聚合
管道中实现。
db.file.aggregate([
{
'$sort': {'price': 1}
},
{
'$group': {
'_id': '$parent',
'name': {
'$first': '$name'
},
'price': {
'$min': '$price'
},
}
},
{
'$sort': {
'price': 1
}
},
{
'$project': {
'name': 1,
'price': 1,
'_id': 0,
'parent':''
}
},
{
'$match': {
'name': {
'$nin': list(product_names())
}
}
},
{
'$out': 'file'
}
])
关于python - mongodb查询时间太长,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48107682/