python - MongoDB query takes too long

My MongoDB collection contains the following documents:

{'name' : 'abc-1','parent':'abc', 'price': 10}
{'name' : 'abc-2','parent':'abc', 'price': 5}
{'name' : 'abc-3','parent':'abc', 'price': 9}
{'name' : 'abc-4','parent':'abc', 'price': 11}

{'name' : 'efg', 'parent':'', 'price': 10}
{'name' : 'efg-1','parent':'efg', 'price': 5}
{'name' : 'abc-2','parent':'efg','price': 9}
{'name' : 'abc-3','parent':'efg','price': 11}

I want to do the following:

a. Group the documents by distinct parent
b. Sort all the groups by price
c. For each group, select the document with the minimum price
  i. Check whether that record's parent SKU exists as the name of another record
  ii. If the name exists, do nothing
  iii. If the record does not exist, insert a document with an empty parent and the other values taken from the record selected previously (the one with the minimum price)
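
To pin down the expected result before involving the database, steps a to c can be sketched in plain Python over an in-memory list of documents. The function name `missing_parent_docs` is hypothetical, and naming the new document after the missing parent is an assumption taken from the question's own loop, not something the list states explicitly.

```python
def missing_parent_docs(docs):
    """Return the documents that steps a-c would insert (nothing is written)."""
    names = {d["name"] for d in docs}

    # a/c. Track the cheapest document seen for each non-empty parent.
    cheapest = {}
    for d in docs:
        parent = d.get("parent")
        if not parent:
            continue  # top-level documents have no parent to check
        best = cheapest.get(parent)
        if best is None or d["price"] < best["price"]:
            cheapest[parent] = d

    # b. Visit the groups in ascending order of their minimum price.
    new_docs = []
    for parent, d in sorted(cheapest.items(), key=lambda kv: kv[1]["price"]):
        if parent in names:
            continue  # c.ii: the parent already exists, do nothing
        # c.iii: copy the cheapest record, minus _id, as a top-level document
        new_doc = {k: v for k, v in d.items() if k != "_id"}
        new_doc["name"] = parent  # assumption: named after the missing parent
        new_doc["parent"] = ""
        new_docs.append(new_doc)
    return new_docs


docs = [
    {"name": "abc-1", "parent": "abc", "price": 10},
    {"name": "abc-2", "parent": "abc", "price": 5},
    {"name": "abc-3", "parent": "abc", "price": 9},
    {"name": "abc-4", "parent": "abc", "price": 11},
    {"name": "efg", "parent": "", "price": 10},
    {"name": "efg-1", "parent": "efg", "price": 5},
    {"name": "abc-2", "parent": "efg", "price": 9},
    {"name": "abc-3", "parent": "efg", "price": 11},
]
print(missing_parent_docs(docs))
# [{'name': 'abc', 'parent': '', 'price': 5}]
```

For the sample documents only `abc` is missing, because `efg` already exists as a name.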

I tried using forEach as follows:

db.file.find().sort([("price", 1)]).forEach(function(doc){
          cnt = db.file.count({"name": {"$eq": doc.parent}});
          if (cnt < 1){
               newdoc = doc;
               newdoc.name = doc.parent;
               newdoc.parent = "";
              delete newdoc["_id"];
              db.file.insertOne(newdoc);
          }
});

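The problem is that it takes too much time. What is wrong here, and how can it be optimized? Would an aggregation pipeline be a good solution, and if so, how should it be written?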

Best answer

1. Retrieve the set of product names ✔

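def iter_product_names():
    # Yield each distinct value of the "name" field.
    for product in db.file.aggregate([{'$group': {'_id': '$name'}}]):
        yield product['_id']

# Build the set once so that membership tests are fast.
product_names = set(iter_product_names())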
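
PyMongo collections also expose a `distinct` method, which may be a shorter way to build the same set. A minimal sketch, assuming the collection object is passed in as an argument and that the distinct values fit in a single server reply; `load_product_names` is a hypothetical helper name.

```python
def load_product_names(collection):
    """Return the distinct values of the "name" field as a set."""
    # Collection.distinct returns a list; a set gives fast membership tests.
    return set(collection.distinct("name"))
```

With the collection from the question this would be called as `product_names = load_product_names(db.file)`.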

2. Retrieve the product with the minimum price for each group ✔

result_set = db.file.aggregate([
    {
        '$sort': {
            'price': 1,
        }
    }, 
    {
        '$group': {
            '_id': '$parent',
            'name': {
                '$first': '$name',
            }, 
            'price': {
                '$min': '$price',
            }
        }
    }, 
    {
        '$sort': {
            'price': 1,
        }
    }
])

3. Insert the products retrieved in step 2 if the name is not in the set of product names retrieved in step 1 ✔

from pymongo.operations import InsertOne

def insert_request(product):
    # Dictionary keys must be quoted strings in Python.
    return InsertOne({
        'name': product['name'],
        'price': product['price'],
        'parent': ''
    })

requests = [
    insert_request(product)
    for product in result_set
    if product['name'] not in product_names
]
# bulk_write raises InvalidOperation when given an empty list of requests
if requests:
    db.file.bulk_write(requests)
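
Because `'$first': '$name'` always returns a name that is already stored in the collection, the filter on `product['name']` may never select anything. A variant keyed on the group's `_id` (the parent value) seems closer to what the question describes. The sketch here is one reading of that intent, not the answer's own code; `parent_documents` is a hypothetical helper that only builds plain dicts, so it can be checked without a database.

```python
def parent_documents(groups, product_names):
    """Build one top-level document per group whose parent has no document yet.

    groups: results of the step 2 aggregation, each with '_id' (the parent),
            'name' and 'price'.
    product_names: set of names already present in the collection.
    """
    documents = []
    for group in groups:
        parent = group["_id"]
        if not parent or parent in product_names:
            continue  # skip the top-level group and parents that already exist
        documents.append({"name": parent, "price": group["price"], "parent": ""})
    return documents


# Aggregated form of the sample documents from the question.
groups = [
    {"_id": "abc", "name": "abc-2", "price": 5},
    {"_id": "efg", "name": "efg-1", "price": 5},
    {"_id": "", "name": "efg", "price": 10},
]
names = {"abc-1", "abc-2", "abc-3", "abc-4", "efg", "efg-1"}
print(parent_documents(groups, names))
# [{'name': 'abc', 'price': 5, 'parent': ''}]
```

Each returned dict could then be wrapped in `InsertOne` and passed to `bulk_write` in the same way as in step 3.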

Steps 2 and 3 can also be implemented in a single aggregation pipeline.

db.file.aggregate([
    {
        '$sort': {'price': 1}
    }, 
    {
        '$group': {
            '_id': '$parent',
            'name': {
                '$first': '$name'
            }, 
            'price': {
                '$min': '$price'
            },
        }
    }, 
    {
        '$sort': {
            'price': 1
        }
    }, 
    {
        '$project': {
            'name': 1, 
            'price': 1,
            '_id': 0, 
            'parent':''
        }
    }, 
    {
        '$match': {
            'name': {
                # product_names is the set built in step 1, not a function
                '$nin': list(product_names)
            }
        }
    }, 
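    {
        # $merge (MongoDB 4.2+) adds the results to 'file';
        # $out would replace the whole collection with the pipeline output
        '$merge': {'into': 'file'}
    }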
])
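
If the check is meant to be on the parent rather than on the cheapest child's name, the pipeline could match on the group `_id` instead. A sketch under that assumption: `build_pipeline` is a hypothetical helper, `$merge` assumes MongoDB 4.2 or later, and the list of names is assumed small enough to send in one command.

```python
def build_pipeline(product_names, target="file"):
    """Return aggregation stages that add a document for each missing parent."""
    # Parents to ignore: names that already exist, plus empty or missing parents.
    known = list(product_names) + ["", None]
    return [
        # Minimum price per parent; $min does not need a preceding $sort.
        {"$group": {"_id": "$parent", "price": {"$min": "$price"}}},
        {"$match": {"_id": {"$nin": known}}},
        # Rename the group key to "name" and mark the document as top-level.
        {"$project": {"_id": 0, "name": "$_id", "price": 1,
                      "parent": {"$literal": ""}}},
        # $merge adds the results to the target; $out would replace it.
        {"$merge": {"into": target}},
    ]
```

It would be run as `db.file.aggregate(build_pipeline(product_names))`, with `product_names` being the set from step 1.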

For more on python - MongoDB query takes too long, see the similar question on Stack Overflow: https://stackoverflow.com/questions/48107682/
