MongoDB AggregationOutput takes a long time to respond

Tags: mongodb grails aggregation-framework

I have a collection named "logTransaction". I want to get the result shown in the attached image.

[Image: output]

logTransaction has many fields, but the ones used for this image are:

  • customer
  • environment
  • firstTime
  • lastTime
  • integrationIds[] (a transaction can have more than one integration)
  • transactionStatus (FINISHED, UNFINISHED, FAILED)

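I'm using AggregationOutput to get this result, but it takes more than 30 seconds, which I think is far too long for the amount of data I have. I'd like to know whether I can improve this by modifying what I already have, or whether I should change it completely. What kind of index should I use to speed this up?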

I'm using MongoDB with Grails. My current method looks like this:

def myCustomAggregation(integrations, timestamp_lt, timestamp_gt, cust, env) {
    def currentRequest = RequestContextHolder.requestAttributes

    def customer = cust ?: currentRequest?.session?.customer
    def environment = env ?: currentRequest?.session?.environment

    //$match
    DBObject matchMap = new BasicDBObject('integrationIds', new BasicDBObject('$in', integrations.collect { it?.baselineId }))
    matchMap.put("firstTimestamp", new BasicDBObject('$lte', timestamp_lt as Long).append('$gte', timestamp_gt as Long))
    matchMap.put("customer",customer)
    matchMap.put("environment",environment)
    DBObject match = new BasicDBObject('$match',matchMap);

    //$group1
    Map<String, Object> dbObjIdMap1 = new HashMap<String, Object>();
    dbObjIdMap1.put('integrationId', '$integrationIds');
    dbObjIdMap1.put('transactionStatus', '$transactionStatus');
    DBObject groupFields1 = new BasicDBObject( "_id", new BasicDBObject(dbObjIdMap1));
    groupFields1.put('total', new BasicDBObject( '$sum', 1));
    DBObject group1 = new BasicDBObject('$group', groupFields1);

    //$group2
    DBObject groupFields2 = new BasicDBObject( "_id", '$_id.integrationId');
    groupFields2.put('total_finished',
        new BasicDBObject('$sum', new BasicDBObject('$cond', [
            new BasicDBObject('$eq', ['$_id.transactionStatus', 'FINISHED']), '$total', 0
        ]))
    );
    groupFields2.put('total_unfinished',
        new BasicDBObject('$sum', new BasicDBObject('$cond', [
            new BasicDBObject('$eq', ['$_id.transactionStatus', 'UNFINISHED']), '$total', 0
        ]))
    );
    groupFields2.put('total_failed',
        new BasicDBObject('$sum', new BasicDBObject('$cond', [
            new BasicDBObject('$eq', ['$_id.transactionStatus', 'FAILED']), '$total', 0
        ]))
    );
    DBObject group2 = new BasicDBObject('$group', groupFields2);
    // This takes more than 30 seconds, which is too much for the amount of data I have in the database.
    AggregationOutput output = db.logTransaction.aggregate(match,group1,group2)
    return output.results()
}
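To make the two `$group` stages concrete, here is a plain-JavaScript model of what the pipeline computes, run on an in-memory array instead of MongoDB. The sample documents are hypothetical. One thing it makes visible: because `'$integrationIds'` is an array, the first `$group` keys on the whole array, so a transaction listed under two integrations lands in a group of its own rather than being counted once per integration. If per-integration totals are what the image shows (an assumption; the image is not available here), an `{ $unwind: "$integrationIds" }` stage before the first `$group` would split the array into one document per element, possibly followed by a second `$match` to drop IDs outside the requested list. MongoDB does not guarantee the order of `$group` output; the model simply keeps insertion order.

```javascript
// Plain-JavaScript model of the question's pipeline: $match, then two $group stages.
// Runs on an in-memory array (no MongoDB needed); field names follow the question.
function runPipeline(docs, { integrations, gte, lte, customer, environment }) {
  // $match: for an array field, $in matches when ANY element is in the list.
  const matched = docs.filter(d =>
    d.integrationIds.some(id => integrations.includes(id)) &&
    d.firstTimestamp >= gte && d.firstTimestamp <= lte &&
    d.customer === customer && d.environment === environment);

  // First $group: _id = { integrationId: '$integrationIds', transactionStatus }, total = count.
  // '$integrationIds' evaluates to the WHOLE array, so ["INT010","INT011"] is its own key.
  const group1 = new Map();
  for (const d of matched) {
    const key = JSON.stringify([d.integrationIds, d.transactionStatus]);
    group1.set(key, (group1.get(key) || 0) + 1);
  }

  // Second $group: pivot the per-status totals into one row per integrationIds value.
  const fieldFor = { FINISHED: "total_finished", UNFINISHED: "total_unfinished", FAILED: "total_failed" };
  const group2 = new Map();
  for (const [key, total] of group1) {
    const [integrationId, status] = JSON.parse(key);
    const idKey = JSON.stringify(integrationId);
    if (!group2.has(idKey)) {
      group2.set(idKey, { _id: integrationId, total_finished: 0, total_unfinished: 0, total_failed: 0 });
    }
    if (fieldFor[status]) group2.get(idKey)[fieldFor[status]] += total;
  }
  return [...group2.values()];
}

// Hypothetical sample documents (not taken from the question's data).
const sample = [
  { customer: "Awsome_Company", environment: "PROD", firstTimestamp: 1470002400000,
    integrationIds: ["INT010"], transactionStatus: "FINISHED" },
  { customer: "Awsome_Company", environment: "PROD", firstTimestamp: 1470002500000,
    integrationIds: ["INT010"], transactionStatus: "FAILED" },
  { customer: "Awsome_Company", environment: "PROD", firstTimestamp: 1470002600000,
    integrationIds: ["INT010", "INT011"], transactionStatus: "FINISHED" },
];

const rows = runPipeline(sample, {
  integrations: ["INT010", "INT011"], gte: 1470002400000, lte: 1476107324000,
  customer: "Awsome_Company", environment: "PROD",
});
console.log(JSON.stringify(rows));
// [{"_id":["INT010"],"total_finished":1,"total_unfinished":0,"total_failed":1},
//  {"_id":["INT010","INT011"],"total_finished":1,"total_unfinished":0,"total_failed":0}]
```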

Edit:

Following HoefMeistert's suggestion, I created a compound index:

db.logTransaction.createIndex({integrationIds: 1, firstTimestamp: -1, customer: 1, environment: 1})

But when I run explain on this aggregation:

db.logTransaction.explain().aggregate( [
    { $match: {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}, "firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}, "customer": "Awsome_Company", "environment": "PROD"}},
    { $group: { _id: {"integrationId": '$integrationIds', "transactionStatus": '$transactionStatus'}, total: {$sum: 1}}},
    { $group: { _id: "$_id.integrationId", "total_finished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FINISHED"]}, "$total", 0]}}, "total_unfinished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "UNFINISHED"]}, "$total", 0]}}, "total_failed": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FAILED"]}, "$total", 0]}}}}
]);
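Not from the original thread, but worth noting: the two `$group` stages can be collapsed into one, because counting 1 per document whose `transactionStatus` matches gives the same per-status totals as first counting per (integrationIds, status) pair and then summing. A minimal sketch that builds that single stage as a plain object; the helper name `singleGroupStage` is hypothetical, and whether it is measurably faster would have to be checked, since the `$match` stage is the more likely bottleneck.

```javascript
// Builds one $group stage producing the same per-status totals as the question's
// two $group stages: a single pass that counts 1 per document whose status matches.
function singleGroupStage(statuses) {
  const group = { _id: "$integrationIds" };
  for (const status of statuses) {
    // e.g. "FINISHED" -> total_finished: { $sum: { $cond: [ status matches, 1, 0 ] } }
    group["total_" + status.toLowerCase()] = {
      $sum: { $cond: [{ $eq: ["$transactionStatus", status] }, 1, 0] },
    };
  }
  return { $group: group };
}

const stage = singleGroupStage(["FINISHED", "UNFINISHED", "FAILED"]);
console.log(JSON.stringify(stage, null, 2));
// In the mongo shell this stage would follow the same $match stage, e.g.
// db.logTransaction.aggregate([{ $match: { ... } }, stage])
```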

I still get this winning plan every time:

"winningPlan" : {
                "stage" : "CACHED_PLAN",
                "inputStage" : {
                    "stage" : "FETCH",
                    "filter" : {
                        "$and" : [
                                {
                                    "environment" : {
                                            "$eq" : "PROD"
                                    }
                                },
                                {
                                    "integrationIds" : {
                                        "$in" : [
                                            "INT010",
                                            "INT011",
                                            "INT012A",
                                            "INT200"
                                        ]
                                    }
                                }
                        ]
                    },
                    "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "tenant" : 1,
                                "firstTimestamp" : -1
                            },
                            "indexName" : "customer_1_firstTimestamp_-1",
                            "isMultiKey" : false,
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                "customer" : [
                                    "[\"Awsome_Company\", \"Awsome_Company\"]"
                                ],
                                "firstTimestamp" : [
                                    "[1476107324000.0, 1470002400000.0]"
                                ]
                            }
                    }
                }
        },
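Reading the plan above: the root stage is `CACHED_PLAN` (the plan came from the query plan cache), the `IXSCAN` uses `customer_1_firstTimestamp_-1` rather than the new compound index, and the `FETCH` stage still filters on `environment` and `integrationIds`, so those conditions are checked on each fetched document instead of narrowing the index scan. When comparing plans, a small helper that lists the indexes an `explain()` tree scans can help; `indexesUsed` is a hypothetical name, not part of any MongoDB API, and it only follows the `inputStage`/`inputStages` nesting used in explain output.

```javascript
// Collects the indexName of every IXSCAN stage in an explain() plan tree.
// Plan nodes nest through "inputStage" (one child) or "inputStages" (several children).
function indexesUsed(plan) {
  if (!plan || typeof plan !== "object") return [];
  const own = plan.stage === "IXSCAN" && plan.indexName ? [plan.indexName] : [];
  const children = [].concat(plan.inputStage || [], plan.inputStages || []);
  return own.concat(...children.map(indexesUsed));
}

// Trimmed copy of the winningPlan shown above.
const winningPlan = {
  stage: "CACHED_PLAN",
  inputStage: {
    stage: "FETCH",
    inputStage: { stage: "IXSCAN", indexName: "customer_1_firstTimestamp_-1" },
  },
};
console.log(JSON.stringify(indexesUsed(winningPlan))); // ["customer_1_firstTimestamp_-1"]
```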

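It is faster than before, but when the time span is longer than one week I still get a SocketTimeoutException (after 3 minutes). These are the current indexes on the collection in the development environment: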

"customer_1_firstTimestamp_-1" : 56393728,
"firstTimestamp_-1_customer_1" : 144617472,
"integrationIds_1_firstTimestamp_-1" : 76644352,
"integrationId_1_firstTimestamp_-1" : 56107008,
"transactionId_1_firstTimestamp_-1" : 151429120,
"firstTimestamp_1" : 56102912,
"transactionId_1" : 109445120,
"integrationIds_1_firstTimestamp_-1_customer_1_environment_1" : 247790976
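A compound index can serve queries on any of its prefixes, so a shorter index whose keys are an exact prefix of a longer one is usually redundant for queries while still costing memory and write time. The sketch below checks the list above for such pairs. It assumes default index names (`field_direction_...`) and field names without underscores, compares directions exactly (so it is conservative), and ignores options such as unique, sparse or partial, which can make a "redundant" index necessary after all.

```javascript
// Splits a default index name such as "integrationIds_1_firstTimestamp_-1" into
// [field, direction] pairs. Assumes field names contain no underscores.
function parseIndexName(name) {
  const parts = name.split("_");
  const pairs = [];
  for (let i = 0; i + 1 < parts.length; i += 2) pairs.push([parts[i], parts[i + 1]]);
  return pairs;
}

// True when key pattern a is a strict prefix of key pattern b (same fields, same directions).
function isStrictPrefix(a, b) {
  return a.length < b.length && a.every(([f, d], i) => b[i][0] === f && b[i][1] === d);
}

// Lists [index, longer index it is a prefix of] pairs.
function redundantPrefixes(names) {
  const parsed = names.map(n => [n, parseIndexName(n)]);
  const result = [];
  for (const [name, keys] of parsed) {
    const cover = parsed.find(([, other]) => isStrictPrefix(keys, other));
    if (cover) result.push([name, cover[0]]);
  }
  return result;
}

const indexNames = [
  "customer_1_firstTimestamp_-1",
  "firstTimestamp_-1_customer_1",
  "integrationIds_1_firstTimestamp_-1",
  "integrationId_1_firstTimestamp_-1",
  "transactionId_1_firstTimestamp_-1",
  "firstTimestamp_1",
  "transactionId_1",
  "integrationIds_1_firstTimestamp_-1_customer_1_environment_1",
];
console.log(redundantPrefixes(indexNames));
```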

Best answer

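Which indexes do you currently have? Looking at your aggregation, make sure you have an index on the fields you match on:

  • integrationIds
  • firstTimestamp
  • customer
  • environment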
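How those fields are ordered inside a compound index also matters. One common heuristic, MongoDB's Equality, Sort, Range (ESR) guideline, is not mentioned in the answer, but it bears on the index created in the edit: `{integrationIds: 1, firstTimestamp: -1, customer: 1, environment: 1}` puts the range field `firstTimestamp` before the equality fields, so the scan can end up examining keys for all customers and environments inside the time window. The sketch below orders the matched fields as equality first, then `$in`, then range. Treating `$in` as sitting between equality and range is an assumption, the helper name `esrIndexKeys` is hypothetical, and any index it suggests would need to be checked with `explain()` on real data.

```javascript
// Orders $match fields for a compound index: plain equality, then $in, then range.
function esrIndexKeys(match) {
  const rangeOps = ["$gt", "$gte", "$lt", "$lte"];
  const equality = [], inList = [], range = [];
  for (const [field, cond] of Object.entries(match)) {
    const isOperatorObj = cond !== null && typeof cond === "object" && !Array.isArray(cond);
    if (isOperatorObj && Object.keys(cond).some(op => rangeOps.includes(op))) range.push(field);
    else if (isOperatorObj && "$in" in cond) inList.push(field);
    else equality.push(field); // plain values (and $eq-style subdocuments) count as equality
  }
  const keys = {};
  for (const field of [...equality, ...inList, ...range]) keys[field] = 1;
  return keys;
}

// The $match filter from the question's explain() call.
const keys = esrIndexKeys({
  integrationIds: { $in: ["INT010", "INT011", "INT012A", "INT200"] },
  firstTimestamp: { $lte: 1476107324000, $gte: 1470002400000 },
  customer: "Awsome_Company",
  environment: "PROD",
});
console.log(JSON.stringify(keys)); // {"customer":1,"environment":1,"integrationIds":1,"firstTimestamp":1}
// Candidate for the mongo shell: db.logTransaction.createIndex(keys)
```

The direction of `firstTimestamp` does not matter for this filter on its own, because no sort is involved; it only becomes relevant if the results are sorted on that field.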

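Indexes are no longer relevant after the first ($match) stage. As elixir asked: how does it perform in the shell/editor? Is it slow there too? If so, try to find the "slow" stage.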
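One way to find the slow stage is to run successively longer prefixes of the pipeline and time each one; the time a prefix adds over the previous one is roughly the cost of its last stage. A minimal sketch, assuming the mongo shell for the actual timing (shown as comments because it needs a live database); `pipelinePrefixes` is a hypothetical helper and the stages are shortened copies of the ones in the question.

```javascript
// Returns the successive prefixes of a pipeline: [s1], [s1, s2], [s1, s2, s3], ...
function pipelinePrefixes(pipeline) {
  return pipeline.map((_, i) => pipeline.slice(0, i + 1));
}

// Shortened versions of the question's stages (conditions trimmed for brevity).
const pipeline = [
  { $match: { customer: "Awsome_Company", environment: "PROD" } },
  { $group: { _id: { integrationId: "$integrationIds", transactionStatus: "$transactionStatus" }, total: { $sum: 1 } } },
  { $group: { _id: "$_id.integrationId" } },
];

// In the mongo shell (not executed here), each prefix could be timed like this:
// pipelinePrefixes(pipeline).forEach(function (p) {
//   var start = Date.now();
//   var n = db.logTransaction.aggregate(p).toArray().length;
//   print(p.length + " stage(s): " + n + " results in " + (Date.now() - start) + " ms");
// });
console.log(JSON.stringify(pipelinePrefixes(pipeline).map(p => p.length))); // [1,2,3]
```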

Update: you can also help the aggregation pipeline optimizer ;-) by rewriting the match as a single $and match, from:

{ $match: {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}, "firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}, "customer": "Awsome_Company", "environment": "PROD"}}

to:

    { $match: { $and : [
      {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}},
      {"firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}},
      {"customer": "Awsome_Company"},
      {"environment": "PROD"}]
    }}
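The rewrite is mechanical: each top-level field of the flat filter becomes one clause of the `$and` array, and both forms select the same documents. A minimal sketch of that transformation; `toAndMatch` is a hypothetical helper, and whether the `$and` form actually changes the chosen plan is the answer's suggestion, to be confirmed with `explain()`.

```javascript
// Rewrites a flat $match filter into the explicit { $and: [...] } form,
// one clause per top-level field.
function toAndMatch(filter) {
  return { $match: { $and: Object.entries(filter).map(([field, cond]) => ({ [field]: cond })) } };
}

// The flat filter from the answer.
const flat = {
  integrationIds: { $in: ["INT010", "INT011", "INT012A", "INT200"] },
  firstTimestamp: { $lte: 1476107324000, $gte: 1470002400000 },
  customer: "Awsome_Company",
  environment: "PROD",
};
console.log(JSON.stringify(toAndMatch(flat)));
```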

For "MongoDB AggregationOutput takes a long time to respond", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39916540/
