java - 如何加快聚合查询?

标签 java mongodb aggregation-framework mongo-java-driver

以下是聚合查询:

[
  {
    "$match": {
      "UserId": {
        "$in": [
          5
        ]
      },
      "WorkflowStartTime": {
        "$gte": ISODate('2015-04-09T00:00:00.000Z'),
        "$lte": ISODate('2015-04-16T00:00:00.000Z')
      }
    }
  },
  {
    "$group": {
      "_id": {
        "Task": "$TaskId",
        "WorkflowId": "$WorkflowInstanceId"
      },
      "TaskName": {
        "$first": "$Task"
      },
      "StartTime": {
        "$first": "$StartTime"
      },
      "EndTime": {
        "$last": "$EndTime"
      },
      "LastExecutionTime": {
        "$last": "$StartTime"
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "_id": 1,
      "LastExecutionTime": 1,
      "TaskName": 1,
      "AverageExecutionTime": {
        "$subtract": [
          "$EndTime",
          "$StartTime"
        ]
      },
      "WorkflowName": 1
    }
  },
  {
    "$group": {
      "_id": "$_id.Task",
      "LastExecutionTime": {
        "$last": "$LastExecutionTime"
      },
      "AverageExecutionTime": {
        "$avg": "$AverageExecutionTime"
      },
      "TaskName": {
        "$first": "$TaskName"
      },
      "TotalInstanceCount": {
        "$sum": 1
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "Id": "$_id",
      "_id": 0,
      "Name": "$TaskName",
      "LastExecutionDate": {
        "$substr": [
          "$LastExecutionTime",
          0,
          30
        ]
      },
      "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
      "TotalInstanceCount": "$TotalInstanceCount",
      "WorkflowName": 1
    }
  }
]

我的 Collection 文件如下:

{
        "_id" : ObjectId("550ff07ce4b09bf056df4ac1"),
        "OutputData" : "xyz",
        "InputData" : null,
        "Location" : null,
        "ChannelName" : "XYZ",
        "UserId" : 5,
        "TaskId" : 95,
        "ChannelId" : 5,
        "Status" : "Success",
        "TaskTypeId" : 7,
        "WorkflowId" : 37,
        "Task" : "XYZ",
        "WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"),
        "EndTime" : ISODate("2015-03-23T05:22:44Z"),
        "StartTime" : ISODate("2015-03-23T05:22:44Z"),
        "TaskType" : "TRIGGER",
        "WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce",
        "StackTrace" : null,
        "WorkflowName" : "XYZ data workflow"
}

我在 {WorkflowStartTime:1,UserId:1, StartTime:1}

上有一个索引

它们的集合几乎没有 900000 条记录,而且我正在使用数据子集,同时使用日期范围进行查询仍然需要大约 1.5 到 1.7 秒。我已经将聚合框架与其他具有大量数据的集合一起使用,并且性能非常好。不知道这个查询有什么问题,因为它显示的输出非常慢,我希望它作为实时分析查询出现在工厂中。 任何关于它的指针表示赞赏。

{explain : true } 添加到聚合查询时的输出

{
  "stages": [


       {
          "$cursor": {
            "query": {
              "UserId": {
                "$in": [
                  5
                ]
              },
              "WorkflowStartTime": {
                "$gte": "ISODate(2015-04-09T00:00:00Z)",
                "$lte": "ISODate(2015-04-16T00:00:00Z)"
              }
            },
            "fields": {
              "EndTime": 1,
              "StartTime": 1,
              "Task": 1,
              "TaskId": 1,
              "WorkflowInstanceId": 1,
              "WorkflowName": 1,
              "_id": 0
            },
            "plan": {
              "cursor": "BtreeCursor ",
              "isMultiKey": false,
              "scanAndOrder": false,
              "indexBounds": {
                "WorkflowStartTime": [
                  [
                    "ISODate(2015-04-16T00:00:00Z)",
                    "ISODate(2015-04-09T00:00:00Z)"
                  ]
                ],
                "UserId": [
                  [
                    5,
                    5
                  ]
                ]
              },
              "allPlans": [
                {
                  "cursor": "BtreeCursor ",
                  "isMultiKey": false,
                  "scanAndOrder": false,
                  "indexBounds": {
                    "WorkflowStartTime": [
                      [
                        "ISODate(2015-04-16T00:00:00Z)",
                        "ISODate(2015-04-09T00:00:00Z)"
                      ]
                    ],
                    "UserId": [
                      [
                        5,
                        5
                      ]
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "$group": {
            "_id": {
              "Task": "$TaskId",
              "WorkflowId": "$WorkflowInstanceId"
            },
            "TaskName": {
              "$first": "$Task"
            },
            "StartTime": {
              "$first": "$StartTime"
            },
            "EndTime": {
              "$last": "$EndTime"
            },
            "LastExecutionTime": {
              "$last": "$StartTime"
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": true,
            "LastExecutionTime": true,
            "TaskName": true,
            "AverageExecutionTime": {
              "$subtract": [
                "$EndTime",
                "$StartTime"
              ]
            },
            "WorkflowName": true
          }
        },
        {
          "$group": {
            "_id": "$_id.Task",
            "LastExecutionTime": {
              "$last": "$LastExecutionTime"
            },
            "AverageExecutionTime": {
              "$avg": "$AverageExecutionTime"
            },
            "TaskName": {
              "$first": "$TaskName"
            },
            "TotalInstanceCount": {
              "$sum": {
                "$const": 1
              }
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": false,
            "Id": "$_id",
            "Name": "$TaskName",
            "LastExecutionDate": {
              "$substr": [
                "$LastExecutionTime",
                {
                  "$const": 0
                },
                {
                  "$const": 30
                }
              ]
            },
            "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
            "TotalInstanceCount": "$TotalInstanceCount",
            "WorkflowName": true
          }
        }
      ],
      "ok": 1
    }

最佳答案

聚合不使用任何索引。 您需要创建一个新索引:

{UserId:1,WorkflowStartTime:1}

如果一切正常,聚合+解释必须出现这一行:

    "winningPlan" :...

关于java - 如何加快聚合查询?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29642620/

相关文章:

mongodb - 比较在 Mongodb 中存储为字符串的整数

node.js - 无法使用 $project 获取特定的数据库值

java 类和数组列表

java - 在 Java 中测试 NaN

java - 在 REST API 上对 SSL 进行单元测试

node.js - 函数返回对象作为promise {pending}

node.js - 如何仅为提供的键更新 Mongoose 文档的嵌套对象

performance - Mongodb聚合框架比map/reduce更快吗?

java - 如何使用正则表达式和 Java 计算文本中的音节

database - MongoDB 指南针未初始化