mongodb - 将聚合操作合并到单个结果中

标签 mongodb mongoose mongodb-query aggregation-framework

我有两个想要合并的聚合操作。第一个操作返回,例如:

{ "_id" : "Colors", "count" : 12 }
{ "_id" : "Animals", "count" : 6 }

第二个操作返回,例如:

{ "_id" : "Red", "count" : 10 }
{ "_id" : "Blue", "count" : 9 }
{ "_id" : "Green", "count" : 9 }
{ "_id" : "White", "count" : 7 }
{ "_id" : "Yellow", "count" : 7 }
{ "_id" : "Orange", "count" : 7 }
{ "_id" : "Black", "count" : 5 }
{ "_id" : "Goose", "count" : 4 }
{ "_id" : "Chicken", "count" : 3 }
{ "_id" : "Grey", "count" : 3 }
{ "_id" : "Cat", "count" : 3 }
{ "_id" : "Rabbit", "count" : 3 }
{ "_id" : "Duck", "count" : 3 }
{ "_id" : "Turkey", "count" : 2 }
{ "_id" : "Elephant", "count" : 2 }
{ "_id" : "Shark", "count" : 2 }
{ "_id" : "Fish", "count" : 2 }
{ "_id" : "Tiger", "count" : 2 }
{ "_id" : "Purple", "count" : 1 }
{ "_id" : "Pink", "count" : 1 }

如何结合这两个操作来实现以下目的?

{ "_id" : "Colors", "count" : 12, "items" :
    [
        { "_id" : "Red", "count" : 10 },
        { "_id" : "Blue", "count" : 9 },
        { "_id" : "Green", "count" : 9 },
        { "_id" : "White", "count" : 7 },
        { "_id" : "Yellow", "count" : 7 },
        { "_id" : "Orange", "count" : 7 },
        { "_id" : "Black", "count" : 5 },
        { "_id" : "Grey", "count" : 3 },
        { "_id" : "Purple", "count" : 1 },
        { "_id" : "Pink", "count" : 1 }
    ]
},
{ "_id" : "Animals", "count" : 6, "items" :
    [
        { "_id" : "Goose", "count" : 4 },
        { "_id" : "Chicken", "count" : 3 },
        { "_id" : "Cat", "count" : 3 },
        { "_id" : "Rabbit", "count" : 3 },
        { "_id" : "Duck", "count" : 3 },
        { "_id" : "Turkey", "count" : 2 },
        { "_id" : "Elephant", "count" : 2 },
        { "_id" : "Shark", "count" : 2 },
        { "_id" : "Fish", "count" : 2 },
        { "_id" : "Tiger", "count" : 2 }
    ]
}

架构

var ListSchema = new Schema({
    created: {
        type: Date,
        default: Date.now
    },
    title: {
        type: String,
        default: '',
        trim: true,
        required: 'Title cannot be blank'
    },
    items: {
        type: Array,
        default: [String],
        trim: true
    },
    creator: {
        type: Schema.ObjectId,
        ref: 'User'
    }
});

操作 1

db.lists.aggregate(
      [
        { $group: { _id: "$title", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ]
    )

操作2

db.lists.aggregate(
      [
        { $unwind: "$items" },
        { $group: { _id: "$items", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
      ]
    )

最佳答案

这实际上取决于您在响应中追求的结果类型。您所询问的事情似乎表明您正在寻找结果中的“方面计数”,但我稍后会谈到这一点。

作为基本结果,这种方法没有任何问题:

    Thing.aggregate(
      [
        { "$group": {
          "_id": {
            "type": "$type", "name": "$name"
          },
          "count": { "$sum": 1 }
        }},
        { "$group": {
          "_id": "$_id.type",
          "count": { "$sum": "$count" },
          "names": {
            "$push": { "name": "$_id.name", "count": "$count" }
          }
        }}
      ],
      function(err,results) {
        console.log(JSON.stringify(results, undefined, 2));
        callback(err);
      }
    )

这应该会给你这样的结果:

[
  {
    "_id": "colours",
    "count": 50102,
    "names": [
      { "name": "Green",  "count": 9906  },
      { "name": "Yellow", "count": 10093 },
      { "name": "Red",    "count": 10083 },
      { "name": "Orange", "count": 9997  },
      { "name": "Blue",   "count": 10023 }
    ]
  },
  {
    "_id": "animals",
    "count": 49898,
    "names": [
      { "name": "Tiger",    "count": 9710  },
      { "name": "Lion",     "count": 10058 },
      { "name": "Elephant", "count": 10069 },
      { "name": "Monkey",   "count": 9963  },
      { "name": "Bear",     "count": 10098 }
    ]
  }
]

这里最基本的方法就是简单地 $group分两个阶段,第一阶段将按键组合聚合到最低(最细粒度)的分组级别,然后再次处理 $group 以基本上“累加”最高级别的总数(最细粒度)分组级别,也因此将较低的结果添加到项目数组中。

但这并不像“方面计数”那样“分离”,因此这样做会变得更复杂一些,也更疯狂一些。但首先是例子:

    Thing.aggregate(
      [
        { "$group": {
          "_id": {
            "type": "$type",
            "name": "$name"
          },
          "count": { "$sum": 1 }
        }},
        { "$group": {
          "_id": "$_id.type",
          "count": { "$sum": "$count" },
          "names": {
            "$push": { "name": "$_id.name", "count": "$count" }
          }
        }},
        { "$group": {
          "_id": null,
          "types": {
            "$push": {
              "type": "$_id", "count": "$count"
            }
          },
          "names": { "$push": "$names" }
        }},
        { "$unwind": "$names" },
        { "$unwind": "$names" },
        { "$group": {
          "_id": "$types",
          "names": { "$push": "$names" }
        }},
        { "$project": {
          "_id": 0,
          "facets": {
            "types": "$_id",
            "names": "$names",
          },
          "data": { "$literal": [] }
        }}
      ],
      function(err,results) {
        console.log(JSON.stringify(results[0], undefined, 2));
        callback(err);
      }
    );

这将产生如下输出:

{
  "facets": {
    "types": [
      { "type": "colours", "count": 50102 },
      { "type": "animals", "count": 49898 }
    ],
    "names": [
      { "name": "Green",    "count": 9906  },
      { "name": "Yellow",   "count": 10093 },
      { "name": "Red",      "count": 10083 },
      { "name": "Orange",   "count": 9997  },
      { "name": "Blue",     "count": 10023 },
      { "name": "Tiger",    "count": 9710  },
      { "name": "Lion",     "count": 10058 },
      { "name": "Elephant", "count": 10069 },
      { "name": "Monkey",   "count": 9963  },
      { "name": "Bear",     "count": 10098 }
    ]
  },
  "data": []
}

应该显而易见的是,虽然“可能”,但在管道中进行的那种“杂耍”产生这种输出格式的效率并不高。与第一个示例相比,这里有很多开销,只是简单地将结果拆分为它们自己的数组响应并且独立于分组键。随着生成的“方面”越多,这显然会变得更加复杂。

正如输出中所暗示的,人们通常要求的“方面计数”是,除了聚合方面之外,结果“数据”也包含在响应中(可能是分页的)。因此,进一步的复杂性应该在这里显而易见:

        { "$group": {
          "_id": null,
          (...)

这种类型的操作的要求基本上是将每条数据“填充”到单个对象中。在大多数情况下,当然,当您想要结果中的实际数据时(在本示例中使用 100,000 个),遵循这种方法变得完全不切实际,并且几乎肯定会超过 BSON 文档大小 16MB 的限制。

在这种情况下,如果您希望在响应中生成结果和该数据的“方面”,那么这里最好的方法是将每个聚合和输出页面作为单独的查询操作运行,并“流式传输”输出JSON(或其他格式)返回到接收客户端。

作为一个独立的示例:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;


mongoose.connect('mongodb://localhost/things');

var data = {
      "colours": [
        "Red","Blue","Green","Yellow","Orange"
      ],
      "animals": [
        "Lion","Tiger","Bear","Elephant","Monkey"
      ]
    },
    dataKeys = Object.keys(data);

var thingSchema = new Schema({
  "name": String,
  "type": String
});

var Thing = mongoose.model( 'Thing', thingSchema );

var writer = process.stdout;

mongoose.connection.on("open",function(err) {
  if (err) throw err;
  async.series(
    [
      function(callback) {
        process.stderr.write("removing\n");
        Thing.remove({},callback);
      },
      function(callback) {
        process.stderr.write("inserting\n");
        var bulk = Thing.collection.initializeUnorderedBulkOp(),
            count = 0;

        async.whilst(
          function() { return count < 100000; },
          function(callback) {
            var keyLen    = dataKeys.length,
                keyIndex  = Math.floor(Math.random(keyLen)*keyLen),
                type      = dataKeys[keyIndex],
                types     = data[type],
                typeLen   = types.length,
                nameIndex = Math.floor(Math.random(typeLen)*typeLen),
                name      = types[nameIndex];

            var obj = { "type": type, "name": name };
            bulk.insert(obj);
            count++;

            if ( count % 1000 == 0 ) {
              process.stderr.write('insert count: ' + count + "\n");
              bulk.execute(function(err,resp) {
                bulk = Thing.collection.initializeUnorderedBulkOp();
                callback(err);
              });
            } else {
              callback();
            }

          },
          callback
        );
      },

      function(callback) {
        writer.write("{ \n  \"page\": 1,\n  \"pageSize\": 25,\n")
        writer.write("  \"facets\":  {\n");      // open object response

        var stream = Thing.collection.aggregate(
          [
            { "$group": {
              "_id": "$name",
              "count": { "$sum": 1 }
            }}
          ],
          {
            "cursor": {
              "batchSize": 1000
            }
          }
        );

        var counter = 0;

        stream.on("data",function(data) {
          stream.pause();

          if ( counter == 0 ) {
            writer.write("    \"names\": [\n");
          } else {
            writer.write(",\n");
          }

          data = { "name": data._id, "count": data.count };

          writer.write("      " + JSON.stringify(data));

          counter++;
          stream.resume();
        });

        stream.on("end",function() {
          writer.write("\n    ],\n");

          var stream = Thing.collection.aggregate(
            [
              { "$group": {
                "_id": "$type",
                "count": { "$sum": 1 }
              }}
            ],
            {
              "cursor": {
                "batchSize": 1000
              }
            }
          );

          var counter = 0;
          stream.on("data",function(data) {
            stream.pause();

            if ( counter == 0 ) {
              writer.write("    \"types\": [\n");
            } else {
              writer.write(",\n");
            }

            data = { "name": data._id, "count": data.count };

            writer.write("      " + JSON.stringify(data));

            counter++;
            stream.resume();
          });

          stream.on("end",function() {
            writer.write("\n    ]\n  },\n");

            var stream = Thing.find({}).limit(25).stream();
            var counter = 0;

            stream.on("data",function(data) {
              stream.pause();
              if ( counter == 0 ) {
                writer.write("  \"data\": [\n");
              } else {
                writer.write(",\n");
              }

              writer.write("    " + JSON.stringify(data));

              counter++;
              stream.resume();

            });

            stream.on("end",function() {
                writer.write("\n  ]\n}\n");
                callback();
            });

          });

        });
      }
    ],
    function(err) {
      if (err) throw err;
      process.exit();
    }
  );
});

输出如下:

{
  "page": 1,
  "pageSize": 25,
  "facets":  {
    "names": [
      {"name":"Red","count":10007},
      {"name":"Tiger","count":10012},
      {"name":"Yellow","count":10119},
      {"name":"Monkey","count":9970},
      {"name":"Elephant","count":10046},
      {"name":"Bear","count":10082},
      {"name":"Orange","count":9982},
      {"name":"Green","count":10005},
      {"name":"Blue","count":9884},
      {"name":"Lion","count":9893}
    ],
    "types": [
      {"name":"colours","count":49997},
      {"name":"animals","count":50003}
    ]
  },
  "data": [
    {"_id":"55bf141f3edc150b6abdcc02","type":"animals","name":"Lion"},
    {"_id":"55bf141f3edc150b6abdc81b","type":"colours","name":"Blue"},
    {"_id":"55bf141f3edc150b6abdc81c","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc81d","type":"animals","name":"Bear"},
    {"_id":"55bf141f3edc150b6abdc81e","type":"animals","name":"Elephant"},
    {"_id":"55bf141f3edc150b6abdc81f","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc820","type":"colours","name":"Green"},
    {"_id":"55bf141f3edc150b6abdc821","type":"animals","name":"Lion"},
    {"_id":"55bf141f3edc150b6abdc822","type":"animals","name":"Monkey"},
    {"_id":"55bf141f3edc150b6abdc823","type":"colours","name":"Yellow"},
    {"_id":"55bf141f3edc150b6abdc824","type":"colours","name":"Yellow"},
    {"_id":"55bf141f3edc150b6abdc825","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc826","type":"animals","name":"Monkey"},
    {"_id":"55bf141f3edc150b6abdc827","type":"colours","name":"Blue"},
    {"_id":"55bf141f3edc150b6abdc828","type":"animals","name":"Tiger"},
    {"_id":"55bf141f3edc150b6abdc829","type":"colours","name":"Red"},
    {"_id":"55bf141f3edc150b6abdc82a","type":"animals","name":"Monkey"},
    {"_id":"55bf141f3edc150b6abdc82b","type":"animals","name":"Elephant"},
    {"_id":"55bf141f3edc150b6abdc82c","type":"animals","name":"Tiger"},
    {"_id":"55bf141f3edc150b6abdc82d","type":"animals","name":"Bear"},
    {"_id":"55bf141f3edc150b6abdc82e","type":"colours","name":"Yellow"},
    {"_id":"55bf141f3edc150b6abdc82f","type":"animals","name":"Lion"},
    {"_id":"55bf141f3edc150b6abdc830","type":"animals","name":"Elephant"},
    {"_id":"55bf141f3edc150b6abdc831","type":"colours","name":"Orange"},
    {"_id":"55bf141f3edc150b6abdc832","type":"animals","name":"Elephant"}
  ]
}

这里有一些考虑因素,特别是 Mongoose .aggregate()并不真正直接支持标准节点流接口(interface)。 .cursor() 提供了一个 .each() 方法。在聚合方法上,但是 core API method 隐含的“流”这里提供了更多的控制,因此这里的 .collection 方法可以获取底层 driver object是优选的。希望 future 的 Mongoose 版本能够考虑到这一点。

因此,如果您的最终目标是与此处演示的结果一起进行“方面计数”,那么每个聚合和结果都最适合以所示方式“流式传输”。如果没有这个,聚合就会变得过于复杂,并且很可能超出 BSON 限制,就像在这种情况下这样做一样。

关于mongodb - 将聚合操作合并到单个结果中,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31777590/

相关文章:

MongoDB $or/$and 命令优化

node.js - 如何从 GridFS 中的 block 和文件中删除文件?

java - Netty 和 MongoDB 异步回调不能一起工作

javascript - 如何使用express、mongoose 和nodeunit 测试异步函数?

mongodb - 填充 $lookup 中的特定字段

mongodb - Mongodb - 查询嵌套数组和对象

c++ - 使用 Mongodb C++ API 将记录插入文档

mongodb - Mongoose 限制/偏移和计数查询

mongodb - MeteorJS : Autoform + CollectionFS, 将来自 FS.Collection 的图像与相应的 Mongo.Collection 文档相关联?

mongodb - 从数组中获取匹配的嵌入文档