node.js - MongoDB aggregation - $lookup performance

I am using MongoDB 3.6 aggregation with $lookup to join two collections (users and suscriptionusers).

var UserSchema = mongoose.Schema({
  email:{
    type: String,
    trim: true,
    unique: true,
  },
  name: {
    type: String,
    required: true,
    trim: true,
  },
  password: String,
  gender: { type: String, enum: ['male', 'female', 'unknown'], default: 'unknown'},
  age_range: { type: String, enum: [12, 16, 18], default: 18},
  country: {type:String, default:'co'}
});

var SuscriptionUsersSchema = mongoose.Schema({
  user_id: {
    ref: 'Users',
    type: mongoose.Schema.ObjectId
  },
  channel_id: {
    ref: 'Channels',
    type: mongoose.Schema.ObjectId
  },
  subscribed: {type: Boolean, default:false},
  unsubscribed_at: Date,
  subscribed_at: Date
});

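My goal is to query suscriptionusers and join it with the users collection, matching on a start date and an end date, to get some analytics about subscriptions, such as the country, age range, and gender of the subscribed users, and to display the data in a line chart. This is how I am doing it: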

db.getCollection('suscriptionusers').aggregate([
{$match: {
    'channel_id': ObjectId('......'),
    'subscribed_at': {
            $gte: new Date('2018-01-01'),
            $lte: new Date('2019-01-01'),
    },
    'subscribed': true
}},     
{
    $lookup:{
        from: "users",      
        localField: "user_id", 
        foreignField: "_id",
        as: "users"        
    }
},
/*  Using this form instead of the one above makes the process even slower :(
 {$lookup:
 {
   from: "users",
   let: { user_id: "$user_id" },
   pipeline: [
      { $match:
          { $expr:
             {$eq: [ "$_id",  "$$user_id" ]}
          }
      },
      { $project: { age_range:1, country: 1, gender:1 } }
   ],
   as: "users"
 }
},*/
{$unwind: {
    path: "$users",
    preserveNullAndEmptyArrays: false
}},
{$project: {
    'users.age_range': 1, 
    'users.country': 1, 
    'users.gender': 1, 
    '_id': 1, 
    'subscribed_at': { $dateToString: { format: "%Y-%m", date: "$subscribed_at" } },
    'unsubscribed_at': { $dateToString: { format: "%Y-%m", date: "$unsubscribed_at" } }
}},
])

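My main concern is performance. For example, with about 150,000 subscribers the query takes roughly 7 to 8 seconds to retrieve the information, and I am worried about what will happen with millions of subscribers: even if I limit the records (for example, retrieving only the data between two months), there could still be hundreds of subscribers in that period.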

I have already tried creating an index on the user_id field of the suscriptionusers collection, but there was no improvement.

db.getCollection('suscriptionusers').ensureIndex({user_id: 1});
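
A possible reason the index made no difference: the $lookup above matches user_id against users._id, which MongoDB indexes by default, so an index on suscriptionusers.user_id is not what that stage relies on. The initial $match filters on channel_id, subscribed, and subscribed_at instead. Below is a minimal sketch of a compound index covering that filter. The name matchIndexSpec is made up for illustration, and whether the index actually helps on this data set is an assumption that should be checked with the explain output.

```javascript
// Hypothetical index spec: equality fields first, then the range field.
// Key order matters in a compound index, and JavaScript preserves
// insertion order for these string keys.
const matchIndexSpec = {
  channel_id: 1,    // equality match in $match
  subscribed: 1,    // equality match in $match
  subscribed_at: 1  // range match ($gte / $lte) in $match
};

// In the mongo shell this would be applied with:
//   db.getCollection('suscriptionusers').createIndex(matchIndexSpec);
// and the plan inspected with:
//   db.getCollection('suscriptionusers').aggregate([...], { explain: true });
```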

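My question is: should I also store these fields (country, age range, and gender) in the suscriptionusers collection? If I run the query without the lookup on the users collection, it is fast enough.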

Or is there a better way to improve performance with my current schema?

Thanks a lot :)

Edit: Keep in mind that a user can subscribe to multiple channels, which is why the subscription is not stored in the users collection.

Best answer

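Well, it may not be the best approach, but I simply included the fields I need from UserSchema in SuscriptionUsersSchema. For analytics purposes this is noticeably faster. I also realized that analytics records should stay as they were at that moment, to preserve the data as it was generated at the time. Stored this way, the data remains unchanged even if the user later changes her/his information or deletes the account. If you have any suggestions, feel free to share them :)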
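
The snapshot idea described here can be sketched as a small helper that copies the analytics fields from the user at the moment the subscription is created. buildSubscriptionDoc and its parameters are hypothetical names, not part of the original code. The plain object it returns is what would be handed to the model built from SuscriptionUsersSchema.

```javascript
// Hypothetical helper: build a subscription document that carries a
// snapshot of the user's analytics fields at subscription time.
function buildSubscriptionDoc(user, channelId, now = new Date()) {
  return {
    user_id: user._id,
    channel_id: channelId,
    subscribed: true,
    subscribed_at: now,
    // Copied values: later edits to the user do not change them.
    gender: user.gender,
    age_range: user.age_range,
    country: user.country
  };
}

// Usage with plain objects (no database needed to try it out):
const user = { _id: 'u1', gender: 'female', age_range: '18', country: 'co' };
const doc = buildSubscriptionDoc(user, 'c1', new Date('2018-06-15'));
user.country = 'mx'; // a later profile change
console.log(doc.country); // still the value captured at subscription time
```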

FYI, my SuscriptionUsersSchema now looks like:

var SuscriptionUsersSchema = mongoose.Schema({
  user_id: {
    ref: 'Users',
    type: mongoose.Schema.ObjectId
  },
  channel_id: {
    ref: 'Channels',
    type: mongoose.Schema.ObjectId
  },
  subscribed: {type: Boolean, default:false},
  gender: { type: String, enum: ['male', 'female', 'unknown'], default: 'unknown'},
  age_range: { type: String, enum: [12, 16, 18], default: 18},
  country: {type:String, default:'co'},
  unsubscribed_at: Date,
  subscribed_at: Date
});
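
With those fields stored on the subscription itself, the aggregation from the question no longer needs the $lookup and $unwind stages. Here is a minimal sketch, assuming the schema above. buildAnalyticsPipeline is a hypothetical helper name, and channelId is expected to be an ObjectId supplied by the caller.

```javascript
// Hypothetical helper: same filter and projection as the original
// pipeline, but reading the user fields directly from the subscription.
function buildAnalyticsPipeline(channelId, from, to) {
  return [
    { $match: {
        channel_id: channelId,
        subscribed_at: { $gte: from, $lte: to },
        subscribed: true
    } },
    { $project: {
        age_range: 1,
        country: 1,
        gender: 1,
        subscribed_at: { $dateToString: { format: '%Y-%m', date: '$subscribed_at' } },
        unsubscribed_at: { $dateToString: { format: '%Y-%m', date: '$unsubscribed_at' } }
    } }
  ];
}

// In the mongo shell the result would be passed to:
//   db.getCollection('suscriptionusers').aggregate(pipeline);
const pipeline = buildAnalyticsPipeline(
  'channel-id-placeholder', // stands in for a real ObjectId
  new Date('2018-01-01'),
  new Date('2019-01-01')
);
console.log(pipeline.length);
```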

This question, node.js - MongoDB aggregation - $lookup performance, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/52886458/
