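I'm trying to fully sort a collection with millions of rows by a single field. As far as I know, an ObjectId contains a 4-byte timestamp, and my timestamp is a 4-byte integer indexed field. So I assumed sorting by _id and sorting by timestamp would perform similarly, but here are the results: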
db.coll.find().sort("_id", pymongo.ASCENDING)
# takes 25 minutes to run
and
db.coll.find().sort("timestamp", pymongo.ASCENDING)
# takes 2 hours to run
Why is this, and is there a way to optimize it? Thanks.
Update
The timestamp field I'm trying to sort on is already indexed, as I mentioned.
Collection stats:
"size" : 55881082188,
"count" : 126048972,
"avgObjSize" : 443,
"storageSize" : 16998031360,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 2439606272,
I give the mongodb process 4 GB of RAM (I tried increasing it to 8 GB, but it didn't get any faster).
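A quick back-of-the-envelope check of the stats above, as a sketch: the byte counts are copied from the collection stats, the 4 GB figure from the sentence above (treated as GiB for simplicity), and the interpretation that the indexes fit in RAM while the stored data does not is an inference, not something the stats state directly.

```python
GIB = 1024 ** 3

# Byte counts copied from the collection stats above.
stats = {
    "size": 55_881_082_188,           # total size of the documents
    "storageSize": 16_998_031_360,    # space allocated on disk for the data
    "totalIndexSize": 2_439_606_272,  # both indexes (_id and timestamp)
}
ram_bytes = 4 * GIB  # RAM given to the mongod process (assumed GiB)


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


for name, value in stats.items():
    print(f"{name:>15}: {to_gib(value):6.2f} GiB ({value / ram_bytes:4.1f}x RAM)")
```

Under these assumptions the indexes (about 2.3 GiB) fit in memory, but the stored data (about 15.8 GiB) is roughly four times the RAM and still about twice 8 GB, which is consistent with the observation that doubling memory did not help.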
Update 2
It turns out that sorting is faster when the sort field's order more or less follows the insertion (natural) order.
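A toy model may help explain why insertion order matters. This sketch is a deliberate simplification, not how MongoDB's storage engine actually works: it assumes documents sit on fixed-size pages in insertion order, an LRU cache holds a limited number of pages, and every cache miss costs one disk read. Walking an index whose order matches storage order (as _id roughly does) touches each page once; walking an index whose order is unrelated to storage order misses the cache almost every time. `count_page_reads` and its parameters are hypothetical.

```python
import random
from collections import OrderedDict
from typing import Iterable


def count_page_reads(fetch_order: Iterable[int], docs_per_page: int, cache_pages: int) -> int:
    """Count cache misses when documents are fetched in `fetch_order`.

    Toy model (an assumption): document i lives on page i // docs_per_page,
    pages are cached with LRU eviction, and each miss is one disk read.
    """
    cache: "OrderedDict[int, None]" = OrderedDict()
    reads = 0
    for doc_id in fetch_order:
        page = doc_id // docs_per_page
        if page in cache:
            cache.move_to_end(page)  # mark page as most recently used
            continue
        reads += 1
        cache[page] = None
        if len(cache) > cache_pages:
            cache.popitem(last=False)  # evict the least recently used page
    return reads


if __name__ == "__main__":
    n_docs, per_page, cache_size = 100_000, 100, 50
    sequential = range(n_docs)          # index order == storage order (like _id)
    shuffled = list(range(n_docs))
    random.Random(0).shuffle(shuffled)  # index order unrelated to storage order
    print("sequential:", count_page_reads(sequential, per_page, cache_size))
    print("shuffled:  ", count_page_reads(shuffled, per_page, cache_size))
```

With these made-up sizes the sequential walk costs exactly one read per page (1,000 reads), while the shuffled walk costs close to one read per document.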
I tried:
db.new_coll.create_index([("timestamp", pymongo.ASCENDING)])
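for el in db.coll.find().sort("timestamp", pymongo.ASCENDING):
    # drop the old _id so the driver generates a new ObjectId on insert
    del el['_id']
    db.new_coll.insert_one(el)  # insert() is deprecated in PyMongo 3 and removed in PyMongo 4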
# and now
db.new_coll.find().sort("timestamp", pymongo.ASCENDING)
# takes 25 minutes, vs. 2 hours in the previous example
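A variation on the copy loop above, as a sketch: it batches writes with `insert_many` (available in PyMongo 3 and later) instead of issuing one insert per document, which should reduce round trips. The function name `copy_in_timestamp_order` and the `batch_size` default are illustrative, and `src` and `dst` are assumed to be PyMongo `Collection` objects. Whether the new collection's storage order really follows insertion order depends on the storage engine; the update above observed that it roughly does, but that is not a documented guarantee.

```python
from typing import List

ASCENDING = 1  # same value as pymongo.ASCENDING


def copy_in_timestamp_order(src, dst, batch_size: int = 1000) -> int:
    """Copy every document from src into dst in ascending timestamp order.

    src and dst are assumed to expose find().sort() and insert_many(),
    as PyMongo Collection objects do. Returns the number of documents copied.
    """
    batch: List[dict] = []
    copied = 0
    for doc in src.find().sort("timestamp", ASCENDING):
        doc.pop("_id", None)  # let the driver generate a new ObjectId on insert
        batch.append(doc)
        if len(batch) >= batch_size:
            dst.insert_many(batch, ordered=True)  # ordered: insert in list order
            copied += len(batch)
            batch = []  # new list, since insert_many adds _id to the old docs
    if batch:
        dst.insert_many(batch, ordered=True)
        copied += len(batch)
    return copied
```

Usage would look like `copy_in_timestamp_order(db.coll, db.new_coll)`, after creating the timestamp index on `new_coll` as above.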
Best answer
Sorting by _id is faster because of the way _id field values are generated.
Text from the documentation:
One of the main reasons ObjectIds are generated in the fashion mentioned above by the drivers is that it contains a useful behavior due to the way sorting works. Given that it contains a 4-byte timestamp (resolution of seconds) and an incrementing counter, as well as some more unique identifiers such as the machine id, one can use the _id field to sort documents in the order of creation just by simply sorting on the _id field. This can be useful to save the space needed by an additional timestamp if you wish to track the time of creation of a document.
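To make the quoted behavior concrete, here is a standard-library sketch that builds 12-byte values laid out like an ObjectId: a 4-byte big-endian seconds timestamp first, then 5 process-specific bytes (current drivers put a random value there; the older layout described in the quote used machine and process ids), then a 3-byte counter. Because the timestamp comes first and is big-endian, comparing the 12 bytes in order sorts the values by creation second. The helper names are hypothetical; with PyMongo installed, `bson.ObjectId(...).generation_time` returns the same timestamp as a `datetime`.

```python
import os
import random
import struct


def make_objectid_like(ts: int, counter: int, process_bytes: bytes) -> bytes:
    """Hypothetical helper: build a 12-byte value with the ObjectId layout."""
    if len(process_bytes) != 5:
        raise ValueError("process_bytes must be exactly 5 bytes")
    return struct.pack(">I", ts) + process_bytes + (counter & 0xFFFFFF).to_bytes(3, "big")


def timestamp_of(oid: bytes) -> int:
    """Seconds since the Unix epoch, read from the first 4 bytes (big-endian)."""
    return struct.unpack(">I", oid[:4])[0]


if __name__ == "__main__":
    process_bytes = os.urandom(5)  # fixed for the lifetime of one "process"
    timestamps = [1_600_000_000 + s for s in range(10)]
    random.shuffle(timestamps)     # values "created" out of order
    ids = [make_objectid_like(ts, i, process_bytes) for i, ts in enumerate(timestamps)]
    # Sorting the raw bytes orders the values by their embedded timestamp.
    print([timestamp_of(oid) for oid in sorted(ids)])
    # The same timestamp can be read from the first 8 hex digits.
    print(ids[0].hex()[:8], "->", int(ids[0].hex()[:8], 16))
```

Within a single second, ordering falls back to the process bytes and counter, so it only reflects creation order for ids generated by the same process.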
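I also ran explain on the query and noticed that when the sort is done on _id, nscannedObjects and nscannedObjectsAllPlans are 0. That is because this query projects only _id, so it is answered entirely from the _id index (indexOnly: true) without reading any documents.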
> db.coll.find({},{_id:1}).sort({_id:1}).explain();
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 353,
"nscannedObjects" : 0,
"nscanned" : 353,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 353,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 2,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "server",
"filterSet" : false
}
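The fields discussed above can also be read programmatically. A sketch, assuming the legacy (MongoDB 2.x) explain format shown above; from MongoDB 3.0 on, explain output is organized under `queryPlanner` and `executionStats` instead, so these keys will not be present. In PyMongo the explain document comes from `Cursor.explain()`, e.g. `db.coll.find({}, {"_id": 1}).sort("_id", 1).explain()`. `summarize_legacy_explain` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_legacy_explain(plan: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a legacy (MongoDB 2.x-style) explain document."""
    docs_fetched = plan.get("nscannedObjects", 0)
    return {
        "index_used": plan.get("cursor"),           # e.g. "BtreeCursor _id_"
        "keys_scanned": plan.get("nscanned", 0),
        "docs_fetched": docs_fetched,               # 0 means no document was read
        "covered": bool(plan.get("indexOnly")) and docs_fetched == 0,
        # True means results were sorted in memory rather than read in index order
        "in_memory_sort": bool(plan.get("scanAndOrder")),
    }


example = {  # subset of the explain output above
    "cursor": "BtreeCursor _id_",
    "n": 353,
    "nscannedObjects": 0,
    "nscanned": 353,
    "scanAndOrder": False,
    "indexOnly": True,
}
print(summarize_legacy_explain(example))
```

The query in the question returns whole documents, so it cannot be index-only in this way: every returned document still has to be read from the collection, in whatever order the sort index dictates.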
Regarding "mongodb - Why does MongoDB sort by _id so much faster than by any other indexed field?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49760024/