mongodb - Why is sorting by _id in MongoDB so much faster than sorting by any other indexed field?

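I am trying to do a full sort, by a single field, of a collection containing millions of documents. As far as I know, an ObjectId contains a 4-byte timestamp. My timestamp is an indexed 4-byte integer field. So I assumed that sorting by _id and sorting by timestamp would perform similarly, but here are the results: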

db.coll.find().sort("_id", pymongo.ASCENDING)
# takes 25 minutes to run

and

db.coll.find().sort("timestamp", pymongo.ASCENDING)
# takes 2 hours to run

Why is this the case, and is there a way to optimize it? Thanks.

UPDATE

As I pointed out, the timestamp field I am trying to sort on is already indexed.

Collection stats

"size" : 55881082188,
"count" : 126048972,
"avgObjSize" : 443,
"storageSize" : 16998031360,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 2439606272,

I dedicated 4 GB of RAM to the mongodb process (I tried increasing it to 8 GB, but the speed did not improve).
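As a rough sanity check on the statistics above, the following sketch compares the data and index sizes with the available RAM. The variable names are illustrative, and reading "4 GB" as 4 GiB is an assumption.

```python
# Values copied from the collection stats above (bytes, except count).
size = 55881082188
count = 126048972
storage_size = 16998031360
total_index_size = 2439606272

GIB = 1024 ** 3
ram_bytes = 4 * GIB  # assumption: "4 GB" is read as 4 GiB

avg_obj_size = size // count  # should agree with the reported avgObjSize
indexes_fit_in_ram = total_index_size <= ram_bytes
data_fits_in_ram = size <= ram_bytes

print(f"average object size: {avg_obj_size} bytes")
print(f"data: {size / GIB:.1f} GiB, on disk: {storage_size / GIB:.1f} GiB, "
      f"indexes: {total_index_size / GIB:.1f} GiB")
print(f"indexes fit in RAM: {indexes_fit_in_ram}, data fits in RAM: {data_fits_in_ram}")
```

Under these assumptions the indexes fit in memory but the documents do not, so a sorted scan that returns whole documents likely has to read most of them from disk.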

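UPDATE 2

It turns out that the more closely the order of the sort field follows the insertion (natural) order, the faster the sort runs.

I tried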

db.new_coll.create_index([("timestamp", pymongo.ASCENDING)])
for el in db.coll.find().sort("timestamp", pymongo.ASCENDING):
    del el['_id']
    db.new_coll.insert_one(el)  # insert() was removed in PyMongo 4; insert_one() is its replacement

# and now
db.new_coll.find().sort("timestamp", pymongo.ASCENDING)
# takes 25 minutes vs 2 hours as in previous example
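One possible way to picture this effect is a toy model, not a measurement of MongoDB itself. It assumes documents sit on disk in insertion order, grouped into fixed-size pages, and counts how often a scan in sort-key order has to move to a different page. The function name `page_switches` and the page size are hypothetical.

```python
import random

def page_switches(read_order, docs_per_page):
    """Count how often consecutive reads land on a different page.

    Document i is assumed to live on page i // docs_per_page, i.e. documents
    are laid out on disk in insertion order (a simplifying assumption).
    """
    switches = 0
    previous_page = None
    for doc_position in read_order:
        page = doc_position // docs_per_page
        if page != previous_page:
            switches += 1
            previous_page = page
    return switches

n_docs = 10_000
docs_per_page = 100

# Sort-key order equals insertion order: reads walk the pages sequentially.
in_order = list(range(n_docs))

# Sort-key order unrelated to insertion order: reads jump between pages.
rng = random.Random(0)  # fixed seed so the run is repeatable
shuffled = in_order[:]
rng.shuffle(shuffled)

sequential_switches = page_switches(in_order, docs_per_page)
scattered_switches = page_switches(shuffled, docs_per_page)
print(sequential_switches, scattered_switches)
```

In this model the in-order scan touches each page once, while the shuffled scan changes page on almost every read. That is consistent with the copied collection, whose insertion order matches the timestamp order, sorting as fast as _id.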

Best answer

Sorting by _id is faster because of the way _id field values are generated.

Text from the documentation:

One of the main reasons ObjectIds are generated in the fashion mentioned above by the drivers is that it contains a useful behavior due to the way sorting works. Given that it contains a 4-byte timestamp (resolution of seconds), an incrementing counter, and some more unique identifiers such as the machine id, one can use the _id field to sort documents in the order of creation simply by sorting on the _id field. This can be useful to save the space needed by an additional timestamp if you wish to track the time of creation of a document.
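The timestamp prefix described in the quote can be illustrated with a minimal sketch that decodes the first 4 bytes of an ObjectId from its hex form. The helper name `objectid_timestamp` and the two sample ObjectId strings are made up for illustration.

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex):
    """Return the creation time encoded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: big-endian seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Hypothetical ObjectIds: same trailing bytes, timestamps one second apart.
older = "5acd0a000000000000000001"
newer = "5acd0a010000000000000001"

print(objectid_timestamp(older), objectid_timestamp(newer))

# Because the timestamp comes first, comparing the raw bytes gives the same order
# as comparing the creation times.
assert objectid_timestamp(older) < objectid_timestamp(newer)
assert bytes.fromhex(older) < bytes.fromhex(newer)
```

With PyMongo, `bson.ObjectId` exposes the same value through its `generation_time` property, so the manual decoding is only needed for illustration.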

I also ran explain on the query and noticed that nscannedObjects and nscannedObjectsAllPlans are 0 when the sort is done on _id.

> db.coll.find({},{_id:1}).sort({_id:1}).explain();
{
        "cursor" : "BtreeCursor _id_",
        "isMultiKey" : false,
        "n" : 353,
        "nscannedObjects" : 0,
        "nscanned" : 353,
        "nscannedObjectsAllPlans" : 0,
        "nscannedAllPlans" : 353,
        "scanAndOrder" : false,
        "indexOnly" : true,
        "nYields" : 2,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "_id" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : "server",
        "filterSet" : false
}
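To make the relevant fields easier to read, here is a small sketch that summarizes a legacy-format explain document such as the one above. The function name `summarize_legacy_explain` is hypothetical. Note that the explained query projects only _id, which is presumably why it can be answered from the index alone (indexOnly is true), whereas the find() calls in the question return whole documents.

```python
def summarize_legacy_explain(plan):
    """Summarize a legacy-format explain() document like the one above."""
    return {
        # True when results came from the index without fetching documents.
        "covered": bool(plan.get("indexOnly")) and plan.get("nscannedObjects") == 0,
        # True when the server had to sort results itself instead of walking an index.
        "in_memory_sort": bool(plan.get("scanAndOrder")),
        "index_entries_scanned": plan.get("nscanned"),
    }

# Subset of the fields shown in the explain output above.
plan = {
    "cursor": "BtreeCursor _id_",
    "n": 353,
    "nscannedObjects": 0,
    "nscanned": 353,
    "scanAndOrder": False,
    "indexOnly": True,
}

summary = summarize_legacy_explain(plan)
print(summary)
```

Running the same check on an explain of the timestamp sort without a projection would show whether that query also avoids an in-memory sort and how many documents it has to fetch.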

Regarding "mongodb - Why is sorting by _id in MongoDB so much faster than sorting by any other indexed field?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49760024/
