performance - MongoDb sharding does not improve the application in a lab setup

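I am currently developing a mobile application backed by a MongoDB database. Everything works fine at the moment, but we want to add sharding to be prepared for the future.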

To test this, we created a lab environment (running in Hyper-V) to try out various scenarios.

The following servers were created:

- Ubuntu Server 14.04.3, non-sharded (database server) (256 MB RAM / limited to 10% CPU)
- Ubuntu Server 14.04.3, sharding (config server) (256 MB RAM / limited to 10% CPU)
- Ubuntu Server 14.04.3, sharding (query router server) (256 MB RAM / limited to 10% CPU)
- Ubuntu Server 14.04.3, sharding (database server 01) (256 MB RAM / limited to 10% CPU)
- Ubuntu Server 14.04.3, sharding (database server 02) (256 MB RAM / limited to 10% CPU)

A small console application was written in C# to measure how long the inserts take.

This console application imports 10,000 persons with the following properties:
- Name
- First name
- Full name
- Date of birth
- ID

All 10,000 records differ only in "_id"; every other field is identical across all records.

Note that each test was run exactly 3 times.
After each test the database was dropped, so the system was clean again for the next run.
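The benchmark procedure described above (10,000 sequential inserts of near-identical documents, three runs, dropping the database between runs) can be sketched as follows. This is a Python stand-in for the C# console app, not the author's code: the field names, the integer `_id`, and the `insert_one` / `drop_database` callables are hypothetical placeholders for whatever the real driver calls are.

```python
import time
from typing import Callable, Dict, List


def make_person(i: int) -> Dict[str, object]:
    # Hypothetical field names and values: every document is identical
    # except for "_id", mirroring the test data described above.
    return {
        "_id": i,  # stand-in for the id the real driver would generate
        "Name": "Doe",
        "FirstName": "John",
        "FullName": "John Doe",
        "DateOfBirth": "1980-01-01",
    }


def run_benchmark(
    insert_one: Callable[[Dict[str, object]], None],
    drop_database: Callable[[], None],
    n_records: int = 10_000,
    runs: int = 3,
) -> List[float]:
    """Time `runs` passes of `n_records` sequential inserts, dropping the
    database after each pass so every run starts from a clean state."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        for i in range(n_records):
            insert_one(make_person(i))
        timings.append(time.perf_counter() - start)
        drop_database()
    return timings


# Usage with an in-memory list standing in for the collection:
store: List[Dict[str, object]] = []
print(run_benchmark(store.append, store.clear, n_records=1_000))
```

Against a real deployment, `insert_one` would wrap the driver's single-document insert, so each iteration pays one full round trip before the next document is sent.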

The test results are below:

Inserting 10,000 records without sharding

Writing 10.000 records | Non-Sharding environment - Full Disk IO #1: 14 Seconds.
Writing 10.000 records | Non-Sharding environment - Full Disk IO #2: 14 Seconds.
Writing 10.000 records | Non-Sharding environment - Full Disk IO #3: 12 Seconds.

Inserting 10,000 records with a single database shard

Note: the shard key was set to a hashed _id field.
See the (partial) sharding information in the JSON below:

shards:
  {  "_id" : "shard0000",  "host" : "192.168.137.12:27017" }

databases:
  {  "_id" : "DemoDatabase",  "primary" : "shard0000",  "partitioned" : true }
          DemoDatabase.persons
                  shard key: { "_id" : "hashed" }
                  unique: false
                  balancing: true
                  chunks:
                          shard0000       2
                  { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(1, 1)
                  { "_id" : NumberLong(0) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)

Results:

Writing 10.000 records | Single Sharding environment - Full Disk IO #1: 1 Minute, 59 Seconds.
Writing 10.000 records | Single Sharding environment - Full Disk IO #2: 1 Minute, 51 Seconds.
Writing 10.000 records | Single Sharding environment - Full Disk IO #3: 1 Minute, 52 Seconds.

Inserting 10,000 records with two database shards

Note: the shard key was set to a hashed _id field.
See the (partial) sharding information in the JSON below:

shards:
  {  "_id" : "shard0000",  "host" : "192.168.137.12:27017" }
  {  "_id" : "shard0001",  "host" : "192.168.137.13:27017" }

databases:
  {  "_id" : "DemoDatabase",  "primary" : "shard0000",  "partitioned" : true }
          DemoDatabase.persons
                  shard key: { "_id" : "hashed" }
                  unique: false
                  balancing: true
                  chunks:
                          shard0000       2
                  { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 Timestamp(2, 2)
                  { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(2, 3)
                  { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 Timestamp(2, 4)
                  { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 5)
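With a hashed shard key, each `_id` is hashed to a signed 64-bit integer, and the query router sends the insert to the chunk whose range contains that value (lower bound inclusive, upper bound exclusive). A minimal sketch of that lookup, assuming the chunk table from the status output above; it does not compute the hash itself, and `shard_for_hash` is a hypothetical helper name.

```python
import bisect

# Chunk boundaries and owners copied from the two-shard status output.
# Chunk i covers [BOUNDS[i-1], BOUNDS[i]), with MinKey and MaxKey as the outer ends.
BOUNDS = [-4611686018427387902, 0, 4611686018427387902]
OWNERS = ["shard0000", "shard0000", "shard0001", "shard0001"]


def shard_for_hash(hashed_id: int) -> str:
    """Return the shard owning the chunk that contains a hashed _id value."""
    # bisect_right counts the boundaries <= hashed_id, which is exactly
    # the index of the chunk whose range contains hashed_id.
    return OWNERS[bisect.bisect_right(BOUNDS, hashed_id)]


print(shard_for_hash(-1))  # shard0000
print(shard_for_hash(0))   # shard0001
```

Because the hash spreads values roughly uniformly over the 64-bit range, about half of the inserts should land on each shard with this chunk layout.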

Results:

Writing 10.000 records | Double Sharding environment - Full Disk IO #1: 49 Seconds.
Writing 10.000 records | Double Sharding environment - Full Disk IO #2: 53 Seconds.
Writing 10.000 records | Double Sharding environment - Full Disk IO #3: 54 Seconds.

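Based on the tests above, sharding does work: the more shards I add, the better the performance.
However, I don't understand why I see such a huge performance drop when using sharding compared to a single server.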
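Averaging the three runs of each test makes the gap concrete. The timings below are copied from the results above, with 1 minute 59 seconds written as 119 seconds, and so on.

```python
from statistics import mean

# Wall-clock seconds for the three runs of each test, from the results above.
results = {
    "non-sharded": [14, 14, 12],
    "1 shard": [119, 111, 112],
    "2 shards": [49, 53, 54],
}

averages = {name: mean(runs) for name, runs in results.items()}
baseline = averages["non-sharded"]

for name, avg in averages.items():
    # Slowdown relative to the single non-sharded server.
    print(f"{name:12s} {avg:6.1f} s  ({avg / baseline:4.1f}x the non-sharded time)")
```

The averages work out to about 13.3 s, 114 s and 52 s: one shard behind a router is roughly 8.6 times slower than the plain server, and adding the second shard cuts that time by a bit more than half, which still leaves it about 3.9 times slower than the baseline.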

I need fast reads and writes, and I thought sharding would be the solution, but it seems I am missing something here.

Can anyone point me in the right direction?

Kind regards

Best answer

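The extra layers, between the router and the config servers and between the router and the data nodes, add latency.
If you have a 1 ms ping and 10k inserts, that is 10 seconds of latency that does not exist in the non-sharded setup.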
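The arithmetic behind this estimate, assuming the inserts are sent one at a time and each waits for its acknowledgement before the next is sent (which is what a simple insert loop does): every extra network hop costs one round trip per insert. `added_latency_seconds` is a hypothetical helper for illustration.

```python
def added_latency_seconds(n_inserts: int, rtt_ms: float, extra_hops: int = 1) -> float:
    """Extra wall-clock time when each insert is sent sequentially, waits for
    its acknowledgement, and crosses `extra_hops` additional round trips
    (for example client -> mongos -> shard instead of client -> mongod)."""
    return n_inserts * extra_hops * rtt_ms / 1000.0


# 10k inserts with a 1 ms round trip through one extra layer:
print(added_latency_seconds(10_000, 1.0))  # 10.0
```

The cost grows linearly with both the number of sequential inserts and the number of layers each request has to pass through.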

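Depending on the write concern level you have configured (if you have configured any level of write acknowledgement), you may see an extra 10 seconds in your sharded benchmark, because each insert blocks until the acknowledgement is received from the data node.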

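If your write concern is set to acknowledged and you have replica nodes, you also have to wait for the write to propagate to those replica nodes, which adds more network latency (although it seems you don't have replica nodes). Depending on your network topology, if you keep the default setting that allows chained replication (secondaries syncing from other secondaries), the write concern may add several layers of network latency: https://docs.mongodb.org/manual/tutorial/manage-chained-replication/ . If you have additional indexes and a write concern, each replica node has to write those index entries before the write acknowledgement is returned (although indexes can be disabled on replica nodes).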
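A toy model of why chaining matters: it compares how long a write takes to reach `w` members when every secondary syncs directly from the primary versus when the secondaries sync along a chain. It assumes a fixed per-hop delay and ignores the return path of the acknowledgement, disk writes, and index updates; `ack_delay_ms` is a hypothetical name.

```python
def ack_delay_ms(hop_ms: float, secondaries: int, w: int, chained: bool) -> float:
    """Time (ms) after the primary applies a write until `w` members,
    primary included, have received it.

    Star topology: every secondary syncs from the primary (1 hop each).
    Chained topology: secondary k syncs from secondary k-1 (k hops away).
    """
    if not 1 <= w <= secondaries + 1:
        raise ValueError("w must be between 1 and the number of members")
    if w == 1:
        return 0.0  # the primary alone satisfies w=1
    delays = sorted(
        (k if chained else 1) * hop_ms for k in range(1, secondaries + 1)
    )
    # w-1 secondaries must have the write; the slowest of those sets the delay.
    return delays[w - 2]


# Three secondaries, 1 ms per hop, waiting for all four members:
print(ack_delay_ms(1.0, 3, 4, chained=False))
print(ack_delay_ms(1.0, 3, 4, chained=True))
```

Under these assumptions the star layout adds one hop (1 ms) while the chain adds three (3 ms), and that difference is paid again on every acknowledged insert.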

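Without sharding and without replication (but with write acknowledgement), each insert still blocks until it is acknowledged, but there is no extra latency from additional network layers.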

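The cost of hashing the _id field may also add up to a few seconds in total over 10k inserts. You could avoid hashing by using an _id field with high randomness, but I don't think hashing affects performance much.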
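To get a feel for the hashing cost, the sketch below hashes 10,000 random 12-byte values (the size of an ObjectId) in roughly the way a hashed shard key does. Assumption: MongoDB's hashed index is approximated here as the first 8 bytes of an MD5 digest read as a signed 64-bit integer; the real server also hashes the BSON type and a seed, so treat this only as a proxy for the cost, not the exact hash values.

```python
import hashlib
import os
import struct
import time


def hash64(value: bytes) -> int:
    # Rough proxy (assumption): first 8 bytes of an MD5 digest read as a
    # little-endian signed 64-bit integer. MongoDB's real hashed index also
    # mixes in a seed and the BSON type, so the values will differ.
    return struct.unpack("<q", hashlib.md5(value).digest()[:8])[0]


ids = [os.urandom(12) for _ in range(10_000)]  # random ObjectId-sized values

start = time.perf_counter()
hashes = [hash64(v) for v in ids]
elapsed = time.perf_counter() - start
print(f"hashed {len(hashes)} values in {elapsed * 1000:.1f} ms")
```

On an unthrottled machine this loop usually finishes quickly; the lab VMs are capped at 10% CPU, so the same work would take proportionally longer there.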

Regarding "performance - MongoDb sharding does not improve the application in a lab setup", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35272924/
