java - Cassandra read/write performance - high CPU

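I started using Cassandra a few days ago, and here is what I am trying to do.

I have a little over 2 million objects that hold users' profiles. I convert these objects to JSON, compress them, and store them in a blob column. The average compressed JSON size is about 10 KB. This is what my table looks like in Cassandra: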

Table:

dev.userprofile (uid varchar primary key, profile blob);

Select query:

select profile from dev.userprofile where uid='<uid>';

Update query:

update dev.userprofile set profile='<bytebuffer>' where uid = '<uid>'
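
The "convert to JSON, compress, store as blob" step described above could be sketched as follows. This is only an illustration: the post does not say which compressor is used, so GZIP from the JDK is an assumption, and `ProfileCodec` is a hypothetical helper name. The resulting `ByteBuffer` is the kind of value a driver would bind to the `profile` blob column.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: GZIP is an assumed choice, the post does not name the compressor.
final class ProfileCodec {

    // Compress a JSON string into a ByteBuffer that can be bound to a blob column.
    static ByteBuffer compress(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZIP trailer has been written.
        return ByteBuffer.wrap(bytes.toByteArray());
    }

    // Decompress a blob read back from the table into the original JSON string.
    static String decompress(ByteBuffer blob) {
        byte[] raw = new byte[blob.remaining()];
        blob.duplicate().get(raw); // duplicate() leaves the caller's position untouched
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(raw))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"uid\":\"u1\",\"visits\":3}";
        ByteBuffer blob = compress(json);
        System.out.println(decompress(blob).equals(json)); // round trip check
    }
}
```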

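Every hour, I get events from a queue and apply them to my user profile objects. Each event corresponds to one user profile object. I receive about 1 million such events, so I have to update about 1M user profile objects in a short time: update the object in my application, compress the JSON, and update the Cassandra blob. I need to finish updating all 1 million user profile objects, preferably within a few minutes. But I have noticed that it now takes much longer.

While running my application, I see that I can update about 400 profiles per second on average. I already see a lot of CPU iowait (70%+) on the Cassandra instance. Also, the load is quite high at first, around 16 (on an 8 vCPU instance), and then drops to around 4.

What am I doing wrong? When I was updating smaller objects of about 2 KB, Cassandra operations per second were much higher; I was able to get around 3000 ops/sec. Any thoughts on how I can improve the performance?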
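
To put the numbers above in perspective, here is a small back-of-the-envelope calculation. The 1 million updates and the 400 and 3000 ops/sec rates come from the question; the 5-minute target is an assumed reading of "a few minutes", and `ThroughputMath` is a hypothetical name.

```java
import java.util.Locale;

// Back-of-the-envelope throughput arithmetic for the hourly batch.
final class ThroughputMath {

    // Minutes needed to apply `updates` at a sustained rate of `opsPerSecond`.
    static double minutesToFinish(long updates, double opsPerSecond) {
        return updates / opsPerSecond / 60.0;
    }

    // Sustained rate needed to apply `updates` within `minutes`.
    static double requiredOpsPerSecond(long updates, double minutes) {
        return updates / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long updates = 1_000_000L;
        System.out.printf(Locale.ROOT, "at 400 ops/s:  %.1f min%n", minutesToFinish(updates, 400));
        System.out.printf(Locale.ROOT, "at 3000 ops/s: %.1f min%n", minutesToFinish(updates, 3000));
        // Assumption: "a few minutes" is taken as 5 minutes.
        System.out.printf(Locale.ROOT, "needed for 5 min: %.0f ops/s%n", requiredOpsPerSecond(updates, 5));
    }
}
```

At 400 profiles per second the batch takes roughly 42 minutes, while the 3000 ops/sec seen with 2 KB objects would bring it down to under 6 minutes.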

<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-extras</artifactId>
  <version>3.1.0</version>
</dependency>

For testing, I have set up only a single Cassandra node on an m4.2xlarge AWS instance:

Single node Cassandra instance
m4.2xlarge aws ec2
500 GB General Purpose (SSD) 
IOPS - 1500 / 10000

nodetool cfstats output

Keyspace: dev
    Read Count: 688795
    Read Latency: 27.280683695439137 ms.
    Write Count: 688780
    Write Latency: 0.010008401811899301 ms.
    Pending Flushes: 0
        Table: userprofile
        SSTable count: 9
        Space used (live): 32.16 GB
        Space used (total): 32.16 GB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 13.56 MB
        SSTable Compression Ratio: 0.9984539538554672
        Number of keys (estimate): 2215817
        Memtable cell count: 38686
        Memtable data size: 105.72 MB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 6
        Local read count: 688807
        Local read latency: 29.879 ms
        Local write count: 688790
        Local write latency: 0.012 ms
        Pending flushes: 0
        Bloom filter false positives: 47
        Bloom filter false ratio: 0.00003
        Bloom filter space used: 7.5 MB
        Bloom filter off heap memory used: 7.5 MB
        Index summary off heap memory used: 2.07 MB
        Compression metadata off heap memory used: 3.99 MB
        Compacted partition minimum bytes: 216 bytes
        Compacted partition maximum bytes: 370.14 KB
        Compacted partition mean bytes: 5.82 KB
        Average live cells per slice (last five minutes): 1.0
        Maximum live cells per slice (last five minutes): 1
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1

nodetool cfhistograms output

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             3.00              9.89           2816.16              4768                 2
75%             3.00             11.86          43388.63              8239                 2
95%             4.00             14.24         129557.75             14237                 2
98%             4.00             20.50         155469.30             17084                 2
99%             4.00             29.52         186563.16             20501                 2
Min             0.00              1.92             61.22               216                 2
Max             5.00          74975.55        4139110.98            379022                 2

dstat output

---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
 1m   5m  15m | read  writ|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send
12.8 13.9 10.6|1460  31.1 |1.0  14 0.2|9.98G  892k 21.2G  234M|   0     0 | 119M 3291k|  63k   68k|  1   1  26  72   0   0|3366k 3338k
13.2 14.0 10.7|1458  28.4 |1.1  13 1.5|9.97G  884k 21.2G  226M|   0     0 | 119M 3278k|  61k   68k|  2   1  28  69   0   0|3396k 3349k
12.7 13.8 10.7|1477  27.6 |0.9  11 1.1|9.97G  884k 21.2G  237M|   0     0 | 119M 3321k|  69k   72k|  2   1  31  65   0   0|3653k 3605k
12.0 13.7 10.7|1474  27.4 |1.1 8.7 0.3|9.96G  888k 21.2G  236M|   0     0 | 119M 3287k|  71k   75k|  2   1  36  61   0   0|3807k 3768k
11.8 13.6 10.7|1492  53.7 |1.6  12 1.2|9.95G  884k 21.2G  228M|   0     0 | 119M 6574k|  73k   75k|  2   2  32  65   0   0|3888k 3829k

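Edit

After switching to LeveledCompactionStrategy and disabling compression on the SSTables, I did not see a big improvement:

There is some improvement in profiles updated per second. It is now 550-600 profiles/sec. However, the CPU spikes (that is, the iowait) are still there.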

gcstats

   Interval (ms) Max GC Elapsed (ms)Total GC Elapsed (ms)Stdev GC Elapsed (ms)   GC Reclaimed (MB)         Collections      Direct Memory Bytes
          755960                  83                3449                   8         73179796264                 107                       -1

dstat

---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
 1m   5m  15m | read  writ|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send
7.02 8.34 7.33| 220  16.6 |0.0   0 1.1|10.0G  756k 21.2G  246M|   0     0 |  13M 1862k|  11k   13k|  1   0  94   5   0   0|   0     0
6.18 8.12 7.27|2674  29.7 |1.2 1.5 1.9|10.0G  760k 21.2G  210M|   0     0 | 119M 3275k|  69k   70k|  3   2  83  12   0   0|3906k 3894k
5.89 8.00 7.24|2455   314 |0.6 5.7   0|10.0G  760k 21.2G  225M|   0     0 | 111M   39M|  68k   69k|  3   2  51  44   0   0|3555k 3528k
5.21 7.78 7.18|2864  27.2 |2.6 3.2 1.4|10.0G  756k 21.2G  266M|   0     0 | 127M 3284k|  80k   76k|  3   2  57  38   0   0|4247k 4224k
4.80 7.61 7.13|2485   288 |0.1  12 1.4|10.0G  756k 21.2G  235M|   0     0 | 113M   36M|  73k   73k|  2   2  36  59   0   0|3664k 3646k
5.00 7.55 7.12|2576  30.5 |1.0 4.6   0|10.0G  760k 21.2G  239M|   0     0 | 125M 3297k|  71k   70k|  2   1  53  43   0   0|3884k 3849k
5.64 7.64 7.15|1873   174 |0.9  13 1.6|10.0G  752k 21.2G  237M|   0     0 | 119M   21M|  62k   66k|  3   1  27  69   0   0|3107k 3081k

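You may notice the CPU spikes.

My main concern is the iowait, before I increase the load any further. Is there anything specific I should look for that could be causing this? 600 profiles/sec (that is, 600 reads + writes) seems low to me.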
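
One way to read the dstat figures above is to estimate how much data the disk delivers per profile read. The sketch below is only a rough estimate under stated assumptions: it takes dstat's "119M" as MiB per second, attributes all disk reads to profile reads (compaction also reads from disk, so this is an upper bound), and uses the 400 and 600 profiles/sec rates and the 10 KB average payload from the question. `ReadAmplification` is a hypothetical name.

```java
import java.util.Locale;

// Rough estimate of disk bytes read per profile read, derived from dstat figures.
final class ReadAmplification {

    // KiB read from disk per operation, given disk throughput in MiB/s.
    static double kibReadPerOp(double diskReadMibPerSec, double opsPerSec) {
        return diskReadMibPerSec * 1024.0 / opsPerSec;
    }

    // How many times larger the disk read is than the payload actually needed.
    static double amplification(double diskReadMibPerSec, double opsPerSec, double payloadKib) {
        return kibReadPerOp(diskReadMibPerSec, opsPerSec) / payloadKib;
    }

    public static void main(String[] args) {
        // Before the edit: about 400 profiles/s at about 119 MiB/s of disk reads.
        System.out.printf(Locale.ROOT, "%.0f KiB per read, %.0fx the payload%n",
                kibReadPerOp(119, 400), amplification(119, 400, 10));
        // After the edit: about 600 profiles/s at a similar disk read rate.
        System.out.printf(Locale.ROOT, "%.0f KiB per read, %.0fx the payload%n",
                kibReadPerOp(119, 600), amplification(119, 600, 10));
    }
}
```

Under these assumptions each profile read costs roughly 300 KiB of disk reads before the change and roughly 200 KiB after it, against a payload of about 10 KB, which would be consistent with the high iowait.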

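Accepted answer

Can you try LeveledCompactionStrategy? With 1:1 reads/writes on large objects like these, the IO saved on reads will probably outweigh the IO spent on the more expensive compactions.

If you are already compressing the data before sending it, you should turn off compression on the table. Table compression splits the data into 64 KB chunks, and each chunk will be dominated by only about 6 values, which will not compress much (as the terrible compression ratio shows: SSTable Compression Ratio: 0.9984539538554672).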

ALTER TABLE dev.userprofile
  WITH compaction = { 'class' : 'LeveledCompactionStrategy'  }
  AND compression = { 'sstable_compression' : '' };
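
The point about compressing data that is already compressed can be checked with a small experiment. The sketch below is an illustration only: it uses the JDK's `Deflater` as a stand-in for Cassandra's table compressor (an assumption, so the exact numbers will differ), and `sampleJson` generates made-up profile data. It compresses some JSON once, then compresses the result again and compares the ratios.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Random;
import java.util.zip.Deflater;

// Illustration: a second compression pass over compressed bytes gains almost nothing.
final class DoubleCompression {

    // Compress a byte array with the JDK's Deflater (stand-in for the table compressor).
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Compressed size divided by original size; values near 1.0 mean no gain.
    static double ratio(byte[] input) {
        return (double) deflate(input).length / input.length;
    }

    // Made-up JSON profile with repetitive structure, generated deterministically.
    static byte[] sampleJson(int events, long seed) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder("{\"uid\":\"u1\",\"events\":[");
        for (int i = 0; i < events; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"type\":\"click\",\"page\":\"/item/").append(rnd.nextInt(100000))
              .append("\",\"ts\":").append(1_400_000_000L + rnd.nextInt(1_000_000)).append('}');
        }
        return sb.append("]}").toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] json = sampleJson(500, 42L);
        byte[] once = deflate(json);
        System.out.printf(Locale.ROOT, "first pass ratio:  %.3f%n", ratio(json));
        System.out.printf(Locale.ROOT, "second pass ratio: %.3f%n", ratio(once));
    }
}
```

The first pass shrinks the JSON substantially, while the second pass stays close to 1.0, which is the same pattern as the SSTable compression ratio reported by cfstats.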

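400 profiles per second is very slow, and there may be work being done on your client that is also a bottleneck. If you have a load of 4 on an 8-core system, Cassandra may not be what is slowing things down. Make sure you parallelize your requests and issue them asynchronously; sending requests sequentially is a common problem.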
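
A minimal sketch of what "parallelize and issue asynchronously" can look like, using only the JDK. The `asyncWrite` function is a hypothetical stand-in for an asynchronous driver call (in the DataStax 3.x driver that would be `Session.executeAsync`, which returns a future); the semaphore caps the number of requests in flight so the client does not flood the node. The class name and the limit of 64 are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch: issue writes asynchronously while bounding the number in flight.
final class BoundedAsyncWriter {

    // Sends every item through asyncWrite with at most maxInFlight outstanding requests.
    // Returns the number of failed writes.
    static <T> int writeAll(List<T> items, int maxInFlight,
                            Function<T, CompletableFuture<Void>> asyncWrite) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger failures = new AtomicInteger();
        for (T item : items) {
            permits.acquireUninterruptibly(); // blocks while maxInFlight requests are pending
            CompletableFuture<Void> pending;
            try {
                pending = asyncWrite.apply(item);
            } catch (RuntimeException e) {
                failures.incrementAndGet();
                permits.release();
                continue;
            }
            pending.whenComplete((ok, err) -> {
                if (err != null) failures.incrementAndGet();
                permits.release(); // free the slot once the write completes
            });
        }
        permits.acquireUninterruptibly(maxInFlight); // wait for the remaining requests
        return failures.get();
    }

    public static void main(String[] args) {
        List<String> uids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) uids.add("user-" + i);
        // Hypothetical stand-in for an async driver call; it does no real I/O.
        int failures = writeAll(uids, 64,
                uid -> CompletableFuture.runAsync(() -> { /* pretend to write the profile */ }));
        System.out.println("failed writes: " + failures);
    }
}
```

The in-flight limit is the main tuning knob here: too low and the client waits on each round trip, too high and the requests simply queue up on the server.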

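Larger blobs will have an impact on GC, so monitoring GC activity and adding that information to the question may help. I would be surprised if 10 KB objects affected it that much, but it is something to watch for and it may require more JVM tuning.

If this helps, I would recommend tuning the heap from there and upgrading to at least 3.7 or the latest release in the 3.0 line.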

Source: a similar question on Stack Overflow, "java - Cassandra read/write performance - high CPU": https://stackoverflow.com/questions/40334461/
