nosql - Ways to improve Cassandra read performance in my scenario

We recently started using Cassandra in production. We have a single cross-colo cluster of 24 nodes: 12 nodes in the PHX colo and 12 nodes in the SLC colo. The replication factor is 4, which means 2 copies are kept in each datacenter.

Below is how the keyspace and column families were created by our production DBAs.

create keyspace profile with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {slc:2,phx:2};
create column family PROFILE_USER
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and gc_grace = 86400;

We are running Cassandra 1.2.2 with org.apache.cassandra.dht.Murmur3Partitioner, and with KeyCaching, SizeTieredCompactionStrategy, and Virtual Nodes enabled as well.

Machine specifications for the Cassandra production nodes:

16 cores, 32 threads
128GB RAM
4 x 600GB SAS in Raid 10, 1.1TB usable
2 x 10GbaseT NIC, one usable

Below are the results I am getting.

Read latency (95th percentile):       9 milliseconds
Number of threads:                    10
Duration of the run:                  30 minutes
Throughput:                           1977 requests/second
Total number of IDs requested:        3558701
Total number of columns requested:    65815867
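
As a quick sanity check on these figures, the short Python sketch below multiplies throughput by duration and divides total columns by total IDs. The variable names are illustrative only; the numbers are copied from the results above.

```python
# Figures copied from the reported results.
throughput_rps = 1977          # requests per second
duration_minutes = 30
total_ids = 3_558_701
total_columns = 65_815_867

# Requests implied by throughput x duration, to compare with the reported ID count.
implied_requests = throughput_rps * duration_minutes * 60

# Average number of columns fetched per requested ID.
columns_per_id = total_columns / total_ids

print(implied_requests)           # 3558600, close to the 3558701 IDs reported
print(round(columns_per_id, 1))   # 18.5
```

So each request reads roughly 18 to 19 columns on average, and the reported throughput agrees with the total ID count.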

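I am not sure what else I should try with Cassandra to get better read performance. I am assuming it is hitting the disk in my case. Should I try increasing the Replication Factor to some higher number? Any other suggestion?

I believe reading data from an HDD takes around 6-12 ms, compared to an SSD? In my case, I am guessing it hits the disk every time, and enabling the key cache is not working well here. I cannot enable RowCache because it is more efficient to use the OS page cache. Maintaining the row cache in the JVM is very expensive, so the row cache is recommended only for a small number of rows, such as <100K rows.

Is there any way I can verify whether keycaching is working fine in my case or not?

This is what I get when I show the schema for the column family: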

create column family PROFILE
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and populate_io_cache_on_flush = false
  and gc_grace = 86400
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

Are there any changes I should make to get good read performance?

Best answer

I am assuming it is hitting the disk in my case. Should I try increasing the Replication Factor to some higher number? Any other suggestion?

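If your data is much larger than memory and your access is close to random, you will be hitting disk. This is consistent with latencies of ~10 ms.

Increasing the replication factor might help, although it will make your cache less efficient, since each node will store more data. It is probably only worth doing if your read pattern is mostly random, your data is very large, your consistency requirements are low, and your access load is heavy.

If you want to decrease read latency, you can use a lower consistency level. Reading at consistency level CL.ONE generally gives the lowest read latency, at the cost of consistency. You will only get consistent reads at CL.ONE if writes are done at CL.ALL. But if consistency is not required, it is a good tradeoff.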
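
The CL.ONE / CL.ALL statement follows from the usual overlap rule: a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The Python sketch below illustrates that rule for the replication factor of 4 from the question. The function names are hypothetical, and it treats the replication factor as cluster-wide, ignoring data-center-local levels such as LOCAL_QUORUM.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


RF = 4  # replication factor from the question (2 copies per data center)
print(reads_see_latest_write("ONE", "ALL", RF))        # True
print(reads_see_latest_write("ONE", "ONE", RF))        # False
print(reads_see_latest_write("ONE", "QUORUM", RF))     # False
print(reads_see_latest_write("QUORUM", "QUORUM", RF))  # True
```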

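If you want to increase read throughput, you can decrease read_repair_chance. This number specifies the probability that Cassandra performs a read repair on each read. Read repair involves reading from the available replicas and updating any that have old values.

If you read at a low consistency level, read repair incurs extra read I/O and so decreases throughput. It does not affect latency (for low consistency levels), since read repair is done asynchronously. Again, if consistency is not important for your application, decrease read_repair_chance to something like 0.01 to improve throughput.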
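
To get a rough feel for the throughput cost, the Python sketch below estimates the average number of replica reads per client read. It is a simplified model, not Cassandra's actual implementation: it assumes that with probability read_repair_chance a read touches all replicas, and otherwise only the number the consistency level needs. The function name is hypothetical.

```python
def expected_replica_reads(read_repair_chance: float, rf: int, cl_replicas: int = 1) -> float:
    """Average replica reads per client read under the simplified model."""
    p = read_repair_chance
    # With probability p all rf replicas are read; otherwise only cl_replicas.
    return p * rf + (1 - p) * cl_replicas


RF = 4  # replication factor from the question
print(round(expected_replica_reads(0.1, RF), 2))   # 1.3  (current setting)
print(round(expected_replica_reads(0.01, RF), 2))  # 1.03 (suggested setting)
```

Under these assumptions, lowering read_repair_chance from 0.1 to 0.01 at CL.ONE removes about 20% of the replica read I/O.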

Is there any way I can verify whether keycaching is working fine in my case or not?

Look at the output of "nodetool info"; it prints a line like this:

Key Cache : size 96468768 (bytes), capacity 96468992 (bytes), 959293 hits, 31637294 requests, 0.051 recent hit rate, 14400 save period in seconds

This gives you the key cache hit rate, which is quite low in the example above.
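
If you want to track this over time, a small script can extract the numbers. The Python sketch below parses the sample line shown above and computes the lifetime hit rate as hits / requests. It assumes the line format matches that sample exactly (other Cassandra versions may print it differently), and the helper name is hypothetical.

```python
import re

# Sample line copied from the "nodetool info" output above.
line = ("Key Cache : size 96468768 (bytes), capacity 96468992 (bytes), "
        "959293 hits, 31637294 requests, 0.051 recent hit rate, "
        "14400 save period in seconds")


def key_cache_stats(info_line: str) -> dict:
    """Extract hit counters from a Key Cache line in the format shown above."""
    match = re.search(r"(\d+) hits, (\d+) requests, ([\d.]+) recent hit rate", info_line)
    if match is None:
        raise ValueError("not a Key Cache line in the expected format")
    hits, requests = int(match.group(1)), int(match.group(2))
    return {
        "hits": hits,
        "requests": requests,
        # Lifetime rate since the counters started; 0.0 if nothing was requested.
        "lifetime_hit_rate": hits / requests if requests else 0.0,
        "recent_hit_rate": float(match.group(3)),
    }


stats = key_cache_stats(line)
print(f"lifetime hit rate: {stats['lifetime_hit_rate']:.3f}")  # 0.030
print(f"recent hit rate:   {stats['recent_hit_rate']:.3f}")    # 0.051
```

Both rates are far below 1.0, so most key lookups in that sample miss the cache.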

Regarding "nosql - Ways to improve Cassandra read performance in my scenario", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16528826/
