hadoop - Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

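I am running read and update queries on a table with 500,000 rows. Sometimes, after roughly 300,000 rows have been processed, I get the following error, even though no node is down: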

Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Infrastructure details:
5 Cassandra nodes, 5 Spark nodes and 3 Hadoop nodes, each with 8 cores and 28 GB of memory. The Cassandra replication factor is 3.

Cassandra 2.1.8.621 | DSE 4.7.1 | Spark 1.2.1 | Hadoop 2.7.1

Cassandra configuration:

read_request_timeout_in_ms (ms): 10000
range_request_timeout_in_ms (ms): 10000
write_request_timeout_in_ms (ms): 5000
cas_contention_timeout_in_ms (ms): 1000
truncate_request_timeout_in_ms (ms): 60000
request_timeout_in_ms (ms): 10000

I also tried running the same job with read_request_timeout_in_ms increased to 20,000 ms, but it did not help.

I am querying two tables. Below is the CREATE statement for one of them:

Create table:

CREATE TABLE section_ks.testproblem_section (
    problem_uuid text PRIMARY KEY,
    documentation_date timestamp,
    mapped_code_system text,
    mapped_problem_code text,
    mapped_problem_text text,
    mapped_problem_type_code text,
    mapped_problem_type_text text,
    negation_ind text,
    patient_id text,
    practice_uid text,
    problem_category text,
    problem_code text,
    problem_comment text,
    problem_health_status_code text,
    problem_health_status_text text,
    problem_onset_date timestamp,
    problem_resolution_date timestamp,
    problem_status_code text,
    problem_status_text text,
    problem_text text,
    problem_type_code text,
    problem_type_text text,
    target_site_code text,
    target_site_text text
    ) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Queries:

1) SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '"+ formatted_documentation_date + "' ALLOW FILTERING;

2) UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345';

Best answer

Generally, when you get timeout errors, it means you are trying to do something that doesn't scale well in Cassandra. The fix is usually to modify your schema.

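I suggest monitoring the nodes while you run your query to see whether you can spot the problem area. For example, you can run "watch -n 1 nodetool tpstats" to see whether any queues are backing up or dropping items. See other monitoring suggestions here.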
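The tpstats check above can also be scripted. The following Python sketch is not part of the original answer: it parses the text printed by nodetool tpstats, assuming the Cassandra 2.1 layout (a "Pool Name / Active / Pending / Completed / Blocked / All time blocked" table followed by a "Message type / Dropped" table), and lists the pools that have queued or blocked tasks and the message types that were dropped. The function names are hypothetical, and other Cassandra versions may print a different layout.

```python
import subprocess
from typing import Dict, List, Tuple

# Numeric columns of the thread-pool table printed by "nodetool tpstats"
# (assumed Cassandra 2.1 layout; other versions may differ).
POOL_COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text: str) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Split tpstats output into per-pool counters and dropped-message counts."""
    pools: Dict[str, Dict[str, int]] = {}
    dropped: Dict[str, int] = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("Pool Name"):
            section = "pools"
            continue
        if stripped.startswith("Message type"):
            section = "dropped"
            continue
        parts = stripped.split()
        try:
            if section == "pools" and len(parts) == 1 + len(POOL_COLUMNS):
                values = [int(v) for v in parts[1:]]
                pools[parts[0]] = dict(zip(POOL_COLUMNS, values))
            elif section == "dropped" and len(parts) == 2:
                dropped[parts[0]] = int(parts[1])
        except ValueError:
            # Skip rows with non-numeric cells rather than failing.
            continue
    return pools, dropped


def backed_up_pools(pools: Dict[str, Dict[str, int]]) -> List[str]:
    """Pools with pending tasks or with currently or historically blocked tasks."""
    return sorted(
        name
        for name, c in pools.items()
        if c["pending"] > 0 or c["blocked"] > 0 or c["all_time_blocked"] > 0
    )


def dropped_message_types(dropped: Dict[str, int]) -> List[str]:
    """Message types (READ, RANGE_SLICE, ...) that the node reports as dropped."""
    return sorted(kind for kind, count in dropped.items() if count > 0)


def run_tpstats() -> str:
    """Run nodetool against the local node; assumes nodetool is on PATH."""
    result = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Cassandra node, something like pools, dropped = parse_tpstats(run_tpstats()) followed by backed_up_pools(pools) and dropped_message_types(dropped), run repeatedly while the job is executing, gives the same signal as watching the raw output. A steadily growing pending count on ReadStage, or non-zero READ or RANGE_SLICE drops, would suggest the read path is the bottleneck.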

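One possible issue with your configuration is that you say you have five Cassandra nodes but only 3 Spark workers (or do you mean you have three Spark workers on each Cassandra node?). You want at least one Spark worker on each Cassandra node, so that loading data into Spark is done locally on each node rather than over the network.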

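It's hard to say more without seeing your schema and the queries you are running. Are you reading from a single partition? I started getting timeout errors at around 300,000 rows when reading from a single partition. See the question here. The only workaround I have found so far is to use a client-side hash in my partition key to split the partitions into smaller chunks of about 100K rows. So far I have not found a way to tell Cassandra not to time out on a query that I expect to take a long time.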
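A minimal Python sketch of the client-side hashing workaround described above. The answer does not say which hash function it used, so zlib.crc32 here is an assumption, as are the helper names bucket_count, bucket_for, partition_key and partitions_to_read. The idea, under these assumptions, is that the table uses a hypothetical composite partition key such as (logical_key, bucket): the writer derives the bucket from a row identifier, and a reader that needs the whole logical partition issues one smaller query per bucket instead of one huge one.

```python
import math
import zlib
from typing import List, Tuple

# Rough per-partition size target, taken from the ~100K rows mentioned above.
TARGET_ROWS_PER_PARTITION = 100_000


def bucket_count(expected_rows: int, target: int = TARGET_ROWS_PER_PARTITION) -> int:
    """Number of buckets needed so each bucket holds at most about `target` rows."""
    return max(1, math.ceil(expected_rows / target))


def bucket_for(row_key: str, buckets: int) -> int:
    """Stable bucket for a row.

    crc32 is used instead of Python's built-in hash(), because hash() of a str
    is salted per process and would give different buckets on different clients.
    """
    return zlib.crc32(row_key.encode("utf-8")) % buckets


def partition_key(logical_key: str, row_key: str, buckets: int) -> Tuple[str, int]:
    """Hypothetical composite partition key (logical_key, bucket) stored with each row."""
    return (logical_key, bucket_for(row_key, buckets))


def partitions_to_read(logical_key: str, buckets: int) -> List[Tuple[str, int]]:
    """Reading the whole logical partition becomes one query per bucket."""
    return [(logical_key, b) for b in range(buckets)]
```

For example, bucket_count(300_000) gives 3 buckets, and partitions_to_read("1234", 3) yields the three (key, bucket) pairs a reader would query one at a time. The bucket count has to stay fixed for data that has already been written, because changing it would map existing rows to different partitions.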

Regarding hadoop - Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32327629/