python - Why is my Cassandra database so slow at reading data? I want to read 100,000 rows in 10 seconds

I have a Cassandra table "articles" with 400,000 rows

primary key (source,created_at desc)

When I query the data with:

select * from articles where source = 'abc' and created_at <= '2016-01-01 00:00:00'

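loading 110,000 rows takes 8 minutes.

This is extremely slow, and I don't know where the problem is.

I want to load 100,000 rows in 10 seconds. I'm not sure whether that is possible.

Here are some more details:

I have 3 nodes, replication factor = 2, strategy = SimpleStrategy, 4 CPUs, 32 GB RAM.
I am using cassandra-driver 3.0.0. I'm not sure whether the slowness comes from Python or from Cassandra, since we are also using Python.

Here is my CQL schema: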

CREATE TABLE crawler.articles (
    source text,
    created_at timestamp,
    id text,
    category text,
    channel text,
    last_crawled timestamp,
    text text,
    thumbnail text,
    title text,
    url text,
    PRIMARY KEY (source, created_at, id)
) WITH CLUSTERING ORDER BY (created_at DESC, id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"ALL"}'
AND comment = ''
AND compaction = {'sstable_size_in_mb': '160', 'enabled': 'true', 'unchecked_tombstone_compaction': 'false', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 604800
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX articles_id_idx ON crawler.articles (id);
CREATE INDEX articles_url_idx ON crawler.articles (url);

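Thanks for your replies!

Best answer

Without knowing the exact configuration, it is hard to pinpoint the exact problem.

But you can check the following: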

Monitor Cassandra for memory consumption and stage throughput.

Set your Memtable thresholds low.

Access Cassandra concurrently.

Don’t store all your data in a single row.

Check for time-outs.

What is the size of the Java heap?
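
As a rough illustration of the "access Cassandra concurrently" suggestion, here is a minimal sketch that splits the created_at range into windows and runs one range query per window in parallel, using execute_concurrent_with_args from the Python driver. The helper names split_time_range and fetch_articles, the window count, and the concurrency value are assumptions made up for this example, not something from the question. Since every window still reads the same source partition, the speed-up is not guaranteed and should be measured.

```python
from datetime import datetime


def split_time_range(start, end, parts):
    """Split (start, end] into `parts` contiguous (lo, hi] windows."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    step = (end - start) / parts
    bounds = [start + step * i for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def fetch_articles(session, source, windows, concurrency=8):
    """Run one range query per window concurrently and collect the rows.

    `session` is assumed to be a connected cassandra.cluster.Session
    with the `crawler` keyspace selected.
    """
    # Imported lazily so split_time_range works without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args

    stmt = session.prepare(
        "SELECT * FROM articles "
        "WHERE source = ? AND created_at > ? AND created_at <= ?"
    )
    params = [(source, lo, hi) for lo, hi in windows]
    rows = []
    for success, result in execute_concurrent_with_args(
        session, stmt, params, concurrency=concurrency
    ):
        if not success:
            raise result  # on failure, `result` holds the exception
        rows.extend(result)  # iterating the result also fetches later pages
    return rows
```

For example, split_time_range(datetime(2015, 12, 25), datetime(2016, 1, 1), 7) yields seven one-day windows, which matches the table's 7-day default_time_to_live. Timing fetch_articles against a single unsplit query would also help show whether the time is spent in Cassandra or in the Python client.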

Regarding "python - Why is my Cassandra database so slow at reading data? I want to read 100,000 rows in 10 seconds", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34650283/
