I have a Cassandra table "articles" containing 400,000 rows, with
primary key (source, created_at desc).
When I query the data like this:
select * from articles where source = 'abc' and created_at <= '2016-01-01 00:00:00'
it takes 8 minutes to load 110,000 rows.
That is extremely slow, and I have no idea where the problem is.
I would like to load 100,000 rows within 10 seconds; I am not sure whether that is even feasible.
Here are some more details:
- 3 nodes, replication factor = 2, strategy = SimpleStrategy, 4 CPUs and 32 GB RAM each
- I am using cassandra-driver 3.0.0; I am not sure whether the problem comes from Python or from Cassandra itself, since we access it from Python.
Here is my CQL schema:
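For context, the read is a single synchronous query issued through the driver, roughly like this (a reconstruction for illustration only, not my exact script; the keyspace name matches the schema below):

```python
def load_articles(contact_points, cutoff):
    """Illustrative reconstruction of the slow read path."""
    # Imported lazily so the sketch can be read without a running cluster.
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(contact_points)
    session = cluster.connect("crawler")
    # fetch_size controls how many rows each paging round trip returns;
    # the driver default is 5000.
    stmt = SimpleStatement(
        "SELECT * FROM articles WHERE source = %s AND created_at <= %s",
        fetch_size=5000,
    )
    # Materializing the whole result set forces the driver to walk
    # every page of the 110,000-row partition sequentially.
    return list(session.execute(stmt, ("abc", cutoff)))
```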
CREATE TABLE crawler.articles (
    source text,
    created_at timestamp,
    id text,
    category text,
    channel text,
    last_crawled timestamp,
    text text,
    thumbnail text,
    title text,
    url text,
    PRIMARY KEY (source, created_at, id)
) WITH CLUSTERING ORDER BY (created_at DESC, id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"ALL"}'
AND comment = ''
AND compaction = {'sstable_size_in_mb': '160', 'enabled': 'true', 'unchecked_tombstone_compaction': 'false', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 604800
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX articles_id_idx ON crawler.articles (id);
CREATE INDEX articles_url_idx ON crawler.articles (url);
Thanks in advance for your replies!
Best answer
It is hard to pinpoint the exact problem without knowing your exact configuration,
but you can check the following:
- Monitor Cassandra's memory consumption and stage throughput.
- Keep your memtable thresholds low.
- Access Cassandra concurrently rather than with one large synchronous read.
- Don't store all of your data in a single row (partition).
- Check for client- and server-side timeouts.
- Check the size of the Java heap.
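The "access Cassandra concurrently" point can be sketched with the Python driver the asker already uses: split the requested time range into day-sized slices and issue one asynchronous query per slice, instead of paging through everything in a single synchronous loop. This is an illustrative sketch (the helper names are ours; note that with the current schema every slice still lands on the same `source` partition, so re-partitioning the data would help further):

```python
from datetime import datetime, timedelta

def day_slices(start, end):
    """Split the half-open range [start, end) into at most one-day slices."""
    slices = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=1), end)
        slices.append((cur, nxt))
        cur = nxt
    return slices

def fetch_articles(session, source, start, end):
    """Fetch one source's rows concurrently, one async query per day slice.

    `session` is an already-connected cassandra-driver Session;
    execute_async returns a ResponseFuture whose result() yields
    the rows of that slice.
    """
    query = ("SELECT * FROM crawler.articles "
             "WHERE source = %s AND created_at >= %s AND created_at < %s")
    futures = [session.execute_async(query, (source, lo, hi))
               for lo, hi in day_slices(start, end)]
    rows = []
    for future in futures:
        rows.extend(future.result())  # blocks until that slice has arrived
    return rows
```

The driver also ships `cassandra.concurrent.execute_concurrent_with_args`, which performs this kind of fan-out with built-in throttling.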
About python - Why does my Cassandra database read data so slowly? I want to read 100,000 rows in 10 seconds. A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34650283/