database - Performance of multiple secondary indexes within the same partition in Cassandra

I have a table like this.

CREATE TABLE posts (
topic text,
country text,
bookmarked text,
id uuid,
PRIMARY KEY (topic,id)
);

After that, I created secondary indexes on country and bookmarked, as shown below.

CREATE INDEX posts_country ON posts (country);
CREATE INDEX posts_bookmarked ON posts (bookmarked);

Now I am querying a single partition using the secondary indexes, as shown below.

select * from posts where topic='cassandra' and country='india' and bookmarked='true' allow filtering;
select * from posts where topic='sql' and country='us' and bookmarked='true' allow filtering;

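My question is: if every query goes to a single partition (topic = 'cassandra' or topic = 'sql'), will ALLOW FILTERING scan all rows or only that specific partition? How will performance be affected?

Any suggestions on how to handle this situation if it hurts performance would be appreciated.

Thanks.

Best answer

When you specify the partition key, Cassandra searches for data in a single partition. That is definitely more efficient than omitting the partition key and querying only by secondary index columns (because many nodes then have to be queried), but the performance impact depends on your data set.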

ALLOW FILTERING involves data filtering and thus may have unpredictable performance.

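Filtering data (especially large data sets) can be very inefficient and is therefore not advisable. How inefficient it is depends on how many rows have to be read and then discarded.

If your partition is too large (too many rows in a single partition) and you filter on a column whose values are mostly unique in order to get a small result set, the query is inefficient, because Cassandra loads a large amount of data and then filters most of it out.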
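The cost described above can be illustrated with a toy model. This is only a sketch in plain Python, not how Cassandra is implemented: a list stands in for one large partition, and the helper names `make_partition` and `filter_partition` are hypothetical. It shows that a filter which returns one row still has to read every row in the partition.

```python
# Toy model only: a Python list stands in for one Cassandra partition.
import uuid

def make_partition(n_rows):
    """Build n_rows fake posts for one topic; only the first row is from 'india'."""
    rows = []
    for i in range(n_rows):
        rows.append({
            "topic": "cassandra",
            "id": uuid.uuid4(),
            "country": "india" if i == 0 else f"country-{i}",
            "bookmarked": "true",
        })
    return rows

def filter_partition(rows, **predicates):
    """Read every row in the partition and keep those matching all predicates."""
    rows_read = 0
    matches = []
    for row in rows:
        rows_read += 1
        if all(row[col] == val for col, val in predicates.items()):
            matches.append(row)
    return matches, rows_read

partition = make_partition(100_000)
matches, rows_read = filter_partition(partition, country="india", bookmarked="true")
print(f"rows read: {rows_read}, rows returned: {len(matches)}")
# prints "rows read: 100000, rows returned: 1"
```

The gap between rows read and rows returned is what makes filtering a large partition expensive; the smaller the partition, the smaller that gap can be.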

select * from posts where topic='cassandra' and country='india';

Because you specify the partition key, this query is quite efficient.

select * from posts where topic='cassandra' and country='india' and bookmarked='true' allow filtering;

Adding an index on bookmarked may improve the performance of this query.

Cassandra will then use the index with the highest selectivity to find the rows that need to be loaded. It will however not change anything regarding the need for ALLOW FILTERING, as it will still have to filter the loaded rows using the remaining predicate.
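The behaviour in the quote can be sketched with another toy model. It assumes exact row counts are available for each index, which is a simplification: how Cassandra actually estimates selectivity is not modelled here, and the names `build_index` and `query` are hypothetical. The sketch picks the index with the fewest candidate rows, loads those rows, and then applies the remaining predicate as a filter.

```python
# Toy model only: dicts stand in for secondary indexes local to one partition.
from collections import defaultdict

rows = [
    {"id": 1, "country": "india", "bookmarked": "true"},
    {"id": 2, "country": "india", "bookmarked": "false"},
    {"id": 3, "country": "us", "bookmarked": "true"},
    {"id": 4, "country": "uk", "bookmarked": "true"},
    {"id": 5, "country": "us", "bookmarked": "true"},
]

def build_index(rows, column):
    """Map each value of `column` to the rows that contain it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index

indexes = {col: build_index(rows, col) for col in ("country", "bookmarked")}

def query(predicates):
    """Use the most selective index, then filter on the remaining predicates."""
    # The index with the fewest candidate rows is the most selective one.
    best = min(predicates, key=lambda col: len(indexes[col].get(predicates[col], [])))
    candidates = indexes[best].get(predicates[best], [])
    remaining = {c: v for c, v in predicates.items() if c != best}
    result = [r for r in candidates if all(r[c] == v for c, v in remaining.items())]
    return best, len(candidates), result

best, loaded, result = query({"country": "india", "bookmarked": "true"})
print(best, loaded, [r["id"] for r in result])
# prints "country 2 [1]"
```

Here the country index narrows the search to two rows, and the bookmarked predicate is still checked row by row, which mirrors why ALLOW FILTERING remains necessary even with both indexes in place.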

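Please read the following article; I think it has the answers you need :) https://www.datastax.com/dev/blog/allow-filtering-explained-2

Also, secondary indexes on columns with very high or very low cardinality are not efficient. You have a secondary index on bookmarked (of type text), but if its only values are 'true' and 'false', that index will not be efficient. https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html

Regarding database - Performance of multiple secondary indexes within the same partition in Cassandra, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42920905/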
