database - Querying a high-cardinality field

Tags: database cassandra nosql query-optimization

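I am designing a data model for orders as part of our upcoming Cassandra migration. An order has an orderId (an opaque UUID field) and an orderNumber (a user-friendly number). A getOrder query can be made with either one.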

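My partition key is orderId, so getByOrderId is not a problem. getByOrderNumber is: there is a one-to-one mapping between orderId and orderNumber (a high-cardinality field), so creating a local secondary index on every node would slow down my queries.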

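What I am wondering is whether I can create a new table with orderNumber as the partition key and orderId as its only column (a kind of secondary index, but maintained by me). A getByOrderNumber query could then be resolved with two calls.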
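
To make the two-call lookup concrete, here is a minimal in-memory sketch. It is not driver code: each dict stands in for one Cassandra table keyed by its partition key, and the table names (orders, orders_by_number), the column names, and the helper functions are assumptions made up for illustration.

```python
import uuid

# Hypothetical CQL for the two tables (all names are assumptions):
#   CREATE TABLE orders (order_id uuid PRIMARY KEY, order_number bigint, status text);
#   CREATE TABLE orders_by_number (order_number bigint PRIMARY KEY, order_id uuid);

# Each dict stands in for one table, keyed by that table's partition key.
orders = {}
orders_by_number = {}


def put_order(order_id, order_number, status):
    """Write the order row and its orderNumber -> orderId mapping row."""
    orders[order_id] = {"order_number": order_number, "status": status}
    orders_by_number[order_number] = order_id


def get_by_order_id(order_id):
    """Single partition-key lookup on the main table."""
    return orders.get(order_id)


def get_by_order_number(order_number):
    """Two calls: mapping table first, then the main table."""
    order_id = orders_by_number.get(order_number)
    if order_id is None:
        return None
    return get_by_order_id(order_id)


order_id = uuid.uuid4()
put_order(order_id, 1001, "NEW")
print(get_by_order_number(1001))
```

Both reads are lookups by partition key, which is the point of the design: neither call depends on a secondary index.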

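Please bear with me if the solution above is badly wrong; I am very new to Cassandra. As I understand it, for a column like this, if I use a local secondary index, Cassandra will have to query every node to fetch a single order. So I thought, why not create another table that stores the mapping.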
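
The fan-out concern can be illustrated with a toy model. This is a sketch under strong assumptions, not a description of Cassandra internals: the cluster size, the hash-based owner function, and the per-node dicts are all hypothetical, and replication and any early termination by the coordinator are ignored. It only counts how many nodes a lookup has to touch in the worst case.

```python
import hashlib

NODE_COUNT = 6  # assumed cluster size, for illustration only


def owner(partition_key):
    """Toy stand-in for the partitioner: map a partition key to one node."""
    digest = hashlib.md5(str(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


# Each node holds its own rows plus a local index covering only those rows.
nodes = [{"rows": {}, "local_index": {}} for _ in range(NODE_COUNT)]


def insert(order_id, order_number):
    node = nodes[owner(order_id)]
    node["rows"][order_id] = order_number
    node["local_index"][order_number] = order_id


def find_by_partition_key(order_id):
    """The key tells us which node owns the row, so one node is contacted."""
    node = nodes[owner(order_id)]
    return node["rows"].get(order_id), 1


def find_by_local_index(order_number):
    """The index is local to each node, so every node is asked in turn."""
    contacted = 0
    found = None
    for node in nodes:
        contacted += 1
        if order_number in node["local_index"]:
            found = node["local_index"][order_number]
    return found, contacted


for i in range(100):
    insert(f"order-{i}", 1000 + i)

print(find_by_partition_key("order-42"))  # (1042, 1)
print(find_by_local_index(1042))          # ('order-42', 6)
```

In this model a mapping table keyed by orderNumber turns the second kind of lookup into two lookups of the first kind.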

What would I be missing if I manage this index myself? One thing I can see is that for every write I now have to update two tables. Anything else?
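
One consequence of the double write is that the two tables can disagree if the second write does not complete. The sketch below simulates that case with plain dicts; the function names, the fail_second flag, and the repair step are hypothetical and only show the kind of check a hand-maintained index may need.

```python
class WriteFailed(Exception):
    """Raised when the simulated second write does not complete."""


def write_order(orders, orders_by_number, order_id, order_number, fail_second=False):
    """Write the order row first, then the mapping row."""
    orders[order_id] = {"order_number": order_number}
    if fail_second:
        # Simulated failure between the two writes.
        raise WriteFailed("mapping write did not complete")
    orders_by_number[order_number] = order_id


def find_unindexed(orders, orders_by_number):
    """Return ids of orders whose mapping row is missing or points elsewhere."""
    return [
        order_id
        for order_id, row in orders.items()
        if orders_by_number.get(row["order_number"]) != order_id
    ]


def repair(orders, orders_by_number):
    """Rewrite the mapping row for every order that find_unindexed reports."""
    for order_id in find_unindexed(orders, orders_by_number):
        orders_by_number[orders[order_id]["order_number"]] = order_id


orders, orders_by_number = {}, {}
write_order(orders, orders_by_number, "a", 1)
try:
    write_order(orders, orders_by_number, "b", 2, fail_second=True)
except WriteFailed:
    pass

print(find_unindexed(orders, orders_by_number))  # ['b']
repair(orders, orders_by_number)
print(find_unindexed(orders, orders_by_number))  # []
```

In CQL, one option for avoiding the half-written state is to send both inserts in a single logged batch (BEGIN BATCH ... APPLY BATCH), at some extra write cost; whether that trade-off is acceptable depends on the workload.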

Best answer

I thought why not create another table that stores the mapping.

That is fine. From the Cassandra documentation:

Do not use an index in these situations:

On high-cardinality columns because you then query a huge volume of records for a small number of results. See Problems using a high-cardinality column index below.

Problems using a high-cardinality column index

If you create an index on a high-cardinality column, which has many distinct values, a query between the fields incurs many seeks for very few results. In the table with a billion songs, looking up songs by writer (a value that is typically unique for each song) instead of by their recording artist is likely to be very inefficient.

It would probably be more efficient to manually maintain the table as a form of an index instead of using the built-in index. For columns containing unique data, it is sometimes fine performance-wise to use an index for convenience, as long as the query volume to the table having an indexed column is moderate and not under constant load.

Conversely, creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a multitude of indexed columns having foo = true and foo = false is not useful.

Denormalized data is normal in Cassandra data modeling.

For more on database - Querying a high-cardinality field, see the similar question on Stack Overflow: https://stackoverflow.com/questions/57747584/
