cassandra - Time series data: selecting a range with maxTimeuuid/minTimeuuid in Cassandra

I recently created a keyspace and a column family in Cassandra. I have the following:

CREATE TABLE reports (
  id timeuuid PRIMARY KEY,
  report varchar
)

I want to select reports by a time range, so my query is as follows:

select dateOf(id), id 
from keyspace.reports 
where token(id) > token(maxTimeuuid('2013-07-16 16:10:48+0300'));

It returns:

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

So this is wrong: rows earlier than the requested time are returned.

When I try the following CQL:

select dateOf(id), id from keyspace.reports 
where token(id) > token(minTimeuuid('2013-07-16 16:12:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

select dateOf(id), id from keyspace.reports
where token(id) > token(minTimeuuid('2013-07-16 16:13:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

Is it random? Why doesn't it produce meaningful output?

What is the best way to do this in Cassandra?

Best answer

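You are using the token function, which is not useful in your context (querying between two times with minTimeuuid and maxTimeuuid) and which produces seemingly random, incorrect output: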

From the CQL documentation:

The TOKEN function can be used with a condition operator on the partition key column to query. The query selects rows based on the token of their partition key rather than on their value. The token of a key depends on the partitioner in use. The RandomPartitioner and Murmur3Partitioner do not yield a meaningful order.
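The effect described in the quote can be illustrated outside Cassandra. The following is a minimal Python sketch, not Cassandra's own code: it sorts the four ids from the question once by the timestamp embedded in each version 1 UUID and once by a hash of the raw UUID bytes. The helper names `embedded_time` and `stand_in_token` are made up for this illustration, and MD5 is used only as a stand-in for a hashing partitioner; the real token computation depends on the partitioner configured for the cluster.

```python
import hashlib
import uuid

# The four timeuuid values from the question.
IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]


def embedded_time(value: str) -> int:
    """Return the 60-bit version 1 timestamp (100 ns ticks since 1582-10-15)."""
    return uuid.UUID(value).time


def stand_in_token(value: str) -> int:
    """Hash the raw UUID bytes (assumption: a stand-in, not Cassandra's partitioner)."""
    digest = hashlib.md5(uuid.UUID(value).bytes).digest()
    return int.from_bytes(digest, "big")


# Ordering by embedded time is chronological.
by_time = sorted(IDS, key=embedded_time)
# Ordering by hash has no relationship to time.
by_token = sorted(IDS, key=stand_in_token)

if __name__ == "__main__":
    print("by embedded time:", by_time)
    print("by hashed token: ", by_token)
```

A comparison such as `token(id) > token(...)` filters along the second ordering, so which rows pass says nothing about when they were created.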

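If you want to retrieve all records between two dates, it probably makes more sense to model the data as a wide row, with one record per column rather than one record per row. For example, create the table: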

CREATE TABLE reports2 (
  reportname text,
  id timeuuid,
  report text,
  PRIMARY KEY (reportname, id)
)

populate it with data:

insert into reports2(reportname,id,report) VALUES ('report', 1b3f6d00-ee19-11e2-8734-8d331d938752, 'a');
insert into reports2(reportname,id,report) VALUES ('report', 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b, 'b');
insert into reports2(reportname,id,report) VALUES ('report', 1b275870-ee19-11e2-b3f3-af3e3057c60f, 'c');
insert into reports2(reportname,id,report) VALUES ('report', 21f9a390-ee19-11e2-89a2-97143e6cae9e, 'd');

and query (without the token call!):

select dateOf(id),id from reports2 where reportname='report' and id>maxtimeuuid('2013-07-16 16:10:48+0300');

which returns the expected result:

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 14:10:48+0100 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

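The drawback is that all reports with the same reportname are stored in a single row. On the other hand, you can now store many different reports (keyed here by report name). To get all reports named mynewreport in August 2013, you can query like this: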

select dateOf(id),id from reports2 where reportname='mynewreport' and id>=mintimeuuid('2013-08-01+0300') and id<mintimeuuid('2013-09-01+0300');
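The half-open range in the query above (`>=` the start of the month, `<` the start of the next month) can be mimicked in Python to see which ids fall in a window. This is a sketch under stated assumptions: `to_ticks` and `in_window` are hypothetical helpers written for this illustration, and they compare only the timestamp embedded in each UUID. Cassandra's minTimeuuid produces an actual UUID value, and any tie-breaking on the remaining UUID bits is not modeled here.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100 ns ticks from the Gregorian epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def to_ticks(moment: datetime) -> int:
    """Convert an aware datetime to 100 ns ticks since 1582-10-15 (integer math only)."""
    delta = moment - GREGORIAN_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def in_window(ids, start, end):
    """Keep the ids whose embedded time t satisfies start <= t < end."""
    low, high = to_ticks(start), to_ticks(end)
    return [i for i in ids if low <= uuid.UUID(i).time < high]


# Same +0300 offset as the timestamps in the queries.
TZ = timezone(timedelta(hours=3))

# The four timeuuid values inserted into reports2.
IDS = [
    "1b3f6d00-ee19-11e2-8734-8d331d938752",
    "0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b",
    "1b275870-ee19-11e2-b3f3-af3e3057c60f",
    "21f9a390-ee19-11e2-89a2-97143e6cae9e",
]

july = in_window(IDS, datetime(2013, 7, 1, tzinfo=TZ), datetime(2013, 8, 1, tzinfo=TZ))
august = in_window(IDS, datetime(2013, 8, 1, tzinfo=TZ), datetime(2013, 9, 1, tzinfo=TZ))

if __name__ == "__main__":
    print(len(july), "ids in July 2013;", len(august), "ids in August 2013")
```

Using the start of the next month as an exclusive upper bound avoids having to name the last instant of August, which is why the CQL query pairs `>=` with `<`.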

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/17883202/
