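I need to find users (user_id) who appear multiple times within a given time period. I've built an aggregation that extracts and counts their occurrences within a specific period, but a user who shows up again on a later date should not be counted again: each user should be counted only once.
It's like counting unique user sign-ups within a given time period: you only count the first instance/date on which each user appears. The date interval can be hourly, daily, weekly, or monthly. For example, given these documents: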
user_id_1 2020-01-05
user_id_1 2020-02-06
user_id_1 2020-02-14
user_id_2 2020-02-03
user_id_2 2020-02-04
user_id_3 2020-03-03
user_id_1 2020-03-15
user_id_2 2020-03-21
user_id_3 2020-03-25
user_id_3 2020-04-01
The expected output counts each user once, in the month in which they first appear. They should not be counted again in any other month:
user_id_1 | 1 count | 2020-01-05
user_id_2 | 1 count | 2020-02-03
user_id_3 | 1 count | 2020-03-03
Total | 3 counts|
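To pin down the expected semantics, here is a plain-Python sketch (not Elasticsearch) over the sample documents above. It assumes a user's "first appearance" is their earliest timestamp inside the half-open range [start, end), and the helpers `bucket_start` and `first_seen_counts` are hypothetical, for illustration only. The interval names mirror the hourly/daily/weekly/monthly options mentioned above, with weeks assumed to start on Monday.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample documents from the question: (user_id, timestamp)
DOCS = [
    ("user_id_1", datetime(2020, 1, 5)),
    ("user_id_1", datetime(2020, 2, 6)),
    ("user_id_1", datetime(2020, 2, 14)),
    ("user_id_2", datetime(2020, 2, 3)),
    ("user_id_2", datetime(2020, 2, 4)),
    ("user_id_3", datetime(2020, 3, 3)),
    ("user_id_1", datetime(2020, 3, 15)),
    ("user_id_2", datetime(2020, 3, 21)),
    ("user_id_3", datetime(2020, 3, 25)),
    ("user_id_3", datetime(2020, 4, 1)),
]


def bucket_start(ts: datetime, interval: str) -> datetime:
    """Truncate a timestamp to the start of its calendar bucket (hypothetical helper)."""
    if interval == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if interval == "day":
        return day
    if interval == "week":
        return day - timedelta(days=day.weekday())  # assumes Monday-based weeks
    if interval == "month":
        return day.replace(day=1)
    raise ValueError(f"unsupported interval: {interval}")


def first_seen_counts(docs, start, end, interval):
    """Keep each user's earliest timestamp in [start, end), then count users per bucket."""
    first_seen = {}
    for user_id, ts in docs:
        if start <= ts < end and (user_id not in first_seen or ts < first_seen[user_id]):
            first_seen[user_id] = ts
    # Each user contributes exactly one count, to the bucket of their first appearance
    counts = Counter(bucket_start(ts, interval) for ts in first_seen.values())
    return first_seen, counts


if __name__ == "__main__":
    first_seen, counts = first_seen_counts(
        DOCS, datetime(2020, 1, 1), datetime(2020, 4, 1), "month"
    )
    for user_id, ts in sorted(first_seen.items(), key=lambda kv: kv[1]):
        print(f"{user_id} | 1 count | {ts:%Y-%m-%d}")
    print(f"Total | {sum(counts.values())} counts")
```

Run on the sample data with a monthly interval, this reproduces the three rows and the total of 3 shown in the expected output.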
Here is the sample query I came up with. It counts the occurrences of each user_id within the given time period.
{
"size":0,
"aggs":{
"result":{
"date_range":{
"field":"timestamp",
"format":"yyyy-MM-dd",
"ranges":[
{
"from":"2020-01-01",
"to":"2020-03-31"
}
]
},
"aggs":{
"histogram":{
"date_histogram":{
"field":"timestamp",
"calendar_interval":"1M",
"extended_bounds":{
"min":"2020-01-01",
"max":"2020-03-31"
},
"format":"yyyy-MM-dd"
},
"aggs":{
"user_sigups":{
"terms":{
"field":"user_id.keyword"
}
}
}
}
}
}
}
}
A sample result of the above query looks like this:
{
"histogram":{
"buckets":[
{
"key_as_string":"2020-01-01",
"key":1577836800000,
"doc_count":1925,
"user_sigups":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":328,
"buckets":[
{
"key":"2532456443539602",
"doc_count":505
}
]
}
}
]
}
}
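The query above counts a user several times because the terms aggregation runs independently inside each monthly date_histogram bucket. A rough plain-Python approximation of that nesting on the sample data (the helper `per_month_terms` is hypothetical; `end` is exclusive, matching how `to` behaves in a `date_range` aggregation):

```python
from collections import Counter, defaultdict
from datetime import datetime

# Sample documents from the question: (user_id, timestamp)
DOCS = [
    ("user_id_1", datetime(2020, 1, 5)),
    ("user_id_1", datetime(2020, 2, 6)),
    ("user_id_1", datetime(2020, 2, 14)),
    ("user_id_2", datetime(2020, 2, 3)),
    ("user_id_2", datetime(2020, 2, 4)),
    ("user_id_3", datetime(2020, 3, 3)),
    ("user_id_1", datetime(2020, 3, 15)),
    ("user_id_2", datetime(2020, 3, 21)),
    ("user_id_3", datetime(2020, 3, 25)),
    ("user_id_3", datetime(2020, 4, 1)),
]


def per_month_terms(docs, start, end):
    """Approximate date_histogram(1M) > terms(user_id): doc count per user per month."""
    buckets = defaultdict(Counter)
    for user_id, ts in docs:
        if start <= ts < end:  # "from" inclusive, "to" exclusive
            buckets[ts.strftime("%Y-%m-01")][user_id] += 1
    return dict(buckets)


if __name__ == "__main__":
    months = per_month_terms(DOCS, datetime(2020, 1, 1), datetime(2020, 3, 31))
    for month, users in sorted(months.items()):
        print(month, dict(users))
    # Summing the per-month user buckets counts user_id_1 three times and user_id_2 twice
    print("user buckets:", sum(len(users) for users in months.values()))
    print("distinct users:", len(set().union(*months.values())))
```

The per-month buckets hold 6 user entries in total, but only 3 distinct users, which is the double counting the question describes.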
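Any help would be appreciated.
Best answer
You need a cardinality aggregation to count unique values and a min aggregation to get the earliest date.
Query: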
{
"size": 0,
"aggs": {
"result": {
"date_range": {
"field": "timestamp",
"format": "yyyy-MM-dd",
"ranges": [
{
"from": "2020-01-01",
"to": "2020-03-31"
}
]
},
"aggs": {
"histogram": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "1M",
"extended_bounds": {
"min": "2020-01-01",
"max": "2020-03-31"
},
"format": "yyyy-MM-dd"
},
"aggs": {
"user_sigups": {
"terms": {
"field": "user_id.keyword"
},
"aggs": {
"unique_count": {
"cardinality": {
"field": "user_id.keyword"
}
},
"min_date":{
"min": {
"field": "timestamp"
}
}
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"result" : {
"buckets" : [
{
"key" : "2020-01-01-2020-03-31",
"from" : 1.5778368E12,
"from_as_string" : "2020-01-01",
"to" : 1.5856128E12,
"to_as_string" : "2020-03-31",
"doc_count" : 9,
"histogram" : {
"buckets" : [
{
"key_as_string" : "2020-01-01",
"key" : 1577836800000,
"doc_count" : 1,
"user_sigups" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "user_id_1",
"doc_count" : 1,
"min_date" : {
"value" : 1.5781824E12,
"value_as_string" : "2020-01-05"
},
"unique_count" : {
"value" : 1
}
}
]
}
},
{
"key_as_string" : "2020-02-01",
"key" : 1580515200000,
"doc_count" : 4,
"user_sigups" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "user_id_1",
"doc_count" : 2,
"min_date" : {
"value" : 1.5809472E12,
"value_as_string" : "2020-02-06"
},
"unique_count" : {
"value" : 1
}
},
{
"key" : "user_id_2",
"doc_count" : 2,
"min_date" : {
"value" : 1.580688E12,
"value_as_string" : "2020-02-03"
},
"unique_count" : {
"value" : 1
}
}
]
}
},
{
"key_as_string" : "2020-03-01",
"key" : 1583020800000,
"doc_count" : 4,
"user_sigups" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "user_id_3",
"doc_count" : 2,
"min_date" : {
"value" : 1.5831936E12,
"value_as_string" : "2020-03-03"
},
"unique_count" : {
"value" : 1
}
},
{
"key" : "user_id_1",
"doc_count" : 1,
"min_date" : {
"value" : 1.5842304E12,
"value_as_string" : "2020-03-15"
},
"unique_count" : {
"value" : 1
}
},
{
"key" : "user_id_2",
"doc_count" : 1,
"min_date" : {
"value" : 1.5847488E12,
"value_as_string" : "2020-03-21"
},
"unique_count" : {
"value" : 1
}
}
]
}
}
]
}
}
]
}
}
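The response above still lists user_id_1 in January, February and March (with a unique_count of 1 in each bucket), so the per-month buckets alone do not yet give the expected output. One way to get there, sketched here as client-side post-processing in Python, is to keep each user only in the monthly bucket that holds their overall earliest min_date. `SAMPLE` is a trimmed copy of the response above and `count_first_appearances` is a hypothetical helper. The sketch assumes every user actually appears in the terms buckets; with the terms aggregation's default size of 10 per bucket that will not hold for larger data sets such as the question's (sum_other_doc_count: 328), so a larger size or a composite aggregation would likely be needed.

```python
from collections import Counter

# Trimmed copy of the response above: only the fields the post-processing reads
SAMPLE = {"aggregations": {"result": {"buckets": [{"histogram": {"buckets": [
    {"key_as_string": "2020-01-01", "user_sigups": {"buckets": [
        {"key": "user_id_1", "min_date": {"value": 1.5781824e12, "value_as_string": "2020-01-05"}},
    ]}},
    {"key_as_string": "2020-02-01", "user_sigups": {"buckets": [
        {"key": "user_id_1", "min_date": {"value": 1.5809472e12, "value_as_string": "2020-02-06"}},
        {"key": "user_id_2", "min_date": {"value": 1.580688e12, "value_as_string": "2020-02-03"}},
    ]}},
    {"key_as_string": "2020-03-01", "user_sigups": {"buckets": [
        {"key": "user_id_3", "min_date": {"value": 1.5831936e12, "value_as_string": "2020-03-03"}},
        {"key": "user_id_1", "min_date": {"value": 1.5842304e12, "value_as_string": "2020-03-15"}},
        {"key": "user_id_2", "min_date": {"value": 1.5847488e12, "value_as_string": "2020-03-21"}},
    ]}},
]}}]}}}


def count_first_appearances(response):
    """Keep each user only in the month bucket holding their earliest min_date."""
    earliest = {}  # user_id -> (epoch_millis, month_key, date_string)
    for range_bucket in response["aggregations"]["result"]["buckets"]:
        for month in range_bucket["histogram"]["buckets"]:
            for user in month["user_sigups"]["buckets"]:
                candidate = (
                    user["min_date"]["value"],
                    month["key_as_string"],
                    user["min_date"]["value_as_string"],
                )
                current = earliest.get(user["key"])
                if current is None or candidate[0] < current[0]:
                    earliest[user["key"]] = candidate
    # One count per user, attributed to the month of their first appearance
    per_month = Counter(month_key for _, month_key, _ in earliest.values())
    return earliest, per_month


if __name__ == "__main__":
    earliest, per_month = count_first_appearances(SAMPLE)
    for user_id, (_, _, first_date) in sorted(earliest.items(), key=lambda kv: kv[1][0]):
        print(f"{user_id} | 1 count | {first_date}")
    print(f"Total | {sum(per_month.values())} counts")
```

Applied to the response above, this yields one count each for user_id_1 (2020-01-05), user_id_2 (2020-02-03) and user_id_3 (2020-03-03), matching the expected output in the question.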
Let me know how it works for you.
Regarding elasticsearch - Elasticsearch get unique field count only once within a date range, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60897283/