elasticsearch - Elasticsearch: count a unique field only once within a date range

Tags: elasticsearch

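I need to find users (user_id) who appear multiple times within a given time period. I built an aggregation that extracts them and counts their occurrences within a specific period, but a user who appears again on a later date should not be counted again: each user should be counted only once.

This is like counting unique user sign-ups within a given period: only the first instance/date on which a user appears should be counted. The date interval can be hourly, daily, weekly, or monthly.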

 user_id_1   2020-01-05
 user_id_1   2020-02-06
 user_id_1   2020-02-14
 user_id_2   2020-02-03
 user_id_2   2020-02-04
 user_id_3   2020-03-03
 user_id_1   2020-03-15
 user_id_2   2020-03-21
 user_id_3   2020-03-25
 user_id_3   2020-04-01

The expected output counts each user only once, in the month they first appear. They should not be counted again in later months:
user_id_1 | 1 count | 2020-01-05
user_id_2 | 1 count | 2020-02-03
user_id_3 | 1 count | 2020-03-03
Total     | 3 counts|
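The counting rule described above can be sketched in plain Python on the sample rows: find each user's earliest date inside the range, then put that single date into a bucket for the chosen interval. This only illustrates the intended semantics and is not an Elasticsearch query; `first_seen_counts` and `bucket_key` are hypothetical helper names, and only day, ISO-week and month buckets are handled (hourly buckets would need full timestamps rather than dates).

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (user_id, date of appearance).
ROWS = [
    ("user_id_1", "2020-01-05"), ("user_id_1", "2020-02-06"),
    ("user_id_1", "2020-02-14"), ("user_id_2", "2020-02-03"),
    ("user_id_2", "2020-02-04"), ("user_id_3", "2020-03-03"),
    ("user_id_1", "2020-03-15"), ("user_id_2", "2020-03-21"),
    ("user_id_3", "2020-03-25"), ("user_id_3", "2020-04-01"),
]


def bucket_key(d: date, interval: str) -> str:
    """Map a date to a bucket label (hypothetical helper)."""
    if interval == "day":
        return d.isoformat()
    if interval == "week":
        year, week, _ = d.isocalendar()  # ISO year and week number
        return f"{year}-W{week:02d}"
    if interval == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unsupported interval: {interval}")


def first_seen_counts(rows, start: date, end: date, interval: str = "month"):
    """Count each user once, in the bucket of their earliest date within [start, end]."""
    first_seen = {}
    for user, day in rows:
        d = date.fromisoformat(day)
        if start <= d <= end and (user not in first_seen or d < first_seen[user]):
            first_seen[user] = d
    per_bucket = Counter(bucket_key(d, interval) for d in first_seen.values())
    return first_seen, per_bucket


first_seen, per_bucket = first_seen_counts(ROWS, date(2020, 1, 1), date(2020, 3, 31))
for user, d in sorted(first_seen.items()):
    print(f"{user} | 1 count | {d}")
print(f"Total     | {len(first_seen)} counts|")
```

Because the bucket is derived from a single first-seen date per user, a user can never land in more than one bucket, whatever interval is chosen.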

Here is the sample query I came up with. It counts the occurrences of each user_id within the given time period.
{
   "size":0,
   "aggs":{
      "result":{
         "date_range":{
            "field":"timestamp",
            "format":"yyyy-MM-dd",
            "ranges":[
               {
                  "from":"2020-01-01",
                  "to":"2020-03-31"
               }
            ]
         },
         "aggs":{
            "histogram":{
               "date_histogram":{
                  "field":"timestamp",
                  "calendar_interval":"1M",
                  "extended_bounds":{
                     "min":"2020-01-01",
                     "max":"2020-03-31"
                  },
                  "format":"yyyy-MM-dd"
               },
               "aggs":{
                  "user_sigups":{
                     "terms":{
                        "field":"user_id.keyword"
                     }
                  }
               }
            }
         }
      }
   }
}
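To see why the query above counts a user more than once, here is a rough Python emulation of what `date_histogram` followed by `terms` does with the sample rows. Each month bucket builds its own `user_id` buckets independently of the other months, so user_id_1 is reported in January, February and March. The emulation assumes Elasticsearch's range semantics, where `from` is inclusive and `to` is exclusive.

```python
from collections import defaultdict

# Sample rows from the question: (user_id, date of appearance).
ROWS = [
    ("user_id_1", "2020-01-05"), ("user_id_1", "2020-02-06"),
    ("user_id_1", "2020-02-14"), ("user_id_2", "2020-02-03"),
    ("user_id_2", "2020-02-04"), ("user_id_3", "2020-03-03"),
    ("user_id_1", "2020-03-15"), ("user_id_2", "2020-03-21"),
    ("user_id_3", "2020-03-25"), ("user_id_3", "2020-04-01"),
]

# month -> user_id -> doc_count, mimicking date_histogram (1M) + terms(user_id).
per_month = defaultdict(lambda: defaultdict(int))
for user, day in ROWS:
    # ISO date strings compare chronologically; "to" is exclusive as in date_range.
    if "2020-01-01" <= day < "2020-03-31":
        per_month[day[:7]][user] += 1

for month in sorted(per_month):
    print(month, dict(per_month[month]))

# Every (month, user) pair becomes its own bucket, so repeat visitors are counted again.
total_user_buckets = sum(len(users) for users in per_month.values())
print("user buckets:", total_user_buckets)
```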

A sample result of the query above looks like this:
{
   "histogram":{
      "buckets":[
         {
            "key_as_string":"2020-01-01",
            "key":1577836800000,
            "doc_count":1925,
            "user_sigups":{
               "doc_count_error_upper_bound":0,
               "sum_other_doc_count":328,
               "buckets":[
                  {
                     "key":"2532456443539602",
                     "doc_count":505
                  }
               ]
            }
         }
      ]
   }
}

Any help would be appreciated.

Best answer

You need a cardinality aggregation to count unique terms and a min aggregation to get the earliest date.

Query:

{
  "size": 0,
  "aggs": {
    "result": {
      "date_range": {
        "field": "timestamp",
        "format": "yyyy-MM-dd",
        "ranges": [
          {
            "from": "2020-01-01",
            "to": "2020-03-31"
          }
        ]
      },
      "aggs": {
        "histogram": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "1M",
            "extended_bounds": {
              "min": "2020-01-01",
              "max": "2020-03-31"
            },
            "format": "yyyy-MM-dd"
          },
          "aggs": {
            "user_sigups": {
              "terms": {
                "field": "user_id.keyword"
              },
              "aggs": {
                "unique_count": {
                  "cardinality": {
                    "field": "user_id.keyword"
                  }
                },
                "min_date": {
                  "min": {
                    "field": "timestamp"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Result:
"aggregations" : {
    "result" : {
      "buckets" : [
        {
          "key" : "2020-01-01-2020-03-31",
          "from" : 1.5778368E12,
          "from_as_string" : "2020-01-01",
          "to" : 1.5856128E12,
          "to_as_string" : "2020-03-31",
          "doc_count" : 9,
          "histogram" : {
            "buckets" : [
              {
                "key_as_string" : "2020-01-01",
                "key" : 1577836800000,
                "doc_count" : 1,
                "user_sigups" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "user_id_1",
                      "doc_count" : 1,
                      "min_date" : {
                        "value" : 1.5781824E12,
                        "value_as_string" : "2020-01-05"
                      },
                      "unique_count" : {
                        "value" : 1
                      }
                    }
                  ]
                }
              },
              {
                "key_as_string" : "2020-02-01",
                "key" : 1580515200000,
                "doc_count" : 4,
                "user_sigups" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "user_id_1",
                      "doc_count" : 2,
                      "min_date" : {
                        "value" : 1.5809472E12,
                        "value_as_string" : "2020-02-06"
                      },
                      "unique_count" : {
                        "value" : 1
                      }
                    },
                    {
                      "key" : "user_id_2",
                      "doc_count" : 2,
                      "min_date" : {
                        "value" : 1.580688E12,
                        "value_as_string" : "2020-02-03"
                      },
                      "unique_count" : {
                        "value" : 1
                      }
                    }
                  ]
                }
              },
              {
                "key_as_string" : "2020-03-01",
                "key" : 1583020800000,
                "doc_count" : 4,
                "user_sigups" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "user_id_3",
                      "doc_count" : 2,
                      "min_date" : {
                        "value" : 1.5831936E12,
                        "value_as_string" : "2020-03-03"
                      },
                      "unique_count" : {
                        "value" : 1
                      }
                    },
                    {
                      "key" : "user_id_1",
                      "doc_count" : 1,
                      "min_date" : {
                        "value" : 1.5842304E12,
                        "value_as_string" : "2020-03-15"
                      },
                      "unique_count" : {
                        "value" : 1
                      }
                    },
                    {
                      "key" : "user_id_2",
                      "doc_count" : 1,
                      "min_date" : {
                        "value" : 1.5847488E12,
                        "value_as_string" : "2020-03-21"
                      },
                      "unique_count" : {
                        "value" : 1
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
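The buckets above still list user_id_1 in three months, because `min_date` is computed inside each month bucket, and `unique_count` is always 1 because it counts distinct `user_id` values inside a bucket that already holds a single `user_id`. One possible workaround, assumed here rather than taken from the answer, is to de-duplicate on the client: walk the response and keep only each user's earliest `min_date`. The sketch uses a trimmed copy of the response above; `first_appearances` is a hypothetical helper name. Note that `terms` returns only the top 10 buckets by default (its `size` parameter), so with many users some may be missing from a month bucket.

```python
# Trimmed copy of the response above, keeping only the fields read below.
RESPONSE = {"aggregations": {"result": {"buckets": [{"histogram": {"buckets": [
    {"user_sigups": {"buckets": [
        {"key": "user_id_1", "min_date": {"value_as_string": "2020-01-05"}},
    ]}},
    {"user_sigups": {"buckets": [
        {"key": "user_id_1", "min_date": {"value_as_string": "2020-02-06"}},
        {"key": "user_id_2", "min_date": {"value_as_string": "2020-02-03"}},
    ]}},
    {"user_sigups": {"buckets": [
        {"key": "user_id_3", "min_date": {"value_as_string": "2020-03-03"}},
        {"key": "user_id_1", "min_date": {"value_as_string": "2020-03-15"}},
        {"key": "user_id_2", "min_date": {"value_as_string": "2020-03-21"}},
    ]}},
]}}]}}}


def first_appearances(response: dict) -> dict:
    """Return {user_id: earliest min_date} across all date_range and month buckets."""
    first = {}
    for range_bucket in response["aggregations"]["result"]["buckets"]:
        for month in range_bucket["histogram"]["buckets"]:
            for user in month["user_sigups"]["buckets"]:
                seen = user["min_date"]["value_as_string"]  # yyyy-MM-dd sorts chronologically
                if user["key"] not in first or seen < first[user["key"]]:
                    first[user["key"]] = seen
    return first


first = first_appearances(RESPONSE)
for user, day in sorted(first.items(), key=lambda item: item[1]):
    print(f"{user} | 1 count | {day}")
print(f"Total     | {len(first)} counts|")
```

This reproduces the expected output from the question for the sample data, at the cost of doing the final de-duplication outside Elasticsearch.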

Let me know how it works for you.

Original question on Stack Overflow: https://stackoverflow.com/questions/60897283/
