elasticsearch - Elasticsearch: aggregating bucket results

Tags: elasticsearch

I want to aggregate or count over the results inside buckets.
For example, given these documents:

{
 "ID": 1,
 "customer_name": "a",
 "age": 21,
 "other_field": "x"
},
{
 "ID": 2,
 "customer_name": "a",
 "age": 25,
 "other_field": "x"
},
{
 "ID": 3,
 "customer_name": "a",
 "age": 32,
 "other_field": "x"
},
{
 "ID": 4,
 "customer_name": "b",
 "age": 24,
 "other_field": "x"
},
{
 "ID": 5,
 "customer_name": "b",
 "age": 33,
 "other_field": "x"
},
{
 "ID": 6,
 "customer_name": "b",
 "age": 17,
 "other_field": "y"
},
{
 "ID": 7,
 "customer_name": "c",
 "age": 34,
 "other_field": "x"
},
{
 "ID": 8,
 "customer_name": "c",
 "age": 26,
 "other_field": "y"
}

My query is:
"query": {
  "bool": {
    "must": { "match": { "other_field": "x" } }
  }
}

The matching documents have IDs [1, 2, 3, 4, 5, 7].

What I want to do is find, for each customer, the youngest matching document.

My aggregation query is:
    "aggs": {
        "distinct_user": {
            "terms": {
                "field": "customer_name",
                "size": 100
            },
            "aggs": {
                "youngest": {
                    "min": {
                        "field": "age"
                    }
                }
            }
        }
    }
The resulting buckets are:
"buckets": [
  {
    "key": "a",
    "doc_count": 3,
    "youngest": {
      "value": 21
    }
  },
  {
    "key": "b",
    "doc_count": 2,
    "youngest": {
      "value": 24
    }
  },
  {
    "key": "c",
    "doc_count": 1,
    "youngest": {
      "value": 34
    }
  }
]
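To make explicit what the query and aggregation compute, here is a minimal plain-Python model (not Elasticsearch code). It filters the sample documents on other_field, then groups by customer_name and takes the minimum age per group, the same shape as a terms aggregation with a min sub-aggregation. The function name youngest_per_customer is hypothetical, and exact string equality stands in for the match query, which is only a simplification that happens to hold for these single-token values.

```python
from collections import defaultdict

# Sample documents from the question, with string values quoted.
DOCS = [
    {"ID": 1, "customer_name": "a", "age": 21, "other_field": "x"},
    {"ID": 2, "customer_name": "a", "age": 25, "other_field": "x"},
    {"ID": 3, "customer_name": "a", "age": 32, "other_field": "x"},
    {"ID": 4, "customer_name": "b", "age": 24, "other_field": "x"},
    {"ID": 5, "customer_name": "b", "age": 33, "other_field": "x"},
    {"ID": 6, "customer_name": "b", "age": 17, "other_field": "y"},
    {"ID": 7, "customer_name": "c", "age": 34, "other_field": "x"},
    {"ID": 8, "customer_name": "c", "age": 26, "other_field": "y"},
]


def youngest_per_customer(docs, other_field="x"):
    """Hypothetical model of a terms agg on customer_name with a min
    sub-agg on age, applied to documents that pass the filter."""
    buckets = defaultdict(lambda: {"doc_count": 0, "youngest": None})
    for doc in docs:
        # Simplification: exact equality stands in for the `match` query.
        if doc["other_field"] != other_field:
            continue
        bucket = buckets[doc["customer_name"]]
        bucket["doc_count"] += 1
        if bucket["youngest"] is None or doc["age"] < bucket["youngest"]:
            bucket["youngest"] = doc["age"]
    # Order by doc_count descending, then by key, like the terms default.
    ordered = sorted(buckets.items(), key=lambda kv: (-kv[1]["doc_count"], kv[0]))
    return [{"key": key, **stats} for key, stats in ordered]


for bucket in youngest_per_customer(DOCS):
    print(bucket)
# {'key': 'a', 'doc_count': 3, 'youngest': 21}
# {'key': 'b', 'doc_count': 2, 'youngest': 24}
# {'key': 'c', 'doc_count': 1, 'youngest': 34}
```

Documents 6 and 8 drop out because their other_field is "y", which is why customer b's minimum is 24 rather than 17.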

Then I want to apply a range aggregation to these per-customer minimums to count the age distribution:

21-30: 2
31-40: 1
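The difficulty is that a range aggregation buckets document field values, not the outputs of another aggregation. The target numbers themselves are easy to state, though. This plain-Python sketch counts the per-customer minimums from the bucket output into two hypothetical ranges that mirror the question's groups, written as half-open [from, to) intervals because that is how Elasticsearch's range aggregation treats from and to. The names count_in_ranges and RANGES are illustrative only.

```python
# Per-customer minimum ages, taken from the bucket output above.
youngest_by_customer = {"a": 21, "b": 24, "c": 34}

# Hypothetical half-open ranges [from, to) matching 21-30 and 31-40.
RANGES = [(21, 31), (31, 41)]


def count_in_ranges(values, ranges):
    """Count how many values fall into each [lo, hi) range."""
    counts = {}
    for lo, hi in ranges:
        label = f"{lo}-{hi - 1}"
        counts[label] = sum(1 for v in values if lo <= v < hi)
    return counts


print(count_in_ranges(youngest_by_customer.values(), RANGES))
# {'21-30': 2, '31-40': 1}
```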

Is there any way to aggregate over the bucket results?
Or is there a workaround?

Best answer

One way is to use the bucket_selector and stats_bucket pipeline aggregations. You can add as many age groups as you need; I added just two relevant groups to demonstrate the approach:

POST test/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "match": {
          "other_field": "x"
        }
      }
    }
  },
  "aggs": {
    "customers_20_30": {
      "terms": {
        "field": "customer_name",
        "size": 100
      },
      "aggs": {
        "youngest": {
          "min": {
            "field": "age"
          }
        },
        "20-30": {
          "bucket_selector": {
            "buckets_path": {
              "youngest": "youngest"
            },
            "script": "params.youngest >= 20 && params.youngest < 30"
          }
        }
      }
    },
    "customers_20_30_count": {
      "stats_bucket": {
        "buckets_path": "customers_20_30._count"
      }
    },
    "customers_30_40": {
      "terms": {
        "field": "customer_name",
        "size": 100
      },
      "aggs": {
        "youngest": {
          "min": {
            "field": "age"
          }
        },
        "30-40": {
          "bucket_selector": {
            "buckets_path": {
              "youngest": "youngest"
            },
            "script": "params.youngest >= 30 && params.youngest < 40"
          }
        }
      }
    },
    "customers_30_40_count": {
      "stats_bucket": {
        "buckets_path": "customers_30_40._count"
      }
    }
  }
}
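Each terms aggregation above rebuilds the per-customer buckets, and its bucket_selector then discards every bucket whose youngest value falls outside the group. A minimal plain-Python model of that filtering step, assuming the per-customer buckets shown in the question (select_buckets is a hypothetical name, not an Elasticsearch API):

```python
# Per-customer buckets as returned by terms + min in the question.
buckets = [
    {"key": "a", "doc_count": 3, "youngest": 21},
    {"key": "b", "doc_count": 2, "youngest": 24},
    {"key": "c", "doc_count": 1, "youngest": 34},
]


def select_buckets(buckets, lo, hi):
    """Keep buckets where lo <= youngest < hi, mirroring the script
    'params.youngest >= lo && params.youngest < hi'."""
    return [b for b in buckets if lo <= b["youngest"] < hi]


kept_20_30 = select_buckets(buckets, 20, 30)
kept_30_40 = select_buckets(buckets, 30, 40)
print([b["key"] for b in kept_20_30], [b["key"] for b in kept_30_40])
# ['a', 'b'] ['c']
```

The design cost is that every age group repeats the full terms aggregation; the benefit is that each group's surviving buckets can be summarized independently by its own sibling stats_bucket.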

In the result you will get:
"customers_20_30_count" : {
  "count" : 2,                   <--- 2 buckets for 20-30
  "min" : 2.0,
  "max" : 3.0,
  "avg" : 2.5,
  "sum" : 5.0
},
"customers_30_40_count" : {
  "count" : 1,                   <--- 1 bucket for 30-40
  "min" : 1.0,
  "max" : 1.0,
  "avg" : 1.0,
  "sum" : 1.0
}
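The stats_bucket figures follow directly from the doc_count of the surviving buckets: count is the number of buckets (that is, the number of customers in the age group, which is the value the question asks for), while min, max, avg, and sum describe the document counts inside those buckets. A minimal plain-Python sketch of that summary, where the function name stats_bucket is illustrative and the empty-input behavior is this sketch's own choice:

```python
def stats_bucket(values):
    """Summarize a list of per-bucket values the way the result above does."""
    values = [float(v) for v in values]
    if not values:
        # This sketch's own choice for the no-bucket case.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


print(stats_bucket([3, 2]))  # doc_counts of buckets a and b
# {'count': 2, 'min': 2.0, 'max': 3.0, 'avg': 2.5, 'sum': 5.0}
print(stats_bucket([1]))  # doc_count of bucket c
# {'count': 1, 'min': 1.0, 'max': 1.0, 'avg': 1.0, 'sum': 1.0}
```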

For this question (elasticsearch - Elasticsearch: aggregating bucket results), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62263347/
