Elasticsearch sub-aggregation with a condition

Tags: elasticsearch elasticsearch-aggregation

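I have a database table with columns like:

ID | Biz Name | License # | Violations | ...

I need to find the businesses that have more than 5 violations.

I have the following: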

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
   

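It seems to work.

Now I need to find the businesses that have 5 or more violations (as above) and also have 3 or more license #s.

I don't know how to aggregate it further.

Thanks!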

Best answer

This assumes your License # field is defined the same way as Biz Name and has a .keyword mapping.
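
One way to check that assumption is to inspect the index mapping. The Python sketch below works on a plain dict shaped like a typical `GET your-index/_mapping` response in Elasticsearch 7 or later. The sample mapping and the helper name `has_keyword_subfield` are illustrative assumptions and do not come from the question.

```python
def has_keyword_subfield(mapping_response, index, field):
    """Return True if `field` has a `.keyword` sub-field of type keyword."""
    properties = mapping_response[index]["mappings"].get("properties", {})
    subfields = properties.get(field, {}).get("fields", {})
    return subfields.get("keyword", {}).get("type") == "keyword"


# Hypothetical mapping, shaped like what dynamic mapping usually
# produces for string fields (text plus a keyword sub-field).
sample_mapping = {
    "your-index": {
        "mappings": {
            "properties": {
                "Biz Name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "License #": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                # A text field without a keyword sub-field, for contrast.
                "violations": {"type": "text"},
            }
        }
    }
}

print(has_keyword_subfield(sample_mapping, "your-index", "License #"))   # True
print(has_keyword_subfield(sample_mapping, "your-index", "violations"))  # False
```

If the check fails for your real mapping, the `License #.keyword` field used in the aggregation will not exist and the field name has to be adjusted.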


Now, the statement:

find the businesses that have ... 3 or more license #s

can be rephrased as:

aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater than or equal to 3.

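With that in mind, you can use the cardinality aggregation to count the distinct license IDs.

Secondly, the mechanism for "aggregating under a condition" is the handy bucket_selector aggregation, which runs a script to decide whether the bucket currently being iterated is kept in the final aggregation.

Putting the two together gives: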

POST your-index/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        },
        "unique_license_ids": {
          "cardinality": {
            "field": "License #.keyword"
          }
        },
        "must_have_min_3_License #s": {
          "bucket_selector": {
            "buckets_path": {
              "unique_license_ids": "unique_license_ids" 
            },
            "script": "params.unique_license_ids >= 3"
          }
        }
      }
    }
  }
}

That's all there is to it!
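
To make the semantics of the query above concrete, the same selection can be sketched in plain Python over documents that have already matched the query. This is only an illustration under stated assumptions: the sample records and the `select_businesses` helper are made up, and this sketch counts distinct licenses exactly, whereas Elasticsearch's cardinality aggregation returns an approximate count.

```python
from collections import defaultdict


def select_businesses(docs, min_violations=5, min_licenses=3):
    """Emulate terms(min_doc_count) + cardinality + bucket_selector."""
    doc_counts = defaultdict(int)
    licenses = defaultdict(set)
    for doc in docs:
        name = doc["Biz Name"]
        doc_counts[name] += 1                  # the bucket's doc_count
        licenses[name].add(doc["License #"])   # distinct license IDs

    selected = {}
    for name, count in doc_counts.items():
        unique = len(licenses[name])
        # min_doc_count drops small buckets; bucket_selector drops
        # buckets whose distinct-license count is below the threshold.
        if count >= min_violations and unique >= min_licenses:
            selected[name] = {"doc_count": count, "unique_license_ids": unique}
    return selected


# Made-up matching documents, one per violation record.
docs = (
    [{"Biz Name": "Cafe A", "License #": f"L{i % 3}"} for i in range(6)]  # 6 docs, 3 licenses
    + [{"Biz Name": "Deli B", "License #": "L9"} for _ in range(7)]       # 7 docs, 1 license
    + [{"Biz Name": "Bar C", "License #": f"L{i}"} for i in range(4)]     # 4 docs, 4 licenses
)

print(select_businesses(docs))
# {'Cafe A': {'doc_count': 6, 'unique_license_ids': 3}}
```

Only "Cafe A" survives: "Deli B" has enough violations but a single license, and "Bar C" has enough licenses but too few violations.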

A similar question about Elasticsearch sub-aggregation with a condition can be found on Stack Overflow: https://stackoverflow.com/questions/66602013/
