Elasticsearch sub-aggregation with a condition

My database table has the following columns:

ID | Biz Name | License # | Violations | ...

I need to find the businesses that have 5 or more violations.

This is what I have so far:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

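That seems to work.

Now I need to find the businesses that have 5 or more violations (as above) and that also have 3 or more license #s.

I don't know how to aggregate this any further.

Thanks!

Best answer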

This assumes that your License # field is defined the same way as Biz Name and has a .keyword mapping.

Now, the statement:

find the businesses that have ... 3 or more license #s

can be rephrased as:

aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater than or equal to 3.

That said, you can use the cardinality aggregation to count the distinct license IDs in each bucket.
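To make the per-bucket distinct count concrete, here is a minimal Python sketch of what the cardinality sub-aggregation reports for each business bucket. It is an illustration only: the sample rows and the helper name distinct_licenses_per_business are made up, and it counts exactly, whereas Elasticsearch's cardinality aggregation returns an approximate count (normally accurate at cardinalities this small).

```python
from collections import defaultdict


def distinct_licenses_per_business(docs):
    """Map each business name to its number of distinct license numbers."""
    licenses = defaultdict(set)
    for doc in docs:
        # A set drops repeated license numbers within the same business.
        licenses[doc["Biz Name"]].add(doc["License #"])
    return {name: len(ids) for name, ids in licenses.items()}


# Hypothetical sample rows, not taken from the original data set.
sample_docs = [
    {"Biz Name": "Acme Deli", "License #": "L-1"},
    {"Biz Name": "Acme Deli", "License #": "L-2"},
    {"Biz Name": "Acme Deli", "License #": "L-2"},
    {"Biz Name": "Acme Deli", "License #": "L-3"},
    {"Biz Name": "Bob's Diner", "License #": "L-9"},
    {"Biz Name": "Bob's Diner", "License #": "L-9"},
]

license_counts = distinct_licenses_per_business(sample_docs)
print(license_counts)
```

Here Acme Deli has four rows but only three distinct license numbers, which is the figure the condition of "3 or more license #s" should be checked against.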

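Second, the mechanism for "aggregating under a condition" is the handy bucket_selector aggregation, which runs a script to decide whether the bucket currently being iterated is kept in the final aggregation.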
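As a rough mental model, the bucket_selector step used in the request below behaves like a filter over the buckets that the parent terms aggregation has already built. The Python sketch that follows mimics the script "params.unique_license_ids >= 3"; the function name select_buckets and the sample buckets are hypothetical and serve only as an illustration.

```python
def select_buckets(buckets, min_licenses=3):
    """Keep only buckets whose distinct-license count reaches the threshold."""
    return [
        bucket
        for bucket in buckets
        if bucket["unique_license_ids"]["value"] >= min_licenses
    ]


# Hypothetical buckets, shaped like terms buckets with a cardinality sub-aggregation.
sample_buckets = [
    {"key": "Acme Deli", "doc_count": 7, "unique_license_ids": {"value": 3}},
    {"key": "Bob's Diner", "doc_count": 6, "unique_license_ids": {"value": 1}},
    {"key": "Corner Cafe", "doc_count": 5, "unique_license_ids": {"value": 4}},
]

kept = select_buckets(sample_buckets)
print([bucket["key"] for bucket in kept])
```

Note that the filter runs after the buckets exist, so it can only drop buckets; it does not change which documents were counted in them.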

Putting the two together gives:

POST your-index/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        },
        "unique_license_ids": {
          "cardinality": {
            "field": "License #.keyword"
          }
        },
        "must_have_min_3_License #s": {
          "bucket_selector": {
            "buckets_path": {
              "unique_license_ids": "unique_license_ids" 
            },
            "script": "params.unique_license_ids >= 3"
          }
        }
      }
    }
  }
}
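Once the request has run, the surviving businesses are found under aggregations.selected_bizs.buckets in the response. The following Python sketch shows one way to pull out the name, the number of matching violation documents, and the distinct license count from an already-parsed response body. The function name summarize_businesses and the trimmed sample response are assumptions for illustration; a real response carries more fields, such as took, hits, and top_biz_hits.

```python
def summarize_businesses(response):
    """Return (name, violation_count, distinct_license_count) per bucket."""
    buckets = response["aggregations"]["selected_bizs"]["buckets"]
    return [
        (bucket["key"], bucket["doc_count"], bucket["unique_license_ids"]["value"])
        for bucket in buckets
    ]


# Hypothetical, trimmed response body used only to exercise the function.
sample_response = {
    "aggregations": {
        "selected_bizs": {
            "buckets": [
                {
                    "key": "Acme Deli",
                    "doc_count": 7,
                    "unique_license_ids": {"value": 3},
                }
            ]
        }
    }
}

summary = summarize_businesses(sample_response)
print(summary)
```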

That's all there is to it!

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/66602013/
