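My database table has the following columns:
ID | Biz Name | License # | Violations | ...
I need to find the businesses that have more than 5 violations.
I have the following: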
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
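This seems to work.
Now I need to find the businesses that have 5 or more violations (as above) and also have 3 or more license #s.
I'm not sure how to aggregate this any further.
Thanks!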
Best answer
I'm assuming your License # field is defined just like Biz Name and has a .keyword mapping.
Now, the statement:
find the businesses that have ... 3 or more license #s
can be rephrased as:
aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater than or equal to 3.
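To make the rephrased condition concrete, here is a minimal sketch of the same logic in plain Python, run on in-memory data rather than in Elasticsearch. The function name, the sample rows, and the business and license values are all made up for illustration, and the rows are assumed to be documents that already matched the violation query.

```python
from collections import defaultdict


def businesses_meeting_thresholds(rows, min_violations=5, min_licenses=3):
    """Return the names of businesses with enough matching documents
    and enough distinct license IDs (hypothetical helper)."""
    doc_counts = defaultdict(int)   # matching documents per business
    licenses = defaultdict(set)     # distinct license IDs per business
    for biz_name, license_id in rows:
        doc_counts[biz_name] += 1
        licenses[biz_name].add(license_id)
    return sorted(
        name
        for name in doc_counts
        if doc_counts[name] >= min_violations
        and len(licenses[name]) >= min_licenses
    )


# Hypothetical (business name, license ID) pairs, one per matching document.
rows = (
    [("Acme Deli", "L1")] * 3
    + [("Acme Deli", "L2"), ("Acme Deli", "L3")]   # 5 docs, 3 licenses
    + [("Bob's Diner", "L9")] * 6                   # 6 docs, 1 license
    + [("Cafe Zed", "L4"), ("Cafe Zed", "L5"), ("Cafe Zed", "L6")]  # 3 docs, 3 licenses
)

print(businesses_meeting_thresholds(rows))  # ['Acme Deli']
```

Only the first business passes both checks: the second has too few distinct licenses, and the third has too few matching documents.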
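With that being said, you can use the cardinality aggregation to count the distinct license IDs in each bucket.
Secondly, the mechanism for "aggregating under a condition" is the handy bucket_selector aggregation, which executes a script to determine whether the currently iterated bucket is kept in the final aggregation.
Using both of them together means: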
POST your-index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        },
        "unique_license_ids": {
          "cardinality": {
            "field": "License #.keyword"
          }
        },
        "must_have_min_3_License #s": {
          "bucket_selector": {
            "buckets_path": {
              "unique_license_ids": "unique_license_ids"
            },
            "script": "params.unique_license_ids >= 3"
          }
        }
      }
    }
  }
}
That's all there is to it!
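If it helps, the surviving buckets can be pulled out of the search response along these lines. This is only a sketch: it assumes the response has already been parsed into a Python dict and follows the usual terms-aggregation shape (`aggregations` → `selected_bizs` → `buckets`). The helper name and the sample response are invented for illustration, and how you send the request is left out.

```python
def surviving_businesses(response,
                         agg_name="selected_bizs",
                         metric_name="unique_license_ids"):
    """Summarize the buckets that passed the bucket_selector filter
    (hypothetical helper operating on an already-parsed response)."""
    buckets = (
        response.get("aggregations", {})
        .get(agg_name, {})
        .get("buckets", [])
    )
    return [
        {
            "name": bucket["key"],                      # business name
            "violations": bucket["doc_count"],          # matching documents
            "licenses": bucket[metric_name]["value"],   # distinct license IDs
        }
        for bucket in buckets
    ]


# Made-up response fragment; real responses carry more fields.
sample_response = {
    "aggregations": {
        "selected_bizs": {
            "buckets": [
                {
                    "key": "Acme Deli",
                    "doc_count": 7,
                    "unique_license_ids": {"value": 3},
                    "top_biz_hits": {"hits": {"hits": []}},
                }
            ]
        }
    }
}

for biz in surviving_businesses(sample_response):
    print(biz["name"], biz["violations"], biz["licenses"])
```

Buckets rejected by the bucket_selector never appear in the response, so no further filtering should be needed on the client side.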
A similar question about Elasticsearch sub-aggregation with a condition can be found on Stack Overflow: https://stackoverflow.com/questions/66602013/