I have an Elasticsearch index of documents with a field that contains a list of URLs. As expected, aggregating on this field gives me the counts of the unique URLs.
GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    }
  }
}
I then want to filter out the buckets whose keys do not contain a certain string. I tried to do this with a Bucket Selector Aggregation.
This attempt:
GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    },
    "links_key_filter": {
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}
fails with:
Invalid pipeline aggregation named [links_key_filter] of type [bucket_selector]. Only sibling pipeline aggregations are allowed at the top level
Putting the bucket selector inside the links aggregation, like this:
GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      },
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}
also fails:
Found two aggregation type definitions in [links]: [terms] and [bucket_selector]
I'll keep tinkering, but I'm a bit stuck for now :(
Best answer
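You can't use bucket_selector here, because its buckets_path "must reference either a number value or a single value numeric metric aggregation" [source], whereas what a terms aggregation produces is represented as StringTerms. That simply won't work, whether or not you force a placeholder multi-bucket aggregation.
That said, the terms aggregation supports an exclude filter. Assuming your links are arrays of keywords: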
POST models/_doc/1
{
  "links": [
    "google.com",
    "wikipedia.org"
  ]
}
POST models/_doc/2
{
  "links": [
    "reddit.com",
    "google.com"
  ]
}
To group by everything except reddit, you can pass the following regex in the exclude parameter of the terms aggregation:
POST models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "exclude": ".*reddit.*",
        "size": 10
      }
    }
  }
}
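The request body above can also be assembled programmatically. The following is a minimal Python sketch, not part of the original answer: the helper name terms_agg_body is hypothetical, and it only builds the JSON body without sending it. It also shows the include parameter, the counterpart of exclude, which keeps only the buckets whose key matches the pattern. Note that the pattern has to match the whole term, which is why it is wrapped in .* on both sides.

```python
import json


def terms_agg_body(field, include=None, exclude=None, size=10):
    """Build a search body with a terms aggregation on `field`.

    `include` / `exclude` are regex strings matched against the whole
    bucket key. The helper name and signature are illustrative only.
    """
    terms = {"field": field, "size": size}
    if include is not None:
        terms["include"] = include
    if exclude is not None:
        terms["exclude"] = exclude
    return {
        "query": {"match_all": {}},
        "size": 0,  # return no hits, only the aggregation
        "aggs": {"links": {"terms": terms}},
    }


# Keep only buckets whose key contains "foo".
body = terms_agg_body("links.keyword", include=".*foo.*")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a _search request; passing exclude=".*reddit.*" instead reproduces the request shown above.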
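By the way, using regexes like this has some non-trivial implications, especially once letter case comes into play: the pattern is matched case-sensitively, so a case-insensitive match needs a regex generated at query time, as explained in "How to correctly query inside of terms aggregate values in elasticsearch, using include and regex?"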
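One way to generate such a regex at query time is to expand every letter into a two-character class. The following Python sketch is an illustration under stated assumptions, not the method from the linked question: the function name contains_any_case is hypothetical, and the reserved-character set is the one listed for Lucene-style regular expressions, including the optional operators.

```python
# Characters treated as reserved in Lucene-style regular expressions
# (standard operators plus the optional ones); they are backslash-escaped.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def contains_any_case(substring):
    """Return a regex matching any term that contains `substring`,
    ignoring letter case. Hypothetical helper for illustration."""
    parts = []
    for ch in substring:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            # A cased letter becomes a class such as [rR].
            parts.append("[" + lower + upper + "]")
        elif ch in LUCENE_RESERVED:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    # The pattern must match the whole term, hence the surrounding .*
    return ".*" + "".join(parts) + ".*"


print(contains_any_case("reddit.com"))
# prints .*[rR][eE][dD][dD][iI][tT]\.[cC][oO][mM].*
```

The resulting string can be used as the exclude (or include) value. When the request body is serialized with a JSON library, the backslash is escaped automatically.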
Source: a similar question on Stack Overflow, "Filtering an Elasticsearch aggregation by bucket key value": https://stackoverflow.com/questions/47458352/