Elasticsearch: filter aggregation results to records where a bucket does not exist

Tags: elasticsearch elasticsearch-aggregation elasticsearch-dsl

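I have a query that gives me the results I want, but I need to filter it further so that only the records missing a particular bucket are shown.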

My query looks like this:

{
"size": 0,
"query": 
{
    "bool": 
    {
        "must": [{"match_all": {}}],
        "filter": 
        [
            {
                "bool": 
                {
                    "should": 
                    [
                        {"match_phrase": {"user": "bob_user"}},
                        {"match_phrase": {"user": "tom_user"}}
                    ],"minimum_should_match": 1
                }
            },
            {
                "bool": 
                {
                    "should": 
                    [
                        {"match_phrase": {"result_code": "403"}},
                        {"match_phrase": {"result_code": "200"}}
                    ],"minimum_should_match": 1
                }
            },
            {
                "range": {"time": {"gte": "2021-05-12T18:51:22.512Z","lte": "2021-05-13T18:51:22.512Z","format": "strict_date_optional_time"}}}
        ]
    }
},
"aggs": 
{
    "stats": 
    {
        "terms": {"field": "host.keyword","order": {"total_distinct_ip_count": "desc"},"size": 10000},
        "aggs": 
        {
            "total_distinct_ip_count": {"cardinality": {"field": "ip.keyword"}},
            "status_codes": 
            {
                "terms": {"field": "result_code.keyword","order": {"distinct_ip_count_by_status_code": "desc"},"size": 2},
                "aggs": 
                {
                    "distinct_ip_count_by_status_code": {"cardinality": {"field": "ip.keyword"}}
                }
            }
        }
    }
}

}
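As a sketch, assuming the request body is built in Python (`build_query` and `any_of` are hypothetical helper names, not part of any client library), the `query` part above can be generated from lists of users and status codes. Each inner `bool`/`should` with `"minimum_should_match": 1` acts as an OR over its values, and the three entries of the outer `filter` array are ANDed, so the query keeps documents for bob_user or tom_user, with result_code 403 or 200, inside the time range. The `aggs` part is omitted here.

```python
import json


def build_query(users, codes, gte, lte):
    """Hypothetical helper: rebuild the bool/filter query shown above."""

    def any_of(field, values):
        # OR over one match_phrase clause per value of `field`
        return {
            "bool": {
                "should": [{"match_phrase": {field: v}} for v in values],
                "minimum_should_match": 1,
            }
        }

    return {
        "bool": {
            "must": [{"match_all": {}}],
            # every entry in this list must match (logical AND)
            "filter": [
                any_of("user", users),
                any_of("result_code", codes),
                {
                    "range": {
                        "time": {
                            "gte": gte,
                            "lte": lte,
                            "format": "strict_date_optional_time",
                        }
                    }
                },
            ],
        }
    }


body = {
    "size": 0,
    "query": build_query(
        ["bob_user", "tom_user"],
        ["403", "200"],
        "2021-05-12T18:51:22.512Z",
        "2021-05-13T18:51:22.512Z",
    ),
}
print(json.dumps(body, indent=2))
```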

This produces the following results (the host buckets under `stats`):

{
  "key" : "dom.com",
  "doc_count" : 92974,
  "status_codes" : {
    "buckets" : [
      {
        "key" : "200",
        "doc_count" : 92965,
        "distinct_ip_count_by_status_code" : {"value" : 51269}
      },
      {
        "key" : "403",
        "doc_count" : 9,
        "distinct_ip_count_by_status_code" : {"value" : 2}
      }
    ]
  },
  "total_distinct_ip_count" : {"value" : 51269}
},
{
  "key" : "dom2.com",
  "doc_count" : 1420,
  "status_codes" : {
    "buckets" : [
      {
        "key" : "403",
        "doc_count" : 1420,
        "distinct_ip_count_by_status_code" : {"value" : 5}
      }
    ]
  },
  "total_distinct_ip_count" : {"value" : 500}
},
{
  "key" : "dom3.com",
  "doc_count" : 171097,
  "status_codes" : {
    "buckets" : [
      {
        "key" : "200",
        "doc_count" : 127437,
        "distinct_ip_count_by_status_code" : {"value" : 735}
      },
      {
        "key" : "403",
        "doc_count" : 43660,
        "distinct_ip_count_by_status_code" : {"value" : 73}
      }
    ]
  },
  "total_distinct_ip_count" : {"value" : 808}
}

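I need a way to return only the records that are missing the 200 bucket. In this example, dom2.com would be the ONLY record returned, because it has a 403 bucket but no 200 bucket. I played around with a bucket_selector, but that only excludes a single bucket from the results. What I want is to exclude, from the results entirely, the records that have both 200 and 403 buckets.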
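To pin down the intended semantics, here is a minimal client-side sketch in Python that applies the same rule to the response after it arrives: keep a host bucket only if its `status_codes` buckets contain no `200` key. The function name `hosts_missing_code` is hypothetical, and `sample` is a trimmed copy of the response above containing only the fields the filter reads. This is only an option if filtering in the application is acceptable; the accepted answer below does the filtering inside Elasticsearch.

```python
def hosts_missing_code(host_buckets, code="200"):
    """Return the host buckets whose status_codes sub-aggregation has no bucket for `code`."""
    result = []
    for host in host_buckets:
        # collect the status-code keys present for this host
        keys = {b["key"] for b in host["status_codes"]["buckets"]}
        if code not in keys:
            result.append(host)
    return result


# Trimmed copy of the response shown above (only the fields used here).
sample = [
    {"key": "dom.com", "status_codes": {"buckets": [{"key": "200"}, {"key": "403"}]}},
    {"key": "dom2.com", "status_codes": {"buckets": [{"key": "403"}]}},
    {"key": "dom3.com", "status_codes": {"buckets": [{"key": "200"}, {"key": "403"}]}},
]

print([h["key"] for h in hosts_missing_code(sample)])  # ['dom2.com']
```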

Best answer

{
"size": 0,
"query": 
{
    "bool": 
    {
        "must": [{"match_all": {}}],
        "filter": 
        [
            {
                "bool": 
                {
                    "should": 
                    [
                        {"match_phrase": {"user": "bob_user"}},
                        {"match_phrase": {"user": "tom_user"}}
                    ],"minimum_should_match": 1
                }
            },
            {
                "bool": 
                {
                    "should": 
                    [
                        {"match_phrase": {"result_code": "403"}},
                        {"match_phrase": {"result_code": "200"}}
                    ],"minimum_should_match": 1
                }
            },
            {
                "range": {"time": {"gte": "2021-05-12T18:51:22.512Z","lte": "2021-05-13T18:51:22.512Z","format": "strict_date_optional_time"}}}
        ]
    }
},
"aggs": 
{
    "stats": 
    {
        "terms": {"field": "host.keyword","order": {"total_distinct_ip_count": "desc"},"size": 10000},
        "aggs": 
        {
            "total_distinct_ip_count": {"cardinality": {"field": "ip.keyword"}},
            "status_codes": 
            {
                "terms": {"field": "result_code.keyword","order": {"distinct_ip_count_by_status_code": "desc"},"size": 2},
                "aggs": 
                {
                    "distinct_ip_count_by_status_code": {"cardinality": {"field": "ip.keyword"}}
                }
            },
            "only_403":
            {
                "bucket_selector":
                {
                    "buckets_path":
                    {"var1": "status_codes['200']>_count"},
                    "script": "params.var1 == null"
                }
            }
        }
    }
}
}
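The important detail in the answer is where `only_403` is declared: it is a sibling of `status_codes` inside `stats.aggs`, so the `bucket_selector` is evaluated once per host bucket and removes whole host buckets. Declared inside `status_codes` instead, a `bucket_selector` would filter individual status-code buckets, which is consistent with the behavior described in the question. The script keeps a host only when `var1` is `null`, that is, when the path `status_codes['200']>_count` finds no `200` bucket for that host, which is what the answer relies on. The sketch below, assuming the request is assembled in Python (`add_missing_key_selector` is a hypothetical helper), attaches that same pipeline aggregation to the `stats` sub-aggregations.

```python
import json


def add_missing_key_selector(parent_aggs, name, terms_agg, key):
    """Hypothetical helper: add a bucket_selector next to `terms_agg` that keeps a
    parent bucket only when `terms_agg` has no bucket whose key is `key`."""
    parent_aggs[name] = {
        "bucket_selector": {
            # e.g. "status_codes['200']>_count"
            "buckets_path": {"var1": f"{terms_agg}['{key}']>_count"},
            "script": "params.var1 == null",
        }
    }
    return parent_aggs


# Sub-aggregations of the "stats" terms aggregation, copied from the query above.
stats_aggs = {
    "total_distinct_ip_count": {"cardinality": {"field": "ip.keyword"}},
    "status_codes": {
        "terms": {
            "field": "result_code.keyword",
            "order": {"distinct_ip_count_by_status_code": "desc"},
            "size": 2,
        },
        "aggs": {
            "distinct_ip_count_by_status_code": {"cardinality": {"field": "ip.keyword"}}
        },
    },
}

add_missing_key_selector(stats_aggs, "only_403", "status_codes", "200")
print(json.dumps(stats_aggs["only_403"], indent=2))
```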

Stack Overflow question: https://stackoverflow.com/questions/67540343/
