Elasticsearch aggregation of aggregation

Tags: elasticsearch aggregation elasticsearch-2.0

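I would like to know whether there is something similar to bucket_selector that tests on a key match instead of a numeric metric.

To give more context, here is my use case:

Data sample: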

[
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:28:23.589Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "requestactivation"
      }
    },
    "id": "668"
  },
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:32:23.589Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "requestactivation"
      }
    },
    "id": "669"
  },
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:30:00.802Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "activationrequested"
      }
    },
    "id": "668"
  }
]

I would like to retrieve all ids whose last event type is requestactivation.
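
To make the expected outcome concrete, here is a minimal Python sketch that computes it outside Elasticsearch for the data sample above. The helper names `parse_ts` and `ids_with_last_type` are hypothetical and only serve the illustration; the events are a trimmed copy of the sample.

```python
from datetime import datetime

# Trimmed copy of the data sample above (only the fields that matter here).
events = [
    {"id": "668", "@timestamp": "2017-04-27T04:28:23.589Z",
     "headers": {"message": {"type": "requestactivation"}}},
    {"id": "669", "@timestamp": "2017-04-27T04:32:23.589Z",
     "headers": {"message": {"type": "requestactivation"}}},
    {"id": "668", "@timestamp": "2017-04-27T04:30:00.802Z",
     "headers": {"message": {"type": "activationrequested"}}},
]


def parse_ts(value):
    # The sample timestamps are UTC strings with millisecond precision.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def ids_with_last_type(events, wanted):
    latest = {}  # id -> (timestamp, type) of the most recent event seen so far
    for event in events:
        ts = parse_ts(event["@timestamp"])
        message_type = event["headers"]["message"]["type"]
        if event["id"] not in latest or ts > latest[event["id"]][0]:
            latest[event["id"]] = (ts, message_type)
    return sorted(i for i, (_, t) in latest.items() if t == wanted)


result = ids_with_last_type(events, "requestactivation")
print(result)  # ['669']
```

Id 668 is excluded because its most recent event (04:30:00) is activationrequested, so only 669 remains.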

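I already have an aggregation that retrieves the last event type per id,
but I haven't figured out how to filter the buckets based on their key.

Here is the query: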
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "id"
          }
        },
        {
          "terms": {
            "headers.message.type": [
              "requestactivation",
              "activationrequested"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "id": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest": {
          "max": {
            "field": "@timestamp"
          }
        },
        "hmtype": {
          "terms": {
            "field": "headers.message.type",
            "size": 1
          }
        }
      }
    }
  }
}

Here is a sample result:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "id": {
      "doc_count_error_upper_bound": 3,
      "sum_other_doc_count": 46,
      "buckets": [
        {
          "key": "986",
          "doc_count": 4,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 2,
            "buckets": [
              {
                "key": "activationrequested",
                "doc_count": 2
              }
            ]
          },
          "latest": {
            "value": 1493238253603,
            "value_as_string": "2017-04-26T20:24:13.603Z"
          }
        },
        {
          "key": "967",
          "doc_count": 2,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": "requestactivation",
                "doc_count": 1
              }
            ]
          },
          "latest": {
            "value": 1493191161242,
            "value_as_string": "2017-04-26T07:19:21.242Z"
          }
        },
        {
          "key": "554",
          "doc_count": 7,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 5,
            "buckets": [
              {
                "key": "requestactivation",
                "doc_count": 5
              }
            ]
          },
          "latest": {
            "value": 1493200196871,
            "value_as_string": "2017-04-26T09:49:56.871Z"
          }
        }
      ]
    }
  }
}

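None of the mapped fields are analyzed (keyword).

The goal is to reduce the results to only those ids whose hmtype bucket has the key "requestactivation".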
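
As a stopgap, the same reduction can be done on the client after the response comes back. The sketch below is a plain post-filter in Python, not an Elasticsearch feature, and `ids_with_top_type` is a hypothetical helper name. One caveat worth keeping in mind: a terms aggregation orders its buckets by doc_count by default, so with "size": 1 the single hmtype bucket is the most frequent type for that id, which is not necessarily the most recent one.

```python
# Trimmed copy of the sample response above (only the fields used here).
response = {
    "aggregations": {
        "id": {
            "buckets": [
                {"key": "986", "hmtype": {"buckets": [
                    {"key": "activationrequested", "doc_count": 2}]}},
                {"key": "967", "hmtype": {"buckets": [
                    {"key": "requestactivation", "doc_count": 1}]}},
                {"key": "554", "hmtype": {"buckets": [
                    {"key": "requestactivation", "doc_count": 5}]}},
            ]
        }
    }
}


def ids_with_top_type(response, wanted):
    kept = []
    for bucket in response["aggregations"]["id"]["buckets"]:
        inner = bucket["hmtype"]["buckets"]
        # With "size": 1 there is at most one inner bucket per id.
        if inner and inner[0]["key"] == wanted:
            kept.append(bucket["key"])
    return kept


matching_ids = ids_with_top_type(response, "requestactivation")
print(matching_ids)  # ['967', '554']
```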

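The doc count cannot be used, because activationrequest can occur multiple times for one id.

I only recently started looking into aggregations, so apologies if the question seems obvious; the examples I found don't seem to match this specific logic.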

Best answer

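How about using include on the terms aggregation to "filter" the values included in the terms down to only the ones related to requests? The query below takes a related route: a filter sub-aggregation counts the distinct headers.message.type values per id with cardinality, and a bucket_selector keeps only the ids for which that count equals 2: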

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "id"
          }
        },
        {
          "terms": {
            "headers.message.type": [
              "requestactivation",
              "activationrequested"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "id": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest": {
          "max": {
            "field": "@timestamp"
          }
        },
        "hmtype": {
          "filter": {
            "terms": {
              "headers.message.type": [
                "requestactivation",
                "activationrequested"
              ]
            }
          },
          "aggs": {
            "count_types": {
              "cardinality": {
                "field": "headers.message.type"
              }
            }
          }
        },
        "filter_buckets": {
          "bucket_selector": {
            "buckets_path": {
              "totalTypes":"hmtype > count_types"
            },
            "script": "params.totalTypes == 2"
          }
        }
      }
    }
  }
}
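
To see which ids this selector would keep, the following Python sketch emulates the cardinality plus bucket_selector step on the data sample from the question. It is only an approximation of the behaviour under stated assumptions: `ids_kept_by_selector` is a hypothetical helper, and the distinct count here is exact, whereas Elasticsearch's cardinality aggregation is an approximate count.

```python
from collections import defaultdict

# (id, headers.message.type) pairs taken from the data sample in the question.
events = [
    ("668", "requestactivation"),
    ("669", "requestactivation"),
    ("668", "activationrequested"),
]


def ids_kept_by_selector(events, required_distinct=2):
    types_per_id = defaultdict(set)
    for doc_id, message_type in events:
        types_per_id[doc_id].add(message_type)
    # Mirrors "params.totalTypes == 2": keep ids whose distinct type count matches.
    return sorted(
        doc_id
        for doc_id, types in types_per_id.items()
        if len(types) == required_distinct
    )


kept = ids_kept_by_selector(events)
print(kept)  # ['668']
```

With this sample the selector keeps 668, the id that has both event types, and drops 669, which only has requestactivation. Changing the comparison value changes which side is kept, so it is worth checking that against the intended result before relying on it.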

A similar question about Elasticsearch aggregation of aggregation can be found on Stack Overflow: https://stackoverflow.com/questions/43650641/
