elasticsearch - Elasticsearch: aggregation on random fields

Tags: elasticsearch elasticsearch-aggregation elasticsearch-scripting

Now I have a document like the one below (originally posted as an image). Its structure is a "contents" field holding multiple random key fields (note that the keys have no fixed format; they may look like UUIDs). I want to find the maximum start_time across all the keys under "contents" with an ES query. How can I do that?
The document:

{"contents": {
    "key1": {
        "start_time": "2020-08-01T00:00:19.500Z",
        "last_event_published_time": "2020-08-01T23:59:03.738Z",
        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
        "size": 1590513,
        "read_offset": 1590513,
        "name": "key1_name"
    },
    "key2": {
        "start_time": "2020-08-01T00:00:19.500Z",
        "last_event_published_time": "2020-08-01T23:59:03.738Z",
        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
        "size": 1590513,
        "read_offset": 1590513,
        "name": "key2_name"
    }
}}
I have tried Joe's solution and it works. But when I change the document to something like this:
{
    "timestamp": "2020-08-01T23:59:59.359Z",
    "type": "beats_stats",
    "beats_stats": {
        "metrics": {
            "filebeat": {
                "harvester": {
                    "files": {
                        "d47f60db-ac59-4b51-a928-0772a815438a": {
                            "start_time": "2020-08-01T00:00:18.320Z",
                            "last_event_published_time": "2020-08-01T23:59:03.738Z",
                            "last_event_timestamp": "2020-08-01T23:59:03.737Z",
                            "size": 1590513,
                            "read_offset": 1590513,
                            "name": "/data/logs/galogs/ga_log_2020-08-01.log"
                        },
                        "e47f60db-ac59-4b51-a928-0772a815438a": {
                            "start_time": "2020-08-01T00:00:19.500Z",
                            "last_event_published_time": "2020-08-01T23:59:03.738Z",
                            "last_event_timestamp": "2020-08-01T23:59:03.737Z",
                            "size": 1590513,
                            "read_offset": 1590513,
                            "name": "/data/logs/galogs/ga_log_2020-08-01.log"
                        }
                    }
                }
            }
        }
    }
}
It fails with this error:
"error" : {
"root_cause" : [
  {
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            ",
      "                                                                               ^---- HERE"
    ],
    "script" : "\n          for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            state.start_millis_arr.add(\n              Instant.parse(entry.start_time).toEpochMilli()\n            );\n          }\n        ",
    "lang" : "painless"
  }
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
  {
    "shard" : 0,
    "index" : "agg-test-index-1",
    "node" : "B4mXZVgrTe-MsAQKMVhHUQ",
    "reason" : {
      "type" : "script_exception",
      "reason" : "runtime error",
      "script_stack" : [
        "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            ",
        "                                                                               ^---- HERE"
      ],
      "script" : "\n          for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            state.start_millis_arr.add(\n              Instant.parse(entry.start_time).toEpochMilli()\n            );\n          }\n        ",
      "lang" : "painless",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    }
  }
]}
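
The null_pointer_exception occurs because params._source in Painless is a map of nested maps, so a single dotted key like 'beats_stats.metrics.filebeat.harvester.files' is not found and the lookup returns null. A minimal sketch of one possible fix, walking the path level by level with null-safe access (this is my adaptation of the accepted answer below, not part of it; the index name agg-test-index-1 is taken from the error message above):

GET agg-test-index-1/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          // _source is nested maps, so traverse step by step; the ?. operator
          // short-circuits to null when an intermediate level is missing
          def files = params._source.beats_stats?.metrics?.filebeat?.harvester?.files;
          if (files != null) {
            for (def entry : files.values()) {
              state.start_millis_arr.add(
                Instant.parse(entry.start_time).toEpochMilli()
              );
            }
          }
        """,
        "combine_script": """
          // return null when this shard matched no documents with the path
          if (state.start_millis_arr.isEmpty()) {
            return null;
          }
          // sort descending so the first element is this shard's max
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(state.start_millis_arr[0])
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}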

Best Answer

You can use a scripted_metric to calculate it. It's quite heavy-handed, but certainly possible.
Set up the index and sync a few docs:

POST myindex/_doc
{"contents":{"randomKey1":{"start_time":"2020-08-06T11:01:00.515Z"}}}

POST myindex/_doc
{"contents":{"35431fsf31_s35dfas":{"start_time":"2021-08-06T11:01:00.515Z"}}}

POST myindex/_doc
{"contents":{"999bc_123":{"start_time":"2019-08-06T11:01:00.515Z"}}}
Get the max date of the unknown random sub-objects:
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          for (def entry : params._source['contents'].values()) {
            state.start_millis_arr.add(
              Instant.parse(entry.start_time).toEpochMilli()
            );
          }
        """,
        "combine_script": """
          // sort in-place
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(
              // first is now the highest
              state.start_millis_arr[0]
            )
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}
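Note that "reduce_script": "return states" returns one combined value per shard, so on an index with several shards the aggregation yields a list of dates rather than a single maximum. A hedged variant of the same aggregation that reduces to one value (combine returns raw epoch millis, reduce picks the global max and formats it once) might look like this:

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          for (def entry : params._source['contents'].values()) {
            state.start_millis_arr.add(
              Instant.parse(entry.start_time).toEpochMilli()
            );
          }
        """,
        "combine_script": """
          // return this shard's max in millis, or null if the shard saw no docs
          if (state.start_millis_arr.isEmpty()) {
            return null;
          }
          long max = Long.MIN_VALUE;
          for (def m : state.start_millis_arr) {
            if (m > max) { max = m; }
          }
          return max;
        """,
        "reduce_script": """
          // take the max across all shards, then format it exactly once
          long max = Long.MIN_VALUE;
          for (def s : states) {
            if (s != null && s > max) { max = s; }
          }
          if (max == Long.MIN_VALUE) {
            return null;
          }
          return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(max));
        """
      }
    }
  }
}

Returning raw millis from combine_script keeps the cross-shard comparison cheap and defers the string formatting to the single reduce phase.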
BTW: @Sahil Gupta's comment is right. Never use images where pasting the text is possible (and more helpful).

For "elasticsearch - Elasticsearch: aggregation on random fields", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63281932/
