I have a document like the one below. Its structure is a "contents" field holding multiple randomly keyed sub-objects (note that the keys have no fixed format; they may be arbitrary, like UUIDs). I want an ES query that finds the maximum start_time across all keys under "contents". How can I do that?
The document:
{
  "contents": {
    "key1": {
      "start_time": "2020-08-01T00:00:19.500Z",
      "last_event_published_time": "2020-08-01T23:59:03.738Z",
      "last_event_timestamp": "2020-08-01T23:59:03.737Z",
      "size": 1590513,
      "read_offset": 1590513,
      "name": "key1_name"
    },
    "key2": {
      "start_time": "2020-08-01T00:00:19.500Z",
      "last_event_published_time": "2020-08-01T23:59:03.738Z",
      "last_event_timestamp": "2020-08-01T23:59:03.737Z",
      "size": 1590513,
      "read_offset": 1590513,
      "name": "key2_name"
    }
  }
}
I have tried Joe's solution and it works. But when I change the document to:

{
  "timestamp": "2020-08-01T23:59:59.359Z",
  "type": "beats_stats",
  "beats_stats": {
    "metrics": {
      "filebeat": {
        "harvester": {
          "files": {
            "d47f60db-ac59-4b51-a928-0772a815438a": {
              "start_time": "2020-08-01T00:00:18.320Z",
              "last_event_published_time": "2020-08-01T23:59:03.738Z",
              "last_event_timestamp": "2020-08-01T23:59:03.737Z",
              "size": 1590513,
              "read_offset": 1590513,
              "name": "/data/logs/galogs/ga_log_2020-08-01.log"
            },
            "e47f60db-ac59-4b51-a928-0772a815438a": {
              "start_time": "2020-08-01T00:00:19.500Z",
              "last_event_published_time": "2020-08-01T23:59:03.738Z",
              "last_event_timestamp": "2020-08-01T23:59:03.737Z",
              "size": 1590513,
              "read_offset": 1590513,
              "name": "/data/logs/galogs/ga_log_2020-08-01.log"
            }
          }
        }
      }
    }
  }
}
it fails with this error:

"error" : {
  "root_cause" : [
    {
      "type" : "script_exception",
      "reason" : "runtime error",
      "script_stack" : [
        "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n ",
        " ^---- HERE"
      ],
      "script" : "\n for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n state.start_millis_arr.add(\n Instant.parse(entry.start_time).toEpochMilli()\n );\n }\n ",
      "lang" : "painless"
    }
  ],
  "type" : "search_phase_execution_exception",
  "reason" : "all shards failed",
  "phase" : "query",
  "grouped" : true,
  "failed_shards" : [
    {
      "shard" : 0,
      "index" : "agg-test-index-1",
      "node" : "B4mXZVgrTe-MsAQKMVhHUQ",
      "reason" : {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n ",
          " ^---- HERE"
        ],
        "script" : "\n for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n state.start_millis_arr.add(\n Instant.parse(entry.start_time).toEpochMilli()\n );\n }\n ",
        "lang" : "painless",
        "caused_by" : {
          "type" : "null_pointer_exception",
          "reason" : null
        }
      }
    }
  ]
}
Best Answer
You can use a scripted_metric to do the calculation. It's fairly heavy-handed, but certainly possible.
Mock an index and ingest a few documents:
POST myindex/_doc
{"contents":{"randomKey1":{"start_time":"2020-08-06T11:01:00.515Z"}}}
POST myindex/_doc
{"contents":{"35431fsf31_s35dfas":{"start_time":"2021-08-06T11:01:00.515Z"}}}
POST myindex/_doc
{"contents":{"999bc_123":{"start_time":"2019-08-06T11:01:00.515Z"}}}
Get the maximum date across the unknown, randomly keyed sub-objects:

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          for (def entry : params._source['contents'].values()) {
            state.start_millis_arr.add(
              Instant.parse(entry.start_time).toEpochMilli()
            );
          }
        """,
        "combine_script": """
          // sort in-place
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(
              // first is now the highest
              state.start_millis_arr[0]
            )
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}
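For the nested beats_stats document in the question's update, the null_pointer_exception most likely occurs because `params._source['beats_stats.metrics.filebeat.harvester.files']` looks up a single literal key containing dots; in Painless, `_source` is a map of maps, so the path has to be walked one level at a time. Here is a sketch of an adjusted query, assuming the index name `agg-test-index-1` taken from the error output; the null-safe `?.` operator and the empty-list guard are additions to skip documents and shards that lack the path:

```json
GET agg-test-index-1/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          // walk the path level by level; ?. yields null instead of throwing
          def files = params._source.beats_stats?.metrics?.filebeat?.harvester?.files;
          if (files != null) {
            for (def entry : files.values()) {
              state.start_millis_arr.add(
                Instant.parse(entry.start_time).toEpochMilli()
              );
            }
          }
        """,
        "combine_script": """
          // a shard whose documents all lack the path contributes null
          if (state.start_millis_arr.isEmpty()) { return null; }
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(state.start_millis_arr[0])
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}
```

Because `reduce_script` returns the per-shard list, the response holds one ISO-8601 string (or null) per shard. Since ISO-8601 UTC timestamps sort lexicographically in chronological order, a variant reduce such as `Collections.max(...)` over the non-null entries would collapse them into a single overall maximum.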
By the way: @Sahil Gupta's comment is right (and helpful): never use images where you could paste text instead.
Regarding "elasticsearch - Elasticsearch: aggregation on random fields", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63281932/