elasticsearch - Sum over a top_hits aggregation

Tags: elasticsearch

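Short question: if I have a top_hits aggregation inside each bucket, how do I sum a specific value from the resulting structure?

Details:

I have a number of records, each holding a quantity for a store. I want to get the sum of the latest record of every store.

To get the latest record per store, I create the following aggregation: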

"latest_quantity_per_store": {
    "aggs": {
        "latest_quantity": {
            "top_hits": {
                "sort": [
                    {
                        "datetime": "desc"
                    },
                    {
                        "quantity": "asc"
                    }
                ],
                "_source": {
                    "includes": [
                        "quantity"
                    ]
                },
                "size": 1
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}

Assume I have two stores, each with two quantities for two different timestamps. This is the result of that aggregation:
"latest_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "O6wFD2UBG8e7nvSU8dYg",
                            "_score": null,
                            "_source": {
                                "quantity": 6
                            },
                            "sort": [
                                1532476800000,
                                6
                            ]
                        }
                    ]
                }
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "pLUFD2UBHBuSGcoH0ZT4",
                            "_score": null,
                            "_source": {
                                "quantity": 11
                            },
                            "sort": [
                                1532476800000,
                                11
                            ]
                        }
                    ]
                }
            }
        }
    ]
}

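I now want an aggregation in ElasticSearch that sums the values across these buckets. With the example data, that is the sum of 6 and 11. I tried the following aggregation: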
"latest_quantity": {
    "sum_bucket": {
        "buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
    }
}

But this results in the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "inventory-local",
        "node": "3z5CqmmAQ-yT2sUCb69DzA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
        }
      }
    ]
  },
  "status": 400
}

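What is the correct aggregation to get the number 17 out of ElasticSearch in some way?

I did something similar for another aggregation, one that uses an average instead of a top_hits aggregation: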
"average_quantity": {
    "sum_bucket": {
        "buckets_path": "average_quantity_per_store>average_quantity"
    }
},
"average_quantity_per_store": {
    "aggs": {
        "average_quantity": {
            "avg": {
                "field": "quantity"
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}

This works as expected. Here is the result:
"average_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "average_quantity": {
                "value": 6
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "average_quantity": {
                "value": 11.5
            }
        }
    ]
},
"average_quantity": {
    "value": 17.5
}

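Accepted answer

There is a way to solve this by combining a scripted_metric aggregation with a sum_bucket pipeline aggregation. The scripted metric aggregation is a bit complex, but the main idea is that it lets you provide your own bucketing algorithm and emit a single metric value from it.

In your case, you want to find the latest quantity for each store and then sum those per-store quantities. The solution looks like this, and I explain some details below: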

POST inventory-local/_search
{
  "size": 0,
  "aggs": {
    "bystore": {
      "terms": {
        "field": "store.keyword",
        "size": 10000
      },
      "aggs": {
        "latest_quantity": {
          "scripted_metric": {
            "init_script": "params._agg.quantities = new TreeMap()",
            "map_script": "params._agg.quantities.put(doc.datetime.date, [doc.datetime.date.millis, doc.quantity.value])",
            "combine_script": "return params._agg.quantities.lastEntry().getValue()",
            "reduce_script": "def maxkey = 0; def qty = 0; for (a in params._aggs) {def currentKey = a[0]; if (currentKey > maxkey) {maxkey = currentKey; qty = a[1]} } return qty;"
          }
        }
      }
    },
    "sum_latest_quantities": {
      "sum_bucket": {
        "buckets_path": "bystore>latest_quantity.value"
      }
    }
  }
}

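Note that, for this to work, you need to set script.painless.regex.enabled: true in your elasticsearch.yml configuration file.

The init_script creates a TreeMap for each shard. The map_script populates the TreeMap on each shard with a mapping of date to quantity. Each value put into the map is a two-element list holding the timestamp and the quantity; the timestamp is needed later in the reduce_script. The combine_script simply takes the last value of the TreeMap, since that is the latest quantity for the given shard.

Most of the work happens in the reduce_script. It iterates over the latest quantities returned by all shards and returns the most recent one.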
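
The four script phases can be hard to picture, so here is a minimal Python sketch that mimics them outside Elasticsearch. It is only an illustration of the data flow, not how Elasticsearch executes the scripts. The latest timestamp and the quantities 6 and 11 come from the question; the earlier quantities (6 and 12) are inferred from the averages shown there, while the earlier timestamp and the shard assignment are assumptions made up for the example.

```python
from collections import defaultdict

# (store, timestamp_millis, quantity, shard)
# The earlier timestamp and the shard numbers are hypothetical.
DOCS = [
    ("01", 1532390400000, 6, 0),
    ("01", 1532476800000, 6, 1),
    ("02", 1532390400000, 12, 0),
    ("02", 1532476800000, 11, 0),
]


def init_script():
    # One state object per shard (and per store bucket).
    return {"quantities": {}}


def map_script(state, timestamp, quantity):
    # Called once per document: key by timestamp, keep [timestamp, quantity].
    state["quantities"][timestamp] = [timestamp, quantity]


def combine_script(state):
    # Stand-in for TreeMap.lastEntry().getValue(): value under the largest key.
    latest_key = max(state["quantities"])
    return state["quantities"][latest_key]


def reduce_script(shard_results):
    # Pick the quantity that carries the largest timestamp across shards.
    max_key, qty = 0, 0
    for timestamp, quantity in shard_results:
        if timestamp > max_key:
            max_key, qty = timestamp, quantity
    return qty


def latest_quantity_per_store(docs):
    states = defaultdict(init_script)
    for store, timestamp, quantity, shard in docs:
        map_script(states[(store, shard)], timestamp, quantity)

    per_store = defaultdict(list)
    for (store, _shard), state in states.items():
        per_store[store].append(combine_script(state))

    return {store: reduce_script(results) for store, results in sorted(per_store.items())}


latest = latest_quantity_per_store(DOCS)
total = sum(latest.values())
print(latest, total)
```

Store 01 has its two documents on different shards in this sketch, which is the case the reduce phase exists for: each shard reports its own latest entry, and only the reduce step can decide which of them is the most recent overall.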

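At this point we have the latest quantity for each store. All that is left is to use a sum_bucket pipeline aggregation to sum those per-store quantities, which gives you the result of 17.

The response looks like this: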
 "aggregations": {
    "bystore": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "01",
          "doc_count": 2,
          "latest_quantity": {
            "value": 6
          }
        },
        {
          "key": "02",
          "doc_count": 2,
          "latest_quantity": {
            "value": 11
          }
        }
      ]
    },
    "sum_latest_quantities": {
      "value": 17
    }
  }
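
If changing the query is not an option, the same total can also be computed on the client from the top_hits response shown in the question. The following Python sketch assumes the response has already been parsed into a dictionary; the function name `sum_latest_quantities` and its parameters are hypothetical, and the sample data is a trimmed copy of the question's result.

```python
# Trimmed copy of the top_hits result from the question.
response = {
    "latest_quantity_per_store": {
        "buckets": [
            {
                "key": "01",
                "latest_quantity": {
                    "hits": {"hits": [{"_source": {"quantity": 6}}]}
                },
            },
            {
                "key": "02",
                "latest_quantity": {
                    "hits": {"hits": [{"_source": {"quantity": 11}}]}
                },
            },
        ]
    }
}


def sum_latest_quantities(aggregations,
                          terms_name="latest_quantity_per_store",
                          hits_name="latest_quantity",
                          field="quantity"):
    """Sum one _source field over the first top_hits entry of every bucket."""
    total = 0
    for bucket in aggregations[terms_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        if hits:  # skip buckets whose top_hits came back empty
            total += hits[0]["_source"][field]
    return total


print(sum_latest_quantities(response))
```

This trades a simpler query for transferring one hit per store to the client, so it is mainly reasonable when the number of stores is modest.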

Source: the original question "elasticsearch - Sum over a top_hits aggregation" on Stack Overflow: https://stackoverflow.com/questions/51709347/
