elasticsearch - Elasticsearch aggregation on inner_hits

Tags: elasticsearch

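I am trying to run some aggregations on the inner_hits of a nested object ("queries"), where the inner hits are filtered by query date. The aggregation in the block below runs over the main documents and all of the objects in "queries", not just the objects returned in the inner hits.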

GET /networkcollection/branch_routers/_search/
{
  "_source": false,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "mh": 123
        }
      },
      "filter": {
        "nested": {
          "path": "queries",
          "filter": {
            "range": {
              "queries.dateQuery": {
                "gt": "20160101T200000.000Z",
                "lte": "now"
              }
            }
          },
          "inner_hits": {}
        }
      }
    }
  },
  "aggs": {
    "queries": {
      "filter": {
        "nested": {
          "path": "queries",
          "filter": {
            "range": {
              "queries.dateQuery": {
                "gte": "20160101T200000.000Z",
                "lte": "now"
              }
            }
          }
        }
      },
      "aggs": {
        "minDateQuery": {
          "min": {
            "field": "queries.dateQuery"
          }
        }
      }
    }
  }
}

How can I write this aggregation so that it only aggregates over the "queries" objects returned in inner_hits?

Best answer

I am very late with this answer, but it is entirely possible to aggregate only on the inner_hits.

My ES version: 6.2.3

I am giving a detailed response, including the index mapping, some dummy documents, and the search query plus its response.

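The basic idea is to use a "filter" aggregation. You don't actually need the "query" part of the search request at all, unless you want to run some very complex query (to narrow down the set of documents being aggregated). Most simple queries can just as easily be specified in the aggregation's "filter".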
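As a rough illustration of the idea, here is a minimal Python sketch that builds a request body for the asker's case. It is a variant, not the exact request used further down in this answer: it puts a "filter" aggregation with a range inside a "nested" aggregation, so the min is computed only over nested objects in the date range. The function name `build_nested_min_agg` and the aggregation names `nested_queries`, `in_range` and `min_date` are made up for this example, and the code only builds and prints the body; it does not talk to a cluster.

```python
import json


def build_nested_min_agg(path, date_field, gte, lte="now"):
    """Build a search body: nested agg -> filter agg (range) -> min agg.

    Hypothetical helper for illustration; all aggregation names are arbitrary.
    """
    full_field = f"{path}.{date_field}"
    return {
        "size": 0,  # only the aggregation result is wanted, no hits
        "aggs": {
            "nested_queries": {
                # Step into the nested objects under `path`.
                "nested": {"path": path},
                "aggs": {
                    "in_range": {
                        # Keep only nested objects whose date is in range.
                        "filter": {
                            "range": {full_field: {"gte": gte, "lte": lte}}
                        },
                        "aggs": {
                            # Minimum date among the remaining nested objects.
                            "min_date": {"min": {"field": full_field}}
                        },
                    }
                },
            }
        },
    }


body = build_nested_min_agg("queries", "dateQuery", "2016-01-01")
print(json.dumps(body, indent=2))
```

The printed JSON could be pasted as the body of a `_search` request, assuming an index whose mapping declares "queries" as a nested field with a date sub-field, like the one set up below.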

Index setup:

PUT networkcollection
{
  "mappings": { 
    "branch_routers" : {
      "properties" : {
        "mh" : {
          "type" : "text"
        },
        "queries" : {
          "type" : "nested",
          "properties" : {
            "dateQuery" : {
              "type" : "date"
            }
          }
        }
      }
    }
  }
}

PUT networkcollection/branch_routers/1
{
  "mh" : "corona",
  "queries" : [
    {
      "dateQuery" : "2012-04-23"
    },
    {
      "dateQuery" : "2013-04-23"
    },
    {
      "dateQuery" : "2014-04-23"
    },
    {
      "dateQuery" : "2015-04-23"
    },
    {
      "dateQuery" : "2016-04-23"
    },
    {
      "dateQuery" : "2017-04-23"
    },
    {
      "dateQuery" : "2018-04-23"
    },
    {
      "dateQuery" : "2019-04-23"
    },
    {
      "dateQuery" : "2020-04-23"
    }
  ]
}

PUT networkcollection/branch_routers/2
{
  "mh" : "happy",
  "queries" : [
    {
      "dateQuery" : "2009-04-23"
    },
    {
      "dateQuery" : "2008-04-23"
    },
    {
      "dateQuery" : "2007-04-23"
    },
    {
      "dateQuery" : "2015-04-23"
    },
    {
      "dateQuery" : "2016-04-23"
    },
    {
      "dateQuery" : "2017-04-23"
    },
    {
      "dateQuery" : "2018-04-23"
    },
    {
      "dateQuery" : "2019-04-23"
    },
    {
      "dateQuery" : "2020-04-23"
    }
  ]
}

PUT networkcollection/branch_routers/3
{
  "mh" : "happy",
  "queries" : [
    {
      "dateQuery" : "2001-04-23"
    },
    {
      "dateQuery" : "2008-04-23"
    },
    {
      "dateQuery" : "2007-04-23"
    },
    {
      "dateQuery" : "2015-04-23"
    },
    {
      "dateQuery" : "2016-04-23"
    },
    {
      "dateQuery" : "2017-04-23"
    },
    {
      "dateQuery" : "2018-04-23"
    },
    {
      "dateQuery" : "2019-04-23"
    },
    {
      "dateQuery" : "2020-04-23"
    }
  ]
}


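We have added three basic documents. Now we filter "mh" to "happy", and we want the minimum dateQuery among the nested objects whose date falls between 2016 and now (we are currently in the middle of the coronavirus lockdown, so you know which year it is :)).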

Search query:
GET networkcollection/branch_routers/_search
{
  "_source": false, 
  "query": {
    "match": {
      "mh": "happy"
    }
  },
  "aggs": {
    "filtered_agg": {
      "filter": {
        "match" : {
          "mh" : "happy"
        }
      },
      "aggs": {
        "filtered_nested": {
          "nested": {
            "path": "queries"
          },
          "aggs": {
            "dateQuery_agg": {
              "date_range": {
                "field": "queries.dateQuery",
                "ranges": [
                  {
                    "from": "now-4y/y",
                    "to": "now"
                  }
                ]
              },
              "aggs": {
                "min_date": {
                  "min": {
                    "field": "queries.dateQuery"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "networkcollection",
        "_type": "branch_routers",
        "_id": "2",
        "_score": 0.2876821
      },
      {
        "_index": "networkcollection",
        "_type": "branch_routers",
        "_id": "3",
        "_score": 0.2876821
      }
    ]
  },
  "aggregations": {
    "filtered_agg": {
      "doc_count": 2,
      "filtered_nested": {
        "doc_count": 18,
        "dateQuery_agg": {
          "buckets": [
            {
              "key": "2016-01-01T00:00:00.000Z-2020-05-14T23:02:31.611Z",
              "from": 1451606400000,
              "from_as_string": "2016-01-01T00:00:00.000Z",
              "to": 1589497351611,
              "to_as_string": "2020-05-14T23:02:31.611Z",
              "doc_count": 10,
              "min_date": {
                "value": 1461369600000,
                "value_as_string": "2016-04-23T00:00:00.000Z"
              }
            }
          ]
        }
      }
    }
  }
}

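As you can see, it correctly filters out the document with "mh" = "corona" and keeps only the two documents with "mh" = "happy". It then narrows their "queries" objects to the ones that fall within the date range I specified (10 of the 18 nested objects), and finally returns min_date.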
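To double-check the numbers in the response, the same steps can be replayed in plain Python over the three sample documents. This is only a sketch of the logic, not how Elasticsearch evaluates the request: the "match" on "mh" is simplified to string equality, the upper bound of the range is taken from the "to_as_string" value in the response above, and the helper `d` and the variable names are invented for this example.

```python
from datetime import datetime, timezone


def d(year):
    """Every sample dateQuery is April 23 of some year (UTC midnight)."""
    return datetime(year, 4, 23, tzinfo=timezone.utc)


# The three sample documents, reduced to the fields the aggregation uses.
DOCS = [
    {"mh": "corona", "queries": [d(y) for y in range(2012, 2021)]},
    {"mh": "happy", "queries": [d(y) for y in
                                (2009, 2008, 2007, 2015, 2016, 2017, 2018, 2019, 2020)]},
    {"mh": "happy", "queries": [d(y) for y in
                                (2001, 2008, 2007, 2015, 2016, 2017, 2018, 2019, 2020)]},
]

# Bounds copied from the response bucket ("now-4y/y" and "now" at query time).
RANGE_FROM = datetime(2016, 1, 1, tzinfo=timezone.utc)
RANGE_TO = datetime(2020, 5, 14, 23, 2, 31, 611000, tzinfo=timezone.utc)

# filtered_agg: keep parent documents with mh == "happy".
parents = [doc for doc in DOCS if doc["mh"] == "happy"]

# filtered_nested: every nested "queries" object of those parents.
nested = [q for doc in parents for q in doc["queries"]]

# dateQuery_agg: date_range buckets include "from" and exclude "to".
bucket = [q for q in nested if RANGE_FROM <= q < RANGE_TO]

# min_date: smallest date inside the bucket.
min_date = min(bucket)

print(len(parents), len(nested), len(bucket), min_date.date().isoformat())
# prints: 2 18 10 2016-04-23
```

The four printed values line up with the doc_count of filtered_agg, the doc_count of filtered_nested, the doc_count of the date_range bucket, and the min_date value in the response.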

This Q&A on aggregating on inner_hits in Elasticsearch is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/35594652/
