elasticsearch - ElasticSearch query: get `key` from every record in ElasticSearch and return a set of unique values

Tags: elasticsearch

I am trying to get the categories from every record in ElasticSearch and return a set of unique categories.

Given that I have a collection of books:

GET _search
{
  "query": {
    "match_all": {}
  }
}

# Response

{
  "hits": {
    "hits": [
      {
        "_source": {
          "title" : "Amazing Book",
          "categories": [
            {
                "id" : "123",
                "name" : "Comedy"
            },
            {
                "id" : "456",
                "name" : "Action"
            }
          ]
        }
      },
      {
        "_source": {
          "title" : "Other Amazing Book",
          "categories": [
            {
                "id" : "456",
                "name" : "Action"
            },
            {
                "id" : "987",
                "name" : "Romance"
            }
          ]
        }
      }
    ]
  }
}

What query would produce this output?
{
  "categories": [
    {
        "id" : "123",
        "name" : "Comedy"
    },
    {
        "id" : "456",
        "name" : "Action"
    },
    {
        "id" : "987",
        "name" : "Romance"
    }
  ]
}

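Best answer

What you want can be done with aggregations.

I managed to produce suitable output, but you have to change the mapping so that `categories` is a nested field, like this: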

POST test/book/_mapping
{
  "properties": {
    "title": {
      "type": "string"
    },
    "categories": {
      "type": "nested"
    }
  }
}

Then, if you index the documents:
PUT test/book/1
{
  "title" : "Amazing Book",
  "categories": [
    {
        "id" : "123",
        "name" : "Comedy"
    },
    {
        "id" : "456",
        "name" : "Action"
    }
  ]
}

PUT test/book/2
{
  "title" : "Other Amazing Book",
  "categories": [
    {
        "id" : "456",
        "name" : "Action"
    },
    {
        "id" : "987",
        "name" : "Romance"
    }
  ]
}

Finally, the following search request:
GET test/book/_search
{
  "aggs": {
    "categories": {
      "nested": {
        "path": "categories"
      },
      "aggs": {
        "id": {
          "terms": {
            "field": "categories.id"
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "categories.name"
              }
            }
          }
        }
      }
    }
  }
}
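
One practical detail worth noting: a `terms` aggregation returns only the top 10 buckets by default, so with more than ten distinct categories the result would be truncated. Below is a minimal sketch of building the same request body in Python. The helper name `build_categories_agg` and the `max_buckets` parameter are hypothetical, chosen for illustration; the top-level `"size": 0` only suppresses the hits, since just the aggregations are needed.

```python
import json


def build_categories_agg(max_buckets=1000):
    """Build the nested id -> name aggregation body (hypothetical helper)."""
    return {
        "size": 0,  # skip the hits; only the aggregations matter here
        "aggs": {
            "categories": {
                "nested": {"path": "categories"},
                "aggs": {
                    "id": {
                        # terms defaults to 10 buckets, so raise the limit explicitly
                        "terms": {"field": "categories.id", "size": max_buckets},
                        "aggs": {
                            "name": {"terms": {"field": "categories.name"}}
                        },
                    }
                },
            }
        },
    }


# Print the body so it can be pasted after `GET test/book/_search`.
print(json.dumps(build_categories_agg(), indent=2))
```

The value 1000 is an arbitrary assumption; pick a limit that comfortably exceeds the number of categories you expect.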

produces this output (I extracted the relevant part):
"aggregations": {
      "categories": {
         "doc_count": 4,
         "id": {
            "buckets": [
               {
                  "key": "456",
                  "doc_count": 2,
                  "name": {
                     "buckets": [
                        {
                           "key": "action",
                           "doc_count": 2
                        }
                     ]
                  }
               },
               {
                  "key": "123",
                  "doc_count": 1,
                  "name": {
                     "buckets": [
                        {
                           "key": "comedy",
                           "doc_count": 1
                        }
                     ]
                  }
               },
               {
                  "key": "987",
                  "doc_count": 1,
                  "name": {
                     "buckets": [
                        {
                           "key": "romance",
                           "doc_count": 1
                        }
                     ]
                  }
               }
            ]
         }
      }
   }
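
The buckets above are not yet in the shape the question asked for. As a rough sketch, assuming the response has already been parsed into a Python dict, a small hypothetical helper (`unique_categories`, not part of any Elasticsearch client) can reshape it on the client side. Note that the names come back lowercased ("comedy" rather than "Comedy"), presumably because `categories.name` is an analyzed string field here.

```python
def unique_categories(aggregations):
    """Flatten the nested id -> name buckets into a list of {id, name} dicts."""
    categories = []
    for id_bucket in aggregations["categories"]["id"]["buckets"]:
        name_buckets = id_bucket["name"]["buckets"]
        # Buckets are ordered by doc_count, so the first one is the most
        # frequent name for this id; fall back to None if there is none.
        name = name_buckets[0]["key"] if name_buckets else None
        categories.append({"id": id_bucket["key"], "name": name})
    return {"categories": sorted(categories, key=lambda c: c["id"])}


# Trimmed copy of the "aggregations" object from the response above.
sample = {
    "categories": {
        "doc_count": 4,
        "id": {
            "buckets": [
                {"key": "456", "doc_count": 2,
                 "name": {"buckets": [{"key": "action", "doc_count": 2}]}},
                {"key": "123", "doc_count": 1,
                 "name": {"buckets": [{"key": "comedy", "doc_count": 1}]}},
                {"key": "987", "doc_count": 1,
                 "name": {"buckets": [{"key": "romance", "doc_count": 1}]}},
            ]
        },
    }
}

print(unique_categories(sample))
```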

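The reason you have to change the mapping is that, with the default mapping, each JSON document is flattened into a simple key-value format, for example:
{
  "title": "Amazing book",
  "categories.id": ["123", "456"],
  "categories.name": ["comedy", "action"]
}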

In this case, the association between "123" and "comedy" is lost, and the equivalent aggregation (just remove the "nested" agg) will output:
"aggregations": {
      "categories": {
         "buckets": [
            {
               "key": "456",
               "doc_count": 2,
               "name": {
                  "buckets": [
                     {
                        "key": "action",
                        "doc_count": 2
                     },
                     {
                        "key": "comedy",
                        "doc_count": 1
                     },
                     {
                        "key": "romance",
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "123",
               "doc_count": 1,
               "name": {
                  "buckets": [
                     {
                        "key": "action",
                        "doc_count": 1
                     },
                     {
                        "key": "comedy",
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "987",
               "doc_count": 1,
               "name": {
                  "buckets": [
                     {
                        "key": "action",
                        "doc_count": 1
                     },
                     {
                        "key": "romance",
                        "doc_count": 1
                     }
                  ]
               }
            }
         ]
      }
   }
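
To see why the counts come out this way, the flattening can be imitated in plain Python. This is only an illustrative sketch, not how Elasticsearch is implemented internally: `flatten` and `names_per_id` are hypothetical helpers, and the names are written in lowercase to mirror the analyzed terms shown in the output.

```python
from collections import defaultdict

books = [
    {"title": "Amazing Book",
     "categories": [{"id": "123", "name": "comedy"},
                    {"id": "456", "name": "action"}]},
    {"title": "Other Amazing Book",
     "categories": [{"id": "456", "name": "action"},
                    {"id": "987", "name": "romance"}]},
]


def flatten(book):
    """Mimic the default object mapping: one multi-valued field per sub-key."""
    return {
        "categories.id": [c["id"] for c in book["categories"]],
        "categories.name": [c["name"] for c in book["categories"]],
    }


def names_per_id(flat_docs):
    """For each id, count every name that occurs in the same document."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc in flat_docs:
        for cat_id in doc["categories.id"]:
            # Every name in the document pairs with every id in the document,
            # because the original id/name pairing is no longer stored.
            for name in doc["categories.name"]:
                counts[cat_id][name] += 1
    return {cat_id: dict(names) for cat_id, names in counts.items()}


result = names_per_id([flatten(b) for b in books])
print(result)
```

Each id ends up paired with every name in its document, which matches the spurious buckets above (for example "123" with "action").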

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/25281745/
