elasticsearch - ES: sub-buckets based on a value in the bucket's properties rather than on document values

I'm new to Elasticsearch and I'm trying to bucket the objects returned by a search according to a category hierarchy.

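I apologize in advance for the length of this question, but I wanted to provide enough samples and information to make the requirement as clear as possible.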

What I'm trying to achieve

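The problem is that the categories form a hierarchy but are stored as a flat array of objects, each of which has a depth. I want to build an aggregation that buckets by category and by category depth.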

Here is a simplified mapping of the document, reduced to the minimum data:

{
  "mappings": {
    "_doc": {
      "properties": {
        "categoriesList": {
          "properties": {
            "depth": {
              "type": "long"
            },
            "title": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

Here is a simplified example document:

{
  "_index": "x",
  "_type": "_doc",
  "_id": "wY0w5GYBOIOl7fi31c_b",
  "_score": 22.72073,
  "_source": {
    "categoriesList": [
      {
        "title": "category_lvl_2_2",
        "depth": 2
      },
      {
        "title": "category_lvl_2",
        "depth": 2
      },
      {
        "title": "category_lvl_1",
        "depth": 1
      }
    ]
  }
}

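Now, what I want to achieve is hierarchical buckets of categories based on depth: one bucket containing the titles of all depth-1 categories across all hits, then another bucket (or a sub-bucket with the titles) containing only the depth-2 categories across all hits, and so on.
Something like: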

"aggregations": {
    "depth": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 47,
          "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "category_lvl_1",
                "doc_count": 47,
                "depth_1": {
                  "doc_count": 47
                }
              }
            ]
          }
        },
        {
          "key": 2,
          "doc_count": 47,
          "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "category_lvl_2_1",
                "doc_count": 47
              },
              {
                "key": "category_lvl_2_2",
                "doc_count": 33
              }
            ]
          }
        }
      ]
    }
  }

What I've tried

First, I simply tried nesting one terms aggregation inside another, like this:

  "aggs": {
    "depth": {
      "terms": {
        "field": "categoriesList.depth"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "categoriesList.title.keyword"
          }
        }
      }
    }
  }

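Of course, this did not give me what I wanted. It gave me buckets keyed by depth, but each bucket contained all titles of all categories, regardless of depth, so the contents of every bucket were identical. Something like the following: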

  "aggregations": {
    "depth": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 47,
          "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "category_lvl_1",
                "doc_count": 47
              },
              {
                "key": "category_lvl_2_1",
                "doc_count": 33
              },
              {
                "key": "category_lvl_2_2",
                "doc_count": 15
              }
            ]
          }
        },
        {
          "key": 2,
          "doc_count": 47,
          "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "category_lvl_1",
                "doc_count": 47
              },
              {
                "key": "category_lvl_2_1",
                "doc_count": 33
              },
              {
                "key": "category_lvl_2_2",
                "doc_count": 15
              }
            ]
          }
        }
      ]
    }
  }
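
To see why this happens: with the default object mapping, Elasticsearch flattens an array of objects into one list of values per sub-field, so the link between a given depth and its title is lost. The pure-Python sketch below models that flattening and the terms -> terms aggregation on the sample document. It is a simplified model (one document, no shards or scoring) with hypothetical helper names, not Elasticsearch's actual implementation.

```python
from collections import Counter, defaultdict

# Sample document from the question (categoriesList mapped as a plain object).
DOC = {
    "categoriesList": [
        {"title": "category_lvl_2_2", "depth": 2},
        {"title": "category_lvl_2", "depth": 2},
        {"title": "category_lvl_1", "depth": 1},
    ]
}


def flatten_object_array(doc):
    """Model object-field indexing: each sub-field becomes one value list.

    After this step nothing records which depth belonged to which title.
    """
    items = doc["categoriesList"]
    return {
        "categoriesList.depth": sorted({c["depth"] for c in items}),
        "categoriesList.title.keyword": sorted({c["title"] for c in items}),
    }


def naive_terms_aggregation(docs):
    """Model terms(depth) -> terms(title) over flattened documents.

    A document lands in every depth bucket it has a value for, and the
    title sub-aggregation then counts *all* of that document's titles.
    """
    buckets = defaultdict(Counter)
    for doc in docs:
        flat = flatten_object_array(doc)
        for depth in flat["categoriesList.depth"]:
            for title in flat["categoriesList.title.keyword"]:
                buckets[depth][title] += 1
    return {depth: dict(titles) for depth, titles in buckets.items()}


if __name__ == "__main__":
    result = naive_terms_aggregation([DOC])
    for depth in sorted(result):
        # Both depths print ['category_lvl_1', 'category_lvl_2', 'category_lvl_2_2']
        print(depth, sorted(result[depth]))
```

Every depth bucket ends up with the same title list, which matches the identical buckets in the output above.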

Next, I tried to see whether a filter aggregation would help, by adding a sub-bucket filtered on a depth value of 1:

  "aggs": {
    "depth": {
      "terms": {
        "field": "categoriesList.depth"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "categoriesList.title.keyword"
          },
          "aggs": {
            "depth_1": {
              "filter": {
                "term": {
                  "categoriesList.depth": 1
                }
              }
            }
          }
        }
      }
    }
  }

This returned the same result as the simple aggregation query above, just with an extra level of nesting that served no purpose.
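
The filter sub-aggregation cannot help here because its term query is evaluated against whole documents, not individual category objects: a document passes a term filter on categoriesList.depth = 1 if any of its categories has depth 1, so every title bucket of that document keeps its full count. The sketch below models this on the sample document; the helper names are hypothetical and it is a simplification, not Elasticsearch's code.

```python
# Sample document from the question (default object mapping assumed).
DOC = {
    "categoriesList": [
        {"title": "category_lvl_2_2", "depth": 2},
        {"title": "category_lvl_2", "depth": 2},
        {"title": "category_lvl_1", "depth": 1},
    ]
}


def doc_matches_term(doc, field, value):
    """Document-level term filter on an object array: true if ANY element matches."""
    return any(item[field] == value for item in doc["categoriesList"])


def filtered_sub_bucket_counts(docs, depth_value):
    """Model terms(depth) -> terms(title) -> filter(term depth == depth_value).

    Returns {(depth_bucket, title_bucket): doc_count of the filter sub-bucket}.
    """
    counts = {}
    for doc in docs:
        depths = {c["depth"] for c in doc["categoriesList"]}
        titles = {c["title"] for c in doc["categoriesList"]}
        # The filter is evaluated once per document, not once per category.
        passes = doc_matches_term(doc, "depth", depth_value)
        for depth in depths:
            for title in titles:
                key = (depth, title)
                counts[key] = counts.get(key, 0) + (1 if passes else 0)
    return counts
```

Calling filtered_sub_bucket_counts([DOC], 1) gives a count of 1 for all six (depth, title) pairs, including (2, "category_lvl_2_2"): the filter removes nothing.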

The question

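With my current understanding of ES, what I'm seeing makes sense: it goes through every document in the search and creates buckets by category depth, but since each document has at least one category at every depth, the document's entire category list is added to each bucket.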

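Can ES do this for me? I have a feeling it won't work, because I'm basically trying to bucket and filter on the properties that the initial bucketing query used, rather than on document properties.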

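I could also do the bucketing myself in code, since we already get the category results back, but I'd like to know whether this can be done on the ES side, which would save me from modifying a lot of existing code.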

Thanks!

Best answer

Based on sramalingam24's comment, I did the following to get it working:

Create an index with a mapping that specifies the nested type

I changed the mapping to tell ES that the categoriesList property is a nested object. To do this, I created a new index with the following mapping:

{
  "mappings": {
    "_doc": {
      "properties": {
        "categoriesList": {
          "type": "nested",
          "properties": {
            "depth": {
              "type": "long"
            },
            "title": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

Reindex into the new index

Then I reindexed from the old index into the new one, sending the following body to the reindex API (POST _reindex):

{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "index_with_nested_mapping"
  }
}
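
An existing object field cannot be switched to nested in place, which is why the answer creates a new index and reindexes into it. The two steps can be scripted; this is a minimal sketch using only the Python standard library. It assumes a cluster at the hypothetical address http://localhost:9200 and an Elasticsearch 6.x version that still accepts the _doc mapping type used in the question. The request builders do not touch the network; only the hypothetical send helper does.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust to your environment.
ES_URL = "http://localhost:9200"

# Same mapping as above: categoriesList declared as "nested".
NESTED_MAPPING = {
    "mappings": {
        "_doc": {
            "properties": {
                "categoriesList": {
                    "type": "nested",
                    "properties": {
                        "depth": {"type": "long"},
                        "title": {
                            "type": "text",
                            "fields": {
                                "keyword": {"type": "keyword", "ignore_above": 256}
                            },
                        },
                    },
                }
            }
        }
    }
}


def json_request(method, path, body):
    """Build (but do not send) a JSON request against the cluster."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def migration_requests(old_index, new_index):
    """Step 1: create the new index with the nested mapping. Step 2: copy the documents."""
    create = json_request("PUT", "/" + new_index, NESTED_MAPPING)
    reindex = json_request(
        "POST",
        "/_reindex",
        {"source": {"index": old_index}, "dest": {"index": new_index}},
    )
    return [create, reindex]


def send(request):
    """Send one request; raises urllib.error.URLError if the cluster is unreachable
    and urllib.error.HTTPError on an error status (e.g. the index already exists)."""
    with urllib.request.urlopen(request) as response:
        return response.status, json.loads(response.read())
```

Typical use against a live cluster: `for req in migration_requests("old_index", "index_with_nested_mapping"): print(send(req))`. The requests are sent in order because the reindex needs the new index and its mapping to exist first.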

Use a nested aggregation

Then I used a nested aggregation similar to the following:

{
  "aggs": {
    "categories": {
      "nested": {
        "path": "categoriesList"
      },
      "aggs": {
        "depth": {
          "terms": {
            "field": "categoriesList.depth"
          },
          "aggs": {
            "sub-categories": {
              "terms": {
                "field": "categoriesList.title.keyword"
              }
            }
          }
        }
      }
    }
  }
}
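
This works because, with a nested mapping, each element of categoriesList is indexed as its own hidden document, and the nested aggregation runs its sub-aggregations over those hidden documents. Each title is therefore bucketed only under its own depth. The sketch below models nested(categoriesList) -> terms(depth) -> terms(title) on the sample document; the function name is hypothetical and the model ignores shards and ordering.

```python
from collections import Counter, defaultdict

# Sample document from the question, now with categoriesList mapped as nested.
DOC = {
    "categoriesList": [
        {"title": "category_lvl_2_2", "depth": 2},
        {"title": "category_lvl_2", "depth": 2},
        {"title": "category_lvl_1", "depth": 1},
    ]
}


def nested_terms_aggregation(docs):
    """Model nested(categoriesList) -> terms(depth) -> terms(title).

    Every array element acts as its own hidden document, so a title is
    only ever counted under the depth stored in the same element.
    """
    buckets = defaultdict(Counter)
    for doc in docs:
        for category in doc["categoriesList"]:  # one hidden doc per element
            buckets[category["depth"]][category["title"]] += 1
    return {depth: dict(titles) for depth, titles in buckets.items()}
```

For the sample document this yields depth 1 -> {category_lvl_1} and depth 2 -> {category_lvl_2_2, category_lvl_2}, the separation the naive query could not produce.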

This gave me the result I wanted:

{
  "aggregations": {
    "categories": {
      "doc_count": 96,
      "depth": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 2,
            "doc_count": 49,
            "sub-categories": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "category_lvl_2_1",
                  "doc_count": 33
                },
                {
                  "key": "category_lvl_2_2",
                  "doc_count": 15
                }
              ]
            }
          },
          {
            "key": 1,
            "doc_count": 47,
            "sub-categories": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "category_lvl_1",
                  "doc_count": 47
                }
              ]
            }
          }
        ]
      }
    }
  }
}
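
Two details of this response are worth noting: the doc_count of 96 on categories counts nested category objects, not top-level documents, and the depth buckets are ordered by doc_count (the terms aggregation default), which is why depth 2 comes before depth 1. A small hypothetical helper can reshape the response into a depth-ordered map; it relies only on the aggregation names used in the query above (categories, depth, sub-categories).

```python
def categories_by_depth(response):
    """Reshape the nested aggregation response into {depth: [(title, doc_count), ...]}.

    Depths are sorted ascending; titles keep the order Elasticsearch returned.
    """
    depth_buckets = response["aggregations"]["categories"]["depth"]["buckets"]
    return {
        bucket["key"]: [
            (sub["key"], sub["doc_count"])
            for sub in bucket["sub-categories"]["buckets"]
        ]
        for bucket in sorted(depth_buckets, key=lambda b: b["key"])
    }
```

Applied to the response above, this gives {1: [("category_lvl_1", 47)], 2: [("category_lvl_2_1", 33), ("category_lvl_2_2", 15)]}.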

We found a similar question about this on Stack Overflow: https://stackoverflow.com/questions/53285218/