ElasticSearch: total number of distinct occurrences across the entire data

I'm very new to ElasticSearch (version 2.3.3). My data has the following format:

    {
        "title": "Doc 1 title",
        "year": "14",
        "month": "06",
        "sentences": [
            {
                "id": 1,
                "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
                "class": "Introduction",
                "synth": "intr"
            },
            {
                "id": 2,
                "text": "Donec molestie pulvinar odio, ultricies dictum mi porttitor sit amet.",
                "class": "Introduction",
                "synth": "abstr"
            },
            {
                "id": 3,
                "text": "Aliquam id tristique diam. Suspendisse convallis convallis est ut condimentum.",
                "class": "Main_Content",
                "synth": "body"
            },
            {
                "id": 4,
                "text": "Nunc ornare eros at pretium faucibus. Praesent congue cursus aliquet.",
                "class": "Main_Content",
                "synth": "body"
            },
            {
                "id": 5,
                "text": "Integer pellentesque quam ut nulla dignissim hendrerit.",
                "class": "Future_Work",
                "synth": "ftr"
            },
            {
                "id": 6,
                "text": "Pellentesque faucibus vehicula diam.",
                "class": "Bibliography",
                "synth": "bio"
            }
        ]
    }

There are many such documents, e.g. doc1, doc2, ..., doc700.

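I'm trying to write a query that returns, across the whole batch of documents, the total number of occurrences of each distinct "class", grouped and sorted by year.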

So the result would look something like this:

    {
        "year" : "14",
        "count" : [
            { "Introduction" : 1357 },
            { "Main_Content" : 1021 },
            { "Future_Work" : 490 },
            { "Bibliography" : 241 }
        ],
        "year" : "15",
        "count" : [
            { "Introduction" : 972 },
            { "Main_Content" : 712 },
            { "Future_Work" : 335 },
            { "Bibliography" : 81 }
        ]
    }
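To pin down the counting being asked for, here is a minimal plain-Python sketch over documents already loaded in memory. The function name count_classes_by_year and the small "docs" list are hypothetical. It counts every sentence's "class", grouped by the parent document's "year", and orders the years ascending; it only illustrates the intended semantics and is not an Elasticsearch query.

```python
from collections import Counter, defaultdict


def count_classes_by_year(docs):
    """Count each sentence's "class", grouped by the parent document's "year".

    `docs` is assumed to be an iterable of dicts shaped like the sample
    document: a "year" string plus a "sentences" list of dicts with "class".
    """
    counts = defaultdict(Counter)
    for doc in docs:
        for sentence in doc.get("sentences", []):
            counts[doc["year"]][sentence["class"]] += 1
    # Years are two-digit strings, so a plain string sort orders them correctly.
    return {year: dict(counts[year]) for year in sorted(counts)}


# Hypothetical, trimmed-down documents (only the fields the count needs).
docs = [
    {"year": "15", "sentences": [{"class": "Bibliography"}]},
    {"year": "14", "sentences": [{"class": "Introduction"},
                                 {"class": "Introduction"},
                                 {"class": "Main_Content"}]},
    {"year": "14", "sentences": [{"class": "Future_Work"}]},
]
print(count_classes_by_year(docs))
# prints {'14': {'Introduction': 2, 'Main_Content': 1, 'Future_Work': 1}, '15': {'Bibliography': 1}}
```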

Is it possible to achieve what I described? Or would it be easier to do this separately for each "class"?

Thank you very much.

Best Answer

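This can be done with a Nested Aggregation. If your existing mapping does not declare "sentences" as nested, you could use a mapping like the following (an existing object field cannot be switched to nested in place, so this means creating a new index with this mapping and reindexing your data into it):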

    {
        "mappings": {
            "book": {
                "properties": {
                    "title": {
                        "type": "string"
                    },
                    "month": {
                        "type": "string"
                    },
                    "year": {
                        "type": "string"
                    },
                    "sentences": {
                        "type": "nested",
                        "properties": {
                            "synth": {
                                "type": "string"
                            },
                            "id": {
                                "type": "long"
                            },
                            "text": {
                                "type": "string"
                            },
                            "class": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        }
    }
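One way to create such an index is a PUT request carrying that mapping as its body. The Python sketch below uses only the standard library, assumes a 2.x node at http://localhost:9200 and a hypothetical index name "books", and only builds the request without sending it. It also shows an optional tweak: in 2.x, "string" fields are analyzed by default, which is why the aggregation keys in the sample output come back lowercased ("main_content"); mapping "sentences.class" with "index": "not_analyzed" keeps the original values such as "Main_Content". The helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local 2.x node
INDEX = "books"                   # hypothetical index name


def book_mapping(keep_class_case=False):
    """Mapping from the answer; optionally keep "class" values unanalyzed."""
    class_field = {"type": "string"}
    if keep_class_case:
        # 2.x syntax: index the whole value as one term, case preserved.
        class_field["index"] = "not_analyzed"
    return {
        "mappings": {
            "book": {
                "properties": {
                    "title": {"type": "string"},
                    "month": {"type": "string"},
                    "year": {"type": "string"},
                    "sentences": {
                        "type": "nested",
                        "properties": {
                            "synth": {"type": "string"},
                            "id": {"type": "long"},
                            "text": {"type": "string"},
                            "class": class_field,
                        },
                    },
                },
            },
        },
    }


def create_index_request(mapping, index=INDEX, base_url=ES_URL):
    """Build (but do not send) the PUT /<index> request that creates the index."""
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request(book_mapping(keep_class_case=True))
# To actually create the index: urllib.request.urlopen(req)
```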

Then run the following query:

    {
        "size": 0,
        "aggs": {
            "years": {
                "terms": {
                    "field": "year"
                },
                "aggs": {
                    "sentences": {
                        "nested": {
                            "path": "sentences"
                        },
                        "aggs": {
                            "classes": { "terms": { "field": "sentences.class" } }
                        }
                    }
                }
            }
        }
    }
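The same search can be built from code. The sketch below uses only the Python standard library; the cluster URL, the index name "books" and the helper names are assumptions. It reproduces the query above and adds one optional change: by default a terms aggregation orders its buckets by doc_count, so to get the years in ascending order, as the question asks, it sets "order": {"_term": "asc"} on the "years" aggregation (the 2.x syntax; later versions use "_key" instead).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local 2.x node
INDEX = "books"                   # hypothetical index name


def class_counts_query(sort_by_year=True):
    """terms on "year" -> nested "sentences" -> terms on "sentences.class"."""
    years_terms = {"field": "year"}
    if sort_by_year:
        # Order year buckets by their key instead of by doc_count (2.x syntax).
        years_terms["order"] = {"_term": "asc"}
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "aggs": {
            "years": {
                "terms": years_terms,
                "aggs": {
                    "sentences": {
                        "nested": {"path": "sentences"},
                        "aggs": {
                            "classes": {"terms": {"field": "sentences.class"}}
                        },
                    }
                },
            }
        },
    }


def search_request(body, index=INDEX, base_url=ES_URL):
    """Build (but do not send) POST /<index>/_search with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request(class_counts_query())
# To run it: json.load(urllib.request.urlopen(req))
```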

Here is the result on some sample data:

    "aggregations": {
        "years": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "14",
                    "doc_count": 2,
                    "sentences": {
                        "doc_count": 12,
                        "classes": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                { "key": "introduction", "doc_count": 4 },
                                { "key": "main_content", "doc_count": 4 },
                                { "key": "bibliography", "doc_count": 2 },
                                { "key": "future_work", "doc_count": 2 }
                            ]
                        }
                    }
                },
                {
                    "key": "15",
                    "doc_count": 1,
                    "sentences": {
                        "doc_count": 5,
                        "classes": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                { "key": "main_content", "doc_count": 2 },
                                { "key": "bibliography", "doc_count": 1 },
                                { "key": "future_work", "doc_count": 1 },
                                { "key": "introduction", "doc_count": 1 }
                            ]
                        }
                    }
                }
            ]
        }
    }
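To turn this response into roughly the shape the question asks for, the buckets can be flattened on the client. A minimal Python sketch, assuming the parsed JSON response is a dict containing the "aggregations" key shown above (flatten_class_counts is a hypothetical helper name). Because the desired output in the question repeats the "year" and "count" keys inside a single object (most JSON parsers would keep only the last pair), it returns a list with one entry per year, sorted by year.

```python
def flatten_class_counts(response):
    """Flatten years -> sentences (nested) -> classes buckets into
    [{"year": ..., "count": [{class: n}, ...]}, ...], sorted by year."""
    year_buckets = response["aggregations"]["years"]["buckets"]
    result = []
    for year_bucket in sorted(year_buckets, key=lambda b: b["key"]):
        class_buckets = year_bucket["sentences"]["classes"]["buckets"]
        result.append({
            "year": year_bucket["key"],
            # Each class bucket's doc_count counts nested sentence documents.
            "count": [{b["key"]: b["doc_count"]} for b in class_buckets],
        })
    return result


# Trimmed version of the sample response above.
response = {
    "aggregations": {
        "years": {
            "buckets": [
                {"key": "14", "doc_count": 2, "sentences": {"doc_count": 12, "classes": {"buckets": [
                    {"key": "introduction", "doc_count": 4},
                    {"key": "main_content", "doc_count": 4},
                    {"key": "bibliography", "doc_count": 2},
                    {"key": "future_work", "doc_count": 2},
                ]}}},
                {"key": "15", "doc_count": 1, "sentences": {"doc_count": 5, "classes": {"buckets": [
                    {"key": "main_content", "doc_count": 2},
                    {"key": "bibliography", "doc_count": 1},
                    {"key": "future_work", "doc_count": 1},
                    {"key": "introduction", "doc_count": 1},
                ]}}},
            ]
        }
    }
}

for entry in flatten_class_counts(response):
    print(entry)
```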

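Don't be confused by the doc_count values inside the "sentences" aggregation: they are the real number of occurrences of each "class" across your main documents. Each sentence is actually stored as a separate nested document attached to its main document, so the nested doc_count counts sentences, not books.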
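The two levels of doc_count can be checked against each other in the sample output: the year "14" bucket counts 2 main documents, while its nested "sentences" bucket counts 12 sentence documents, and the class buckets (4 + 4 + 2 + 2) add up to those 12. A small Python sketch of that check (the helper name is hypothetical); it only holds when every sentence has exactly one "class" value indexed as a single term, as in the sample data.

```python
def class_buckets_cover_sentences(year_bucket):
    """True if the class buckets (plus sum_other_doc_count) account for every
    nested sentence document counted under this year bucket.

    Assumes each sentence has exactly one "class" value indexed as one term;
    multi-term or missing values would make the totals differ.
    """
    nested = year_bucket["sentences"]
    classes = nested["classes"]
    bucket_total = sum(b["doc_count"] for b in classes["buckets"])
    return bucket_total + classes.get("sum_other_doc_count", 0) == nested["doc_count"]


# The year "14" bucket from the sample output.
year_14 = {
    "key": "14",
    "doc_count": 2,  # main (book) documents
    "sentences": {
        "doc_count": 12,  # nested sentence documents
        "classes": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "introduction", "doc_count": 4},
                {"key": "main_content", "doc_count": 4},
                {"key": "bibliography", "doc_count": 2},
                {"key": "future_work", "doc_count": 2},
            ],
        },
    },
}
print(class_buckets_cover_sentences(year_14))  # True
```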

Hope this helps.

A similar question, "ElasticSearch: total number of distinct occurrences across the entire data", can be found on Stack Overflow: https://stackoverflow.com/questions/37734693/
