ElasticSearch: total count of distinct occurrences across the whole dataset

Tags: elasticsearch

I am very new to ElasticSearch (version 2.3.3), and my data has the following format.

{   
   "title": "Doc 1 title",
   "year": "14",
   "month": "06",
   "sentences": [
        {
          "id": 1,
          "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
          "class": "Introduction",
          "synth": "intr"
        },
        {
          "id": 2,
          "text": "Donec molestie pulvinar odio, ultricies dictum mi porttitor sit amet.",
          "class": "Introduction",
          "synth": "abstr"
        },
        {
          "id": 3,
          "text": "Aliquam id tristique diam. Suspendisse convallis convallis est ut condimentum.",
          "class": "Main_Content",
          "synth": "body"
        },
        {
          "id": 4,
          "text": "Nunc ornare eros at pretium faucibus. Praesent congue cursus aliquet.",
          "class": "Main_Content",
          "synth": "body"
        },
        {
          "id": 5,
          "text": "Integer pellentesque quam ut nulla dignissim hendrerit.",
          "class": "Future_Work",
          "synth": "ftr"
        },
        {
          "id": 6,
          "text": "Pellentesque faucibus vehicula diam.",
          "class": "Bibliography",
          "synth": "bio"
        }
    ]
}

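There are multiple documents like this, e.g. doc1, doc2, ..., doc700.

I am trying to build a query that returns the total number of occurrences of each distinct "class" across the whole batch of documents, ordered by year.

So the result would look something like this: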
{
   "year" : "14",
   "count" : [
       { "Introduction" : 1357 },
       { "Main_Content" : 1021 },
       { "Future_Work" : 490 },
       { "Bibliography" : 241 }
   ],
   "year" : "15",
   "count" : [
       { "Introduction" : 972 } ,
       { "Main_Content" : 712 },
       { "Future_Work" : 335 },
       { "Bibliography" : 81 }
   ]
}

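Is it possible to achieve what I posted? Or would it be easier to do this separately for each "class"?

Thank you very much.

Best answer

This can be done with a Nested Aggregation. If your existing mapping does not map "sentences" as a nested field, you could use something like the following: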

    {
        "mappings": {
            "book": {
                "properties": {
                    "title": {
                        "type": "string"
                    },
                    "month": {
                        "type": "string"
                    },
                    "year": {
                        "type": "string"
                    },
                    "sentences": {
                        "type": "nested",
                        "properties": {
                            "synth": {
                                "type": "string"
                            },
                            "id": {
                                "type": "long"
                            },
                            "text": {
                                "type": "string"
                            },
                            "class": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        }
    }
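
In Elasticsearch 2.x, "string" fields are analyzed by default, so a terms aggregation on "sentences.class" returns lowercased terms such as "introduction" rather than "Introduction". If the exact stored values are wanted, one option is to mark that field as not_analyzed. The following is a minimal Python sketch of that variant; the helper name build_book_mapping and its exact_class flag are made up for illustration, and the "book" type comes from the mapping above.

```python
import json


def build_book_mapping(exact_class=True):
    """Return an index-creation body in ES 2.x mapping syntax (hypothetical helper)."""
    class_field = {"type": "string"}
    if exact_class:
        # ES 2.x: index the value as one unmodified term, so aggregation
        # keys keep their original case ("Main_Content", not "main_content").
        class_field["index"] = "not_analyzed"
    return {
        "mappings": {
            "book": {
                "properties": {
                    "title": {"type": "string"},
                    "month": {"type": "string"},
                    "year": {"type": "string"},
                    "sentences": {
                        # "nested" keeps each sentence as its own hidden document
                        "type": "nested",
                        "properties": {
                            "synth": {"type": "string"},
                            "id": {"type": "long"},
                            "text": {"type": "string"},
                            "class": class_field,
                        },
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_book_mapping(), indent=4))
```

The printed JSON can be used as the request body when creating the index. Changing how an existing field is mapped generally means creating a new index and reindexing the documents.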

Then run the following query:

    {
        "size": 0,
        "aggs": {
            "years": {
                "terms": {
                    "field": "year"
                },
                "aggs": {
                    "sentences": {
                        "nested": {
                            "path": "sentences"
                        },
                        "aggs": {
                            "classes": { "terms": { "field": "sentences.class" } }
                        }
                    }
                }
            }
        }
    }
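
A terms aggregation returns only the top 10 buckets by default and orders them by doc_count, so with many years or classes, or when the years should come back in ascending order as the question asks, the request needs a bit more. The sketch below builds the same query body in Python with an explicit "size" and an "order" on the year buckets. The function name, its parameters and the default of 50 are assumptions for illustration; "_term" is the ordering key in the 2.x line (newer versions use "_key").

```python
import json


def build_class_count_query(max_years=50, max_classes=50):
    """Per-year buckets, each holding per-class counts of nested sentences."""
    return {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            "years": {
                "terms": {
                    "field": "year",
                    "size": max_years,
                    "order": {"_term": "asc"},  # ES 2.x: sort buckets by year value
                },
                "aggs": {
                    "sentences": {
                        # step into the nested sentence documents
                        "nested": {"path": "sentences"},
                        "aggs": {
                            "classes": {
                                "terms": {
                                    "field": "sentences.class",
                                    "size": max_classes,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_class_count_query(), indent=4))
```

Because "year" is mapped as a string, the ascending order is lexicographic, which matches numeric order only while all year values have the same number of digits.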

Here is the result for some sample data:

    "aggregations": {
        "years": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "14",
                    "doc_count": 2,
                    "sentences": {
                        "doc_count": 12,
                        "classes": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "introduction",
                                    "doc_count": 4
                                },
                                {
                                    "key": "main_content",
                                    "doc_count": 4
                                },
                                {
                                    "key": "bibliography",
                                    "doc_count": 2
                                },
                                {
                                    "key": "future_work",
                                    "doc_count": 2
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "15",
                    "doc_count": 1,
                    "sentences": {
                        "doc_count": 5,
                        "classes": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "main_content",
                                    "doc_count": 2
                                },
                                {
                                    "key": "bibliography",
                                    "doc_count": 1
                                },
                                {
                                    "key": "future_work",
                                    "doc_count": 1
                                },
                                {
                                    "key": "introduction",
                                    "doc_count": 1
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }

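Don't be confused by the doc_count values here. The doc_count on each year bucket counts main documents, while the doc_count values under "sentences" and "classes" count the actual occurrences of each "class" in those documents, because every sentence is stored as a nested document associated with its main document.

Hope this helps.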
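
To get from this response to the per-year shape asked for in the question, the buckets can be flattened on the client side. The following is a minimal Python sketch, assuming the search response has already been parsed into a dict; counts_by_year is a hypothetical helper name, and the sample dict is a trimmed copy of the result shown above.

```python
def counts_by_year(response):
    """Map each year key to a {class: nested sentence count} dict."""
    result = {}
    for year_bucket in response["aggregations"]["years"]["buckets"]:
        class_buckets = year_bucket["sentences"]["classes"]["buckets"]
        result[year_bucket["key"]] = {
            bucket["key"]: bucket["doc_count"] for bucket in class_buckets
        }
    return result


# Trimmed copy of the sample aggregation result from the answer.
sample = {
    "aggregations": {
        "years": {
            "buckets": [
                {
                    "key": "14",
                    "doc_count": 2,
                    "sentences": {
                        "doc_count": 12,
                        "classes": {
                            "buckets": [
                                {"key": "introduction", "doc_count": 4},
                                {"key": "main_content", "doc_count": 4},
                                {"key": "bibliography", "doc_count": 2},
                                {"key": "future_work", "doc_count": 2},
                            ]
                        },
                    },
                },
                {
                    "key": "15",
                    "doc_count": 1,
                    "sentences": {
                        "doc_count": 5,
                        "classes": {
                            "buckets": [
                                {"key": "main_content", "doc_count": 2},
                                {"key": "bibliography", "doc_count": 1},
                                {"key": "future_work", "doc_count": 1},
                                {"key": "introduction", "doc_count": 1},
                            ]
                        },
                    },
                },
            ]
        }
    }
}

if __name__ == "__main__":
    for year, counts in counts_by_year(sample).items():
        print(year, counts)
```

In the sample, the class counts for each year add up to the doc_count reported under "sentences" (12 and 5), which is a quick way to check that no class bucket was cut off.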

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/37734693/
