I am very new to ElasticSearch (version 2.3.3), and my data has the following format.
{
"title": "Doc 1 title",
"year": "14",
"month": "06",
"sentences": [
{
"id": 1,
"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
"class": "Introduction",
"synth": "intr"
},
{
"id": 2,
"text": "Donec molestie pulvinar odio, ultricies dictum mi porttitor sit amet.",
"class": "Introduction",
"synth": "abstr"
},
{
"id": 3,
"text": "Aliquam id tristique diam. Suspendisse convallis convallis est ut condimentum.",
"class": "Main_Content",
"synth": "body"
},
{
"id": 4,
"text": "Nunc ornare eros at pretium faucibus. Praesent congue cursus aliquet.",
"class": "Main_Content",
"synth": "body"
},
{
"id": 5,
"text": "Integer pellentesque quam ut nulla dignissim hendrerit.",
"class": "Future_Work",
"synth": "ftr"
},
{
"id": 6,
"text": "Pellentesque faucibus vehicula diam.",
"class": "Bibliography",
"synth": "bio"
}
]
}
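There are multiple such documents, e.g. doc1, doc2, ..., doc700.
I am trying to write a query that returns the total number of occurrences of each distinct "class" across the whole set of documents, grouped and sorted by year.
So the result would look something like this: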
{
"year" : "14",
"count" : [
{ "Introduction" : 1357 },
{ "Main_Content" : 1021 },
{ "Future_Work" : 490 },
{ "Bibliography" : 241 }
],
"year" : "15",
"count" : [
{ "Introduction" : 972 } ,
{ "Main_Content" : 712 },
{ "Future_Work" : 335 },
{ "Bibliography" : 81 }
]
}
Is it possible to achieve what I described? Or would it be easier to do this for each "class" separately?
Thank you very much.
Best answer
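This can be done with a Nested Aggregation. If your existing mapping does not already map "sentences" as a nested field, you could use something like the following: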
{
"mappings": {
"book": {
"properties": {
"title": {
"type": "string"
},
"month": {
"type": "string"
},
"year": {
"type": "string"
},
"sentences": {
"type": "nested",
"properties": {
"synth": {
"type": "string"
},
"id": {
"type": "long"
},
"text": {
"type": "string"
},
"class": {
"type": "string"
}
}
}
}
}
}
}
Then run the following query:
{
"size": 0,
"aggs": {
"years": {
"terms": {
"field": "year"
},
"aggs" : {
"sentences" : {
"nested" : {
"path" : "sentences"
},
"aggs" : {
"classes" : { "terms" : { "field" : "sentences.class" } }
}
}
}
}
}
}
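The question also asks for the result to be sorted by year, whereas a terms aggregation orders its buckets by doc_count by default and returns only the top 10 terms. Here is a minimal sketch that builds the same request body as a Python dict and adds an explicit order and size. The function name build_class_count_query and the max_years and max_classes parameters are my own illustrative choices, not part of the question or of any Elasticsearch client. The "_term" order key is the 2.x spelling; since "year" is a string field, the ordering is lexicographic.

```python
import json


def build_class_count_query(max_years=50, max_classes=20):
    """Build the aggregation body: years -> nested sentences -> classes.

    max_years and max_classes are hypothetical knobs for the bucket
    limits; adjust them to the number of distinct values you expect.
    """
    return {
        "size": 0,  # no hits needed, only the aggregations
        "aggs": {
            "years": {
                "terms": {
                    "field": "year",
                    "size": max_years,
                    # Sort buckets by the year value instead of doc_count.
                    "order": {"_term": "asc"},
                },
                "aggs": {
                    "sentences": {
                        # Step into the nested sentence documents.
                        "nested": {"path": "sentences"},
                        "aggs": {
                            "classes": {
                                "terms": {
                                    "field": "sentences.class",
                                    "size": max_classes,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body so it can be pasted into a search request.
    print(json.dumps(build_class_count_query(), indent=2))
```

The printed JSON can be sent as the body of a search request against the index, in the same way as the hand-written query.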
Here is the result for some sample data:
"aggregations": {
"years": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "14",
"doc_count": 2,
"sentences": {
"doc_count": 12,
"classes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "introduction",
"doc_count": 4
},
{
"key": "main_content",
"doc_count": 4
},
{
"key": "bibliography",
"doc_count": 2
},
{
"key": "future_work",
"doc_count": 2
}
]
}
}
},
{
"key": "15",
"doc_count": 1,
"sentences": {
"doc_count": 5,
"classes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "main_content",
"doc_count": 2
},
{
"key": "bibliography",
"doc_count": 1
},
{
"key": "future_work",
"doc_count": 1
},
{
"key": "introduction",
"doc_count": 1
}
]
}
}
}
]
}
}
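Don't be confused by the doc_count values here. In the "years" buckets, doc_count is the number of main documents for that year; in the "classes" buckets, it is the actual number of occurrences of each "class", because every sentence is stored as a separate nested document associated with its main document. The keys appear lowercased because "class" is mapped as an analyzed string; in 2.x, adding "index": "not_analyzed" to that field keeps the original values such as "Introduction".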
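If you want something closer to the per-year shape sketched in the question, the response can be flattened on the client side. This is only a sketch under assumptions: counts_by_year is a name I made up, and the sample dict is a trimmed copy of the response above rather than real output from your index.

```python
def counts_by_year(response):
    """Flatten the aggregation response into {year: {class: count}}."""
    result = {}
    for year_bucket in response["aggregations"]["years"]["buckets"]:
        # Each year bucket holds the nested "sentences" aggregation,
        # which in turn holds one bucket per distinct class.
        class_buckets = year_bucket["sentences"]["classes"]["buckets"]
        result[year_bucket["key"]] = {
            bucket["key"]: bucket["doc_count"] for bucket in class_buckets
        }
    return result


# Trimmed copy of the sample response shown above (illustrative only).
sample_response = {
    "aggregations": {
        "years": {
            "buckets": [
                {
                    "key": "14",
                    "doc_count": 2,
                    "sentences": {
                        "doc_count": 12,
                        "classes": {
                            "buckets": [
                                {"key": "introduction", "doc_count": 4},
                                {"key": "main_content", "doc_count": 4},
                            ]
                        },
                    },
                },
                {
                    "key": "15",
                    "doc_count": 1,
                    "sentences": {
                        "doc_count": 5,
                        "classes": {
                            "buckets": [
                                {"key": "main_content", "doc_count": 2},
                                {"key": "introduction", "doc_count": 1},
                            ]
                        },
                    },
                },
            ]
        }
    }
}

if __name__ == "__main__":
    for year, counts in sorted(counts_by_year(sample_response).items()):
        print(year, counts)
```

Sorting the years in the client, as in the last loop, is an alternative to ordering the buckets in the query itself.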
Hope this helps.
The original question about counting distinct occurrences across a whole data set in ElasticSearch is on Stack Overflow: https://stackoverflow.com/questions/37734693/