In Elasticsearch 6.2, I have a parent-child relationship:
Document -> NamedEntity
I want to aggregate NamedEntity by its mention field and get, for each named entity, the number of documents that contain it.
My use case is:
doc1 contains 'NER'(_id=ner11), 'NER'(_id=ner12)
doc2 contains 'NER'(_id=ner2)
The parent-child relation is implemented with a join field. In Document I have the field:
join: {
name: "Document"
}
And in the NamedEntity child:
join: {
name: "NamedEntity",
parent: "parent_id"
}
_routing is set to parent_id.
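To make the routing rule concrete, here is a minimal Python sketch that builds the request path and body for indexing one child. The helper name child_index_request is hypothetical; the index name, the doc type and the field names are the ones used in the requests of this question. It only builds strings and never contacts a cluster.

```python
import json


def child_index_request(index, child_id, parent_id, mention):
    """Build (path, body) for indexing a NamedEntity child of parent_id.

    Hypothetical helper: the child is routed with its parent's id so that
    parent and child end up on the same shard, as the join field requires.
    """
    path = "/{}/doc/{}?routing={}".format(index, child_id, parent_id)
    body = {
        "type": "NamedEntity",
        "join": {"name": "NamedEntity", "parent": parent_id},
        "mention": mention,
    }
    return path, json.dumps(body)


path, body = child_index_request("datashare-testjs", "ner11", "doc1", "NER")
print("PUT", path)
print(body)
```

The same parent id appears twice on purpose: once as the routing value in the path and once as join.parent in the body.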
So I tried a terms sub-aggregation:
curl -XPOST elasticsearch:9200/datashare-testjs/_search?pretty -H 'Content-Type: application/json' -d '
{"query":{"term":{"type":"NamedEntity"}},
"aggs":{
"mentions":{
"terms":{
"field":"mention"
},
"aggs":{
"docs":{
"terms":{"field":"join"}
}
}
}
}
}'
I get the following response:
"aggregations" : {
"mentions" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "NER",
"doc_count" : 3,
"docs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "NamedEntity",
"doc_count" : 3 <-- WRONG ! There are 2 distinct documents
}
]
}
}
]
}
I find the expected 3 occurrences in mentions.buckets.doc_count. But in mentions.buckets.docs.buckets.doc_count I would like to get only 2 documents (not 3), like a select count distinct.
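The gap between the two numbers can be reproduced without a cluster. The following Python sketch is not how Elasticsearch computes anything internally; it only mimics the counting on the three children from the use case above, with plain tuples standing in for the indexed documents.

```python
from collections import defaultdict

# (child _id, parent _id, mention) for the three NamedEntity children
children = [
    ("ner11", "doc1", "NER"),
    ("ner12", "doc1", "NER"),
    ("ner2", "doc2", "NER"),
]

doc_count = defaultdict(int)  # what a terms bucket reports: one per child
parents = defaultdict(set)    # distinct parent ids seen for each mention

for _child_id, parent_id, mention in children:
    doc_count[mention] += 1
    parents[mention].add(parent_id)

distinct_parents = {m: len(ids) for m, ids in parents.items()}

print(doc_count["NER"])         # 3
print(distinct_parents["NER"])  # 2
```

A terms bucket counts matching children, so doc1 is counted once per entity it contains; the wanted figure is the size of the set of parent ids.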
If I aggregate with "terms":{"field":"join.parent"}, I get:
...
"docs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
...
I also tried a cardinality aggregation on the join field, which returned the value 1, and a cardinality aggregation on join.parent, which returned the value 0.
So how do you make a distinct count aggregation on the parents without using a reverse nested aggregation?
As @AndreiStefan asked, here is the mapping. It is a simple 1-N relation between Document (content) and NamedEntity (mention) in an ES 6 mapping (the fields are defined at the same level):
curl -XPUT elasticsearch:9200/datashare-testjs -H 'Content-Type: application/json' -d '
{
"mappings": {
"doc": {
"properties": {
"content": {
"type": "text",
"index_options": "offsets"
},
"type": {
"type": "keyword"
},
"join": {
"type": "join",
"relations": {
"Document": "NamedEntity"
}
},
"mention": {
"type": "keyword"
}
}
}
}}'
And the requests for a minimal data set:
curl -XPUT elasticsearch:9200/datashare-testjs/doc/doc1 -H 'Content-Type: application/json' -d '{"type": "Document", "join": {"name": "Document"}, "content": "a NER document contains 2 NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/doc2 -H 'Content-Type: application/json' -d '{"type": "Document", "join": {"name": "Document"}, "content": "another NER document"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner11?routing=doc1 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc1"}, "mention": "NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner12?routing=doc1 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc1"}, "mention": "NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner2?routing=doc2 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc2"}, "mention": "NER"}'
Best answer
"aggs": {
"mentions": {
"terms": {
"field": "mention"
},
"aggs": {
"docs": {
"terms": {
"field": "join"
},
"aggs": {
"uniques": {
"cardinality": {
"field": "join#Document"
}
}
}
}
}
}
}
Or, if you just want the count:
"aggs": {
"mentions": {
"terms": {
"field": "mention"
},
"aggs": {
"uniques": {
"cardinality": {
"field": "join#Document"
}
}
}
}
}
If you need custom ordering (by unique count):
"aggs": {
"mentions": {
"terms": {
"field": "mention",
"order": {
"uniques": "desc"
}
},
"aggs": {
"uniques": {
"cardinality": {
"field": "join#Document"
}
}
}
}
}
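Some context on why this should work, as far as I understand it: next to the relation name, the join field in ES 6 indexes an internal per-relation field named after the join field and the parent relation (here join#Document) that holds the parent id, so a cardinality aggregation on it counts distinct parents. Cardinality is an approximate count, although it is expected to be accurate for small numbers of distinct values. Below is a Python sketch that assembles the full search body and reads the counts back from a response. The helper names build_search_body and unique_parents_per_mention are hypothetical, "size": 0 is my own addition to skip the hits, and the query is the one from the question.

```python
import json


def build_search_body(order_by_uniques=False):
    """Assemble the body: terms on mention, cardinality on join#Document."""
    terms = {"field": "mention"}
    if order_by_uniques:
        # Sort the mention buckets by their distinct-parent count.
        terms["order"] = {"uniques": "desc"}
    return {
        "size": 0,  # assumption: only the aggregations are needed, not the hits
        "query": {"term": {"type": "NamedEntity"}},
        "aggs": {
            "mentions": {
                "terms": terms,
                "aggs": {
                    "uniques": {"cardinality": {"field": "join#Document"}}
                },
            }
        },
    }


def unique_parents_per_mention(response):
    """Map each mention to its distinct-parent count from a search response."""
    buckets = response["aggregations"]["mentions"]["buckets"]
    return {b["key"]: b["uniques"]["value"] for b in buckets}


print(json.dumps(build_search_body(order_by_uniques=True), indent=2))
```

The printed JSON can be sent as the body of the _search request shown in the question; unique_parents_per_mention expects the parsed JSON response of that request.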
The original question, on distinct parent-child aggregation in Elasticsearch without nested fields, is on Stack Overflow: https://stackoverflow.com/questions/49262205/