Elasticsearch distinct parent-child aggregation without nested fields

Tags: elasticsearch

In Elasticsearch 6.2, I have a parent-child relationship:

Document -> NamedEntity

I want to aggregate NamedEntity on the mention field, so that the result gives the number of documents that contain each named entity.

My use case is:

doc1 contains 'NER'(_id=ner11), 'NER'(_id=ner12)
doc2 contains 'NER'(_id=ner2)

The parent-child relationship is implemented with a join field. In Document I have a field:

join: {
  name: "Document"
}

In the NamedEntity child:

join: {
  name: "NamedEntity",
  parent: "parent_id"
}

_routing is set to parent_id.
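The join field requires each child to be indexed on the same shard as its parent, which is why the routing value must equal the parent id. The question's own snippets are curl calls against a live cluster, so here is a minimal offline Python sketch of how one NamedEntity request could be assembled. The function name child_request is hypothetical; the index name, type name, and field layout are copied from the question's mapping and dataset.

```python
import json

INDEX = "datashare-testjs"  # index name used in the question
DOC_TYPE = "doc"            # single mapping type from the ES 6 mapping

def child_request(child_id, parent_id, mention):
    """Hypothetical helper: return (path, body) for one NamedEntity child.

    The routing value is set to the parent id so the child lands on the
    same shard as its Document parent, as the join field requires.
    """
    path = f"/{INDEX}/{DOC_TYPE}/{child_id}?routing={parent_id}"
    body = {
        "type": "NamedEntity",
        "join": {"name": "NamedEntity", "parent": parent_id},
        "mention": mention,
    }
    return path, json.dumps(body)

path, body = child_request("ner11", "doc1", "NER")
print(path)  # /datashare-testjs/doc/ner11?routing=doc1
print(body)
```

Deriving the routing from the same argument that fills join.parent means the two values cannot drift apart.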

So I tried a terms sub-aggregation:

curl -XPOST elasticsearch:9200/datashare-testjs/_search?pretty -H 'Content-Type: application/json' -d '
{"query":{"term":{"type":"NamedEntity"}},
 "aggs":{
   "mentions":{
     "terms":{
       "field":"mention"
     },
     "aggs":{
       "docs":{
         "terms":{"field":"join"}
       }
     }
   }
 }
}'

I get the following response:

"aggregations" : {
  "mentions" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "NER",
        "doc_count" : 3,
        "docs" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "NamedEntity",
              "doc_count" : 3 <-- WRONG ! There are 2 distinct documents
            }
          ]
        }
      }
    ]
  }
}

I get the expected 3 occurrences in mentions.buckets.doc_count. But in mentions.buckets.docs.buckets.doc_count I expect only 2 documents (not 3), just like a select count distinct.
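The difference between the two numbers can be shown without a cluster. A terms bucket's doc_count counts matching child documents, while the question wants the number of distinct parents per mention. The following in-memory Python sketch models the three NamedEntity children from the use case as plain tuples (a simplification, not how Elasticsearch stores them) and computes both figures.

```python
from collections import defaultdict

# The three NamedEntity children from the use case:
# (child _id, parent _id, mention)
children = [
    ("ner11", "doc1", "NER"),
    ("ner12", "doc1", "NER"),
    ("ner2", "doc2", "NER"),
]

def mention_counts(rows):
    """Per mention: (number of child docs, number of distinct parents).

    The first value corresponds to what a terms bucket's doc_count reports;
    the second is the COUNT DISTINCT over parent ids the question asks for.
    """
    child_count = defaultdict(int)
    parents = defaultdict(set)
    for _child_id, parent_id, mention in rows:
        child_count[mention] += 1
        parents[mention].add(parent_id)
    return {m: (child_count[m], len(parents[m])) for m in child_count}

print(mention_counts(children))  # {'NER': (3, 2)}
```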

If I aggregate with "terms":{"field":"join.parent"}, I get:

...
"docs" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [ ]
}
...

I tried a cardinality aggregation on the join field and got the value 1, and a cardinality aggregation on join.parent returns the value 0.
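According to the Elasticsearch parent-join documentation, the join field indexes the relation name under the join field itself. It also creates one extra field per parent relation, named after the join field, then `#`, then the parent relation name, so here `join#Document`. That field holds the parent _id for a child document and the document's own _id for a parent. No field called join.parent is indexed, which is consistent with the empty buckets and the 0 above. The Python sketch below is a simplified, hypothetical model of those indexed values (not the real Lucene storage). It computes an exact distinct count, which the cardinality aggregation approximates.

```python
def indexed_fields(doc_id, join):
    """Simplified model of the values the join field puts in the index."""
    fields = {"join": join["name"]}  # relation name, e.g. "NamedEntity"
    if "parent" in join:
        fields["join#Document"] = join["parent"]  # child: its parent's _id
    else:
        fields["join#Document"] = doc_id  # parent: its own _id
    return fields

# The three NamedEntity children matched by the term query on type
named_entities = [
    indexed_fields("ner11", {"name": "NamedEntity", "parent": "doc1"}),
    indexed_fields("ner12", {"name": "NamedEntity", "parent": "doc1"}),
    indexed_fields("ner2", {"name": "NamedEntity", "parent": "doc2"}),
]

def cardinality(docs, field):
    """Exact distinct count of a field's values; missing fields are skipped."""
    return len({d[field] for d in docs if field in d})

print(cardinality(named_entities, "join"))           # 1
print(cardinality(named_entities, "join.parent"))    # 0
print(cardinality(named_entities, "join#Document"))  # 2
```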

So how can I get a distinct count of parents in the aggregation without using a reverse nested aggregation?


As @AndreiStefan asked, here is the mapping. It is a simple 1-N relationship between Document (content) and NamedEntity (mention) in an ES 6 mapping (the fields are defined at the same level):

curl -XPUT elasticsearch:9200/datashare-testjs -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "index_options": "offsets"
        },
        "type": {
          "type": "keyword"
        },
        "join": {
          "type": "join",
          "relations": {
            "Document": "NamedEntity"
          }
        },
        "mention": {
          "type": "keyword"
        }
      }
    }
  }
}'

And the requests for a minimal dataset:

curl -XPUT elasticsearch:9200/datashare-testjs/doc/doc1 -H 'Content-Type: application/json' -d '{"type": "Document", "join": {"name": "Document"}, "content": "a NER document contains 2 NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/doc2 -H 'Content-Type: application/json' -d '{"type": "Document", "join": {"name": "Document"}, "content": "another NER document"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner11?routing=doc1 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc1"}, "mention": "NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner12?routing=doc1 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc1"}, "mention": "NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner2?routing=doc2 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc2"}, "mention": "NER"}'

Best answer

{
  "aggs": {
    "mentions": {
      "terms": {
        "field": "mention"
      },
      "aggs": {
        "docs": {
          "terms": {
            "field": "join"
          },
          "aggs": {
            "uniques": {
              "cardinality": {
                "field": "join#Document"
              }
            }
          }
        }
      }
    }
  }
}
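To make the structure of this body explicit, here is a minimal Python sketch that builds it as a dict, with the parent-id field name derived from the join field and parent relation names. Two things are assumptions, not part of the answer: the term query is taken from the question, and "size": 0 is added on the guess that only the aggregations are needed. The function name mentions_agg_body is hypothetical.

```python
import json

def mentions_agg_body(join_field="join", parent_name="Document"):
    """Hypothetical builder: terms on mention, then terms on the relation
    name, then a cardinality on the internal <join_field>#<parent_name> field."""
    parent_id_field = f"{join_field}#{parent_name}"
    return {
        "size": 0,  # assumption: no hits needed, only aggregations
        "query": {"term": {"type": "NamedEntity"}},  # query from the question
        "aggs": {
            "mentions": {
                "terms": {"field": "mention"},
                "aggs": {
                    "docs": {
                        "terms": {"field": join_field},
                        "aggs": {
                            "uniques": {"cardinality": {"field": parent_id_field}}
                        },
                    }
                },
            }
        },
    }

print(json.dumps(mentions_agg_body(), indent=2))
```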

Or, if you just want the count:

{
  "aggs": {
    "mentions": {
      "terms": {
        "field": "mention"
      },
      "aggs": {
        "uniques": {
          "cardinality": {
            "field": "join#Document"
          }
        }
      }
    }
  }
}
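With this count-only variant, each mention bucket carries a uniques.value with the number of distinct parent documents. The cardinality aggregation is approximate, though the Elasticsearch documentation says counts below its precision_threshold are expected to be close to accurate. Below is a minimal Python sketch of reading those values out of a response. The sample dict is hypothetical: it is shaped like a terms/cardinality response and filled with the values expected for the minimal dataset, not captured from a real cluster.

```python
def unique_parents_per_mention(response):
    """Map each mention bucket key to its cardinality (uniques) value."""
    buckets = response["aggregations"]["mentions"]["buckets"]
    return {b["key"]: b["uniques"]["value"] for b in buckets}

# Hypothetical response for the minimal dataset (expected values, not real output)
sample = {
    "aggregations": {
        "mentions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "NER", "doc_count": 3, "uniques": {"value": 2}}
            ],
        }
    }
}

print(unique_parents_per_mention(sample))  # {'NER': 2}
```

Note that doc_count (3 child documents) and uniques.value (2 parents) sit side by side in the same bucket.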

If you need custom ordering (by unique count):

{
  "aggs": {
    "mentions": {
      "terms": {
        "field": "mention",
        "order": {
          "uniques": "desc"
        }
      },
      "aggs": {
        "uniques": {
          "cardinality": {
            "field": "join#Document"
          }
        }
      }
    }
  }
}
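Ordering by the uniques sub-aggregation can rank mentions differently from the default doc_count order. One entity mentioned many times in a single document would sort first by doc_count but last by distinct documents. The Python sketch below models that reordering on hypothetical buckets; the "ACME" mention and its counts are invented for illustration. Breaking ties on the key is a choice made in this sketch, not a statement about how Elasticsearch breaks ties.

```python
# Hypothetical buckets: "ACME" is an invented mention that appears
# 5 times but in only 1 document; "NER" matches the question's data.
buckets = [
    {"key": "ACME", "doc_count": 5, "uniques": {"value": 1}},
    {"key": "NER", "doc_count": 3, "uniques": {"value": 2}},
]

def order_by_uniques_desc(buckets):
    """Sort buckets by distinct parent count, highest first.

    Ties are broken by key ascending (a choice made in this sketch).
    """
    return sorted(buckets, key=lambda b: (-b["uniques"]["value"], b["key"]))

print([b["key"] for b in order_by_uniques_desc(buckets)])  # ['NER', 'ACME']
```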

A similar question about distinct parent-child aggregation in Elasticsearch without nested fields can be found on Stack Overflow: https://stackoverflow.com/questions/49262205/
