Elasticsearch: filtering array data with aggs

Tags: elasticsearch

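I use Elasticsearch to store my biological data.

I am trying to run a query with filtered aggs, but the data it returns is not what I want.
The problem comes from the fact that every specimen has a "d_" attribute, which is an array. I need to aggregate on only some elements of that array, but I can't manage to filter them.

// I edited the data by hand to make it easier to read, so it may contain some typos

Sample of my data: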

[
    {
        "_index": "botanique",
        "_type": "specimens",
        "_id": "227CB8A3E2834AAEB50B1ECF6B672180",
        "_score": 1,
        "_source": {
            ....
            "d_": [
                {    // -------------- don't want this
                    "taxonid": "BB7C33A3126648D095BEDDABB0BD2758",
                    "scientificname": "Lastreopsis effusa",
                    "scientificnameauthorship": "(Sw.) Tindale"
                },
                {    // -------------- want this
                    "taxonid": "704FC303D7F74C02912D0FEB5C6FC55D",
                    "scientificname": "Parapolystichum effusum",
                    "scientificnameauthorship": "(sw.) copel."
                }
            ]
        }
    },
    {
        "_index": "botanique",
        "_type": "specimens",
        "_id": "11A22DE8E4AD45BBAC7783E508079DCD",
        "_score": 1,
        "_source": {
            ....
            "d_": [
                {    // -------------- want this
                    "taxonid": "A94D243348DF4CAD926B6C3965D948A3",
                    "scientificname": "Parapolystichum effusum",
                    "scientificnameauthorship": "(Sw.) Ching"
                },
                {    // -------------- don't want this
                    "taxonid": "B01A89AA961A46F2984722C311DC2BDD",
                    "scientificname": "Lastreopsis effusa",
                    "scientificnameauthorship": "(willd. ex schkuhr) proctor"
                }
            ]
        }
    },
    {
        "_index": "botanique",
        "_type": "specimens",
        "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
        "_score": 1,
        "_source": {
            ...
            "d_": [
                {    // -------------- want this
                    "taxonid": "D70C4478D2B0437AA940994E98D696C5",
                    "scientificname": "Parapolystichum effusum",
                    "scientificnameauthorship": "(Sw.) Ching"
                },
                {    // -------------- don't want this
                    "taxonid": "011E5DA526FC4098953DBD1F9E5F4424",
                    "scientificname": "Lastreopsis effusa",
                    "scientificnameauthorship": "(Sw.) Tindale"
                }
            ]
        }
    }
]

For example, I want an aggs over all "d_.scientificnameauthorship" and "d_.taxonid" values where "d_.scientificname" equals "parapolystichum effusum".
So I should (hopefully) get "scientificnameauthorship": "(sw.) copel.", "(Sw.) Ching", but not "(willd. ex schkuhr) proctor". I failed...

My query:
{
  "_source": ["d_" ],
  "size": 3,
  "query": {
    "filtered": {"filter": {"bool": {"must": [{"term": {
                "d_.scientificname": "parapolystichum effusum"
    }}] } }}
  },
  "aggs": {
    "scientificname": {
      "terms": {
        "field": "d_.scientificname",
        "size": 1,
        "include": {
          "pattern": "parapolystichum effusum",
          "flags": "CANON_EQ|CASE_INSENSITIVE"
        }
      },
      "aggs": {
        "scientificnameauthorship": {
          "terms": {
            "field": "d_.scientificnameauthorship",
            "size": 10
          }
        }
      }
    }
  }
}

The returned data includes every "scientificnameauthorship" of the matching specimens:
{
    "aggregations": {
        "scientificname": {
            "buckets": [{
                "key": "parapolystichum effusum",
                "doc_count": 269,
                "scientificnameauthorship": {
                    "buckets": [
                        {   // ------ want this
                            "key": "(sw.) ching",
                            "doc_count": 269
                        },
                        {   // ------ want this
                            "key": "(sw.) copel.",
                            "doc_count": 34
                        },
                        {   // ------ don't want this
                            "key": "(sw.) tindale",
                            "doc_count": 262
                        },
                        {   // ------ don't want this
                            "key": "(willd. ex schkuhr) proctor",
                            "doc_count": 7
                        },
                        {   // ------ don't want this
                            "key": "fée",
                            "doc_count": 2
                        }
                    ]
                }
            }]
        }
    }
}
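
The extra buckets are consistent with how Elasticsearch handles an array of objects under the default object mapping: the sub-fields are flattened into multi-valued fields on the specimen, so the link between a "scientificname" and its own "scientificnameauthorship" is lost. The sketch below is only a plain-Python simulation of that behaviour on the three sample specimens, not Elasticsearch code. It assumes "d_" is not mapped as nested (the question does not say) and that values are indexed as single lowercase terms, as the bucket keys suggest. `flatten` and `flat_terms_agg` are hypothetical helper names.

```python
from collections import Counter

# The three sample specimens from the question, reduced to the "d_" array.
SPECIMENS = [
    {"d_": [
        {"scientificname": "Lastreopsis effusa", "scientificnameauthorship": "(Sw.) Tindale"},
        {"scientificname": "Parapolystichum effusum", "scientificnameauthorship": "(sw.) copel."},
    ]},
    {"d_": [
        {"scientificname": "Parapolystichum effusum", "scientificnameauthorship": "(Sw.) Ching"},
        {"scientificname": "Lastreopsis effusa", "scientificnameauthorship": "(willd. ex schkuhr) proctor"},
    ]},
    {"d_": [
        {"scientificname": "Parapolystichum effusum", "scientificnameauthorship": "(Sw.) Ching"},
        {"scientificname": "Lastreopsis effusa", "scientificnameauthorship": "(Sw.) Tindale"},
    ]},
]


def flatten(doc, path="d_"):
    """Mimic object-array flattening: one set of values per sub-field."""
    flat = {}
    for item in doc[path]:
        for key, value in item.items():
            # Assumption: each value is indexed as one lowercase term.
            flat.setdefault(f"{path}.{key}", set()).add(value.lower())
    return flat


def flat_terms_agg(docs, filter_field, wanted, agg_field):
    """Filter whole documents, then count terms of another flattened field."""
    counts = Counter()
    for doc in docs:
        flat = flatten(doc)
        if wanted in flat.get(filter_field, set()):
            # Every value of agg_field in the document is counted,
            # including values from array items that did not match.
            counts.update(flat.get(agg_field, set()))
    return dict(counts)


buckets = flat_terms_agg(
    SPECIMENS,
    "d_.scientificname",
    "parapolystichum effusum",
    "d_.scientificnameauthorship",
)
print(buckets)
```

In this simulation "(sw.) tindale" and "(willd. ex schkuhr) proctor" are counted as well, because the filter selects whole specimens rather than individual array items.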
  • How can I do this filtering inside the aggs query?
  • How can I get only the matching array items in the hits?

  • That is, get this:
    {
        "hits": {
            "total": 269,
            "max_score": 1,
            "hits": [
                {
                    "_index": "botanique",
                    "_type": "specimens",
                    "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
                    "_score": 1,
                    "_source": {
                        ...
                        "d_": [
                            {    // -------------- want this
                                "taxonid": "D70C4478D2B0437AA940994E98D696C5",
                                "scientificname": "Parapolystichum effusum",
                                "scientificnameauthorship": "(Sw.) Ching"
                            }
                        ]
                    }
                }
            ]
        }
    }
    

    instead of this:
    {
        "hits": {
            "total": 269,
            "max_score": 1,
            "hits": [
                {
                    "_index": "botanique",
                    "_type": "specimens",
                    "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
                    "_score": 1,
                    "_source": {
                        ...
                        "d_": [
                            {    // -------------- want this
                                "taxonid": "D70C4478D2B0437AA940994E98D696C5",
                                "scientificname": "Parapolystichum effusum",
                                "scientificnameauthorship": "(Sw.) Ching"
                            },
                            {    // -------------- don't want this
                                "taxonid": "011E5DA526FC4098953DBD1F9E5F4424",
                                "scientificname": "Lastreopsis effusa",
                                "scientificnameauthorship": "(Sw.) Tindale"
                            }
                        ]
                    }
                }
            ]
        }
    }
    

    Thank you very much.

    // Edit 1

    I also tried putting a filter inside the aggs like this, but it doesn't work:
    {
        "query": {
            "filtered": {"filter": {"bool": {"must": [{"term": {
                        "d_.scientificname": "parapolystichum effusum"
            }}] } }}
        },
        "aggs" : {
            "scientificname" : {
                "filter" : {"term": {
                        "d_.scientificname": "parapolystichum effusum"
                }},
                "aggs": {
                    "scientificnameauthorship": {
                      "terms": {
                        "field": "d_.scientificnameauthorship",
                        "size": 10
                      }
                    }
                  }
            }
        }
    }
    

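    Best answer

    You can use a nested aggregation as the parent aggregation. Inside it, add a filter aggregation that filters the entries of the array (list data), and attach a terms aggregation to that filter as a sub-aggregation. Note that a nested aggregation only works on a field that is mapped with type "nested".
    https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-aggregations-bucket-nested-aggregation.html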
    Example query:

    {
      "aggs": {
        "filteredaggs": {
          "nested": {
            "path": "d_"
          },
          "aggs": {
            "maxdays": {
              "filter": {
                "terms": {
                  "d_.scientificname": ["xyz", "pqr"]
                }
              },
              "aggs": {
                "myfinalaggregator": {
                  "terms": {
                    "field": "d_.scientificnameauthorship"
                  }
                }
              }
            }
          }
        }
      }
    }
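
To make the nested, filter, terms chain concrete, here is a minimal plain-Python simulation run on the three sample specimens from the question. It is a sketch, not Elasticsearch code: it assumes "d_" is mapped as nested, so each array item is evaluated on its own, and that values are compared as single lowercase terms, as the bucket keys in the question suggest. `nested_filtered_terms` is a hypothetical helper name.

```python
from collections import Counter

# The three sample specimens from the question, reduced to the "d_" array.
SPECIMENS = [
    {"d_": [
        {"scientificname": "Lastreopsis effusa", "scientificnameauthorship": "(Sw.) Tindale"},
        {"scientificname": "Parapolystichum effusum", "scientificnameauthorship": "(sw.) copel."},
    ]},
    {"d_": [
        {"scientificname": "Parapolystichum effusum", "scientificnameauthorship": "(Sw.) Ching"},
        {"scientificname": "Lastreopsis effusa", "scientificnameauthorship": "(willd. ex schkuhr) proctor"},
    ]},
    {"d_": [
        {"scientificname": "Parapolystichum effusum", "scientificnameauthorship": "(Sw.) Ching"},
        {"scientificname": "Lastreopsis effusa", "scientificnameauthorship": "(Sw.) Tindale"},
    ]},
]


def nested_filtered_terms(docs, path, filter_field, wanted, agg_field):
    """Simulate nested -> filter -> terms: filter and count per array item."""
    counts = Counter()
    for doc in docs:
        for item in doc.get(path, []):
            # Step 1 (nested): each array item is treated as its own document.
            # Step 2 (filter): keep only items whose own name matches.
            if item[filter_field].lower() == wanted:
                # Step 3 (terms): count the authorship of that same item.
                counts[item[agg_field].lower()] += 1
    return dict(counts)


buckets = nested_filtered_terms(
    SPECIMENS,
    "d_",
    "scientificname",
    "parapolystichum effusum",
    "scientificnameauthorship",
)
print(buckets)
```

Because the filter is applied to each array item rather than to the whole specimen, only the authorships that sit next to "Parapolystichum effusum" are counted, and the counts are per array item, not per specimen.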
    

    Hope this works for you.
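
The answer does not cover the second question, returning only the matching array items in the hits. One option that relies on no particular Elasticsearch feature is to trim "d_" on the client after the response arrives. The following is a minimal sketch under that assumption. `keep_matching_items` is a hypothetical helper, and the sample response is the one from the question with the elided fields left out.

```python
import copy

# Response shaped like the one in the question (elided fields omitted).
RESPONSE = {
    "hits": {
        "total": 269,
        "max_score": 1,
        "hits": [
            {
                "_index": "botanique",
                "_type": "specimens",
                "_id": "1647F5E23D304EFAAB9D3E3BE80FD3CE",
                "_score": 1,
                "_source": {
                    "d_": [
                        {
                            "taxonid": "D70C4478D2B0437AA940994E98D696C5",
                            "scientificname": "Parapolystichum effusum",
                            "scientificnameauthorship": "(Sw.) Ching",
                        },
                        {
                            "taxonid": "011E5DA526FC4098953DBD1F9E5F4424",
                            "scientificname": "Lastreopsis effusa",
                            "scientificnameauthorship": "(Sw.) Tindale",
                        },
                    ]
                },
            }
        ],
    }
}


def keep_matching_items(response, path, field, wanted):
    """Return a copy of the response whose array keeps only matching items."""
    trimmed = copy.deepcopy(response)  # leave the original response untouched
    for hit in trimmed["hits"]["hits"]:
        items = hit["_source"].get(path, [])
        hit["_source"][path] = [
            item for item in items
            if item.get(field, "").lower() == wanted.lower()
        ]
    return trimmed


trimmed = keep_matching_items(
    RESPONSE, "d_", "scientificname", "parapolystichum effusum"
)
print(trimmed["hits"]["hits"][0]["_source"]["d_"])
```

This costs extra transfer, since every hit still carries its full "d_" array over the wire, but it leaves the mapping and the query unchanged.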

    For more on filtering array data with aggs in Elasticsearch, see the similar question on Stack Overflow: https://stackoverflow.com/questions/33626289/
