elasticsearch - ElasticSearch: return unique results by field value

Tags: elasticsearch group-by unique-values

I have 3 "places", each with a type and a location:

PUT places
{
  "mappings": {
    "test": {
      "properties": {
        "type": { "type": "keyword" },
        "location": { "type": "geo_point" }
      }
    }
  }
}

POST places/test
{
   "type" : "A",
   "location": {
      "lat": 1.378446,
      "lon": 103.763427
   }
}

POST places/test
{
   "type" : "B",
   "location": {
      "lat": 1.478446,
      "lon": 104.763427
   }
}

POST places/test
{
   "type" : "A",
   "location": {
      "lat": 1.278446,
      "lon": 102.763427
   }
}

I want to retrieve only one place per "type": the one closest to an arbitrary location, say "lat": 1.178446, "lon": 101.763427.

For my example, the result should consist of exactly 2 elements (one for "type: A" and one for "type: B").

I would also like to avoid "aggregations", because I need the _source of each place.

Any help would be great.

Best answer

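Without aggregations, this does not seem to be possible in a single query.
It can be done with a top_hits aggregation, which returns the _source of each matching document inside its bucket.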

The following was tested with Elasticsearch 6:

POST /places/_search?size=0
{
  "aggs": {
    "group-by-type": {
      "terms": { "field": "type" },
      "aggs": {
        "min-distance": {
          "top_hits": {
            "sort": {
              "_script": {
                "type": "number",
                "script": {
                  "source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
                  "lang": "painless"
                },
                "order": "asc"
              }
            },
            "_source": {
              "includes": [ "type", "location" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

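Note that I compute the distance as |location.x - givenPoint.x| + |location.y - givenPoint.y|, where x is the latitude and y is the longitude. This is a Manhattan distance in degrees, not a geographic distance.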
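To see what that sort key does on the sample data, here is a minimal Python sketch that applies the same formula client-side and compares it with a great-circle (haversine) distance. The helper names (`degree_distance`, `haversine_km`, `nearest_per_type`) and the Earth radius of 6371 km are assumptions made for illustration; nothing here runs inside Elasticsearch.

```python
import math

# Sample documents from the question (type and geo_point location).
PLACES = [
    {"type": "A", "location": {"lat": 1.378446, "lon": 103.763427}},
    {"type": "B", "location": {"lat": 1.478446, "lon": 104.763427}},
    {"type": "A", "location": {"lat": 1.278446, "lon": 102.763427}},
]
TARGET = {"lat": 1.178446, "lon": 101.763427}


def degree_distance(loc, target):
    """Same formula as the Painless sort script: |dlat| + |dlon|, in degrees."""
    return abs(loc["lat"] - target["lat"]) + abs(loc["lon"] - target["lon"])


def haversine_km(loc, target, radius_km=6371.0):
    """Great-circle distance in kilometres, used here for comparison only."""
    lat1 = math.radians(loc["lat"])
    lat2 = math.radians(target["lat"])
    dlat = lat2 - lat1
    dlon = math.radians(target["lon"] - loc["lon"])
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_per_type(places, target, distance):
    """Keep, for each type, the place with the smallest distance to target."""
    best = {}
    for place in places:
        d = distance(place["location"], target)
        if place["type"] not in best or d < best[place["type"]][0]:
            best[place["type"]] = (d, place)
    return {place_type: place for place_type, (_, place) in best.items()}


by_degrees = nearest_per_type(PLACES, TARGET, degree_distance)
by_haversine = nearest_per_type(PLACES, TARGET, haversine_km)
for place_type in sorted(by_degrees):
    print(place_type, by_degrees[place_type]["location"])
```

For these three points both measures select the same place per type, but that need not hold in general, especially far from the equator where a degree of longitude covers less ground. Elasticsearch also provides a built-in `_geo_distance` sort, which may be the better choice when true geographic distance matters.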
The query above returns the following response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "group-by-type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [{
        "key": "A",
        "doc_count": 2,
        "min-distance": {
          "hits": {
            "total": 2,
            "max_score": null,
            "hits": [{
              "_index": "places",
              "_type": "test",
              "_id": "3",
              "_score": null,
              "_source": {
                "location": {
                  "lon": 102.763427,
                  "lat": 1.278446
                },
                "type": "A"
              },
              "sort": [1.1000006934661934]
            }]
          }
        }
      }, {
        "key": "B",
        "doc_count": 1,
        "min-distance": {
          "hits": {
            "total": 1,
            "max_score": null,
            "hits": [{
              "_index": "places",
              "_type": "test",
              "_id": "2",
              "_score": null,
              "_source": {
                "location": {
                  "lon": 104.763427,
                  "lat": 1.478446
                },
                "type": "B"
              },
              "sort": [3.3000007411499093]
            }]
          }
        }
      }]
    }
  }
}
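The places the question asks for sit fairly deep in that response, under each bucket's "min-distance" hits. As a rough sketch, assuming the response has already been parsed into a Python dict (for example from the JSON body returned over HTTP), the _source of each bucket's top hit could be pulled out as follows. The function name `top_hit_per_bucket` is hypothetical, and the dict below is a trimmed copy of the response above.

```python
# Trimmed copy of the response shown above; only the fields used here are kept.
response = {
    "aggregations": {
        "group-by-type": {
            "buckets": [
                {
                    "key": "A",
                    "doc_count": 2,
                    "min-distance": {"hits": {"hits": [{
                        "_id": "3",
                        "_source": {
                            "location": {"lon": 102.763427, "lat": 1.278446},
                            "type": "A",
                        },
                        "sort": [1.1000006934661934],
                    }]}},
                },
                {
                    "key": "B",
                    "doc_count": 1,
                    "min-distance": {"hits": {"hits": [{
                        "_id": "2",
                        "_source": {
                            "location": {"lon": 104.763427, "lat": 1.478446},
                            "type": "B",
                        },
                        "sort": [3.3000007411499093],
                    }]}},
                },
            ]
        }
    }
}


def top_hit_per_bucket(resp, agg_name="group-by-type", hits_name="min-distance"):
    """Return the _source of the first top_hits entry in every terms bucket."""
    sources = []
    for bucket in resp["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        if hits:  # skip a bucket that carries no hits
            sources.append(hits[0]["_source"])
    return sources


nearest_places = top_hit_per_bucket(response)
for place in nearest_places:
    print(place["type"], place["location"])
```

The aggregation names are passed as parameters because they are chosen freely in the request; they must match whatever names the query used.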

Regarding "elasticsearch - ElasticSearch: return unique results by field value", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49346951/
