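I have a shareholder index and want to get the following information:
select hld_id, com_id, count(*) from shareholder group by hld_id, com_id order by count(*) desc;
select hld_id, com_id from shareholder group by hld_id, com_id having count(*) = 2;
How can I implement the above with an Elasticsearch search query?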
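Best answer
Below are a sample mapping, sample documents, and the aggregation queries. I have come up with three ways this can be achieved.
Mapping: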
PUT shareholder
{
"mappings": {
"properties": {
"hld_id": {
"type": "keyword"
},
"com_id":{
"type": "keyword"
}
}
}
}
Documents:
POST shareholder/_doc/1
{
"hld_id": "001",
"com_id": "001"
}
POST shareholder/_doc/2
{
"hld_id": "001",
"com_id": "002"
}
POST shareholder/_doc/3
{
"hld_id": "002",
"com_id": "001"
}
POST shareholder/_doc/4
{
"hld_id": "002",
"com_id": "002"
}
Note: I changed com_id in the next document, so the pair (hld_id 002, com_id 002) occurs twice.
POST shareholder/_doc/5
{
"hld_id": "002",
"com_id": "002"
}
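To make the target concrete, the two SQL statements from the question can be evaluated over these five sample documents in plain Python. This is only an illustrative sketch of the expected result, not something Elasticsearch runs; the variable names are made up for the example.

```python
from collections import Counter

# The five sample documents indexed above, as (hld_id, com_id) pairs.
docs = [
    ("001", "001"),
    ("001", "002"),
    ("002", "001"),
    ("002", "002"),
    ("002", "002"),
]

# GROUP BY hld_id, com_id with count(*)
pair_counts = Counter(docs)

# ORDER BY count(*) DESC; ties are broken by key only to keep the output deterministic.
ordered = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))

# HAVING count(*) = 2
having_two = [pair for pair, n in pair_counts.items() if n == 2]

print(ordered)
print(having_two)
```

Only the pair (002, 002) has a count of 2, so it should come first in the ordered result and be the single row of the filtered one.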
Solution 1: Using Elasticsearch aggregations
Aggregation query 1:
Note that I have simply nested two Terms aggregations, first on hld_id and then on com_id.
POST shareholder/_search
{
"size": 0,
"aggs": {
"share_hoder": {
"terms": {
"field": "hld_id"
},
"aggs": {
"com_aggs": {
"terms": {
"field": "com_id",
"order": {
"_count": "desc"
}
}
}
}
}
}
}
Here is what the response looks like:
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"share_hoder" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 3,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 2 <---- Count you are looking for
},
{
"key" : "001",
"doc_count" : 1
}
]
}
},
{
"key" : "001",
"doc_count" : 2,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "001",
"doc_count" : 1
},
{
"key" : "002",
"doc_count" : 1
}
]
}
}
]
}
}
}
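Of course, because of the way Elasticsearch aggregations work, you may not get exactly the result representation you want: the counts are nested per hld_id and sorted only within each hld_id bucket, not across all pairs.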
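If a flat, globally sorted list is needed, the nested buckets can be flattened on the client. The following is a minimal sketch in Python, assuming the response has already been parsed into a dict; the helper name flatten_pairs is hypothetical, and the default aggregation names are the ones used in the query above (including its share_hoder spelling).

```python
def flatten_pairs(response, outer="share_hoder", inner="com_aggs"):
    """Turn nested terms buckets into (hld_id, com_id, count) rows, largest count first."""
    rows = []
    for holder in response["aggregations"][outer]["buckets"]:
        for company in holder[inner]["buckets"]:
            rows.append((holder["key"], company["key"], company["doc_count"]))
    # Global ORDER BY count(*) DESC; the key columns only break ties deterministically.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


# Trimmed copy of the response shown above (only the fields the helper reads).
sample = {
    "aggregations": {
        "share_hoder": {
            "buckets": [
                {"key": "002", "doc_count": 3, "com_aggs": {"buckets": [
                    {"key": "002", "doc_count": 2},
                    {"key": "001", "doc_count": 1},
                ]}},
                {"key": "001", "doc_count": 2, "com_aggs": {"buckets": [
                    {"key": "001", "doc_count": 1},
                    {"key": "002", "doc_count": 1},
                ]}},
            ]
        }
    }
}

rows = flatten_pairs(sample)
for row in rows:
    print(row)
```

Keep in mind that a Terms aggregation returns only its top buckets (10 by default), so on a larger index the size parameter would need to be raised for the flattened list to be complete.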
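Aggregation query 2:
Most of this is the same as aggregation query 1: I again nest two Terms aggregations. In addition, I use a Bucket Selector aggregation on the document count (_count) of each com_id bucket, which is where the count(*) == 2 condition goes.
POST shareholder/_search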
{
"size": 0,
"aggs": {
"share_holder": {
"terms": {
"field": "hld_id",
"order": {
"_key": "desc"
}
},
"aggs": {
"com_aggs": {
"terms": {
"field": "com_id"
},
"aggs": {
"count_filter":{
"bucket_selector": {
"buckets_path": {
"count_path": "_count"
},
"script": "params.count_path == 2"
}
}
}
}
}
}
}
}
Here is what the response looks like:
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"share_holder" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 3,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 2 <---- Count == 2
}
]
}
},
{
"key" : "001",
"doc_count" : 2,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
}
}
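In the response above, the hld_id bucket 001 is still returned even though none of its com_id buckets passed the count filter. One simple way to deal with this is to drop such parents on the client. The sketch below assumes the response is already parsed into a dict; drop_empty_parents is a hypothetical helper name, and the default aggregation names match the query above.

```python
def drop_empty_parents(response, outer="share_holder", inner="com_aggs"):
    """Return only the outer buckets that still contain at least one inner bucket."""
    buckets = response["aggregations"][outer]["buckets"]
    return [b for b in buckets if b[inner]["buckets"]]


# Trimmed copy of the response shown above.
sample = {
    "aggregations": {
        "share_holder": {
            "buckets": [
                {"key": "002", "doc_count": 3,
                 "com_aggs": {"buckets": [{"key": "002", "doc_count": 2}]}},
                {"key": "001", "doc_count": 2,
                 "com_aggs": {"buckets": []}},
            ]
        }
    }
}

kept = drop_empty_parents(sample)
for bucket in kept:
    for company in bucket["com_aggs"]["buckets"]:
        print(bucket["key"], company["key"], company["doc_count"])
```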
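Note that the second bucket is empty. I tried to see whether the query above could be filtered so that "key": "001" would not appear in the first place.
Solution 2: Using Elasticsearch SQL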
If your Elasticsearch/Kibana installation includes the X-Pack SQL feature, you can run the following queries in SQL style:
Query 1:
POST /_sql?format=txt
{
"query": "SELECT hld_id, com_id, count(*) FROM shareholder GROUP BY hld_id, com_id ORDER BY count(*) desc"
}
Response:
hld_id | com_id | count(*)
---------------+---------------+---------------
002 |002 |2
001 |001 |1
001 |002 |1
002 |001 |1
Query 2:
POST /_sql?format=txt
{
"query": "SELECT hld_id, com_id FROM shareholder GROUP BY hld_id, com_id HAVING count(*) = 2"
}
Response:
hld_id | com_id
---------------+---------------
002 |002
Solution 3: Using a script in a Terms aggregation
Aggregation query:
POST shareholder/_search
{
"size": 0,
"aggs": {
"query_groupby_count": {
"terms": {
"script": {
"source": """
doc['hld_id'].value + ", " + doc['com_id'].value
"""
}
}
},
"query_groupby_count_equals_2": {
"terms": {
"script": {
"source": """
doc['hld_id'].value + ", " + doc['com_id'].value
"""
}
},
"aggs": {
"myaggs": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": "params.count == 2"
}
}
}
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"query_groupby_count_equals_2" : { <---- Group By Query For Count == 2
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002, 002",
"doc_count" : 2
}
]
},
"query_groupby_count" : { <---- Group By Query
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002, 002",
"doc_count" : 2
},
{
"key" : "001, 001",
"doc_count" : 1
},
{
"key" : "001, 002",
"doc_count" : 1
},
{
"key" : "002, 001",
"doc_count" : 1
}
]
}
}
}
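Because the script concatenates both fields into a single key such as "002, 002", a client usually has to split that key again. A minimal sketch in Python follows, assuming neither ID contains the ", " separator used in the script; split_pair_buckets is a hypothetical helper name.

```python
def split_pair_buckets(buckets, sep=", "):
    """Convert scripted terms buckets into (hld_id, com_id, count) rows."""
    rows = []
    for bucket in buckets:
        # Split only once; this assumes hld_id itself never contains the separator.
        hld_id, com_id = bucket["key"].split(sep, 1)
        rows.append((hld_id, com_id, bucket["doc_count"]))
    return rows


# Buckets copied from the "query_groupby_count" part of the response above.
group_by_buckets = [
    {"key": "002, 002", "doc_count": 2},
    {"key": "001, 001", "doc_count": 1},
    {"key": "001, 002", "doc_count": 1},
    {"key": "002, 001", "doc_count": 1},
]

pair_rows = split_pair_buckets(group_by_buckets)
for row in pair_rows:
    print(row)
```

If the IDs could contain the separator, choosing a character in the script that cannot occur in either field would be the safer design.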
Using curl:
First, save the query in a .txt or .json file. For example, I created a file named query.json and pasted only the query into it.
{
"query": "SELECT hld_id, com_id, count(*) FROM shareholder GROUP BY hld_id, com_id ORDER BY count(*) desc"
}
Now run the following curl command, which references that file:
curl -XGET "http://localhost:9200/_sql?format=txt" -H "Content-Type: application/json" -d @query.json
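The same request can be sent from a script. Below is a rough Python equivalent using only the standard library. It assumes Elasticsearch is reachable at http://localhost:9200 without authentication, and that with format=json the SQL endpoint returns a body with "columns" and "rows"; the function names are hypothetical.

```python
import json
import urllib.request


def build_sql_request(sql, base_url="http://localhost:9200"):
    """Build a POST request for the _sql endpoint (base_url is an assumption)."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_sql?format=json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(payload):
    """Pair each row with the column names from a format=json SQL response."""
    names = [col["name"] for col in payload["columns"]]
    return [dict(zip(names, row)) for row in payload["rows"]]


def run_sql(sql, base_url="http://localhost:9200"):
    """Send the query and return the rows as dictionaries (needs a running cluster)."""
    with urllib.request.urlopen(build_sql_request(sql, base_url)) as resp:
        return rows_as_dicts(json.load(resp))


# Example call, left commented out because it requires a live cluster:
# run_sql("SELECT hld_id, com_id FROM shareholder "
#         "GROUP BY hld_id, com_id HAVING count(*) = 2")
```

Asking for format=json instead of txt makes the result easier to consume in code; txt is more convenient when reading the output in a terminal.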
Hope this helps!
Regarding elasticsearch: a similar question about grouping by two fields in Elasticsearch and then filtering or sorting can be found on Stack Overflow: https://stackoverflow.com/questions/58568986/