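I have a database of products. Each product has the fields uuid, group_id, title, since, and till.
The since and till fields define an availability interval.
Within each group_id, the intervals [since, till] are pairwise disjoint, so no two products in the same group have overlapping intervals.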
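I need to fetch a list of products that satisfies the following conditions:
- The list contains at most 1 product per group
- Every product matches the given title
- Every product is the current one (since <= now <= till) or, if its group has no current product, the nearest upcoming one (the minimum since such that since >= now)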
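To make the selection rule concrete, here is a minimal Python sketch of the same logic outside Elasticsearch. The Product tuple, the select_products function, and the case-insensitive substring check that stands in for title matching are all assumptions made for illustration; they are not part of the original data model.

```python
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple


class Product(NamedTuple):
    """Hypothetical in-memory stand-in for one indexed product."""
    uuid: str
    group_id: int
    title: str
    since: datetime
    till: datetime


def select_products(products: Iterable[Product], title: str, now: datetime) -> List[Product]:
    """Return at most one product per group: the current one, else the nearest upcoming one."""
    best: Dict[int, Product] = {}
    for p in products:
        # Assumption: a case-insensitive substring check approximates a title match.
        if title.lower() not in p.title.lower():
            continue
        # Skip expired products. What remains is either current
        # (since <= now <= till) or upcoming (since > now).
        if p.till < now:
            continue
        chosen = best.get(p.group_id)
        # Intervals in a group are disjoint, so a current product always has
        # a smaller since than any upcoming one; the minimum since wins.
        if chosen is None or p.since < chosen.since:
            best[p.group_id] = p
    return sorted(best.values(), key=lambda p: p.group_id)
```

Keeping the smallest since among the non-expired candidates is enough only because the intervals within a group never overlap; with overlapping intervals the rule would need an explicit "current first" check.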
ES mapping:
{
"products": {
"mappings": {
"dynamic": "false",
"properties": {
"group_id": {
"type": "long",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"since": {
"type": "date",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"till": {
"type": "date",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
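Is it possible to write such a query in Elasticsearch?
Best answer
Based on your mapping, I created sample documents, a query, and its response, as shown below:
Sample documents: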
POST product_index/_doc/1
{
"group_id": 1,
"title": "nike",
"since": "2020-01-01",
"till": "2020-03-31"
}
POST product_index/_doc/2
{
"group_id": 2,
"title": "nike",
"since": "2020-01-01",
"till": "2020-03-31"
}
POST product_index/_doc/3
{
"group_id": 3,
"title": "nike",
"since": "2020-03-15",
"till": "2020-03-31"
}
POST product_index/_doc/4
{
"group_id": 3,
"title": "nike",
"since": "2020-03-19",
"till": "2020-03-31"
}
As shown above, there are 4 documents in total: groups 1 and 2 each have one document, while group 3 has two documents, both with since >= now.
Query request:
A summary of the query:
Bool
  - Must
    - Match title as nike
    - Should
      - clause 1 - since <= now <= till
      - clause 2 - now <= since
Agg
  - Terms on group_id
    - Top Hits (sorted by since in ascending order, returning only the first document, since you want at most one product per group)
Here is the actual query:
POST product_index/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"title": "nike"
}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"range": {
"till": {
"gte": "now"
}
}
},
{
"range": {
"since": {
"lte": "now"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"range": {
"since": {
"gte": "now"
}
}
}
]
}
}
]
}
}
]
}
},
"aggs": {
"my_groups": {
"terms": {
"field": "group_id.keyword",
"size": 10
},
"aggs": {
"my_docs": {
"top_hits": {
"size": 1,
"sort": [
{ "since": { "order": "asc" } }
]
}
}
}
}
}
}
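Note that I used a Terms Aggregation with Top Hits as its sub-aggregation. The first should clause matches current products (since <= now <= till) and the second matches upcoming ones (since >= now). Because the intervals within a group are disjoint, a current product always has a smaller since than any upcoming one, so sorting by since in ascending order and keeping a single hit returns the current product when it exists and the nearest upcoming product otherwise. The "size": 10 on the terms aggregation caps the result at 10 groups, so raise it if you expect more.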
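If the request has to be built from application code, the same body can be assembled as a plain dictionary. This is only a sketch: build_query and its max_groups parameter are hypothetical names, and the field and aggregation names (group_id.keyword, my_groups, my_docs) are copied from the query above.

```python
import json
from typing import Any, Dict


def build_query(title: str, max_groups: int = 10) -> Dict[str, Any]:
    """Build the search body: title match, current-or-upcoming filter, one hit per group."""
    # Clause 1: the product is current (since <= now <= till).
    current = {
        "bool": {
            "must": [
                {"range": {"till": {"gte": "now"}}},
                {"range": {"since": {"lte": "now"}}},
            ]
        }
    }
    # Clause 2: the product is upcoming (since >= now).
    upcoming = {"bool": {"must": [{"range": {"since": {"gte": "now"}}}]}}
    return {
        "size": 0,  # only the aggregation results are needed
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": title}},
                    {"bool": {"should": [current, upcoming]}},
                ]
            }
        },
        "aggs": {
            "my_groups": {
                "terms": {"field": "group_id.keyword", "size": max_groups},
                "aggs": {
                    "my_docs": {
                        # One document per group, smallest since first.
                        "top_hits": {"size": 1, "sort": [{"since": {"order": "asc"}}]}
                    }
                },
            }
        },
    }
```

The result of json.dumps(build_query("nike")) can be sent as the body of POST product_index/_search with whatever HTTP or Elasticsearch client is already in use.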
Response:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_groups" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "3",
"doc_count" : 2,
"my_docs" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "product_index",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"group_id" : 3,
"title" : "nike",
"since" : "2020-03-15",
"till" : "2020-03-31"
},
"sort" : [
1584230400000
]
}
]
}
}
},
{
"key" : "1",
"doc_count" : 1,
"my_docs" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "product_index",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"group_id" : 1,
"title" : "nike",
"since" : "2020-01-01",
"till" : "2020-03-31"
},
"sort" : [
1577836800000
]
}
]
}
}
},
{
"key" : "2",
"doc_count" : 1,
"my_docs" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "product_index",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"group_id" : 2,
"title" : "nike",
"since" : "2020-01-01",
"till" : "2020-03-31"
},
"sort" : [
1577836800000
]
}
]
}
}
}
]
}
}
}
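Since the selected products sit inside the aggregation buckets rather than in the top-level hits, the client has to unpack them. A minimal sketch of that step follows, assuming the response has already been parsed into a dictionary; products_from_response is a hypothetical helper, and its default aggregation names match the query above.

```python
from typing import Any, Dict, List


def products_from_response(
    response: Dict[str, Any],
    groups_agg: str = "my_groups",
    hits_agg: str = "my_docs",
) -> List[Dict[str, Any]]:
    """Collect the _source of the single top hit from every group bucket."""
    buckets = response.get("aggregations", {}).get(groups_agg, {}).get("buckets", [])
    products = []
    for bucket in buckets:
        hits = bucket[hits_agg]["hits"]["hits"]
        if hits:  # defensive: skip a bucket that carries no hit
            products.append(hits[0]["_source"])
    return products
```

Applied to the response above, this would yield the _source objects of documents 3, 1, and 2, in bucket order.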
Please let me know if this helps!
Source: the Stack Overflow question on Elasticsearch queries with grouping, https://stackoverflow.com/questions/60621420/