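I'm facing the following problem: selecting and sorting parent documents based on an aggregated value over their child documents. The aggregation (e.g. sum) itself depends on the query string, i.e. on which child documents are relevant for the aggregation.
Example: given the documents basket A and basket B, for each basket document I would like to sum the number field of its fruit children whose name field matches my query, e.g. apples.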
PUT /baskets/_doc/0
{
  "name": "basket A",
  "fruit": [
    {
      "name": "apples",
      "number": 2
    },
    {
      "name": "oranges",
      "number": 3
    }
  ]
}
PUT /baskets/_doc/1
{
  "name": "basket B",
  "fruit": [
    {
      "name": "apples",
      "number": 3
    },
    {
      "name": "apples",
      "number": 3
    }
  ]
}
with the corresponding mapping:
PUT /baskets
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "fruit": {
        "type": "nested",
        "properties": {
          "name": { "type": "text" },
          "number": { "type": "long" }
        }
      }
    }
  }
}
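To make the goal concrete, here is a minimal Python sketch (plain in-memory code, not Elasticsearch) that computes the desired result for the two sample documents: per basket, the sum of number over the fruit children whose name matches the query, sorted in descending order. The helper names sum_matching and rank_baskets are made up for illustration.

```python
# Sample documents copied from the PUT requests above.
baskets = [
    {"name": "basket A", "fruit": [{"name": "apples", "number": 2},
                                   {"name": "oranges", "number": 3}]},
    {"name": "basket B", "fruit": [{"name": "apples", "number": 3},
                                   {"name": "apples", "number": 3}]},
]


def sum_matching(basket, fruit_name):
    """Sum `number` over the fruit children whose `name` equals fruit_name."""
    return sum(f["number"] for f in basket["fruit"] if f["name"] == fruit_name)


def rank_baskets(docs, fruit_name):
    """Return (basket name, sum) pairs, largest sum first."""
    totals = [(d["name"], sum_matching(d, fruit_name)) for d in docs]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


print(rank_baskets(baskets, "apples"))
# [('basket B', 6), ('basket A', 2)]
```

The question is how to get the same ranking out of Elasticsearch itself.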
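How can this be achieved with the Elasticsearch (7.8.0) query DSL?
So far I have tried nested queries and aggregations, without success.
Thanks!
Edit: added the mapping
Edit: updated the numbers to better reflect the problem
*Edit: added a possible answer for use case 2 (see the comments on @joe's answer). Note that a terms aggregation on name needs a keyword field (or fielddata enabled), which the text mapping above does not provide: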
GET /baskets/_search
{
  "aggs": {
    "aggs_baskets": {
      "terms": {
        "field": "name",
        "order": {"nest > fruit_filter > fruit_sum": "desc"}
      },
      "aggs": {
        "nest": {
          "nested": {
            "path": "fruit"
          },
          "aggs": {
            "fruit_filter": {
              "filter": {
                "term": {"fruit.name": "apples"}
              },
              "aggs": {
                "fruit_sum": {
                  "sum": {"field": "fruit.number"}
                }
              }
            }
          }
        }
      }
    }
  }
}
Best answer
Use case 1:
GET baskets/_search
{
  "query": {
    "nested": {
      "path": "fruit",
      "inner_hits": {},
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "fruit.name": {
                  "value": "apples"
                }
              }
            },
            {
              "range": {
                "fruit.number": {
                  "gte": 5
                }
              }
            }
          ]
        }
      }
    }
  }
}
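For strictly more than 5 use gt; for >= 5 use gte. Also note the inner_hits part: it returns the actual nested subdocuments that caused a particular basket to match the query. It is not required, but it is good to know about.
Use case 2: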
GET baskets/_search
{
  "sort": [
    {
      "fruit.number": {
        "nested_path": "fruit",
        "order": "desc"
      }
    }
  ]
}
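This sort orders baskets by their nested number values, but it neither restricts them to apples nor sums them. One option that may be worth trying is a nested sort with "mode": "sum" plus a nested filter. The sketch below only builds the request body as a Python dict and prints it as JSON; the function name build_sum_sort is hypothetical, sending the request is left out, and the body assumes the mapping from the question and has not been verified against a 7.8.0 cluster.

```python
import json


def build_sum_sort(fruit_name, order="desc"):
    """Build a search body that sorts baskets by the summed `fruit.number`
    of the nested children whose `fruit.name` matches fruit_name."""
    return {
        "sort": [
            {
                "fruit.number": {
                    "mode": "sum",  # add up the values of all matching children
                    "order": order,
                    "nested": {
                        "path": "fruit",
                        # only children passing this filter contribute to the sum
                        "filter": {"term": {"fruit.name": fruit_name}},
                    },
                }
            }
        ]
    }


# Print the JSON body to paste after `GET baskets/_search`.
print(json.dumps(build_sum_sort("apples"), indent=2))
```

Without a query, baskets that contain no matching fruit should still be returned; adding a nested query on fruit.name would limit the hits to baskets that actually contain apples.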
Use case 2 edit: there may be a cleaner way to do this, but I would go with the following:
GET baskets/_search
{
  "size": 0,
  "aggs": {
    "multiply_and_add": {
      "scripted_metric": {
        "params": {
          "only_fruit_name": "apples"
        },
        "init_script": "state.by_basket_name = [:]",
        "map_script": """
          def basket_name = params._source['name'];
          def fruits = params._source['fruit'].findAll(group -> group.name == params.only_fruit_name);
          for (def fruit_group : fruits) {
            def number = fruit_group.number;
            if (state.by_basket_name.containsKey(basket_name)) {
              state.by_basket_name[basket_name] += number;
            } else {
              state.by_basket_name[basket_name] = number;
            }
          }
        """,
        "combine_script": "return state.by_basket_name",
        "reduce_script": "return states"
      }
    }
  }
}
yielding something along the lines of
{
  ...
  "aggregations": {
    "multiply_and_add": {
      "value": [
        {
          "basket A": 2,
          "basket B": 6
        }
      ]
    }
  }
}
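Sorting can be done either in the reduce_script or in the post-processing pipeline that handles the ES response. You could of course also go with a (sorted) list and lambdas ...
Also note the required nested_path in the sort for use case 2 (in 7.x this option is deprecated in favor of nested with a path option).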
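As a sketch of the post-processing option: reduce_script returns one map per shard, so a client has to merge those maps before sorting. The Python below works on a hard-coded response shaped like the example output above; the function name rank_from_response is hypothetical.

```python
from collections import Counter

# Response fragment shaped like the example output above (one map per shard).
response = {
    "aggregations": {
        "multiply_and_add": {
            "value": [
                {"basket A": 2, "basket B": 6}
            ]
        }
    }
}


def rank_from_response(resp, agg_name="multiply_and_add"):
    """Merge the per-shard maps and return (basket, total) pairs, largest first."""
    totals = Counter()
    for shard_state in resp["aggregations"][agg_name]["value"]:
        # The same basket name may show up on several shards, so add the sums.
        totals.update(shard_state)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(rank_from_response(response))
# [('basket B', 6), ('basket A', 2)]
```

Merging on the client keeps the Painless scripts short; the same merge and sort could instead be moved into reduce_script.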
Regarding elasticsearch - Elasticsearch aggregation on child document field values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63078703/