If I define the following mapping:
"mappings": {
  "sales": {
    "properties": {
      "gender": { "type": "byte" },
      "age": { "type": "byte" },
      "amount": { "type": "integer" },
      "dow": { "type": "byte" },
      "day_of": { "type": "date" }
    }
  }
}
and add 1000 sales documents to ES, with data such as male 0, female 1, dow Monday 1, Tuesday 2, and so on.
How do I get results like this:
gender 0: average amount of sales
gender 1: average amount of sales
or
dow monday: average amount of sales
dow tues: average amount of sales
dow wed: average amount of sales
dow thurs: average amount of sales
dow friday: average amount of sales
and
dow monday AND age 18-24: average amount of sales
dow tues AND age 18-24 AND female: average amount of sales
dow wed AND age 18-24: average amount of sales
dow thurs AND age 18-24: average amount of sales
dow friday AND age 18-24: average amount of sales
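Best answer
Each of these is simple on its own, but you are really asking several different questions.
There is no need to explicitly call out each value the way you have done (although there is technically nothing wrong with it). Instead, you can ask the "simpler" question and let the scope of the query control what you even see.
gender 0: average amount of sales
gender 1: average amount of sales
This can become a simpler question: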
gender N: average amount of sales
{
  "size": 0,
  "aggs": {
    "group_by_gender": {
      "terms": {
        "field": "gender"
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
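To get from that request to the "gender N: average" lines in the question, you read the buckets out of the response. Here is a minimal Python sketch, assuming the response has already been parsed into a dict. The `sample_response` values are made up for illustration and `averages_by_key` is a hypothetical helper name; only the `aggregations` / `buckets` / `key` / `value` layout comes from how terms and avg aggregations report their results.

```python
# Hypothetical response, trimmed to the parts used here; the numbers are invented.
sample_response = {
    "aggregations": {
        "group_by_gender": {
            "buckets": [
                {"key": 0, "doc_count": 520, "avg_sales": {"value": 41.5}},
                {"key": 1, "doc_count": 480, "avg_sales": {"value": 47.25}},
            ]
        }
    }
}


def averages_by_key(response, terms_agg, avg_agg):
    """Map each terms bucket key to the value of its nested avg aggregation."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {bucket["key"]: bucket[avg_agg]["value"] for bucket in buckets}


gender_averages = averages_by_key(sample_response, "group_by_gender", "avg_sales")
for gender, average in sorted(gender_averages.items()):
    print(f"gender {gender}: {average}")
```

The same helper would work for the dow queries by passing "group_by_dow" as the terms aggregation name.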
dow monday: average amount of sales
dow tues: average amount of sales
dow wed: average amount of sales
dow thurs: average amount of sales
dow friday: average amount of sales
This can become a simpler question:
dow N, except Saturday or Sunday: average amount of sales
Assuming dow == 0 is Sunday and dow == 6 is Saturday:
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "dow": [0, 6]
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_dow": {
      "terms": {
        "field": "dow",
        "size": 5
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
Finally, the last one just adds another filter to that question:
AND age 18-24 AND female
I assume AND female was meant to be repeated on all of them, since that is how you answered it:
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "dow": [0, 6]
          }
        }
      ],
      "filter": [
        {
          "term": {
            "gender": 1
          }
        },
        {
          "range": {
            "age": {
              "gte": 18,
              "lte": 24
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_dow": {
      "terms": {
        "field": "dow",
        "size": 5
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
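Notice that the last two requests share exactly the same aggs section and differ only in the query. If you are building these from code, that can be sketched as one function with optional filters. This is only an illustration in Python using plain dicts: `avg_amount_by_dow` and its parameters are hypothetical names, and it assumes, as above, that dow 0 and 6 are the weekend days.

```python
import json


def avg_amount_by_dow(gender=None, age_range=None):
    """Build a request body: average `amount` per weekday, with optional filters."""
    filters = []
    if gender is not None:
        filters.append({"term": {"gender": gender}})
    if age_range is not None:
        low, high = age_range  # inclusive bounds, e.g. (18, 24)
        filters.append({"range": {"age": {"gte": low, "lte": high}}})

    # Assumption carried over from above: dow 0 is Sunday and dow 6 is Saturday.
    bool_query = {"must_not": [{"terms": {"dow": [0, 6]}}]}
    if filters:
        bool_query["filter"] = filters

    return {
        "size": 0,
        "query": {"bool": bool_query},
        "aggs": {
            "group_by_dow": {
                "terms": {"field": "dow", "size": 5},
                "aggs": {"avg_sales": {"avg": {"field": "amount"}}},
            }
        },
    }


weekdays_only = avg_amount_by_dow()
women_18_to_24 = avg_amount_by_dow(gender=1, age_range=(18, 24))
print(json.dumps(women_18_to_24, indent=2))
```

Narrowing the population is then just a matter of passing different arguments; the grouping and the average never change.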
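You already discovered the stats aggregation, but you only want the average, so using the more specific avg aggregation avoids wasting time on calculations you don't care about. You should also read up on the difference between query context and filter context to understand why I used filter rather than must above (basically, filters are cacheable and do not score; they only answer a "yes or no" question, which is exactly what you want here).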
Regarding "Elasticsearch: how do I group by a field and average the totals?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38539708/