elasticsearch - Elasticsearch: how do I group by a field and average the totals?

Tags: elasticsearch

If I define a mapping like this:

"mappings": {
    "sales": {
        "properties": {
            "gender": { "type": "byte" },
            "age":    { "type": "byte" },
            "amount": { "type": "integer" },
            "dow":    { "type": "byte" },
            "day_of": { "type": "date" }
        }
    }
}

and add 1000 sales documents to ES, with data such as male = 0, female = 1, dow Monday = 1, Tuesday = 2, and so on.

How do I get results like this:
gender 0: average amount of sales
gender 1: average amount of sales

or
dow monday: average amount of sales
dow tues: average amount of sales
dow wed: average amount of sales
dow thurs: average amount of sales
dow friday: average amount of sales


dow monday AND age 18-24: average amount of sales
dow tues AND age 18-24 AND female: average amount of sales
dow wed AND age 18-24: average amount of sales
dow thurs AND age 18-24: average amount of sales
dow friday AND age 18-24: average amount of sales

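Best answer

Each of these is simple, but you are really asking several different questions.

There is no need to call out each value explicitly the way you have (although technically there is nothing wrong with doing so). Instead, you can ask a "simpler" question and let the scope of the query control what you see at all.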

gender 0: average amount of sales
gender 1: average amount of sales



This can become a simpler question:

gender N: average amount of sales


{
  "size": 0,
  "aggs": {
    "group_by_gender": {
      "terms": {
        "field": "gender"
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
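
For reference, the response to a request like the one above carries one bucket per gender value under aggregations.group_by_gender.buckets, and each bucket holds its key, a doc_count, and the avg_sales value. Below is a minimal Python sketch of reading that shape. The sample response is hand-written for illustration (the numbers are made up, not real query output), and format_averages is a hypothetical helper name.

```python
# Hand-written sample shaped like an Elasticsearch terms + avg response.
# The numbers are illustrative only, not real query output.
sample_response = {
    "aggregations": {
        "group_by_gender": {
            "buckets": [
                {"key": 0, "doc_count": 520, "avg_sales": {"value": 41.5}},
                {"key": 1, "doc_count": 480, "avg_sales": {"value": 38.0}},
            ]
        }
    }
}


def format_averages(response, agg_name, label, metric_name="avg_sales"):
    """Return one 'label key: average' line per bucket of a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        f"{label} {bucket['key']}: {bucket[metric_name]['value']}"
        for bucket in buckets
    ]


for line in format_averages(sample_response, "group_by_gender", "gender"):
    print(line)
```

The same helper should work for the day-of-week queries by passing "group_by_dow" and "dow", since only the aggregation name changes.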

dow monday: average amount of sales
dow tues: average amount of sales
dow wed: average amount of sales
dow thurs: average amount of sales
dow friday: average amount of sales



This can become a simpler question:

dow N, except Saturday or Sunday: average amount of sales



Assuming dow == 0 is Sunday and dow == 6 is Saturday:
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "dow": [0, 6]
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_dow": {
      "terms": {
        "field": "dow",
        "size": 5
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}

Finally, the last one just adds another filter to that question:

AND age 18-24 AND female



I assume AND female was meant to apply to all of them, so this is how you would answer it:
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "dow": [0, 6]
          }
        }
      ],
      "filter": [
        {
          "term": {
            "gender": 1
          }
        },
        {
          "range": {
            "age": {
              "gte": 18,
              "lte": 24
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_dow": {
      "terms": {
        "field": "dow",
        "size": 5
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
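
Comparing the last two requests, the aggs section is identical and only the query scope differs. A small Python sketch can make that explicit by building the request body from optional filters. The function name build_dow_average_query and its parameters are hypothetical, and the sketch only constructs the body as a dict; sending it to a cluster is left to whichever client you use.

```python
import json


def build_dow_average_query(gender=None, age_range=None, excluded_days=(0, 6)):
    """Build the request body: the aggs stay fixed, only the query scope changes."""
    filters = []
    if gender is not None:
        filters.append({"term": {"gender": gender}})
    if age_range is not None:
        low, high = age_range
        filters.append({"range": {"age": {"gte": low, "lte": high}}})

    # Exclude Sunday (0) and Saturday (6) by default, as assumed in the answer.
    bool_query = {"must_not": [{"terms": {"dow": list(excluded_days)}}]}
    if filters:
        bool_query["filter"] = filters

    return {
        "size": 0,
        "query": {"bool": bool_query},
        "aggs": {
            "group_by_dow": {
                "terms": {"field": "dow", "size": 5},
                "aggs": {"avg_sales": {"avg": {"field": "amount"}}},
            }
        },
    }


# Weekdays only, then the same question narrowed to women aged 18-24.
weekdays_only = build_dow_average_query()
women_18_24 = build_dow_average_query(gender=1, age_range=(18, 24))
print(json.dumps(women_18_24, indent=2))
```

Adding another condition then means appending one more clause to the filter list, without touching the aggregation.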

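You have already discovered the stats aggregation, but you only want the average, so using the more specific avg aggregation avoids spending time on calculations you do not care about.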
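
To make the difference concrete: a stats aggregation reports count, min, max, avg, and sum for each bucket, while avg reports a single value. The Python sketch below only mirrors those response fields over a made-up list of amounts; it does not show how Elasticsearch computes them internally, and stats_like and avg_like are hypothetical names.

```python
def stats_like(values):
    """Mirror the five fields a stats aggregation reports for one bucket."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def avg_like(values):
    """Mirror the single field an avg aggregation reports for one bucket."""
    return {"value": sum(values) / len(values)}


# Made-up amounts for one bucket, for illustration only.
amounts = [10, 20, 60]
print(stats_like(amounts))
print(avg_like(amounts))
```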

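You should also read up on the difference between query context and filter context to understand why I used filter rather than must above (basically, filters can be cached and are not scored; they only answer a "yes or no" question, which is what you want here).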

This question, "Elasticsearch: how do I group by a field and average the totals?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/38539708/
