sql - PostgreSQL: getting the daily, weekly, and monthly averages of event occurrences in one query

Tags: sql postgresql query-optimization aggregate analytics

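I currently have a fairly large query that works by:

  1. Aggregating the daily, weekly, and monthly counts into intermediate tables, by taking the count() of events grouped by event name and date.
  2. Selecting the average count from each intermediate table with avg() grouped by event alone, then taking a union of the results. Because I want a separate column for each of daily, weekly, and monthly, I put a filler value of 0 into the empty columns.
  3. Summing all the columns. The 0s basically act as a no-op, so this gives a single value per event.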

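The query is quite large, though, and I feel like I'm doing a lot of repetitive work. Is there a way to write this query better or make it smaller? I haven't really written queries like this before, so I'm not sure.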

WITH monthly_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('month', created_at)
),
weekly_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('week', created_at)
),
daily_counts as (
  SELECT
    event,
    count(*) as count
  FROM tracking_stuff
  WHERE
    event = 'thing'
    OR event = 'thing2'
    OR event = 'thing3'
  GROUP BY event, date_trunc('day', created_at)
),
query as (
  SELECT
    event,
    0 as daily_avg,
    0 as weekly_avg,
    avg(count) as monthly_avg
  FROM monthly_counts
  GROUP BY event
  UNION
  SELECT
    event,
    0 as daily_avg,
    avg(count) as weekly_avg,
    0 as monthly_avg
  FROM weekly_counts
  GROUP BY event
  UNION
  SELECT
    event,
    avg(count) as daily_avg,
    0 as weekly_avg,
    0 as monthly_avg
  FROM daily_counts
  GROUP BY event
)
SELECT
  event,
  sum(daily_avg) as daily_avg,
  sum(weekly_avg) as weekly_avg,
  sum(monthly_avg) as monthly_avg
FROM query
GROUP BY event;

Best answer

I would write the query like this:

select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
join (
    select event, avg(count) weekly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('week', created_at)
    ) s
    group by 1
) weekly using(event)
join (
    select event, avg(count) daily_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('day', created_at)
    ) s
    group by 1
) daily using(event)
order by 1;

If the where condition eliminates a significant part of the data (say, more than half), using a CTE can speed the query up slightly:

with the_data as (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
    )

select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
--  etc ... 
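Both versions above still group the same rows three times. As a possible further reduction, the three groupings can be expressed in one pass with GROUPING SETS. This is only a sketch, not part of the original answer and not included in the timings reported here: it assumes PostgreSQL 9.5 or later and that created_at is never NULL, and the aliases d, w, m, and cnt are made up for illustration.

```sql
-- Sketch only: a single scan of tracking_stuff, grouped three ways.
-- Assumes created_at is never NULL, so a NULL in d, w, or m means
-- "this row came from one of the other two groupings".
select event,
       avg(cnt) filter (where d is not null) as daily_avg,
       avg(cnt) filter (where w is not null) as weekly_avg,
       avg(cnt) filter (where m is not null) as monthly_avg
from (
    select event, d, w, m, count(*) as cnt
    from (
        select event,
               date_trunc('day', created_at)   as d,
               date_trunc('week', created_at)  as w,
               date_trunc('month', created_at) as m
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
    ) t
    -- one result row per (event, day), per (event, week), per (event, month)
    group by grouping sets ((event, d), (event, w), (event, m))
) s
group by event
order by event;
```

Like the other queries, this averages only over days, weeks, and months that contain at least one event. Whether it is actually faster depends on the data and the plan chosen, so it should be measured rather than assumed.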

Out of curiosity, I ran a test on generated data:

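-- random_int() is not a PostgreSQL built-in; it is presumably a helper of the author's.
-- Hypothetical definition, assumed here to return an integer in 1..n:
create function random_int(n int) returns int language sql as
    $$ select floor(random() * n)::int + 1 $$;

create table tracking_stuff (event text, created_at timestamp);
insert into tracking_stuff
    select 'thing' || random_int(9), '2016-01-01'::date + random_int(365)
    from generate_series(1, 1000000);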

In each query I replaced thing with thing1, so the queries eliminate about 2/3 of the rows.

Average execution time over 10 runs:

Original query          1106 ms
My query without cte    1077 ms
My query with cte        902 ms
Clodoaldo's query       5187 ms
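The answer does not say how these timings were collected. One common way to get comparable numbers, shown here only as a sketch, is to prefix each query with EXPLAIN (ANALYZE), which executes the statement and reports the measured execution time along with the plan. The monthly subquery is used purely as an example.

```sql
-- Sketch only: runs the query for real and prints the plan with actual timings.
-- Repeat several times and average; the first run is often slower (cold cache).
explain (analyze, buffers)
select event, avg(count) as monthly_avg
from (
    select event, count(*)
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
    group by event, date_trunc('month', created_at)
) s
group by 1;
```

In psql, the \timing meta-command is an alternative that reports the elapsed time of every statement, including the time spent transferring results to the client.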

Source: the Stack Overflow question at https://stackoverflow.com/questions/38226788/
