SQL: How to query the average of monthly sums when some months have no records?

Tags: sql postgresql datetime average aggregate-functions

TL;DR: How do I query the average of monthly sums when some months have no records (and should therefore count as 0)?


Background

Every day, my children report how much time they spent on chores (in a PostgreSQL database). My dataset looks like this:

date,user,duration

2020-01-01,Alice,120
2020-01-02,Bob,30
2020-01-03,Charlie,10
2020-01-23,Charlie,10

2020-02-03,Charlie,10
2020-02-23,Charlie,10

2020-03-02,Bob,30
2020-03-03,Charlie,10
2020-03-23,Charlie,10

I want to know how much they do per month on average. Specifically, the result I want is:

  • Alice: 40 = (120 + 0 + 0) ÷ 3
  • Bob: 20 = (30 + 0 + 30) ÷ 3
  • Charlie: 20 = ([10 + 10] + [10 + 10] + [10 + 10]) ÷ 3
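As a quick check of the arithmetic above, here is a minimal Python sketch (not part of the original question) that sums each user's durations per month, counts a missing user×month combination as 0, and divides by the number of months that have at least one record from anyone. Months are taken as the `YYYY-MM` prefix of the date string, which assumes ISO-formatted dates as in the sample data.

```python
from collections import defaultdict

# The sample records from the question: (date, user, duration in minutes).
records = [
    ("2020-01-01", "Alice", 120),
    ("2020-01-02", "Bob", 30),
    ("2020-01-03", "Charlie", 10),
    ("2020-01-23", "Charlie", 10),
    ("2020-02-03", "Charlie", 10),
    ("2020-02-23", "Charlie", 10),
    ("2020-03-02", "Bob", 30),
    ("2020-03-03", "Charlie", 10),
    ("2020-03-23", "Charlie", 10),
]

# Months with at least one record from anyone, as "YYYY-MM" strings.
months = sorted({date[:7] for date, _, _ in records})

# Monthly sum per (user, month); a combination with no record reads as 0.
monthly = defaultdict(int)
for date, user, duration in records:
    monthly[(user, date[:7])] += duration

users = sorted({user for _, user, _ in records})
averages = {
    user: sum(monthly[(user, m)] for m in months) / len(months)
    for user in users
}
print(averages)  # {'Alice': 40.0, 'Bob': 20.0, 'Charlie': 20.0}
```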

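Problem

For some months I have no records for some users (e.g., Alice in February and March). As a result, the following nested query does not return what I want: it does not take into account that, because there are no records for those months, Alice's contribution in February and March should be 0 (here her average is wrongly computed as 120).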

-- this does not work
SELECT
    "user",
    round(avg(monthly_duration)) as avg_monthly_sum
FROM (
    SELECT
        date_trunc('month', date),
        "user",
        sum(duration) as monthly_duration
    FROM
        public.chores_record
    GROUP BY
        date_trunc('month', date),
        "user"
) AS monthly_sum
GROUP BY
    "user"
;
-- Doesn't return what I want:
--
-- "user","avg_monthly_sum"
-- "Alice",120
-- "Bob",30
-- "Charlie",20
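To see why the query above returns 120 for Alice, the following sketch replays it against an in-memory SQLite database. SQLite is only a stand-in for PostgreSQL here, and `strftime('%Y-%m', date)` is assumed as the equivalent of `date_trunc('month', date)`. The inner query yields a single row for Alice (January), so `AVG` averages over that one value: the missing months are simply absent rows, never zeros.

```python
import sqlite3

RECORDS = [
    ("2020-01-01", "Alice", 120), ("2020-01-02", "Bob", 30),
    ("2020-01-03", "Charlie", 10), ("2020-01-23", "Charlie", 10),
    ("2020-02-03", "Charlie", 10), ("2020-02-23", "Charlie", 10),
    ("2020-03-02", "Bob", 30), ("2020-03-03", "Charlie", 10),
    ("2020-03-23", "Charlie", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE chores_record (date TEXT NOT NULL, "user" TEXT NOT NULL, '
             'duration INTEGER NOT NULL, PRIMARY KEY (date, "user"))')
conn.executemany("INSERT INTO chores_record VALUES (?, ?, ?)", RECORDS)

# Same shape as the failing query: AVG only sees months that have a row.
naive = conn.execute("""
    SELECT "user", AVG(monthly_duration) AS avg_monthly_sum
    FROM (
        SELECT strftime('%Y-%m', date) AS month, "user",
               SUM(duration) AS monthly_duration
        FROM chores_record
        GROUP BY month, "user"
    ) AS monthly_sum
    GROUP BY "user"
    ORDER BY "user"
""").fetchall()
print(naive)  # [('Alice', 120.0), ('Bob', 30.0), ('Charlie', 20.0)]
```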

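So I built a rather cumbersome query that:

  1. lists the unique months,
  2. lists the unique users,
  3. generates every month × user combination,
  4. adds the monthly sums from the raw data,
  5. takes the average of the monthly sums (treating 'null' as 0).
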
SELECT
    unique_user,
    round(avg(COALESCE(monthly_duration, 0))) -- COALESCE transforms 'null' into 0
FROM (
    -- monthly duration with 'null' if no record for that user×month
    SELECT
        month_user_combinations.month,
        month_user_combinations.unique_user,
        monthly_duration.monthly_duration
    FROM
    (
        (
            -- all months×users combinations
            SELECT
                month,
                unique_user
            FROM (
                (
                    -- list of unique months
                    SELECT DISTINCT
                        date_trunc('month', date) as month
                    FROM
                        public.chores_record
                ) AS unique_months
                CROSS JOIN
                (
                    -- list of unique users
                    SELECT DISTINCT
                        "user" as "unique_user"
                    FROM
                        public.chores_record
                ) AS unique_users
            )
        ) AS month_user_combinations
        LEFT OUTER JOIN
        (
            -- monthly duration for existing month×user combination only
            SELECT
                date_trunc('month', date) as month,
                "user",
                sum(duration) as monthly_duration
            FROM
                public.chores_record
            GROUP BY
                date_trunc('month', date),
                "user"
        ) AS monthly_duration
        ON (
            month_user_combinations.month = monthly_duration.month
            AND
            month_user_combinations.unique_user = monthly_duration.user
        )
    )
) AS monthly_duration_for_all_combinations
GROUP BY
    unique_user
;

This query works, but it is very bulky.
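For comparison, the same five steps can be written more compactly with CTEs. The sketch below runs in SQLite (standing in for PostgreSQL, with `strftime('%Y-%m', date)` assumed in place of `date_trunc('month', date)`); the step numbers in the SQL comments refer to the numbered list above. In PostgreSQL the query text would be the same apart from the month-truncation expression.

```python
import sqlite3

RECORDS = [
    ("2020-01-01", "Alice", 120), ("2020-01-02", "Bob", 30),
    ("2020-01-03", "Charlie", 10), ("2020-01-23", "Charlie", 10),
    ("2020-02-03", "Charlie", 10), ("2020-02-23", "Charlie", 10),
    ("2020-03-02", "Bob", 30), ("2020-03-03", "Charlie", 10),
    ("2020-03-23", "Charlie", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE chores_record (date TEXT NOT NULL, "user" TEXT NOT NULL, '
             'duration INTEGER NOT NULL, PRIMARY KEY (date, "user"))')
conn.executemany("INSERT INTO chores_record VALUES (?, ?, ?)", RECORDS)

query = """
WITH months AS (              -- step 1: unique months
    SELECT DISTINCT strftime('%Y-%m', date) AS month FROM chores_record
), users AS (                 -- step 2: unique users
    SELECT DISTINCT "user" FROM chores_record
), monthly AS (               -- step 4: monthly sums that actually exist
    SELECT strftime('%Y-%m', date) AS month, "user", SUM(duration) AS total
    FROM chores_record
    GROUP BY month, "user"
)
SELECT u."user",
       AVG(COALESCE(m.total, 0)) AS avg_monthly_sum  -- step 5: null counts as 0
FROM months AS mo
CROSS JOIN users AS u         -- step 3: every month x user pair
LEFT JOIN monthly AS m
       ON m.month = mo.month AND m."user" = u."user"
GROUP BY u."user"
ORDER BY u."user"
"""
result = conn.execute(query).fetchall()
print(result)  # [('Alice', 40.0), ('Bob', 20.0), ('Charlie', 20.0)]
```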

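Question

How can I query the average of the monthly sums more elegantly than above, while still accounting for "no record ⇒ monthly sum = 0"?

Note: it is safe to assume that I only want to average over months that have at least one record (i.e., it is fine that December or April are not taken into account here).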
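Under that assumption, the denominator is the same for every user: the number of distinct months that have at least one record. Averaging a user's monthly sums with the missing months counted as 0 is therefore just that user's total duration divided by this count, so no month × user grid is needed at all. A minimal sketch of this shortcut, again in SQLite as a stand-in for PostgreSQL (with `strftime` assumed in place of `date_trunc`); the `* 1.0` avoids integer division.

```python
import sqlite3

RECORDS = [
    ("2020-01-01", "Alice", 120), ("2020-01-02", "Bob", 30),
    ("2020-01-03", "Charlie", 10), ("2020-01-23", "Charlie", 10),
    ("2020-02-03", "Charlie", 10), ("2020-02-23", "Charlie", 10),
    ("2020-03-02", "Bob", 30), ("2020-03-03", "Charlie", 10),
    ("2020-03-23", "Charlie", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE chores_record (date TEXT NOT NULL, "user" TEXT NOT NULL, '
             'duration INTEGER NOT NULL, PRIMARY KEY (date, "user"))')
conn.executemany("INSERT INTO chores_record VALUES (?, ?, ?)", RECORDS)

# Total per user divided by the (shared) number of months with any record.
result = conn.execute("""
    SELECT "user",
           SUM(duration) * 1.0
             / (SELECT COUNT(DISTINCT strftime('%Y-%m', date)) FROM chores_record)
             AS avg_monthly_sum
    FROM chores_record
    GROUP BY "user"
    ORDER BY "user"
""").fetchall()
print(result)  # [('Alice', 40.0), ('Bob', 20.0), ('Charlie', 20.0)]
```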


MWE

CREATE TABLE public.chores_record
(
    date date NOT NULL,
    "user" text NOT NULL,
    duration integer NOT NULL,
    PRIMARY KEY (date, "user")
);

INSERT INTO
    public.chores_record(date, "user", duration)
VALUES
    ('2020-01-01','Alice',120),
    ('2020-01-02','Bob',30),
    ('2020-01-03','Charlie',10),
    ('2020-01-23','Charlie',10),
    ('2020-02-03','Charlie',10),
    ('2020-02-23','Charlie',10),
    ('2020-03-02','Bob',30),
    ('2020-03-03','Charlie',10),
    ('2020-03-23','Charlie',10)
;

Best answer

You can build a calendar table with a CTE:


-- EXPLAIN
WITH cal AS (   -- the unique months
        SELECT DISTINCT date_trunc('month', date) AS tick
        FROM chores_record
        )
, cnt AS (      -- the number of months (a scalar)
        SELECT COUNT(*) AS nmonth
        FROM cal
        )
SELECT
        x."user"
        , SUM(x.duration) AS tot_duration
        -- MAX, not SUM: the scalar nmonth is repeated on every joined row
        , round(SUM(x.duration)::numeric / MAX(c.nmonth)) AS average_month
FROM cal t
JOIN cnt c ON true -- attach the month count to every row
LEFT JOIN chores_record x ON date_trunc('month', x.date) = t.tick
GROUP BY x."user"
        ;
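In this query `cnt` is joined onto every row, so each chore record of a user carries its own copy of `nmonth`. Aggregating it with `SUM` would divide by `nmonth` × (number of that user's records) instead of by the month count, which is why an aggregate that returns the scalar unchanged, such as `MAX`, is needed. The hypothetical helper `calendar_average` below replays the calendar-CTE query in SQLite (a stand-in for PostgreSQL, with `strftime('%Y-%m', date)` assumed in place of `date_trunc('month', date)`) so the two divisors can be compared.

```python
import sqlite3

RECORDS = [
    ("2020-01-01", "Alice", 120), ("2020-01-02", "Bob", 30),
    ("2020-01-03", "Charlie", 10), ("2020-01-23", "Charlie", 10),
    ("2020-02-03", "Charlie", 10), ("2020-02-23", "Charlie", 10),
    ("2020-03-02", "Bob", 30), ("2020-03-03", "Charlie", 10),
    ("2020-03-23", "Charlie", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE chores_record (date TEXT NOT NULL, "user" TEXT NOT NULL, '
             'duration INTEGER NOT NULL, PRIMARY KEY (date, "user"))')
conn.executemany("INSERT INTO chores_record VALUES (?, ?, ?)", RECORDS)


def calendar_average(divisor: str):
    """Hypothetical helper: run the calendar-CTE query with the given
    aggregate ("MAX" or "SUM") applied to the month count."""
    sql = f"""
    WITH cal AS (SELECT DISTINCT strftime('%Y-%m', date) AS tick FROM chores_record),
         cnt AS (SELECT COUNT(*) AS nmonth FROM cal)
    SELECT x."user", SUM(x.duration) * 1.0 / {divisor}(c.nmonth) AS average_month
    FROM cal t
    JOIN cnt c ON 1 = 1
    LEFT JOIN chores_record x ON strftime('%Y-%m', x.date) = t.tick
    GROUP BY x."user"
    ORDER BY x."user"
    """
    return conn.execute(sql).fetchall()


print(calendar_average("MAX"))  # [('Alice', 40.0), ('Bob', 20.0), ('Charlie', 20.0)]
print(calendar_average("SUM"))  # Bob and Charlie come out too small
```

With `SUM`, Alice still looks right only because she has exactly one record; Bob (two records) and Charlie (six records) are divided by 6 and 18 instead of 3.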

Source: the Stack Overflow question "SQL: How to query the average of monthly sums when some months have no records?": https://stackoverflow.com/questions/64843225/
