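I want to count the number of "services" a customer has in 30-day periods starting from the beginning of the contract, so I have to count the services per month starting from the customer's start date. The simplified table looks like this: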
services
------------------
id serial
id_customer bigint
service_date date
Let's imagine there is only one type of service. This is how I solved it:
SELECT
DATE_PART('year',service_date)||'-'|| CASE WHEN DATE_PART('day',service_date) >= 15 THEN
DATE_PART('month',service_date)
ELSE
CASE WHEN DATE_PART('month',service_date) = 1 THEN
12
ELSE
DATE_PART('month',service_date)-1
END
END bill, count(id)
FROM services
WHERE id_customer = 1
GROUP BY bill
The result is:
bill | count
-------------------
2019-02 | 2455333
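In the example, the start date of id_customer 1 is 2019-02-15, so for the 2019-02 period I count the services up to and including 2019-03-14.
What I want to know is: is there a better or more efficient solution?
I saw a solution here, but it involves an INNER JOIN plus a GROUP BY on the same table, and I think that would be slower because my table has a lot of records.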
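For a fixed start day of 15, the CASE logic in the question can be replaced by a date shift: subtracting 14 days moves the 15th onto the 1st of the same month and the 1st to 14th into the previous month, so to_char() returns the billing label directly. This also handles the year rollover for early-January dates (the CASE above yields month 12 without decrementing the year) and zero-pads the month, which concatenating DATE_PART results does not (it gives 2019-2). A minimal PostgreSQL sketch, assuming a temporary copy of the simplified services table with a few hypothetical rows; the same shift should work for any fixed start day from 1 to 28 by subtracting (start day - 1) days:

```sql
-- Hypothetical temp copy of the simplified table from the question.
CREATE TEMP TABLE services (
    id           serial PRIMARY KEY,
    id_customer  bigint NOT NULL,
    service_date date   NOT NULL
);

INSERT INTO services (id_customer, service_date) VALUES
    (1, '2019-02-15'),  -- first day of the 2019-02 period
    (1, '2019-03-14'),  -- last day of the 2019-02 period
    (1, '2019-03-15'),  -- first day of the 2019-03 period
    (1, '2020-01-10');  -- falls in the period starting 2019-12-15

-- date - integer yields a date; shifting back 14 days (start day 15 minus 1)
-- maps every service date into the calendar month of its billing period.
SELECT to_char(service_date - 14, 'YYYY-MM') AS bill,
       count(*)                              AS service_calls
FROM services
WHERE id_customer = 1
GROUP BY bill
ORDER BY bill;
```

With these rows the query should return 2019-02 with 2 calls, 2019-03 with 1 and 2019-12 with 1. It still scans all of the customer's rows once, but it avoids the nested CASE.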
Best answer
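You don't need to worry about the actual number of days in a month, or about picking apart the month, year, or day of a date.
Just use the customer's start date and let PostgreSQL generate the correct billing periods for you.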
To run a single query for all customers, I used a separate table containing the customer id and the configured billing_start date; we can then run a query like this:
WITH
periods (id, period_start, period_end) AS (
SELECT
id,
generate_series(billing_start, current_date, '1 month'::interval)::date,
(generate_series(billing_start, current_date, '1 month'::interval) + '1 month'::interval)::date
FROM test_customers
),
data AS (
SELECT
periods.id AS customer,
period_start,
count(test_services.*) AS service_calls
FROM periods INNER JOIN test_services ON (test_services.id_customer = periods.id)
WHERE test_services.service_date >= periods.period_start AND test_services.service_date < periods.period_end
GROUP BY 1, 2
)
SELECT customer, to_char(period_start, 'YYYY-MM') AS bill, service_calls
FROM data
ORDER BY 1, 2
;
...which produces output like this:
customer | bill | service_calls
----------+---------+---------------
1 | 2018-12 | 382736
1 | 2019-01 | 382735
1 | 2019-02 | 345696
2 | 2018-12 | 382736
2 | 2019-01 | 382734
2 | 2019-02 | 234580
3 | 2018-12 | 382734
3 | 2019-01 | 382736
3 | 2019-02 | 123463
4 | 2018-12 | 382734
4 | 2019-01 | 382736
4 | 2019-02 | 12346
5 | 2019-01 | 382735
5 | 2019-02 | 283965
6 | 2019-01 | 382735
6 | 2019-02 | 172848
7 | 2019-01 | 382734
7 | 2019-02 | 61732
8 | 2019-02 | 333351
9 | 2019-02 | 222234
10 | 2019-02 | 111117
(21 rows)
Full online example: https://rextester.com/IHLJ95398
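The periods CTE relies on generate_series() stepping by '1 month' from billing_start; each generated row is a half-open range [period_start, period_end), which is why the WHERE clause uses >= on the start and < on the end, so a service on the boundary date is counted exactly once. A small sketch of just that step for a single hypothetical start date of 2019-02-15, with a fixed end date standing in for current_date so the output is reproducible:

```sql
-- Billing periods for a hypothetical contract starting 2019-02-15.
-- DATE '2019-05-01' stands in for current_date so the output is reproducible.
SELECT g.period_start::date                         AS period_start,
       (g.period_start + interval '1 month')::date AS period_end
FROM generate_series(DATE '2019-02-15', DATE '2019-05-01',
                     interval '1 month') AS g(period_start)
ORDER BY 1;
```

This should return three periods: 2019-02-15 to 2019-03-15, 2019-03-15 to 2019-04-15 and 2019-04-15 to 2019-05-15. The series stops once the next start (2019-05-15) would pass the end date.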
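The important thing for speed is a multi-column index on id_customer and service_date, because that is where the counting happens; with it, the rows can be counted without sorting: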
CREATE INDEX idx_svc_customer_date ON test_services (id_customer, service_date);
(Otherwise, on a large data set the sort would most likely be done on disk rather than in memory.)
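One way to let each count be answered by a range scan on that index is to count per period in a LATERAL subquery instead of joining and grouping. This is a swapped-in variant, not the query above, and it behaves slightly differently: it also returns periods with zero services, which the INNER JOIN version drops. A self-contained sketch on hypothetical temp tables, with a fixed end date instead of current_date; whether it is actually faster depends on the data and is worth checking with EXPLAIN (ANALYZE):

```sql
-- Hypothetical minimal versions of the answer's test tables.
CREATE TEMP TABLE test_customers (
    id            bigint PRIMARY KEY,
    billing_start date   NOT NULL
);
CREATE TEMP TABLE test_services (
    id           serial PRIMARY KEY,
    id_customer  bigint NOT NULL,
    service_date date   NOT NULL
);
CREATE INDEX idx_svc_customer_date ON test_services (id_customer, service_date);

INSERT INTO test_customers (id, billing_start) VALUES
    (1, '2019-02-15'),
    (2, '2019-03-01');

INSERT INTO test_services (id_customer, service_date) VALUES
    (1, '2019-02-15'), (1, '2019-03-14'),  -- customer 1, 2019-02 period
    (1, '2019-03-15'),                      -- customer 1, 2019-03 period
    (2, '2019-03-31');                      -- customer 2, 2019-03 period

-- One row per customer and billing period, including empty periods.
CREATE TEMP VIEW bills AS
SELECT p.id                               AS customer,
       to_char(p.period_start, 'YYYY-MM') AS bill,
       s.service_calls
FROM (
    -- Billing periods as half-open ranges [period_start, period_end);
    -- DATE '2019-04-30' stands in for current_date.
    SELECT c.id,
           g.period_start::date                         AS period_start,
           (g.period_start + interval '1 month')::date AS period_end
    FROM test_customers AS c
    CROSS JOIN LATERAL generate_series(c.billing_start, DATE '2019-04-30',
                                       interval '1 month') AS g(period_start)
) AS p
CROSS JOIN LATERAL (
    -- Equality on id_customer plus a range on service_date matches the
    -- (id_customer, service_date) index, so on large tables the planner
    -- can count each period with an index range scan.
    SELECT count(*) AS service_calls
    FROM test_services AS t
    WHERE t.id_customer = p.id
      AND t.service_date >= p.period_start
      AND t.service_date <  p.period_end
) AS s;

SELECT * FROM bills ORDER BY customer, bill;
```

With these rows the final SELECT should return five rows: customer 1 with 2, 1 and 0 calls for 2019-02, 2019-03 and 2019-04, and customer 2 with 1 and 0 calls for 2019-03 and 2019-04.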
If you only want the periods for a single customer, use it like this:
WITH
periods (id, period_start, period_end) AS (
SELECT
id,
generate_series(billing_start, current_date, '1 month'::interval)::date,
(generate_series(billing_start, current_date, '1 month'::interval) + '1 month'::interval)::date
FROM test_customers WHERE id = 4
),
data AS (
SELECT
periods.id AS customer,
period_start,
count(test_services.*) AS service_calls
FROM periods INNER JOIN test_services ON (test_services.id_customer = periods.id)
WHERE test_services.service_date >= periods.period_start AND test_services.service_date < periods.period_end
GROUP BY 1, 2
)
SELECT customer, to_char(period_start, 'YYYY-MM') AS bill, service_calls
FROM data
ORDER BY 1, 2
;
...giving:
customer | bill | service_calls
----------+---------+---------------
4 | 2018-12 | 382734
4 | 2019-01 | 382736
4 | 2019-02 | 12346
(3 rows)
Regarding sql - Group by count over specific date intervals, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55078561/