我的目标是创建一个查询,返回 365 天窗口内进行过购买的唯一客户的计数。我在 postgres 中创建了下面的查询,结果查询非常慢。我的表有 812,024 行,仅包含订单日期和客户 ID。当我删除 unique 语句时,我可以让查询在大约 60 秒内返回结果,有了它,我还没有完成。我在 (order_date, id) 上创建了一个索引。我对 SQL 完全是个新手,这确实是我第一次用它做任何事情,在尝试了一整天寻找这个问题的解决方案之后,我找不到任何可以工作的东西,尽管我已经看到很多关于distinct 性能缓慢的情况。
SELECT
(d1.Ordered) AS Ordered,
COUNT(distinct d2.ID) Users
FROM
(
SELECT order_date AS Ordered
FROM orders
GROUP BY order_date
) d1
INNER JOIN
(
SELECT order_date AS Ordered, id
FROM orders
) d2
ON d2.Ordered BETWEEN d1.Ordered - 364 AND d1.Ordered
GROUP BY d1.Ordered
ORDER BY d1.Ordered
"Sort (cost=3541596.30..3541596.80 rows=200 width=29)"
" Sort Key: orders_1.order_date"
" -> HashAggregate (cost=3541586.66..3541588.66 rows=200 width=29)"
" -> Nested Loop (cost=16121.73..3040838.52 rows=100149627 width=29)"
" -> HashAggregate (cost=16121.30..16132.40 rows=1110 width=4)"
" -> Seq Scan on orders orders_1 (cost=0.00..14091.24 rows=812024 width=4)"
" -> Index Only Scan using x on orders (cost=0.43..1822.70 rows=90225 width=29)"
" Index Cond: ((order_date >= (orders_1.order_date - 364)) AND (order_date <= orders_1.order_date))"
最佳答案
不需要自连接,使用generate_series
select
g.order_date as "Ordered",
count(distinct o.id) as "Users"
from
generate_series(
(select min(order_date) from orders),
(select max(order_date) from orders),
'1 day'
) g (order_date)
left join
orders o on o.order_date between g.order_date - 364 and g.order_date
group by 1
order by 1
关于sql - postgres 中使用不同计数的慢速查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25026079/