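I have the following price audit table in a TimescaleDB database. When a new price arrives, I need to compute the lowest price over the previous 30 days. But our price data is not daily: I write a date and a price to the table only when the price changes. The problem is that to find the lowest price I have to look back beyond the last 30 days, because the window itself may not contain enough data. The table also has 130 million rows.
In this case the lowest price should be 120, because the product sold for 120 ₺ from July 12 to August 4.
Example DDL: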
create table price_audit (
timestamp timestamp with time zone not null,
partnerid varchar(200) not null,
productid varchar(200) not null,
price numeric(12, 2)
);
alter table price_audit
owner to appuser;
create index _price_audit_timestamp_idx
on price_audit (timestamp desc);
create index productid_time_desc
on price_audit (productid asc, timestamp desc)
where (productid IS NOT NULL);
create index partnerid_time_desc
on price_audit (partnerid asc, timestamp desc)
where (partnerid IS NOT NULL);
create index partnerid_productid_time_desc
on price_audit (partnerid asc, productid asc, timestamp desc)
where ((partnerid IS NOT NULL) AND (productid IS NOT NULL));
Sample data:
Insert query:
insert into price_audit ("timestamp", partnerid, productid, price)
values ('2022-07-09 18:18:39.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1', 120.00),
('2022-08-05 21:19:40.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1',130.00),
('2022-08-10 11:20:39.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1',140.00);
When a new price arrives, this is the range we should consider:
New price date: 2022/08/11
Date minus 30 days: 2022/07/12
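The window start can be derived from the new price's timestamp instead of being typed by hand. A minimal sketch, assuming the session time zone is UTC and using the 15:30 time of day that appears in the queries further down:

```sql
-- Subtract 30 days from the timestamp of the incoming price.
-- The result falls on 2022-07-12, matching the date listed above.
SELECT timestamptz '2022-08-11 15:30:00+00' - interval '30 days' AS window_start;
```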
The following query takes the first price older than 30 days, along with the prices from the last 30 days, and returns the minimum.
SELECT min(price) from price_audit
where (timestamp >= (select timestamp from price_audit
where timestamp < '2022-07-12T15:30:00-00:00' and productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
order by timestamp desc limit 1) or
timestamp >= '2022-07-12T15:30:00-00:00' and timestamp < '2022-08-11T15:30:00-00:00') and
productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53' and price > 0
Another one, using MAX:
SELECT min(price) from price_audit
where (timestamp >= (select MAX(timestamp) from price_audit
where timestamp < '2022-07-12T15:30:00-00:00' and productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53') or
timestamp >= '2022-07-12T15:30:00-00:00' and timestamp < '2022-08-11T15:30:00-00:00') and
productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53' and price > 0
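Query cost: 76
This query actually works fine, but I am not sure it is correct in every case: if there is a row at exactly 2022/07/12 00:00:00, the query returns a wrong result. That is fine though, I can ignore it.
Also, the run time is about 50-100 ms. If the query is OK, I would like to bring it under 50 ms, because I may have to run it 1 million times a day.
How can we improve the performance of this query?
Best answer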
take the first price older than 30 days, along with prices from the last 30 days, and return the minimum
I expect this to perform better:
SELECT LEAST (
(SELECT min(price)
FROM price_audit
WHERE partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
AND productid = 'ABCTEST1'
AND price > 0
AND timestamp >= '2022-07-12 15:30:00-00:00'
)
, (SELECT price
FROM price_audit
WHERE partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
AND productid = 'ABCTEST1'
AND price > 0
AND timestamp < '2022-07-12 15:30:00-00:00'
ORDER BY timestamp DESC
LIMIT 1
)
);
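Since the query may run a million times a day, it could be wrapped in a prepared statement so that only the partner, the product, and the new price's timestamp change per call. This is a sketch, not part of the original answer: the statement name min_price_30d is hypothetical, and the upper bound on the window (taken from the question's queries) is an assumption you can drop if no rows newer than the incoming price exist.

```sql
-- $1 = partnerid, $2 = productid, $3 = timestamp of the incoming price
PREPARE min_price_30d (varchar, varchar, timestamptz) AS
SELECT LEAST (
   (SELECT min(price)                       -- lowest price inside the 30-day window
    FROM   price_audit
    WHERE  partnerid = $1
    AND    productid = $2
    AND    price > 0
    AND    "timestamp" >= $3 - interval '30 days'
    AND    "timestamp" <  $3
   )
 , (SELECT price                            -- last price before the window opens
    FROM   price_audit
    WHERE  partnerid = $1
    AND    productid = $2
    AND    price > 0
    AND    "timestamp" < $3 - interval '30 days'
    ORDER  BY "timestamp" DESC
    LIMIT  1
   )
);

EXECUTE min_price_30d('b319406e-2ca7-4663-a203-68a1928fdc53', 'ABCTEST1', '2022-08-11 15:30:00+00');
```

LEAST ignores NULL arguments in PostgreSQL, so the call still returns a value when one of the two subqueries finds no row. With the sample rows from the question, the window contains 130.00 and 140.00, the last earlier price is 120.00, and the result is 120.00.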
Your index partnerid_productid_time_desc should be suitable for this, but this index should be better:
CREATE INDEX price_audit_partnerid_productid_time_desc ON price_audit (partnerid, productid, timestamp DESC) INCLUDE (price) -- ①
WHERE price > 0; -- ②
① A covering index only makes sense if you get index-only scans out of it. See:
- Advantage of using INCLUDE as against adding the column in INDEX for covering index
- Do covering indexes in PostgreSQL help JOIN columns?
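② A partial index only makes sense if the condition excludes a sizable share of rows.
WHERE partnerid IS NOT NULL AND productid IS NOT NULL (like you have now) never makes sense, since both columns are defined NOT NULL.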
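To check whether the covering index really produces index-only scans (note ①), one option is to inspect the plan of the "last price before the window" subquery. A minimal sketch using the sample values from the question; the plan you get depends on your data, on table statistics, and on how recently the table was vacuumed:

```sql
-- Run the subquery for real and show the plan with buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT price
FROM   price_audit
WHERE  partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
AND    productid = 'ABCTEST1'
AND    price > 0
AND    "timestamp" < '2022-07-12 15:30:00+00'
ORDER  BY "timestamp" DESC
LIMIT  1;
-- In the output, look for an "Index Only Scan" node and a low "Heap Fetches" count.
```

If the plan shows a plain Index Scan, or Heap Fetches stays high, the INCLUDE (price) part is probably not paying for itself.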
Aside:
I would not use the name "timestamp" for a timestamptz column. Better not to use basic type names as identifiers at all. Too confusing.
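If you decide to rename the column, a sketch could look like the following. The new name changed_at is purely hypothetical, and since this table lives in TimescaleDB it is worth trying on a test copy first:

```sql
-- "changed_at" is a hypothetical name; pick whatever fits your schema.
ALTER TABLE price_audit RENAME COLUMN "timestamp" TO changed_at;
```

Existing indexes follow the rename automatically, but every query that references "timestamp" has to be updated.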
Source: a similar question found on Stack Overflow (sql - quickly get the lowest price of the last 30 days plus the last price before that): https://stackoverflow.com/questions/73660296/