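I want to update a specific column of a table in PostgreSQL, based on the difference between the values of another column in adjacent rows.
Here is a test setup: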
CREATE TABLE test(
main INTEGER,
sub_id INTEGER,
value_t INTEGER);
INSERT INTO test (main, sub_id, value_t)
VALUES
(1,1,8),
(1,2,7),
(1,3,3),
(1,4,85),
(1,5,40),
(2,1,3),
(2,2,1),
(2,3,1),
(2,4,8),
(2,5,41);
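The UPDATE discussed further down writes to a column newval, which the setup above does not create. A minimal sketch to add it, assuming a plain nullable integer is what was intended (the column type is an assumption, not stated in the question):

```sql
-- Assumption: newval is a nullable integer; the original setup does not define it.
ALTER TABLE test ADD COLUMN newval integer;
```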
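My goal is to determine, within each group main and starting from sub_id 1, which value of diff (the difference in value_t between adjacent rows) exceeds a certain threshold (e.g. > 10 or < -10), checking rows in ascending order of sub_id. Until the threshold is reached I would like to flag every passed row AND the one row where the condition is FALSE by filling column newval with a value, e.g. 1.
Should I use a loop, or is there a smarter solution?
Task description in pseudocode: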
FOR i in GROUP [PARTITION BY main ORDER BY sub_id]:
DO until diff > 10 OR diff <-10
SET newval = 1 AND LEAD(newval) = 1
Best answer
Basic SELECT
As fast as possible:
SELECT *, bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM (
SELECT *, value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
FROM test
) sub;
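For the sample rows in the test setup, the result can be worked out by hand. The sketch below is the same query with an explicit column list and a final ORDER BY added for readable output; the expected rows in the comments were derived from the sample data, not copied from the original answer.

```sql
SELECT main, sub_id, value_t, diff
     , bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM  (
   SELECT main, sub_id, value_t
        , value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
   FROM   test
   ) sub
ORDER  BY main, sub_id;

-- main | sub_id | value_t | diff | flag
--    1 |      1 |       8 |    0 | t
--    1 |      2 |       7 |   -1 | t
--    1 |      3 |       3 |   -4 | t
--    1 |      4 |      85 |   82 | f
--    1 |      5 |      40 |  -45 | f
--    2 |      1 |       3 |    0 | t
--    2 |      2 |       1 |   -2 | t
--    2 |      3 |       1 |    0 | t
--    2 |      4 |       8 |    7 | t
--    2 |      5 |      41 |   33 | f
```

Because the default window frame with ORDER BY runs from the start of the partition to the current row, flag turns false at the first row after a big gap and stays false for the rest of the group.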
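Fine points
Your mental model revolves around the window function lead(). But its counterpart lag() is a bit more efficient for this purpose, since there is no off-by-one error when including the row before the big gap. Alternatively, use lead() with inverted sort order (ORDER BY sub_id DESC).
To avoid NULL for the first row in the partition, provide value_t as the default in the third parameter, which makes the diff 0 instead of NULL. Both lead() and lag() have that capability.
diff BETWEEN -10 AND 10 is slightly faster than @diff < 11 (and also clearer and more flexible). (@ is the "absolute value" operator, equivalent to the abs() function.)
bool_or() or bool_and() in the outer window function is probably the cheapest way to mark all rows up to the big gap.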
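The lead() alternative mentioned above could look roughly like this. It is a sketch, not taken from the original answer: only the inner window is sorted in descending order, so lead() picks the row with the next lower sub_id, while the outer bool_and() still accumulates in ascending order.

```sql
SELECT main, sub_id, value_t, diff
     , bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM  (
   SELECT main, sub_id, value_t
          -- With ORDER BY sub_id DESC, the "next" row is the one with the previous sub_id;
          -- the third argument supplies value_t where no such row exists (diff = 0).
        , value_t - lead(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id DESC) AS diff
   FROM   test
   ) sub
ORDER  BY main, sub_id;
```

This should produce the same diff and flag values as the lag() version, at the cost of two different sort orders in one query.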
Your UPDATE
You wrote: "Until the threshold is reached I would like to flag every passed row AND the one row where the condition is FALSE by filling column newval with a value e.g. 1."
Again, as fast as possible:
UPDATE test AS t
SET newval = 1
FROM (
SELECT main, sub_id
, bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM (
SELECT main, sub_id
, value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
FROM test
) sub
) u
WHERE (t.main, t.sub_id) = (u.main, u.sub_id)
AND u.flag;
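Fine points
Computing all values in a single query is typically substantially faster than a correlated subquery.
The added WHERE condition AND u.flag makes sure we only update rows that actually need an update.
If some of the rows may already have the right value in newval, add another clause to avoid those empty updates, too: AND t.newval IS DISTINCT FROM 1
SET newval = 1 assigns a constant (even though we could use the actually calculated value in this case), which is a bit cheaper.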
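Put together with the extra clause, the statement might look like the following sketch. The trailing SELECT is only a hypothetical sanity check, and it assumes newval exists and was NULL in every row before the UPDATE ran.

```sql
UPDATE test AS t
SET    newval = 1
FROM  (
   SELECT main, sub_id
        , bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
   FROM  (
      SELECT main, sub_id
           , value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
      FROM   test
      ) sub
   ) u
WHERE (t.main, t.sub_id) = (u.main, u.sub_id)
AND    u.flag
AND    t.newval IS DISTINCT FROM 1;  -- skip rows that already hold 1 (also matches NULL)

-- Sanity check (assumes newval was NULL everywhere beforehand)
SELECT main, sub_id, value_t, newval
FROM   test
ORDER  BY main, sub_id;
```

With the sample data, sub_id 1 to 3 in group 1 and sub_id 1 to 4 in group 2 should end up with newval = 1, and the remaining rows stay NULL. IS DISTINCT FROM is used instead of <> so that rows where newval is still NULL are included in the update.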
Regarding "sql - How to identify rows per group before a certain value gap?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/64810641/