SQL: select the maximum value of each consecutive run of data

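Given a table containing consecutive runs of data, where a number always increases while a task is in progress and resets to zero when the next task starts, how do you select the maximum value of each run?

Each run can have any number of rows, and runs are delimited by "start" and "end" rows. For example, the data might look like this: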

user_id, action, qty, datetime
1,       start,  0,   2017-01-01 00:00:01
1,       record, 0,   2017-01-01 00:00:01
1,       record, 4,   2017-01-01 00:00:02
1,       record, 5,   2017-01-01 00:00:03
1,       record, 6,   2017-01-01 00:00:04
1,       end,    0,   2017-01-01 00:00:04
1,       start,  0,   2017-01-01 00:00:05
1,       record, 0,   2017-01-01 00:00:05
1,       record, 2,   2017-01-01 00:00:06
1,       record, 3,   2017-01-01 00:00:07
1,       end,    0,   2017-01-01 00:00:07
2,       start,  0,   2017-01-01 00:00:08
2,       record, 0,   2017-01-01 00:00:08
2,       record, 3,   2017-01-01 00:00:09
2,       record, 8,   2017-01-01 00:00:10
2,       end,    0,   2017-01-01 00:00:10

The result should be the maximum value of each run:

user_id, action, qty, datetime
1,       record, 6,   2017-01-01 00:00:04
1,       record, 3,   2017-01-01 00:00:07
2,       record, 8,   2017-01-01 00:00:10

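How can this be done using any Postgres SQL syntax (9.3)? It is some kind of grouping followed by selecting the max from each group, but I don't see how to do the grouping part.

Best answer

If a single user's runs never overlap and the next run always starts at a later time, you can use the LAG() window function.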

with the_table(user_id, action, qty, datetime) as (
    select 1,'start',  0,   '2017-01-01 00:00:01'::timestamp union all
    select 1,'record', 0,   '2017-01-01 00:00:01'::timestamp union all
    select 1,'record', 4,   '2017-01-01 00:00:02'::timestamp union all
    select 1,'record', 5,   '2017-01-01 00:00:03'::timestamp union all
    select 1,'record', 6,   '2017-01-01 00:00:04'::timestamp union all
    select 1,'end',    0,   '2017-01-01 00:00:04'::timestamp union all
    select 1,'start',  0,   '2017-01-01 00:00:05'::timestamp union all
    select 1,'record', 0,   '2017-01-01 00:00:05'::timestamp union all
    select 1,'record', 2,   '2017-01-01 00:00:06'::timestamp union all
    select 1,'record', 3,   '2017-01-01 00:00:07'::timestamp union all
    select 1,'end',    0,   '2017-01-01 00:00:07'::timestamp union all
    select 2,'start',  0,   '2017-01-01 00:00:08'::timestamp union all
    select 2,'record', 0,   '2017-01-01 00:00:08'::timestamp union all
    select 2,'record', 3,   '2017-01-01 00:00:09'::timestamp union all
    select 2,'record', 8,   '2017-01-01 00:00:10'::timestamp union all
    select 2,'end',    0,   '2017-01-01 00:00:10'::timestamp  
)

select n_user_id, n_action, n_qty, n_datetime from (
    select action,
    lag(user_id)  over w as n_user_id,
    lag(action)   over w as n_action,
    lag(qty)      over w as n_qty,
    lag(datetime) over w as n_datetime
    from the_table
    -- rows sharing a datetime are ordered start, record, end, then by qty
    window w as (partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty)
)t
where action = 'end'

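Because some action = record rows have the same datetime as the start and end rows, I used a CASE expression in the ORDER BY to make the order explicit: start first, then record, then end. The row immediately before each end row is therefore the last record of that run, which holds the run's maximum because qty only increases within a run.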
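
The question also asks how to do the grouping part directly. One possible approach, shown here as a sketch and not as part of the original answer, is to number the runs with a running count of start rows per user and then aggregate per run. The alias run_no and the output column max_qty are names chosen for illustration. The sketch uses the same tie-breaking ORDER BY as above, and a CASE inside sum() instead of the FILTER clause so that it stays within 9.3 syntax.

```sql
with the_table(user_id, action, qty, datetime) as (
    values
        (1, 'start',  0, '2017-01-01 00:00:01'::timestamp),
        (1, 'record', 0, '2017-01-01 00:00:01'),
        (1, 'record', 4, '2017-01-01 00:00:02'),
        (1, 'record', 5, '2017-01-01 00:00:03'),
        (1, 'record', 6, '2017-01-01 00:00:04'),
        (1, 'end',    0, '2017-01-01 00:00:04'),
        (1, 'start',  0, '2017-01-01 00:00:05'),
        (1, 'record', 0, '2017-01-01 00:00:05'),
        (1, 'record', 2, '2017-01-01 00:00:06'),
        (1, 'record', 3, '2017-01-01 00:00:07'),
        (1, 'end',    0, '2017-01-01 00:00:07'),
        (2, 'start',  0, '2017-01-01 00:00:08'),
        (2, 'record', 0, '2017-01-01 00:00:08'),
        (2, 'record', 3, '2017-01-01 00:00:09'),
        (2, 'record', 8, '2017-01-01 00:00:10'),
        (2, 'end',    0, '2017-01-01 00:00:10')
)
select user_id, run_no, max(qty) as max_qty
from (
    select user_id, action, qty,
           -- running count of 'start' rows per user: every row of a run
           -- gets the same run_no (hypothetical name)
           sum(case when action = 'start' then 1 else 0 end) over (
               partition by user_id
               order by datetime,
                        case when action = 'start' then 0
                             when action = 'record' then 1
                             else 2 end,
                        qty
           ) as run_no
    from the_table
) t
where action = 'record'      -- ignore the start/end marker rows
group by user_id, run_no
order by user_id, run_no;
```

On the sample data this should give one row per run, with max_qty of 6 and 3 for user 1 and 8 for user 2. Unlike the LAG() version it returns only the aggregate, not the full record row, and it does not rely on qty increasing within a run.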

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/45181216/
