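Given a table containing consecutive runs of data, where a number always increases while a task is in progress and resets to zero when the next task starts, how do you select the maximum of each run?
Each run can have any number of rows, and runs are delimited by "start" and "end" rows. For example, the data might look like this: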
user_id, action, qty, datetime
1, start, 0, 2017-01-01 00:00:01
1, record, 0, 2017-01-01 00:00:01
1, record, 4, 2017-01-01 00:00:02
1, record, 5, 2017-01-01 00:00:03
1, record, 6, 2017-01-01 00:00:04
1, end, 0, 2017-01-01 00:00:04
1, start, 0, 2017-01-01 00:00:05
1, record, 0, 2017-01-01 00:00:05
1, record, 2, 2017-01-01 00:00:06
1, record, 3, 2017-01-01 00:00:07
1, end, 0, 2017-01-01 00:00:07
2, start, 0, 2017-01-01 00:00:08
2, record, 0, 2017-01-01 00:00:08
2, record, 3, 2017-01-01 00:00:09
2, record, 8, 2017-01-01 00:00:10
2, end, 0, 2017-01-01 00:00:10
The result should be the row holding the maximum of each run:
user_id, action, qty, datetime
1, record, 6, 2017-01-01 00:00:04
1, record, 3, 2017-01-01 00:00:07
2, record, 8, 2017-01-01 00:00:10
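Can this be done with any PostgreSQL (9.3) SQL syntax? It looks like some kind of grouping followed by selecting the maximum from each group, but I don't know how to do the grouping part.
Best answer
If a single user's runs never overlap and each run always starts after the previous one, you can use the LAG() window function.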
with the_table(user_id, action, qty, datetime) as (
select 1,'start', 0, '2017-01-01 00:00:01'::timestamp union all
select 1,'record', 0, '2017-01-01 00:00:01'::timestamp union all
select 1,'record', 4, '2017-01-01 00:00:02'::timestamp union all
select 1,'record', 5, '2017-01-01 00:00:03'::timestamp union all
select 1,'record', 6, '2017-01-01 00:00:04'::timestamp union all
select 1,'end', 0, '2017-01-01 00:00:04'::timestamp union all
select 1,'start', 0, '2017-01-01 00:00:05'::timestamp union all
select 1,'record', 0, '2017-01-01 00:00:05'::timestamp union all
select 1,'record', 2, '2017-01-01 00:00:06'::timestamp union all
select 1,'record', 3, '2017-01-01 00:00:07'::timestamp union all
select 1,'end', 0, '2017-01-01 00:00:07'::timestamp union all
select 2,'start', 0, '2017-01-01 00:00:08'::timestamp union all
select 2,'record', 0, '2017-01-01 00:00:08'::timestamp union all
select 2,'record', 3, '2017-01-01 00:00:09'::timestamp union all
select 2,'record', 8, '2017-01-01 00:00:10'::timestamp union all
select 2,'end', 0, '2017-01-01 00:00:10'::timestamp
)
select n_user_id, n_action, n_qty, n_datetime from (
    -- for every row, fetch the values of the previous row in the same user's timeline
    select action,
        lag(user_id)  over w as n_user_id,
        lag(action)   over w as n_action,
        lag(qty)      over w as n_qty,
        lag(datetime) over w as n_datetime
    from the_table
    -- on equal timestamps: start first, then record, then end
    window w as (
        partition by user_id
        order by datetime,
            case when action = 'start' then 0 when action = 'record' then 1 else 2 end,
            qty
    )
) t
-- the row just before each 'end' row is the last row of that run
where action = 'end'
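Because some action = record rows have the same datetime as the start and end rows, I use a CASE expression in the ORDER BY to make the order explicit: start comes first, then record, then end. With that ordering, the row immediately before each end row is the last record row of the run, which holds the maximum because qty only increases within a run.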
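For the grouping part the question asks about, one possible alternative (not the method used in the answer above) is to number the runs with a running count of start rows and then keep the highest-qty record row per run. The sketch below is only an illustration under the same assumptions as the answer: runs of one user do not overlap, and ties on datetime are ordered start, record, end. The column name run_id and the subquery alias runs are made up for this example, and the sample data is repeated in a VALUES list so the query can be run on its own. It avoids FILTER, which is not available in 9.3.

```sql
with the_table(user_id, action, qty, datetime) as (
    values
    (1, 'start',  0, '2017-01-01 00:00:01'::timestamp),
    (1, 'record', 0, '2017-01-01 00:00:01'),
    (1, 'record', 4, '2017-01-01 00:00:02'),
    (1, 'record', 5, '2017-01-01 00:00:03'),
    (1, 'record', 6, '2017-01-01 00:00:04'),
    (1, 'end',    0, '2017-01-01 00:00:04'),
    (1, 'start',  0, '2017-01-01 00:00:05'),
    (1, 'record', 0, '2017-01-01 00:00:05'),
    (1, 'record', 2, '2017-01-01 00:00:06'),
    (1, 'record', 3, '2017-01-01 00:00:07'),
    (1, 'end',    0, '2017-01-01 00:00:07'),
    (2, 'start',  0, '2017-01-01 00:00:08'),
    (2, 'record', 0, '2017-01-01 00:00:08'),
    (2, 'record', 3, '2017-01-01 00:00:09'),
    (2, 'record', 8, '2017-01-01 00:00:10'),
    (2, 'end',    0, '2017-01-01 00:00:10')
)
-- keep one row per (user_id, run_id): the first one in the ORDER BY below
select distinct on (user_id, run_id)
       user_id, action, qty, datetime
from (
    select t.*,
           -- run_id (hypothetical name): how many 'start' rows this user
           -- has had up to and including the current row
           sum(case when action = 'start' then 1 else 0 end) over (
               partition by user_id
               order by datetime,
                        case when action = 'start' then 0
                             when action = 'record' then 1
                             else 2 end,
                        qty
           ) as run_id
    from the_table t
) runs
where action = 'record'
-- highest qty first within each run; latest datetime breaks ties
order by user_id, run_id, qty desc, datetime desc;
```

On the sample data this returns the three rows listed in the question. Unlike the LAG() version, it does not depend on every run being closed by an end row, but it does depend on every run beginning with a start row.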
Source: a similar question on Stack Overflow about selecting the maximum of each consecutive run of data in SQL: https://stackoverflow.com/questions/45181216/