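SQL: select the max of each consecutive run of data

Tags: sql postgresql group-by

Given a table containing consecutive runs of data, where a number always increases while a task is in progress and resets to zero when the next task starts, how do you select the maximum value of each run?

Each run can have any number of rows and is delimited by a "start" row and an "end" row. For example, the data might look like this: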

user_id, action, qty, datetime
1,       start,  0,   2017-01-01 00:00:01
1,       record, 0,   2017-01-01 00:00:01
1,       record, 4,   2017-01-01 00:00:02
1,       record, 5,   2017-01-01 00:00:03
1,       record, 6,   2017-01-01 00:00:04
1,       end,    0,   2017-01-01 00:00:04
1,       start,  0,   2017-01-01 00:00:05
1,       record, 0,   2017-01-01 00:00:05
1,       record, 2,   2017-01-01 00:00:06
1,       record, 3,   2017-01-01 00:00:07
1,       end,    0,   2017-01-01 00:00:07
2,       start,  0,   2017-01-01 00:00:08
2,       record, 0,   2017-01-01 00:00:08
2,       record, 3,   2017-01-01 00:00:09
2,       record, 8,   2017-01-01 00:00:10
2,       end,    0,   2017-01-01 00:00:10

The result should be the row with the maximum value of each run:

user_id, action, qty, datetime
1,       record, 6,   2017-01-01 00:00:04
1,       record, 3,   2017-01-01 00:00:07
2,       record, 8,   2017-01-01 00:00:10     

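Any Postgres SQL syntax (9.3) is fine. It looks like some kind of grouping followed by picking the maximum from each group, but I don't know how to do the grouping part.

Best answer

If a single user's runs never overlap and the next run always starts later, you can use the LAG() window function. The row immediately before each 'end' row is the last 'record' row of that run, and because qty only increases within a run, that row holds the run's maximum.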

with the_table(user_id, action, qty, datetime) as (
    select 1,'start',  0,   '2017-01-01 00:00:01'::timestamp union all
    select 1,'record', 0,   '2017-01-01 00:00:01'::timestamp union all
    select 1,'record', 4,   '2017-01-01 00:00:02'::timestamp union all
    select 1,'record', 5,   '2017-01-01 00:00:03'::timestamp union all
    select 1,'record', 6,   '2017-01-01 00:00:04'::timestamp union all
    select 1,'end',    0,   '2017-01-01 00:00:04'::timestamp union all
    select 1,'start',  0,   '2017-01-01 00:00:05'::timestamp union all
    select 1,'record', 0,   '2017-01-01 00:00:05'::timestamp union all
    select 1,'record', 2,   '2017-01-01 00:00:06'::timestamp union all
    select 1,'record', 3,   '2017-01-01 00:00:07'::timestamp union all
    select 1,'end',    0,   '2017-01-01 00:00:07'::timestamp union all
    select 2,'start',  0,   '2017-01-01 00:00:08'::timestamp union all
    select 2,'record', 0,   '2017-01-01 00:00:08'::timestamp union all
    select 2,'record', 3,   '2017-01-01 00:00:09'::timestamp union all
    select 2,'record', 8,   '2017-01-01 00:00:10'::timestamp union all
    select 2,'end',    0,   '2017-01-01 00:00:10'::timestamp  
)

select n_user_id, n_action, n_qty, n_datetime from (
    select action,
    lag(user_id)  over w as n_user_id,
    lag(action)   over w as n_action,
    lag(qty)      over w as n_qty,
    lag(datetime) over w as n_datetime
    from the_table
    -- one shared window: per user, by time; ties are ordered start, record, end, then qty
    window w as (partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty)
)t
where action = 'end'

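Because some action = 'record' rows have the same datetime as the 'start' and 'end' rows, I use a CASE expression in the ORDER BY to make the order explicit: 'start' first, then 'record', then 'end'.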
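For readers who prefer the "group, then take the max" framing from the question, here is a minimal sketch of a different technique, not the method used in the answer above. It numbers the runs with a running count of 'start' rows and then keeps the highest-qty 'record' row per run using PostgreSQL's DISTINCT ON. The names run_no and runs are hypothetical, chosen for illustration. It relies on the same assumption that a user's runs do not overlap, and it was not tested against 9.3 specifically.

```sql
-- Sample data from the question, inlined so the query runs on its own.
with the_table(user_id, action, qty, datetime) as (
    values
        (1, 'start',  0, timestamp '2017-01-01 00:00:01'),
        (1, 'record', 0, timestamp '2017-01-01 00:00:01'),
        (1, 'record', 4, timestamp '2017-01-01 00:00:02'),
        (1, 'record', 5, timestamp '2017-01-01 00:00:03'),
        (1, 'record', 6, timestamp '2017-01-01 00:00:04'),
        (1, 'end',    0, timestamp '2017-01-01 00:00:04'),
        (1, 'start',  0, timestamp '2017-01-01 00:00:05'),
        (1, 'record', 0, timestamp '2017-01-01 00:00:05'),
        (1, 'record', 2, timestamp '2017-01-01 00:00:06'),
        (1, 'record', 3, timestamp '2017-01-01 00:00:07'),
        (1, 'end',    0, timestamp '2017-01-01 00:00:07'),
        (2, 'start',  0, timestamp '2017-01-01 00:00:08'),
        (2, 'record', 0, timestamp '2017-01-01 00:00:08'),
        (2, 'record', 3, timestamp '2017-01-01 00:00:09'),
        (2, 'record', 8, timestamp '2017-01-01 00:00:10'),
        (2, 'end',    0, timestamp '2017-01-01 00:00:10')
)
-- DISTINCT ON keeps the first row of each (user_id, run_no) group
-- according to the ORDER BY below, i.e. the row with the largest qty.
select distinct on (user_id, run_no)
       user_id, action, qty, datetime
from (
    select t.*,
           -- run_no (hypothetical name): running count of 'start' rows
           -- per user, so every row of a run shares the same number.
           sum(case when action = 'start' then 1 else 0 end)
               over (partition by user_id
                     order by datetime,
                              case when action = 'start' then 0
                                   when action = 'record' then 1
                                   else 2 end,
                              qty) as run_no
    from the_table t
) runs
where action = 'record'
order by user_id, run_no, qty desc, datetime;
```

Unlike the LAG() version, this sketch does not depend on qty increasing monotonically within a run, since it sorts each run by qty rather than taking the last row before 'end'. On the sample data it should return the same three rows listed in the question.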

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/45181216/
