I have the sample data below and want to get the desired output. Any ideas would be appreciated.
For rows 3 and 4 of the output, I want prev_diff_value to be 2015-01-01 00:00:00 instead of 2015-01-02 00:00:00.
with dat as (
select 1 as id,'20150101 02:02:50'::timestamp as dt union all
select 1,'20150101 03:02:50'::timestamp union all
select 1,'20150101 04:02:50'::timestamp union all
select 1,'20150102 02:02:50'::timestamp union all
select 1,'20150102 02:02:50'::timestamp union all
select 1,'20150102 02:02:51'::timestamp union all
select 1,'20150103 02:02:50'::timestamp union all
select 2,'20150101 02:02:50'::timestamp union all
select 2,'20150101 03:02:50'::timestamp union all
select 2,'20150101 04:02:50'::timestamp union all
select 2,'20150102 02:02:50'::timestamp union all
select 1,'20150104 02:02:50'::timestamp
)-- select * from dat
select id , dt , lag(trunc(dt)) over(partition by id order by dt asc) prev_diff_value
from dat
order by id,dt desc
Output:
id dt prev_diff_value
1 2015-01-04 02:02:50 2015-01-03 00:00:00
1 2015-01-03 02:02:50 2015-01-02 00:00:00
1 2015-01-02 02:02:51 2015-01-02 00:00:00
1 2015-01-02 02:02:50 2015-01-02 00:00:00
1 2015-01-02 02:02:50 2015-01-01 00:00:00
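Best answer
As I understand it, you want the previous distinct date for each timestamp within the id partition. I would apply lag over the unique combinations of id and date, then join the result back to the original dataset, like this: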
with dat as (
select 1 as id,'20150101 02:02:50'::timestamp as dt union all
select 1,'20150101 03:02:50'::timestamp union all
select 1,'20150101 04:02:50'::timestamp union all
select 1,'20150102 02:02:50'::timestamp union all
select 1,'20150102 02:02:50'::timestamp union all
select 1,'20150102 02:02:51'::timestamp union all
select 1,'20150103 02:02:50'::timestamp union all
select 2,'20150101 02:02:50'::timestamp union all
select 2,'20150101 03:02:50'::timestamp union all
select 2,'20150101 04:02:50'::timestamp union all
select 2,'20150102 02:02:50'::timestamp union all
select 1,'20150104 02:02:50'::timestamp
)
,dat_unique_lag as (
select *, lag(date) over(partition by id order by date asc) prev_diff_value
from (
select distinct id,trunc(dt) as date
from dat
)
)
select *
from dat
join dat_unique_lag
using (id)
where trunc(dat.dt)=dat_unique_lag.date
order by id,dt desc;
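The dedupe-then-lag idea in the query above can be checked locally with a minimal sketch. It is not Redshift: it assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (needed for window functions), stores timestamps as text, and uses SQLite's date() in place of Redshift's trunc(). The helper name prev_distinct_days and the column name day are hypothetical, introduced only for this illustration.

```python
import sqlite3

# Sample rows mirroring the question's data (timestamps stored as ISO text).
rows = [
    (1, "2015-01-01 02:02:50"),
    (1, "2015-01-01 03:02:50"),
    (1, "2015-01-01 04:02:50"),
    (1, "2015-01-02 02:02:50"),
    (1, "2015-01-02 02:02:50"),
    (1, "2015-01-02 02:02:51"),
    (1, "2015-01-03 02:02:50"),
    (2, "2015-01-01 02:02:50"),
    (2, "2015-01-01 03:02:50"),
    (2, "2015-01-01 04:02:50"),
    (2, "2015-01-02 02:02:50"),
    (1, "2015-01-04 02:02:50"),
]

QUERY = """
with day_lag as (
    -- one row per (id, day), each carrying the previous distinct day
    select id, day,
           lag(day) over (partition by id order by day) as prev_diff_value
    from (select distinct id, date(dt) as day from dat)
)
select dat.id, dat.dt, day_lag.prev_diff_value
from dat
join day_lag
  on day_lag.id = dat.id
 and day_lag.day = date(dat.dt)   -- join back on the row's own day
order by dat.id, dat.dt desc
"""


def prev_distinct_days(data):
    """Load data into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table dat (id integer, dt text)")
    conn.executemany("insert into dat values (?, ?)", data)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in prev_distinct_days(rows):
        print(row)
```

In this sketch every 2015-01-02 row for id 1 gets 2015-01-01 as prev_diff_value, and rows on the first day of each id get NULL (None in Python), because lag only ever sees one row per day.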
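However, this join-based approach is not very performant. If your data is such that the number of timestamps on the same day is limited, you can instead extend the lag with nested conditional expressions, like this: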
with dat as (
select 1 as id,'20150101 02:02:50'::timestamp as dt union all
select 1,'20150101 03:02:50'::timestamp union all
select 1,'20150101 04:02:50'::timestamp union all
select 1,'20150102 02:02:50'::timestamp union all
select 1,'20150102 02:02:50'::timestamp union all
select 1,'20150102 02:02:51'::timestamp union all
select 1,'20150103 02:02:50'::timestamp union all
select 2,'20150101 02:02:50'::timestamp union all
select 2,'20150101 03:02:50'::timestamp union all
select 2,'20150101 04:02:50'::timestamp union all
select 2,'20150102 02:02:50'::timestamp union all
select 1,'20150104 02:02:50'::timestamp
)
select id, dt,
case
when lag(trunc(dt)) over(partition by id order by dt asc)=trunc(dt)
then case
when lag(trunc(dt),2) over(partition by id order by dt asc)=trunc(dt)
then case
when lag(trunc(dt),3) over(partition by id order by dt asc)=trunc(dt)
then lag(trunc(dt),4) over(partition by id order by dt asc)
else lag(trunc(dt),3) over(partition by id order by dt asc)
end
else lag(trunc(dt),2) over(partition by id order by dt asc)
end
else lag(trunc(dt)) over(partition by id order by dt asc)
end as prev_diff_value
from dat
order by id,dt desc;
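Basically, you look at the previous record; if it does not fit (it falls on the same day), you go back to the record before it, and so on, until you find a suitable record or run out of nesting depth. Here the query looks back as far as the fourth previous record.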
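The bounded look-back, and the way it breaks down when a day holds more rows than the nesting depth, can be illustrated outside the database. The following is a rough Python sketch for a single id, not the Redshift query itself; the function name prev_diff_bounded and the parameter max_depth are hypothetical, and timestamps are assumed to be text in the form YYYY-MM-DD HH:MM:SS.

```python
from datetime import datetime


def prev_diff_bounded(timestamps, max_depth=4):
    """Emulate the nested CASE over lag(day, 1..max_depth) for one id.

    Returns (timestamp, previous_day) pairs in ascending timestamp order.
    If every row within reach is on the same day, the value at the deepest
    offset is returned unchanged, exactly as the innermost THEN branch does.
    """
    ordered = sorted(timestamps)
    days = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date() for ts in ordered]
    result = []
    for i, day in enumerate(days):
        value = None
        for depth in range(1, max_depth + 1):
            j = i - depth
            # lag(day, depth): NULL (None) when it reaches before the first row
            value = days[j] if j >= 0 else None
            if value != day:
                break  # found a different day, or ran off the partition
        result.append((ordered[i], value))
    return result


if __name__ == "__main__":
    # One earlier day followed by six timestamps on the same day.
    crowded = ["2015-01-01 00:00:00"] + [
        f"2015-01-02 0{h}:00:00" for h in range(1, 7)
    ]
    for ts, prev in prev_diff_bounded(crowded, max_depth=4):
        print(ts, prev)
```

With max_depth=4, the fifth and sixth rows of 2015-01-02 get 2015-01-02 back as their "previous" day, because all four rows within reach share their date. This is why the approach is only safe when the number of timestamps per day is known to stay within the nesting depth.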
Regarding "sql - lag function to get the last different value (Redshift)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44645751/