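I have a table of events with a created_at timestamp. I want to split them into groups in which consecutive events are no more than N seconds apart, specifically 130 seconds. Then, for each group, I only need to know the lowest and highest timestamps.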
Here's some sample data (ignore the format of the timestamps; it's a datetime field):
------------------------
| id | created_at      |
------------------------
| 1  | 2013-1-20-08:00 |
| 2  | 2013-1-20-08:01 |
| 3  | 2013-1-20-08:05 |
| 4  | 2013-1-20-08:07 |
| 5  | 2013-1-20-08:09 |
| 6  | 2013-1-20-08:12 |
| 7  | 2013-1-20-08:20 |
------------------------
And what I would like to get as a result is:
-------------------------------------
| started_at      | ended_at        |
-------------------------------------
| 2013-1-20-08:00 | 2013-1-20-08:01 |
| 2013-1-20-08:05 | 2013-1-20-08:09 |
| 2013-1-20-08:12 | 2013-1-20-08:12 |
| 2013-1-20-08:20 | 2013-1-20-08:20 |
-------------------------------------
I've googled and searched every possible way of phrasing that question and experimented for some time, but I can't figure it out. I can already do this in Ruby, I'm just trying to figure out if it's possible to move this to the database level. If you're curious or it's easier to visualize, here's what it looks like in Ruby:
# SortedSet needs `require "set"` on Ruby 2.x; on Ruby 3.0+ it comes from the sorted_set gem.
groups = SortedSet[*events].divide { |a, b| (a.created_at - b.created_at).abs <= 130 }
groups.map do |group|
  { started_at: group.to_a.first.created_at, ended_at: group.to_a.last.created_at }
end
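Does anyone know how to do this in SQL (specifically PostgreSQL)?
Best answer
I think you want to start a new group whenever the difference from the previous row is greater than 130 seconds. You can use lag() and date arithmetic to determine where each group starts, then take a cumulative sum of those start flags to get the group number: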
select Grouping, min(created_at), max(created_at)
from (select t.*, sum(GroupStartFlag) over (order by created_at) as Grouping
      from (select t.*,
                   lag(created_at) over (order by created_at) as prevca,
                   -- 0 when the row is within 130 seconds of the previous row, 1 otherwise;
                   -- the first row has no previous row (lag() is null), so it also gets 1
                   (case when extract(epoch from created_at - lag(created_at) over (order by created_at)) <= 130
                         then 0 else 1
                    end) as GroupStartFlag
            from t
           ) t
     ) t
group by Grouping;
The final step aggregates by the "Grouping" identifier to get the earliest and latest timestamp in each group.
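To make the flag-and-running-sum idea easier to follow, here is a minimal Ruby sketch that traces the same three steps on the sample data from the question. It is only an illustration of the logic, not a replacement for the query: the names GAP_SECONDS, flags, group_ids and rows_by_group are made up for this example, and the timestamps are assumed to be UTC with the seconds set to zero.

```ruby
require "time"

GAP_SECONDS = 130 # threshold from the question

# Sample created_at values from the question, rewritten as ISO 8601 strings
# (assumption: UTC, zero seconds).
timestamps = %w[
  2013-01-20T08:00:00Z 2013-01-20T08:01:00Z 2013-01-20T08:05:00Z
  2013-01-20T08:07:00Z 2013-01-20T08:09:00Z 2013-01-20T08:12:00Z
  2013-01-20T08:20:00Z
].map { |s| Time.iso8601(s) }.sort

# Step 1 (lag + case): flag a row with 1 when it has no predecessor or the gap
# to its predecessor is greater than the threshold, otherwise 0.
flags = timestamps.each_with_index.map do |ts, i|
  starts_group = i.zero? || (ts - timestamps[i - 1]) > GAP_SECONDS
  starts_group ? 1 : 0
end
# flags     => [1, 0, 1, 0, 0, 1, 1]

# Step 2 (sum(...) over (order by ...)): the running total of the flags
# gives every row its group number.
running = 0
group_ids = flags.map { |flag| running += flag }
# group_ids => [1, 1, 2, 2, 2, 3, 4]

# Step 3 (group by + min/max): collapse each group to its first and last timestamp.
rows_by_group = timestamps.zip(group_ids).group_by { |_, gid| gid }
groups = rows_by_group.values.map do |rows|
  times = rows.map(&:first)
  { started_at: times.min, ended_at: times.max }
end

groups.each do |g|
  puts "#{g[:started_at].strftime('%H:%M')} - #{g[:ended_at].strftime('%H:%M')}"
end
```

The four resulting groups (08:00 to 08:01, 08:05 to 08:09, 08:12 alone and 08:20 alone) match the result table in the question. A row starts a new group only when its gap exceeds 130 seconds, so a gap of exactly 130 seconds stays in the current group.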
Source: the Stack Overflow question "How do I group by the difference in a column between rows in SQL?": https://stackoverflow.com/questions/18199770/