My task is to detect when events on three different accounts occur within a 1-hour window.
A solution could look like this: count(distinct account_id) over (order by time_key range between 20 PRECEDING and CURRENT ROW)
and then check that count() >= 3.
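But Oracle does not allow DISTINCT in an analytic function that also has an ORDER BY clause:
ORA-30487: ORDER BY not allowed here
I have the solution below, but it seems overly complicated: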
with t_data as (
    select 1 as account_id, 1000 as time_key from dual union
    select 1 as account_id, 1010 as time_key from dual union
    select 1 as account_id, 1020 as time_key from dual union
    select 1 as account_id, 1030 as time_key from dual union
    select 2 as account_id, 1040 as time_key from dual union
    select 3 as account_id, 1050 as time_key from dual union
    select 3 as account_id, 1060 as time_key from dual union
    select 3 as account_id, 1070 as time_key from dual union
    select 3 as account_id, 1080 as time_key from dual union
    select 3 as account_id, 1090 as time_key from dual
    order by time_key
)
select *
from (
    select account_id,
           time_key,
           max(case when account_id = 1 then 1 else 0 end)
               over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m1,
           max(case when account_id = 2 then 1 else 0 end)
               over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m2,
           max(case when account_id = 3 then 1 else 0 end)
               over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m3
    from t_data
)
where m1 = 1 and m2 = 1 and m3 = 1
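What is a simpler way to count distinct values in a sliding window?
Best answer
It is not obvious to me how to do this with window functions. You can use a correlated subquery: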
select t.*,
       (select count(distinct t2.account_id)
        from t_data t2
        where t2.time_key >= t.time_key - 20 and t2.time_key <= t.time_key
       ) as num_distinct
from t_data t;
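If only the rows whose window reaches three distinct accounts are of interest, the same idea can be expressed as a self-join with a HAVING filter. This is a minimal sketch rather than part of the original answer: the alias num_distinct and the threshold of 3 are assumptions, and the sample rows are copied from the question so the statement runs on its own.

```sql
-- Sample data from the question, inlined so the query is self-contained.
with t_data as (
    select 1 as account_id, 1000 as time_key from dual union all
    select 1, 1010 from dual union all
    select 1, 1020 from dual union all
    select 1, 1030 from dual union all
    select 2, 1040 from dual union all
    select 3, 1050 from dual union all
    select 3, 1060 from dual union all
    select 3, 1070 from dual union all
    select 3, 1080 from dual union all
    select 3, 1090 from dual
)
select t.account_id,
       t.time_key,
       count(distinct t2.account_id) as num_distinct  -- distinct accounts in [time_key - 20, time_key]
from t_data t
join t_data t2
  on t2.time_key between t.time_key - 20 and t.time_key
group by t.account_id, t.time_key
having count(distinct t2.account_id) >= 3             -- assumed threshold: three different accounts
order by t.time_key;
```

On the sample data this returns a single row, account 3 at time_key 1050, whose window [1030, 1050] contains accounts 1, 2 and 3. That is the same row the question's m1/m2/m3 query selects.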
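Another approach, which may perform better, is to treat this as a gaps-and-islands problem. The following version returns, for each time key, the number of distinct accounts that are active at the same time: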
with t as (
      -- one row per account and island: the span during which the account counts as "in the window"
      select account_id, min(time_key) as min_time_key, max(time_key + 20) as max_time_key
      from (select t.*,
                   -- start a new island when the gap to the account's previous event exceeds 20
                   sum(case when time_key - prev_time_key <= 20 then 0 else 1 end)
                       over (partition by account_id order by time_key) as grp
            from (select t.*, lag(time_key) over (partition by account_id order by time_key) as prev_time_key
                  from t_data t
                 ) t
           ) t
      group by account_id, grp  -- grp must be part of the key, otherwise separate islands are merged
     )
select td.account_id, td.time_key, count(distinct t.account_id) as num_distinct
from t_data td join
     t
     on td.time_key between t.min_time_key and t.max_time_key
group by td.account_id, td.time_key;
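Finally, if you are only looking for 3 (or 2) account IDs, and you only care about getting some examples where that maximum is reached, you can do something like this: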
select t.*
from (select t.*,
             min(account_id) over (order by time_key range between 20 preceding and 1 preceding) as min_account_id,
             max(account_id) over (order by time_key range between 20 preceding and 1 preceding) as max_account_id
      from t_data t
     ) t
where min_account_id <> max_account_id and
      account_id <> min_account_id and
      account_id <> max_account_id;
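This takes the minimum and maximum account IDs over the rows whose time_key lies within the 20 units before the current row, not including the current row (the window is a RANGE over time_key values, not a count of rows). If those two values differ from each other and from the current account ID, you have three distinct values. Because the current row is only compared against the minimum and maximum, this finds some qualifying rows rather than all of them.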
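To see why the filter works, it can help to look at the intermediate columns before filtering. The query below is an illustrative sketch, not part of the original answer. It inlines the question's sample data and lists min_account_id and max_account_id for every row.

```sql
-- Sample data from the question, inlined so the query is self-contained.
with t_data as (
    select 1 as account_id, 1000 as time_key from dual union all
    select 1, 1010 from dual union all
    select 1, 1020 from dual union all
    select 1, 1030 from dual union all
    select 2, 1040 from dual union all
    select 3, 1050 from dual union all
    select 3, 1060 from dual union all
    select 3, 1070 from dual union all
    select 3, 1080 from dual union all
    select 3, 1090 from dual
)
select t.account_id,
       t.time_key,
       -- smallest and largest account_id among rows with time_key in [time_key - 20, time_key - 1]
       min(account_id) over (order by time_key range between 20 preceding and 1 preceding) as min_account_id,
       max(account_id) over (order by time_key range between 20 preceding and 1 preceding) as max_account_id
from t_data t
order by t.time_key;
```

On the sample data only the row for account 3 at time_key 1050 passes the filter: its preceding window holds accounts 1 and 2, so the minimum, the maximum and the current account are all different. At time_key 1060 the window holds accounts 2 and 3, and the current account equals the maximum, so that row is rejected. The first row has an empty preceding window, so both columns are NULL and the comparisons exclude it.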
Regarding sql - how to determine the number of distinct events in a sliding window with Oracle, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51886674/