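I have a table with stop_id, sched_time, and act_time, and I want to fill in the blanks in the actual times using linear interpolation based on the scheduled times (so that the relative timing between stops is preserved). So I want to go from something like this: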
stop_id | sched_time | act_time | actual
------------------------------------------------
001 | 13:47:00 | 13:45:00 | TRUE
002 | 13:50:00 | null | FALSE
003 | 13:52:00 | 13:53:00 | TRUE
004 | 13:59:00 | null | FALSE
005 | 14:01:00 | null | FALSE
006 | 14:04:00 | 14:04:00 | TRUE
to something like this:
stop_id | sched_time | act_time
-------------------------------------
001 | 13:47:00 | 13:45:00
002 | 13:50:00 | 13:49:48
003 | 13:52:00 | 13:53:00
004 | 13:59:00 | 13:59:25
005 | 14:01:00 | 14:01:15
006 | 14:04:00 | 14:04:00
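If having the interpolation respect the original time between stops is too much to ask, a simple linear interpolation on the act_time column would be a good starting point, since the time difference between stops doesn't vary much.
Thanks in advance!
Note: the first act_time can be earlier than the first sched_time (as in the sample above), and there may be multiple consecutive rows with no actual time.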
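As a check on the target values, they are consistent with placing each missing time between the nearest known actual times, weighted by the stop's position in the schedule. Below is a minimal Python sketch of that formula; the function name `interpolate_actual`, the helper `t`, and the fixed date are illustrative assumptions and not part of the original question.

```python
from datetime import datetime

def interpolate_actual(sched, prev_sched, prev_act, next_sched, next_act):
    """Estimate an actual time from the surrounding known stops.

    The fraction of scheduled time elapsed between the two known stops
    is applied to the actual time elapsed between them.
    """
    fraction = (sched - prev_sched) / (next_sched - prev_sched)
    return prev_act + fraction * (next_act - prev_act)

def t(hms):
    # Hypothetical helper: parse HH:MM:SS onto an arbitrary fixed date.
    return datetime.strptime("2014-07-24 " + hms, "%Y-%m-%d %H:%M:%S")

# Stop 002 lies between stops 001 and 003 in the sample data.
estimate = interpolate_actual(
    t("13:50:00"),                  # scheduled time of stop 002
    t("13:47:00"), t("13:45:00"),   # stop 001: scheduled, actual
    t("13:52:00"), t("13:53:00"),   # stop 003: scheduled, actual
)
print(estimate.strftime("%H:%M:%S"))  # 13:49:48
```

Stop 002 is 3 of the 5 scheduled minutes between stops 001 and 003, so it receives 3/5 of the 8 actual minutes between them: 13:45:00 plus 4 min 48 s.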
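Best answer
This is something of a "third-best" solution: once you have an actual time, it tracks how far ahead of or behind schedule you are and applies that offset to the following scheduled times that have no actual time: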
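with q1 as (
  select
    t.stop_id, sched_time, act_time,
    -- offset from schedule (scheduled minus actual), only where an actual time exists
    nvl2(act_time, t.sched_time - t.act_time, null) ahead,
    -- running count of actual times: each actual time starts a new group
    -- that also contains the following rows without one
    sum (nvl2(act_time, 1, 0)) over
      (partition by 1 order by stop_id) as actual_count
  from schedule t
)
select
  stop_id, sched_time,
  act_time,
  -- for rows with no actual time, apply the offset of their group
  nvl (act_time, sched_time - min (ahead) over
    (partition by actual_count)) as act_time2
from q1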
The results don't exactly match what you're after, but it may be something you can build on:
STOP_ID | SCHED_TIME | ACT_TIME | ACT_TIME2
-------------------------------------------
001     | 13:47      | 13:45    | 13:45
002     | 13:50      |          | 13:48
003     | 13:52      | 13:53    | 13:53
004     | 13:59      |          | 14:00
005     | 14:01      |          | 14:02
006     | 14:04      | 14:04    | 14:04
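The same carry-forward idea can be sketched outside the database. The Python below is only an illustration of the technique described above, not a translation of the query; the function name `carry_offset`, the helper `t`, and the fixed date are assumptions made for the example.

```python
from datetime import datetime

def carry_offset(rows):
    """Fill missing actual times using the last known schedule offset.

    rows: list of (stop_id, sched_time, act_time) tuples ordered by stop,
    where act_time is None when no actual time was recorded.
    """
    filled = []
    offset = None  # actual minus scheduled at the most recent known stop
    for stop_id, sched, act in rows:
        if act is not None:
            offset = act - sched
        elif offset is not None:
            act = sched + offset
        # Rows before the first known actual time stay None.
        filled.append((stop_id, sched, act))
    return filled

def t(hm):
    # Hypothetical helper: parse HH:MM onto an arbitrary fixed date.
    return datetime.strptime("2014-07-24 " + hm, "%Y-%m-%d %H:%M")

rows = [
    ("001", t("13:47"), t("13:45")),
    ("002", t("13:50"), None),
    ("003", t("13:52"), t("13:53")),
    ("004", t("13:59"), None),
    ("005", t("14:01"), None),
    ("006", t("14:04"), t("14:04")),
]
result = carry_offset(rows)
for stop_id, sched, act in result:
    print(stop_id, sched.strftime("%H:%M"), act.strftime("%H:%M") if act else "")
```

On the sample rows this should give the same values as the ACT_TIME2 column: the two-minute lead at stop 001 is carried to stop 002, and the one-minute delay at stop 003 is carried to stops 004 and 005.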
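-- Edit 7/24/14 --
Assuming your times have been converted to integers as you suggested (30 s = 1), I gave it a try. It's a horrible solution, but I think it does what you proposed. I'm not sure whether it's faster than your procedural loop; I'm curious whether it is. Oracle's analytic functions are great, and as you can see I leaned on them heavily to do what I think you described: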
with q1 as (
  select
    t.stop_id, t.sched_time, t.act_time,
    -- running count of actual times: each actual time starts a new group
    sum (nvl2(act_time, 1, 0)) over
      (partition by 1 order by stop_id) as group_id,
    lead (sched_time) over (order by stop_id) as next_sched
  from schedule2 t
), q2 as (
  select
    stop_id, sched_time, act_time, group_id, next_sched,
    -- scheduled time from this stop to the next one
    next_sched - sched_time as elapsed,
    row_number() over (partition by group_id order by stop_id) as stops,
    -- actual and scheduled time of the stop that opens the group
    min (act_time) over (partition by group_id) as min_time,
    min (sched_time) over (partition by group_id) as min_sched
  from q1
), q3 as (
  select
    stop_id, sched_time, act_time, group_id, stops, min_time,
    min_sched, next_sched,
    -- running scheduled time within the group, through the next stop
    sum (elapsed) over (partition by group_id order by stop_id) as elapsed,
    max (stops) over (partition by group_id) as grp_stops,
    -- on the last row of a group these pick up the next group's values
    lead (min_time, 1) over (order by stop_id) as next_grp_actual,
    lead (min_sched, 1) over (order by stop_id) as next_grp_sched
  from q2
), q4 as (
  select
    stop_id, sched_time, act_time, stops, grp_stops,
    -- scheduled time elapsed since the stop that opens the group
    min_time, lag (elapsed, 1, 0) over
      (partition by group_id order by stop_id) as elapsed,
    max (next_grp_sched) over (partition by group_id) - min_sched
      as time_btw_sched,
    max (next_grp_actual) over (partition by group_id) - min_time
      as time_btw_actuals
  from q3
)
select
  stop_id, sched_time, act_time,
  -- scale the actual span by the fraction of the scheduled span elapsed
  nvl (act_time, min_time + (elapsed / time_btw_sched) *
    time_btw_actuals) as act_time2
from q4
Here are the results I got from your sample:
id  | sched | actual | actual (calc)
------------------------------------------
001 | 1654  | 1650   | 1650
002 | 1660  |        | 1659.6
003 | 1664  | 1666   | 1666
004 | 1678  |        | 1678.83333333333
005 | 1682  |        | 1682.5
006 | 1688  | 1688   | 1688
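I think this could be done more cleanly (and more efficiently) in a programming-language wrapper. I'm only proficient in C# and Perl, but either one could handle it well.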
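As a rough idea of what such a wrapper might look like, here is a minimal sketch in Python (chosen only for illustration, rather than C# or Perl). It walks each pair of neighbouring known actual times and fills any run of gaps between them, which also covers several consecutive missing rows. The function name `interpolate_gaps` is an assumption, and the sketch assumes the scheduled times are strictly increasing and already loaded into lists in stop order.

```python
def interpolate_gaps(sched, actual):
    """Linearly interpolate missing actual times against the schedule.

    sched:  scheduled times as numbers (e.g. 30-second units).
    actual: actual times in the same units, with None for gaps.
    Gaps before the first or after the last known value are left as None.
    Assumes sched is strictly increasing (otherwise a span could be zero).
    """
    result = list(actual)
    # Indexes of the rows that have a recorded actual time.
    known = [i for i, a in enumerate(actual) if a is not None]
    # Each consecutive pair of known rows brackets a (possibly empty) run of gaps.
    for lo, hi in zip(known, known[1:]):
        sched_span = sched[hi] - sched[lo]
        actual_span = actual[hi] - actual[lo]
        for i in range(lo + 1, hi):
            fraction = (sched[i] - sched[lo]) / sched_span
            result[i] = actual[lo] + fraction * actual_span
    return result

# Sample data in 30-second units, as in the table above.
sched = [1654, 1660, 1664, 1678, 1682, 1688]
actual = [1650, None, 1666, None, None, 1688]
filled = interpolate_gaps(sched, actual)
print(filled)
```

On the sample data the filled values should agree with the "actual (calc)" column, up to floating-point rounding. A single pass like this avoids the stacked window functions, at the cost of moving the rows out of the database.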
About "sql - Interpolating missing values in SQL Developer/Oracle 10g": a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24913264/