python - Automatically fill in missing rows in a database based on missing time ranges

I have a table in a PostgreSQL database that stores a datetime along with some integers, like this:

      dt                total                                                   
--------------------------------                                        
2019-07-01 10:00:00     150                                      
2019-07-01 10:15:00     153                                      
2019-07-01 10:30:00     155                                      
2019-07-01 10:45:00     160                                      
2019-07-01 11:00:00     161                                   
....

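As you can see, the datetimes in the dt column are consecutive at 15-minute intervals. My problem is that the incoming data is sometimes missing rows.

For example: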

     dt                total                                                   
--------------------------------                                        
2019-07-01 10:00:00     150                                      
2019-07-01 10:15:00     153                                      
2019-07-01 10:30:00     155                                      
2019-07-01 10:45:00     160                                      
2019-07-01 11:00:00     161
2019-07-01 11:15:00     163
2019-07-01 12:00:00     170

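In this example there are n=2 missing rows, namely the rows for 11:30 and 11:45. What I want to do is automatically fill in the datetimes for these rows, and set the total of each missing row by interpolating evenly between the total of the last row before the gap (11:15) and the total of the first row after the gap (12:00).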

For this example, the total increases by (170-163)/(n+1) = 7/3 = 2.333 for each missing row (using 3 decimal places here), so the result would look like this:
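
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the step calculation; the function name fill_gap and its parameters are made up for this example.

```python
def fill_gap(prev_total, next_total, n_missing):
    """Return n_missing evenly spaced values between prev_total and next_total."""
    # Each missing row adds the same step to the previous value.
    step = (next_total - prev_total) / (n_missing + 1)
    return [prev_total + step * k for k in range(1, n_missing + 1)]


# Gap between 11:15 (163) and 12:00 (170) with two missing rows.
values = fill_gap(163, 170, 2)
print([round(v, 3) for v in values])
```

Note that rounding the second value to 3 decimals gives 167.667, while the table below shows it truncated to 167.666.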

     dt                total                                                   
--------------------------------                                        
2019-07-01 10:00:00     150                                      
2019-07-01 10:15:00     153                                      
2019-07-01 10:30:00     155                                      
2019-07-01 10:45:00     160                                      
2019-07-01 11:00:00     161
2019-07-01 11:15:00     163
2019-07-01 11:30:00     165.333
2019-07-01 11:45:00     167.666
2019-07-01 12:00:00     170

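I don't think this can be done directly in SQL, so I thought Python might help here. Any ideas?

Best answer

You can use generate_series() and some arithmetic. The following assumes that total is increasing (as in the sample data), because it uses a running max() to find the previous known value and a running min() from the other direction to find the next one: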

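-- Build the full 15-minute grid, left join the real rows, then interpolate the gaps.
select d.dt, seqnum,
       coalesce(t.total,
                -- previous known total (running max; valid because total is increasing)
                (max(t.total) over (order by d.dt asc) +
                 -- next known total minus previous known total;
                 -- the cast to numeric avoids integer division when total is an integer
                 (min(t.total) over (order by d.dt desc) -
                  max(t.total) over (order by d.dt asc)
                 )::numeric *
                 -- ratio: rows since the previous known row / rows between the known rows
                 (seqnum - max(seqnum) filter (where t.total is not null) over (order by d.dt asc)) /
                  nullif(min(seqnum) filter (where t.total is not null) over (order by d.dt desc) -
                         max(seqnum) filter (where t.total is not null) over (order by d.dt asc),
                         0
                        )
                 )
                ) as total
from (select dt, count(*) over (order by dt) as seqnum
      from (select generate_series(min(dt), max(dt), interval '15 minute') as dt
            from t
            ) d
     ) d left join
     t
     on t.dt = d.dt;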

Here is a db<>fiddle.
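
To make the generate_series() and left join step more concrete, here is a rough Python sketch of the same idea: build the full 15-minute grid between the first and last timestamp, then look up each grid point in the stored rows. The helper names expected_grid and left_join_grid are hypothetical, and the rows are a shortened version of the sample data.

```python
from datetime import datetime, timedelta


def expected_grid(start, end, step=timedelta(minutes=15)):
    """Yield every timestamp from start to end inclusive, like generate_series()."""
    current = start
    while current <= end:
        yield current
        current += step


def left_join_grid(rows, step=timedelta(minutes=15)):
    """Pair each grid timestamp with its stored total, or None if the row is missing."""
    totals = dict(rows)
    start, end = min(totals), max(totals)
    return [(dt, totals.get(dt)) for dt in expected_grid(start, end, step)]


rows = [
    (datetime(2019, 7, 1, 11, 0), 161),
    (datetime(2019, 7, 1, 11, 15), 163),
    (datetime(2019, 7, 1, 12, 0), 170),
]
joined = left_join_grid(rows)
# Timestamps with no stored row are the ones that need interpolated totals.
missing = [dt for dt, total in joined if total is None]
print(missing)
```

The entries with None play the role of the rows where t.total is null in the query, which is what coalesce() then replaces.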

The calculation is cumbersome because you need a weighted average to get the intermediate values. The formula is:

prev_value + (next_value - prev_value) * ratio

where the ratio is:

(current_time - prev_time) / (next_time - prev_time)

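However, the query uses the sequential row count (seqnum) rather than time, which gives the same ratio because the generated rows are evenly spaced.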
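
Since the question asked about Python, the same formula can also be applied there. The following is a minimal sketch, not the answer's method: it assumes the rows have already been fetched as (datetime, total) tuples sorted by time, that every timestamp lies on the 15-minute grid, and it computes the ratio from time rather than from a sequence count. The function name fill_missing is hypothetical.

```python
from datetime import datetime, timedelta


def fill_missing(rows, step=timedelta(minutes=15)):
    """Return rows with gaps filled by linear interpolation on time.

    rows: list of (datetime, total) tuples sorted by datetime.
    """
    filled = []
    for (prev_dt, prev_val), (next_dt, next_val) in zip(rows, rows[1:]):
        filled.append((prev_dt, prev_val))
        current = prev_dt + step
        while current < next_dt:
            # ratio = (current_time - prev_time) / (next_time - prev_time)
            ratio = (current - prev_dt) / (next_dt - prev_dt)
            filled.append((current, prev_val + (next_val - prev_val) * ratio))
            current += step
    if rows:
        # The last row is never the "previous" row of a pair, so add it here.
        filled.append(rows[-1])
    return filled


rows = [
    (datetime(2019, 7, 1, 11, 15), 163),
    (datetime(2019, 7, 1, 12, 0), 170),
]
result = fill_missing(rows)
for dt, total in result:
    print(dt, round(total, 3))
```

Unlike the SQL query, this version looks at each adjacent pair of known rows directly, so it does not rely on total being increasing. The filled rows would still need to be written back to the table, which is left out here.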

This page on "python - Automatically fill in missing rows in a database based on missing time ranges" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57282042/
