sql - Gap-filling OHLCV (open, high, low, close, volume) data in TimescaleDB

Tags: sql postgresql time-series timescaledb ohlc

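I have some OHLCV data stored in TimescaleDB, with data missing in certain time ranges. This data needs to be resampled to a different time period (i.e. 1 day) and must contain contiguous, ordered time buckets.

TimescaleDB provides the function time_bucket_gapfill to do this. My current query is: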

SELECT 
    time_bucket_gapfill(
        '1 day', 
        "timestamp",
        '2017-07-25 00:00', 
        '2018-01-01 00:00'
    ) as date,
    FIRST(open, "timestamp") as open,
    MAX(high) as high,
    MIN(low) as low,
    LAST(close, "timestamp") as close,
    SUM(volume) as volume
FROM ohlcv
WHERE "timestamp" > '2017-07-25'
GROUP BY date ORDER BY date ASC LIMIT 10

Result:
date                    open        high        low         close       volume
2017-07-25 00:00:00+00                  
2017-07-26 00:00:00+00                  
2017-07-27 00:00:00+00  0.00992     0.010184    0.009679    0.010039    65553.5299999999
2017-07-28 00:00:00+00  0.00999     0.010059    0.009225    0.009248    43049.93
2017-07-29 00:00:00+00  
2017-07-30 00:00:00+00  0.009518    0.0098      0.009286    0.009457    40510.0599999999

...

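Problem: it looks like only the date column has been gap-filled. By modifying the SQL statement, is it possible to also gap-fill the open, high, low, close, and volume columns, so that we obtain this result: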
date                    open        high        low         close       volume
2017-07-25 00:00:00+00  0           0           0           0           0               
2017-07-26 00:00:00+00  0           0           0           0           0               
2017-07-27 00:00:00+00  0.00992     0.010184    0.009679    0.010039    65553.5299999999
2017-07-28 00:00:00+00  0.00999     0.010059    0.009225    0.009248    43049.93
2017-07-29 00:00:00+00  0.009248    0.009248    0.009248    0.009248    0   
2017-07-30 00:00:00+00  0.009518    0.0098      0.009286    0.009457    40510.0599999999

...

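Or is it recommended to perform this data imputation after receiving the query results, for example in Python/Nodejs?

I would prefer to do this gap filling/imputation in TimescaleDB rather than in my Nodejs app, because doing it in Nodejs would be much slower, and I do not want to introduce Python into the app just for this processing.

Example of how this can be done with Python/pandas: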

import pandas as pd

# Building the test dataset simulating missing values after time_bucket
data = [
    (pd.Timestamp('2020-01-01'), None, None, None, None, None),
    (pd.Timestamp('2020-01-02'), 100, 110, 90, 95, 3),
    (pd.Timestamp('2020-01-03'), None, None, None, None, None),
    (pd.Timestamp('2020-01-04'), 98, 150, 100, 100, 4),
]
df = pd.DataFrame(data, columns=['date', 'open' , 'high', 'low', 'close', 'volume']).set_index('date')

#              open   high    low  close  volume
# date                                          
# 2020-01-01    NaN    NaN    NaN    NaN     NaN
# 2020-01-02  100.0  110.0   90.0   95.0     3.0
# 2020-01-03    NaN    NaN    NaN    NaN     NaN
# 2020-01-04   98.0  150.0  100.0  100.0     4.0


# Perform gap filling
df['close'] = df['close'].ffill()               # forward-fill close; fillna(method='ffill') is deprecated in recent pandas
df['volume'] = df['volume'].fillna(0)           # fill missing volume with 0
df['open'] = df['open'].fillna(df['close'])     # fill missing open with the forward-filled close
df['high'] = df['high'].fillna(df['close'])     # fill missing high with the forward-filled close
df['low'] = df['low'].fillna(df['close'])       # fill missing low with the forward-filled close
df = df.fillna(0)                               # fill remaining OHLC gaps with 0 where no previous value exists

#               open   high    low  close  volume
# date                                          
# 2020-01-01    0.0    0.0    0.0    0.0     0.0
# 2020-01-02  100.0  110.0   90.0   95.0     3.0
# 2020-01-03   95.0   95.0   95.0   95.0     0.0
# 2020-01-04   98.0  150.0  100.0  100.0     4.0

Best answer

SELECT "tickerId",
       "ts",
       -- empty buckets take the carried-forward close as open/high/low
       coalesce("open", "close")  "open",
       coalesce("high", "close")  "high",
       coalesce("low", "close")   "low",
       "close",
       -- empty buckets have no trades, so volume and count become 0
       coalesce("volume", 0)      "volume",
       coalesce("count", 0)       "count"

FROM (
     SELECT "tickerId",
            time_bucket_gapfill('1 hour', at)   "ts",
            first(price, "eId")                 "open",
            MAX(price)                          "high",
            MIN(price)                          "low",
            -- locf = last observation carried forward
            locf(last(price, "eId"))            "close",
            SUM(volume)                         "volume",
            COUNT(1)                            "count"
     FROM "PublicTrades"
     WHERE at >= date_trunc('day', now() - INTERVAL '1 year')
       AND at < NOW()
     GROUP BY "tickerId", "ts"
     ORDER BY "tickerId", "ts" DESC
     LIMIT 100
) AS P
Note: eId is the exchange's public trade ID.
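
To make the fill rules in this query easier to check, here is a minimal pure-Python sketch of what the locf and coalesce steps do to rows that have already been bucketed. It is only an illustration, not TimescaleDB's implementation: the names fill_ohlcv, coalesce, and leading_default are hypothetical, and the sample rows are copied from the result table in the question. One thing it makes visible: buckets that come before the first observation have no close to carry forward, so their price columns stay empty (NULL in SQL) unless an extra fallback is supplied. Getting the leading zeros shown in the question's desired output would therefore need one more fallback on the price columns, which the leading_default parameter stands in for here.

```python
from datetime import date


def coalesce(value, fallback):
    """Return value unless it is None, like a two-argument SQL COALESCE."""
    return fallback if value is None else value


def fill_ohlcv(rows, leading_default=None):
    """Fill empty buckets in rows that are already ordered by date.

    leading_default is a hypothetical knob for buckets that precede the
    first observation; None mimics a carry-forward with nothing to carry.
    """
    filled = []
    last_close = leading_default
    for row in rows:
        # Carry the last observed close forward into empty buckets.
        close = coalesce(row["close"], last_close)
        last_close = close
        filled.append({
            "date": row["date"],
            # Empty buckets reuse the carried-forward close for O/H/L.
            "open": coalesce(row["open"], close),
            "high": coalesce(row["high"], close),
            "low": coalesce(row["low"], close),
            "close": close,
            # No trades in an empty bucket means zero volume.
            "volume": coalesce(row["volume"], 0),
        })
    return filled


def bucket(day, values=None):
    """Build one daily row; values=None marks an empty bucket."""
    o, h, l, c, v = values if values is not None else (None,) * 5
    return {"date": date(2017, 7, day), "open": o, "high": h,
            "low": l, "close": c, "volume": v}


# Sample rows taken from the result table in the question.
rows = [
    bucket(25),
    bucket(26),
    bucket(27, (0.00992, 0.010184, 0.009679, 0.010039, 65553.5299999999)),
    bucket(28, (0.00999, 0.010059, 0.009225, 0.009248, 43049.93)),
    bucket(29),
    bucket(30, (0.009518, 0.0098, 0.009286, 0.009457, 40510.0599999999)),
]

for row in fill_ohlcv(rows, leading_default=0):
    print(row)
```

With leading_default=0 the six rows match the desired table in the question; with the default of None, the first two rows keep empty prices and only their volume becomes 0.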

Regarding sql - Gap-filling OHLCV (open, high, low, close, volume) data in TimescaleDB, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60254902/
