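I have some OHLCV data stored in TimescaleDB, with data missing for certain time ranges. This data needs to be resampled to a different time period (i.e. 1 day) and must consist of contiguous, ordered time buckets.
TimescaleDB provides the function time_bucket_gapfill to do this. My current query is: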
SELECT
time_bucket_gapfill(
'1 day',
"timestamp",
'2017-07-25 00:00',
'2018-01-01 00:00'
) as date,
FIRST(open, "timestamp") as open,
MAX(high) as high,
MIN(low) as low,
LAST(close, "timestamp") as close,
SUM(volume) as volume
FROM ohlcv
WHERE "timestamp" > '2017-07-25'
GROUP BY date ORDER BY date ASC LIMIT 10
Result:
date open high low close volume
2017-07-25 00:00:00+00
2017-07-26 00:00:00+00
2017-07-27 00:00:00+00 0.00992 0.010184 0.009679 0.010039 65553.5299999999
2017-07-28 00:00:00+00 0.00999 0.010059 0.009225 0.009248 43049.93
2017-07-29 00:00:00+00
2017-07-30 00:00:00+00 0.009518 0.0098 0.009286 0.009457 40510.0599999999
...
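For intuition about the result above, here is a rough pure-Python sketch of what gap filling does conceptually. It is not TimescaleDB's implementation, and the helper name gapfill_buckets is made up for illustration. It generates every daily bucket in the range and attaches a value only where source rows exist, which is why the remaining columns come back empty. The close values are taken from the result above.

```python
from datetime import date, timedelta

def gapfill_buckets(rows, start, finish):
    """Return one (day, value) pair per day in [start, finish).

    rows maps a day to its aggregated value; days without data get None,
    mirroring the NULLs seen in the gap-filled buckets.
    """
    out = []
    day = start
    while day < finish:
        out.append((day, rows.get(day)))  # None when the bucket has no source rows
        day += timedelta(days=1)
    return out

# Close prices from the query result; the other days have no data.
closes = {
    date(2017, 7, 27): 0.010039,
    date(2017, 7, 28): 0.009248,
    date(2017, 7, 30): 0.009457,
}
buckets = gapfill_buckets(closes, date(2017, 7, 25), date(2017, 7, 31))
for day, close in buckets:
    print(day, close)
```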
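Problem: It appears that only the date column has been gap-filled. By modifying the SQL statement, is it possible to also fill the open, high, low, close and volume columns, so that we get the following result:
date open high low close volume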
2017-07-25 00:00:00+00 0 0 0 0 0
2017-07-26 00:00:00+00 0 0 0 0 0
2017-07-27 00:00:00+00 0.00992 0.010184 0.009679 0.010039 65553.5299999999
2017-07-28 00:00:00+00 0.00999 0.010059 0.009225 0.009248 43049.93
2017-07-29 00:00:00+00 0.009248 0.009248 0.009248 0.009248 0
2017-07-30 00:00:00+00 0.009518 0.0098 0.009286 0.009457 40510.0599999999
...
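Or is it advisable to perform this data imputation after receiving the query results, e.g. in Python/Node.js?
I would prefer to do this gap filling/imputation in TimescaleDB rather than in my Node.js application, because doing it in Node.js would be much slower, and I do not want to introduce Python into the application just for this processing.
Example of how it is done with Python/pandas: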
import pandas as pd
# Building the test dataset simulating missing values after time_bucket
data = [
(pd.Timestamp('2020-01-01'), None, None, None, None, None),
(pd.Timestamp('2020-01-02'), 100, 110, 90, 95, 3),
(pd.Timestamp('2020-01-03'), None, None, None, None, None),
(pd.Timestamp('2020-01-04'), 98, 150, 100, 100, 4),
]
df = pd.DataFrame(data, columns=['date', 'open' , 'high', 'low', 'close', 'volume']).set_index('date')
# open high low close volume
# date
# 2020-01-01 NaN NaN NaN NaN NaN
# 2020-01-02 100.0 110.0 90.0 95.0 3.0
# 2020-01-03 NaN NaN NaN NaN NaN
# 2020-01-04 98.0 150.0 100.0 100.0 4.0
# Perform gap filling
df['close'] = df['close'].ffill()  # carry the last known close forward (fillna(method='ffill') is deprecated in pandas 2.x)
df.volume = df.volume.fillna(0) # fill missing volume with 0
df['open'] = df['open'].fillna(df['close']) # fill missing open by forward-filling close
df['high'] = df['high'].fillna(df['close']) # fill missing high by forward-filling close
df['low'] = df['low'].fillna(df['close']) # fill missing low by forward-filling close
df = df.fillna(0) # fill missing OHLC with 0 if no previous values available
# open high low close volume
# date
# 2020-01-01 0.0 0.0 0.0 0.0 0.0
# 2020-01-02 100.0 110.0 90.0 95.0 3.0
# 2020-01-03 95.0 95.0 95.0 95.0 0.0
# 2020-01-04 98.0 150.0 100.0 100.0 4.0
Best answer
SELECT "tickerId",
"ts",
coalesce("open", "close") "open",
coalesce("high", "close") "high",
coalesce("low", "close") "low",
coalesce("close", "close") "close",
coalesce("volume", 0) "volume",
coalesce("count", 0) "count"
FROM (
SELECT "tickerId",
time_bucket_gapfill('1 hour', at) "ts",
first(price, "eId") "open",
MAX(price) "high",
MIN(price) "low",
locf(last(price, "eId")) "close",
SUM(volume) "volume",
COUNT(1) "count"
FROM "PublicTrades"
WHERE at >= date_trunc('day', now() - INTERVAL '1 year')
AND at < NOW()
GROUP BY "tickerId", "ts"
ORDER BY "tickerId", "ts" DESC
LIMIT 100
) AS P
Note: eId is the exchange's public trade ID.
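To make the logic of the query above easier to follow, here is a small pure-Python sketch of the same two steps. First, the last observed close is carried forward, which is what locf does for the close column. Then open, high and low fall back to that close and volume falls back to 0, which is what the coalesce calls do. The function name fill_ohlcv and the tuple layout are assumptions for illustration; the sample rows are the ones from the pandas example in the question.

```python
def fill_ohlcv(rows):
    """Fill gaps in time-ordered (day, open, high, low, close, volume) rows.

    Gaps are represented by None, like NULL in the gap-filled SQL buckets.
    """
    filled = []
    last_close = None  # last observed close, carried forward like locf()
    for day, o, h, l, c, v in rows:
        if c is not None:
            last_close = c
        c = last_close
        filled.append((
            day,
            o if o is not None else c,  # coalesce(open, close)
            h if h is not None else c,  # coalesce(high, close)
            l if l is not None else c,  # coalesce(low, close)
            c,
            v if v is not None else 0,  # coalesce(volume, 0)
        ))
    return filled

rows = [
    ("2020-01-01", None, None, None, None, None),
    ("2020-01-02", 100, 110, 90, 95, 3),
    ("2020-01-03", None, None, None, None, None),
    ("2020-01-04", 98, 150, 100, 100, 4),
]
result = fill_ohlcv(rows)
for row in result:
    print(row)
```

Buckets before the first observation have nothing to carry forward, so their prices stay None here, just as they stay NULL in the query. If zeros are wanted for those leading buckets, as in the question, a final fallback can be added in SQL, for example coalesce("open", "close", 0).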
Regarding sql - Gap filling OHLCV (open, high, low, close, volume) in TimescaleDB, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60254902/