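Description
System: Postgres 13 + TimescaleDB
I have a time series of messages containing error codes that devices generate at 300-second intervals. The series should be aggregated so that consecutive error codes from the same device (spanning several successive intervals) are grouped and their intervals summed.
Source format
Target format
Progress
I tried using [LAG()/LEAD()](https://www.postgresql.org/docs/13/functions-window.html) with PARTITION BY (code, device), but I could not get it to work with a condition that aggregates only consecutive rows: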
SELECT ts,
       device,
       code,
       LEAD(ts) OVER (PARTITION BY device, code ORDER BY ts) AS next_ts
FROM source_format
DB Fiddle
Schema (PostgreSQL v13)
CREATE TABLE timeseries (
    ts timestamptz,
    code bigint,
    device varchar
);
INSERT INTO timeseries VALUES ('2023-03-01 12:00:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:05:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:10:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:10:00', 2, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:25:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:30:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:00:00', 1, 'B');
INSERT INTO timeseries VALUES ('2023-03-01 12:20:00', 1, 'B');
INSERT INTO timeseries VALUES ('2023-03-01 12:20:00', 3, 'B');
INSERT INTO timeseries VALUES ('2023-03-01 12:25:00', 3, 'B');
Query #1 (time difference to the next message for the same device and code)
SELECT ts,
       device,
       code,
       LEAD(ts) OVER (PARTITION BY device, code ORDER BY ts) - ts AS diff_to_next_ts
FROM timeseries;
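Expected result
How can I include the condition and then "merge" the start and end of consecutive code messages into a single row with an interval? Is there a more suitable method to use? Would a pgSQL function be more appropriate?
Best answer
The following query produces the specified result (the time zone may differ, because the inserted times carry no time zone while the column is of type timestamptz):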
WITH labeled_ends AS (
    -- Flag each message as the start and/or end of a run of 5-minute steps.
    -- IS NOT TRUE also covers the NULL returned by lag()/lead() at partition edges.
    SELECT
        lag(ts.ts) OVER (PARTITION BY device,
            code ORDER BY ts.ts) = ts.ts - interval '5' minute IS NOT TRUE AS begins_period,
        ts.ts,
        lead(ts.ts) OVER (PARTITION BY device,
            code ORDER BY ts.ts) = ts.ts + interval '5' minute IS NOT TRUE AS ends_period,
        ts.device,
        ts.code
    FROM
        timeseries ts
),
periods AS (
    -- Keep only boundary rows; a start row that is not also an end
    -- takes its period_end from the next boundary row.
    SELECT
        labeled_ends.ts,
        CASE WHEN labeled_ends.ends_period THEN
            labeled_ends.ts
        ELSE
            lead(labeled_ends.ts) OVER (PARTITION BY labeled_ends.device,
                labeled_ends.code ORDER BY labeled_ends.ts)
        END AS period_end,
        labeled_ends.device,
        labeled_ends.code,
        labeled_ends.begins_period
    FROM
        labeled_ends
    WHERE
        labeled_ends.begins_period
        OR labeled_ends.ends_period
)
-- One row per run: emit only the rows that start a period.
SELECT
    tstzrange(periods.ts, periods.period_end, '[]') AS valid_interval,
    periods.device,
    periods.code
FROM
    periods
WHERE
    periods.begins_period
ORDER BY
    periods.device,
    periods.code,
    periods.ts;
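The query's first CTE, labeled_ends, determines whether each message in the time series begins or ends a period during which a device reports a particular code. The second CTE, periods, finds the time of the last message in each continuous series for every device and code. The final SELECT returns the range for each row that begins a continuous series.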
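For comparison, the same grouping can be sketched with the common "gaps and islands" pattern, which numbers each run with a running sum and then aggregates per run. This is an alternative formulation, not the query above. The names flagged, numbered, starts_run, run_id, period_start and duration are made up for illustration, and the duration column assumes every message stands for one full 5-minute step.

```sql
WITH flagged AS (
    SELECT
        ts,
        device,
        code,
        -- 1 when the previous message for this device and code is missing
        -- or is not exactly 5 minutes earlier, i.e. a new run starts here
        CASE WHEN lag(ts) OVER w = ts - interval '5 minutes' THEN 0 ELSE 1 END AS starts_run
    FROM timeseries
    WINDOW w AS (PARTITION BY device, code ORDER BY ts)
),
numbered AS (
    SELECT
        ts,
        device,
        code,
        -- running sum of the start flags gives every run its own number
        sum(starts_run) OVER (PARTITION BY device, code ORDER BY ts) AS run_id
    FROM flagged
)
SELECT
    device,
    code,
    min(ts) AS period_start,
    max(ts) AS period_end,
    -- assumption: each message covers one 5-minute step
    count(*) * interval '5 minutes' AS duration
FROM numbered
GROUP BY device, code, run_id
ORDER BY device, code, period_start;
```

With the sample rows from the question, device A with code 1 should come out as two runs, one from 12:00 to 12:10 and one from 12:25 to 12:30. The aggregate form also makes it easy to add per-run values such as the message count.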
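Consider changing the closed range tstzrange(periods.ts, periods.period_end, '[]') AS valid_interval to the half-open range tstzrange(periods.ts, periods.period_end + interval '5' minute, '[)') AS valid_interval. Doing so makes it easier to use range operators to determine overlap and adjacency, independent of the granularity of the time series. Renaming the column to valid_period might also be more descriptive, since an interval is usually defined as an unanchored length of time.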
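A small sketch of why the half-open form helps, using literal ranges in place of query output. The timestamps and the column aliases closed_adjacent, half_open_adjacent and duration are illustrative only. The operator -|- tests adjacency, and upper() and lower() return the bounds of a range.

```sql
SELECT
    -- Closed ranges: the 12:10 and 12:15 messages are neighbouring steps,
    -- but the two ranges leave a gap, so -|- does not report adjacency.
    tstzrange(timestamptz '2023-03-01 12:00+00', timestamptz '2023-03-01 12:10+00', '[]')
        -|- tstzrange(timestamptz '2023-03-01 12:15+00', timestamptz '2023-03-01 12:25+00', '[]')
        AS closed_adjacent,
    -- Half-open ranges, each end extended by one 5-minute step:
    -- the first range now stops exactly where the second one starts.
    tstzrange(timestamptz '2023-03-01 12:00+00', timestamptz '2023-03-01 12:15+00', '[)')
        -|- tstzrange(timestamptz '2023-03-01 12:15+00', timestamptz '2023-03-01 12:30+00', '[)')
        AS half_open_adjacent,
    -- With half-open ranges the covered time is simply upper minus lower.
    upper(tstzrange(timestamptz '2023-03-01 12:00+00', timestamptz '2023-03-01 12:15+00', '[)'))
        - lower(tstzrange(timestamptz '2023-03-01 12:00+00', timestamptz '2023-03-01 12:15+00', '[)'))
        AS duration;
```

The first column is false and the second is true, and the third is an interval of 15 minutes for the three messages at 12:00, 12:05 and 12:10. The overlap operator && can be used in the same way to compare periods across devices.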
Regarding sql - Combining time series data into data points with start and end intervals based on time difference, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/75920050/