Given the following table:
CREATE TABLE gm_inductionloopdata
(
"id" serial NOT NULL,
"timestamp" timestamp without time zone NOT NULL,
"count" integer NOT NULL DEFAULT 0
);
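I am searching for "rare events". A rare event is a row with the following properties:
- Easy: count = 1
- Hard: all other rows within a 10-minute span (both before and after the current row's timestamp) have count = 0.
Example: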
id timestamp count
0 08:00 0
1 08:11 0
2 08:15 2 <== not rare event (count!=1)
3 08:19 0
4 08:24 0
5 08:25 0
6 08:29 1 <== not rare event (see 8:35)
7 08:31 0
8 08:35 1
9 08:40 0
10 08:46 1 <== rare event!
11 08:48 0
12 08:51 0
13 08:55 0
14 08:58 1 <== rare event!
15 09:02 0
16 09:09 1
Right now I have the following PL/pgSQL function:
SELECT curr.*
FROM gm_inductionloopdata curr
WHERE curr.count = 1
AND (
SELECT SUM(count)
FROM gm_inductionloopdata
WHERE timestamp BETWEEN curr.timestamp - '10 minutes'::INTERVAL
AND curr.timestamp + '10 minutes'::INTERVAL
)<2
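It is far too slow. :-(
Any suggestions on how to improve the performance? I am dealing with more than 1 million rows here and may need to find these "rare events" regularly.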
Best answer
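I think this is a good case for the lead and lag window functions. This query keeps only the rows with a non-zero count, then looks at the previous and next of those rows to check whether either one lies within 10 minutes of the current row: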
with cte as (
select
"id", "timestamp", "count",
lag("timestamp") over(w) + '10 minutes'::interval as "lag_timestamp",
lead("timestamp") over(w) - '10 minutes'::interval as "lead_timestamp"
from gm_inductionloopdata as curr
where curr."count" <> 0
window w as (order by "timestamp")
)
select "id", "timestamp"
from cte
where
"count" = 1 and
("lag_timestamp" is null or "lag_timestamp" < "timestamp") and
("lead_timestamp" is null or "lead_timestamp" > "timestamp")
Or you can try this one; make sure you have an index on the table's timestamp column:
select *
from gm_inductionloopdata as curr
where
curr."count" = 1 and
not exists (
select *
from gm_inductionloopdata as g
where
-- you can change this to between, I've used this just for readability
g."timestamp" <= curr."timestamp" + '10 minutes'::interval and
g."timestamp" >= curr."timestamp" - '10 minutes'::interval and
g."id" <> curr."id" and
g."count" <> 0 -- any non-zero neighbour disqualifies the row, per the question's definition
);
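The index mentioned above could be created roughly as follows. This is a minimal sketch: the index names are made up for illustration, and the second, partial index is an assumption of mine rather than part of the original answer. It only pays off if most rows have a zero count, so compare plans with EXPLAIN ANALYZE before keeping it.

```sql
-- Plain b-tree index on the timestamp column (index name is hypothetical).
CREATE INDEX gm_inductionloopdata_timestamp_idx
    ON gm_inductionloopdata ("timestamp");

-- Optional partial index covering only rows with a non-zero count
-- (an assumption, not from the original answer; name is hypothetical).
CREATE INDEX gm_inductionloopdata_nonzero_idx
    ON gm_inductionloopdata ("timestamp")
    WHERE "count" <> 0;
```

Whether the planner actually uses either index depends on the data distribution and on the filter the query applies to "count".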
By the way, please don't name your columns "count" or "timestamp", or after any other keyword, function name, or type name.
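If renaming is an option, something along these lines would do it. The new names vehicle_count and measured_at are purely hypothetical placeholders, not names from the question.

```sql
-- Hypothetical replacement names; pick whatever suits your schema.
ALTER TABLE gm_inductionloopdata RENAME COLUMN "count" TO vehicle_count;
ALTER TABLE gm_inductionloopdata RENAME COLUMN "timestamp" TO measured_at;
```

After the rename the columns no longer need double quotes, but any query or function body that still refers to the old names has to be updated.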
Original question on Stack Overflow (sql - finding rare events in a moving window of timestamps): https://stackoverflow.com/questions/18593903/