sql - Find rare events in a sliding window of timestamps

Given the following table:

CREATE TABLE gm_inductionloopdata
(
 "id" serial NOT NULL,
 "timestamp" timestamp without time zone NOT NULL,
 "count" integer NOT NULL DEFAULT 0
)

I am searching for "rare events". A rare event is a row that has the following properties:

Easy: count = 1
Hard: all other rows within a 10-minute span before and after the current row's timestamp have count = 0 (the given row itself excepted, of course).

Example:

id   timestamp  count
0    08:00      0    
1    08:11      0    
2    08:15      2     <== not rare event (count!=1)   
3    08:19      0    
4    08:24      0    
5    08:25      0   
6    08:29      1     <== not rare event (see 8:35)
7    08:31      0    
8    08:35      1    
9    08:40      0    
10   08:46      1     <== rare event!  
10   08:48      0   
10   08:51      0   
10   08:55      0   
10   08:58      1     <== rare event!  
10   09:02      0   
10   09:09      1
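
To experiment with the queries on this page, the example rows can be loaded as follows. This is only a sketch: the date 2013-09-03 is an arbitrary assumption (the example lists times only), and "id" is left to the serial default because the sample repeats id 10.

```sql
-- Same table definition as above, created only if it is missing.
CREATE TABLE IF NOT EXISTS gm_inductionloopdata
(
 "id" serial NOT NULL,
 "timestamp" timestamp without time zone NOT NULL,
 "count" integer NOT NULL DEFAULT 0
);

-- Example rows; the date part is an assumption, the times and counts are from the table above.
INSERT INTO gm_inductionloopdata ("timestamp", "count") VALUES
    ('2013-09-03 08:00', 0),
    ('2013-09-03 08:11', 0),
    ('2013-09-03 08:15', 2),
    ('2013-09-03 08:19', 0),
    ('2013-09-03 08:24', 0),
    ('2013-09-03 08:25', 0),
    ('2013-09-03 08:29', 1),
    ('2013-09-03 08:31', 0),
    ('2013-09-03 08:35', 1),
    ('2013-09-03 08:40', 0),
    ('2013-09-03 08:46', 1),
    ('2013-09-03 08:48', 0),
    ('2013-09-03 08:51', 0),
    ('2013-09-03 08:55', 0),
    ('2013-09-03 08:58', 1),
    ('2013-09-03 09:02', 0),
    ('2013-09-03 09:09', 1);
```

On this truncated sample the 09:09 row also satisfies the definition, simply because no later rows exist.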

At the moment I have the following PL/pgSQL function:

SELECT curr.* 
    FROM gm_inductionloopdata curr
    WHERE curr.count = 1
    AND (
      SELECT SUM(count)
      FROM gm_inductionloopdata
      -- BETWEEN expects the lower bound first, otherwise the range is empty
      WHERE timestamp BETWEEN curr.timestamp - '10 minutes'::INTERVAL
      AND curr.timestamp + '10 minutes'::INTERVAL
    )<2

It is too slow. :-(

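Any suggestions on how to improve performance? I am working with more than 1 million rows here and may need to find these "rare events" on a regular basis.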

Best answer

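I think this is a good case for the lead and lag window functions. This query keeps only the rows with a non-zero count, looks at the previous and next of those rows to see whether either is within 10 minutes, and then returns the rows with count = 1 that have no such neighbour: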

with cte as (
  select
      "id", "timestamp", "count",
      lag("timestamp") over(w) + '10 minutes'::interval as "lag_timestamp",
      lead("timestamp") over(w) - '10 minutes'::interval as "lead_timestamp"
  from gm_inductionloopdata as curr
  where curr."count" <> 0
  window w as (order by "timestamp")
)
select "id", "timestamp"
from cte
where
    "count" = 1 and
    ("lag_timestamp" is null or "lag_timestamp" < "timestamp") and
    ("lead_timestamp" is null or "lead_timestamp" > "timestamp")

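
As a variation on the window-function idea (not part of the original answer), the 10-minute window can also be written directly as a RANGE frame. A minimal sketch, assuming PostgreSQL 11 or later (where RANGE frames accept interval offsets) and assuming "count" is never negative, so that a window total of 1 means every other row in the window is 0:

```sql
-- Assumption: PostgreSQL 11+ and non-negative "count" values.
with windowed as (
    select
        "id", "timestamp", "count",
        -- total count of all rows within 10 minutes before and after the current row
        sum("count") over (
            order by "timestamp"
            range between interval '10 minutes' preceding
                      and interval '10 minutes' following
        ) as window_total
    from gm_inductionloopdata
)
select "id", "timestamp"
from windowed
where "count" = 1
  and window_total = 1;  -- only the row itself contributes to the total
```

Unlike the lag/lead version, this scans the zero-count rows as well, so whether it is faster on a given data set is something to measure rather than assume.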

Or you can try this one, and make sure you have an index on the timestamp column of your table:

select *
from gm_inductionloopdata as curr
where
    curr."count" = 1 and
    not exists (
        select *
        from gm_inductionloopdata as g
        where 
           -- you can change this to between, I've used this just for readability
           g."timestamp" <= curr."timestamp" + '10 minutes'::interval and
           g."timestamp" >= curr."timestamp" - '10 minutes'::interval and
           g."id" <> curr."id" and
           -- any non-zero neighbour disqualifies the row, not only count = 1
           g."count" <> 0
    );

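
The index mentioned above could be created along these lines. The index names are made up for illustration, and the partial index is an optional extra that is not in the original answer; the planner can use it only when the query's WHERE clause implies the index predicate, so it is worth confirming with EXPLAIN.

```sql
-- Plain b-tree index on the timestamp column (index name is hypothetical).
create index gm_inductionloopdata_timestamp_idx
    on gm_inductionloopdata ("timestamp");

-- Optional assumption: a partial index covering only the non-zero rows,
-- which are the only ones the NOT EXISTS subquery needs to find.
create index gm_inductionloopdata_nonzero_ts_idx
    on gm_inductionloopdata ("timestamp")
    where "count" <> 0;

-- Inspect the actual plan and timings for the lookup.
explain analyze
select *
from gm_inductionloopdata as curr
where curr."count" = 1;
```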

By the way, please don't name your columns "count", "timestamp", or after other keywords, function names, and type names.
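
If renaming is an option, it might look like the sketch below. The new column names are purely hypothetical, and any query or function that references the old names would have to be updated as well.

```sql
-- Hypothetical replacement names; pick whatever fits the data.
alter table gm_inductionloopdata rename column "timestamp" to recorded_at;
alter table gm_inductionloopdata rename column "count" to vehicle_count;
```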

Regarding "sql - Find rare events in a sliding window of timestamps", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18593903/
