sql - Count previous occurrences of events within a given interval for each event row using window functions

Tags: sql postgresql aggregate-functions window-functions postgresql-performance

I have a table that stores events that occurred for users, as shown in http://sqlfiddle.com/#!15/2b559/2/0.

event_id (integer)
user_id (integer)
event_type (integer)
timestamp (timestamp)

A sample of the data looks like this:

+-----------+----------+-------------+----------------------------+
| event_id  | user_id  | event_type  |         timestamp          |
+-----------+----------+-------------+----------------------------+
|        1  |       1  |          1  | January, 01 2015 00:00:00  |
|        2  |       1  |          1  | January, 10 2015 00:00:00  |
|        3  |       1  |          1  | January, 20 2015 00:00:00  |
|        4  |       1  |          1  | January, 30 2015 00:00:00  |
|        5  |       1  |          1  | February, 10 2015 00:00:00 |
|        6  |       1  |          1  | February, 21 2015 00:00:00 |
|        7  |       1  |          1  | February, 22 2015 00:00:00 |
+-----------+----------+-------------+----------------------------+
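A minimal script to reproduce this sample in PostgreSQL might look like the following. It is only a sketch: making the table TEMP and declaring event_id as PRIMARY KEY are assumptions, not part of the schema given in the question.

```sql
-- Sketch: reproduce the question's sample data.
-- Assumptions (not stated in the question): temp table, event_id is the primary key.
CREATE TEMP TABLE events (
    event_id    integer   PRIMARY KEY,
    user_id     integer   NOT NULL,
    event_type  integer   NOT NULL,
    "timestamp" timestamp NOT NULL  -- quoted to keep it visibly distinct from the type name
);

INSERT INTO events (event_id, user_id, event_type, "timestamp") VALUES
    (1, 1, 1, '2015-01-01 00:00'),
    (2, 1, 1, '2015-01-10 00:00'),
    (3, 1, 1, '2015-01-20 00:00'),
    (4, 1, 1, '2015-01-30 00:00'),
    (5, 1, 1, '2015-02-10 00:00'),
    (6, 1, 1, '2015-02-21 00:00'),
    (7, 1, 1, '2015-02-22 00:00');
```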

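For each event, I want to get the number of events of the same user and the same event type that occurred within the 30 days before it, counting the event itself (as the expected output below shows).

The expected result is: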

+-----------+----------+-------------+-----------------------------+-------+
| event_id  | user_id  | event_type  |         timestamp           | count |
+-----------+----------+-------------+-----------------------------+-------+
|        1  |       1  |          1  | January, 01 2015 00:00:00   |     1 |
|        2  |       1  |          1  | January, 10 2015 00:00:00   |     2 |
|        3  |       1  |          1  | January, 20 2015 00:00:00   |     3 |
|        4  |       1  |          1  | January, 30 2015 00:00:00   |     4 |
|        5  |       1  |          1  | February, 10 2015 00:00:00  |     3 |
|        6  |       1  |          1  | February, 21 2015 00:00:00  |     3 |
|        7  |       1  |          1  | February, 22 2015 00:00:00  |     4 |
+-----------+----------+-------------+-----------------------------+-------+
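Read off the expected output, the requested count appears to be the following, where t stands for the timestamp column and both interval endpoints are inclusive; event 5 serves as a worked check.

```latex
\[
\operatorname{count}(e) = \bigl|\{\, x \mid x.\mathit{user\_id} = e.\mathit{user\_id},\;
  x.\mathit{event\_type} = e.\mathit{event\_type},\;
  e.t - 30\,\text{days} \le x.t \le e.t \,\}\bigr|
\]
\[
\text{event } 5:\quad e.t = \text{2015-02-10},\quad e.t - 30\,\text{days} = \text{2015-01-11}
\;\Rightarrow\; x \in \{3, 4, 5\},\quad \operatorname{count}(e) = 3
\]
```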

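The table contains millions of rows, so I cannot use a correlated subquery as suggested by @jpw in an answer below.

So far, I have managed to get the running total of previous events with the same user_id and the same event_type with the following query: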

SELECT event_id, user_id, event_type, "timestamp",
       COUNT(event_type) OVER w
FROM   events
WINDOW w AS (PARTITION BY user_id, event_type ORDER BY "timestamp"
             ROWS UNBOUNDED PRECEDING);

The result is:

+-----------+----------+-------------+-----------------------------+-------+
| event_id  | user_id  | event_type  |         timestamp           | count |
+-----------+----------+-------------+-----------------------------+-------+
|        1  |       1  |          1  | January, 01 2015 00:00:00   |     1 |
|        2  |       1  |          1  | January, 10 2015 00:00:00   |     2 |
|        3  |       1  |          1  | January, 20 2015 00:00:00   |     3 |
|        4  |       1  |          1  | January, 30 2015 00:00:00   |     4 |
|        5  |       1  |          1  | February, 10 2015 00:00:00  |     5 |
|        6  |       1  |          1  | February, 21 2015 00:00:00  |     6 |
|        7  |       1  |          1  | February, 22 2015 00:00:00  |     7 |
+-----------+----------+-------------+-----------------------------+-------+

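Is there a way to change the window frame specification or the COUNT function so that it returns only the number of events that occurred within the last x days?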
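On PostgreSQL 11 and later (newer than this question), the frame itself can express the interval: RANGE frames accept an offset such as interval '30 days' when the window is ordered by a single timestamp column. A sketch against the sample rows, with a CTE standing in for the events table so the snippet runs on its own:

```sql
-- Sketch for PostgreSQL 11+: interval offset in a RANGE frame.
-- The CTE replaces the real events table only for demonstration.
WITH events (event_id, user_id, event_type, "timestamp") AS (
    VALUES (1, 1, 1, timestamp '2015-01-01'),
           (2, 1, 1, timestamp '2015-01-10'),
           (3, 1, 1, timestamp '2015-01-20'),
           (4, 1, 1, timestamp '2015-01-30'),
           (5, 1, 1, timestamp '2015-02-10'),
           (6, 1, 1, timestamp '2015-02-21'),
           (7, 1, 1, timestamp '2015-02-22')
)
SELECT event_id, user_id, event_type, "timestamp",
       count(*) OVER w AS count
FROM   events
WINDOW w AS (PARTITION BY user_id, event_type
             ORDER BY "timestamp"
             -- every row whose timestamp lies in [current - 30 days, current]
             RANGE BETWEEN interval '30 days' PRECEDING AND CURRENT ROW)
ORDER  BY event_id;
```

In RANGE mode, CURRENT ROW also covers rows with exactly the same timestamp, so this should match the LATERAL and JOIN queries in the answer. On PostgreSQL 10 and earlier, RANGE frames only support UNBOUNDED and CURRENT ROW bounds, so this form is not available there.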

Second, I want to exclude duplicate events, i.e. events with the same event type and the same timestamp.
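For the second requirement, PostgreSQL does not accept count(DISTINCT ...) as a window function (it raises "DISTINCT is not implemented for window functions"), so one option is to count distinct timestamps inside a LATERAL subquery like the one in the answer. Because user_id and event_type are already fixed by the WHERE clause, distinct timestamps correspond to distinct (event_type, timestamp) pairs. In this sketch, event 8 is a hypothetical duplicate of event 7 added to the sample:

```sql
-- Sketch: count each (event_type, timestamp) pair only once per window.
-- Event 8 is a hypothetical duplicate of event 7, added for illustration.
WITH events (event_id, user_id, event_type, "timestamp") AS (
    VALUES (1, 1, 1, timestamp '2015-01-01'),
           (2, 1, 1, timestamp '2015-01-10'),
           (3, 1, 1, timestamp '2015-01-20'),
           (4, 1, 1, timestamp '2015-01-30'),
           (5, 1, 1, timestamp '2015-02-10'),
           (6, 1, 1, timestamp '2015-02-21'),
           (7, 1, 1, timestamp '2015-02-22'),
           (8, 1, 1, timestamp '2015-02-22')
)
SELECT e.*, c.ct
FROM   events e
CROSS  JOIN LATERAL (
    SELECT count(DISTINCT x."timestamp") AS ct  -- duplicates collapse here
    FROM   events x
    WHERE  x.user_id     = e.user_id
    AND    x.event_type  = e.event_type
    AND    x."timestamp" >= e."timestamp" - interval '30 days'
    AND    x."timestamp" <= e."timestamp"
) c
ORDER  BY e.event_id;
```

Events 7 and 8 should both get a count of 4 rather than 5. The duplicate rows themselves still appear in the output; this only removes them from the count.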

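Best answer

I posted a more detailed answer with a fiddle under the duplicate question on dba.SE.

Basically (the code below names the timestamp column ts instead of "timestamp"):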

CREATE INDEX events_fast_idx ON events (user_id, event_type, ts);

Then, either:

SELECT *
FROM   events e
    ,  LATERAL (
   SELECT count(*) AS ct
   FROM   events 
   WHERE  user_id    = e.user_id 
   AND    event_type = e.event_type
   AND    ts >= e.ts - interval '30 days'
   AND    ts <= e.ts              -- the event itself is counted, too
   ) ct
ORDER  BY event_id;

Or:

SELECT e.*, count(*) AS ct
FROM   events e
JOIN   events x USING (user_id, event_type)
WHERE  x.ts >= e.ts - interval '30 days'
AND    x.ts <= e.ts
GROUP  BY e.event_id  -- selecting e.* while grouping by event_id alone requires event_id to be the primary key
ORDER  BY e.event_id;

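Regarding "sql - Count previous occurrences of events within a given interval for each event row using window functions", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29452766/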

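Related articles:

SQL Server 2012 - Paging without an ORDER BY clause

sql - How to generate two unique IDs in SQL Server starting from different ranges based on some condition?

javascript - undefined json when not parsed, undefined token when parsed

sql - Aggregate functions over multiple joined tables

c# - Display a SQL table vertically

MySQL query or stored procedure that calls itself recursively and returns all nodes of the selected parent node

postgresql - Count with GROUP BY in PostgreSQL

sql - Conditional left join on max date with a WHERE clause on the second table

mysql - DreamFactory: How to use MySQL aggregate functions in queries

sql - Find rows with duplicate values in a column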