I have a table that stores events happening to users, as illustrated in http://sqlfiddle.com/#!15/2b559/2/0. Its columns are:
event_id(integer)
user_id(integer)
event_type(integer)
timestamp(timestamp)
A sample of the data looks like this:
+----------+---------+------------+----------------------------+
| event_id | user_id | event_type | timestamp                  |
+----------+---------+------------+----------------------------+
| 1        | 1       | 1          | January, 01 2015 00:00:00  |
| 2        | 1       | 1          | January, 10 2015 00:00:00  |
| 3        | 1       | 1          | January, 20 2015 00:00:00  |
| 4        | 1       | 1          | January, 30 2015 00:00:00  |
| 5        | 1       | 1          | February, 10 2015 00:00:00 |
| 6        | 1       | 1          | February, 21 2015 00:00:00 |
| 7        | 1       | 1          | February, 22 2015 00:00:00 |
+----------+---------+------------+----------------------------+
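For anyone who wants to reproduce the sample locally, the table and rows above can be recreated roughly as follows. This is only a sketch: the primary key on event_id and the NOT NULL constraints are assumptions, since the question lists just the column names and types.

```sql
-- Sketch of the table described above (PostgreSQL).
-- PRIMARY KEY and NOT NULL are assumptions, not stated in the question.
CREATE TABLE events (
    event_id    integer PRIMARY KEY,
    user_id     integer   NOT NULL,
    event_type  integer   NOT NULL,
    "timestamp" timestamp NOT NULL   -- quoted because timestamp is also a type name
);

-- The seven sample rows shown in the table above.
INSERT INTO events (event_id, user_id, event_type, "timestamp") VALUES
    (1, 1, 1, '2015-01-01 00:00:00'),
    (2, 1, 1, '2015-01-10 00:00:00'),
    (3, 1, 1, '2015-01-20 00:00:00'),
    (4, 1, 1, '2015-01-30 00:00:00'),
    (5, 1, 1, '2015-02-10 00:00:00'),
    (6, 1, 1, '2015-02-21 00:00:00'),
    (7, 1, 1, '2015-02-22 00:00:00');
```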
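For each event, I want the number of events of the same user and the same event type that occurred in the 30 days leading up to it (the event itself included).
It should look like this: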
+----------+---------+------------+----------------------------+-------+
| event_id | user_id | event_type | timestamp                  | count |
+----------+---------+------------+----------------------------+-------+
| 1        | 1       | 1          | January, 01 2015 00:00:00  | 1     |
| 2        | 1       | 1          | January, 10 2015 00:00:00  | 2     |
| 3        | 1       | 1          | January, 20 2015 00:00:00  | 3     |
| 4        | 1       | 1          | January, 30 2015 00:00:00  | 4     |
| 5        | 1       | 1          | February, 10 2015 00:00:00 | 3     |
| 6        | 1       | 1          | February, 21 2015 00:00:00 | 3     |
| 7        | 1       | 1          | February, 22 2015 00:00:00 | 4     |
+----------+---------+------------+----------------------------+-------+
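The table contains millions of rows, so I cannot use a correlated subquery as @jpw suggests in the answer below.
So far I have managed to get the total number of earlier events with the same user_id and the same event_type using the following query: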
SELECT event_id, user_id,event_type,"timestamp",
COUNT(event_type) OVER w
FROM events
WINDOW w AS (PARTITION BY user_id,event_type ORDER BY timestamp
ROWS UNBOUNDED PRECEDING);
The result is:
+----------+---------+------------+----------------------------+-------+
| event_id | user_id | event_type | timestamp                  | count |
+----------+---------+------------+----------------------------+-------+
| 1        | 1       | 1          | January, 01 2015 00:00:00  | 1     |
| 2        | 1       | 1          | January, 10 2015 00:00:00  | 2     |
| 3        | 1       | 1          | January, 20 2015 00:00:00  | 3     |
| 4        | 1       | 1          | January, 30 2015 00:00:00  | 4     |
| 5        | 1       | 1          | February, 10 2015 00:00:00 | 5     |
| 6        | 1       | 1          | February, 21 2015 00:00:00 | 6     |
| 7        | 1       | 1          | February, 22 2015 00:00:00 | 7     |
+----------+---------+------------+----------------------------+-------+
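Do you know whether there is a way to change the window frame specification or the COUNT function so that only events that occurred within the last x days are counted?
Second, I would like to exclude duplicate events, i.e. events with the same event type and the same timestamp.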
Best answer
I provided a more detailed answer and a fiddle under the duplicate question on dba.SE.
Basically, with ts as the name of the timestamp column:
CREATE INDEX events_fast_idx ON events (user_id, event_type, ts);
And then either:
SELECT *
FROM events e
, LATERAL (
SELECT count(*) AS ct
FROM events
WHERE user_id = e.user_id
AND event_type = e.event_type
AND ts >= e.ts - interval '30 days'
AND ts <= e.ts
) ct
ORDER BY event_id;
Or:
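-- Self-join variant: x ranges over the events of the same user and type
-- that fall inside the 30 days up to and including e.ts.
-- Selecting e.* with GROUP BY e.event_id assumes event_id is the primary key.
SELECT e.*, count(*) AS ct
FROM events e
JOIN events x USING (user_id, event_type)
WHERE x.ts >= e.ts - interval '30 days'
AND x.ts <= e.ts
GROUP BY e.event_id
ORDER BY e.event_id;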
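As a side note to the window-frame part of the question: PostgreSQL 11 and later accept a RANGE frame with an interval offset, which expresses the 30-day window directly. The sketch below is not part of the answer above. It assumes PostgreSQL 11 or newer and the same ts column name, and the second query assumes that "duplicate" means rows sharing user_id, event_type and ts.

```sql
-- Requires PostgreSQL 11+ (RANGE frames with an offset).
-- Counts events of the same user and type in the 30 days up to and
-- including each row's ts; rows with an identical ts are peers and
-- are all counted.
SELECT event_id, user_id, event_type, ts,
       count(*) OVER w AS ct
FROM   events
WINDOW w AS (PARTITION BY user_id, event_type
             ORDER BY ts
             RANGE BETWEEN interval '30 days' PRECEDING AND CURRENT ROW)
ORDER  BY event_id;

-- Variant that ignores duplicates: collapse rows with the same
-- (user_id, event_type, ts) first, then count over the same frame.
-- event_id is dropped because duplicates no longer map to a single row.
SELECT user_id, event_type, ts,
       count(*) OVER w AS ct
FROM  (SELECT DISTINCT user_id, event_type, ts FROM events) d
WINDOW w AS (PARTITION BY user_id, event_type
             ORDER BY ts
             RANGE BETWEEN interval '30 days' PRECEDING AND CURRENT ROW)
ORDER  BY user_id, event_type, ts;
```

On the seven sample rows from the question, the first query should reproduce the desired counts, since both frame bounds are inclusive, just like the ts >= ... AND ts <= ... conditions in the queries above.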
Source: the Stack Overflow question on using window functions to count, for each event row, the previous occurrences of the event within a given interval: https://stackoverflow.com/questions/29452766/