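sql - How to count previous events and compute the time difference to them in PostgreSQL?

I am new to SQL and I am communicating with a PostgreSQL database through queries. I have the following problem: my (simplified) data table looks like this.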

    DROP TABLE IF EXISTS t;
    CREATE TABLE t(
       id   INTEGER NOT NULL  -- group id; repeats across rows, so not a primary key on its own
      ,date DATE    NOT NULL
      ,key  INTEGER NOT NULL  -- 0/1 flag; an integer column so that key = 1 compares without a cast
      ,PRIMARY KEY (id, date)
    );
    -- ISO dates avoid depending on the session's DateStyle setting
    INSERT INTO t(id,date,key) VALUES (1,'2005-02-18',0);
    INSERT INTO t(id,date,key) VALUES (1,'2005-02-20',1);
    INSERT INTO t(id,date,key) VALUES (1,'2005-02-21',0);
    INSERT INTO t(id,date,key) VALUES (1,'2006-04-10',0);
    INSERT INTO t(id,date,key) VALUES (2,'2008-05-09',0);
    INSERT INTO t(id,date,key) VALUES (2,'2008-06-17',1);
    INSERT INTO t(id,date,key) VALUES (2,'2008-06-22',1);
    INSERT INTO t(id,date,key) VALUES (2,'2008-06-23',1);

+----+------------+-----+
| id |    date    | key |
+----+------------+-----+
|  1 | 2005-02-18 |   0 |
|  1 | 2005-02-20 |   1 |
|  1 | 2005-02-21 |   0 |
|  1 | 2006-04-10 |   0 |
|  2 | 2008-05-09 |   0 |
|  2 | 2008-06-17 |   1 |
|  2 | 2008-06-22 |   1 |
|  2 | 2008-06-23 |   1 |
+----+------------+-----+

Where id identifies different groups in my data, date (formatted as date column) indicates the date a particular event occurred and key identifies important events in my data set. Now, I need to conduct the following tasks for each group of observations.

A) Count the number of past key events within a given time window (say 7 days for now) for each date entry. In other words, for every date entry: how many key events (key = 1) occurred in the preceding 7 days? (Comment: this is what it looks like in Stata.)
B) Calculate the time difference in days between each event and the most recent key event, i.e. date - last(date where key = 1). (ANSWERED, see Gordon's post.)

The final result should look like this:

+----+------------+-----+--------+-----------+
| id |    date    | key | number | time_diff |
+----+------------+-----+--------+-----------+
|  1 | 2005-02-18 |   0 |      0 | NA        |
|  1 | 2005-02-20 |   1 |      0 | 0         |
|  1 | 2005-02-21 |   0 |      1 | 1         |
|  1 | 2006-04-10 |   0 |      0 | 414       |
|  2 | 2008-05-09 |   0 |      0 | NA        |
|  2 | 2008-06-17 |   1 |      0 | 0         |
|  2 | 2008-06-22 |   1 |      1 | 5         |
|  2 | 2008-06-23 |   1 |      2 | 1         |
+----+------------+-----+--------+-----------+

All events that occurred before the first key event in a particular group should be tagged as NULL or NA.

I tried to solve B with the help of this blog, but I am using PostgreSQL 9.3, and the FILTER clause was only added in 9.4 if I am not mistaken.
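
For readers on 9.3: a conditional aggregate written with FILTER can usually be expressed by putting a CASE expression inside the aggregate instead. A minimal sketch, assuming the sample table is named t and key is stored as an integer 0/1 flag (both are assumptions for illustration, not part of the original post):

```sql
-- PostgreSQL 9.4+ only (FILTER is not available on 9.3):
--   select id, count(*) filter (where key = 1) as key_events from t group by id;

-- 9.3-compatible equivalent: CASE without ELSE yields NULL for non-key rows,
-- and count(expr) skips NULLs, so only key = 1 rows are counted.
select id,
       count(case when key = 1 then 1 end) as key_events
from t
group by id
order by id;
```

On the sample data this returns 1 for id 1 and 3 for id 2. The same CASE trick works inside a window aggregate, which is the form relevant to task B.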

My idea was to try the following:

         # Attempt: days since the earliest key event in the previous 7 days
         dataset <- dbGetQuery(channel, "SELECT t1.*, t1.date -
                               (
                                    SELECT MIN(t2.date)
                                    FROM t t2
                                    WHERE t1.id = t2.id AND t2.key = 1
                                    AND t1.date - t2.date <= 7 AND t1.date - t2.date >= 0
                               ) AS time_diff FROM t t1 ORDER BY t1.id, t1.date")

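However, the result is not very satisfying when there is more than one key event in my time window. I assume I need a window function, either designating my key events as FIRST_VALUE or setting some kind of time interval, but it is not clear to me how to implement it to get the desired result. As you can see, I am using R to send the queries to the database.

Any help is appreciated. If you need more information, please let me know, especially since this is my first question on SO.

Best answer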

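Your question seems to be about "B" rather than "A".

You can do "B" with window functions, but it involves a conditional running max over the dates rather than lag():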

    select t.*,
           (date -
            max(case when key = 1 then date end) over (partition by id order by date)
           ) as time_diff
    from t;
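
The query above covers "B" only. Because its running max includes the current row, it returns 0 on every key row, whereas the expected table in the question shows 5 and 1 for the later key events of group 2. A possible sketch that also covers "A" and runs on 9.3 (no FILTER) is given below. It assumes a table named t with an integer 0/1 key column and a unique (id, date) pair; the fallback to 0 for the first key event of a group is one reading of the expected table, not something the answer states.

```sql
-- Assumptions (not from the original posts): table t, key is an integer 0/1 flag,
-- and (id, date) is unique so the row order inside each group is deterministic.
select t1.*,
       -- A) key events of the same group in the 7 days strictly before this row
       (select count(*)
        from t t2
        where t2.id = t1.id
          and t2.key = 1
          and t2.date <  t1.date        -- strictly earlier: a row never counts itself
          and t2.date >= t1.date - 7    -- date - integer subtracts whole days
       ) as number,
       -- B) days since the latest key event strictly before this row;
       --    the first key event of a group gets 0, rows before it stay NULL
       coalesce(
           t1.date - max(case when t1.key = 1 then t1.date end)
                         over (partition by t1.id order by t1.date
                               rows between unbounded preceding and 1 preceding),
           case when t1.key = 1 then 0 end
       ) as time_diff
from t t1
order by t1.id, t1.date;
```

On the sample data the number column should come out as 0, 0, 1, 0 for group 1 and 0, 0, 1, 2 for group 2. The correlated subquery is simple but is evaluated once per row, so an index on (id, date) is likely to help on larger tables.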

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/35630413/
