sql - How to count previous events and calculate the time difference in PostgreSQL?

Tags: sql postgresql select count subquery

I'm new to SQL, and I communicate with a PostgreSQL database through queries. I have the following problem: my (simplified) data table looks like this.

    -- "table" is a reserved word in PostgreSQL, so the table is named t here
    DROP TABLE IF EXISTS t;
    CREATE TABLE t(
       id   INTEGER  NOT NULL   -- group identifier; repeats, so not a primary key on its own
      ,date DATE     NOT NULL
      ,key  SMALLINT NOT NULL   -- 1 = key event, 0 = other; SMALLINT so that key = 1 works
      ,PRIMARY KEY (id, date)
    );
    -- ISO dates avoid DateStyle ambiguity ('18/02/05' is not parsed reliably)
    INSERT INTO t(id,date,key) VALUES (1,'2005-02-18',0);
    INSERT INTO t(id,date,key) VALUES (1,'2005-02-20',1);
    INSERT INTO t(id,date,key) VALUES (1,'2005-02-21',0);
    INSERT INTO t(id,date,key) VALUES (1,'2006-04-10',0);
    INSERT INTO t(id,date,key) VALUES (2,'2008-05-09',0);
    INSERT INTO t(id,date,key) VALUES (2,'2008-06-17',1);
    INSERT INTO t(id,date,key) VALUES (2,'2008-06-22',1);
    INSERT INTO t(id,date,key) VALUES (2,'2008-06-23',1);
+----+------------+-----+
| id |    date    | key |
+----+------------+-----+
|  1 | 2005-02-18 |   0 |
|  1 | 2005-02-20 |   1 |
|  1 | 2005-02-21 |   0 |
|  1 | 2006-04-10 |   0 |
|  2 | 2008-05-09 |   0 |
|  2 | 2008-06-17 |   1 |
|  2 | 2008-06-22 |   1 |
|  2 | 2008-06-23 |   1 |
+----+------------+-----+ 

Here id identifies the different groups in my data, date (a DATE column) is the date on which a particular event occurred, and key flags the important events in my data set. For each group of observations, I need to do the following:

  • A) For each date entry, count the number of past key events within a given time window (say 7 days for now). In other words, for every date entry: how many key events (key = 1) occurred in the preceding 7 days? Comment: this is how it looks in Stata.

  • B) Calculate the time difference in days between each event and the most recent key event, i.e. x = date - (last date where key = 1). (ANSWERED, see Gordon's post.)

The final result should look like this:

+----+------------+-----+--------+-----------+
| id |    date    | key | number | time_diff |
+----+------------+-----+--------+-----------+
|  1 | 2005-02-18 |   0 |      0 | NA        |
|  1 | 2005-02-20 |   1 |      0 | 0         |
|  1 | 2005-02-21 |   0 |      1 | 1         |
|  1 | 2006-04-10 |   0 |      0 | 414       |
|  2 | 2008-05-09 |   0 |      0 | NA        |
|  2 | 2008-06-17 |   1 |      0 | 0         |
|  2 | 2008-06-22 |   1 |      1 | 5         |
|  2 | 2008-06-23 |   1 |      2 | 1         |
+----+------------+-----+--------+-----------+ 

All events that occurred before the first key event in a particular group should be tagged as NULL or NA.
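The number column above pins down how task A is meant to be read: a key event on the row's own date is not counted (2008-06-22 gets 1, not 2), and only earlier key events at most 7 days back are counted. A minimal plain-Python sketch of that reading, usable as a reference for checking any SQL output; the function name count_recent_keys is hypothetical and the 7-day default is just the window assumed above:

```python
from datetime import date

# Sample rows from the question: (id, date, key)
ROWS = [
    (1, date(2005, 2, 18), 0),
    (1, date(2005, 2, 20), 1),
    (1, date(2005, 2, 21), 0),
    (1, date(2006, 4, 10), 0),
    (2, date(2008, 5, 9), 0),
    (2, date(2008, 6, 17), 1),
    (2, date(2008, 6, 22), 1),
    (2, date(2008, 6, 23), 1),
]


def count_recent_keys(rows, window_days=7):
    """For each row (in input order), count key events (key == 1) in the same
    group that happened strictly before the row's date and at most
    window_days earlier. Hypothetical helper, not from the original post."""
    counts = []
    for gid, day, _ in rows:
        counts.append(sum(
            1
            for other_gid, other_day, key in rows
            if other_gid == gid
            and key == 1
            and other_day < day
            and (day - other_day).days <= window_days
        ))
    return counts


numbers = count_recent_keys(ROWS)
print(numbers)  # [0, 0, 1, 0, 0, 0, 1, 2]
```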

I tried to solve B with the help of this blog, but I am using PostgreSQL 9.3, and if I am not mistaken the FILTER clause was only introduced in 9.4.

My idea was to try the following:

    dataset <- dbGetQuery(channel, "SELECT t1.*, t1.date -
                          (
                               SELECT MIN(t2.date)
                               FROM t t2
                               WHERE t1.id = t2.id AND t2.key = 1
                                 AND t1.date - t2.date <= 7 AND t1.date - t2.date >= 0
                          ) AS time_diff
                          FROM t t1 ORDER BY t1.id, t1.date")

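But the results are not very satisfactory if there is more than one key event in my time window. I assume I need to use window functions, specifying my key events as FIRST_VALUE or setting some kind of time interval, but it is not clear to me how to implement this to get the desired result. As you can see, I'm using R to send the queries to the database.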
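One way to see why the subquery above falls short is to run it against the sample rows. The sketch below is a port under stated assumptions: Python's built-in sqlite3 module stands in for PostgreSQL 9.3, julianday() replaces PostgreSQL's date subtraction, == is written as =, and the table is named t. With MIN, the subquery anchors on the earliest key event in the window, so 2008-06-23 gets 6 instead of 1; the 7-day bound also turns 2006-04-10 into NULL even though that group has an earlier key event.

```python
import sqlite3

# Sample rows from the question: (id, ISO date, key)
ROWS = [
    (1, "2005-02-18", 0), (1, "2005-02-20", 1),
    (1, "2005-02-21", 0), (1, "2006-04-10", 0),
    (2, "2008-05-09", 0), (2, "2008-06-17", 1),
    (2, "2008-06-22", 1), (2, "2008-06-23", 1),
]

# The question's correlated subquery, ported to SQLite. {agg} is MIN (as in
# the question) or MAX (a variant for comparison).
QUERY = """
SELECT t1.id, t1.date,
       CAST(julianday(t1.date) - julianday((
           SELECT {agg}(t2.date)
             FROM t AS t2
            WHERE t1.id = t2.id AND t2.key = 1
              AND julianday(t1.date) - julianday(t2.date) BETWEEN 0 AND 7
       )) AS INTEGER) AS time_diff
FROM t AS t1
ORDER BY t1.id, t1.date
"""


def time_diffs(agg):
    """Run the ported subquery with the given aggregate on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, date TEXT, key INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    result = [row[2] for row in conn.execute(QUERY.format(agg=agg))]
    conn.close()
    return result


print(time_diffs("MIN"))  # [None, 0, 1, None, None, 0, 5, 6]
print(time_diffs("MAX"))  # [None, 0, 1, None, None, 0, 0, 0]
```

Switching to MAX picks the most recent key event, but because that event can be the current row itself, key rows come out as 0, and the 7-day bound still drops older key events.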

Any help is appreciated. Please let me know if you need more information, especially since this is my first question on SO.

Best answer

Your question seems to be about "B" rather than "A".

You can do "B" with window functions, but it involves a conditional cumulative maximum of the date, not lag():

select t.*,
       (date -
        max(case when key = 1 then date end) over (partition by id order by date)
       ) as time_diff
from t;
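A minimal sketch for checking this answer, together with the still-open task A, against the sample data. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption: SQLite 3.25 or later is needed for window functions, and julianday() replaces PostgreSQL's date - date. The first query is the conditional cumulative maximum above; the second answers A with a correlated subquery that counts key events strictly earlier than the current row and at most 7 days back. That form needs neither window functions nor FILTER, so it should also work on PostgreSQL 9.3, where the condition would read t2.date < t1.date AND t1.date - t2.date <= 7.

```python
import sqlite3

# Sample rows from the question: (id, ISO date, key)
ROWS = [
    (1, "2005-02-18", 0), (1, "2005-02-20", 1),
    (1, "2005-02-21", 0), (1, "2006-04-10", 0),
    (2, "2008-05-09", 0), (2, "2008-06-17", 1),
    (2, "2008-06-22", 1), (2, "2008-06-23", 1),
]

# B: running maximum of the key dates per id (frame ends at the current row);
# NULL until the first key event of the group.
TIME_DIFF_SQL = """
SELECT id, date, key,
       CAST(julianday(date)
            - julianday(MAX(CASE WHEN key = 1 THEN date END)
                        OVER (PARTITION BY id ORDER BY date)) AS INTEGER)
         AS time_diff
FROM t
ORDER BY id, date
"""

# A: count earlier key events of the same id that are at most 7 days back.
NUMBER_SQL = """
SELECT t1.id, t1.date, t1.key,
       (SELECT COUNT(*)
          FROM t AS t2
         WHERE t2.id = t1.id
           AND t2.key = 1
           AND t2.date < t1.date
           AND julianday(t1.date) - julianday(t2.date) <= 7) AS number
FROM t AS t1
ORDER BY t1.id, t1.date
"""


def run(sql):
    """Load the sample rows into an in-memory table t and run sql on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER NOT NULL, date TEXT NOT NULL, key INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


time_diffs = [r[3] for r in run(TIME_DIFF_SQL)]
numbers = [r[3] for r in run(NUMBER_SQL)]
print(numbers)     # [0, 0, 1, 0, 0, 0, 1, 2]
print(time_diffs)  # [None, 0, 1, 414, None, 0, 0, 0]
```

The number values match the number column of the desired table. Because the window frame ends at the current row, every key row gets a time_diff of 0, so 2008-06-22 and 2008-06-23 come out as 0 rather than the 5 and 1 shown in the desired table.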

Original question on Stack Overflow: https://stackoverflow.com/questions/35630413/
