I am new to SQL and I am communicating with a PostgreSQL database through queries. I have the following problem: my (simplified) data table looks like this.
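-- TABLE is a reserved word, so the placeholder name has to be double-quoted.
DROP TABLE IF EXISTS "table";
CREATE TABLE "table"(
   id   INTEGER NOT NULL  -- group identifier; it repeats across rows, so it cannot be a primary key
  ,date DATE    NOT NULL
  ,key  INTEGER NOT NULL  -- 0/1 flag, compared as key = 1 in the queries
);
-- ISO dates (YYYY-MM-DD) are parsed the same way regardless of the DateStyle setting.
INSERT INTO "table"(id,date,key) VALUES (1,'2005-02-18',0);
INSERT INTO "table"(id,date,key) VALUES (1,'2005-02-20',1);
INSERT INTO "table"(id,date,key) VALUES (1,'2005-02-21',0);
INSERT INTO "table"(id,date,key) VALUES (1,'2006-04-10',0);
INSERT INTO "table"(id,date,key) VALUES (2,'2008-05-09',0);
INSERT INTO "table"(id,date,key) VALUES (2,'2008-06-17',1);
INSERT INTO "table"(id,date,key) VALUES (2,'2008-06-22',1);
INSERT INTO "table"(id,date,key) VALUES (2,'2008-06-23',1);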
+----+------------+-----+
| id | date       | key |
+----+------------+-----+
|  1 | 2005-02-18 |   0 |
|  1 | 2005-02-20 |   1 |
|  1 | 2005-02-21 |   0 |
|  1 | 2006-04-10 |   0 |
|  2 | 2008-05-09 |   0 |
|  2 | 2008-06-17 |   1 |
|  2 | 2008-06-22 |   1 |
|  2 | 2008-06-23 |   1 |
+----+------------+-----+
Where id identifies different groups in my data, date (formatted as a date column) indicates the date a particular event occurred, and key identifies important events in my data set.
Now, I need to conduct the following tasks for each group of observations.
A) Count the number of past key events in a particular time window for each date entry (let's say 7 days for the moment). In other words, for every date entry: how many times did a key event occur in the last 7 days (count the rows with key = 1 in the 7 days before date)? Comment: this is how it looks in Stata.
B) Calculate the time difference in days between each event and the most recent key event: date - last(date where key = 1) = x. (ANSWERED, check out Gordon's post.)
The final result should look like this:
+----+------------+-----+--------+-----------+
| id | date       | key | number | time_diff |
+----+------------+-----+--------+-----------+
|  1 | 2005-02-18 |   0 |      0 |        NA |
|  1 | 2005-02-20 |   1 |      0 |         0 |
|  1 | 2005-02-21 |   0 |      1 |         1 |
|  1 | 2006-04-10 |   0 |      0 |       413 |
|  2 | 2008-05-09 |   0 |      0 |        NA |
|  2 | 2008-06-17 |   1 |      0 |         0 |
|  2 | 2008-06-22 |   1 |      1 |         5 |
|  2 | 2008-06-23 |   1 |      2 |         1 |
+----+------------+-----+--------+-----------+
All events that occurred before the first key event in a particular group should be tagged as NULL or NA.
I tried to solve B with the help of this blog, but I am using PostgreSQL 9.3, and the FILTER clause is a feature of v9.4, if I am not mistaken.
My idea was to try the following:
dataset <- dbGetQuery(channel, 'SELECT t1.*, t1.date -
  (
    SELECT MIN(t2.date)
    FROM "table" t2
    WHERE t1.id = t2.id AND t2.key = 1
      AND t1.date - t2.date <= 7 AND t1.date - t2.date >= 0
  ) AS time_diff FROM "table" t1 ORDER BY t1.id, t1.date')
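But the result is not very satisfactory when there is more than one key event in my time window. I assume I need to use a window function that designates my key events as FIRST_VALUE or sets some kind of time interval, but it is not clear to me how to implement it to get the desired result. As you can see, I am using R to send the queries to the database.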
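To make the issue concrete, the following Python sketch (an illustration only, with made-up variable names, not part of the query itself) replays the subquery's window condition for id = 2 on 2008-06-23, where three key events fall inside the 7-day window. It compares MIN, as used in the subquery, with MAX.

```python
from datetime import date

# Key-event dates (key = 1) for id = 2 in the sample data.
key_dates = [date(2008, 6, 17), date(2008, 6, 22), date(2008, 6, 23)]
current = date(2008, 6, 23)

# Same bounds as the subquery: 0 <= current - d <= 7.
in_window = [d for d in key_dates if 0 <= (current - d).days <= 7]

# MIN selects the earliest key event in the window, MAX the most recent one.
diff_with_min = (current - min(in_window)).days
diff_with_max = (current - max(in_window)).days

print(diff_with_min, diff_with_max)
```

MIN picks the earliest key event in the window (2008-06-17), which gives 6 days, whereas the most recent key event is the row's own date, which gives 0.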
Any help is appreciated. Please let me know if you need more information, especially since this is my first question on SO.
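Best answer
Your question seems to be about "B" rather than "A".
You can do "B" using window functions, but it involves a conditional cumulative max over the dates rather than lag():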
select t.*,
(date -
max(case when key = 1 then date end) over (partition by id order by date)
) as time_diff
from t;
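As a runnable check of the conditional running max, and as one possible way to approach "A" (which the query above does not cover), here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. Several things here are assumptions made for illustration: the table is named t as in the query above, dates are stored as ISO text, julianday() is SQLite's way of getting a day difference, and the window function needs SQLite 3.25 or later. The correlated subquery for "A" is a suggestion, not something taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER NOT NULL, date TEXT NOT NULL, key INTEGER NOT NULL)"
)
rows = [
    (1, "2005-02-18", 0), (1, "2005-02-20", 1),
    (1, "2005-02-21", 0), (1, "2006-04-10", 0),
    (2, "2008-05-09", 0), (2, "2008-06-17", 1),
    (2, "2008-06-22", 1), (2, "2008-06-23", 1),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT t1.id, t1.date, t1.key,
       -- A: key events strictly before this row and at most 7 days back
       (SELECT count(*)
          FROM t AS t2
         WHERE t2.id = t1.id
           AND t2.key = 1
           AND t2.date < t1.date
           AND julianday(t1.date) - julianday(t2.date) <= 7) AS number,
       -- B: days since the latest key event up to and including this row;
       --    NULL while the group has not yet seen a key event
       CAST(julianday(t1.date)
            - julianday(max(CASE WHEN t1.key = 1 THEN t1.date END)
                        OVER (PARTITION BY t1.id ORDER BY t1.date))
            AS INTEGER) AS time_diff
  FROM t AS t1
 ORDER BY t1.id, t1.date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data the number column comes out as 0, 0, 1, 0, 0, 0, 1, 2, which agrees with the expected table in the question. The time_diff column is 0 on every row with key = 1, because the running max includes the current row, so it does not reproduce the 5 and 1 shown for 2008-06-22 and 2008-06-23; the 2006-04-10 row comes out as 414. In PostgreSQL itself the julianday() calls should not be needed, since subtracting two DATE values already yields a number of days, and the correlated subquery does not rely on FILTER, so it should also work on 9.3.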
Regarding "sql - How to calculate the number of previous events and the time difference in PostgreSQL?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35630413/