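I have daily time series for different companies from different industries, and I am using PostgreSQL. Let me explain my problem with an example. What I have is: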
+------------+---------+-------------+----+
| day | company | industry | v |
+------------+---------+-------------+----+
| 2012-01-12 | A | consumer | 2 |
| 2012-01-12 | B | consumer | 2 |
| 2012-01-12 | C | health | 4 |
| 2012-01-12 | D | health | 4 |
| 2012-01-13 | A | consumer | 5 |
| 2012-01-13 | B | consumer | 5 |
| 2012-01-13 | C | health | 7 |
| 2012-01-13 | D | health | 7 |
| 2012-01-16 | A | consumer | 8 |
| 2012-01-16 | B | consumer | 8 |
| 2012-01-16 | C | health | 3 |
| 2012-01-16 | D | health | 3 |
+------------+---------+-------------+----+
Different companies from different industries have some value v, which is a daily average across the industry. What I need is:
+------------+---------+----------+---+------------+
| day | company | industry | v | delta_v |
+------------+---------+----------+---+------------+
| 2012-01-12 | A | consumer | 2 | NULL |
| 2012-01-12 | B | consumer | 2 | NULL |
| 2012-01-12 | C | health | 4 | NULL |
| 2012-01-12 | D | health | 4 | NULL |
| 2012-01-13 | A | consumer | 5 | 1.5 |
| 2012-01-13 | B | consumer | 5 | 1.5 |
| 2012-01-13 | C | health | 7 | 0.75 |
| 2012-01-13 | D | health | 7 | 0.75 |
| 2012-01-16 | A | consumer | 8 | 0.6 |
| 2012-01-16 | B | consumer | 8 | 0.6 |
| 2012-01-16 | C | health | 3 | -0.571428 |
| 2012-01-16 | D | health | 3 | -0.571428 |
+------------+---------+----------+---+------------+
I need the daily change of the variable v. For example, the average of v for the industry "consumer" is 2 on 2012-01-12 and 5 on 2012-01-13, so the growth is (5 - 2) / 2 = 1.5.
I tried this:
SELECT *
, (v - LAG(v) OVER (PARTITION BY industry ORDER BY day) )
/ LAG (v) OVER (PARTITION BY industry ORDER BY day) AS delta_v
FROM mytable
ORDER BY day, industry
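The problem is that if there is more than one company from the same industry on a given day, it also computes an "intraday" change of v between those companies.
I was hoping this only needs a small correction in the PARTITION BY clause, but I really don't know how to do it. Do you have any ideas that could help me?
Best answer
I think you want the company in the partition as well, so that each row is compared only with the previous day's row for the same company: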
SELECT t.*,
((v - LAG(v) OVER (PARTITION BY industry, company ORDER BY day) )
/ LAG (v) OVER (PARTITION BY industry, company ORDER BY day)
) AS delta_v
FROM mytable t
ORDER BY day, industry;
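To check this against the sample data from the question, here is a minimal self-contained sketch. The CTE name `sample` and the window name `w` are made up for illustration. The cast to `numeric` is an assumption: the question does not state the type of v, and if it were an integer, PostgreSQL would perform integer division and truncate the ratio.

```sql
-- Sample rows from the question, inlined so the query runs without a table.
WITH sample(day, company, industry, v) AS (
    VALUES
        (DATE '2012-01-12', 'A', 'consumer', 2),
        (DATE '2012-01-12', 'B', 'consumer', 2),
        (DATE '2012-01-12', 'C', 'health',   4),
        (DATE '2012-01-12', 'D', 'health',   4),
        (DATE '2012-01-13', 'A', 'consumer', 5),
        (DATE '2012-01-13', 'B', 'consumer', 5),
        (DATE '2012-01-13', 'C', 'health',   7),
        (DATE '2012-01-13', 'D', 'health',   7),
        (DATE '2012-01-16', 'A', 'consumer', 8),
        (DATE '2012-01-16', 'B', 'consumer', 8),
        (DATE '2012-01-16', 'C', 'health',   3),
        (DATE '2012-01-16', 'D', 'health',   3)
)
SELECT s.*,
       -- Cast before dividing so integer inputs are not truncated.
       (v - LAG(v) OVER w)::numeric / LAG(v) OVER w AS delta_v
FROM sample s
-- One partition per company: LAG() never sees another company's row.
WINDOW w AS (PARTITION BY industry, company ORDER BY day)
ORDER BY day, industry, company;
```

With only `PARTITION BY industry`, companies A and B share a partition, so LAG() on B's row can return A's value from the same day. Adding `company` removes that case. The first day of each company has no previous row, so delta_v should come out as NULL there.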
I'm not sure whether Postgres actually evaluates lag() twice, but this version is easier to maintain:
SELECT t.*,
((v / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)) - 1
) AS delta_v
FROM mytable t
ORDER BY day, industry;
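One caveat with either form: if the previous day's v is 0, the division raises an error. The following is a hedged variant, not part of the original answer, that returns NULL in that case by using `NULLIF`. It assumes the `mytable` layout from the question. The named window `w` and the `numeric` cast are illustrative choices.

```sql
-- Assumes mytable(day, company, industry, v) as in the question.
SELECT t.*,
       -- NULLIF turns a previous value of 0 into NULL, so the result is
       -- NULL instead of a division-by-zero error.
       v::numeric / NULLIF(LAG(v) OVER w, 0) - 1 AS delta_v
FROM mytable t
WINDOW w AS (PARTITION BY industry, company ORDER BY day)
ORDER BY day, industry;
```

Naming the window once in the `WINDOW` clause also keeps the partition definition in a single place if LAG() is needed more than once.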
Regarding sql - finding the right partition with the LAG window function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22119992/