Suppose I have a table (MyTable) like this:
item_id | date
----------------
1 | 2016-06-08
1 | 2016-06-07
1 | 2016-06-05
1 | 2016-06-04
1 | 2016-05-31
...
2 | 2016-06-08
2 | 2016-06-06
2 | 2016-06-04
2 | 2016-05-31
...
3 | 2016-05-31
...
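I want to build a weekly summary table that reports on a rolling 7-day window. Essentially, the window should answer: "How many unique item_ids were reported in the last 7 days?"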
So in this case, the output table would look something like this:
date | weekly_ids
----------------------
2016-05-31| 3 # All 3 were present on the 31st
2016-06-01| 3 # All 3 were present on the 31st which is < 7 days before the 1st
2016-06-02| 3 # Same
2016-06-03| 3 # Same
2016-06-04| 3 # Same
2016-06-05| 3 # Same
2016-06-06| 3 # Same
2016-06-07| 3 # Same
2016-06-08| 2 # item 3 was not present for the entire last week so it does not add to the count.
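The expected output above only works out if each window also includes the date exactly 7 days earlier. For 2016-06-07 the window must still reach 2016-05-31, so it covers day - 7 through day, inclusive. The sketch below is plain Python, not BigQuery, and the names `weekly_ids` and `trailing_days` are illustrative. It computes that definition by brute force over a gapless calendar, which can serve as a reference when checking a SQL version.

```python
from datetime import date, timedelta

# Sample rows from MyTable in the question: (item_id, date)
ROWS = [
    (1, "2016-06-08"), (1, "2016-06-07"), (1, "2016-06-05"),
    (1, "2016-06-04"), (1, "2016-05-31"),
    (2, "2016-06-08"), (2, "2016-06-06"), (2, "2016-06-04"),
    (2, "2016-05-31"),
    (3, "2016-05-31"),
]


def weekly_ids(rows, trailing_days=7):
    """Map every calendar day between the earliest and latest date in rows
    (days without rows included) to the number of distinct item_ids seen
    in [day - trailing_days, day]."""
    parsed = [(item, date.fromisoformat(d)) for item, d in rows]
    day = min(d for _, d in parsed)
    last = max(d for _, d in parsed)
    result = {}
    while day <= last:
        low = day - timedelta(days=trailing_days)
        ids = {item for item, d in parsed if low <= d <= day}
        result[day.isoformat()] = len(ids)
        day += timedelta(days=1)
    return result


for day, count in weekly_ids(ROWS).items():
    print(day, count)
```

With `trailing_days=6`, which is a strict 7-day window, the count for 2016-06-07 would already fall to 2. That would not match the table in the question.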
I tried:
SELECT
  item_id,
  date,
  MAX(present) OVER (
    PARTITION BY item_id
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS is_present
FROM (
  # Inner query
  SELECT
    item_id,
    date,
    1 AS present,
  FROM MyTable
)
GROUP BY date
ORDER BY date DESC
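This feels like it is heading in the right direction. In practice, though, the window covers the wrong time range when some dates are missing: ROWS BETWEEN 6 PRECEDING counts rows, not days, so the frame can span too many days. It also does not output a row for a date on which an item_id has no record, even if that item appeared on an earlier date. Is there a simple fix for this?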
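The two problems described above can be illustrated in plain Python. The helper names are hypothetical and the sample data is copied from the question. First, a ROWS frame counts rows, not days. For item 2 on 2016-06-08, the six preceding rows reach back to 2016-05-31, which is 8 days earlier. Second, the query only reads rows that exist in MyTable. Days that have no rows at all (2016-06-01 through 2016-06-03 in the sample) therefore never appear in its output.

```python
from datetime import date, timedelta

# Item 2's dates from the sample MyTable, in the order ORDER BY date sorts them
ITEM_2_DATES = ["2016-05-31", "2016-06-04", "2016-06-06", "2016-06-08"]

# Every distinct date that actually appears in the sample MyTable
DATES_IN_TABLE = {"2016-05-31", "2016-06-04", "2016-06-05",
                  "2016-06-06", "2016-06-07", "2016-06-08"}


def rows_frame_start(dates, i, preceding=6):
    """First date inside ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW for row i."""
    return dates[max(0, i - preceding)]


def frame_span_days(dates, i, preceding=6):
    """Calendar days between the first row of that frame and the current row."""
    start = date.fromisoformat(rows_frame_start(dates, i, preceding))
    return (date.fromisoformat(dates[i]) - start).days


def missing_days(first, last, present):
    """Days in [first, last] with no row at all; a query over MyTable alone never emits them."""
    day, end, out = date.fromisoformat(first), date.fromisoformat(last), []
    while day <= end:
        if day.isoformat() not in present:
            out.append(day.isoformat())
        day += timedelta(days=1)
    return out


for i, d in enumerate(ITEM_2_DATES):
    print(d, rows_frame_start(ITEM_2_DATES, i), frame_span_days(ITEM_2_DATES, i))
print(missing_days("2016-05-31", "2016-06-08", DATES_IN_TABLE))
```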
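If it helps, and if necessary:
- I can hardcode the earliest date.
- I can also get a table containing every item_id that exists.
- This query will only ever run on BigQuery, so BQ-specific functions/syntax are fair game. Unfortunately, SQL functions/syntax that don't run on BigQuery won't help me...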
Best answer
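I've created a temporary table to hold the dates. However, you would probably benefit from adding a permanent date table to the database for joins like these. Trust me, it will save you headaches.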
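A permanent date table like the one suggested above can be filled with a single set-based statement instead of a row-by-row WHILE loop. The sketch below assumes SQLite through Python's `sqlite3` module, not SQL Server or BigQuery. The table name `calendar` and the function `build_calendar` are hypothetical. In BigQuery, an equivalent date spine is usually produced inline with `UNNEST(GENERATE_DATE_ARRAY(start_date, end_date))`.

```python
import sqlite3


def build_calendar(conn, low="2016-01-01", high="2016-12-31"):
    """Fill a hypothetical `calendar` table with one row per day in [low, high]."""
    conn.execute("CREATE TABLE IF NOT EXISTS calendar (calendar_date TEXT PRIMARY KEY)")
    # A recursive CTE generates the days; OR IGNORE makes re-running the load harmless.
    conn.execute(
        """
        WITH RECURSIVE days(d) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < date(?)
        )
        INSERT OR IGNORE INTO calendar (calendar_date)
        SELECT d FROM days
        """,
        (low, high),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_calendar(conn)
row = conn.execute(
    "SELECT COUNT(*), MIN(calendar_date), MAX(calendar_date) FROM calendar"
).fetchone()
print(row)
```

Because 2016 is a leap year, the table ends up with 366 rows.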
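-- Sample data: a table variable standing in for MyTable
DECLARE @my_table TABLE
(
    item_id INT,
    date    DATETIME
)
INSERT @my_table SELECT 1, '2016-06-08'
INSERT @my_table SELECT 1, '2016-06-07'
INSERT @my_table SELECT 1, '2016-06-05'
INSERT @my_table SELECT 1, '2016-06-04'
INSERT @my_table SELECT 1, '2016-05-31'
INSERT @my_table SELECT 2, '2016-06-08'
INSERT @my_table SELECT 2, '2016-06-06'
INSERT @my_table SELECT 2, '2016-06-04'
INSERT @my_table SELECT 2, '2016-05-31'
INSERT @my_table SELECT 3, '2016-05-31'

-- Each output date D counts the items seen from D - @TrailingDays through D (inclusive)
DECLARE @TrailingDays INT = 7
DECLARE @LowDate  DATETIME = '01/01/2016'
DECLARE @HighDate DATETIME = '12/31/2016'

-- Calendar with one row per day, so dates with no data still appear in the output
DECLARE @Calendar TABLE (CalendarDate DATETIME)
DECLARE @LoopDate DATETIME = @LowDate
WHILE (@LoopDate <= @HighDate) BEGIN
    INSERT @Calendar SELECT @LoopDate
    SET @LoopDate = DATEADD(DAY, 1, @LoopDate)
END

SELECT
    date       = HighDate,
    weekly_ids = COUNT(DISTINCT item_id)
FROM
(
    -- Pair each calendar day (HighDate) with the day @TrailingDays rows earlier (LowDate).
    -- The calendar has no gaps, so that is exactly @TrailingDays days earlier.
    -- For the first @TrailingDays rows, LAG returns its default 0 (converted to 1900-01-01).
    SELECT
        HighDate = C.CalendarDate,
        LowDate  = LAG(C.CalendarDate, @TrailingDays, 0) OVER (ORDER BY C.CalendarDate)
    FROM @Calendar C
    WHERE CalendarDate BETWEEN @LowDate AND @HighDate
) AS X
-- LEFT JOIN keeps calendar days with no matching rows; they count as 0
LEFT OUTER JOIN @my_table MT ON MT.date BETWEEN LowDate AND HighDate
GROUP BY
    LowDate,
    HighDate
ORDER BY HighDate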
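To try the answer's calendar-join idea without SQL Server, the sketch below expresses the same logic in SQLite through Python's `sqlite3` module, which is a substitution and neither BigQuery nor T-SQL. A recursive CTE replaces the WHILE loop. `date(high_date, '-7 days')` plays the role of LowDate, so each calendar day is joined to rows dated from 7 days earlier through that day. The LEFT JOIN plus `COUNT(DISTINCT ...)` yields 0 for days with no rows. The names `my_table`, `QUERY` and `weekly_ids` are illustrative.

```python
import sqlite3

# Sample rows from the question: (item_id, ISO date text)
ROWS = [
    (1, "2016-06-08"), (1, "2016-06-07"), (1, "2016-06-05"),
    (1, "2016-06-04"), (1, "2016-05-31"),
    (2, "2016-06-08"), (2, "2016-06-06"), (2, "2016-06-04"),
    (2, "2016-05-31"),
    (3, "2016-05-31"),
]

QUERY = """
WITH RECURSIVE calendar(high_date) AS (
    -- one row per day, no gaps
    SELECT date(:low)
    UNION ALL
    SELECT date(high_date, '+1 day') FROM calendar WHERE high_date < date(:high)
)
SELECT c.high_date                AS date,
       COUNT(DISTINCT mt.item_id) AS weekly_ids  -- NULLs from the LEFT JOIN are not counted
FROM calendar AS c
LEFT JOIN my_table AS mt
       ON mt.date BETWEEN date(c.high_date, :low_offset) AND c.high_date
GROUP BY c.high_date
ORDER BY c.high_date
"""


def weekly_ids(conn, low, high, trailing_days=7):
    """Distinct item_ids per calendar day over [day - trailing_days, day]."""
    params = {"low": low, "high": high, "low_offset": f"-{trailing_days} days"}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (item_id INTEGER, date TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
result = weekly_ids(conn, "2016-05-31", "2016-06-08")
for day, count in result:
    print(day, count)
```

This reproduces the question's expected table: 3 for 2016-05-31 through 2016-06-07 and 2 for 2016-06-08. A BigQuery version would presumably follow the same shape, with the calendar taken from `UNNEST(GENERATE_DATE_ARRAY(...))` and `DATE_SUB(day, INTERVAL 7 DAY)` as the lower bound. That translation has not been tested here.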
Regarding "sql - Window function with missing data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37710857/