I'm new to BigQuery and SQL and have been stuck on a grouping problem. Using standard SQL in BigQuery, I want to group data into X-day intervals. Here is a sample table:
event_id | url    | timestamp
-----------------------------------------------------------
xx       | a.html | 2016-10-18 15:55:16 UTC
xx       | a.html | 2016-10-19 16:68:55 UTC
xx       | a.html | 2016-10-25 20:55:57 UTC
yy       | b.html | 2016-10-18 15:58:09 UTC
yy       | b.html | 2016-10-18 08:32:43 UTC
zz       | a.html | 2016-10-20 04:44:22 UTC
zz       | c.html | 2016-10-21 02:12:34 UTC
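Starting from a given date, I want to count how many times each event occurred on each url in every X-day interval. For example: how would I group this into 3-day intervals, with the first interval starting at 2016-10-18 00:00:00 UTC? Also, can I assign the third day of the interval to each row as a label? Example output: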
event_id | url    | count | 3dayIntervalLabel
-----------------------------------------------------------
xx       | a.html | 2     | 2016-10-20  --> [18th through 20th]
yy       | b.html | 2     | 2016-10-20
zz       | a.html | 1     | 2016-10-20
zz       | c.html | 1     | 2016-10-23  --> [21st through 23rd]
xx       | a.html | 1     | 2016-10-26  --> [24th through 26th]
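I added three annotations to clarify the 3dayIntervalLabel values.
In general, I'm looking to solve: group by X-day intervals starting from date Y, and label each interval with its final date.
Let me know if more clarification is needed.
If you're interested, I also asked a similar question on StackOverflow about grouping this data using rolling windows (and got answers): initial question and follow-up.
Thanks!
Best answer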
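-- Count events per day, url and event_id.
-- `ts` is the event timestamp column (shown as `timestamp` in the question's table).
WITH dailyAggregations AS (
  SELECT
    DATE(ts) AS day,
    url,
    event_id,
    UNIX_SECONDS(TIMESTAMP(DATE(ts))) AS sec,  -- midnight of `day` as Unix seconds; not used by the final SELECT
    COUNT(1) AS events
  FROM yourTable
  GROUP BY day, url, event_id, sec
),
-- One row per 3-day interval: its first day and its last day (the label).
calendar AS (
  SELECT day, DATE_ADD(day, INTERVAL 2 DAY) AS endday
  FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-11-06', INTERVAL 3 DAY)) AS day
)
-- Attach each daily count to the interval containing it, then sum per interval.
SELECT
  event_id,
  url,
  SUM(events) AS `count`,
  c.endday AS `ThreedayIntervalLabel`
FROM calendar AS c
JOIN dailyAggregations AS a
  ON a.day BETWEEN c.day AND c.endday
GROUP BY endday, url, event_id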
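The labels this kind of grouping should produce can be checked outside BigQuery. Below is a minimal Python sketch, not part of the original answer, that reproduces the interval arithmetic on the question's sample rows. The names `interval_label` and `count_by_interval` are hypothetical, and the rows are reduced to dates only because the time of day does not affect the bucket.

```python
from collections import Counter
from datetime import date, timedelta


def interval_label(day: date, start: date, width: int) -> date:
    """Return the last day of the width-day interval (counted from start) that contains day."""
    offset = (day - start).days
    if offset < 0:
        raise ValueError("day falls before the first interval")
    bucket = offset // width  # 0 for the first interval, 1 for the second, ...
    return start + timedelta(days=bucket * width + width - 1)


def count_by_interval(rows, start: date, width: int) -> Counter:
    """Count rows per (event_id, url, interval label)."""
    counts = Counter()
    for event_id, url, day in rows:
        counts[(event_id, url, interval_label(day, start, width))] += 1
    return counts


# Sample rows from the question, with timestamps truncated to dates.
rows = [
    ("xx", "a.html", date(2016, 10, 18)),
    ("xx", "a.html", date(2016, 10, 19)),
    ("xx", "a.html", date(2016, 10, 25)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("zz", "a.html", date(2016, 10, 20)),
    ("zz", "c.html", date(2016, 10, 21)),
]

counts = count_by_interval(rows, start=date(2016, 10, 18), width=3)
for (event_id, url, label), n in sorted(counts.items(), key=lambda kv: kv[0][2]):
    print(event_id, url, n, label)
```

The same integer-division idea could probably be expressed directly in BigQuery with `DATE_DIFF` and `DIV` in place of the calendar join, but that variant is an assumption and has not been run against the question's table.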
Source: "BigQuery & Standard SQL: how to group by arbitrary day interval" on Stack Overflow: https://stackoverflow.com/questions/40497815/