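I want to sum over rows, but only include values for which an entry exists with a timestamp five minutes earlier. I'm using BigQuery, which does not seem to support timestamp_diff operations (only datediff).
Consider the following data (mytable):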
time (timestamp) meter (int) value (float)
2013-07-03 07:50:00 1 3
2013-07-03 07:50:00 2 4
2013-07-03 07:55:00 1 3
2013-07-03 07:55:00 2 4
2013-07-03 08:00:00 1 3
2013-07-03 08:00:00 2 4
2013-07-03 08:05:00 1 3
2013-07-03 08:10:00 1 3
2013-07-03 08:10:00 2 4
The query I want to extend can initially be defined as:
SELECT time, SUM(value) AS sumValue, COUNT(value) AS obs
FROM mytable
GROUP BY time
The output is:
time sumValue obs
2013-07-03 07:50:00 7 2
2013-07-03 07:55:00 7 2
2013-07-03 08:00:00 7 2
2013-07-03 08:05:00 3 1
2013-07-03 08:10:00 7 2
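I want to extend this query so that the value from meter 2 is not included in the sumValue for 2013-07-03 08:10:00 (so sumValue = 3 there), because meter 2 has no entry five minutes earlier. A further consequence is that the sumValue for the first timestamp will be zero. The desired output is: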
time sumValue obs
2013-07-03 07:50:00 0 0
2013-07-03 07:55:00 7 2
2013-07-03 08:00:00 7 2
2013-07-03 08:05:00 3 1
2013-07-03 08:10:00 3 1
Can this be done in BigQuery?
Best answer
The following is for BigQuery Standard SQL:
#standardSQL
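SELECT
  time,
  -- only rows whose previous reading for the same meter is exactly 300 s earlier contribute
  SUM(IF(delta = 300, value, 0)) sumValue,
  COUNTIF(delta = 300) obs
FROM (
  SELECT time, meter, value,
    -- seconds since the previous reading of the same meter (NULL for the first one)
    UNIX_SECONDS(time) - LAG(UNIX_SECONDS(time))
      OVER(PARTITION BY meter ORDER BY time) delta
  FROM `project.dataset.table`
)
GROUP BY time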
You can test and play with the above using the dummy data from the question:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2013-07-03 07:50:00' time, 1 meter, 3 value UNION ALL
  SELECT TIMESTAMP '2013-07-03 07:50:00', 2, 4 UNION ALL
  SELECT TIMESTAMP '2013-07-03 07:55:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 07:55:00', 2, 4 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:00:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:00:00', 2, 4 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:05:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:10:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:10:00', 2, 4
)
SELECT
  time,
  SUM(IF(delta = 300, value, 0)) sumValue,
  COUNTIF(delta = 300) obs
FROM (
  SELECT time, meter, value,
    UNIX_SECONDS(time) - LAG(UNIX_SECONDS(time))
      OVER(PARTITION BY meter ORDER BY time) delta
  FROM `project.dataset.table`
)
GROUP BY time
ORDER BY time
The result is:
time sumValue obs
2013-07-03 07:50:00 UTC 0 0
2013-07-03 07:55:00 UTC 7 2
2013-07-03 08:00:00 UTC 7 2
2013-07-03 08:05:00 UTC 3 1
2013-07-03 08:10:00 UTC 3 1
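As a side note to the question's remark about timestamp_diff: BigQuery Standard SQL does provide TIMESTAMP_DIFF, so the same idea can be sketched without converting to Unix seconds. The version below is only an illustrative variant of the query above, not a replacement for it; the CTE names `readings` and `with_prev` and the column `prev_time` are made up for this example, and the data is the dummy data from the question.

```sql
#standardSQL
WITH readings AS (  -- hypothetical name; holds the dummy data from the question
  SELECT TIMESTAMP '2013-07-03 07:50:00' AS time, 1 AS meter, 3 AS value UNION ALL
  SELECT TIMESTAMP '2013-07-03 07:50:00', 2, 4 UNION ALL
  SELECT TIMESTAMP '2013-07-03 07:55:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 07:55:00', 2, 4 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:00:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:00:00', 2, 4 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:05:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:10:00', 1, 3 UNION ALL
  SELECT TIMESTAMP '2013-07-03 08:10:00', 2, 4
),
with_prev AS (  -- hypothetical name; attaches each meter's previous timestamp
  SELECT time, meter, value,
    -- NULL for the first reading of each meter
    LAG(time) OVER (PARTITION BY meter ORDER BY time) AS prev_time
  FROM readings
)
SELECT
  time,
  -- a NULL difference fails the comparison, so such rows add 0 and are not counted
  SUM(IF(TIMESTAMP_DIFF(time, prev_time, SECOND) = 300, value, 0)) AS sumValue,
  COUNTIF(TIMESTAMP_DIFF(time, prev_time, SECOND) = 300) AS obs
FROM with_prev
GROUP BY time
ORDER BY time
```

Like the query above, this compares each row only with the immediately preceding reading of the same meter. If a meter could also report at irregular times (say, a reading two minutes earlier in addition to one five minutes earlier), the row would be excluded even though a five-minute-old entry exists, so this sketch assumes readings arrive on a regular five-minute grid.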
Regarding "sql - Excluding rows from a SQL operation when no entry exists for an earlier timestamp", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47715030/