Suppose we have the following data:
ID tag data timestamp
001 A walter 2021-06-04 09:46:25
005 F junior 2021-06-05 09:47:25
001 B junior 2021-06-04 09:47:25
002 C soprano 2021-06-04 09:48:25
002 C alto 2021-06-04 09:49:25
001 A brown 2021-06-04 09:50:25
003 A cleave 2021-06-04 09:51:25
003 B land 2021-06-04 09:52:25
004 C before 2021-06-04 09:53:25
005 H junior 2021-06-04 09:47:25
I need to find the tag that occurs most often for each ID value. If there is a tie, use the tied tag with the most recent timestamp for that ID.
Expected result:
ID tag
001 A
002 C
003 B
004 C
005 F
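As a cross-check of the rule described above, here is a minimal Python sketch that applies "highest count first, then most recent timestamp" to the sample rows. The function name `most_frequent_tag` is hypothetical and only used for illustration; the sketch assumes the timestamps are ISO-style strings, so comparing them as strings matches chronological order.

```python
from collections import defaultdict

# Rows copied from the sample table: (ID, tag, timestamp).
rows = [
    ("001", "A", "2021-06-04 09:46:25"),
    ("005", "F", "2021-06-05 09:47:25"),
    ("001", "B", "2021-06-04 09:47:25"),
    ("002", "C", "2021-06-04 09:48:25"),
    ("002", "C", "2021-06-04 09:49:25"),
    ("001", "A", "2021-06-04 09:50:25"),
    ("003", "A", "2021-06-04 09:51:25"),
    ("003", "B", "2021-06-04 09:52:25"),
    ("004", "C", "2021-06-04 09:53:25"),
    ("005", "H", "2021-06-04 09:47:25"),
]


def most_frequent_tag(rows):
    """Hypothetical helper: pick one tag per ID by (count, latest timestamp)."""
    # stats[id][tag] = (occurrence count, latest timestamp seen for that tag)
    stats = defaultdict(dict)
    for id_, tag, ts in rows:
        count, latest = stats[id_].get(tag, (0, ""))
        stats[id_][tag] = (count + 1, max(latest, ts))
    # Tuples compare element by element: count first, then timestamp.
    return {id_: max(tags, key=tags.get) for id_, tags in sorted(stats.items())}


result = most_frequent_tag(rows)
print(result)
```

For ID 003 and ID 005 both tags occur once, so the later timestamp decides the winner (B and F respectively).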
Best answer
Use QUALIFY with RANK to filter the grouped result, keeping only the top-ranked tag within each ID:
SELECT ID, tag, COUNT(*) AS cnt, MAX(timestamp) AS max_t
FROM tab
GROUP BY ID, tag
QUALIFY RANK() OVER(PARTITION BY ID ORDER BY cnt DESC, max_t DESC) = 1
Sample data:
CREATE OR REPLACE TABLE tab(ID STRING, tag STRING, data STRING, timestamp TIMESTAMP)
AS
SELECT '001', 'A' ,' walter','2021-06-04 09:46:25'
UNION ALL SELECT '005', 'F' ,' junior','2021-06-05 09:47:25'
UNION ALL SELECT '001', 'B' ,' junior','2021-06-04 09:47:25'
UNION ALL SELECT '002', 'C' ,'soprano','2021-06-04 09:48:25'
UNION ALL SELECT '002', 'C' ,' alto','2021-06-04 09:49:25'
UNION ALL SELECT '001', 'A' ,' brown','2021-06-04 09:50:25'
UNION ALL SELECT '003', 'A' ,' cleave','2021-06-04 09:51:25'
UNION ALL SELECT '003', 'B' ,' land','2021-06-04 09:52:25'
UNION ALL SELECT '004', 'C' ,' before','2021-06-04 09:53:25'
UNION ALL SELECT '005', 'H' ,' junior','2021-06-04 09:47:25';
Simplified query, with the aggregates moved directly into the window's ORDER BY:
SELECT ID, tag
FROM tab
GROUP BY ID, tag
QUALIFY RANK() OVER(PARTITION BY ID ORDER BY COUNT(*) DESC, MAX(timestamp) DESC) = 1
ORDER BY ID;
Output: the query returns the same five rows as the expected result above.
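QUALIFY is not available in every SQL engine. The following sketch shows one way the same logic might be written without it: aggregate first, rank in a second step, then filter with a plain WHERE. It runs against an in-memory SQLite database through Python's `sqlite3` module purely so the example is self-contained; this is an assumption for illustration, not part of the original answer, and it requires an SQLite build with window-function support (3.25 or later). The CTE names `counts` and `ranked` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (ID TEXT, tag TEXT, data TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("001", "A", "walter", "2021-06-04 09:46:25"),
        ("005", "F", "junior", "2021-06-05 09:47:25"),
        ("001", "B", "junior", "2021-06-04 09:47:25"),
        ("002", "C", "soprano", "2021-06-04 09:48:25"),
        ("002", "C", "alto", "2021-06-04 09:49:25"),
        ("001", "A", "brown", "2021-06-04 09:50:25"),
        ("003", "A", "cleave", "2021-06-04 09:51:25"),
        ("003", "B", "land", "2021-06-04 09:52:25"),
        ("004", "C", "before", "2021-06-04 09:53:25"),
        ("005", "H", "junior", "2021-06-04 09:47:25"),
    ],
)

# Step 1 (counts): one row per (ID, tag) with its count and latest timestamp.
# Step 2 (ranked): rank tags within each ID by count, then by recency.
# Step 3: keep rank 1, which is what QUALIFY ... = 1 does in a single step.
query = """
WITH counts AS (
    SELECT ID, tag, COUNT(*) AS cnt, MAX(timestamp) AS max_t
    FROM tab
    GROUP BY ID, tag
),
ranked AS (
    SELECT ID, tag,
           RANK() OVER (PARTITION BY ID ORDER BY cnt DESC, max_t DESC) AS rnk
    FROM counts
)
SELECT ID, tag
FROM ranked
WHERE rnk = 1
ORDER BY ID
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that RANK can still return more than one tag for an ID if two tags tie on both count and latest timestamp; swapping in ROW_NUMBER would force a single row per ID, at the cost of an arbitrary pick among such ties.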
Source: the Stack Overflow question "sql - How to determine the most frequent value in a SQL table (Snowflake) and account for ties?": https://stackoverflow.com/questions/69802160/