sql - Finding consecutive hours/dates using gaps and islands - SQL/BigQuery

Tags: sql google-bigquery gaps-and-islands

I have a table in BigQuery that looks like this:

Caller_Number | month  |  day| call_time
--------------|--------|-----|----------
1             |  5     |  15 | 12:56:17

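I'd like to write a SQL query for BigQuery that calculates, per caller_number, the number of consecutive hours in which at least one call was made, as well as the number of consecutive days on which calls occurred for at least 10 consecutive hours. I've been looking at existing resources on gaps and islands, but I can't figure out how to apply the technique to consecutive dates and hours.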

Best answer

Below is a working example for consecutive hours.
The steps are:
1. "Extract" the hour from call_time

HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time))

2. Find the previous hour within the same caller and day

LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

3. Flag the start of each group of consecutive hours: 1 means a new group starts, 0 means the current group continues

IFNULL(INTEGER([hour] - prev_hour > 1), 1)

4. Assign a group number to each group (a running sum of the start flags)

SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

5. Finally, group by the group number and count the calls and the distinct hours
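
The five steps above can also be traced outside BigQuery. The following is a minimal Python sketch of the same logic, not the answer's own code: the sample rows and the function name `hour_islands` are hypothetical, and the tuple layout simply mirrors the table in the question.

```python
from itertools import groupby

# Hypothetical sample rows: (caller_number, month, day, call_time)
calls = [
    (1, 5, 15, "12:56:17"),
    (1, 5, 15, "13:05:00"),
    (1, 5, 15, "13:40:10"),
    (1, 5, 15, "16:00:00"),
    (2, 5, 15, "09:15:00"),
]


def hour_islands(rows):
    """Return one summary dict per run of consecutive hours."""
    # Step 1: extract the hour from call_time ("HH:MM:SS") and sort.
    keyed = sorted((c, m, d, int(t.split(":")[0])) for c, m, d, t in rows)
    result = []
    # Partition by caller, month and day, like the PARTITION BY clause.
    for (caller, month, day), part in groupby(keyed, key=lambda r: r[:3]):
        prev_hour = None
        seq_group = 0
        islands = {}
        for *_, hour in part:
            # Steps 2-3: compare with the previous hour; the first row or
            # a gap of more than one hour starts a new group.
            if prev_hour is None or hour - prev_hour > 1:
                seq_group += 1  # Step 4: running sum of the start flags
            island = islands.setdefault(seq_group, {"hours": set(), "calls": 0})
            island["hours"].add(hour)
            island["calls"] += 1
            prev_hour = hour
        # Step 5: one output row per group with its hour and call counts.
        for group, island in islands.items():
            result.append({
                "caller": caller, "month": month, "day": day,
                "seq_group": group,
                "hours_count": len(island["hours"]),
                "calls_count": island["calls"],
            })
    return result


summary = hour_islands(calls)
for row in summary:
    print(row)
```

Two calls in the same hour give a difference of 0, so they stay in the same group, which matches the `> 1` test in step 3.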

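Hopefully this gives you a good starting point for implementing similar logic for consecutive days on top of the consecutive-hours result.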

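-- BigQuery legacy SQL; YourTable stands for the actual table name
-- Step 5: one row per group with its distinct hours and number of calls
SELECT Caller_Number, [month], [day], seq_group,
  EXACT_COUNT_DISTINCT([hour]) AS hours_count, COUNT(1) AS calls_count
FROM (
  -- Step 4: the running sum of start flags is the group number
  SELECT Caller_Number, [month], [day], [hour],
    SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day]
                  ORDER BY [hour]) AS seq_group
  FROM (
    -- Step 3: 1 for the first row or after a gap, otherwise 0
    SELECT Caller_Number, [month], [day], [hour],
      IFNULL(INTEGER([hour] - prev_hour > 1), 1) AS seq
    FROM (
      -- Step 2: previous hour for the same caller and day
      SELECT Caller_Number, [month], [day], [hour],
        LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day]
                         ORDER BY [hour]) AS prev_hour
      FROM (
        -- Step 1: extract the hour from call_time
        SELECT Caller_Number, [month], [day],
          HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time)) AS [hour]
        FROM YourTable
      )
    )
  )
)
GROUP BY Caller_Number, [month], [day], seq_group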
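
The answer leaves the consecutive-days part as an exercise. One possible way to layer it on top of the hour groups is sketched below in Python, as an illustration of the logic rather than a BigQuery query. It rests on several assumptions that are not in the original: the table has no year column, so a fixed `YEAR` is assumed; runs of hours are not carried across midnight, in line with the per-day partitioning of the query above; and the names `longest_hour_run` and `qualifying_day_islands` are hypothetical.

```python
from datetime import date, timedelta

YEAR = 2016  # assumption: the table has no year column


def longest_hour_run(hours):
    """Length of the longest run of consecutive hours within one day."""
    best = run = 0
    prev = None
    for h in sorted(set(hours)):
        run = run + 1 if prev is not None and h - prev == 1 else 1
        best = max(best, run)
        prev = h
    return best


def qualifying_day_islands(rows, min_hours=10, year=YEAR):
    """rows: (caller, month, day, hour) tuples.

    Returns {caller: [(first_day, last_day, n_days), ...]} for runs of
    consecutive days that each contain at least min_hours consecutive hours.
    """
    hours_by_day = {}
    for caller, month, day, hour in rows:
        hours_by_day.setdefault((caller, date(year, month, day)), []).append(hour)
    # Keep only the days whose longest hour run reaches the threshold.
    good = sorted(k for k, hs in hours_by_day.items()
                  if longest_hour_run(hs) >= min_hours)
    islands = {}
    for caller, d in good:
        runs = islands.setdefault(caller, [])
        # Same gaps-and-islands idea: extend the run if d follows the
        # previous qualifying day directly, otherwise start a new run.
        if runs and d - runs[-1][1] == timedelta(days=1):
            runs[-1][1] = d
        else:
            runs.append([d, d])
    return {c: [(a, b, (b - a).days + 1) for a, b in rs]
            for c, rs in islands.items()}


# Hypothetical data: May 15, 16 and 18 have 10 consecutive hours, May 17 only 5.
rows = ([(1, 5, 15, h) for h in range(8, 18)]
        + [(1, 5, 16, h) for h in range(8, 18)]
        + [(1, 5, 17, h) for h in range(8, 13)]
        + [(1, 5, 18, h) for h in range(0, 10)])
result = qualifying_day_islands(rows)
print(result)
```

Working on real dates instead of separate month and day numbers keeps month boundaries from breaking a run, which is why the sketch builds a `date` first.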

Regarding "sql - Finding consecutive hours/dates using gaps and islands - SQL/BigQuery", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36141237/
