基于最后七个条目设置列的 SQL 查询

标签 sql google-bigquery

问题

我无法弄清楚如何创建一个查询,该查询可以判断任何用户条目之前是否有 7 天没有任何事件 (secondsPlayed == 0),如果是,则用值 1 表示它,否则用 0 表示。

这也意味着,如果用户的条目少于 7 个,则所有条目的值将为 0。


Input table:
+------------------------------+-------------------------+---------------+
|            userid            |     estimationDate      | secondsPlayed |
+------------------------------+-------------------------+---------------+
| a                            | 2016-07-14 00:00:00 UTC | 192.5         |
| a                            | 2016-07-15 00:00:00 UTC | 357.3         |
| a                            | 2016-07-16 00:00:00 UTC | 0             |
| a                            | 2016-07-17 00:00:00 UTC | 0             |
| a                            | 2016-07-18 00:00:00 UTC | 0             |
| a                            | 2016-07-19 00:00:00 UTC | 0             |
| a                            | 2016-07-20 00:00:00 UTC | 0             |
| a                            | 2016-07-21 00:00:00 UTC | 0             |
| a                            | 2016-07-22 00:00:00 UTC | 0             |
| a                            | 2016-07-23 00:00:00 UTC | 0             |
| a                            | 2016-07-24 00:00:00 UTC | 0             |
| ---------------------------- | ----------------------  | ----          |
| b                            | 2016-07-02 00:00:00 UTC | 31.2          |
| b                            | 2016-07-03 00:00:00 UTC | 42.1          |
| b                            | 2016-07-04 00:00:00 UTC | 41.9          |
| b                            | 2016-07-05 00:00:00 UTC | 43.2          |
| b                            | 2016-07-06 00:00:00 UTC | 91.5          |
| b                            | 2016-07-07 00:00:00 UTC | 0             |
| b                            | 2016-07-08 00:00:00 UTC | 0             |
| b                            | 2016-07-09 00:00:00 UTC | 239.1         |
| b                            | 2016-07-10 00:00:00 UTC | 0             |
+------------------------------+-------------------------+---------------+

预期的输出表应如下所示:


Output table:

+------------------------------+-------------------------+---------------+----------+
|            userid            |     estimationDate      | secondsPlayed | inactive |
+------------------------------+-------------------------+---------------+----------+
| a                            | 2016-07-14 00:00:00 UTC | 192.5         | 0        |
| a                            | 2016-07-15 00:00:00 UTC | 357.3         | 0        |
| a                            | 2016-07-16 00:00:00 UTC | 0             | 0        |
| a                            | 2016-07-17 00:00:00 UTC | 0             | 0        |
| a                            | 2016-07-18 00:00:00 UTC | 0             | 0        |
| a                            | 2016-07-19 00:00:00 UTC | 0             | 0        |
| a                            | 2016-07-20 00:00:00 UTC | 0             | 0        |
| a                            | 2016-07-21 00:00:00 UTC | 0             | 0        |
| a                            | 2016-07-22 00:00:00 UTC | 0             | 1        |
| a                            | 2016-07-23 00:00:00 UTC | 0             | 1        |
| a                            | 2016-07-24 00:00:00 UTC | 0             | 1        |
| ---------------------------- | ----------------------- | -----         | -----    |
| b                            | 2016-07-02 00:00:00 UTC | 31.2          | 0        |
| b                            | 2016-07-03 00:00:00 UTC | 42.1          | 0        |
| b                            | 2016-07-04 00:00:00 UTC | 41.9          | 0        |
| b                            | 2016-07-05 00:00:00 UTC | 43.2          | 0        |
| b                            | 2016-07-06 00:00:00 UTC | 91.5          | 0        |
| b                            | 2016-07-07 00:00:00 UTC | 0             | 0        |
| b                            | 2016-07-08 00:00:00 UTC | 0             | 0        |
| b                            | 2016-07-09 00:00:00 UTC | 239.1         | 0        |
| b                            | 2016-07-10 00:00:00 UTC | 0             | 0        |
+------------------------------+-------------------------+---------------+----------+


想法

起初我考虑使用偏移量为 7 的 Lag 函数,但这显然与中间的任何主题都无关。

我还在考虑创建一个 7 天的滚动窗口/平均值,并评估它是否高于 0。但这可能有点超出我的技能水平。

任何人都有机会解决这个问题。

最佳答案

假设您每天都有数据(这似乎是一个合理的假设),您可以对窗口函数求和:

select t.*,
       (case when sum(secondsplayed) over (partition by userid
                                           order by estimationdate
                                           rows between 6 preceding and current row
                                          ) = 0 and
                  row_number() over (partition by userid order by estimationdate) >= 7
             then 1
             else 0
        end) as inactive                  
from t;

除了日期中没有漏洞之外,还假设 secondsplayed 永远不会为负数。 (负值可以很容易地合并到逻辑中,但这似乎没有必要。)

关于基于最后七个条目设置列的 SQL 查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52277324/

相关文章:

php - 准备好的语句不适用于 ALTER 表查询

sql - 如何查找只有一个值的列 - Postgresql

mysql - 如何在mysql中只返回唯一值

google-analytics - BigQuery合并/合并错误

Sqlite 将数据库合并为一个,具有唯一值,保留外键关系

mysql - 从表中消除排列

google-bigquery - bigquery.readonly 范围允许哪些操作?

google-bigquery - Datalab 创建返回 STRUCT 的 BigQuery UDF

sql - BigQuery 连接三个表

sql - BigQuery 中的查询命中和自定义维度?