google-bigquery - BigQuery: Selecting the smallest difference among fields in a repeated record

Tags: google-bigquery

Consider this table schema on BigQuery:

Table User
{
user_id: STRING (REQUIRED)
user_name: STRING (REQUIRED)
actions: RECORD (REPEATED) 
    {
        action_id: STRING (REQUIRED)
        action_type: INTEGER (REQUIRED)
        action_date: TIMESTAMP (REQUIRED)
    }
}

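I want to find all users (user_id and user_name) who created an action of a given type more than once, where the shortest time between those actions is less than X days.

The number of actions stored per user is not fixed (it can be 1, 2, or n). The actions are not sorted by any criterion (though I assume ORDER BY can take care of that).

For example, with these users: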

{
    user_id: "u1", 
    user_name: "User 1", 
    actions: 
    {action_id: "a1", action_type: 1, action_date: "2016-02-22"},
    {action_id: "a2", action_type: 1, action_date: "2016-01-22"},
    {action_id: "a3", action_type: 1, action_date: "2015-12-22"}
},
{
    user_id: "u2", 
    user_name: "User 2", 
    actions: 
    {action_id: "a4", action_type: 1, action_date: "2016-02-22"},
    {action_id: "a5", action_type: 2, action_date: "2016-01-22"},
    {action_id: "a6", action_type: 1, action_date: "2015-12-22"}
},
{
    user_id: "u3", 
    user_name: "User 3", 
    actions: 
    {action_id: "a7", action_type: 1, action_date: "2016-02-22"}
},
{
    user_id: "u4", 
    user_name: "User 4", 
    actions: 
    {action_id: "a8", action_type: 1, action_date: "2016-02-22"},
    {action_id: "a9", action_type: 1, action_date: "2015-02-22"},
    {action_id: "a10", action_type: 1, action_date: "2015-01-22"}
},

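The query "select the users who performed a type 1 action more than once, with the shortest time between two occurrences being less than 45 days" should return User 1 and User 4.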
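As a sanity check on the expected result, here is a minimal Python sketch that recomputes the smallest gap per user from the sample data above. The `users` dictionary and the `min_gap_days` helper are hypothetical names introduced only for this illustration; they are not part of the BigQuery table or of any library.

```python
from datetime import date

# Sample data from the question: user_id -> list of (action_type, action_date).
users = {
    "u1": [(1, date(2016, 2, 22)), (1, date(2016, 1, 22)), (1, date(2015, 12, 22))],
    "u2": [(1, date(2016, 2, 22)), (2, date(2016, 1, 22)), (1, date(2015, 12, 22))],
    "u3": [(1, date(2016, 2, 22))],
    "u4": [(1, date(2016, 2, 22)), (1, date(2015, 2, 22)), (1, date(2015, 1, 22))],
}


def min_gap_days(actions, action_type):
    """Return the smallest gap in days between two actions of the given type.

    Returns None when the user has fewer than two actions of that type.
    """
    dates = sorted(d for t, d in actions if t == action_type)
    # After sorting, the smallest gap is always between two neighbouring dates.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return min(gaps) if gaps else None


matches = []
for user_id, actions in users.items():
    gap = min_gap_days(actions, action_type=1)
    if gap is not None and gap < 45:
        matches.append(user_id)

print(matches)  # ['u1', 'u4']
```

User 1 and User 4 both have two type 1 actions 31 days apart, User 2's type 1 actions are 62 days apart, and User 3 has only one action.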

Any ideas on how to do this on BigQuery?

Best answer

Try the query below.
I wrote it off the cuff, so it is untested, but I think it should work and give you what you need.

SELECT 
  user_id, 
  user_name, 
  action_type, 
  MIN(DATEDIFF(action_date_next, action_date)) AS min_distance
FROM (
  SELECT 
    user_id, 
    user_name, 
    action_type, 
    action_date, 
    LAG(action_date) 
        OVER(PARTITION BY user_id, action_type 
        ORDER BY action_date DESC) AS action_date_next
  FROM (
    SELECT 
      user_id, 
      user_name, 
      actions.action_type AS action_type, 
      actions.action_date AS action_date 
    FROM table_users 
  )
)
WHERE action_date_next IS NOT NULL
GROUP BY user_id, user_name, action_type
HAVING action_type = 1 AND min_distance < 45

The version below is more compact; give it a try.

SELECT 
  user_id, 
  user_name, 
  action_type, 
  MIN(DATEDIFF(action_date_next, action_date)) AS min_distance
FROM (
  SELECT 
    user_id, 
    user_name, 
    actions.action_type AS action_type, 
    actions.action_date AS action_date, 
    LAG(actions.action_date) 
        OVER(PARTITION BY user_id, actions.action_type 
        ORDER BY actions.action_date DESC) AS action_date_next
  FROM table_users
)
WHERE action_date_next IS NOT NULL
GROUP BY user_id, user_name, action_type
HAVING action_type = 1 AND min_distance < 45
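Both queries above use BigQuery legacy SQL (`DATEDIFF` on timestamps and implicit flattening of the repeated `actions` field). Since they are described as untested, here is a rough sketch that runs the same LAG / MIN / GROUP BY pattern on the question's sample data using an in-memory SQLite database from Python. Several things here are assumptions made for the illustration, not BigQuery behaviour: the pre-flattened table `flat_actions` (one row per action) stands in for the flattened `table_users`, `julianday()` subtraction stands in for `DATEDIFF`, the `action_type` filter is moved into `WHERE`, and SQLite 3.25 or newer is required for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-flattened stand-in for table_users: one row per action.
conn.execute(
    "CREATE TABLE flat_actions (user_id TEXT, action_type INTEGER, action_date TEXT)"
)
rows = [
    ("u1", 1, "2016-02-22"), ("u1", 1, "2016-01-22"), ("u1", 1, "2015-12-22"),
    ("u2", 1, "2016-02-22"), ("u2", 2, "2016-01-22"), ("u2", 1, "2015-12-22"),
    ("u3", 1, "2016-02-22"),
    ("u4", 1, "2016-02-22"), ("u4", 1, "2015-02-22"), ("u4", 1, "2015-01-22"),
]
conn.executemany("INSERT INTO flat_actions VALUES (?, ?, ?)", rows)

query = """
SELECT
  user_id,
  MIN(julianday(action_date_next) - julianday(action_date)) AS min_distance
FROM (
  SELECT
    user_id,
    action_type,
    action_date,
    -- With ORDER BY ... DESC, LAG returns the chronologically later action.
    LAG(action_date)
        OVER (PARTITION BY user_id, action_type
        ORDER BY action_date DESC) AS action_date_next
  FROM flat_actions
)
WHERE action_date_next IS NOT NULL AND action_type = 1
GROUP BY user_id
HAVING MIN(julianday(action_date_next) - julianday(action_date)) < 45
ORDER BY user_id
"""

result = conn.execute(query).fetchall()
for user_id, min_distance in result:
    print(user_id, min_distance)
```

On the sample data this returns only `u1` and `u4`, which matches the result the question expects. It says nothing about how the legacy SQL functions behave on real TIMESTAMP values with a time component, so the BigQuery queries still need to be run against the actual table.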

Original question on Stack Overflow: https://stackoverflow.com/questions/35567979/
