sql - Combining two tables with multiple levels of granularity

Tags: sql postgresql

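I have two tables that I want to combine. Here is a fiddle: http://sqlfiddle.com/#!17/999d0

The Result table is what I expect to get. Table utm is the source master table, and table report contains the data rows for each utm entry. What I need:

  • Take id and the utm_* columns from table utm and add the statistics from table report at the appropriate granularity.

Example

In table utm I have one row: (24611609, 'myTarget', 'Media', 'Social', NULL, NULL). In table report I have 2 rows: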

(24611609, '2022-08-01', 200, 150, 15, 'myTarget', 'Media', 'Social', 'premium', 'subcribe'),
(24611609, '2022-08-01', 25, 10, 1, 'myTarget', 'Media', 'Social', 'free', 'subcribe')

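What they have in common is: 'myTarget', 'Media', 'Social'

The correct level of granularity here is id, utm_campaign, utm_source, utm_medium, so I need to sum and group the two rows by these keys. For that I need something like this: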

SELECT 
utm.row_id AS id,
utm.utm_campaign,
utm.utm_source,
utm.utm_medium,
utm.utm_content,
utm.utm_term,
report.date_of_visit,
sum(report.sessions) as sessions,
sum(report.pageviews) as pageviews,
sum(report.bounces) as bounces
FROM utm
inner join report on utm.row_id = report.id and utm.utm_campaign = report.utm_campaign and utm.utm_source = report.utm_source and utm.utm_medium = report.utm_medium
group by utm.row_id,
utm.utm_campaign,
utm.utm_source,
utm.utm_medium,
utm.utm_content,
utm.utm_term,
report.date_of_visit

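I don't know how to handle all the possible combinations of granularity. My only idea is to use different JOIN variants and merge the results with UNION, for example: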

join on id, utm_campaign
union
...
join on id, utm_campaign, utm_medium
union
...
join on id, utm_campaign, utm_source
...

But that is really silly: I would have to create > 1000 unions and joins.

Any suggestions?

Best answer

Enjoy

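with 
r as
(
    select      id
               ,date_of_visit
               
               ,sum(sessions)               as sessions
               ,sum(pageviews)              as pageviews
               ,sum(bounces)                as bounce
               
               -- real NULLs in report become '' so they cannot be confused
               -- with the NULLs that cube() produces for rolled-up columns
               ,coalesce(utm_campaign ,'')  as utm_campaign 
               ,coalesce(utm_source   ,'')  as utm_source 
               ,coalesce(utm_medium   ,'')  as utm_medium 
               ,coalesce(utm_content  ,'')  as utm_content
               ,coalesce(utm_term     ,'')  as utm_term   
             
    from        report as r

    -- 6..10 are the positions of the five utm_* columns in the select list;
    -- cube() aggregates over every subset of them
    group by    id
               ,date_of_visit
               ,cube(6, 7, 8, 9, 10)

)         
select  r.*

from            r 

        join    utm as u 
        
        on      r.id = u.row_id
        
            -- null-safe comparison: a NULL in utm matches a rolled-up NULL in r
            and (r.utm_campaign, r.utm_source, r.utm_medium, r.utm_content, r.utm_term)
                is not distinct from 
                (u.utm_campaign, u.utm_source, u.utm_medium, u.utm_content, u.utm_term)
   
-- skip rows in which any utm_* value is the literal string 'NA'
where   'NA' in (r.utm_campaign, r.utm_source, r.utm_medium, r.utm_content, r.utm_term) is not true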
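
| id       | date_of_visit | sessions | pageviews | bounce | utm_campaign        | utm_source | utm_medium | utm_content | utm_term |
|----------|---------------|----------|-----------|--------|---------------------|------------|------------|-------------|----------|
| 28573041 | 2022-08-01    | 1000     | 900       | 10     | Beeline_uppers_2022 | null       | null       | null        | null     |
| 24611609 | 2022-08-01    | 225      | 160       | 16     | myTarget            | Media      | Social     | null        | null     |
| 28573041 | 2022-08-01    | 900      | 885       | 34     | shop_smartfony      | my_beeline | banner     | null        | null     |
| 24611609 | 2022-08-01    | 1        | 1         | 0      | event               | source     | medium     | content     | term     |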
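
To see why cube() removes the need for a long chain of joins and unions, here is a minimal sketch on a hypothetical two-column sample (the CTE name sample, its columns, and its values are invented for illustration and are not taken from the fiddle). It assumes PostgreSQL, as in the question's tags.

```sql
-- Hypothetical data: two rows that share a campaign but differ in source.
with sample(campaign, source, sessions) as (
    values ('c1', 's1', 10),
           ('c1', 's2', 5)
)
select   campaign
        ,source
        ,sum(sessions)              as sessions
         -- bitmask: 0 = both columns grouped, 1 = source rolled up,
         --          2 = campaign rolled up, 3 = both rolled up
        ,grouping(campaign, source) as rolled_up
from     sample
-- cube(campaign, source) is shorthand for
-- grouping sets ((campaign, source), (campaign), (source), ())
group by cube(campaign, source)
order by rolled_up, campaign, source;
```

With n columns, cube() generates 2^n grouping sets, so the five utm_* columns give 32 levels of granularity in a single pass over report. The grouping() function is one way to tell a rolled-up NULL from a NULL stored in the data; the answer's query gets a similar effect by replacing stored NULLs with '' before grouping.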

Fiddle
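
The join relies on is not distinct from because plain row equality yields NULL as soon as one of the compared fields is NULL, and a join condition that evaluates to NULL does not match. The following sketch, assuming PostgreSQL and using made-up literal rows, shows the difference:

```sql
select  -- plain equality: the NULL fields make the result unknown (NULL)
        row(1, null::text) = row(1, null::text)                    as plain_equality
        -- null-safe comparison: NULL is treated as equal to NULL, result is true
       ,row(1, null::text) is not distinct from row(1, null::text) as null_safe_equality
        -- NULL and '' are still different values, result is false
       ,row(1, null::text) is not distinct from row(1, '')         as null_vs_empty;
```

The last column is why a NULL stored in report, once turned into '' by coalesce, does not match a NULL in utm, while a NULL produced by cube() for a rolled-up column does.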

Regarding sql - Combining two tables with multiple levels of granularity, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73557461/
