sql - Get a combined row from multiple tables

Tags: sql postgresql greatest-n-per-group

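I have a main table and several sub-tables, each of which contains at least one of the main table's columns. The sub-tables hold updates to parts of the main table. I want to get the main table's row as updated at a specific date.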

Main table:

table1
| id | colA | colB | colC | colD | colE | createDate  |
|:---|:----:|:----:|:----:|:----:|:----:|:-----------:|
| a1 |  1   |  1   |  1   |  1   |  1   |  2017/01/01 |

Sub-tables:

table2
| mainid | colA | colB | createDate  |
|:------:|:----:|:----:|:-----------:|
|   a1   |  2   |  2   |  2018/05/01 |
|   a1   |  3   |  3   |  2019/01/01 |
|   a1   |  4   |  4   |  2020/01/01 |

table3
| mainid | colA | colB | colC | createDate  |
|:------:|:----:|:----:|:----:|:-----------:|
|   a1   |  6   |  6   |  6   |  2019/01/01 |
|   a1   |  7   |  7   |  7   |  2020/01/01 |
|   a1   |  8   |  8   |  8   |  2021/01/01 |

table4
| mainid | colA | colE | colC | createDate  |
|:------:|:----:|:----:|:----:|:-----------:|
|   a1   |  9   |  9   |  9   |  2018/06/01 |
|   a1   |  10  |  10  |  10  |  2017/01/01 |
|   a1   |  12  |  12  |  12  |  2020/01/01 |
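
To reproduce the example, the sample data above could be loaded with a setup along these lines. The column types (text, integer, date) and the NOT NULL constraints are assumptions; the tables above only show values.

```sql
-- Assumed types: text ids, integer values, date timestamps.
CREATE TABLE table1 (
   id         text PRIMARY KEY,
   colA       int  NOT NULL,
   colB       int  NOT NULL,
   colC       int  NOT NULL,
   colD       int  NOT NULL,
   colE       int  NOT NULL,
   createDate date NOT NULL
);

CREATE TABLE table2 (
   mainid     text NOT NULL REFERENCES table1 (id),
   colA       int  NOT NULL,
   colB       int  NOT NULL,
   createDate date NOT NULL
);

CREATE TABLE table3 (
   mainid     text NOT NULL REFERENCES table1 (id),
   colA       int  NOT NULL,
   colB       int  NOT NULL,
   colC       int  NOT NULL,
   createDate date NOT NULL
);

CREATE TABLE table4 (
   mainid     text NOT NULL REFERENCES table1 (id),
   colA       int  NOT NULL,
   colE       int  NOT NULL,
   colC       int  NOT NULL,
   createDate date NOT NULL
);

INSERT INTO table1 (id, colA, colB, colC, colD, colE, createDate)
VALUES ('a1', 1, 1, 1, 1, 1, '2017-01-01');

INSERT INTO table2 (mainid, colA, colB, createDate) VALUES
   ('a1', 2, 2, '2018-05-01'),
   ('a1', 3, 3, '2019-01-01'),
   ('a1', 4, 4, '2020-01-01');

INSERT INTO table3 (mainid, colA, colB, colC, createDate) VALUES
   ('a1', 6, 6, 6, '2019-01-01'),
   ('a1', 7, 7, 7, '2020-01-01'),
   ('a1', 8, 8, 8, '2021-01-01');

INSERT INTO table4 (mainid, colA, colE, colC, createDate) VALUES
   ('a1',  9,  9,  9, '2018-06-01'),
   ('a1', 10, 10, 10, '2017-01-01'),
   ('a1', 12, 12, 12, '2020-01-01');
```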

I get the row from each table with the following queries:

select * from table2 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1;
select * from table3 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1;
select * from table4 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1;

select * from table1 where id = 'a1'; 

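Now I want to merge these rows with the main table's row. If a column has values from several tables, the value from the most recent row should be used, like this: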

table1 -> colD: 1
table2 -> colB: 2
table3 -> nothing
table4 -> colA: 9, colC: 9, colE: 9

Selected row:
| id | colA | colB | colC | colD | colE |filteredDate |
|:---|:----:|:----:|:----:|:----:|:----:|:-----------:|
| a1 |  9   |  2   |  9   |  1   |  9   |  2018/07/01 |

How can I do this in a single query? Is it possible? Should I approach it differently?

Best answer

This assumes all columns are NOT NULL; otherwise you will have to do more.

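First, you can UNION ALL the queries you listed, filling in NULL for the missing columns to get compatible row types. Then aggregate. The remaining difficulty is that the perfect aggregate function for this task is not implemented in plain Postgres...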

With standard SQL tools

SELECT id
    , (array_agg(colA ORDER BY colA IS NULL, createDate DESC))[1] AS colA
    , (array_agg(colB ORDER BY colB IS NULL, createDate DESC))[1] AS colB
    , (array_agg(colC ORDER BY colC IS NULL, createDate DESC))[1] AS colC
    , (array_agg(colD ORDER BY colD IS NULL, createDate DESC))[1] AS colD
    , (array_agg(colE ORDER BY colE IS NULL, createDate DESC))[1] AS colE
FROM (
   SELECT      id, colA, colB, colC, colD, colE, createDate FROM table1 WHERE     id='a1'
   UNION ALL
   (SELECT mainid, colA, colB, NULL, NULL, NULL, createDate FROM table2 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, colB, colC, NULL, NULL, createDate FROM table3 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, NULL, colC, NULL, colE, createDate FROM table4 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   ) sub
GROUP BY 1;
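
The `ORDER BY col IS NULL, createDate DESC` inside array_agg() does the real work: false sorts before true, so non-null values come first, newest first, and `[1]` picks the winner. A minimal sketch with made-up inline values (the names v and d are placeholders, not part of the question):

```sql
-- v IS NULL is false for real values, so they sort ahead of the NULL;
-- among them, the newest date wins.
SELECT (array_agg(v ORDER BY v IS NULL, d DESC))[1] AS latest_non_null
FROM  (VALUES (1,    DATE '2017-01-01')
            , (NULL, DATE '2018-06-01')
            , (2,    DATE '2018-05-01')) AS t(v, d);
-- latest_non_null = 2 (the NULL dated 2018-06-01 is passed over)
```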

With the custom aggregate function first()

It gets simpler and faster with a custom aggregate function, as described in the Postgres Wiki:

CREATE OR REPLACE FUNCTION first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE SQL IMMUTABLE STRICT AS 'SELECT $1';
 
CREATE AGGREGATE FIRST (
        sfunc    = first_agg,
        basetype = anyelement,
        stype    = anyelement
);
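
Declaring the transition function STRICT is what makes this aggregate fit the task: with a strict transition function and no initial value, Postgres skips NULL inputs and takes the first non-null input as the state, so first() returns the first non-null value rather than whatever the first row holds. A quick check, assuming the aggregate above has been created (the inline values and the names ord and v are made up for illustration):

```sql
-- The leading NULL is skipped; 5 becomes the state and is kept from then on.
SELECT first(v ORDER BY ord) AS first_non_null
FROM  (VALUES (1, NULL::int)
            , (2, 5)
            , (3, 7)) AS t(ord, v);
-- first_non_null = 5
```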

Then:

SELECT id
     , first(colA) AS colA
     , first(colB) AS colB
     , first(colC) AS colC
     , first(colD) AS colD
     , first(colE) AS colE
FROM (
   SELECT      id, colA, colB, colC, colD, colE, createDate FROM table1 WHERE     id='a1'
   UNION ALL
   (SELECT mainid, colA, colB, NULL, NULL, NULL, createDate FROM table2 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, colB, colC, NULL, NULL, createDate FROM table3 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, NULL, colC, NULL, colE, createDate FROM table4 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   ORDER BY createDate DESC
   ) sub
GROUP  BY 1;
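
The query above relies on first() seeing rows in the order produced by the subquery's ORDER BY. The Postgres manual notes that feeding an aggregate from a sorted subquery usually works, but can fail if the outer query reorders the rows before aggregating. A variant that states the order inside each aggregate call instead, sketched under the same assumptions (the first() aggregate defined above exists; moving the union into a CTE named sub is only a layout choice):

```sql
WITH sub AS (
   SELECT      id, colA, colB, colC, colD, colE, createDate FROM table1 WHERE id = 'a1'
   UNION ALL
   (SELECT mainid, colA, colB, NULL, NULL, NULL, createDate FROM table2 WHERE mainid = 'a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, colB, colC, NULL, NULL, createDate FROM table3 WHERE mainid = 'a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, NULL, colC, NULL, colE, createDate FROM table4 WHERE mainid = 'a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   )
SELECT id
     , first(colA ORDER BY createDate DESC) AS colA  -- newest non-null value per column
     , first(colB ORDER BY createDate DESC) AS colB
     , first(colC ORDER BY createDate DESC) AS colC
     , first(colD ORDER BY createDate DESC) AS colD
     , first(colE ORDER BY createDate DESC) AS colE
FROM   sub
GROUP  BY id;
-- With the sample data from the question: a1 | 9 | 2 | 9 | 1 | 9
```

The per-aggregate ORDER BY costs a sort per call, so this trades some of the speed advantage for a guaranteed order.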

An additional module provides a C implementation of the aggregate, which is faster still.


Regarding "sql - Get a combined row from multiple tables", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57246321/
