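How can I use two aggregate functions (for example, two string_agg calls, or just sum) and make sure the result contains no duplicates caused by the join that the other aggregate needs (the second join)? I use PostgreSQL.
Example
I have three tables: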
create table boxes
(
    id   bigserial primary key,
    name varchar(255)
);
create table animals
(
    id     bigserial primary key,
    name   varchar(255),
    age    numeric,
    box_id bigint constraint animals_boxes_id references boxes
);
create table vegetables
(
    id     bigserial primary key,
    name   varchar(255),
    weight numeric,
    box_id bigint constraint vegatables_box_id references boxes
);
Some input data:
insert into boxes (name) values ('First box');
insert into animals (box_id, name, age) values (1, 'Cat', 2);
insert into animals (box_id, name, age) values (1, 'Cat', 3);
insert into animals (box_id, name, age) values (1, 'Dog', 5);
insert into vegetables (box_id, name, weight) values (1, 'Tomato', 20);
insert into vegetables (box_id, name, weight) values (1, 'Cucumber', 30);
insert into vegetables (box_id, name, weight) values (1, 'Potato', 50);
I want to get the animal names for each box:
select b.name as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names
from boxes as b
left join animals a on b.id = a.box_id
group by b.name;
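This works: for First box, animal_names is Cat, Cat, Dog.
But I also want the vegetable names, and then it no longer works: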
select b.name as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names,
       string_agg(v.name, ', ' order by v.id) as vegetable_names
from boxes as b
left join animals a on b.id = a.box_id
left join vegetables v on b.id = v.box_id
group by b.name;
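It produces duplicated animal names and vegetable names: every animal is listed three times (Cat, Cat, Cat, Cat, Cat, Cat, Dog, Dog, Dog), and so is every vegetable.
The result should be Cat, Cat, Dog for the animals and Tomato, Cucumber, Potato for the vegetables.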
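I can't just add distinct to remove the duplicates, because:
- Names within a table can repeat (two animals are named Cat). With distinct I get Cat, Dog instead of Cat, Cat, Dog.
- I use order by inside string_agg, and adding distinct then fails with ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list. Even if I drop the order by (string_agg(distinct a.name, ', ')), I can't use it because of the first point.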
Additional information
This applies to all aggregate functions: string_agg, array_agg, json_object_agg, and even sum.
Total age of the animals:
select sum(a.age)
from boxes as b
left join animals a on b.id = a.box_id
-- left join vegetables v on b.id = v.box_id
group by b.name;
Without the second join this is computed correctly (10), but with the second join the result is wrong (30) because of the duplicated rows.
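A quick way to see where the 30 comes from is to count the rows that the double join feeds into the aggregates. The query below is only a diagnostic sketch against the sample data above; the column aliases joined_rows, animals, and vegetables are made up for illustration.

```sql
-- Diagnostic sketch: how many rows does each box contribute after both joins?
select b.name               as box_name,
       count(*)             as joined_rows,  -- every animal paired with every vegetable
       count(distinct a.id) as animals,      -- real number of animals in the box
       count(distinct v.id) as vegetables    -- real number of vegetables in the box
from boxes as b
left join animals a on b.id = a.box_id
left join vegetables v on b.id = v.box_id
group by b.name;
```

With the sample data, First box has 3 animals and 3 vegetables, so the join yields 3 x 3 = 9 rows. Each age is therefore summed three times: 3 x (2 + 3 + 5) = 30.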
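Best answer
The basic problem: joining two unrelated child tables to the same parent row multiplies their rows, so each aggregate sees every value once per matching row of the other table. The fix is to aggregate each child table separately.
For small selections, it is typically faster to aggregate per row.
Use LATERAL subqueries (more versatile):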
SELECT b.name AS box_name, a.*, v.*
FROM   boxes b
LEFT   JOIN LATERAL (
   SELECT string_agg(a.name, ', ' ORDER BY a.id) AS animal_names
   FROM   animals a
   WHERE  a.box_id = b.id
   ) a ON true
LEFT   JOIN LATERAL (
   SELECT string_agg(v.name, ', ' ORDER BY v.id) AS vegetable_names
   FROM   vegetables v
   WHERE  v.box_id = b.id
   ) v ON true;
Or use correlated subqueries (simpler, and typically a bit faster):
SELECT b.name AS box_name
     , (SELECT string_agg(a.name, ', ' ORDER BY a.id)
        FROM   animals a
        WHERE  a.box_id = b.id) AS animal_names
     , (SELECT string_agg(v.name, ', ' ORDER BY v.id)
        FROM   vegetables v
        WHERE  v.box_id = b.id) AS vegetable_names
FROM   boxes b;
See:
- What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
- Multiple array_agg() calls in a single query
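The same pattern should carry over to the sum case from the question. As a sketch under that assumption (the aliases total_age and total_weight are illustrative, not from the original post), each total is computed in its own correlated subquery, so neither table's rows can multiply the other's:

```sql
-- Sketch: one correlated subquery per total, so no row multiplication occurs.
SELECT b.name AS box_name
     , (SELECT sum(a.age)
        FROM   animals a
        WHERE  a.box_id = b.id) AS total_age      -- only animals of this box
     , (SELECT sum(v.weight)
        FROM   vegetables v
        WHERE  v.box_id = b.id) AS total_weight   -- only vegetables of this box
FROM   boxes b;
```

With the sample data this gives total_age = 10 and total_weight = 100 for First box. A box without animals or vegetables gets NULL for that total, because sum over no rows is NULL.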
When aggregating the whole table, this is faster:
SELECT b.name AS box_name, a.animal_names, v.vegetable_names
FROM   boxes b
LEFT   JOIN (
   SELECT box_id, string_agg(a.name, ', ') AS animal_names
   FROM  (
      SELECT box_id, id, name
      FROM   animals a
      ORDER  BY box_id, id
      ) a
   GROUP  BY 1
   ) a ON a.box_id = b.id
LEFT   JOIN (
   SELECT box_id, string_agg(v.name, ', ') AS vegetable_names
   FROM  (
      SELECT box_id, id, name
      FROM   vegetables v
      ORDER  BY box_id, id
      ) v
   GROUP  BY 1
   ) v ON v.box_id = b.id;
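Note how I sort in a subquery, which is typically faster than sorting per aggregate. This is an optional optimization. It relies on PostgreSQL feeding the rows to the aggregate in subquery order, which usually holds for a simple query like this but is not guaranteed once the outer level adds further processing.
Aside, about varchar(255) in the test setup: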
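If relying on the sorted subquery feels too fragile, the same pre-aggregation can be written with ORDER BY inside each aggregate. This is a sketch of that variant, not part of the original answer; it gives up the optional speed-up in exchange for an explicitly specified order.

```sql
-- Sketch: pre-aggregate each child table once, with the order stated in the aggregate.
SELECT b.name AS box_name, a.animal_names, v.vegetable_names
FROM   boxes b
LEFT   JOIN (
   SELECT box_id, string_agg(name, ', ' ORDER BY id) AS animal_names
   FROM   animals
   GROUP  BY box_id            -- one row per box before joining
   ) a ON a.box_id = b.id
LEFT   JOIN (
   SELECT box_id, string_agg(name, ', ' ORDER BY id) AS vegetable_names
   FROM   vegetables
   GROUP  BY box_id            -- one row per box before joining
   ) v ON v.box_id = b.id;
```

Because each derived table returns at most one row per box_id, joining both to boxes cannot multiply rows.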
A similar question about SQL (two aggregate functions without duplicates) can be found on Stack Overflow: https://stackoverflow.com/questions/76514703/