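I have an accounts table, a units table, and a reports table. An account has many units (units carry the foreign key account_id), and a unit has many reports (reports carry the foreign key unit_id). I want to select the account name, the total number of units for that account, and the time of the last report: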
SELECT accounts.name AS account_name,
COUNT(units.id) AS unit_count,
(SELECT reports.time FROM reports INNER JOIN units ON units.id = reports.unit_id ORDER BY time desc LIMIT 1) AS last_reported_time
FROM accounts
INNER JOIN units ON accounts.id = units.account_id
INNER JOIN reports ON units.id = reports.unit_id
GROUP BY account_name, last_reported_time
ORDER BY unit_count desc;
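This query runs forever, and I'm not sure it does what I intend.
An account has many units, and a unit has many reports. I want to show, for each account, the time of the most recent report from any of its units. Is this query correct? If not, how can I accomplish this (if possible, without using a scripting language)?
The EXPLAIN output: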
Sort (cost=21466114.58..21466547.03 rows=172980 width=38)
Sort Key: (count(public.units.id))
InitPlan 1 (returns $0)
-> Limit (cost=0.00..12.02 rows=1 width=8)
-> Nested Loop (cost=0.00..928988485.04 rows=77309416 width=8)
-> Index Scan Backward using index_reports_on_time on reports (cost=0.00..296291138.34 rows=77309416 width=12)
-> Index Scan using units_pkey on units (cost=0.00..8.17 rows=1 width=4)
Index Cond: (public.units.id = public.reports.unit_id)
-> GroupAggregate (cost=20807359.99..21446321.09 rows=172980 width=38)
-> Sort (cost=20807359.99..20966559.70 rows=63679885 width=38)
Sort Key: accounts.name, public.units.last_reported_time
-> Hash Join (cost=975.50..3846816.82 rows=63679885 width=38)
Hash Cond: (public.reports.unit_id = public.units.id)
-> Seq Scan on reports (cost=0.00..2919132.16 rows=77309416 width=4)
-> Hash (cost=961.43..961.43 rows=1126 width=38)
-> Hash Join (cost=16.37..961.43 rows=1126 width=38)
Hash Cond: (public.units.account_id = accounts.id)
-> Seq Scan on units (cost=0.00..928.67 rows=1367 width=28)
-> Hash (cost=11.72..11.72 rows=372 width=18)
-> Seq Scan on accounts (cost=0.00..11.72 rows=372 width=18)
Best answer
About 95% of the query's cost is here:
-> Sort (cost=20807359.99..20966559.70 rows=63679885 width=38)
Sort Key: accounts.name, public.units.last_reported_time
-> Hash Join (cost=975.50..3846816.82 rows=63679885 width=38)
Hash Cond: (public.reports.unit_id = public.units.id)
-> Seq Scan on reports (cost=0.00..2919132.16 rows=77309416 width=4)
Is there an index on reports.unit_id? If not, you should definitely add one.
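A minimal sketch of adding such an index. It uses Python's built-in sqlite3 module purely as a runnable stand-in for PostgreSQL; the CREATE INDEX statement itself is plain SQL that PostgreSQL also accepts. The index name index_reports_on_unit_id is an assumption, chosen to mirror index_reports_on_time from the plan above.

```python
import sqlite3

# In-memory toy table standing in for the real reports table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, unit_id INTEGER, time TEXT)"
)

# Hypothetical index name; the statement is also valid PostgreSQL.
conn.execute("CREATE INDEX index_reports_on_unit_id ON reports (unit_id)")

# PRAGMA index_list is SQLite-specific; the second column is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(reports)")]
print(index_names)
```

Whether the PostgreSQL planner actually uses the index for this query depends on table statistics, so it is worth re-running EXPLAIN after creating it.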
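Beyond that, the output column unit_count appears to be meant as the number of units per account, but computing it after all the joins and then sorting by it is very wasteful. The subquery in the select list is somewhat of a mystery to me; I assume you want the most recent report time across each account's units, but it will only give you the time of the last report over all units combined. Try this instead: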
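Both problems can be reproduced on a toy dataset. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL and runs a slightly simplified form of the question's query (grouped by account name only). The sample rows are invented for illustration. It also shows a side effect of counting after the join to reports: COUNT(units.id) counts one row per report, not one per unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE units (id INTEGER PRIMARY KEY, account_id INTEGER);
CREATE TABLE reports (id INTEGER PRIMARY KEY, unit_id INTEGER, time TEXT);
-- Invented sample data: acme has 2 units and 3 reports, globex 1 and 1.
INSERT INTO accounts VALUES (1, 'acme'), (2, 'globex');
INSERT INTO units VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO reports (unit_id, time) VALUES
  (10, '2016-01-01'), (10, '2016-01-02'), (11, '2016-01-03'),
  (20, '2016-02-01');
""")

# Simplified form of the question's query: the scalar subquery does not
# reference the outer account, so it is evaluated over all reports.
rows = conn.execute("""
SELECT accounts.name AS account_name,
       COUNT(units.id) AS unit_count,
       (SELECT reports.time FROM reports
        INNER JOIN units ON units.id = reports.unit_id
        ORDER BY time DESC LIMIT 1) AS last_reported_time
FROM accounts
INNER JOIN units ON accounts.id = units.account_id
INNER JOIN reports ON units.id = reports.unit_id
GROUP BY accounts.name
ORDER BY accounts.name
""").fetchall()

print(rows)
# [('acme', 3, '2016-02-01'), ('globex', 1, '2016-02-01')]
```

Here acme is reported with 3 "units" although it has only 2, and it receives globex's report time, because the subquery returns the single latest report in the whole table.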
SELECT a.name AS account_name, u.unit_count, r.last_reported_time
FROM accounts a
JOIN ( -- units per account, counted before reports are joined in
SELECT account_id, COUNT(*) AS unit_count
FROM units
GROUP BY 1) u ON u.account_id = a.id
LEFT JOIN ( -- allow for units that have not yet submitted a report
SELECT u.account_id, max(r.time) AS last_reported_time
FROM reports r
JOIN units u ON u.id = r.unit_id
GROUP BY 1) r ON r.account_id = a.id
ORDER BY 2 DESC;
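To check the logic of this approach, here is a small sketch that runs the same shape of query on invented sample data, again using Python's sqlite3 module as a stand-in for PostgreSQL. It assumes the table and column names from the question (accounts.name, units.account_id, reports.unit_id, reports.time). The extra sort key on the account name is an addition to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE units (id INTEGER PRIMARY KEY, account_id INTEGER);
CREATE TABLE reports (id INTEGER PRIMARY KEY, unit_id INTEGER, time TEXT);
-- Invented sample data; initech's only unit has not reported yet.
INSERT INTO accounts VALUES (1, 'acme'), (2, 'globex'), (3, 'initech');
INSERT INTO units VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO reports (unit_id, time) VALUES
  (10, '2016-01-01'), (10, '2016-01-02'), (11, '2016-01-03'),
  (20, '2016-02-01');
""")

# Each aggregate is computed in its own derived table, so the unit count
# is not inflated by the number of reports.
rows = conn.execute("""
SELECT a.name AS account_name, u.unit_count, r.last_reported_time
FROM accounts a
JOIN (
    SELECT account_id, COUNT(*) AS unit_count
    FROM units
    GROUP BY account_id) u ON u.account_id = a.id
LEFT JOIN (
    SELECT u.account_id, MAX(r.time) AS last_reported_time
    FROM reports r
    JOIN units u ON u.id = r.unit_id
    GROUP BY u.account_id) r ON r.account_id = a.id
ORDER BY u.unit_count DESC, a.name
""").fetchall()

for row in rows:
    print(row)
# ('acme', 2, '2016-01-03')
# ('globex', 1, '2016-02-01')
# ('initech', 1, None)
```

The LEFT JOIN keeps initech in the result with a NULL (None) report time. Accounts with no units at all are still dropped by the inner join on the unit counts.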
Source: a similar question on Stack Overflow about a subquery in a SELECT statement over joined tables in PostgreSQL: https://stackoverflow.com/questions/37915335/