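PostgreSQL slower than expected

I am designing an application for a shop catalog and have run into rather slow PostgreSQL performance.

Here is the simplified database schema (in reality there are additional tables for the many-to-many relationships): [image: db scheme]

I want to implement filters by attribute (color, size, brand, etc.) based on the selected catalog categories (T-shirts, bags, etc.).

Here is an example query that selects the available attributes for a list of selected categories.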

SELECT DISTINCT T1.attribute_id
FROM item T0 LEFT OUTER JOIN item_attr_color T1 ON ( T0.id = T1.item_id ) 
WHERE T0.catalog_id IN (1, 2, 6, 7, 14, 23, 26, 31, 36, 37, 45, 67, 70, 76, 77, 81, 95, 112, 118, 119, 120, 10, 11, 29, 101, 12, 13, 16, 17, 19, 20, 30, 33, 35, 42, 43, 47, 48, 54, 57, 58, 69, 78, 109, 56, 64, 65, 66, 68, 71, 74, 75, 93, 72, 73, 87, 88, 96, 99, 103, 105, 108, 110);

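The database is currently fairly small, about 100k records, but this query still takes about 400 ms. That is a lot, because I have 10 different filter attributes, so these queries alone add up to about 4 seconds, which is not acceptable.

I have btree indexes on all the important fields. Here is the output of EXPLAIN ANALYZE: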

HashAggregate  (cost=28309.30..28309.43 rows=13 width=4) (actual time=343.343..343.347 rows=14 loops=1)
->  Hash Right Join  (cost=24284.42..28074.04 rows=94103 width=4) (actual time=185.278..315.749 rows=115745 loops=1)
     Hash Cond: (t1.item_id = t0.id)
     ->  Seq Scan on core_item_attr_colors t1  (cost=0.00..1797.13 rows=108913 width=8) (actual time=0.006..18.387 rows=107175 loops=1)
     ->  Hash  (cost=23108.13..23108.13 rows=94103 width=4) (actual time=185.182..185.182 rows=93778 loops=1)
           Buckets: 16384  Batches: 1  Memory Usage: 3297kB
           ->  Seq Scan on core_item t0  (cost=0.00..23108.13 rows=94103 width=4) (actual time=0.020..153.334 rows=93778 loops=1)
                 Filter: (catalog_id = ANY ('{1,2,6,7,14,23,26,31,36,37,45,67,70,76,77,81,95,112,118,119,120,10,11,29,101,12,13,16,17,19,20,30,33,35,42,43,47,48,54,57,58,69,78,109,56,64,65,66,68,71,74,75,93,72,73,87,88,96,99,103,105,108,110}'::integer[]))
                 Rows Removed by Filter: 19677
Total runtime: 361.231 ms

As you can see, it does not use any index, but I noticed that reducing the number of categories eventually makes it use one:

 HashAggregate  (cost=18685.04..18685.17 rows=13 width=4) (actual time=166.760..166.764 rows=14 loops=1)
 ->  Hash Right Join  (cost=15515.08..18626.42 rows=23447 width=4) (actual time=56.499..156.865 rows=26501 loops=1)
     Hash Cond: (u2.item_id = u0.id)
     ->  Seq Scan on core_item_attr_colors u2  (cost=0.00..1797.13 rows=108913 width=8) (actual time=0.010..25.706 rows=107175 loops=1)
     ->  Hash  (cost=15221.99..15221.99 rows=23447 width=4) (actual time=56.444..56.444 rows=23099 loops=1)
           Buckets: 4096  Batches: 1  Memory Usage: 813kB
           ->  Bitmap Heap Scan on core_item u0  (cost=1058.03..15221.99 rows=23447 width=4) (actual time=9.732..45.643 rows=23099 loops=1)
                 Recheck Cond: (catalog_id = ANY ('{1,2,6,7,14,23,26,31,36,37,45,67,70,76,77,81,95,112,118,119}'::integer[]))
                 ->  Bitmap Index Scan on core_item_89ed0239  (cost=0.00..1052.17 rows=23447 width=0) (actual time=6.523..6.523 rows=23099 loops=1)
                       Index Cond: (catalog_id = ANY ('{1,2,6,7,14,23,26,31,36,37,45,67,70,76,77,81,95,112,118,119}'::integer[]))
Total runtime: 166.858 ms
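
A quick back-of-the-envelope check of the row counts in the two plans may help explain the difference. The sketch below only does arithmetic on numbers copied from the plans; it assumes the table held the same rows for both runs. With the full list of 63 categories the filter keeps roughly 83% of `core_item`, while the 20-category list keeps roughly 20%. When most of a table matches, a sequential scan is usually cheaper than an index scan, which would be consistent with the planner's choice here.

```python
# Row counts copied from the two EXPLAIN ANALYZE outputs above.
matched_all = 93778   # rows kept by the Seq Scan on core_item (63 categories)
removed_all = 19677   # "Rows Removed by Filter" in the same scan
matched_few = 23099   # rows returned by the Bitmap Heap Scan (20 categories)

# Assumption: core_item contained the same rows for both runs.
total_rows = matched_all + removed_all

frac_all = matched_all / total_rows
frac_few = matched_few / total_rows

print(f"total rows in core_item: {total_rows}")
print(f"63 categories match {frac_all:.1%} of the table")
print(f"20 categories match {frac_few:.1%} of the table")
```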

I also tried replacing PostgreSQL with SQLite, and the same query on exactly the same dataset gave a very impressive result: it took less than 60 ms.

Here is my configuration file:

max_connections = 100
temp_buffers = 8MB
work_mem = 96MB
maintenance_work_mem = 512MB
effective_cache_size = 512MB

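The server has 6 GB of RAM and an SSD.

What am I missing? I would appreciate any suggestions on how to improve performance here.

Update 1: shared_buffers = 1024MB, and this is PostgreSQL 9.3.

Best answer

First, the left join is unnecessary unless you really want a NULL value returned (which is doubtful). So you can write it like this: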

SELECT DISTINCT T1.attribute_id
FROM item T0 JOIN
     item_attr_color T1
     ON T0.id = T1.item_id
WHERE T0.catalog_id IN (1, 2, 6, 7, 14, 23, 26, 31, 36, 37, 45, 67, 70, 76, 77, 81, 95, 112, 118, 119, 120, 10, 11, 29, 101, 12, 13, 16, 17, 19, 20, 30, 33, 35, 42, 43, 47, 48, 54, 57, 58, 69, 78, 109, 56, 64, 65, 66, 68, 71, 74, 75, 93, 72, 73, 87, 88, 96, 99, 103, 105, 108, 110);
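
To see what the LEFT JOIN actually changes, here is a minimal sketch using Python's built-in `sqlite3` module on a tiny in-memory dataset. The rows and the shortened category list `(1, 2)` are invented for illustration; only the table and column names come from the question. The two results should differ only by a NULL (Python `None`) contributed by items that have no color row.

```python
import sqlite3

# Tiny in-memory dataset; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, catalog_id INTEGER);
    CREATE TABLE item_attr_color (item_id INTEGER, attribute_id INTEGER);
    INSERT INTO item VALUES (1, 1), (2, 1), (3, 2), (4, 99);
    -- item 3 has no color row; item 4 is outside the selected categories
    INSERT INTO item_attr_color VALUES (1, 10), (2, 10), (2, 11), (4, 12);
""")

QUERY = """
    SELECT DISTINCT T1.attribute_id
    FROM item T0 {join} item_attr_color T1 ON T0.id = T1.item_id
    WHERE T0.catalog_id IN (1, 2)
"""

# Run the same query with both join types and collect the attribute ids.
left_rows = {row[0] for row in conn.execute(QUERY.format(join="LEFT OUTER JOIN"))}
inner_rows = {row[0] for row in conn.execute(QUERY.format(join="JOIN"))}

print("LEFT JOIN :", left_rows)
print("INNER JOIN:", inner_rows)
print("only in LEFT JOIN:", left_rows - inner_rows)
```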

Next, assuming you have an attribute table, you can get rid of the distinct and use a subquery:

select a.id
from attribute a
where exists (select 1
              from item_attr_color iac join
                   item i
                   on i.id = iac.item_id
              where i.catalog_id in ( . . .) and
                    iac.attribute_id = a.id
             );

Then, for this query, you want the following indexes: item(id, catalog_id), item_attr_color(attribute_id, item_id), and of course attribute(id).

Introducing these indexes and eliminating the DISTINCT processing may help performance.
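
As a sanity check that the EXISTS form returns the same attributes as the DISTINCT join, here is a small sketch using Python's built-in `sqlite3` module. The data, the shortened category list `(1, 2)`, and the index names are made up for illustration, and it assumes the key column of `attribute` is `id`. It only checks that the results agree; it says nothing about PostgreSQL timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (id INTEGER PRIMARY KEY);
    CREATE TABLE item (id INTEGER PRIMARY KEY, catalog_id INTEGER);
    CREATE TABLE item_attr_color (item_id INTEGER, attribute_id INTEGER);

    -- Indexes suggested for the EXISTS form (index names are hypothetical).
    CREATE INDEX item_id_catalog ON item (id, catalog_id);
    CREATE INDEX iac_attr_item ON item_attr_color (attribute_id, item_id);

    -- Invented sample rows: attribute 12 is used only outside categories 1 and 2,
    -- attribute 13 is not used at all.
    INSERT INTO attribute VALUES (10), (11), (12), (13);
    INSERT INTO item VALUES (1, 1), (2, 1), (3, 2), (4, 99);
    INSERT INTO item_attr_color VALUES (1, 10), (2, 10), (2, 11), (4, 12);
""")

# Original approach: join, then de-duplicate with DISTINCT.
distinct_ids = sorted(row[0] for row in conn.execute("""
    SELECT DISTINCT iac.attribute_id
    FROM item i JOIN item_attr_color iac ON i.id = iac.item_id
    WHERE i.catalog_id IN (1, 2)
"""))

# Rewritten approach: one row per attribute, kept if any matching item exists.
exists_ids = sorted(row[0] for row in conn.execute("""
    SELECT a.id
    FROM attribute a
    WHERE EXISTS (SELECT 1
                  FROM item_attr_color iac JOIN item i ON i.id = iac.item_id
                  WHERE i.catalog_id IN (1, 2)
                    AND iac.attribute_id = a.id)
"""))

print(distinct_ids, exists_ids)
```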

The IN version is also worth trying:

select a.id
from attribute a
where a.id in (select iac.attribute_id
               from item_attr_color iac join
                    item i
                    on i.id = iac.item_id
               where i.catalog_id in ( . . .)
              );

The indexes for this query are: item(catalog_id, id), item_attr_color(item_id, attribute_id), and of course attribute(id).

Source: the original question "PostgreSQL slower than expected" on Stack Overflow: https://stackoverflow.com/questions/33566785/
