performance - 为什么我在此 PostgreSQL 查询中得到 'Hash Join' 和 FTS?

标签 performance postgresql join hash indexing

我正在尝试优化以下场景:

以文字格式:我有 2 个表,alertsuser_devices;在 user_devices 中,我们跟踪连接到 user_id 的设备是否想要接收通知,在 alerts 表中,我们跟踪用户与通知者的关系.基本上,任务是选择每个具有任何警报的 user_id 并允许通知注册到它的任何设备。

表“警报”,大约 90 万条记录:

               Table "public.alerts"
   Column    |           Type           | Modifiers 
-------------+--------------------------+-----------
 id          | uuid                     | not null
 user_id     | uuid                     | 
 target_id   | uuid                     | 
 target_type | text                     | 
 added_on    | timestamp with time zone | 
 old_id      | text                     | 
Indexes:
    "alerts_pkey" PRIMARY KEY, btree (id)
    "one_alert_per_business_per_user" UNIQUE CONSTRAINT, btree (user_id, target_id)
    "addedon" btree (added_on)
    "targetid" btree (target_id)
    "userid" btree (user_id)
    "userid_targetid" btree (user_id, target_id)
Foreign-key constraints:
    "alerts_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

表 'user_devices',大约 12k 条记录:

                Table "public.user_devices"
       Column        |           Type           | Modifiers 
---------------------+--------------------------+-----------
 id                  | uuid                     | not null
 user_id             | uuid                     | 
 device_id           | text                     | 
 device_token        | text                     | 
 push_notify_enabled | boolean                  | 
 device_type         | integer                  | 
 device_name         | text                     | 
 badge_count         | integer                  | 
 added_on            | timestamp with time zone | 
Indexes:
    "user_devices_pkey" PRIMARY KEY, btree (id)
    "push_notification" btree (push_notify_enabled)
    "user_id" btree (user_id)
    "user_id_push_notification" btree (user_id, push_notify_enabled)
Foreign-key constraints:
    "user_devices_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

以下查询:

select COUNT(DISTINCT a.user_id) 
from alerts a 
  inner join user_devices ud on a.user_id = ud.user_id 
WHERE ud.push_notify_enabled = true;

大约需要 3 秒并生成以下计划:

explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id WHERE ud.push_notify_enabled = true;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16)
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16)

我错过了什么,有没有办法加快速度?

谢谢。

== 编辑 ==

根据建议,尝试在连接内移动条件,没有区别:

=> explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16)
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16)

所以,没有办法摆脱 2 FTS 吗?如果我至少能让它以某种方式使用“警报”表上的索引,那就太好了……

== 编辑 ==

添加`EXPLAIN ANALYZE。

=> explain ANALYZE select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16) (actual time=5254.355..5254.356 rows=1 loops=1)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16) (actual time=1824.607..2863.635 rows=614768 loops=1)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16) (actual time=0.048..16.784 rows=9186 loops=1)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16) (actual time=1824.229..1824.229 rows=801765 loops=1)
               Buckets: 4096  Batches: 32  Memory Usage: 990kB
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16) (actual time=0.047..878.429 rows=801765 loops=1)
 Total runtime: 5255.427 ms
(9 rows)

===编辑===

添加请求的配置。其中大部分是 Ubuntu PG9.1 默认值:

/etc/postgresql/9.1/main# cat postgresql.conf | grep -e "work_mem" -e "effective_cache" -e "shared_buff" -e "random_page_c"
shared_buffers = 24MB           # min 128kB
#work_mem = 1MB             # min 64kB
#maintenance_work_mem = 16MB        # min 1MB
#wal_buffers = -1           # min 32kB, -1 sets based on shared_buffers
#random_page_cost = 4.0         # same scale as above
#effective_cache_size = 128MB

最佳答案

用部分索引替换索引:

DROP INDEX    user_id_push_notification ;
CREATE INDEX    user_id_push_notification ON user_devices (user_id)
 WHERE push_notify_enabled =True
 ;

,并将 random_page_cost 设置为较低的值:

SET random_page_cost = 1.1;

使用 push_notification on user_devices ud 对我造成了索引扫描(< 300 毫秒)。 YMMV。

警报上的 seqscan 似乎或多或少是不可避免的,因为您期望 800K/900K := 88%) 行。恕我直言,索引扫描仅在行大小非常​​大时才有效。

更新:将用户表添加到查询中似乎强制执行三重索引扫描。 (但大约在同一时间)

explain  ANALYZE
select COUNT(DISTINCT a.user_id)
from alerts a
join user_devices ud on a.user_id = ud.user_id
join users us on a.user_id = us.id
WHERE ud.push_notify_enabled = true;

关于performance - 为什么我在此 PostgreSQL 查询中得到 'Hash Join' 和 FTS?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/14404209/

相关文章:

java - Play Framework - 部署到 Heroku 时出错(postgresql 'user' 表)

sql - 强制内部查询在外部查询之前执行

php - 从多个表中获取图像

javascript - 如何在不阻塞 onload 或等待 onload 的情况下加载 iframe

c# - 循环 : calling a property twice, 和存储属性一次哪个更快?

sql - Postgresql,具有多个键的 jsonb 为每个键返回一行

mysql - 连接的子查询(存在/存在)

python - 距离矩阵的并行构造

performance - 如何提高 Golang 的编译速度?

sql - 您会推荐哪些优化工作来加速 PostgreSQL 中的查询执行?