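This is a simplified distillation of a more complex situation we ran into in production. The data and setup for this test case can be found at https://drive.google.com/file/d/0B2I7_NGvCSVOT3ZNNWhpeFdFbTg/view?usp=sharing.

Background

I have two very similar dedicated virtual machines running PostgreSQL. One runs PG 8.4 and the other runs PG 9.4, but both use almost identical configurations. The table below lists some of the other differences.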
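Because the comparison assumes the two configurations are almost identical, it may help to confirm the planner-related settings directly on both servers and diff the results. A minimal sketch, assuming psql access to each server; the list of settings is an illustrative selection, not an exhaustive one:

```sql
-- Run on both the PG 8.4 and the PG 9.4 server, then compare the output.
-- pg_settings is a built-in system view; "source" shows where each value came from.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN (
    'shared_buffers',
    'effective_cache_size',
    'work_mem',
    'seq_page_cost',
    'random_page_cost',
    'cpu_tuple_cost',
    'cpu_operator_cost',
    'default_statistics_target',
    'join_collapse_limit',
    'from_collapse_limit',
    'geqo_threshold'
)
ORDER BY name;
```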

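This question has two parts:

Why does PG 8.4 choose a faster query plan than PG 9.4 for the version 1 query? The estimated costs of the two plans are similar, but the actual time taken on PG 9.4 is more than 10 times longer.
Why does changing the WHERE clause to reference a.id instead of r.a_id change the query plan so dramatically?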

System information

                  | PG 8.4     | PG 9.4
:---------------- | :--------- | :-----------
OS                | CentOS 5.5 | Ubuntu 14.04
RAM               | 16GB       | 16GB
CPUs              | 4 x vCPU   | 4 x vCPU
VMware VM version | 4          | 8
Disk Size         | 50GB       | 200GB

System benchmarks

                                | PG 8.4   | PG 9.4
:------------------------------ | :--------| :--------
dd write (32GB)                 | 38 MB/s  | 277 MB/s
dd read (32GB)                  | 241 MB/s | 243 MB/s
bonnie++ 1.03 block write K/sec | 208941   | 248528
bonnie++ 1.03 block read K/sec  | 172184   | 321814
bonnie++ seek /sec              | 543.5    | 1559.8
pgbench (-s 1000, -t 2000) TPS  | 345      | 325

Queries

Version 1

```

EXPLAIN ANALYZE SELECT DISTINCT
    t.id
FROM
    a
INNER JOIN b --USING(a_id)
    ON b.a_id = a.id
INNER JOIN r -- USING(a_id)
    ON r.a_id = a.id
INNER JOIN t
    ON t.session_id = '1'
        AND a.inst_id = t.inst_id
        AND b.study_id = t.study_id
        AND r.q_id = t.q_id
WHERE
    r.a_id IN (1, 2, 3)
    AND (
        r.q_id in ('q1', 'q2', 'q3') OR
        r.q_id in ('q4', 'q5', 'q6') OR
        r.q_id in ('q7', 'q8', 'q9') OR
        r.q_id in ('q10', 'q11', 'q12')
    )

```

Version 2

```

EXPLAIN ANALYZE SELECT DISTINCT
    t.id
FROM
    a
INNER JOIN b --USING(a_id)
    ON b.a_id = a.id
INNER JOIN r -- USING(a_id)
    ON r.a_id = a.id
INNER JOIN t
    ON t.session_id = '1'
        AND a.inst_id = t.inst_id
        AND b.study_id = t.study_id
        AND r.q_id = t.q_id
WHERE
    a.id IN (1, 2, 3) -- << THIS IS WHAT CHANGED
    AND (
        r.q_id in ('q1', 'q2', 'q3') OR
        r.q_id in ('q4', 'q5', 'q6') OR
        r.q_id in ('q7', 'q8', 'q9') OR
        r.q_id in ('q10', 'q11', 'q12')
    )

```

Query performance

                | PG 8.4 | PG 9.4 |
 -------------- | ------ | ------ |
 version 1 (ms) | 0.718  | 12.355 |
 version 2 (ms) | 1.799  | 3.288  |

EXPLAIN plans

PG 8.4, version 1

```
HashAggregate  (cost=63.78..63.79 rows=1 width=4) (actual time=0.603..0.603 rows=1 loops=1)
  ->  Hash Join  (cost=61.02..63.78 rows=1 width=4) (actual time=0.540..0.593 rows=1 loops=1)
        Hash Cond: ((b.a_id = a.id) AND (b.study_id = t.study_id))
        ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.015..0.041 rows=100 loops=1)
        ->  Hash  (cost=60.99..60.99 rows=2 width=16) (actual time=0.513..0.513 rows=1 loops=1)
              ->  Hash Join  (cost=58.22..60.99 rows=2 width=16) (actual time=0.435..0.511 rows=1 loops=1)
                    Hash Cond: ((a.id = r.a_id) AND ((a.inst_id)::text = (t.inst_id)::text))
                    ->  Seq Scan on a  (cost=0.00..2.00 rows=100 width=6) (actual time=0.005..0.026 rows=100 loops=1)
                    ->  Hash  (cost=58.13..58.13 rows=6 width=44) (actual time=0.418..0.418 rows=3 loops=1)
                          ->  Hash Join  (cost=17.54..58.13 rows=6 width=44) (actual time=0.044..0.416 rows=3 loops=1)
                                Hash Cond: ((r.q_id)::text = (t.q_id)::text)
                                ->  Seq Scan on r  (cost=0.00..40.44 rows=23 width=7) (actual time=0.014..0.368 rows=34 loops=1)
                                      Filter: ((a_id = ANY ('{1,2,3}'::integer[])) AND (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10, (...)
                                ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.020..0.020 rows=1 loops=1)
                                      ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.006..0.016 rows=1 loops=1)
                                            Filter: ((session_id)::text = '1'::text)
Total runtime: 0.718 ms
```

PG 8.4, version 2

```
HashAggregate  (cost=61.77..61.78 rows=1 width=4) (actual time=1.685..1.686 rows=1 loops=1)
  ->  Hash Join  (cost=22.41..61.77 rows=1 width=4) (actual time=0.243..1.677 rows=1 loops=1)
        Hash Cond: (((a.inst_id)::text = (t.inst_id)::text) AND (b.study_id = t.study_id) AND ((r.q_id)::text = (t.q_id)::text))
        ->  Hash Join  (cost=4.85..43.94 rows=23 width=9) (actual time=0.203..1.626 rows=34 loops=1)
              Hash Cond: (r.a_id = b.a_id)
              ->  Seq Scan on r  (cost=0.00..35.95 rows=776 width=7) (actual time=0.024..1.120 rows=1198 loops=1)
                    Filter: (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[])))
              ->  Hash  (cost=4.82..4.82 rows=3 width=14) (actual time=0.138..0.138 rows=3 loops=1)
                    ->  Hash Join  (cost=2.41..4.82 rows=3 width=14) (actual time=0.057..0.135 rows=3 loops=1)
                          Hash Cond: (b.a_id = a.id)
                          ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.006..0.049 rows=100 loops=1)
                          ->  Hash  (cost=2.38..2.38 rows=3 width=6) (actual time=0.040..0.040 rows=3 loops=1)
                                ->  Seq Scan on a  (cost=0.00..2.38 rows=3 width=6) (actual time=0.008..0.035 rows=3 loops=1)
                                      Filter: (id = ANY ('{1,2,3}'::integer[]))
        ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.020..0.020 rows=1 loops=1)
              ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.008..0.016 rows=1 loops=1)
                    Filter: ((session_id)::text = '1'::text)
Total runtime: 1.799 ms
```

PG 9.4, version 1

```
HashAggregate  (cost=63.54..63.55 rows=1 width=4) (actual time=11.393..11.394 rows=1 loops=1)
  Group Key: t.id
  ->  Nested Loop  (cost=19.96..63.54 rows=1 width=4) (actual time=0.223..11.387 rows=1 loops=1)
        Join Filter: ((b.a_id = r.a_id) AND ((t.q_id)::text = (r.q_id)::text))
        Rows Removed by Join Filter: 1155
        ->  Hash Join  (cost=19.96..22.72 rows=1 width=44) (actual time=0.202..0.294 rows=34 loops=1)
              Hash Cond: ((b.a_id = a.id) AND (b.study_id = t.study_id))
              ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.016..0.030 rows=100 loops=1)
              ->  Hash  (cost=19.93..19.93 rows=2 width=44) (actual time=0.174..0.174 rows=34 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 2kB
                    ->  Hash Join  (cost=17.54..19.93 rows=2 width=44) (actual time=0.079..0.155 rows=34 loops=1)
                          Hash Cond: ((a.inst_id)::text = (t.inst_id)::text)
                          ->  Seq Scan on a  (cost=0.00..2.00 rows=100 width=6) (actual time=0.007..0.026 rows=100 loops=1)
                          ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.025..0.025 rows=1 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.012..0.021 rows=1 loops=1)
                                      Filter: ((session_id)::text = '1'::text)
                                      Rows Removed by Filter: 35
        ->  Seq Scan on r  (cost=0.00..40.44 rows=25 width=7) (actual time=0.008..0.314 rows=34 loops=34)
              Filter: ((a_id = ANY ('{1,2,3}'::integer[])) AND (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[]))))
              Rows Removed by Filter: 1164
Planning time: 0.856 ms
Execution time: 11.499 ms
```

PG 9.4, version 2

```
HashAggregate  (cost=62.23..62.24 rows=1 width=4) (actual time=2.197..2.197 rows=1 loops=1)
  Group Key: t.id
  ->  Nested Loop  (cost=19.95..62.22 rows=1 width=4) (actual time=0.193..2.189 rows=1 loops=1)
        Join Filter: ((b.a_id = r.a_id) AND ((a.inst_id)::text = (t.inst_id)::text) AND (b.study_id = t.study_id))
        Rows Removed by Join Filter: 299
        ->  Hash Join  (cost=17.54..56.68 rows=12 width=44) (actual time=0.065..1.761 rows=100 loops=1)
              Hash Cond: ((r.q_id)::text = (t.q_id)::text)
              ->  Seq Scan on r  (cost=0.00..35.95 rows=819 width=7) (actual time=0.030..1.271 rows=1198 loops=1)
                    Filter: (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[])))
              ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.022..0.022 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 1kB
                    ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.008..0.018 rows=1 loops=1)
                          Filter: ((session_id)::text = '1'::text)
                          Rows Removed by Filter: 35
        ->  Materialize  (cost=2.41..4.83 rows=3 width=14) (actual time=0.001..0.003 rows=3 loops=100)
              ->  Hash Join  (cost=2.41..4.82 rows=3 width=14) (actual time=0.119..0.172 rows=3 loops=1)
                    Hash Cond: (b.a_id = a.id)
                    ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.007..0.028 rows=100 loops=1)
                    ->  Hash  (cost=2.38..2.38 rows=3 width=6) (actual time=0.064..0.064 rows=3 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 1kB
                          ->  Seq Scan on a  (cost=0.00..2.38 rows=3 width=6) (actual time=0.016..0.058 rows=3 loops=1)
                                Filter: (id = ANY ('{1,2,3}'::integer[]))
                                Rows Removed by Filter: 97
Planning time: 0.979 ms
Execution time: 2.309 ms
```
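In the PG 9.4 plan for version 1, the Hash Join of b, a and t is estimated at rows=1 but actually returns 34 rows, so the Seq Scan on r under the Nested Loop runs 34 times (loops=34 at about 0.314 ms each, roughly 10.7 ms of the 11.5 ms total). One way to check whether a hash-join plan like the 8.4 one would also be faster on 9.4 is to discourage nested loops for a single transaction and compare the EXPLAIN ANALYZE output. A minimal diagnostic sketch, not a recommended production setting:

```sql
-- Diagnostic sketch only: run on the PG 9.4 server.
-- enable_nestloop = off does not forbid nested loops outright; it makes the
-- planner avoid them whenever another join method is available.
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK

EXPLAIN ANALYZE SELECT DISTINCT
    t.id
FROM
    a
INNER JOIN b
    ON b.a_id = a.id
INNER JOIN r
    ON r.a_id = a.id
INNER JOIN t
    ON t.session_id = '1'
        AND a.inst_id = t.inst_id
        AND b.study_id = t.study_id
        AND r.q_id = t.q_id
WHERE
    r.a_id IN (1, 2, 3)
    AND (
        r.q_id in ('q1', 'q2', 'q3') OR
        r.q_id in ('q4', 'q5', 'q6') OR
        r.q_id in ('q7', 'q8', 'q9') OR
        r.q_id in ('q10', 'q11', 'q12')
    );

ROLLBACK;
```

If the plan produced this way is clearly faster, that would suggest the row misestimate, rather than the hardware, is what steers 9.4 toward the slower plan.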

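Update

I want to make it clear that I greatly appreciate the tuning and data-modeling suggestions provided. However, this example is a simplification of a system-wide problem, and we are hoping to find a way to bring performance back to where it was before the upgrade to PG 9.4 without modifying the existing schema. Hopefully that is not impossible.

Best answer

IMHO, the query below is much simpler, at least to read: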

```
EXPLAIN ANALYZE SELECT DISTINCT t.id
FROM t
INNER JOIN a ON a.inst_id = t.inst_id
INNER JOIN r ON r.a_id = a.id AND r.q_id = t.q_id
INNER JOIN b ON b.a_id = a.id AND b.study_id = t.study_id
WHERE t.session_id = '1'
  AND r.a_id IN (1, 2, 3)
  AND r.q_id IN ('q1', 'q2', 'q3'
                ,'q4', 'q5', 'q6'
                ,'q7', 'q8', 'q9'
                ,'q10', 'q11', 'q12')
    ;
```

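Adding PRIMARY KEY constraints to the serial columns would help a lot.
Adding FOREIGN KEY constraints on the referencing JOIN fields (plus UNIQUE constraints on the referenced fields) would help even more.
Adding supporting indexes for the FKs completes the job.
I.e., do all of this before running VACUUM ANALYZE.
BTW, your data model seems to contain a cycle: {study_id, inst_id} in tables {a, t, b}. This may indicate redundancy (or a missing candidate key).
Your new machine seems to have fast seeks, so you could try lowering random_page_cost to about 2, assuming effective_cache_size and shared_buffers are high enough. (But get your data model into shape before tuning.)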
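The constraint, index, statistics and cost suggestions above can be sketched as follows. This assumes, without confirmation from the question, that a.id and t.id are the serial columns and that b.a_id and r.a_id reference a.id; all constraint and index names, and the database name mydb, are hypothetical. Each ALTER TABLE ... ADD PRIMARY KEY fails if the column contains duplicates or NULLs, or if the table already has a primary key.

```sql
-- Hypothetical sketch: names are made up, column roles are assumed.

-- 1. Primary keys on the (assumed) serial columns.
ALTER TABLE a ADD CONSTRAINT a_pkey PRIMARY KEY (id);
ALTER TABLE t ADD CONSTRAINT t_pkey PRIMARY KEY (id);

-- 2. Foreign keys on the JOIN columns. The referenced column a.id is
--    already unique because of the primary key above.
ALTER TABLE b ADD CONSTRAINT b_a_id_fkey FOREIGN KEY (a_id) REFERENCES a (id);
ALTER TABLE r ADD CONSTRAINT r_a_id_fkey FOREIGN KEY (a_id) REFERENCES a (id);

-- 3. Supporting indexes: PostgreSQL does not create an index on the
--    referencing side of a foreign key automatically.
CREATE INDEX b_a_id_idx ON b (a_id);
CREATE INDEX r_a_id_idx ON r (a_id);

-- 4. Refresh planner statistics once the constraints and indexes exist.
--    (VACUUM cannot run inside a transaction block.)
VACUUM ANALYZE a;
VACUUM ANALYZE b;
VACUUM ANALYZE r;
VACUUM ANALYZE t;

-- 5. Only after the model is in shape: try a lower random_page_cost in the
--    current session first, and persist it only if the plans improve.
SET random_page_cost = 2;
-- ALTER DATABASE mydb SET random_page_cost = 2;  -- "mydb" is hypothetical
```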

Original question on Stack Overflow, "postgresql - Different query plans in Postgres 8.4 and 9.4": https://stackoverflow.com/questions/31259726/
