postgresql - JSON expression index not used in a join in PostgreSQL

I have two tables in a PostgreSQL database. One is named foo and has the following structure:

Column | Type
-------------
id     | integer
raw    | json

The other is named bar and looks like this:

Column | Type
-------------
id     | integer
action | character varying

foo has an index on an expression, defined as follows: "foo_expr_idx" btree ((raw ->> 'action'::text))
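
For reference, an index with that definition could be created with a statement along these lines. This is only a sketch: the post shows the index as psql reports it, not the statement that created it, so the exact CREATE INDEX form is an assumption.

```sql
-- Assumed statement; the post only shows the resulting index definition.
-- The extra parentheses are required around an expression in an index.
CREATE INDEX foo_expr_idx ON foo ((raw ->> 'action'));
```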

This index works. Even though foo has millions of rows, I can run a query like this:

SELECT * from foo WHERE raw->>'action' = 'open'

and it uses the index instead of doing a sequential scan. I verified this with EXPLAIN ANALYZE.
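
The check described above might look like the following. The exact command is not shown in the post, so treat this as an assumption about how it was run. When the index is used, the plan should show an index or bitmap scan on foo_expr_idx rather than a Seq Scan on foo.

```sql
-- Show the actual plan and timings for the single-table lookup.
EXPLAIN ANALYZE
SELECT * FROM foo WHERE raw ->> 'action' = 'open';
```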

However, when I use that expression in a join, the index is not used. So a join like this:

SELECT * from bar LEFT OUTER JOIN foo ON (bar.action = foo.raw->>'action');

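is horribly slow. When I check what it is doing, it is definitely performing a sequential scan. How can I get PostgreSQL to use the expression index in a join like this?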

The EXPLAIN output looks like this:

Merge Left Join  (cost=1101140.74..1207570.52 rows=5560478 width=175) (actual time=815671.230..873493.479 rows=16673 loops=1)
   Output: bar.id, bar.action, foo.id, foo.raw, 
   Merge Cond: ((bar.action)::text = ((foo.raw ->> 'action'::text)))
   ->  Sort  (cost=1719.29..1745.81 rows=10607 width=131) (actual time=47.439..60.859 rows=10628 loops=1)
       Output: bar.id, bar.action
       Sort Key: bar.action
       Sort Method: external merge  Disk: 1024kB
       ->  Seq Scan on public.bar  (cost=0.00..282.07 rows=10607 width=131) (actual time=0.008..10.186 rows=10628 loops=1)
           Output: bar.id, bar.action
   ->  Materialize  (cost=1099421.45..1117505.18 rows=3616747 width=44) (actual time=815623.382..864899.131 rows=3614363 loops=1)
           Output: foo.id, foo.raw, ((foo.raw ->> 'action'::text))
           ->  Sort  (cost=1099421.45..1108463.31 rows=3616747 width=44) (actual time=815623.356..851265.324 rows=3608287 loops=1)
               Output: foo.id, foo.raw, ((foo.raw ->> 'action'::text))
               Sort Key: ((foo.raw ->> 'action'::text))
               Sort Method: external merge  Disk: 2611952kB
               ->  Seq Scan on public.foo  (cost=0.00..371670.47 rows=3616747 width=44) (actual time=0.052..121762.195 rows=3612522 loops=1)
                   Output: foo.id, foo.raw, (foo.raw ->> 'action'::text)
  Total runtime: 874110.670 ms
 (18 rows)

Best answer

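The problem appears to have been that PostgreSQL was making some untrue assumptions about the data. The simple solution was to analyze the table...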

After running

analyze foo;

my query's run time dropped from 14 minutes to 259 milliseconds.
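
A likely explanation, offered here as an assumption rather than something the answer states: PostgreSQL collects statistics for an indexed expression only when the table is analyzed after the index exists. Without them the planner has to guess how selective raw ->> 'action' is, which fits the plan above, where the join was estimated at 5560478 rows but actually returned 16673. The sketch below refreshes the statistics and then checks that they exist, using the table and index names from the question.

```sql
-- Refresh planner statistics, including those for indexed expressions.
ANALYZE foo;

-- When was foo last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'foo';

-- Statistics for an expression index are listed under the index name.
SELECT tablename, attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'foo_expr_idx';

-- Re-check the join plan once statistics are in place.
EXPLAIN ANALYZE
SELECT * FROM bar LEFT OUTER JOIN foo ON (bar.action = foo.raw ->> 'action');
```

If the pg_stats query returns no rows, the expression has no statistics yet and the planner is still working from default estimates.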

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/29966722/
