arrays - Postgresql IN 与 ANY 运算符与子查询的性能差异

我有两个查询做同样的事情。
1

    SELECT *
    FROM "Products_product"
    WHERE ("Products_product"."id" IN
           (SELECT U0."product_id"
            FROM "Products_purchase" U0
            WHERE (U0."state" = 1
                   AND U0."user_id" = 5))
           AND "Products_product"."state" IN (1,
                                              6,
                                              3)
           AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
    ORDER BY "Products_product"."title_index" ASC
    LIMIT 10;

SELECT *
FROM "Products_product"
WHERE ("Products_product"."id" = ANY (ARRAY(
       (SELECT U0."product_id"
        FROM "Products_purchase" U0
        WHERE (U0."state" = 1
               AND U0."user_id" = 5))))
       AND "Products_product"."state" IN (1,
                                          6,
                                          3)
       AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
ORDER BY "Products_product"."title_index" ASC
LIMIT 10;

唯一的区别是第一个使用 IN用于子查询，而第二次使用 =ANY(ARRAY()) .但是第二个比第一个快了大约 10 倍。
我运行了解释，我得到了这两个结果:
1

 Limit  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=3414.185..3414.190 rows=10 loops=1)
   ->  Sort  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=3414.184..3414.185 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Nested Loop Semi Join  (cost=92.66..5309.91 rows=1 width=1906) (actual time=3385.153..3414.099 rows=16 loops=1)
               ->  Bitmap Heap Scan on "Products_product"  (cost=13.85..256.32 rows=61 width=1906) (actual time=3381.327..3384.430 rows=63 loops=1)
                     Recheck Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
                     Rows Removed by Index Recheck: 1
                     Heap Blocks: exact=64
                     ->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=3381.001..3381.001 rows=64 loops=1)
                           Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               ->  Bitmap Heap Scan on "Products_purchase" u0  (cost=78.82..82.84 rows=1 width=4) (actual time=0.467..0.467 rows=0 loops=63)
                     Recheck Cond: ((product_id = "Products_product".id) AND (user_id = 5))
                     Filter: (state = 1)
                     Heap Blocks: exact=16
                     ->  BitmapAnd  (cost=78.82..78.82 rows=1 width=0) (actual time=0.465..0.465 rows=0 loops=63)
                           ->  Bitmap Index Scan on "Products_purchase_product_id"  (cost=0.00..5.06 rows=84 width=0) (actual time=0.265..0.265 rows=30 loops=63)
                                 Index Cond: (product_id = "Products_product".id)
                           ->  Bitmap Index Scan on "Products_purchase_user_id"  (cost=0.00..72.57 rows=3752 width=0) (actual time=0.242..0.242 rows=3335 loops=51)
                                 Index Cond: (user_id = 5)
 Planning time: 7.540 ms
 Execution time: 3414.356 ms
(22 rows)

Limit  (cost=7378.07..7378.07 rows=1 width=1906) (actual time=116.559..116.562 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using "Products_purchase_user_id" on "Products_purchase" u0  (cost=0.43..7329.83 rows=3752 width=4) (actual time=0.021..15.535 rows=3335 loops=1)
           Index Cond: (user_id = 5)
           Filter: (state = 1)
   ->  Sort  (cost=48.24..48.25 rows=1 width=1906) (actual time=116.558..116.559 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Bitmap Heap Scan on "Products_product"  (cost=44.20..48.23 rows=1 width=1906) (actual time=116.202..116.536 rows=16 loops=1)
               Recheck Cond: ((id = ANY ($0)) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               Filter: (state = ANY ('{1,6,3}'::integer[]))
               Rows Removed by Filter: 2
               Heap Blocks: exact=18
               ->  Bitmap Index Scan on "Products_product_id_upper_idx1"  (cost=0.00..44.20 rows=1 width=0) (actual time=116.103..116.103 rows=18 loops=1)
                     Index Cond: ((id = ANY ($0)) AND (upper((title)::text) ~~ '%TORONTO%'::text))
 Planning time: 1.054 ms
 Execution time: 116.663 ms
(17 rows)

从文档来看，IN 之间没有实质性区别。或 ANY .但为什么我得到如此不同的结果。是吗ANY优于IN任何状况之下？

更新:
有人指出这个问题可能与 IN vs ANY operator in PostgreSQL 重复.
它们是相同的问题，但该问题的答案并没有解决我的问题，因为除了该答案之外，我还有更详细的案例。

But the second variant of each is not equivalent to the other. The second variant of the ANY construct takes an array (must be an actual array type), while the second variant of IN takes a comma-separated list of values. This leads to different restrictions in passing values and can also lead to different query plans in special cases:

https://dba.stackexchange.com/a/125500/3684

Pass multiple sets or arrays of values to a function

在我的问题中，这两个问题都不是。我只是将一个数组作为子查询传递。而我的情况与第一个 URL 正好相反。我的索引只用在 ANY但不在 IN .所以基本上，这个答案并没有解决我的问题。

更新2:
我更新索引:CREATE INDEX ON "Products_product" USING GIST (state, id, upper((title) :: TEXT) gist_trgm_ops); .我可以确认两个查询的情况相同，这意味着索引存在于那里，但第一个不使用它。

更新3:
我只是删除 ARRAY在代码中。但结果是一样的。

 explain analyze SELECT *
FROM "Products_product"
WHERE ("Products_product"."id" = ANY(
       (SELECT U0."product_id"
        FROM "Products_purchase" U0
        WHERE (U0."state" = 1
               AND U0."user_id" = 5)))
       AND "Products_product"."state" IN (1,
                                          6,
                                          3)
       AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
ORDER BY "Products_product"."title" ASC
LIMIT 10;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=228.980..228.983 rows=10 loops=1)
   ->  Sort  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=228.979..228.980 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Nested Loop Semi Join  (cost=92.66..5309.91 rows=1 width=1906) (actual time=216.392..228.913 rows=16 loops=1)
               ->  Bitmap Heap Scan on "Products_product"  (cost=13.85..256.32 rows=61 width=1906) (actual time=214.332..215.260 rows=63 loops=1)
                     Recheck Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
                     Rows Removed by Index Recheck: 1
                     Heap Blocks: exact=64
                     ->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=214.296..214.296 rows=64 loops=1)
                           Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               ->  Bitmap Heap Scan on "Products_purchase" u0  (cost=78.82..82.84 rows=1 width=4) (actual time=0.215..0.215 rows=0 loops=63)
                     Recheck Cond: ((product_id = "Products_product".id) AND (user_id = 5))
                     Filter: (state = 1)
                     Heap Blocks: exact=16
                     ->  BitmapAnd  (cost=78.82..78.82 rows=1 width=0) (actual time=0.212..0.212 rows=0 loops=63)
                           ->  Bitmap Index Scan on "Products_purchase_product_id"  (cost=0.00..5.06 rows=84 width=0) (actual time=0.017..0.017 rows=30 loops=63)
                                 Index Cond: (product_id = "Products_product".id)
                           ->  Bitmap Index Scan on "Products_purchase_user_id"  (cost=0.00..72.57 rows=3752 width=0) (actual time=0.239..0.239 rows=3335 loops=51)
                                 Index Cond: (user_id = 5)
 Planning time: 5.083 ms
 Execution time: 229.904 ms
(22 rows)

我认为 ANY 并非如此或 ANY(ARRAY()) ，这只是IN之间的区别和 ANY

最佳答案

不同的计划不是由IN造成的对比 = ANY ，但通过附加 ARRAY()围绕第二个查询中的子选择。没有那个，计划是相同的。

不同之处在于，在慢速执行中，索引扫描需要很长时间，而在您编辑的(完全相同的)计划中，相同的扫描速度很快:

减缓:

->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=3381.001..3381.001 rows=64 loops=1)
      Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))

快速地:

->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=214.296..214.296 rows=64 loops=1)
      Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))

同样有趣的是，在慢速计划中，生成索引扫描的第一行需要 3 秒。

无论问题是什么，都是暂时的。唯一想到的是killed index tuples : 大量删除留下了许多指向死堆元组的索引元组，这些元组只需要第一次扫描，因为之后它们被标记为死。

你有没有批量删除？

关于arrays - Postgresql IN 与 ANY 运算符与子查询的性能差异，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/56917588/

arrays - Postgresql IN 与 ANY 运算符与子查询的性能差异

上一篇：Android RTL 支持形状角半径

下一篇：time - 您可以在同一仪表板上的不同面板上设置不同的时间范围吗？