sql - My PostgreSQL 10 query is very slow and I need a way to make it faster

Tags: sql postgresql query-performance prisma

I am using PostgreSQL 10.

Here is my data model:
https://imgur.com/bibWSq8

Each review belongs to exactly one product. Each product can belong to many categories. Each category can have only one parent category. I am querying the database with Prisma, which is an ORM. I want to select the first 10 reviews of all products that belong to the category with id = 27.
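Since the model above is only an image, roughly, the relevant tables look like this (a sketch; the column names are taken from the generated SQL and the indexes further down, while the data types and constraints are approximations):

-- Sketch of the schema as reconstructed from the query and indexes; exact types may differ.
CREATE TABLE "database".category (
    id     integer PRIMARY KEY,
    parent integer REFERENCES "database".category (id)  -- each category has at most one parent
);

CREATE TABLE "database".product (
    id integer PRIMARY KEY
);

-- m:n link between categories and products
CREATE TABLE "database".category_to_product (
    category integer NOT NULL REFERENCES "database".category (id),
    product  integer NOT NULL REFERENCES "database".product (id)
);

CREATE TABLE "database".review (
    id      integer PRIMARY KEY,
    product integer NOT NULL REFERENCES "database".product (id)  -- each review belongs to one product
);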

Here is the query that Prisma generates:

select
"Alias"."id"
from "database"."review" as "Alias"
where ("Alias"."id"
       in (select "database"."review"."id"
           from "database"."review"
           where "database"."review"."product"
                 in (select "database"."category_to_product"."product"
                     from "database"."category_to_product"
                     join "database"."category" as "category_product_Alias"
                        on "category_product_Alias"."id" = "database"."category_to_product"."category"
                     where ("category_product_Alias"."id" = 27
                            or "category_product_Alias"."id"
                               in (select "database"."category"."id"
                                   from "database"."category"
                                   join "database"."category" as "category_category_product_Alias"
                                      on "category_category_product_Alias"."id" = "database"."category"."parent"
                                   where "category_category_product_Alias"."id" = 27
                                  )
                           )
                    )
          )
      )
order by "Alias"."id" desc
limit 11
offset 0;

There are 1,500,000 reviews, 12,000 products, and 130 categories. The query takes almost 3 seconds to complete.

I tried creating indexes, but it did not help:

CREATE UNIQUE INDEX category_pkey ON "database".category USING btree (id);
CREATE INDEX idx_category_parent ON "database".category USING btree (parent);
CREATE UNIQUE INDEX "category_to_product_AB_unique" ON "database".category_to_product USING btree (category, product);
CREATE INDEX "category_to_product_B" ON "database".category_to_product USING btree (product);
CREATE UNIQUE INDEX product_pkey ON "database".product USING btree (id);
CREATE INDEX idx_review_product ON "database".review USING btree (product);
CREATE UNIQUE INDEX review_pkey ON "database".review USING btree (id);

Here is the result of running EXPLAIN ANALYZE:

Limit  (cost=9.00..101.89 rows=11 width=4) (actual time=3428.508..3431.048 rows=11 loops=1)
  ->  Merge Semi Join  (cost=9.00..12584725.82 rows=1490226 width=4) (actual time=3428.507..3431.043 rows=11 loops=1)
        Merge Cond: ("Alias".id = review.id)
        ->  Index Only Scan Backward using review_pkey on review "Alias"  (cost=0.43..84869.82 rows=1490226 width=4) (actual time=0.008..152.954 rows=1054436 loops=1)
              Heap Fetches: 0
        ->  Nested Loop Semi Join  (cost=8.57..12477502.61 rows=1490226 width=4) (actual time=3188.974..3191.303 rows=11 loops=1)
              ->  Index Scan Backward using review_pkey on review  (cost=0.43..266561.32 rows=1490226 width=8) (actual time=0.004..415.244 rows=1054436 loops=1)
              ->  Nested Loop  (cost=8.14..8.18 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1054436)
                    ->  Index Scan using "category_to_product_B" on category_to_product  (cost=0.29..0.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1054436)
                          Index Cond: (product = review.product)
                    ->  Index Only Scan using category_pkey on category "category_product_Alias"  (cost=7.86..7.88 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1084175)
                          Index Cond: (id = category_to_product.category)
                          Filter: ((id = 27) OR (hashed SubPlan 1))
                          Rows Removed by Filter: 1
                          Heap Fetches: 0
                          SubPlan 1
                            ->  Nested Loop  (cost=0.00..7.71 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1)
                                  ->  Seq Scan on category  (cost=0.00..3.85 rows=1 width=8) (actual time=0.015..0.016 rows=0 loops=1)
                                        Filter: (parent = 27)
                                        Rows Removed by Filter: 148
                                  ->  Seq Scan on category "category_category_product_Alias"  (cost=0.00..3.85 rows=1 width=4) (never executed)
                                        Filter: (id = 27)
Planning time: 0.649 ms
Execution time: 3431.098 ms

I don't think my data set is that big, but the query is far too slow. Is there any way to make it faster?

Update 1: I did what @Laurenz Albe suggested and it is faster. Here is the result:

Limit  (cost=217773.56..217773.59 rows=11 width=8) (actual time=735.033..735.041 rows=11 loops=1)
  ->  Sort  (cost=217773.56..221499.13 rows=1490226 width=8) (actual time=735.031..735.033 rows=11 loops=1)
        Sort Key: (("Alias".id + 0)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Hash Semi Join  (cost=99929.33..184545.76 rows=1490226 width=8) (actual time=354.030..733.405 rows=13589 loops=1)
              Hash Cond: ("Alias".id = review.id)
              ->  Seq Scan on review "Alias"  (cost=0.00..60400.26 rows=1490226 width=4) (actual time=0.005..157.747 rows=1482065 loops=1)
              ->  Hash  (cost=81301.50..81301.50 rows=1490226 width=4) (actual time=350.842..350.842 rows=13589 loops=1)
                    Buckets: 2097152  Batches: 1  Memory Usage: 16862kB
                    ->  Hash Join  (cost=410.63..81301.50 rows=1490226 width=4) (actual time=3.363..347.392 rows=13589 loops=1)
                          Hash Cond: (review.product = category_to_product.product)
                          ->  Seq Scan on review  (cost=0.00..60400.26 rows=1490226 width=8) (actual time=0.011..144.852 rows=1482065 loops=1)
                          ->  Hash  (cost=326.86..326.86 rows=6702 width=4) (actual time=2.121..2.121 rows=100 loops=1)
                                Buckets: 8192  Batches: 1  Memory Usage: 68kB
                                ->  HashAggregate  (cost=259.84..326.86 rows=6702 width=4) (actual time=2.064..2.103 rows=100 loops=1)
                                      Group Key: category_to_product.product
                                      ->  Hash Join  (cost=12.86..243.08 rows=6702 width=4) (actual time=0.336..2.026 rows=100 loops=1)
                                            Hash Cond: (category_to_product.category = "category_product_Alias".id)
                                            ->  Seq Scan on category_to_product  (cost=0.00..194.03 rows=13403 width=8) (actual time=0.004..0.873 rows=12063 loops=1)
                                            ->  Hash  (cost=11.93..11.93 rows=74 width=4) (actual time=0.037..0.037 rows=1 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                  ->  Seq Scan on category "category_product_Alias"  (cost=7.71..11.93 rows=74 width=4) (actual time=0.025..0.035 rows=1 loops=1)
                                                        Filter: ((id = 27) OR (hashed SubPlan 1))
                                                        Rows Removed by Filter: 147
                                                        SubPlan 1
                                                          ->  Nested Loop  (cost=0.00..7.71 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=1)
                                                                ->  Seq Scan on category  (cost=0.00..3.85 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1)
                                                                      Filter: (parent = 27)
                                                                      Rows Removed by Filter: 148
                                                                ->  Seq Scan on category "category_category_product_Alias"  (cost=0.00..3.85 rows=1 width=4) (never executed)
                                                                      Filter: (id = 27)
Planning time: 0.591 ms
Execution time: 735.127 ms

Update 2: I tried to simplify the query:

explain analyze select
"review"."id"
from "review"
where "review"."product" in
(
select "category_to_product"."product"
from "category_to_product"
join "category"
on "category"."id" = "category_to_product"."category"
where "category"."id" = 27 or "category"."parent" = 27
)
order by "reviewty$dev"."review"."id" desc
limit 11
offset 0;

But the result did not change much:

Limit  (cost=0.86..456.52 rows=11 width=4) (actual time=3354.756..3357.181 rows=11 loops=1)
  ->  Nested Loop Semi Join  (cost=0.86..1019733.07 rows=24617 width=4) (actual time=3354.754..3357.176 rows=11 loops=1)
        ->  Index Scan Backward using review_pkey on review  (cost=0.43..266561.32 rows=1490226 width=8) (actual time=0.007..391.076 rows=1054436 loops=1)
        ->  Nested Loop  (cost=0.43..0.50 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1054436)
              ->  Index Scan using "category_to_product_B" on category_to_product  (cost=0.29..0.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1054436)
                    Index Cond: (product = review.product)
              ->  Index Scan using category_pkey on category  (cost=0.14..0.17 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1084175)
                    Index Cond: (id = category_to_product.category)
                    Filter: ((id = 27) OR (parent = 27))
                    Rows Removed by Filter: 1
Planning time: 0.434 ms
Execution time: 3357.210 ms

The only thing I can do now is append + 0 after order by "Alias"."id". Unfortunately, as I said, this query is generated by Prisma (prisma.io), not written by me, so applying that change would mean writing native SQL myself.

Update 3: @Ancoron is right, running set enable_nestloop = off before my query makes it faster. It forces PostgreSQL to use a hash join instead of a nested loop:

Limit  (cost=10000238022.63..10000238023.45 rows=11 width=4) (actual time=629.606..629.804 rows=11 loops=1)
  ->  Merge Semi Join  (cost=10000238022.63..10000348970.97 rows=1490226 width=4) (actual time=629.605..629.797 rows=11 loops=1)
        Merge Cond: ("Alias".id = review.id)
        ->  Index Only Scan Backward using review_pkey on review "Alias"  (cost=0.43..84869.82 rows=1490226 width=4) (actual time=0.006..152.252 rows=1054436 loops=1)
              Heap Fetches: 0
        ->  Sort  (cost=10000238022.20..10000241747.77 rows=1490226 width=4) (actual time=390.996..391.000 rows=11 loops=1)
              Sort Key: review.id DESC
              Sort Method: quicksort  Memory: 1021kB
              ->  Hash Semi Join  (cost=10000000604.70..10000085221.14 rows=1490226 width=4) (actual time=4.306..388.164 rows=13589 loops=1)
                    Hash Cond: (review.product = category_to_product.product)
                    ->  Seq Scan on review  (cost=0.00..60400.26 rows=1490226 width=8) (actual time=0.004..157.976 rows=1482065 loops=1)
                    ->  Hash  (cost=10000000529.30..10000000529.30 rows=6032 width=4) (actual time=0.617..0.617 rows=100 loops=1)
                          Buckets: 8192  Batches: 1  Memory Usage: 68kB
                          ->  Merge Join  (cost=10000000008.29..10000000529.30 rows=6032 width=4) (actual time=0.555..0.603 rows=100 loops=1)
                                Merge Cond: (category_to_product.category = "category_product_Alias".id)
                                ->  Index Only Scan using "category_to_product_AB_unique" on category_to_product  (cost=0.29..419.82 rows=12063 width=8) (actual time=0.007..0.374 rows=2272 loops=1)
                                      Heap Fetches: 1123
                                ->  Index Only Scan using category_pkey on category "category_product_Alias"  (cost=10000000007.86..10000000018.82 rows=74 width=4) (actual time=0.024..0.035 rows=1 loops=1)
                                      Filter: ((id = 27) OR (hashed SubPlan 1))
                                      Rows Removed by Filter: 147
                                      Heap Fetches: 0
                                      SubPlan 1
                                        ->  Nested Loop  (cost=10000000000.00..10000000007.71 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=1)
                                              ->  Seq Scan on category  (cost=0.00..3.85 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1)
                                                    Filter: (parent = 27)
                                                    Rows Removed by Filter: 148
                                              ->  Seq Scan on category "category_category_product_Alias"  (cost=0.00..3.85 rows=1 width=4) (never executed)
                                                    Filter: (id = 27)
Planning time: 0.594 ms
Execution time: 629.857 ms
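For reference, the session-level change is just the following (a minimal sketch; the RESET at the end is my own addition to restore the default afterwards):

-- Discourage nested-loop joins for the current session only.
SET enable_nestloop = off;
-- ... run the slow query here ...
-- Restore the planner's default behaviour afterwards.
RESET enable_nestloop;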

But I ask myself why I have to do this. PostgreSQL chose the wrong plan; it used a nested loop instead of a hash join, and that made my query slow. It is a mature database, so when a query is slow I assume the fault is mine: I tried creating indexes and rewriting the query, hoping PostgreSQL would change its plan, but it didn't. Is that acceptable? Another thing: I was sure my query would run faster in every case. Here is my Prisma query:

# Write your query or mutation here
query {
  reviews (where: {
    product:{
      categories_some: {
        OR:[
          {
            id: 27
          },
          {
            parent: {
              id: 27
            }
          }
        ]
      }
    }
  }, orderBy:id_DESC, first:11, skip:0){
    id
  }
}

I cannot find any other way to change my Prisma query.

Best answer

My guess is that, because of an unfavorable distribution of rows in the table, the interesting rows come last when PostgreSQL tries to use an index scan to get the correct sort order.

Try to avoid the index scan and use an explicit sort by changing the ORDER BY clause to:

ORDER BY "Alias".id + 0 DESC

What does "unfavorable distribution" mean? Based on its estimates, PostgreSQL thinks that quite a lot of rows satisfy the condition, so it believes the cheapest approach is to process the rows in descending order of Alias.id and keep going until it has found 11 rows that satisfy the condition. Even if the estimate is correct, it may be that the (many) rows satisfying the condition all have low values of Alias.id, so it has to scan far more rows than expected.

Looking at your second execution plan, I suspect that at least part of the problem is that PostgreSQL overestimates the number of rows that satisfy the condition: 1490226 instead of 13589. Simplifying the query may help.
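For example, one possible simplified form, using EXISTS and with the + 0 trick applied (a sketch based on the column names shown in the question, not tested against your data), would be:

SELECT r.id
FROM "database".review AS r
WHERE EXISTS (
    SELECT 1
    FROM "database".category_to_product AS cp
    JOIN "database".category AS c ON c.id = cp.category
    WHERE cp.product = r.product
      AND (c.id = 27 OR c.parent = 27)
)
ORDER BY r.id + 0 DESC   -- "+ 0" keeps the planner from using review_pkey to produce the ordering
LIMIT 11 OFFSET 0;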

Regarding "sql - My PostgreSQL 10 query is very slow and I need a way to make it faster", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56126280/
