postgresql - 为什么包装的 PLPGSQL 函数中的查询比本身慢得多?

标签 postgresql performance query-optimization plpgsql postgresql-performance

使用 PostgreSQL 12.8 (Ubuntu 12.8-1.pgdg18.04+1) 一段时间了。普通查询和包含在具有相同输入的 PLPGSQL 函数中的相同查询之间存在显着的时间差异。表架构和查询描述如下:

  • 架构:
    CREATE TABLE public."AircraftData"
    (
        "uid" BIGINT NOT NULL,
        "time" BIGINT NOT NULL,
        "action" SMALLINT NOT NULL,
        "position" GEOMETRY NOT NULL
    )
    1. 正常查询 -> 1 秒 279 毫秒:
    SELECT
        A.uid AS uid,
        A.max_time AS time,
        ST_AsText
        (
            ST_Collect
            (
                ARRAY
                (
                    SELECT
                        "position"
                    FROM
                        public."AircraftData"
                    WHERE
                        "uid" = A.uid
                        AND
                        "time" BETWEEN 1617608651642 AND 1617657618453
                    ORDER BY
                        "time" DESC
                    LIMIT 254
                )
            )
        ) AS trajectory
    FROM
        (
            SELECT
                M.uid AS uid,
                M.max_time AS max_time
            FROM
                (
                    SELECT
                        "uid",
                        MAX("time") AS max_time
                    FROM
                        public."AircraftData"
                    WHERE
                        "time" BETWEEN 1617608651642 AND 1617657618453
                    GROUP BY
                        "uid"
                ) M
            INNER JOIN
                (
                    SELECT
                        "uid"
                    FROM
                        public."AircraftData"
                    WHERE
                        "time" > 1617657618453
                    GROUP BY
                        "uid"
                ) N
            ON M.uid = N.uid
        ) A
    INNER JOIN
        public."AircraftData" B
    ON
        A.uid = B.uid AND A.max_time = B.time AND B.action != 2
    ORDER BY
        "time" ASC
  • PLPGSQL 函数 -> 4 秒 90 毫秒:
    DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
    CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
    RETURNS TABLE
    (
        "uid" BIGINT,
        "time" BIGINT,
        "trajectory" TEXT
    )
    AS $BODY$
    BEGIN
        RETURN QUERY
        SELECT
            A.uid AS uid,
            A.max_time AS time,
            ST_AsText
            (
                ST_Collect
                (
                    ARRAY
                    (
                        SELECT
                            "position"
                        FROM
                            public."AircraftData"
                        WHERE
                            "uid" = A.uid
                            AND
                            "time" BETWEEN $1 AND $2
                        ORDER BY
                            "time" DESC
                        LIMIT 254
                    )
                )
            ) AS trajectory
        FROM
            (
                SELECT
                    M.uid AS uid,
                    M.max_time AS max_time
                FROM
                    (
                        SELECT
                            "uid",
                            MAX("time") AS max_time
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" BETWEEN $1 AND $2
                        GROUP BY
                            "uid"
                    ) M
                INNER JOIN
                    (
                        SELECT
                            "uid"
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" > $2
                        GROUP BY
                            "uid"
                    ) N
                ON M.uid = N.uid
            ) A
        INNER JOIN
            public."AircraftData" B
        ON
            A.uid = B.uid AND A.max_time = B.time AND B.action != 2
        ORDER BY
            "time" ASC;
    END;
    $BODY$ LANGUAGE PLPGSQL;
  • 动态 PLPGSQL 函数 -> 4 秒 97 毫秒:
    DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
    CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
    RETURNS TABLE
    (
        "uid" BIGINT,
        "time" BIGINT,
        "trajectory" TEXT
    )
    AS $BODY$
    BEGIN
        RETURN QUERY
        EXECUTE
        'SELECT
            A.uid AS uid,
            A.max_time AS time,
            ST_AsText
            (
                ST_Collect
                (
                    ARRAY
                    (
                        SELECT
                            "position"
                        FROM
                            public."AircraftData"
                        WHERE
                            "uid" = A.uid
                            AND
                            "time" BETWEEN $1 AND $2
                        ORDER BY
                            "time" DESC
                        LIMIT 254
                    )
                )
            ) AS trajectory
        FROM
            (
                SELECT
                    M.uid AS uid,
                    M.max_time AS max_time
                FROM
                    (
                        SELECT
                            "uid",
                            MAX("time") AS max_time
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" BETWEEN $1 AND $2
                        GROUP BY
                            "uid"
                    ) M
                INNER JOIN
                    (
                        SELECT
                            "uid"
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" > $2
                        GROUP BY
                            "uid"
                    ) N
                ON M.uid = N.uid
            ) A
        INNER JOIN
            public."AircraftData" B
        ON
            A.uid = B.uid AND A.max_time = B.time AND B.action != 2
        ORDER BY
            "time" ASC'
        USING last_server_time, start_time;
    END;
    $BODY$ LANGUAGE PLPGSQL;
  • SQL 函数 -> 2 分 28 秒:
    DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
    CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
    RETURNS TABLE
    (
        "uid" BIGINT,
        "time" BIGINT,
        "trajectory" TEXT
    )
    AS $BODY$
        SELECT
            A.uid AS uid,
            A.max_time AS time,
            ST_AsText
            (
                ST_Collect
                (
                    ARRAY
                    (
                        SELECT
                            "position"
                        FROM
                            public."AircraftData"
                        WHERE
                            "uid" = A.uid
                            AND
                            "time" BETWEEN $1 AND $2
                        ORDER BY
                            "time" DESC
                        LIMIT 254
                    )
                )
            ) AS trajectory
        FROM
            (
                SELECT
                    M.uid AS uid,
                    M.max_time AS max_time
                FROM
                    (
                        SELECT
                            "uid",
                            MAX("time") AS max_time
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" BETWEEN $1 AND $2
                        GROUP BY
                            "uid"
                    ) M
                INNER JOIN
                    (
                        SELECT
                            "uid"
                        FROM
                            public."AircraftData"
                        WHERE
                            "time" > $2
                        GROUP BY
                            "uid"
                    ) N
                ON M.uid = N.uid
            ) A
        INNER JOIN
            public."AircraftData" B
        ON
            A.uid = B.uid AND A.max_time = B.time AND B.action != 2
        ORDER BY
            "time" ASC;
    $BODY$ LANGUAGE SQL;

那么这个函数时间开销的原因是什么?

还有改进的余地吗?

更新 1

将标签STABLE添加到SQL函数(4)将执行时间更改为4秒67毫秒!

更新2

  • 解释查询 1 的分析输出:
    Sort  (cost=719260.66..719260.69 rows=12 width=48) (actual time=1543.822..1543.930 rows=200 loops=1)
      Sort Key: (max("AircraftData"."time"))
      Sort Method: quicksort  Memory: 1637kB
      Buffers: shared hit=895329, temp read=398 written=401
      ->  Merge Join  (cost=493439.89..719260.44 rows=12 width=48) (actual time=1267.917..1543.749 rows=200 loops=1)
            Merge Cond: ("AircraftData".uid = "AircraftData_1".uid)
            Buffers: shared hit=895329, temp read=398 written=401
            ->  Nested Loop  (cost=91702.67..111690.17 rows=1 width=24) (actual time=272.476..298.257 rows=200 loops=1)
                  Buffers: shared hit=273686
                  ->  Finalize GroupAggregate  (cost=91702.23..92568.25 rows=2279 width=16) (actual time=272.460..274.158 rows=2100 loops=1)
                        Group Key: "AircraftData".uid
                        Buffers: shared hit=179393
                        ->  Gather Merge  (cost=91702.23..92511.27 rows=6837 width=16) (actual time=272.450..273.457 rows=6600 loops=1)
                              Workers Planned: 3
                              Workers Launched: 3
                              Buffers: shared hit=179393
                              ->  Sort  (cost=90702.19..90707.89 rows=2279 width=16) (actual time=249.231..249.334 rows=1650 loops=4)
                                    Sort Key: "AircraftData".uid
                                    Sort Method: quicksort  Memory: 39kB
                                    Worker 0:  Sort Method: quicksort  Memory: 195kB
                                    Worker 1:  Sort Method: quicksort  Memory: 195kB
                                    Worker 2:  Sort Method: quicksort  Memory: 195kB
                                    Buffers: shared hit=179393
                                    ->  Partial HashAggregate  (cost=90552.30..90575.09 rows=2279 width=16) (actual time=248.748..248.935 rows=1650 loops=4)
                                          Group Key: "AircraftData".uid
                                          Buffers: shared hit=179372
                                          ->  Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData"  (cost=0.44..85716.66 rows=967127 width=16) (actual time=0.044..77.510 rows=760982 loops=4)
                                                Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
                                                Heap Fetches: 0
                                                Buffers: shared hit=179372
                  ->  Index Scan using AircraftData_time_index on "AircraftData" b  (cost=0.44..8.37 rows=1 width=16) (actual time=0.011..0.011 rows=0 loops=2100)
                        Index Cond: ("time" = (max("AircraftData"."time")))
                        Filter: ((action <> 2) AND ("AircraftData".uid = uid))
                        Rows Removed by Filter: 50
                        Buffers: shared hit=94293
            ->  Group  (cost=401737.23..402563.36 rows=2279 width=8) (actual time=954.077..988.991 rows=124598 loops=1)
                  Group Key: "AircraftData_1".uid
                  Buffers: shared hit=542041, temp read=398 written=401
                  ->  Gather Merge  (cost=401737.23..402546.27 rows=6837 width=8) (actual time=954.067..977.742 rows=178728 loops=1)
                        Workers Planned: 3
                        Workers Launched: 2
                        Buffers: shared hit=542041, temp read=398 written=401
                        ->  Sort  (cost=400737.19..400742.88 rows=2279 width=8) (actual time=944.755..949.248 rows=59650 loops=3)
                              Sort Key: "AircraftData_1".uid
                              Sort Method: external merge  Disk: 1064kB
                              Worker 0:  Sort Method: external merge  Disk: 1056kB
                              Worker 1:  Sort Method: external merge  Disk: 1064kB
                              Buffers: shared hit=542041, temp read=398 written=401
                              ->  Partial HashAggregate  (cost=400587.30..400610.09 rows=2279 width=8) (actual time=922.174..930.673 rows=59687 loops=3)
                                    Group Key: "AircraftData_1".uid
                                    Buffers: shared hit=542027
                                    ->  Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1"  (cost=0.44..388576.19 rows=4804444 width=8) (actual time=0.023..451.211 rows=4960127 loops=3)
                                          Index Cond: ("time" > '1617657618453'::bigint)
                                          Heap Fetches: 0
                                          Buffers: shared hit=542027
            SubPlan 1
              ->  Limit  (cost=0.44..17080.15 rows=254 width=48) (actual time=0.008..1.169 rows=254 loops=200)
                    Buffers: shared hit=79602
                    ->  Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2"  (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.007..1.156 rows=254 loops=200)
                          Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
                          Buffers: shared hit=79602
    Planning Time: 0.741 ms
    JIT:
      Functions: 64
      Options: Inlining true, Optimization true, Expressions true, Deforming true
      Timing: Generation 7.142 ms, Inlining 163.028 ms, Optimization 205.236 ms, Emission 145.702 ms, Total 521.108 ms
    Execution Time: 1549.206 ms
  • 查询 4 ​​的 auto_explain 输出:
    Sort  (cost=871918.05..871918.08 rows=12 width=48) (actual time=5482.659..5482.683 rows=200 loops=1)
      Sort Key: (max("AircraftData"."time"))
      Sort Method: quicksort  Memory: 1637kB
      Buffers: shared hit=855647
      ->  Nested Loop  (cost=647721.19..871917.84 rows=12 width=48) (actual time=3269.758..5482.342 rows=200 loops=1)
            Join Filter: ("AircraftData".uid = "AircraftData_1".uid)
            Rows Removed by Join Filter: 12419447
            Buffers: shared hit=855647
            ->  Nested Loop  (cost=121017.25..140161.52 rows=1 width=24) (actual time=734.056..774.883 rows=200 loops=1)
                  Buffers: shared hit=269116
                  ->  HashAggregate  (cost=121016.81..121039.60 rows=2279 width=16) (actual time=732.927..733.488 rows=2100 loops=1)
                        Group Key: "AircraftData".uid
                        Buffers: shared hit=174823
                        ->  Index Only Scan using AircraftData_time_uid_index on "AircraftData"  (cost=0.44..106026.34 rows=2998095 width=16) (actual time=0.047..279.640 rows=3043927 loops=1)
                              Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
                              Heap Fetches: 0
                              Buffers: shared hit=174823
                  ->  Index Scan using AircraftData_time_index on "AircraftData" b  (cost=0.44..8.37 rows=1 width=16) (actual time=0.019..0.019 rows=0 loops=2100)
                        Index Cond: ("time" = (max("AircraftData"."time")))
                        Filter: ((action <> 2) AND ("AircraftData".uid = uid))
                        Rows Removed by Filter: 50
                        Buffers: shared hit=94293
            ->  HashAggregate  (cost=526703.94..526726.73 rows=2279 width=8) (actual time=12.619..20.103 rows=62098 loops=200)
                  Group Key: "AircraftData_1".uid
                  Buffers: shared hit=506929
                  ->  Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1"  (cost=0.44..489469.50 rows=14893775 width=8) (actual time=0.010..1258.534 rows=14880381 loops=1)
                        Index Cond: ("time" > '1617657618453'::bigint)
                        Heap Fetches: 0
                        Buffers: shared hit=506929
            SubPlan 1
              ->  Limit  (cost=0.44..17080.15 rows=254 width=48) (actual time=0.014..1.280 rows=254 loops=200)
                    Buffers: shared hit=79602
                    ->  Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2"  (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.013..1.265 rows=254 loops=200)
                          Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
                          Buffers: shared hit=79602
    Planning Time: 1.638 ms
    JIT:
      Functions: 28
      Options: Inlining true, Optimization true, Expressions true, Deforming true
      Timing: Generation 7.741 ms, Inlining 23.622 ms, Optimization 98.256 ms, Emission 56.727 ms, Total 186.345 ms
    Execution Time: 5491.240 ms

最佳答案

> 那么这个函数时间开销的原因是什么?

很难说,你没有透露太多关于你的设置的信息。大概有几个原因。如果您的表(如您的执行时间所示),则 PL/pgSQL 函数的主要嫌疑人将是 RETURN QUERY这不允许并行。即将推出的 PostgreSQL 14 删除了此限制。发行说明:

Allow plpgsql's RETURN QUERY to execute its query using parallelism (Tom Lane)

SQL 函数的糟糕性能是一个谜。也许是一个通用的查询计划,而你的普通 SQL 会得到一个定制的查询计划,因为它是使用常量过滤器而不是参数来计划的。
这通常不应该发生。但它可以。请参阅:

(通常,背后还有其他问题......)

>还有改进的空间吗?

是的。

SELECT uid, time
     , ST_AsText(ST_Collect(ARRAY(
                     SELECT "position"
                     FROM   "AircraftData"
                     WHERE  uid = A.uid
                     AND    time BETWEEN 1617608651642 AND 1617657618453
                     ORDER  BY time DESC
                     LIMIT  254))) AS trajectory
FROM  (
   SELECT DISTINCT ON (uid)
          uid, time, action
   FROM   "AircraftData"
   WHERE  time BETWEEN 1617608651642 AND 1617657618453
   ORDER  BY uid, time DESC, action = 2  -- get action <> 2 if any
   ) A
WHERE  action <> 2
AND    EXISTS (
   SELECT FROM "AircraftData"
   WHERE  time > 1617657618453
   AND    uid = A.uid
   );

等效,但有一个区别:此查询仅返回单个 trajectoryuid ,如果有多于一行包含最新时间和action <> 2,则原始文件将返回多个。我想,无论如何它应该是这样的(或者每个设计永远不会有超过一个 action <> 2)。

关于DISTINCT ON :

排序依据action = 2有效是因为 false排序在 true 之前(和 null 最后排序)按升序排列。因此任何具有 action <> 2 的同行首先排序并由 DISTINCT ON 选择- 这符合 uid 的条件在你的查询中。请参阅:

另外,过滤uid其中还有一个较晚的条目,EXISTS子查询通常比在某个时间戳等之后聚合所有行要便宜得多。

关于postgresql - 为什么包装的 PLPGSQL 函数中的查询比本身慢得多?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/68982025/

相关文章:

r - 使用另一个多边形裁剪/剪辑复杂的 SpatialPolygonsDataFrame 的最快方法

python - 如何以 60fps+ 将所有屏幕像素读入 python numpy 数组

ruby-on-rails - 在 Rails 应用程序中减少数据库查询的技术

postgresql - 我们可以使用假脱机命令将查询结果导出到 postgres 中吗?

java - 在 Java 中将字符串解析为 int 的快速方法

sql - PostgreSQL:计数查询花费太多时间

mysql - JOIN 以奇怪的顺序完成;搞乱 ORDER BY?

sql - 在 SQL Server 2008 中执行 View 时出现 "Timeout expired"错误

sql - 通过数据库中的文本字段实现智能搜索

javascript - 有没有办法在 Op.op sequelize 中循环遍历许多参数的对象