使用 PostgreSQL 12.8 (Ubuntu 12.8-1.pgdg18.04+1) 一段时间了。普通查询和包含在具有相同输入的 PLPGSQL 函数中的相同查询之间存在显着的时间差异。表架构和查询描述如下:
- 架构:
CREATE TABLE public."AircraftData"
(
"uid" BIGINT NOT NULL,
"time" BIGINT NOT NULL,
"action" SMALLINT NOT NULL,
"position" GEOMETRY NOT NULL
)
-
- 正常查询 ->
1 秒 279 毫秒
:
- 正常查询 ->
SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN 1617608651642 AND 1617657618453
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN 1617608651642 AND 1617657618453
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > 1617657618453
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC
-
- PLPGSQL 函数 ->
4 秒 90 毫秒
:
- PLPGSQL 函数 ->
DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
RETURNS TABLE
(
"uid" BIGINT,
"time" BIGINT,
"trajectory" TEXT
)
AS $BODY$
BEGIN
RETURN QUERY
SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN $1 AND $2
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN $1 AND $2
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > $2
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC;
END;
$BODY$ LANGUAGE PLPGSQL;
-
- 动态 PLPGSQL 函数 ->
4 秒 97 毫秒
:
- 动态 PLPGSQL 函数 ->
DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
RETURNS TABLE
(
"uid" BIGINT,
"time" BIGINT,
"trajectory" TEXT
)
AS $BODY$
BEGIN
RETURN QUERY
EXECUTE
'SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN $1 AND $2
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN $1 AND $2
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > $2
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC'
USING last_server_time, start_time;
END;
$BODY$ LANGUAGE PLPGSQL;
-
- SQL 函数 ->
2 分 28 秒
:
- SQL 函数 ->
DROP FUNCTION IF EXISTS foo(BIGINT,BIGINT);
CREATE OR REPLACE FUNCTION foo(last_server_time BIGINT, start_time BIGINT)
RETURNS TABLE
(
"uid" BIGINT,
"time" BIGINT,
"trajectory" TEXT
)
AS $BODY$
SELECT
A.uid AS uid,
A.max_time AS time,
ST_AsText
(
ST_Collect
(
ARRAY
(
SELECT
"position"
FROM
public."AircraftData"
WHERE
"uid" = A.uid
AND
"time" BETWEEN $1 AND $2
ORDER BY
"time" DESC
LIMIT 254
)
)
) AS trajectory
FROM
(
SELECT
M.uid AS uid,
M.max_time AS max_time
FROM
(
SELECT
"uid",
MAX("time") AS max_time
FROM
public."AircraftData"
WHERE
"time" BETWEEN $1 AND $2
GROUP BY
"uid"
) M
INNER JOIN
(
SELECT
"uid"
FROM
public."AircraftData"
WHERE
"time" > $2
GROUP BY
"uid"
) N
ON M.uid = N.uid
) A
INNER JOIN
public."AircraftData" B
ON
A.uid = B.uid AND A.max_time = B.time AND B.action != 2
ORDER BY
"time" ASC;
$BODY$ LANGUAGE SQL;
那么这个函数时间开销的原因是什么?
还有改进的余地吗?
更新 1
将标签STABLE
添加到SQL函数(4)将执行时间更改为4秒67毫秒
!
更新2
- 解释查询 1 的分析输出:
Sort (cost=719260.66..719260.69 rows=12 width=48) (actual time=1543.822..1543.930 rows=200 loops=1)
Sort Key: (max("AircraftData"."time"))
Sort Method: quicksort Memory: 1637kB
Buffers: shared hit=895329, temp read=398 written=401
-> Merge Join (cost=493439.89..719260.44 rows=12 width=48) (actual time=1267.917..1543.749 rows=200 loops=1)
Merge Cond: ("AircraftData".uid = "AircraftData_1".uid)
Buffers: shared hit=895329, temp read=398 written=401
-> Nested Loop (cost=91702.67..111690.17 rows=1 width=24) (actual time=272.476..298.257 rows=200 loops=1)
Buffers: shared hit=273686
-> Finalize GroupAggregate (cost=91702.23..92568.25 rows=2279 width=16) (actual time=272.460..274.158 rows=2100 loops=1)
Group Key: "AircraftData".uid
Buffers: shared hit=179393
-> Gather Merge (cost=91702.23..92511.27 rows=6837 width=16) (actual time=272.450..273.457 rows=6600 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=179393
-> Sort (cost=90702.19..90707.89 rows=2279 width=16) (actual time=249.231..249.334 rows=1650 loops=4)
Sort Key: "AircraftData".uid
Sort Method: quicksort Memory: 39kB
Worker 0: Sort Method: quicksort Memory: 195kB
Worker 1: Sort Method: quicksort Memory: 195kB
Worker 2: Sort Method: quicksort Memory: 195kB
Buffers: shared hit=179393
-> Partial HashAggregate (cost=90552.30..90575.09 rows=2279 width=16) (actual time=248.748..248.935 rows=1650 loops=4)
Group Key: "AircraftData".uid
Buffers: shared hit=179372
-> Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData" (cost=0.44..85716.66 rows=967127 width=16) (actual time=0.044..77.510 rows=760982 loops=4)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
Heap Fetches: 0
Buffers: shared hit=179372
-> Index Scan using AircraftData_time_index on "AircraftData" b (cost=0.44..8.37 rows=1 width=16) (actual time=0.011..0.011 rows=0 loops=2100)
Index Cond: ("time" = (max("AircraftData"."time")))
Filter: ((action <> 2) AND ("AircraftData".uid = uid))
Rows Removed by Filter: 50
Buffers: shared hit=94293
-> Group (cost=401737.23..402563.36 rows=2279 width=8) (actual time=954.077..988.991 rows=124598 loops=1)
Group Key: "AircraftData_1".uid
Buffers: shared hit=542041, temp read=398 written=401
-> Gather Merge (cost=401737.23..402546.27 rows=6837 width=8) (actual time=954.067..977.742 rows=178728 loops=1)
Workers Planned: 3
Workers Launched: 2
Buffers: shared hit=542041, temp read=398 written=401
-> Sort (cost=400737.19..400742.88 rows=2279 width=8) (actual time=944.755..949.248 rows=59650 loops=3)
Sort Key: "AircraftData_1".uid
Sort Method: external merge Disk: 1064kB
Worker 0: Sort Method: external merge Disk: 1056kB
Worker 1: Sort Method: external merge Disk: 1064kB
Buffers: shared hit=542041, temp read=398 written=401
-> Partial HashAggregate (cost=400587.30..400610.09 rows=2279 width=8) (actual time=922.174..930.673 rows=59687 loops=3)
Group Key: "AircraftData_1".uid
Buffers: shared hit=542027
-> Parallel Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1" (cost=0.44..388576.19 rows=4804444 width=8) (actual time=0.023..451.211 rows=4960127 loops=3)
Index Cond: ("time" > '1617657618453'::bigint)
Heap Fetches: 0
Buffers: shared hit=542027
SubPlan 1
-> Limit (cost=0.44..17080.15 rows=254 width=48) (actual time=0.008..1.169 rows=254 loops=200)
Buffers: shared hit=79602
-> Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2" (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.007..1.156 rows=254 loops=200)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
Buffers: shared hit=79602
Planning Time: 0.741 ms
JIT:
Functions: 64
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 7.142 ms, Inlining 163.028 ms, Optimization 205.236 ms, Emission 145.702 ms, Total 521.108 ms
Execution Time: 1549.206 ms
- 查询 4 的 auto_explain 输出:
Sort (cost=871918.05..871918.08 rows=12 width=48) (actual time=5482.659..5482.683 rows=200 loops=1)
Sort Key: (max("AircraftData"."time"))
Sort Method: quicksort Memory: 1637kB
Buffers: shared hit=855647
-> Nested Loop (cost=647721.19..871917.84 rows=12 width=48) (actual time=3269.758..5482.342 rows=200 loops=1)
Join Filter: ("AircraftData".uid = "AircraftData_1".uid)
Rows Removed by Join Filter: 12419447
Buffers: shared hit=855647
-> Nested Loop (cost=121017.25..140161.52 rows=1 width=24) (actual time=734.056..774.883 rows=200 loops=1)
Buffers: shared hit=269116
-> HashAggregate (cost=121016.81..121039.60 rows=2279 width=16) (actual time=732.927..733.488 rows=2100 loops=1)
Group Key: "AircraftData".uid
Buffers: shared hit=174823
-> Index Only Scan using AircraftData_time_uid_index on "AircraftData" (cost=0.44..106026.34 rows=2998095 width=16) (actual time=0.047..279.640 rows=3043927 loops=1)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint))
Heap Fetches: 0
Buffers: shared hit=174823
-> Index Scan using AircraftData_time_index on "AircraftData" b (cost=0.44..8.37 rows=1 width=16) (actual time=0.019..0.019 rows=0 loops=2100)
Index Cond: ("time" = (max("AircraftData"."time")))
Filter: ((action <> 2) AND ("AircraftData".uid = uid))
Rows Removed by Filter: 50
Buffers: shared hit=94293
-> HashAggregate (cost=526703.94..526726.73 rows=2279 width=8) (actual time=12.619..20.103 rows=62098 loops=200)
Group Key: "AircraftData_1".uid
Buffers: shared hit=506929
-> Index Only Scan using AircraftData_time_uid_index on "AircraftData" "AircraftData_1" (cost=0.44..489469.50 rows=14893775 width=8) (actual time=0.010..1258.534 rows=14880381 loops=1)
Index Cond: ("time" > '1617657618453'::bigint)
Heap Fetches: 0
Buffers: shared hit=506929
SubPlan 1
-> Limit (cost=0.44..17080.15 rows=254 width=48) (actual time=0.014..1.280 rows=254 loops=200)
Buffers: shared hit=79602
-> Index Scan Backward using AircraftData_time_uid_index on "AircraftData" "AircraftData_2" (cost=0.44..88492.18 rows=1316 width=48) (actual time=0.013..1.265 rows=254 loops=200)
Index Cond: (("time" >= '1617608651642'::bigint) AND ("time" <= '1617657618453'::bigint) AND (uid = "AircraftData".uid))
Buffers: shared hit=79602
Planning Time: 1.638 ms
JIT:
Functions: 28
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 7.741 ms, Inlining 23.622 ms, Optimization 98.256 ms, Emission 56.727 ms, Total 186.345 ms
Execution Time: 5491.240 ms
最佳答案
> 那么这个函数时间开销的原因是什么?
很难说,你没有透露太多关于你的设置的信息。大概有几个原因。如果您的表大(如您的执行时间所示),则 PL/pgSQL 函数的主要嫌疑人将是 RETURN QUERY
这不允许并行。即将推出的 PostgreSQL 14 删除了此限制。发行说明:
Allow plpgsql's
RETURN QUERY
to execute its query using parallelism (Tom Lane)
SQL 函数的糟糕性能是一个谜。也许是一个通用的查询计划,而你的普通 SQL 会得到一个定制的查询计划,因为它是使用常量过滤器而不是参数来计划的。
这通常不应该发生。但它可以。请参阅:
(通常,背后还有其他问题......)
>还有改进的空间吗?
是的。
SELECT uid, time
, ST_AsText(ST_Collect(ARRAY(
SELECT "position"
FROM "AircraftData"
WHERE uid = A.uid
AND time BETWEEN 1617608651642 AND 1617657618453
ORDER BY time DESC
LIMIT 254))) AS trajectory
FROM (
SELECT DISTINCT ON (uid)
uid, time, action
FROM "AircraftData"
WHERE time BETWEEN 1617608651642 AND 1617657618453
ORDER BY uid, time DESC, action = 2 -- get action <> 2 if any
) A
WHERE action <> 2
AND EXISTS (
SELECT FROM "AircraftData"
WHERE time > 1617657618453
AND uid = A.uid
);
等效,但有一个区别:此查询仅返回单个 trajectory
每uid
,如果有多于一行包含最新时间和action <> 2
,则原始文件将返回多个。我想,无论如何它应该是这样的(或者每个设计永远不会有超过一个 action <> 2
)。
关于DISTINCT ON
:
排序依据action = 2
有效是因为 false
排序在 true
之前(和 null
最后排序)按升序排列。因此任何具有 action <> 2
的同行首先排序并由 DISTINCT ON
选择- 这符合 uid
的条件在你的查询中。请参阅:
另外,过滤uid
其中还有一个较晚的条目,EXISTS
子查询通常比在某个时间戳等之后聚合所有行要便宜得多。
关于postgresql - 为什么包装的 PLPGSQL 函数中的查询比本身慢得多?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/68982025/