sql - PostgreSQL 和用于分析频繁大型插入的性能成本的工具

我正在设计一个摄取流程，将许多行插入到多个表中。我们在 Azure 的 PostgreSQL 灵活服务器上运行 PostgreSQL。目前看来，该服务器的写入/插入速度是提高有效负载吞吐量的瓶颈(通过队列使用 REST API 处理 JSON 有效负载，目标完成率为每分钟 100 个有效负载)。

是否有任何用于分析 INSERT 性能的 PostgreSQL/数据库工具，类似于我们如何使用 EXPLAIN ANALYZE 分析 SELECT 性能？我知道索引和外键会对 INSERT 速度产生影响。我想每个表的实际列类型以及插入字符串的长度也会产生影响。

以下是 2 个较大表的一些测量插入速度。我们在 Python 中使用 psycopg2 包批量执行值“INSERT INTO ... VALUES”:

资源表，更好的性能和对行数不太敏感。

"Resource Insert: 35 ms, 120 rows, 0.2916666666666667 ms/row" # 1 Slide model
"Resource Insert: 36 ms, 120 rows, 0.3 ms/row"
"Resource Insert: 47 ms, 120 rows, 0.39166666666666666 ms/row"
"Resource Insert: 99 ms, 381 rows, 0.25984251968503935 ms/row" # 10 Slides
"Resource Insert: 97 ms, 381 rows, 0.2545931758530184 ms/row"
"Resource Insert: 117 ms, 381 rows, 0.30708661417322836 ms/row"
"Resource Insert: 654 ms, 2991 rows, 0.21865596790371114 ms/row" # 100 Slides
"Resource Insert: 683 ms, 2991 rows, 0.22835172183216315 ms/row"
"Resource Insert: 665 ms, 2991 rows, 0.2223336676696757 ms/row"
"Resource Insert: 5604 ms, 29091 rows, 0.19263689800969372 ms/row" # 1000 Slides
"Resource Insert: 5498 ms, 29091 rows, 0.1889931593963769 ms/row"
"Resource Insert: 5021 ms, 29091 rows, 0.17259633563645113 ms/row"
"Resource Insert: 4743 ms, 29091 rows, 0.16304011549963907 ms/row"
"Resource Insert: 8428 ms, 29091 rows, 0.2897115946512667 ms/row"
"Resource Insert: 7788 ms, 29091 rows, 0.26771166340105185 ms/row"
"Resource Insert: 7367 ms, 29091 rows, 0.25323983362551994 ms/row"

资源值表，插入性能较差，行数较高时扩展性较差

"Resource Val Insert: 378 ms, 113 rows, 3.3451327433628317 ms/row" ~ 1 Slide + Acc,Pat,Cli,TestOrder,Block
"Resource Val Insert: 365 ms, 113 rows, 3.230088495575221 ms/row" 
"Resource Val Insert: 356 ms, 113 rows, 3.150442477876106 ms/row"
"Resource Val Insert: 422 ms, 113 rows, 3.734513274336283 ms/row"
"Resource Val Insert: 439 ms, 113 rows, 3.8849557522123894 ms/row"
"Resource Val Insert: 354 ms, 113 rows, 3.1327433628318584 ms/row"
"Resource Val Insert: 1498 ms, 365 rows, 4.104109589041096 ms/row" # 10 Slides
"Resource Val Insert: 1509 ms, 365 rows, 4.134246575342465 ms/row"
"Resource Val Insert: 1568 ms, 365 rows, 4.295890410958904 ms/row"
"Resource Val Insert: 1607 ms, 365 rows, 4.402739726027397 ms/row"
"Resource Val Insert: 1553 ms, 365 rows, 4.254794520547946 ms/row"
"Resource Val Insert: 14187 ms, 2885 rows, 4.917504332755633 ms/row" # 100 Slides
"Resource Val Insert: 14239 ms, 2885 rows, 4.935528596187175 ms/row"
"Resource Val Insert: 13494 ms, 2885 rows, 4.677296360485268 ms/row"
"Resource Val Insert: 14489 ms, 2885 rows, 5.0221837088388215 ms/row"
"Resource Val Insert: 14044 ms, 2885 rows, 4.867937608318891 ms/row"
"Resource Val Insert: 131665 ms, 28085 rows, 4.688089727612605 ms/row" # 1000 Slides
"Resource Val Insert: 133214 ms, 28085 rows, 4.743243724408047 ms/row"
"Resource Val Insert: 132070 ms, 28085 rows, 4.7025102367812 ms/row"

这是相关表的 DDL，这可能是一个因素:平均性能插入 - 资源表

                                                    Table "public.resource"
      Column       |            Type             | Collation | Nullable |   Default    | Storage  | Stats target | Description
-------------------+-----------------------------+-----------+----------+--------------+----------+--------------+-------------
 resource_id       | integer                     |           | not null |              | plain    |              |
 uuid              | uuid                        |           | not null |              | plain    |              |
 name              | character varying           |           |          |              | extended |              |
 url               | character varying           |           |          |              | extended |              |
 desc              | character varying           |           |          |              | extended |              |
 barcode           | character varying           |           |          |              | extended |              |
 barcode_type      | character varying           |           |          |              | extended |              |
 meta              | jsonb                       |           |          |              | extended |              |
 cls               | integer                     |           | not null |              | plain    |              |
 archived          | boolean                     |           | not null |              | plain    |              |
 view_template     | character varying           |           | not null |              | extended |              |
 created_timestamp | timestamp without time zone |           | not null |              | plain    |              |
 updated_timestamp | timestamp without time zone |           | not null |              | plain    |              |
 owner_resource_id | integer                     |           |          |              | plain    |              |
 tenant            | character varying           |           | not null | CURRENT_USER | extended |              |
Indexes:
    "resource_pk" PRIMARY KEY, btree (resource_id, tenant)
    "_uuid_tenant_uc" UNIQUE CONSTRAINT, btree (uuid, tenant)
    "param_group_unique_name" UNIQUE, btree (name) WHERE cls = 600
    "resource_resource_id_key" UNIQUE CONSTRAINT, btree (resource_id)
    "ix_resource_barcode" btree (barcode)
    "ix_resource_desc" btree ("desc")
    "ix_resource_name" btree (name)
    "ix_resource_uuid" btree (uuid)
    "lab7_role_unique_name" btree (lower(name::text)) WHERE cls = 221 OR cls = 222 OR cls = 220
    "resource_created_timestamp_idx" btree (created_timestamp DESC)
    "resource_updated_timestamp_idx" btree (updated_timestamp DESC)
Check constraints:
    "resource_check" CHECK (resource_id <> owner_resource_id)
Foreign-key constraints:
    "resource_owner_resource_id_fkey" FOREIGN KEY (owner_resource_id) REFERENCES resource(resource_id)
Referenced by:
    TABLE "address" CONSTRAINT "address_address_id_fkey" FOREIGN KEY (address_id) REFERENCES resource(resource_id)
    # .... and many more references! This is our most basic table that acts sort of like an abstract base class

表插入性能不佳

\d+ resource_val
                                                Table "public.resource_val"
         Column          |       Type        | Collation | Nullable |   Default    | Storage  | Stats target | Description
-------------------------+-------------------+-----------+----------+--------------+----------+--------------+-------------
 resource_val_id         | integer           |           | not null |              | plain    |              |
 resource_var_id         | integer           |           |          |              | plain    |              |
 value_group_id          | integer           |           |          |              | plain    |              |
 sort_id                 | integer           |           |          |              | plain    |              |
 value                   | character varying |           |          |              | extended |              |
 expression              | character varying |           |          |              | extended |              |
 dropdown                | jsonb             |           |          |              | extended |              |
 bound_resource_id       | integer           |           |          |              | plain    |              |
 error_msg               | character varying |           |          |              | extended |              |
 dropdown_error_msg      | character varying |           |          |              | extended |              |
 tenant                  | character varying |           | not null | CURRENT_USER | extended |              |
 step_instance_sample_id | integer           |           |          |              | plain    |              |
Indexes:
    "resource_val_pkey" PRIMARY KEY, btree (resource_val_id)
    "idx_res_val_hash" hash (value)
    "ix_r_val_bound_resource_id" btree (bound_resource_id)
    "ix_r_val_sis_id" btree (step_instance_sample_id)
Foreign-key constraints:
    "resource_val_bound_resource_id_fkey" FOREIGN KEY (bound_resource_id) REFERENCES resource(resource_id)
    "resource_val_resource_val_id_fkey" FOREIGN KEY (resource_val_id) REFERENCES resource(resource_id)
    "resource_val_resource_var_id_fkey" FOREIGN KEY (resource_var_id) REFERENCES resource_var(resource_var_id)
    "resource_val_step_instance_sample_id_fkey" FOREIGN KEY (step_instance_sample_id) REFERENCES step_instance_sample(association_id)
    "resource_val_value_group_id_fkey" FOREIGN KEY (value_group_id) REFERENCES resource_val_group(id)

最佳答案

EXPLAIN (ANALYZE, BUFFERS) 适用于 INSERT。它对他们来说不如对 SELECT 有用，但它仍然是您拥有的最佳起点。它至少单独划分了外键约束的时间，尽管它没有划分索引维护时间。

你的“规模很差”对我来说并不真实。性能下降很小，并且不一致。例如，“100 张幻灯片”的大多数时间比“1000 张幻灯片”的任何时间都要糟糕。假设演示文稿按时间顺序排列，随着时间的推移，可能会发生其他情况，例如用完云提供商的“积分”或操作系统缓存被脏页堵塞，或者只是索引增长到可以达到的程度不再保留在缓存中(但这种差异会比您看到的要大得多)。为了令人信服，您需要运行更多的测试，并且批量大小按随机顺序而不是按可预测的顺序进行。如果差异(超过批量大小)是真实的，那么它是否足够大到值得担心？

现在插入的扩展性很差，但这与插入的表索引的大小有关，而不是与插入的批处理的大小有关。当不良行为出现时，就像从悬崖上掉下来一样，不仅仅是性能下降 20%。

通常，调查此类事情的最佳方法是假设仅查看 EXPLAIN (ANALYZE, BUFFERS) 还不够，只需删除一些约束/索引并看看会发生什么。

关于sql - PostgreSQL 和用于分析频繁大型插入的性能成本的工具，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/75656735/

sql - PostgreSQL 和用于分析频繁大型插入的性能成本的工具

上一篇：azure - 使用 KQL 添加最后一行以根据列值显示总计

下一篇：c# - 在 Azure SQL 数据库中插入/更新/删除新记录时将更新推送到 SignalR 流中