I have a query with ORDER BY and LIMIT to support a paginated interface:
SELECT segment_members.id AS t0_r0,
segment_members.segment_id AS t0_r1,
segment_members.account_id AS t0_r2,
segment_members.score AS t0_r3,
segment_members.created_at AS t0_r4,
segment_members.updated_at AS t0_r5,
segment_members.posts_count AS t0_r6,
accounts.id AS t1_r0,
accounts.platform AS t1_r1,
accounts.username AS t1_r2,
accounts.created_at AS t1_r3,
accounts.updated_at AS t1_r4,
accounts.remote_id AS t1_r5,
accounts.name AS t1_r6,
accounts.language AS t1_r7,
accounts.description AS t1_r8,
accounts.timezone AS t1_r9,
accounts.profile_image_url AS t1_r10,
accounts.post_count AS t1_r11,
accounts.follower_count AS t1_r12,
accounts.following_count AS t1_r13,
accounts.uri AS t1_r14,
accounts.location AS t1_r15,
accounts.favorite_count AS t1_r16,
accounts.raw AS t1_r17,
accounts.followers_completed_at AS t1_r18,
accounts.followings_completed_at AS t1_r19,
accounts.followers_started_at AS t1_r20,
accounts.followings_started_at AS t1_r21,
accounts.profile_fetched_at AS t1_r22,
accounts.managed_source_id AS t1_r23
FROM segment_members
INNER JOIN accounts ON accounts.id = segment_members.account_id
WHERE segment_members.segment_id = 1
ORDER BY accounts.follower_count ASC LIMIT 20
OFFSET 0;
Here are the indexes on the tables:
accounts
"accounts_pkey" PRIMARY KEY, btree (id)
"index_accounts_on_remote_id_and_platform" UNIQUE, btree (remote_id, platform)
"index_accounts_on_description" btree (description)
"index_accounts_on_favorite_count" btree (favorite_count)
"index_accounts_on_follower_count" btree (follower_count)
"index_accounts_on_following_count" btree (following_count)
"index_accounts_on_lower_username_and_platform" btree (lower(username::text), platform)
"index_accounts_on_post_count" btree (post_count)
"index_accounts_on_profile_fetched_at_and_platform" btree (profile_fetched_at, platform)
"index_accounts_on_username" btree (username)
segment_members
"segment_members_pkey" PRIMARY KEY, btree (id)
"index_segment_members_on_segment_id_and_account_id" UNIQUE, btree (segment_id, account_id)
"index_segment_members_on_account_id" btree (account_id)
"index_segment_members_on_segment_id" btree (segment_id)
On my development and staging databases, the query plan looks like this, and the query executes very fast:
Limit (cost=4802.15..4802.20 rows=20 width=2086)
  -> Sort (cost=4802.15..4803.20 rows=421 width=2086)
       Sort Key: accounts.follower_count
       -> Nested Loop (cost=20.12..4790.95 rows=421 width=2086)
            -> Bitmap Heap Scan on segment_members (cost=19.69..1244.24 rows=421 width=38)
                 Recheck Cond: (segment_id = 1)
                 -> Bitmap Index Scan on index_segment_members_on_segment_id_and_account_id (cost=0.00..19.58 rows=421 width=0)
                      Index Cond: (segment_id = 1)
            -> Index Scan using accounts_pkey on accounts (cost=0.43..8.41 rows=1 width=2048)
                 Index Cond: (id = segment_members.account_id)
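In production, however, the query plan is the following, and the query runs forever (several minutes, until it hits the statement timeout):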
Limit (cost=0.86..25120.72 rows=20 width=2130)
  -> Nested Loop (cost=0.86..4614518.64 rows=3674 width=2130)
       -> Index Scan using index_accounts_on_follower_count on accounts (cost=0.43..2779897.53 rows=3434917 width=2092)
       -> Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members (cost=0.43..0.52 rows=1 width=38)
            Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
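accounts has about 6 million rows in staging and 3 million in production. segment_members has about 300k rows in staging and 4 million in production. Is the difference in table sizes what causes the different choice of query plan? Is there any way to get Postgres to use the faster query plan in production?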
Update: here is the EXPLAIN ANALYZE from the slow production server:
Limit (cost=0.86..22525.66 rows=20 width=2127) (actual time=173.148..187568.247 rows=20 loops=1)
  -> Nested Loop (cost=0.86..4654749.92 rows=4133 width=2127) (actual time=173.141..187568.193 rows=20 loops=1)
       -> Index Scan using index_accounts_on_follower_count on accounts (cost=0.43..2839731.81 rows=3390197 width=2089) (actual time=0.110..180374.279 rows=1401278 loops=1)
       -> Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members (cost=0.43..0.53 rows=1 width=38) (actual time=0.003..0.003 rows=0 loops=1401278)
            Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
Total runtime: 187568.318 ms
(6 rows)
Best answer
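Either your table statistics are not up to date, or the two queries you present are very different. The second one expects to retrieve about 3.4 million rows (rows=3434917). ORDER BY / LIMIT 20 is forced to sort all 3.4 million rows to find the top 20, which is going to be very expensive, unless you have a matching index.
The first query plan expects to sort 421 rows. Not even close. Different query plans are no surprise.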
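The plan choice can be traced through how the planner costs a LIMIT node. With OFFSET 0, it takes the child's startup cost and adds the child's remaining cost scaled by the fraction of rows it expects to need, which amounts to assuming the matching rows are spread evenly along the child's output order. Here $C_{\text{start}}$ and $C_{\text{total}}$ are the child node's startup and total cost and $\hat{N}$ its estimated row count; the numbers are taken from the plans in the question:

$$
C_{\text{Limit}} = C_{\text{start}} + \left(C_{\text{total}} - C_{\text{start}}\right)\frac{20}{\hat{N}}
$$

$$
\text{production (EXPLAIN ANALYZE): } 0.86 + (4654749.92 - 0.86)\cdot\frac{20}{4133} \approx 22525.66
$$

$$
\text{development (Sort child): } 4802.15 + (4803.20 - 4802.15)\cdot\frac{20}{421} \approx 4802.20
$$

A Sort has to consume its whole input before returning the first row, so its startup cost is almost its total cost and the LIMIT saves little. The nested loop over index_accounts_on_follower_count, by contrast, looks cheap because the planner expects to stop early. Under the same even-spread assumption, the number of accounts rows it expected to read is:

$$
3390197 \cdot \frac{20}{4133} \approx 16406 \quad \text{expected, vs. } 1401278 \text{ actually read}
$$

That is roughly 85 times further through the index than predicted. One plausible reading (an assumption, not something the plans prove) is that the members of segment 1 are not spread evenly over the follower_count order, so the early stop the cost model counts on never happens.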
It would be interesting to see the output of EXPLAIN ANALYZE, not just EXPLAIN. (The second query is expensive!)
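A sketch of how that output can be captured. The BUFFERS option additionally reports shared-buffer hits and reads per plan node. EXPLAIN ANALYZE actually executes the statement, so on the slow server it stops at the statement timeout unless the timeout is raised for the session; the 10-minute value below is an arbitrary example. The select list is abbreviated to `*` here; use the original column list to reproduce the exact plan.

```sql
-- EXPLAIN ANALYZE runs the query for real; raise the timeout for this session only
-- (10min is an arbitrary example value).
SET statement_timeout = '10min';

EXPLAIN (ANALYZE, BUFFERS)
SELECT segment_members.*, accounts.*
FROM segment_members
INNER JOIN accounts ON accounts.id = segment_members.account_id
WHERE segment_members.segment_id = 1
ORDER BY accounts.follower_count ASC
LIMIT 20 OFFSET 0;

-- Back to the configured default for this session.
RESET statement_timeout;
```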
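It very much depends on how many account_id values there are per segment_id. If segment_id is not selective, the query cannot be fast. Your only other option is a MATERIALIZED VIEW of the top n rows per segment_id, plus an appropriate regime to keep it up to date.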
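A minimal sketch of that approach, assuming PostgreSQL 9.3 or later (where MATERIALIZED VIEW exists) and assuming nobody pages beyond the first 100 rows of a segment. The view name `segment_top_members`, the index name and the cutoff of 100 are all hypothetical:

```sql
-- Hypothetical view: rank each segment's members by follower_count, keep the top 100.
CREATE MATERIALIZED VIEW segment_top_members AS
SELECT segment_id, account_id, rn
FROM (
  SELECT sm.segment_id,
         sm.account_id,
         -- a.id breaks ties so the ranking is deterministic
         row_number() OVER (PARTITION BY sm.segment_id
                            ORDER BY a.follower_count, a.id) AS rn
  FROM segment_members sm
  JOIN accounts a ON a.id = sm.account_id
) ranked
WHERE rn <= 100;

-- rn is unique within a segment, so each page is a short range scan on this index.
CREATE UNIQUE INDEX segment_top_members_segment_id_rn
  ON segment_top_members (segment_id, rn);

-- Page 1 (ranks 1..20) of segment 1, joined back to accounts for the full rows.
SELECT a.*
FROM segment_top_members t
JOIN accounts a ON a.id = t.account_id
WHERE t.segment_id = 1
  AND t.rn BETWEEN 1 AND 20
ORDER BY t.rn;

-- Recompute the whole ranking, e.g. from a scheduled job.
REFRESH MATERIALIZED VIEW segment_top_members;
```

The ordering reflects follower counts as of the last refresh, and each refresh recomputes the ranking over the full join. A plain REFRESH blocks reads of the view while it runs; on 9.4 or later, REFRESH MATERIALIZED VIEW CONCURRENTLY avoids that and requires a unique index like the one above.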
If your statistics are not up to date, just run ANALYZE on both tables and try again.
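Whether stale statistics are plausible can be checked before re-analyzing. The statistics views record when each table was last analyzed, either manually or by autovacuum. A sketch:

```sql
-- When were planner statistics last refreshed, and how many live rows are there?
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('accounts', 'segment_members');

-- Refresh statistics (separate statements also work on versions before PostgreSQL 11).
ANALYZE accounts;
ANALYZE segment_members;
```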
It may help to increase the statistics target for selected columns:
ALTER TABLE segment_members ALTER segment_id SET STATISTICS 1000;
ALTER TABLE segment_members ALTER account_id SET STATISTICS 1000;
ALTER TABLE accounts ALTER id SET STATISTICS 1000;
ALTER TABLE accounts ALTER follower_count SET STATISTICS 1000;
ANALYZE segment_members(segment_id, account_id);
ANALYZE accounts (id, follower_count);
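After raising the target and re-analyzing, pg_stats shows what the planner now knows about segment_id. For example, it shows whether segment 1 appears among the most common values and with what frequency. Comparing that with a real count helps judge the row estimates in the plans. A sketch:

```sql
-- Column statistics the planner uses to estimate "segment_id = 1".
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'segment_members'
  AND attname = 'segment_id';

-- The actual number of members in segment 1, for comparison.
SELECT count(*) FROM segment_members WHERE segment_id = 1;
```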
Better index
In addition to the existing UNIQUE constraint index_segment_members_on_segment_id_and_account_id on segment_members, I suggest a multicolumn index on accounts:
-- Named differently because index_accounts_on_follower_count already exists on accounts.
CREATE INDEX index_accounts_on_id_and_follower_count ON accounts (id, follower_count);
Again, run ANALYZE after creating the index.
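On a busy production table, a plain CREATE INDEX blocks writes to accounts while it builds. A sketch of the non-blocking variant follows. The index name is hypothetical, chosen because index_accounts_on_follower_count already exists:

```sql
-- Builds without blocking INSERT/UPDATE/DELETE on accounts;
-- CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY index_accounts_on_id_and_follower_count
  ON accounts (id, follower_count);

-- Refresh statistics so the planner costs the new index correctly.
ANALYZE accounts;
```

Afterwards, re-run EXPLAIN on the query to see whether the planner picks it up.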
Some indexes useless?
All the other indexes in your question are irrelevant for this query. They may or may not be useful for other purposes.
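This index is 100% dead freight; drop it. The UNIQUE index on (segment_id, account_id) has segment_id as its leading column, so it already serves every lookup this one could:
"index_segment_members_on_segment_id" btree (segment_id)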
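Dropping it could look like this. The sketch assumes PostgreSQL 9.2 or later for CONCURRENTLY, which avoids a lock that would block queries on segment_members while the index is removed:

```sql
-- segment_id lookups are still served by the UNIQUE (segment_id, account_id) index.
-- CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY index_segment_members_on_segment_id;
```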
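This one is probably useless:
"index_accounts_on_description" btree (description)
because "description" is typically free text that is hardly ever used to sort rows or in a WHERE condition with a suitable operator. But that's just an educated guess.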
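Whether an index earns its keep can also be checked empirically. The statistics collector counts how often each index has been scanned since statistics were last reset. A sketch follows. Treat an idx_scan of zero as a hint, not proof: the counters may have been reset recently, the index may back a rarely run job, and UNIQUE or primary-key indexes enforce constraints regardless of how often they are scanned.

```sql
-- Scan counts and on-disk size for every index on the two tables,
-- least-used (and then largest) first.
SELECT relname,
       indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname IN ('accounts', 'segment_members')
ORDER BY idx_scan, pg_relation_size(indexrelid) DESC;
```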
A similar question, "postgresql - Postgres choosing a suboptimal query plan in production", can be found on Stack Overflow: https://stackoverflow.com/questions/25899827/