sql - 随机页面成本和计划

标签 sql optimization postgresql

查询(见下文)使用气象站实际拥有数据的日期从城市给定半径范围内的气象站提取气候数据。查询使用表的唯一索引,相当有效:

CREATE UNIQUE INDEX measurement_001_stc_idx
  ON climate.measurement_001
  USING btree
  (station_id, taken, category_id);

减少 random_page_cost 的服务器配置值从 2.0 到 1.1 在给定范围内(接近一个数量级)有巨大的性能改进,因为它向 PostgreSQL 建议它应该使用索引。虽然结果现在在 5 秒内返回(从 85 秒左右下降),但问题行仍然存在。将查询的结束日期增加一年会导致全表扫描:

sc.taken_start >= '1900-01-01'::date AND
sc.taken_end <= '1997-12-31'::date AND

我如何说服 PostgreSQL 使用索引而不考虑两个日期之间的年份? (对 4300 万行进行全表扫描可能不是最佳计划。)在查询下方找到 EXPLAIN ANALYZE 结果。

谢谢!

查询

  SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.city c,
    climate.station s,
    climate.station_category sc,
    climate.measurement m
  WHERE
    c.id = 5182 AND
    earth_distance(
      ll_to_earth(c.latitude_decimal,c.longitude_decimal),
      ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <= 30 AND
    s.elevation BETWEEN 0 AND 3000 AND
    s.applicable = TRUE AND
    sc.station_id = s.id AND
    sc.category_id = 1 AND
    sc.taken_start >= '1900-01-01'::date AND
    sc.taken_end <= '1996-12-31'::date AND
    m.station_id = s.id AND
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND
    m.category_id = sc.category_id
  GROUP BY
    extract(YEAR FROM m.taken)
  ORDER BY
    extract(YEAR FROM m.taken)

1900 到 1996 年:指数

"Sort  (cost=1348597.71..1348598.21 rows=200 width=12) (actual time=2268.929..2268.935 rows=92 loops=1)"
"  Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))"
"  Sort Method:  quicksort  Memory: 32kB"
"  ->  HashAggregate  (cost=1348586.56..1348590.06 rows=200 width=12) (actual time=2268.829..2268.886 rows=92 loops=1)"
"        ->  Nested Loop  (cost=0.00..1344864.01 rows=744510 width=12) (actual time=0.807..2084.206 rows=134893 loops=1)"
"              Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (sc.station_id = m.station_id))"
"              ->  Nested Loop  (cost=0.00..12755.07 rows=1220 width=18) (actual time=0.502..521.937 rows=23 loops=1)"
"                    Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)"
"                    ->  Index Scan using city_pkey1 on city c  (cost=0.00..2.47 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1)"
"                          Index Cond: (id = 5182)"
"                    ->  Nested Loop  (cost=0.00..9907.73 rows=3659 width=34) (actual time=0.014..28.937 rows=3458 loops=1)"
"                          ->  Seq Scan on station_category sc  (cost=0.00..970.20 rows=3659 width=14) (actual time=0.008..10.947 rows=3458 loops=1)"
"                                Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1996-12-31'::date) AND (category_id = 1))"
"                          ->  Index Scan using station_pkey1 on station s  (cost=0.00..2.43 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=3458)"
"                                Index Cond: (s.id = sc.station_id)"
"                                Filter: (s.applicable AND (s.elevation >= 0) AND (s.elevation <= 3000))"
"              ->  Append  (cost=0.00..1072.27 rows=947 width=18) (actual time=6.996..63.199 rows=5865 loops=23)"
"                    ->  Seq Scan on measurement m  (cost=0.00..25.00 rows=6 width=22) (actual time=0.000..0.000 rows=0 loops=23)"
"                          Filter: (m.category_id = 1)"
"                    ->  Bitmap Heap Scan on measurement_001 m  (cost=20.79..1047.27 rows=941 width=18) (actual time=6.995..62.390 rows=5865 loops=23)"
"                          Recheck Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))"
"                          ->  Bitmap Index Scan on measurement_001_stc_idx  (cost=0.00..20.55 rows=941 width=0) (actual time=5.775..5.775 rows=5865 loops=23)"
"                                Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))"
"Total runtime: 2269.264 ms"

1900 到 1997:全表扫描

"Sort  (cost=1370192.26..1370192.76 rows=200 width=12) (actual time=86165.797..86165.809 rows=94 loops=1)"
"  Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))"
"  Sort Method:  quicksort  Memory: 32kB"
"  ->  HashAggregate  (cost=1370181.12..1370184.62 rows=200 width=12) (actual time=86165.654..86165.736 rows=94 loops=1)"
"        ->  Hash Join  (cost=4293.60..1366355.81 rows=765061 width=12) (actual time=534.786..85920.007 rows=139721 loops=1)"
"              Hash Cond: (m.station_id = sc.station_id)"
"              Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end))"
"              ->  Append  (cost=0.00..867005.80 rows=43670150 width=18) (actual time=0.009..79202.329 rows=43670079 loops=1)"
"                    ->  Seq Scan on measurement m  (cost=0.00..25.00 rows=6 width=22) (actual time=0.001..0.001 rows=0 loops=1)"
"                          Filter: (category_id = 1)"
"                    ->  Seq Scan on measurement_001 m  (cost=0.00..866980.80 rows=43670144 width=18) (actual time=0.008..73312.008 rows=43670079 loops=1)"
"                          Filter: (category_id = 1)"
"              ->  Hash  (cost=4277.93..4277.93 rows=1253 width=18) (actual time=534.704..534.704 rows=25 loops=1)"
"                    ->  Nested Loop  (cost=847.87..4277.93 rows=1253 width=18) (actual time=415.837..534.682 rows=25 loops=1)"
"                          Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)"
"                          ->  Index Scan using city_pkey1 on city c  (cost=0.00..2.47 rows=1 width=16) (actual time=0.012..0.014 rows=1 loops=1)"
"                                Index Cond: (id = 5182)"
"                          ->  Hash Join  (cost=847.87..1352.07 rows=3760 width=34) (actual time=6.427..35.107 rows=3552 loops=1)"
"                                Hash Cond: (s.id = sc.station_id)"
"                                ->  Seq Scan on station s  (cost=0.00..367.25 rows=7948 width=20) (actual time=0.004..23.529 rows=7949 loops=1)"
"                                      Filter: (applicable AND (elevation >= 0) AND (elevation <= 3000))"
"                                ->  Hash  (cost=800.87..800.87 rows=3760 width=14) (actual time=6.416..6.416 rows=3552 loops=1)"
"                                      ->  Bitmap Heap Scan on station_category sc  (cost=430.29..800.87 rows=3760 width=14) (actual time=2.316..5.353 rows=3552 loops=1)"
"                                            Recheck Cond: (category_id = 1)"
"                                            Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1997-12-31'::date))"
"                                            ->  Bitmap Index Scan on station_category_station_category_idx  (cost=0.00..429.35 rows=6376 width=0) (actual time=2.268..2.268 rows=6339 loops=1)"
"                                                  Index Cond: (category_id = 1)"
"Total runtime: 86165.936 ms"

最佳答案

看起来 Postgres 高估了城市 5182 附近有多少个车站。它认为有 1220 个,但实际上只有 23 个。

您可以通过两个查询强制首先获取电台,像这样(未测试,可能需要调整):

start transaction;
create temporary table s(id int);
insert into s
  select id from
    climate.city c,
    climate.station s
  where
    c.id = 5182 AND
    earth_distance(
      ll_to_earth(c.latitude_decimal,c.longitude_decimal),
      ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <= 30 AND
    s.elevation BETWEEN 0 AND 3000 AND
    s.applicable = TRUE;
analyze s;

SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.station_category sc,
    climate.measurement m,
    s
  WHERE
    sc.category_id = 1 AND
    sc.taken_start >= '1900-01-01'::date AND
    sc.taken_end <= '1996-12-31'::date AND
    m.station_id = sc.station_id AND
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND
    m.category_id = sc.category_id AND
    sc.station_id = s.id
  GROUP BY
    extract(YEAR FROM m.taken)
  ORDER BY
    extract(YEAR FROM m.taken);
rollback;

您还可以为此查询设置 enable_seqscan=off。这将迫使 Postgres 不惜一切代价避免顺序扫描。

关于sql - 随机页面成本和计划,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/2902399/

相关文章:

sql - 在多个条件下返回左外连接中的最后一行

python - 在 PostgreSQL 中计算给定 GPS 坐标的日出和日落时间

PHP 选择日期时间和文本

sql - 如何在 SQL Server 和 Oracle SQL 中基于固定字符选择直到第 N 个空格的子字符串

c# - 从 datagridview 写入/更新本地数据库

c++ - 我如何在 std::atomic<T> 上实现一个简单的自旋锁,以便编译器不会优化它?

c++ - 使程序运行更快的 Visual C++ 9 编译器选项

html - 当有一个包含太多 dom 元素的列表时,页面会因选项卡而卡住

php - 我应该如何从 PostgreSQL 下的一系列时间段中找到列的总和?

sql - 我如何在互联网上找到值得信赖的数据库帮助?