PostgreSQL query takes forever

Tags: postgresql postgis

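I have the query below, which takes a very long time (days). Any help improving it would be great. The server has 2 Xeon E5-2630 v3 CPUs (8 cores and 16 threads each), 128 GB RAM, and SSD disks, and runs Postgres 11.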

        SELECT distinct on (location_signals.p_key)  ooh_data.*, 
        location_signals."Lat" AS did_lat, location_signals."Lon" As did_lon,  location_signals.device,
        location_signals.timestamp AS did_timestamp, location_signals.p_key AS did_p_key
        FROM ooh_data , 
        location_signals 
        WHERE ST_DWithin( 
           ST_SetSRID(ST_MakePoint(ooh_data.offset_lon, ooh_data.offset_lat), 4326)::geography,
           ST_SetSRID(ST_MakePoint(location_signals."Lon", location_signals."Lat"), 4326)::geography,
           100
        ) 
        ORDER BY location_signals.p_key;

location_signals has 300 million rows; ooh_data has 6,000 rows.

Here is the EXPLAIN ANALYSE output for the same join with the selection heavily restricted:

explain analyse        SELECT  distinct on (location_signals.p_key)  ooh_data.*
        FROM ooh_data , 
        location_signals 
        WHERE ST_DWithin( 
        ST_SetSRID(ST_MakePoint(ooh_data.offset_lon, ooh_data.offset_lat), 4326)::geography,
        ST_SetSRID(ST_MakePoint(location_signals."Lon", location_signals."Lat"), 4326)::geography,
        100
        ) 
        AND ooh_data.p_key > 5700
        AND location_signals.timestamp > '2019-05-31 23:57:00'
        ORDER BY location_signals.p_key;

Result:

QUERY PLAN
Unique  (cost=100551.80..100551.80 rows=1 width=84) (actual time=305.190..305.193 rows=2 loops=1)
  ->  Sort  (cost=100551.80..100551.80 rows=1 width=84) (actual time=305.189..305.190 rows=3 loops=1)
        Sort Key: location_signals.p_key
        Sort Method: quicksort  Memory: 25kB
        ->  Gather  (cost=1029.18..100551.79 rows=1 width=84) (actual time=305.180..310.644 rows=3 loops=1)
              Workers Planned: 1
              Workers Launched: 1
              ->  Nested Loop  (cost=29.18..99551.69 rows=1 width=84) (actual time=195.851..277.511 rows=2 loops=2)
                    Join Filter: (((st_setsrid(st_makepoint(ooh_data.offset_lon, ooh_data.offset_lat), 4326))::geography && _st_expand((st_setsrid(st_makepoint(location_signals."Lon", location_signals."Lat"), 4326))::geography, '100'::double precision)) AND ((st_setsrid(st_makepoint(location_signals."Lon", location_signals."Lat"), 4326))::geography && _st_expand((st_setsrid(st_makepoint(ooh_data.offset_lon, ooh_data.offset_lat), 4326))::geography, '100'::double precision)) AND _st_dwithin((st_setsrid(st_makepoint(ooh_data.offset_lon, ooh_data.offset_lat), 4326))::geography, (st_setsrid(st_makepoint(location_signals."Lon", location_signals."Lat"), 4326))::geography, '100'::double precision, true))
                    Rows Removed by Join Filter: 139156
                    ->  Parallel Bitmap Heap Scan on location_signals  (cost=28.89..2814.14 rows=1482 width=24) (actual time=1.144..10.886 rows=1288 loops=2)
                          Recheck Cond: ("timestamp" > '2019-05-31 23:57:00'::timestamp without time zone)
                          Heap Blocks: exact=1396
                          ->  Bitmap Index Scan on idx_timestamp  (cost=0.00..28.27 rows=2519 width=0) (actual time=1.355..1.356 rows=2577 loops=1)
                                Index Cond: ("timestamp" > '2019-05-31 23:57:00'::timestamp without time zone)
                    ->  Index Scan using ooh_data_pkey on ooh_data  (cost=0.28..5.35 rows=107 width=76) (actual time=0.004..0.025 rows=108 loops=2577)
                          Index Cond: (p_key > 5700)
Planning Time: 0.424 ms
Execution Time: 310.738 ms

Any help is appreciated, thanks.

Best answer

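I would start by creating a geography column in both tables and storing the points there. Then add a spatial index to both tables: https://postgis.net/workshops/postgis-intro/indexing.html Join on these indexed points instead of points built on the fly; that should be much faster.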

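Without an index this is a full cross join, which is very expensive: every one of the 6,000 ooh_data rows is compared against every location_signals row. With the indexes it should run much faster, although it may still be a heavy query for a single machine.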
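
The steps above could look roughly like the following sketch. The column name `geog` and the index names are illustrative assumptions, not part of the original schema, and the `UPDATE` on a 300-million-row table will itself take a while and rewrite the table, so treat this as a starting point rather than a tuned migration.

```sql
-- Assumption: "geog" and the index names are hypothetical; adjust to your schema.

-- 1. Add a geography column to each table.
ALTER TABLE ooh_data         ADD COLUMN geog geography(Point, 4326);
ALTER TABLE location_signals ADD COLUMN geog geography(Point, 4326);

-- 2. Populate the columns once, instead of building points in every query.
UPDATE ooh_data
SET geog = ST_SetSRID(ST_MakePoint(offset_lon, offset_lat), 4326)::geography;

UPDATE location_signals
SET geog = ST_SetSRID(ST_MakePoint("Lon", "Lat"), 4326)::geography;

-- 3. Add spatial (GiST) indexes so ST_DWithin can use an index scan.
CREATE INDEX ooh_data_geog_idx         ON ooh_data         USING GIST (geog);
CREATE INDEX location_signals_geog_idx ON location_signals USING GIST (geog);

-- 4. Refresh planner statistics after the bulk changes.
ANALYZE ooh_data;
ANALYZE location_signals;

-- 5. Join on the stored, indexed columns.
SELECT DISTINCT ON (ls.p_key)
       o.*,
       ls."Lat"     AS did_lat,
       ls."Lon"     AS did_lon,
       ls.device,
       ls.timestamp AS did_timestamp,
       ls.p_key     AS did_p_key
FROM ooh_data AS o
JOIN location_signals AS ls
  ON ST_DWithin(o.geog, ls.geog, 100)  -- 100 meters, since both sides are geography
ORDER BY ls.p_key;
```

Running `EXPLAIN` on the final query should show an index scan on one of the GiST indexes in place of the `Join Filter` over every row pair seen in the plan above; if it does not, the planner is still treating the join as a cross join.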

Original question on Stack Overflow: https://stackoverflow.com/questions/56614957/
