sql - PostgreSQL performance: Query to find stocks reaching 52 week highs (joining rows of max values)

Tags: sql postgresql join distinct-on

I have a very simple database structure holding end-of-day stock prices, similar to:

finalyzer_pricedata=> \d pdEndOfDayPricEentity
              Table "public.pdendofdaypriceentity"
    Column     |     Type      | Collation | Nullable | Default 
---------------+---------------+-----------+----------+---------
 id            | uuid          |           | not null | 
 close         | numeric(19,2) |           | not null | 
 day           | date          |           | not null | 
 instrument_id | uuid          |           | not null | 

(instrument_id is the unique ID of a stock)

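I now want to select every instrument_id that reached a 52-week high this week (i.e. all stocks whose close within the last 7 days is higher than at any point in the preceding 52 weeks).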

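I have tried many different approaches: GROUP BY with max(), SELECT DISTINCT ON, and window functions (row_number), but I have not managed to get the query below 150 seconds. My best (and simplest) approach so far is: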

select CAST(weekHigh.instrument_id AS VARCHAR) instrumentId,
       weekHigh.maxClose                       weekHighValue,
       yearHigh.maxClose                       yearHighValue,
       yearHigh.maxDay                         yearHighDay
from
     (select distinct on (eod.instrument_id) instrument_id,
                         eod.close  maxClose,
                         eod.day as maxDay
      from pdendofdaypriceentity eod
      where eod.day BETWEEN (CAST('2018-11-12' AS date) - interval '52 weeks') AND (CAST('2018-11-12' AS date) - interval '1 day')
      order by eod.instrument_id, close desc) yearHigh
     inner join (select eod.instrument_id instrument_id, max(eod.close) maxClose
                 from pdendofdaypriceentity eod
                 where eod.day BETWEEN CAST('2018-11-12' AS date) AND CAST('2018-11-18' AS date)
                 group by eod.instrument_id) weekHigh
       on weekHigh.instrument_id = yearHigh.instrument_id
where weekHigh.maxClose > yearHigh.maxClose;

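I am well aware that there are many similar questions, but while their approaches got me to a working solution, none of them helped me improve performance. The table contains 10 million rows covering 28,000 different stocks, and that number will only grow. Is there a way to meet this requirement with a query that runs in under 2 seconds, without denormalizing? Any kind of index etc. is obviously fine.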

Query plan for the approach above:

                                                                  QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=148153.45..1136087.99 rows=6112 width=74) (actual time=3056.748..144632.288 rows=411 loops=1)
   Hash Cond: (eod.instrument_id = eod_1.instrument_id)
   Join Filter: ((max(eod_1.close)) > eod.close)
   Rows Removed by Join Filter: 27317
   ->  Unique  (cost=0.56..987672.73 rows=18361 width=26) (actual time=2.139..141494.533 rows=28216 loops=1)
         ->  Index Scan using test3 on pdendofdaypriceentity eod  (cost=0.56..967290.80 rows=8152771 width=26) (actual time=2.117..79396.893 rows=8181608 loops=1)
               Filter: ((day >= '2017-11-13 00:00:00'::timestamp without time zone) AND (day <= '2018-11-11 00:00:00'::timestamp without time zone))
               Rows Removed by Filter: 1867687
   ->  Hash  (cost=147923.68..147923.68 rows=18337 width=48) (actual time=2793.633..2793.639 rows=27917 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 1739kB
         ->  HashAggregate  (cost=147556.94..147740.31 rows=18337 width=48) (actual time=2301.968..2550.387 rows=27917 loops=1)
               Group Key: eod_1.instrument_id
               ->  Bitmap Heap Scan on pdendofdaypriceentity eod_1  (cost=2577.01..146949.83 rows=121422 width=22) (actual time=14.264..1146.610 rows=115887 loops=1)
                     Recheck Cond: ((day >= '2018-11-12'::date) AND (day <= '2018-11-18'::date))
                     Heap Blocks: exact=11992
                     ->  Bitmap Index Scan on idx5784y3l3mqprlmeyyrmwnkt3n  (cost=0.00..2546.66 rows=121422 width=0) (actual time=12.784..12.791 rows=115887 loops=1)
                           Index Cond: ((day >= '2018-11-12'::date) AND (day <= '2018-11-18'::date))
 Planning time: 13.758 ms
 Execution time: 144635.973 ms
(19 rows)

My current (basically random) indexes:

Indexes:
    "pdendofdaypriceentity_pkey" PRIMARY KEY, btree (id)
    "ukcaddwp8kcx2uox18vss7o5oly" UNIQUE CONSTRAINT, btree (instrument_id, day)
    "idx5784y3l3mqprlmeyyrmwnkt3n" btree (day)
    "idx5vqqjfube2j1qkstc741ll19u" btree (close)
    "idxcaddwp8kcx2uox18vss7o5oly" btree (instrument_id, day)
    "test1" btree (close DESC, instrument_id, day)
    "test2" btree (instrument_id, day, close DESC)
    "test3" btree (instrument_id, close DESC)

Best answer

Try the following query:

select weekHigh.instrument_id,
       weekHigh.maxClose                       weekHighValue,
       yearHigh.maxClose                       yearHighValue
from (
    select instrument_id,
         max(eod.close)  maxClose
    from pdendofdaypriceentity eod
    where eod.day BETWEEN (CAST('2018-11-12' AS date) - interval '52 weeks') AND (CAST('2018-11-12' AS date) - interval '1 day')
    group by eod.instrument_id
) yearHigh
inner join (
    select eod.instrument_id instrument_id, max(eod.close) maxClose
    from pdendofdaypriceentity eod
    where eod.day BETWEEN CAST('2018-11-12' AS date) AND CAST('2018-11-18' AS date)
    group by eod.instrument_id
) weekHigh on weekHigh.instrument_id = yearHigh.instrument_id
where weekHigh.maxClose > yearHigh.maxClose;

Use an index on pdendofdaypriceentity(day, instrument_id, close). Note that this query does not return maxDay.
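A minimal sketch of creating the index suggested above. The index name is a hypothetical choice. CONCURRENTLY is optional: it only avoids blocking writes while the index is built on a large table, and it cannot run inside a transaction block (neither can VACUUM).

```sql
-- Composite index from the answer: day leads, so both date-range
-- predicates can use it; instrument_id and close are included so the
-- per-instrument max(close) can potentially be served from the index alone.
-- The index name below is hypothetical.
CREATE INDEX CONCURRENTLY pdendofdaypriceentity_day_instr_close_idx
    ON pdendofdaypriceentity (day, instrument_id, close);

-- Refresh planner statistics and the visibility map; an index-only scan
-- can skip heap fetches only for pages marked all-visible.
VACUUM ANALYZE pdendofdaypriceentity;
```

Whether the planner actually picks an index-only scan depends on the data and statistics; comparing EXPLAIN ANALYZE output before and after is the way to check.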

maxDay can be added with one more join to pdendofdaypriceentity, but I would start with the query above, which drops the DISTINCT ON and the ORDER BY from the first subquery.
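A minimal sketch of that extra join, written with CTEs. The first CTE shadows pdendofdaypriceentity with a few made-up rows (hypothetical UUIDs and prices) so the statement runs on its own; delete that CTE to run the same query against the real table. Two choices here are assumptions, not part of the answer: when the old 52-week high was reached on several days, max(day) reports the most recent one (the original DISTINCT ON version returns an arbitrary one of the tied rows), and the window bounds are written as date minus integer (364 days = 52 weeks) so they stay of type date.

```sql
WITH pdendofdaypriceentity (instrument_id, day, close) AS (
    -- Hypothetical sample rows; remove this CTE to query the real table.
    VALUES
        ('00000000-0000-0000-0000-00000000000a'::uuid, DATE '2018-06-01', 10.00::numeric(19,2)),
        ('00000000-0000-0000-0000-00000000000a'::uuid, DATE '2018-11-13', 12.00),
        ('00000000-0000-0000-0000-00000000000b'::uuid, DATE '2018-06-01', 20.00),
        ('00000000-0000-0000-0000-00000000000b'::uuid, DATE '2018-11-13', 15.00),
        ('00000000-0000-0000-0000-00000000000c'::uuid, DATE '2018-03-01', 5.00),
        ('00000000-0000-0000-0000-00000000000c'::uuid, DATE '2018-09-03', 5.00),
        ('00000000-0000-0000-0000-00000000000c'::uuid, DATE '2018-11-14', 7.00)
),
yearHigh AS (
    -- Highest close per instrument in the 52 weeks before 2018-11-12.
    SELECT eod.instrument_id, max(eod.close) AS maxClose
    FROM pdendofdaypriceentity eod
    WHERE eod.day BETWEEN (DATE '2018-11-12' - 364) AND (DATE '2018-11-12' - 1)
    GROUP BY eod.instrument_id
),
weekHigh AS (
    -- Highest close per instrument in the week 2018-11-12 .. 2018-11-18.
    SELECT eod.instrument_id, max(eod.close) AS maxClose
    FROM pdendofdaypriceentity eod
    WHERE eod.day BETWEEN DATE '2018-11-12' AND DATE '2018-11-18'
    GROUP BY eod.instrument_id
)
SELECT w.instrument_id,
       w.maxClose   AS weekHighValue,
       y.maxClose   AS yearHighValue,
       max(eod.day) AS yearHighDay   -- latest day the old high was reached
FROM yearHigh y
JOIN weekHigh w ON w.instrument_id = y.instrument_id
-- Extra join back to the price table to recover the day of the old high.
JOIN pdendofdaypriceentity eod
  ON eod.instrument_id = y.instrument_id
 AND eod.close = y.maxClose
 AND eod.day BETWEEN (DATE '2018-11-12' - 364) AND (DATE '2018-11-12' - 1)
WHERE w.maxClose > y.maxClose
GROUP BY w.instrument_id, w.maxClose, y.maxClose
ORDER BY w.instrument_id;
```

With the sample rows, instruments ...0a and ...0c qualify and ...0b does not; for ...0c the old high of 5.00 occurs twice, so 2018-09-03 is reported. Because the join only runs for the instruments that survive the weekHigh > yearHigh filter, it may stay cheap, and an index leading with (instrument_id, close), such as the existing test3, could serve the lookup.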

A similar question, "sql - PostgreSQL performance: Query to find stocks reaching 52 week highs (joining rows of max values)", was found on Stack Overflow: https://stackoverflow.com/questions/53394909/
