sql - Aggregate overlapping segments to measure effective length

Tags: sql oracle select oracle12c asset-management

I have a road_events table:

create table road_events (
    event_id number(4,0),
    road_id number(4,0),
    year number(4,0),
    from_meas number(10,2),
    to_meas number(10,2),
    total_road_length number(10,2)
    );

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (1,1,2020,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (2,1,2000,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (3,1,1980,0,25,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (4,1,1960,75,100,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (5,1,1940,1,100,100);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (6,2,2000,10,30,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (7,2,1975,30,60,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (8,2,1950,50,90,100);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (9,3,2050,40,90,100);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (10,4,2040,0,200,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (11,4,2013,0,199,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (12,4,2001,0,200,200);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (13,5,1985,50,70,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (14,5,1985,10,50,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (15,5,1965,1,301,300);
commit;

select * from road_events;
  EVENT_ID    ROAD_ID       YEAR  FROM_MEAS    TO_MEAS TOTAL_ROAD_LENGTH
---------- ---------- ---------- ---------- ---------- -----------------
         1          1       2020         25         50               100
         2          1       2000         25         50               100
         3          1       1980          0         25               100
         4          1       1960         75        100               100
         5          1       1940          1        100               100

         6          2       2000         10         30               100
         7          2       1975         30         60               100
         8          2       1950         50         90               100

         9          3       2050         40         90               100

        10          4       2040          0        200               200
        11          4       2013          0        199               200
        12          4       2001          0        200               200

        13          5       1985         50         70               300
        14          5       1985         10         50               300
        15          5       1965          1        301               300

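I want to select the events that represent the most recent work on each road.

This is a tricky operation, because events often cover only a portion of a road. That means I can't simply select the most recent event per road. I need to select only the most recent event mileage that doesn't overlap.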

Possible logic (in order):

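I'm hesitant to guess at how this should be solved, because a guess might end up hurting more than it helps (a bit like the XY Problem). On the other hand, it might give some insight into the nature of the problem, so here goes:
  1. Select the most recent event for each road. Call it event A.
  2. If the length of event A >= total_road_length, that's all I need. The algorithm ends here.
  3. Otherwise, get the next event in chronological order (event B) whose extent is not the same as event A's.
  4. If event B's extent overlaps event A's extent, keep only the portion(s) of event B that do not overlap.
  5. Repeat steps 3 and 4 until the total event length = total_road_length, or stop when the road has no more events.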
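
Steps 1 and 2 can be sketched as a single query. This is only an illustration on the sample data: the event_id tie-breaker for events that share a year and the covers_road column name are assumptions, not part of the original question.

```sql
-- Step 1: rank each road's events from newest to oldest.
-- Step 2: flag whether the newest event alone spans the whole road.
SELECT event_id,
       road_id,
       year,
       to_meas - from_meas AS event_length,
       total_road_length,
       CASE WHEN to_meas - from_meas >= total_road_length
            THEN 'Y' ELSE 'N'
       END AS covers_road
FROM (
       SELECT re.*,
              ROW_NUMBER() OVER (PARTITION BY road_id
                                 ORDER BY year DESC, event_id) AS rn  -- event_id tie-breaker is an assumption
       FROM road_events re
     )
WHERE rn = 1
ORDER BY road_id;
```

On the sample data only road 4 should come back with covers_road = 'Y' (event 10 spans 0 to 200). Every other road still needs steps 3 to 5, which is why picking the latest event per road is not enough.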


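Question:

    I know this is a tall order, but how could it be done?

    This is a classic linear referencing problem. It would be extremely helpful if I could perform linear referencing operations as part of a query.

    The result would be: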
      EVENT_ID    ROAD_ID       YEAR  TOTAL_ROAD_LENGTH   EVENT_LENGTH
    ---------- ---------- ----------  -----------------   ------------
             1          1       2020                100             25
             3          1       1980                100             25
             4          1       1960                100             25
             5          1       1940                100             25
    
             6          2       2000                100             20
             7          2       1975                100             30
             8          2       1950                100             30
    
             9          3       2050                100             50
    
            10          4       2040                200            200
    
            13          5       1985                300             20
            14          5       1985                300             40
            15          5       1965                300            240
    

    Related question: Select where number range does not overlap

    Best answer

    My main DBMS is Teradata, but this will work as-is in Oracle, too.

    WITH all_meas AS
     ( -- get a distinct list of all from/to points
       SELECT road_id, from_meas AS meas
       FROM road_events
       UNION
       SELECT road_id, to_meas
       FROM road_events
     )
    -- select * from all_meas order by 1,2
     , all_ranges AS
     ( -- create from/to ranges
       SELECT road_id, meas AS from_meas 
         ,Lead(meas)
          Over (PARTITION BY road_id
                ORDER BY meas) AS to_meas
       FROM all_meas
      )
     -- SELECT * from all_ranges order by 1,2
    , all_event_ranges AS
     ( -- now match the ranges to the event ranges
       SELECT 
          ar.*
         ,re.event_id
         ,re.year
         ,re.total_road_length
         ,ar.to_meas - ar.from_meas AS event_length
         -- used to filter the latest event as multiple events might cover the same range 
         ,Row_Number()
          Over (PARTITION BY ar.road_id, ar.from_meas
                ORDER BY year DESC) AS rn
       FROM all_ranges ar
       JOIN road_events re
         ON ar.road_id = re.road_id
        AND ar.from_meas < re.to_meas
        AND ar.to_meas > re.from_meas
       WHERE ar.to_meas IS NOT NULL
     )
    SELECT event_id, road_id, year, total_road_length, Sum(event_length)
    FROM all_event_ranges
    WHERE rn = 1 -- latest year only
    GROUP BY event_id, road_id, year, total_road_length
    ORDER BY road_id, year DESC;
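
    To see how the query splits each road into elementary ranges, the first two CTEs can be run on their own. The sketch below restricts them to road 2; the road_id = 2 filter is added here purely for illustration and is not part of the answer's query.

    ```sql
    WITH all_meas AS
     ( -- distinct list of all from/to points, as in the query above
       SELECT road_id, from_meas AS meas
       FROM road_events
       UNION
       SELECT road_id, to_meas
       FROM road_events
     )
    SELECT road_id, meas AS from_meas
      ,Lead(meas)
       Over (PARTITION BY road_id
             ORDER BY meas) AS to_meas
    FROM all_meas
    WHERE road_id = 2 -- illustration only
    ORDER BY meas;
    ```

    On the sample data this should return the ranges 10-30, 30-50, 50-60 and 60-90, plus a final row whose to_meas is NULL (the row that WHERE ar.to_meas IS NOT NULL discards). The range 50-60 is covered by both event 7 and event 8; rn = 1 keeps event 7, the more recent one, which gives event 7 a length of 30 and leaves event 8 with 30.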
    

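    If you need to return the from/to_meas actually covered (as in your question before the edit), it gets more complicated. The first part is the same, but without the aggregation the query can return adjacent rows with the same event_id (e.g., for event 3: 0-1 and 1-25):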
    SELECT * FROM all_event_ranges
    WHERE rn = 1
    ORDER BY road_id, from_meas;
    

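    If you want to merge adjacent rows, two more steps are needed (using a standard approach: flag the first row of each group, then compute a group number):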
    WITH all_meas AS
     (
       SELECT road_id, from_meas AS meas
       FROM road_events
       UNION
       SELECT road_id, to_meas
       FROM road_events
     )
    -- select * from all_meas order by 1,2
     , all_ranges AS
     ( 
       SELECT road_id, meas AS from_meas 
         ,Lead(meas)
          Over (PARTITION BY road_id
                ORDER BY meas) AS to_meas
       FROM all_meas
      )
    -- SELECT * from all_ranges order by 1,2
    , all_event_ranges AS
     (
       SELECT 
          ar.*
         ,re.event_id
         ,re.year
         ,re.total_road_length
         ,ar.to_meas - ar.from_meas AS event_length
         ,Row_Number()
          Over (PARTITION BY ar.road_id, ar.from_meas
                ORDER BY year DESC) AS rn
       FROM all_ranges ar
       JOIN road_events  re
         ON ar.road_id = re.road_id
        AND ar.from_meas < re.to_meas
        AND ar.to_meas > re.from_meas
       WHERE ar.to_meas IS NOT NULL
     )
    -- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas
    , adjacent_events AS 
     ( -- assign 1 to the 1st row of an event
       SELECT t.*
         ,CASE WHEN Lag(event_id)
                    Over(PARTITION BY road_id
                         ORDER BY from_meas) = event_id
               THEN 0 
               ELSE 1 
          END AS flag
       FROM all_event_ranges t
       WHERE rn = 1
     )
    -- SELECT * FROM adjacent_events ORDER BY road_id, from_meas 
    , grouped_events AS
     ( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1
       SELECT t.*
         ,Sum(flag)
          Over (PARTITION BY road_id
                ORDER BY from_meas
                ROWS Unbounded Preceding) AS grp
       FROM adjacent_events t
    )
    -- SELECT * FROM grouped_events ORDER BY  road_id, from_meas
    SELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length)
    FROM grouped_events
    GROUP BY event_id, road_id, grp, year, total_road_length
    ORDER BY 2, Min(from_meas);
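
    The flag and cumulative-sum steps can also be tried in isolation. The sketch below hard-codes the five rn = 1 ranges of road 1 in an inline table t; the inline table and the start_meas/end_meas aliases are illustrative assumptions, not part of the query above.

    ```sql
    WITH t AS
     ( -- road 1's winning ranges, hard-coded for illustration
       SELECT 3 AS event_id, 0 AS from_meas, 1 AS to_meas FROM dual UNION ALL
       SELECT 3, 1, 25 FROM dual UNION ALL
       SELECT 1, 25, 50 FROM dual UNION ALL
       SELECT 5, 50, 75 FROM dual UNION ALL
       SELECT 4, 75, 100 FROM dual
     )
    , flagged AS
     ( -- 1 marks the first row of a run of identical event_ids
       SELECT t.*
         ,CASE WHEN Lag(event_id) Over (ORDER BY from_meas) = event_id
               THEN 0
               ELSE 1
          END AS flag
       FROM t
     )
    , grouped AS
     ( -- running total of the flags yields one group number per run
       SELECT f.*
         ,Sum(flag) Over (ORDER BY from_meas
                          ROWS Unbounded Preceding) AS grp
       FROM flagged f
     )
    SELECT event_id, grp, Min(from_meas) AS start_meas, Max(to_meas) AS end_meas
    FROM grouped
    GROUP BY event_id, grp
    ORDER BY start_meas;
    ```

    The two rows of event 3 (0-1 and 1-25) should share grp = 1 and collapse into a single 0-25 row, while events 1, 5 and 4 each keep their own group.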
    

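    Edit:

    Oops, I just found a blog post, Overlapping ranges with priority, that does exactly the same thing with some simplified Oracle syntax. In fact, I translated my query from some other simplified syntax in Teradata to Standard/Oracle SQL :-)

    Regarding sql - Aggregate overlapping segments to measure effective length, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52081473/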