sql - Calculating intersecting time intervals in T-SQL

Tags: sql sql-server tsql sql-server-2012

Code:

CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME)
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME)

INSERT INTO #Temp1 (CoachID, BusyST, BusyET)
SELECT 1,'2016-08-17 09:12:00','2016-08-17 10:11:00'
UNION
SELECT 3,'2016-08-17 09:30:00','2016-08-17 10:00:00'
UNION
SELECT 4,'2016-08-17 12:07:00','2016-08-17 13:10:00'

INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET)
SELECT 1,'2016-08-17 09:07:00','2016-08-17 11:09:00'
UNION
SELECT 2,'2016-08-17 09:11:00','2016-08-17 09:30:00'
UNION
SELECT 3,'2016-08-17 09:24:00','2016-08-17 13:08:00'
UNION
SELECT 1,'2016-08-17 11:34:00','2016-08-17 12:27:00'
UNION
SELECT 4,'2016-08-17 09:34:00','2016-08-17 13:00:00'
UNION
SELECT 5,'2016-08-17 09:10:00','2016-08-17 09:55:00'

--RESULT-SET QUERY GOES HERE

DROP TABLE #Temp1
DROP TABLE #Temp2

Expected output:

CoachID CanCoachST                  CanCoachET                  NumOfCoaches
1       2016-08-17 09:12:00.000     2016-08-17 09:24:00.000     2 --(ID2 = 2,5)
1       2016-08-17 09:24:00.000     2016-08-17 09:30:00.000     3 --(ID2 = 2,3,5)
1       2016-08-17 09:30:00.000     2016-08-17 09:34:00.000     1 --(ID2 = 5)
1       2016-08-17 09:34:00.000     2016-08-17 09:55:00.000     2 --(ID2 = 4,5)
1       2016-08-17 09:55:00.000     2016-08-17 10:00:00.000     1 --(ID2 = 4)
1       2016-08-17 10:00:00.000     2016-08-17 10:11:00.000     2 --(ID2 = 3,4)
3       2016-08-17 09:30:00.000     2016-08-17 09:34:00.000     1 --(ID2 = 5)
3       2016-08-17 09:34:00.000     2016-08-17 09:55:00.000     2 --(ID2 = 4,5)
3       2016-08-17 09:55:00.000     2016-08-17 10:00:00.000     1 --(ID2 = 4)
4       2016-08-17 12:07:00.000     2016-08-17 12:27:00.000     2 --(ID2 = 1,3)
4       2016-08-17 12:27:00.000     2016-08-17 13:08:00.000     1 --(ID2 = 3)
4       2016-08-17 13:08:00.000     2016-08-17 13:10:00.000     0 --(No one is available)

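Goal: think of #Temp1 as a table of team coaches (ID1) and their session times (ST1 = session start time, ET1 = session end time); in the schema above these are CoachID, BusyST and BusyET. Think of #Temp2 as a table of team coaches (ID2) and their total available times (ST2 = available start time, ET2 = available end time), i.e. CoachID, AvailableST and AvailableET.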

Now the goal is to find all coaches in #Temp2 who can coach during the session times of the coaches in #Temp1.

For example, coach ID1 = 1 is busy from 9:12 to 10:11 (the data can span multiple days, if that matters). Coaches ID2 = 2 and 5 can coach from 9:12 to 9:24; coaches ID2 = 2, 3 and 5 from 9:24 to 9:30; coach ID2 = 5 from 9:30 to 9:34; coaches ID2 = 4 and 5 from 9:34 to 9:55; coach ID2 = 4 from 9:55 to 10:00; and coaches ID2 = 3 and 4 from 10:00 to 10:11. (Note that although ID 3 is available from 9:24 to 13:08 according to #Temp2, it cannot coach for ID1 = 1 between 9:30 and 10:00, because ID 3 is itself busy from 9:30 to 10:00.)
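To double-check the expected output by hand, here is a brute-force sketch in Python. It is not the answer's method and not part of the original question; it assumes a single day, zero-padded "HH:MM" strings (so string comparison is chronological) and half-open [start, end) intervals, and `can_coach` is a hypothetical helper name. Each busy window is cut at every boundary of the other coaches' intervals, and each piece lists the coaches who are available for the whole piece and not busy anywhere in it.

```python
# Sample rows from the question (single day). Times are zero-padded "HH:MM"
# strings, so ordinary string comparison orders them chronologically.
BUSY = [(1, "09:12", "10:11"), (3, "09:30", "10:00"), (4, "12:07", "13:10")]
AVAILABLE = [
    (1, "09:07", "11:09"), (2, "09:11", "09:30"), (3, "09:24", "13:08"),
    (1, "11:34", "12:27"), (4, "09:34", "13:00"), (5, "09:10", "09:55"),
]


def can_coach(busy, available):
    """Brute force over half-open [start, end) intervals: cut each busy window
    at every boundary of the *other* coaches' intervals, then list who is
    available for the whole piece and not busy anywhere inside it."""
    rows = []
    for busy_id, bst, bet in busy:
        cuts = {bst, bet}
        for coach, s, e in busy + available:
            if coach != busy_id:
                cuts.update(t for t in (s, e) if bst < t < bet)
        points = sorted(cuts)
        for st, et in zip(points, points[1:]):
            ids = sorted({
                c for c, s, e in available
                if c != busy_id
                and s <= st and et <= e  # available for the whole piece
                and not any(b == c and bs < et and be > st  # not busy in it
                            for b, bs, be in busy)
            })
            rows.append((busy_id, st, et, ids))
    return rows


for row in can_coach(BUSY, AVAILABLE):
    print(row)
```

For this sample it yields the twelve expected rows, with an empty list for coach 4 from 13:08 to 13:10. On other data it may cut at extra points (for example at a busy boundary that lies outside that coach's availability), so neighbouring pieces can carry the same set.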

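My effort so far: I have only managed to break up the time ranges of #Temp1. I still need to figure out A) how to remove the non-busy time windows from the output, and B) how to add a field that maps each window to the right #Temp1 CoachID.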

;WITH ED
AS (SELECT BusyET, CoachID FROM #Temp1  
    UNION ALL   
    SELECT BusyST, CoachID FROM #Temp1
    )
,Brackets
AS (SELECT MIN(BusyST) AS BusyST
        ,(  SELECT MIN(BusyET)
            FROM ED e
            WHERE e.BusyET > MIN(BusyST)
            ) AS BusyET
    FROM #Temp1 T   
    UNION ALL   
    SELECT B.BusyET
        ,e.BusyET
    FROM Brackets B
    INNER JOIN ED E ON B.BusyET < E.BusyET
    WHERE NOT EXISTS (
            SELECT *
            FROM ED E2
            WHERE E2.BusyET > B.BusyET
                AND E2.BusyET < E.BusyET
            )
    )
SELECT *
FROM Brackets
ORDER BY BusyST;
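Roughly, the ED and Brackets CTEs above collect every start and end time of #Temp1 and pair each point with the next larger one. A Python model of that, assuming the time points are distinct (`brackets` and `inside_some_busy` are hypothetical names), makes problems A and B visible: the pair 10:11-12:07 lies inside no busy window, and no CoachID survives the step.

```python
# Busy rows from #Temp1 in the question: (CoachID, BusyST, BusyET).
BUSY = [(1, "09:12", "10:11"), (3, "09:30", "10:00"), (4, "12:07", "13:10")]


def brackets(busy):
    # Like ED + Brackets: every start and end time, paired with the next
    # larger time point. Note that the CoachID is lost at this step.
    points = sorted({t for _, st, et in busy for t in (st, et)})
    return list(zip(points, points[1:]))


def inside_some_busy(window, busy):
    st, et = window
    return any(bst <= st and et <= bet for _, bst, bet in busy)


for window in brackets(BUSY):
    print(window, "busy" if inside_some_busy(window, BUSY) else "not busy (problem A)")
```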

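I think I need to join the two tables, comparing the ST/ET dates where the IDs don't match. But I can't figure out how to actually get the session time windows and the distinct counts.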
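The join described here is essentially what the answer's CTE_Intervals does: each busy row is paired with every other coach's overlapping available row, and the overlap is clamped to the busy window. A Python sketch of that step, assuming half-open intervals and "HH:MM" strings (`intersections` is a hypothetical name):

```python
# Raw rows from the question: (CoachID, start, end), half-open [start, end).
BUSY = [(1, "09:12", "10:11"), (3, "09:30", "10:00"), (4, "12:07", "13:10")]
AVAILABLE = [
    (1, "09:07", "11:09"), (2, "09:11", "09:30"), (3, "09:24", "13:08"),
    (1, "11:34", "12:27"), (4, "09:34", "13:00"), (5, "09:10", "09:55"),
]


def intersections(busy, available):
    out = []
    for busy_id, bst, bet in busy:
        for avail_id, ast, aet in available:
            # Different coach, and the half-open intervals overlap.
            if avail_id != busy_id and ast < bet and aet > bst:
                # Clamp: max of the starts, min of the ends.
                out.append((busy_id, avail_id, max(ast, bst), min(aet, bet)))
    return out


for row in intersections(BUSY, AVAILABLE):
    print(row)
```

The row (1, 3, "09:24", "10:11") shows why the raw join alone is not enough: coach 3 is himself busy from 9:30 to 10:00, so his busy time has to be removed from his availability first.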

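Updated with a better schema/data set. Also note that although CoachID 4 is not "scheduled" as available for the last few minutes (after 13:00), he is still listed as busy then. There may also be cases where nobody else is available during that time; in that case we can return a record with a count of 0 (or omit the record, if including it is really complicated).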

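Again, the goal is to find, for each CoachID listed in the busy table, the count and combination of available CoachIDs together with the time windows in which they can coach.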

Updated with more example descriptions that match the sample data.

Best answer

The query in this answer is inspired by Packing Intervals by Itzik Ben-Gan.


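At first I didn't understand the full complexity of the requirements and assumed that the intervals in #Temp1 and #Temp2 don't overlap, i.e. that the same coach can't be busy and available at the same time.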

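That assumption turned out to be wrong, so the first variant of the query, which I left below, has to be extended with a preliminary step that subtracts all intervals stored in #Temp1 (busy) from the intervals stored in #Temp2 (available).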

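It uses the same idea as the first variant. The start of each "available" interval is marked with EventType +1 and its end with EventType -1. For "busy" intervals the marks are reversed: a "busy" interval starts with -1 and ends with +1. This is done in C1_Subtract.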

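A running total of EventType per coach (C2_Subtract) then tells us where the "truly" available intervals are: wherever the total is greater than 0, the coach is available and not busy. Finally, CTE_Available keeps only those "truly" available intervals.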
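The subtraction step can be modelled in a few lines of Python (a sketch, not the answer's code; `subtract_busy` is a hypothetical name, times are "HH:MM" strings on one day). Like the SQL, it assumes a coach's own available intervals do not overlap one another; if they did, a stretch would simply be split into several adjacent rows.

```python
# Raw rows from the answer's sample data (#Temp1 busy, #Temp2 available).
BUSY = [(1, "09:12", "10:11"), (3, "09:30", "10:00"), (4, "12:07", "13:10"),
        (6, "15:00", "16:00"), (9, "15:00", "16:00")]
AVAILABLE = [
    (1, "09:07", "11:09"), (2, "09:11", "09:30"), (3, "09:24", "13:08"),
    (1, "11:34", "12:27"), (4, "09:34", "13:00"), (5, "09:10", "09:55"),
    (7, "15:10", "15:20"), (8, "15:15", "15:25"), (7, "15:40", "15:55"),
    (9, "15:05", "15:07"), (9, "15:40", "16:55"),
]


def subtract_busy(available, busy):
    # C1_Subtract: available start +1, available end -1; busy start -1, busy end +1.
    events = {}
    for coach, st, et in available:
        events.setdefault(coach, []).extend([(st, +1), (et, -1)])
    for coach, st, et in busy:
        events.setdefault(coach, []).extend([(st, -1), (et, +1)])
    result = []
    for coach in sorted(events):
        # C2_Subtract: ORDER BY ts, EventType DESC, then a running total.
        evs = sorted(events[coach], key=lambda ev: (ev[0], -ev[1]))
        total = 0
        for (ts, kind), (next_ts, _) in zip(evs, evs[1:]):
            total += kind
            # CTE_Available: keep the stretch up to the next event if total > 0.
            if total > 0:
                result.append((coach, ts, next_ts))
    return result


for row in subtract_busy(AVAILABLE, BUSY):
    print(row)
```

Coach 9 shows the reversed marks at work: his 15:05-15:07 slot lies entirely inside his busy window, so the total never rises above 0 there, and only 16:00-16:55 survives.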

Sample data

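I added a few rows to illustrate what happens when no coach is available. I also added CoachID=9, which does not appear in the initial results of the first variant of the query.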

CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME);
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME);
-- Start time is inclusive
-- End time is exclusive

INSERT INTO #Temp1 (CoachID, BusyST, BusyET) VALUES
(1, '2016-08-17 09:12:00','2016-08-17 10:11:00'),
(3, '2016-08-17 09:30:00','2016-08-17 10:00:00'),
(4, '2016-08-17 12:07:00','2016-08-17 13:10:00'),

(6, '2016-08-17 15:00:00','2016-08-17 16:00:00'),
(9, '2016-08-17 15:00:00','2016-08-17 16:00:00');

INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET) VALUES
(1,'2016-08-17 09:07:00','2016-08-17 11:09:00'),
(2,'2016-08-17 09:11:00','2016-08-17 09:30:00'),
(3,'2016-08-17 09:24:00','2016-08-17 13:08:00'),
(1,'2016-08-17 11:34:00','2016-08-17 12:27:00'),
(4,'2016-08-17 09:34:00','2016-08-17 13:00:00'),
(5,'2016-08-17 09:10:00','2016-08-17 09:55:00'),

(7,'2016-08-17 15:10:00','2016-08-17 15:20:00'),
(8,'2016-08-17 15:15:00','2016-08-17 15:25:00'),
(7,'2016-08-17 15:40:00','2016-08-17 15:55:00'),
(9,'2016-08-17 15:05:00','2016-08-17 15:07:00'),
(9,'2016-08-17 15:40:00','2016-08-17 16:55:00');

Intermediate result of CTE_Available

+---------+-------------------------+-------------------------+
| CoachID |       AvailableST       |       AvailableET       |
+---------+-------------------------+-------------------------+
|       1 | 2016-08-17 09:07:00.000 | 2016-08-17 09:12:00.000 |
|       1 | 2016-08-17 10:11:00.000 | 2016-08-17 11:09:00.000 |
|       1 | 2016-08-17 11:34:00.000 | 2016-08-17 12:27:00.000 |
|       2 | 2016-08-17 09:11:00.000 | 2016-08-17 09:30:00.000 |
|       3 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 |
|       3 | 2016-08-17 10:00:00.000 | 2016-08-17 13:08:00.000 |
|       4 | 2016-08-17 09:34:00.000 | 2016-08-17 12:07:00.000 |
|       5 | 2016-08-17 09:10:00.000 | 2016-08-17 09:55:00.000 |
|       7 | 2016-08-17 15:10:00.000 | 2016-08-17 15:20:00.000 |
|       7 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |
|       8 | 2016-08-17 15:15:00.000 | 2016-08-17 15:25:00.000 |
|       9 | 2016-08-17 16:00:00.000 | 2016-08-17 16:55:00.000 |
+---------+-------------------------+-------------------------+

Now we can use this intermediate result of CTE_Available instead of #Temp2 in the first variant of the query. See the detailed explanation below the first variant.

Full query

WITH
C1_Subtract
AS
(
    SELECT
        CoachID
        ,AvailableST AS ts
        ,+1 AS EventType
    FROM #Temp2

    UNION ALL

    SELECT
        CoachID
        ,AvailableET AS ts
        ,-1 AS EventType
    FROM #Temp2

    UNION ALL

    SELECT
        CoachID
        ,BusyST AS ts
        ,-1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID
        ,BusyET AS ts
        ,+1 AS EventType
    FROM #Temp1
)
,C2_Subtract AS
(
    SELECT
        C1_Subtract.*
        ,SUM(EventType)
            OVER (
            PARTITION BY CoachID
            ORDER BY ts, EventType DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY CoachID
            ORDER BY ts, EventType DESC)
        AS NextTS
    FROM C1_Subtract
)
,CTE_Available
AS
(
    SELECT
        C2_Subtract.CoachID
        ,C2_Subtract.ts AS AvailableST
        ,C2_Subtract.NextTS AS AvailableET
    FROM C2_Subtract
    WHERE cnt > 0
)
,CTE_Intervals
AS
(
    SELECT
        TBusy.CoachID AS BusyCoachID
        ,TBusy.BusyST
        ,TBusy.BusyET
        ,CA.CoachID AS AvailableCoachID
        ,CA.AvailableST
        ,CA.AvailableET
        -- max of start time
        ,CASE WHEN CA.AvailableST < TBusy.BusyST
        THEN TBusy.BusyST
        ELSE CA.AvailableST 
        END AS ST
        -- min of end time
        ,CASE WHEN CA.AvailableET > TBusy.BusyET
        THEN TBusy.BusyET
        ELSE CA.AvailableET
        END AS ET
    FROM
        #Temp1 AS TBusy
        CROSS APPLY
        (
            SELECT
                TAvailable.*
            FROM
                CTE_Available AS TAvailable
            WHERE
                -- the same coach can't be available and busy
                TAvailable.CoachID <> TBusy.CoachID
                -- intervals intersect
                AND TAvailable.AvailableST < TBusy.BusyET
                AND TAvailable.AvailableET > TBusy.BusyST
        ) AS CA
)
,C1 AS
(
    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ST AS ts
        ,+1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ET AS ts
        ,-1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyST AS ts
        ,+1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyET AS ts
        ,-1 AS EventType
    FROM #Temp1
)
,C2 AS
(
    SELECT
        C1.*
        ,SUM(EventType)
            OVER (
            PARTITION BY BusyCoachID
            ORDER BY ts, EventType DESC, AvailableCoachID
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        - 1 AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY BusyCoachID 
            ORDER BY ts, EventType DESC, AvailableCoachID) 
        AS NextTS
    FROM C1
)
SELECT
    BusyCoachID AS CoachID
    ,ts AS CanCoachST
    ,NextTS AS CanCoachET
    ,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
    -- cnt is -1 in the gap between two busy intervals of the same coach
    AND cnt >= 0
ORDER BY BusyCoachID, CanCoachST
;

Final result

+---------+-------------------------+-------------------------+--------------+
| CoachID |       CanCoachST        |       CanCoachET        | NumOfCoaches |
+---------+-------------------------+-------------------------+--------------+
|       1 | 2016-08-17 09:12:00.000 | 2016-08-17 09:24:00.000 |            2 |
|       1 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 |            3 |
|       1 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 |            1 |
|       1 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 |            2 |
|       1 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 |            1 |
|       1 | 2016-08-17 10:00:00.000 | 2016-08-17 10:11:00.000 |            2 |
|       3 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 |            1 |
|       3 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 |            2 |
|       3 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 |            1 |
|       4 | 2016-08-17 12:07:00.000 | 2016-08-17 12:27:00.000 |            2 |
|       4 | 2016-08-17 12:27:00.000 | 2016-08-17 13:08:00.000 |            1 |
|       4 | 2016-08-17 13:08:00.000 | 2016-08-17 13:10:00.000 |            0 |
|       6 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 |            0 |
|       6 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 |            1 |
|       6 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 |            2 |
|       6 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 |            1 |
|       6 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 |            0 |
|       6 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |            1 |
|       6 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 |            0 |
|       9 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 |            0 |
|       9 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 |            1 |
|       9 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 |            2 |
|       9 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 |            1 |
|       9 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 |            0 |
|       9 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |            1 |
|       9 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 |            0 |
+---------+-------------------------+-------------------------+--------------+

I recommend creating the following indexes to avoid some of the sorts in the execution plan.

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_BusyST] ON #Temp1
(
    CoachID ASC,
    BusyST ASC
);

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_BusyET] ON #Temp1
(
    CoachID ASC,
    BusyET ASC
);

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_AvailableST] ON #Temp2
(
    CoachID ASC,
    AvailableST ASC
);

CREATE UNIQUE NONCLUSTERED INDEX [IX_CoachID_AvailableET] ON #Temp2
(
    CoachID ASC,
    AvailableET ASC
);

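However, on real data the bottleneck may be elsewhere, depending on the data distribution. The query is fairly complex, and tuning it without the real data would involve too much guesswork.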


First variant of the query

Run the query step by step, one CTE at a time, and examine the intermediate results to understand how it works.

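CTE_Intervals gives us the list of available intervals that intersect busy intervals, each clamped to its busy window. C1 puts the start and end times into a single column with the corresponding EventType (+1 for a start, -1 for an end), which lets us track when an interval starts or ends. The running total of EventType (in C2) gives the number of available coaches. C1 also adds each busy coach's own interval, so that cases where no coach is available are counted correctly: that interval guarantees events at BusyST and BusyET, and C2 removes its +1 again with the - 1 in the cnt expression.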

WITH
CTE_Intervals
AS
(
    SELECT
        TBusy.CoachID AS BusyCoachID
        ,TBusy.BusyST
        ,TBusy.BusyET
        ,CA.CoachID AS AvailableCoachID
        ,CA.AvailableST
        ,CA.AvailableET
        -- max of start time
        ,CASE WHEN CA.AvailableST < TBusy.BusyST
        THEN TBusy.BusyST
        ELSE CA.AvailableST 
        END AS ST
        -- min of end time
        ,CASE WHEN CA.AvailableET > TBusy.BusyET
        THEN TBusy.BusyET
        ELSE CA.AvailableET
        END AS ET
    FROM
        #Temp1 AS TBusy
        CROSS APPLY
        (
            SELECT
                TAvailable.*
            FROM
                #Temp2 AS TAvailable
            WHERE
                -- the same coach can't be available and busy
                TAvailable.CoachID <> TBusy.CoachID
                -- intervals intersect
                AND TAvailable.AvailableST < TBusy.BusyET
                AND TAvailable.AvailableET > TBusy.BusyST
        ) AS CA
)
,C1 AS
(
    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ST AS ts
        ,+1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ET AS ts
        ,-1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyST AS ts
        ,+1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyET AS ts
        ,-1 AS EventType
    FROM #Temp1
)
,C2 AS
(
    SELECT
        C1.*
        ,SUM(EventType)
            OVER (
            PARTITION BY BusyCoachID
            ORDER BY ts, EventType DESC, AvailableCoachID
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        - 1 AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY BusyCoachID 
            ORDER BY ts, EventType DESC, AvailableCoachID) 
        AS NextTS
    FROM C1
)
SELECT
    BusyCoachID AS CoachID
    ,ts AS CanCoachST
    ,NextTS AS CanCoachET
    ,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
    -- cnt is -1 in the gap between two busy intervals of the same coach
    AND cnt >= 0
ORDER BY BusyCoachID, CanCoachST
;

DROP TABLE #Temp1;
DROP TABLE #Temp2;
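A Python model of C1 and C2 may help when stepping through the CTEs (a sketch, not from the answer; `count_coaches` is a hypothetical name). It takes the CTE_Available rows as input, so it models the second half of the full query. Events are (ts, EventType, AvailableCoachID) tuples sorted like the window's ORDER BY. The cnt >= 0 check guards a case the sample data does not exercise: if a coach had two busy windows, the running total between them would be 0, making cnt -1 for the gap.

```python
# Busy windows (#Temp1) and the truly available intervals (the CTE_Available
# intermediate result) from the answer; "HH:MM" strings, half-open [start, end).
BUSY = [(1, "09:12", "10:11"), (3, "09:30", "10:00"), (4, "12:07", "13:10"),
        (6, "15:00", "16:00"), (9, "15:00", "16:00")]
AVAILABLE = [
    (1, "09:07", "09:12"), (1, "10:11", "11:09"), (1, "11:34", "12:27"),
    (2, "09:11", "09:30"), (3, "09:24", "09:30"), (3, "10:00", "13:08"),
    (4, "09:34", "12:07"), (5, "09:10", "09:55"), (7, "15:10", "15:20"),
    (7, "15:40", "15:55"), (8, "15:15", "15:25"), (9, "16:00", "16:55"),
]


def count_coaches(busy, available):
    # C1: events (ts, EventType, AvailableCoachID) per busy coach.
    events = {}
    for busy_id, bst, bet in busy:
        evs = events.setdefault(busy_id, [])
        # The busy coach's own window guarantees events at BusyST and BusyET.
        evs += [(bst, +1, busy_id), (bet, -1, busy_id)]
        for avail_id, ast, aet in available:
            if avail_id != busy_id and ast < bet and aet > bst:
                # CTE_Intervals: clamp the overlap to the busy window.
                evs += [(max(ast, bst), +1, avail_id), (min(aet, bet), -1, avail_id)]
    rows = []
    for busy_id in sorted(events):
        # C2: ORDER BY ts, EventType DESC, AvailableCoachID.
        evs = sorted(events[busy_id], key=lambda ev: (ev[0], -ev[1], ev[2]))
        total = 0
        # Pair each event with the next one (LEAD(ts)); the last event has no
        # successor, like the NULL NextTS row that the SQL filters out.
        for (ts, kind, _), (next_ts, _, _) in zip(evs, evs[1:]):
            total += kind
            cnt = total - 1  # remove the busy coach's own +1
            # WHERE ts <> NextTS, plus the cnt >= 0 gap guard.
            if ts != next_ts and cnt >= 0:
                rows.append((busy_id, ts, next_ts, cnt))
    return rows


for row in count_coaches(BUSY, AVAILABLE):
    print(row)
```

Sorting +1 events before -1 events at the same timestamp means only the last event at each timestamp carries the settled count; the earlier ones have NextTS equal to ts and are dropped.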

Results

I've added a comment to each row with the IDs of the available coaches that were counted.

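Now I see why my initial results differed from your expected results: the first variant counts coaches who are listed as available even while they are busy themselves, e.g. coach 3 for coach 1 between 9:30 and 10:00, and coach 1 for coach 3.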

+---------+---------------------+---------------------+--------------+
| CoachID |       CanCoachST    |       CanCoachET    | NumOfCoaches |
+---------+---------------------+---------------------+--------------+
|       1 | 2016-08-17 09:12:00 | 2016-08-17 09:24:00 |            2 |  2,5
|       1 | 2016-08-17 09:24:00 | 2016-08-17 09:30:00 |            3 |  2,3,5
|       1 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 |            2 |  3,5
|       1 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 |            3 |  3,4,5
|       1 | 2016-08-17 09:55:00 | 2016-08-17 10:11:00 |            2 |  3,4
|       3 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 |            2 |  1,5
|       3 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 |            3 |  1,4,5
|       3 | 2016-08-17 09:55:00 | 2016-08-17 10:00:00 |            2 |  1,4
|       4 | 2016-08-17 12:07:00 | 2016-08-17 12:27:00 |            2 |  3,1
|       4 | 2016-08-17 12:27:00 | 2016-08-17 13:08:00 |            1 |  3
|       4 | 2016-08-17 13:08:00 | 2016-08-17 13:10:00 |            0 |  none
|       6 | 2016-08-17 15:00:00 | 2016-08-17 15:10:00 |            0 |  none
|       6 | 2016-08-17 15:10:00 | 2016-08-17 15:15:00 |            1 |  7
|       6 | 2016-08-17 15:15:00 | 2016-08-17 15:20:00 |            2 |  7,8
|       6 | 2016-08-17 15:20:00 | 2016-08-17 15:25:00 |            1 |  8
|       6 | 2016-08-17 15:25:00 | 2016-08-17 15:40:00 |            0 |  none
|       6 | 2016-08-17 15:40:00 | 2016-08-17 15:55:00 |            1 |  7
|       6 | 2016-08-17 15:55:00 | 2016-08-17 16:00:00 |            0 |  none
+---------+---------------------+---------------------+--------------+
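The difference can be reproduced for coach 1's busy window with a small Python check (a sketch; `who_covers` is a hypothetical name). RAW holds the other coaches' #Temp2 rows and TRULY their CTE_Available rows, i.e. with their own busy time already subtracted.

```python
# Other coaches' intervals as seen by busy coach 1 (9:12-10:11):
# RAW = #Temp2 rows, TRULY = CTE_Available rows (busy time subtracted).
RAW = [(2, "09:11", "09:30"), (3, "09:24", "13:08"),
       (4, "09:34", "13:00"), (5, "09:10", "09:55")]
TRULY = [(2, "09:11", "09:30"), (3, "09:24", "09:30"), (3, "10:00", "13:08"),
         (4, "09:34", "12:07"), (5, "09:10", "09:55")]


def who_covers(intervals, st, et):
    # Coaches with an interval containing the whole half-open window [st, et).
    return sorted({c for c, s, e in intervals if s <= st and et <= e})


for st, et in [("09:30", "09:34"), ("09:34", "09:55"), ("09:55", "10:00")]:
    print(st, et, "raw:", who_covers(RAW, st, et), "truly:", who_covers(TRULY, st, et))
```

In every window the raw data adds coach 3, which matches the extra count in the first-variant rows for coach 1 compared with the expected output.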

The original question (sql - Calculating intersecting time intervals in T-SQL) is on Stack Overflow: https://stackoverflow.com/questions/39009645/
