I have the following data table:
Animal Immunization_Date
Cat 1/18/2017
Cat 1/27/2017
Cat 5/7/2017
Cat 5/12/2017
Dog 1/1/2017
Dog 1/5/2017
Dog 1/7/2017
Dog 3/25/2017
Dog 4/18/2017
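I am trying to create a rank per animal based on 10-day intervals, which should produce the result below. (Find the first date for an animal, then assign group 1 to every date within 10 days of that date. Then take the next date for that animal that was not assigned to group 1, assign it group 2, then assign group 2 to every date within 10 days of that date, and so on.)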
Animal Immunization_Date 10_Day_Group_Rank
Cat 1/18/2017 1
Cat 1/27/2017 1
Cat 5/7/2017 2
Cat 5/12/2017 2
Dog 1/1/2017 1
Dog 1/5/2017 1
Dog 1/7/2017 1
Dog 3/25/2017 2
Dog 4/18/2017 3
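To pin down the rule before turning to SQL, here is a minimal Python sketch of the greedy grouping described above. The function name `ten_day_groups` and the `inclusive` flag are illustrative assumptions, not part of the original post. The flag exists because "within 10 days" can be read either way: the Excel formula further down uses `<=10`, while the SQL in the answer uses a strict `<` against `immunization_date + 10`. The sample data gives the same result under both readings.

```python
from datetime import date
from itertools import groupby


def ten_day_groups(rows, window_days=10, inclusive=True):
    """Rank each (animal, date) row by greedy windows anchored at the
    first date not covered by the previous window (hypothetical helper)."""
    result = []
    ordered = sorted(rows)  # sort by animal, then by date
    for animal, items in groupby(ordered, key=lambda r: r[0]):
        anchor = None  # first date of the current group
        rank = 0
        for _, d in items:
            if anchor is None:
                same_group = False
            else:
                gap = (d - anchor).days
                same_group = gap <= window_days if inclusive else gap < window_days
            if not same_group:
                anchor = d  # this row opens a new group
                rank += 1
            result.append((animal, d, rank))
    return result


rows = [
    ("Cat", date(2017, 1, 18)), ("Cat", date(2017, 1, 27)),
    ("Cat", date(2017, 5, 7)), ("Cat", date(2017, 5, 12)),
    ("Dog", date(2017, 1, 1)), ("Dog", date(2017, 1, 5)),
    ("Dog", date(2017, 1, 7)), ("Dog", date(2017, 3, 25)),
    ("Dog", date(2017, 4, 18)),
]

for animal, d, rank in ten_day_groups(rows):
    print(animal, d.isoformat(), rank)
```

The key point is that each group is anchored at its own first date, not at the previous row, which is why a plain running total over a windowed `CASE` expression is not enough.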
I have been trying the code below, but I cannot get the 10-day grouping to work.
Select
    dt.Animal,
    dt.Immunization_Date,
    sum(dt.10_day_Group) over(partition dt.Animal order by dt.Immunization_Date rows unbounded preceding) as 10_day_Group --creates a running total that is also the group
from
(
    Select
        Animal,
        Immunization_Date,
        case when min(Immunization_Date) over (partition by Animal order by Immunization_Date) <=10 then 1 else 0 end as 10_Day_Group --Create intervals of 10 days
    from Table_A
) as dt
I am not sure how to build the 10-day grouping in this part:
case when min(Immunization_Date) over (partition by Animal order by Immunization_Date) <=10 then 1 else 0 end as 10_Day_Group
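I can do this in Excel using the formulas below. I know Excel and SQL are different, but I hope that seeing how it is done in Excel helps show what needs to happen in SQL.
The Excel sheet looks like this (the table starts in cell A1). (Note: Animal must be sorted, and Immunization_Date must be sorted, for the Excel formulas to work.)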
Animal Immunization_Date Dummy_1 10_Day_Group
Cat 1/18/2017 1/18/2017 1
Cat 1/27/2017 1/18/2017 1
Cat 5/7/2017 5/7/2017 2
Cat 5/12/2017 5/7/2017 2
Dog 1/1/2017 1/1/2017 1
Dog 1/5/2017 1/1/2017 1
Dog 1/7/2017 1/1/2017 1
Dog 3/25/2017 3/25/2017 2
Dog 4/18/2017 4/18/2017 3
The formula for Dummy_1 is:
IFERROR(IF(AND(A2=A1,B2-C1<=10),C1,B2),B2)
The formula for 10_Day_Group is:
IFERROR(IF(AND(C2=C1,A2=A1),D1,IF(AND(A2=A1,C2<>C1),D1+1,1)),1)
Best answer
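@MatBailie's recursive answer is quite nice, but performance gets worse as the number of rows per animal increases.
If the first CTE can be materialized in a volatile table, resource usage goes down (because Teradata's optimizer, annoyingly, does not materialize this result):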
CREATE VOLATILE TABLE boundaries AS
(
    SELECT
        i.*, -- need to add the alias
        (
            SELECT MIN(immunization_date)
            FROM immunizations
            WHERE animal = i.animal
              AND immunization_date >= i.immunization_date + 10
        )
        AS next_boundary_date
    FROM
        immunizations i
)
WITH DATA
UNIQUE PRIMARY INDEX(animal, immunization_date)
ON COMMIT PRESERVE ROWS;
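To make the `next_boundary_date` column concrete, the sketch below computes the same value in Python for one animal: for each date, the earliest date that is at least 10 days later. How the referenced recursive query consumes this table is not shown here, so the `group_starts` helper is only one plausible use, under the assumption that the recursion hops from one boundary date to the next. Both function names are hypothetical.

```python
from bisect import bisect_left
from datetime import date, timedelta


def next_boundaries(dates, window_days=10):
    """For each date in a sorted list, return the earliest date that is
    at least window_days later, or None if there is none."""
    out = []
    for d in dates:
        i = bisect_left(dates, d + timedelta(days=window_days))
        out.append(dates[i] if i < len(dates) else None)
    return out


def group_starts(dates, window_days=10):
    """Follow the boundary pointers from the first date to collect the
    first date of every group (assumed usage, see lead-in)."""
    nxt = dict(zip(dates, next_boundaries(dates, window_days)))
    starts = []
    current = dates[0] if dates else None
    while current is not None:
        starts.append(current)
        current = nxt[current]
    return starts


dog = [
    date(2017, 1, 1), date(2017, 1, 5), date(2017, 1, 7),
    date(2017, 3, 25), date(2017, 4, 18),
]
print(next_boundaries(dog))
print(group_starts(dog))
```

For the Dog rows, the first three dates all point to 3/25/2017, which in turn points to 4/18/2017, so following the pointers visits only one row per group instead of every row.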
But if you can use a temporary table anyway, you can also use a simple recursion:
CREATE VOLATILE TABLE vt AS
(
    SELECT
        animal,
        immunization_date,
        Row_Number() -- add row number to simplify recursive processing
        Over (PARTITION BY animal
              ORDER BY immunization_date) AS rn
    FROM immunizations AS i
)
WITH DATA
UNIQUE PRIMARY INDEX(animal, rn)
ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte AS
(
    SELECT
        animal, immunization_date, rn,
        immunization_date+10 AS end_date, -- define the end of the range
        1 AS grp -- the literal 1 is a BYTEINT = limited to 127 groups, CAST to a larger INT for more groups
    FROM vt
    WHERE rn = 1 -- oldest row per animal
    UNION ALL
    SELECT
        vt.animal, vt.immunization_date, vt.rn,
        -- check if the current row's date is within the 10 day range
        -- otherwise increase the group number and define the new range end
        CASE WHEN vt.immunization_date < cte.end_date THEN cte.end_date ELSE vt.immunization_date +10 END,
        CASE WHEN vt.immunization_date < cte.end_date THEN cte.grp ELSE cte.grp+1 END
    FROM cte
    JOIN vt
      ON vt.animal = cte.animal
     AND vt.rn = cte.rn+1
)
SELECT *
FROM cte
ORDER BY 1,2;
A similar question, "sql - Create groups based on 10-day intervals", can be found on Stack Overflow: https://stackoverflow.com/questions/49881060/