I have a dataset similar to the test data below:
create table #colors (mon int, grp varchar(1), color varchar(5))
insert #colors values
(201501,'A','Red'),
(201502,'A','Red'),
(201503,'A','Red'),
(201504,'A','Red'),
(201505,'A','Red'),
(201506,'A','Red'),
(201501,'B','Red'),
(201502,'B','Red'),
(201503,'B','Blue'),
(201504,'B','Blue'),
(201505,'B','Blue'),
(201506,'B','Blue'),
(201501,'C','Red'),
(201502,'C','Red'),
(201503,'C','Blue'),
(201504,'C','Green'),
(201505,'C','Green'),
(201506,'C','Green'),
(201501,'D','Red'),
(201502,'D','Red'),
(201503,'D','Blue'),
(201504,'D','Blue'),
(201505,'D','Red'),
(201506,'D','Red')
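I want to know the path each group took in terms of color, and the most recent month in which a category was a particular color before the color changed. That way, the month associated with a color acts as the upper time bound for that category-color combination.
I tried to do this with a CTE and the row_number() function, as shown in the code below, but it doesn't quite work.
Here is the sample code: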
; with colors (grp, color, mon, rn) as (
select grp
, color
, mon
, row_number() over (partition by grp order by mon asc) rn
from (
select grp
, color
, max(mon) mon
from #colors
group by grp, color
) as z
)
select grp
, firstColor
, firstMonth
, secondColor
, secondMonth
, thirdColor
, thirdMonth
from (
select c1.grp
, c1.color firstColor
, c1.mon firstMonth
, c2.color secondColor
, c2.mon secondMonth
, c3.color thirdColor
, c3.mon thirdMonth
, row_number() over (partition by c1.grp order by c1.mon asc) rn
from colors c1 left outer join colors c2 on (
c1.grp = c2.grp
and c1.color <> c2.color
and c1.rn = c2.rn - 1
) left outer join colors c3 on (
c1.grp = c3.grp
and c2.color <> c3.color
and c2.rn = c3.rn - 1
)
) as d
where rn = 1
order by grp
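This produces the following (incorrect) result set:
As you can see, there is no indication that group D's original color was Red; it should be Red (201502) --> Blue (201504) --> Red (201506). This is caused by the max() function, but removing it requires changing the join logic in a way I haven't been able to work out.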
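To see why max() loses D's first Red run, the inner aggregate can be run on its own for group D. This is only a minimal sketch: the CTE name d and the inline VALUES list are introduced here for illustration, with the rows copied from the test data above.

```sql
-- Group D only, copied from the test data; "d" is an illustrative name
WITH d (mon, grp, color) AS (
    SELECT mon, grp, color
    FROM (VALUES (201501, 'D', 'Red'),  (201502, 'D', 'Red'),
                 (201503, 'D', 'Blue'), (201504, 'D', 'Blue'),
                 (201505, 'D', 'Red'),  (201506, 'D', 'Red')) AS v (mon, grp, color)
)
-- Same aggregate as the inner query: one row per (grp, color)
SELECT grp, color, MAX(mon) AS mon
FROM d
GROUP BY grp, color
ORDER BY mon;
-- Two rows come back: (D, Blue, 201504) and (D, Red, 201506)
```

Because both Red runs collapse into a single row dated 201506, row_number() then ranks Blue first and Red second, and the earlier Red run ending at 201502 never reaches the joins.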
I tried removing the max() function and changing the partition in row_number() to include color, but I believe that logically reduces to the same set.
How do I account for the case where the number of categories is smaller than the number of changes between those categories?
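Best answer
I would take a different approach; in general I'd avoid "predefining" the number of months as columns (if possible). Here is a solution that first separates the months into rows, but then combines the results into the expected output format: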
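WITH nCTE (mon, grp, color, n) AS (
    -- Number each group's rows in month order; neighbours differ by 1 even across a year boundary
    SELECT mon, grp, color, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon) AS n
    FROM #colors  -- the temp table from the question
), monthsCTE (mon, grp, color, n) AS (
    -- Keep the last month of each run of one color (next row differs or is missing), then renumber
    SELECT l.mon, l.grp, l.color, ROW_NUMBER() OVER (PARTITION BY l.grp ORDER BY l.mon) AS n
    FROM nCTE l LEFT JOIN nCTE r
        ON l.grp = r.grp AND l.n = r.n - 1
    WHERE l.color != r.color OR r.color IS NULL
)
-- Pivot the first three runs of each group into columns
SELECT m1.grp,
       m1.color AS firstColor, m1.mon AS firstMonth,
       m2.color AS secondColor, m2.mon AS secondMonth,
       m3.color AS thirdColor, m3.mon AS thirdMonth
FROM monthsCTE m1
LEFT JOIN monthsCTE m2 ON m1.grp = m2.grp AND m2.n = 2
LEFT JOIN monthsCTE m3 ON m1.grp = m3.grp AND m3.n = 3
WHERE m1.n = 1
ORDER BY m1.grp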
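There is also a fiddle.
You can use the "inside" of monthsCTE instead of the outer SELECT to get the results in separate rows (that way you don't need the ROW_NUMBER... part), or leave it as it is.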
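As a rough sketch of that row-per-run variant, the body of monthsCTE can serve as the final SELECT. It assumes the #colors temp table from the question; the lastMonth alias is made up for illustration.

```sql
-- Sketch: one output row per run of a color, instead of fixed first/second/third columns
WITH nCTE AS (
    SELECT mon, grp, color,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon) AS n
    FROM #colors  -- assumed: the temp table created in the question
)
SELECT l.grp, l.color, l.mon AS lastMonth
FROM nCTE l
LEFT JOIN nCTE r
    ON l.grp = r.grp AND l.n = r.n - 1          -- r is the next row of the same group
WHERE l.color <> r.color OR r.color IS NULL      -- color changes next, or there is no next row
ORDER BY l.grp, l.mon;
```

For group D this returns Red 201502, Blue 201504 and Red 201506 as three rows, and it keeps working when a group has more than three runs.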
EDIT: It's actually easier to do what you REALLY wanted. Just remove the GROUP BY clause (and the interrupting MAX() functions).
EDIT2: As noted by Me.Name, old solution would fail over years. Corrected code fragment & fiddle.
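The same "last month of each run" filter can also be written without the self-join, using LEAD() to peek at the next row's color. This is a swapped-in technique rather than what the answer uses, and a sketch only: it assumes SQL Server 2012 or later (where LEAD() is available), the #colors table from the question, and non-NULL colors. The names marked, runs and nextColor are hypothetical.

```sql
WITH marked AS (
    -- Attach the next month's color within each group (NULL on the group's last row)
    SELECT mon, grp, color,
           LEAD(color) OVER (PARTITION BY grp ORDER BY mon) AS nextColor
    FROM #colors
), runs AS (
    -- Keep only the last month of each run, then number the runs per group
    SELECT grp, color, mon,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon) AS n
    FROM marked
    WHERE nextColor IS NULL OR nextColor <> color
)
-- Conditional aggregation pivots the first three runs into columns
SELECT grp,
       MAX(CASE WHEN n = 1 THEN color END) AS firstColor,
       MAX(CASE WHEN n = 1 THEN mon   END) AS firstMonth,
       MAX(CASE WHEN n = 2 THEN color END) AS secondColor,
       MAX(CASE WHEN n = 2 THEN mon   END) AS secondMonth,
       MAX(CASE WHEN n = 3 THEN color END) AS thirdColor,
       MAX(CASE WHEN n = 3 THEN mon   END) AS thirdMonth
FROM runs
GROUP BY grp
ORDER BY grp;
```

Like the ROW_NUMBER() join, this orders rows by mon rather than doing arithmetic on it, so a step from one year to the next is treated like any other consecutive pair.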
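Regarding sql - finding the max value before a change between n categories in a window (with m>n changes between categories), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31009582/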