sql - Find the max value before a change between n categories in a window (with m > n changes between categories)

Tags: sql sql-server group-by window-functions

I have a dataset similar to the test data below:

create table #colors (mon int, grp varchar(1), color varchar(5)) 
insert #colors values 
(201501,'A','Red'),
(201502,'A','Red'),
(201503,'A','Red'),
(201504,'A','Red'),
(201505,'A','Red'),
(201506,'A','Red'),
(201501,'B','Red'),
(201502,'B','Red'),
(201503,'B','Blue'),
(201504,'B','Blue'),
(201505,'B','Blue'),
(201506,'B','Blue'),
(201501,'C','Red'),
(201502,'C','Red'),
(201503,'C','Blue'),
(201504,'C','Green'),
(201505,'C','Green'),
(201506,'C','Green'),
(201501,'D','Red'),
(201502,'D','Red'),
(201503,'D','Blue'),
(201504,'D','Blue'),
(201505,'D','Red'),
(201506,'D','Red')

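I want to know the path each group takes through the colors. For each color in that path, I want the most recent month in which the group had that color before it changed. That way, the month associated with a color acts as the upper time bound for that group-color combination.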
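To pin down the expected output, here is a small Python sketch that derives each group's color path from the test data. It is not part of the original question, and `color_path` is a hypothetical helper. Consecutive months with the same color are collapsed into one run, and each run is reported with its last month. The sketch assumes the months for each group are already sorted.

```python
from itertools import groupby

# Test data from the question: one color per (group, month).
MONTHS = [201501, 201502, 201503, 201504, 201505, 201506]
PATHS = {
    "A": ["Red"] * 6,
    "B": ["Red", "Red", "Blue", "Blue", "Blue", "Blue"],
    "C": ["Red", "Red", "Blue", "Green", "Green", "Green"],
    "D": ["Red", "Red", "Blue", "Blue", "Red", "Red"],
}

def color_path(months, colors):
    """Collapse consecutive months with the same color into runs and
    return (color, last month of the run) for each run, in order."""
    path = []
    for color, run in groupby(zip(months, colors), key=lambda mc: mc[1]):
        last_month = list(run)[-1][0]
        path.append((color, last_month))
    return path

for grp, colors in PATHS.items():
    print(grp, color_path(MONTHS, colors))
```

For group D this gives Red (201502), Blue (201504), Red (201506), which is the path the question asks for.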

I tried to do this with a CTE and the row_number() function, as shown in the code below, but it doesn't quite work.

Here is the sample code:

; with colors (grp, color, mon, rn) as (
    select  grp
        ,   color
        ,   mon
        ,   row_number() over (partition by grp order by mon asc) rn
    from    (
        select  grp
            ,   color
            ,   max(mon) mon
        from    #colors
        group by grp, color
        ) as z
    )
    select  grp
        ,   firstColor
        ,   firstMonth
        ,   secondColor
        ,   secondMonth
        ,   thirdColor
        ,   thirdMonth
    from    (
        select  c1.grp
            ,   c1.color firstColor
            ,   c1.mon firstMonth
            ,   c2.color secondColor
            ,   c2.mon secondMonth
            ,   c3.color thirdColor
            ,   c3.mon thirdMonth
            ,   row_number() over (partition by c1.grp order by c1.mon asc) rn
        from    colors c1 left outer join colors c2 on (
                        c1.grp = c2.grp
                    and c1.color <> c2.color
                    and c1.rn = c2.rn - 1
                ) left outer join colors c3 on (
                        c1.grp = c3.grp
                    and c2.color <> c3.color
                    and c2.rn = c3.rn - 1
                )
        ) as d
    where   rn = 1
    order by grp

This produces the following (incorrect) result set:

[result set image not available]

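As you can see, nothing indicates that group D's original color was Red. Its path should be Red (201502) --> Blue (201504) --> Red (201506). This happens because of the max() function, but removing it would require changing the join logic in a way I can't work out.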
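To see why max() loses information, here is a small Python sketch that mimics `GROUP BY grp, color` with `MAX(mon)` on group D. It is an illustration only, not the question's code, and `max_month_per_color` is a hypothetical name. Grouping leaves one row per distinct color, so the first Red run (ending 201502) disappears, and the two surviving rows come out as Blue, then Red.

```python
from collections import defaultdict

# Group D from the question's test data: (month, color).
GROUP_D = [(201501, "Red"), (201502, "Red"), (201503, "Blue"),
           (201504, "Blue"), (201505, "Red"), (201506, "Red")]

def max_month_per_color(rows):
    """Mimic GROUP BY color with MAX(mon): one entry per distinct color."""
    latest = defaultdict(int)
    for mon, color in rows:
        latest[color] = max(latest[color], mon)
    # Order by month, like ROW_NUMBER() ... ORDER BY mon in the question's CTE.
    return sorted(latest.items(), key=lambda item: item[1])

print(max_month_per_color(GROUP_D))  # [('Blue', 201504), ('Red', 201506)]
```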

I tried removing the max() function and adding color to the row_number() partition, but I think that logically reduces to the same set.
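Partitioning ROW_NUMBER() by (grp, color) alone numbers D's Red months 1 to 4 straight across the Blue gap. On its own it cannot tell the two Red runs apart, which supports the suspicion above.

A commonly used alternative is the "gaps and islands" row-number-difference technique. This is not what the answer below uses. It subtracts the (grp, color) row number from the per-group row number. The difference stays constant within a run, so grouping by (grp, color, difference) yields one row per run.

The sketch below runs the SQL through Python's `sqlite3`, assuming SQLite 3.25 or later for window functions. The query itself should carry over to SQL Server with `#colors`, but treat that as an assumption. `ISLANDS_QUERY` and `color_runs` are hypothetical names.

```python
import sqlite3

MONTHS = [201501, 201502, 201503, 201504, 201505, 201506]
PATHS = {
    "A": ["Red"] * 6,
    "B": ["Red", "Red", "Blue", "Blue", "Blue", "Blue"],
    "C": ["Red", "Red", "Blue", "Green", "Green", "Green"],
    "D": ["Red", "Red", "Blue", "Blue", "Red", "Red"],
}

# Within one run of a color, both row numbers advance together, so `island`
# is constant. A later run of the same color has a larger `island` because
# rows of other colors were interleaved. Different colors can share an
# `island` value, so color stays in the GROUP BY.
ISLANDS_QUERY = """
WITH numbered AS (
  SELECT grp, color, mon,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon)
       - ROW_NUMBER() OVER (PARTITION BY grp, color ORDER BY mon) AS island
  FROM colors
)
SELECT grp, color, MAX(mon) AS last_mon
FROM numbered
GROUP BY grp, color, island
ORDER BY grp, last_mon
"""

def color_runs():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE colors (mon INTEGER, grp TEXT, color TEXT)")
    conn.executemany(
        "INSERT INTO colors VALUES (?, ?, ?)",
        [(mon, grp, color) for grp, colors in PATHS.items()
         for mon, color in zip(MONTHS, colors)],
    )
    rows = conn.execute(ISLANDS_QUERY).fetchall()
    conn.close()
    return rows

for row in color_runs():
    print(row)
```

This returns one row per run, for example ('D', 'Red', 201502), ('D', 'Blue', 201504), ('D', 'Red', 201506) for group D. You would still need to pivot the rows if you want columns.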

How can I handle the case where there are fewer categories than changes between those categories?

Best answer

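I'd take a different approach. As a rule, I avoid "predefining" the number of months as columns where possible. Here is a solution that separates the months into rows and then combines the results into the expected output format: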

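WITH nCTE (mon, grp, color, n) AS (
  -- Number each group's rows in month order.
  SELECT mon, grp, color, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY mon) n
  FROM #colors
), monthsCTE (mon, grp, color, n) AS (
  -- Keep a row only if the next row (by n) has a different color or does not
  -- exist, i.e. the last month of each color run; then renumber the runs.
  SELECT l.mon, l.grp, l.color, ROW_NUMBER() OVER(PARTITION BY l.grp ORDER BY l.mon) n
  FROM nCTE l LEFT JOIN nCTE r
    ON l.grp = r.grp AND l.n = r.n - 1
  WHERE l.color != r.color OR r.color IS NULL
)

-- Pivot the first three runs of each group into columns.
SELECT m1.grp,
       m1.color firstColor, m1.mon firstMonth,
       m2.color secondColor, m2.mon secondMonth,
       m3.color thirdColor, m3.mon thirdMonth
FROM monthsCTE m1
  LEFT JOIN monthsCTE m2 ON m1.grp = m2.grp AND m2.n = 2
  LEFT JOIN monthsCTE m3 ON m1.grp = m3.grp AND m3.n = 3
WHERE m1.n = 1
ORDER BY m1.grp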

There is also a fiddle.
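The fiddle link is not available here, so this is a Python sketch that runs the answer's query against the question's test data using the stdlib `sqlite3` module. It assumes SQLite 3.25 or later for window functions. The table is named `colors` rather than `#colors`, because SQLite has no `#` temp-table syntax. `run_answer_query` is a hypothetical helper.

```python
import sqlite3

MONTHS = [201501, 201502, 201503, 201504, 201505, 201506]
PATHS = {
    "A": ["Red"] * 6,
    "B": ["Red", "Red", "Blue", "Blue", "Blue", "Blue"],
    "C": ["Red", "Red", "Blue", "Green", "Green", "Green"],
    "D": ["Red", "Red", "Blue", "Blue", "Red", "Red"],
}

# The answer's query, adapted only by renaming #colors to colors.
ANSWER_QUERY = """
WITH nCTE (mon, grp, color, n) AS (
  SELECT mon, grp, color, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon)
  FROM colors
), monthsCTE (mon, grp, color, n) AS (
  SELECT l.mon, l.grp, l.color,
         ROW_NUMBER() OVER (PARTITION BY l.grp ORDER BY l.mon)
  FROM nCTE l LEFT JOIN nCTE r ON l.grp = r.grp AND l.n = r.n - 1
  WHERE l.color != r.color OR r.color IS NULL
)
SELECT m1.grp, m1.color, m1.mon, m2.color, m2.mon, m3.color, m3.mon
FROM monthsCTE m1
  LEFT JOIN monthsCTE m2 ON m1.grp = m2.grp AND m2.n = 2
  LEFT JOIN monthsCTE m3 ON m1.grp = m3.grp AND m3.n = 3
WHERE m1.n = 1
ORDER BY m1.grp
"""

def run_answer_query():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE colors (mon INTEGER, grp TEXT, color TEXT)")
    conn.executemany(
        "INSERT INTO colors VALUES (?, ?, ?)",
        [(mon, grp, color) for grp, colors in PATHS.items()
         for mon, color in zip(MONTHS, colors)],
    )
    rows = conn.execute(ANSWER_QUERY).fetchall()
    conn.close()
    return rows

for row in run_answer_query():
    print(row)
```

On this data, the row for group D should be ('D', 'Red', 201502, 'Blue', 201504, 'Red', 201506). Groups with fewer than three runs get NULLs (Python `None`) in the missing columns.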

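To get the results as separate rows, you can use the "inner" part of the months CTE instead of the outer SELECT (then you don't need the ROW_NUMBER... part), or keep it as it is.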
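A sketch of that row-per-run variant, under the same assumptions as before (Python `sqlite3`, SQLite 3.25 or later, table named `colors`). It keeps only the self-join filter from the months CTE and drops the renumbering ROW_NUMBER() and the pivot. `run_change_rows` is a hypothetical helper.

```python
import sqlite3

MONTHS = [201501, 201502, 201503, 201504, 201505, 201506]
PATHS = {
    "A": ["Red"] * 6,
    "B": ["Red", "Red", "Blue", "Blue", "Blue", "Blue"],
    "C": ["Red", "Red", "Blue", "Green", "Green", "Green"],
    "D": ["Red", "Red", "Blue", "Blue", "Red", "Red"],
}

# One output row per color run: a row survives when the next row in the
# same group has another color, or when there is no next row.
ROWS_QUERY = """
WITH nCTE (mon, grp, color, n) AS (
  SELECT mon, grp, color, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon)
  FROM colors
)
SELECT l.grp, l.color, l.mon
FROM nCTE l LEFT JOIN nCTE r ON l.grp = r.grp AND l.n = r.n - 1
WHERE l.color != r.color OR r.color IS NULL
ORDER BY l.grp, l.mon
"""

def run_change_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE colors (mon INTEGER, grp TEXT, color TEXT)")
    conn.executemany(
        "INSERT INTO colors VALUES (?, ?, ?)",
        [(mon, grp, color) for grp, colors in PATHS.items()
         for mon, color in zip(MONTHS, colors)],
    )
    rows = conn.execute(ROWS_QUERY).fetchall()
    conn.close()
    return rows

for row in run_change_rows():
    print(row)
```

This form handles any number of runs per group, because no columns are predefined.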

EDIT: It's actually easier to do what you REALLY wanted. Just remove the GROUP BY clause (and the interrupting MAX() functions).

EDIT2: As noted by Me.Name, the old solution would fail across year boundaries. I've corrected the code fragment and the fiddle.
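The old code is not shown, so this is only an assumed explanation. A typical way to fail across years with yyyymm integers is to treat `mon` and `mon + 1` as consecutive months. That breaks at December, since 201512 + 1 is 201513, not 201601. The hypothetical Python sketch below contrasts that arithmetic with position-based adjacency, which is what ROW_NUMBER() gives in the corrected query.

```python
# Hypothetical illustration: months stored as yyyymm integers, as in the question.
MONTHS = [201511, 201512, 201601, 201602]

def pairs_by_arithmetic(months):
    """Treat m and m + 1 as consecutive months (the assumed old approach)."""
    present = set(months)
    return [(m, m + 1) for m in months if m + 1 in present]

def pairs_by_position(months):
    """Treat neighbours in sorted order as consecutive, as ROW_NUMBER() does."""
    ordered = sorted(months)
    return list(zip(ordered, ordered[1:]))

print(pairs_by_arithmetic(MONTHS))  # [(201511, 201512), (201601, 201602)]
print(pairs_by_position(MONTHS))    # [(201511, 201512), (201512, 201601), (201601, 201602)]
```

Position-based pairing still assumes there are no missing months within a group. If a month is missing, the rows on either side of the gap are treated as neighbours.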

A similar question, "sql - Find the max value before a change between n categories in a window (with m > n changes between categories)", can be found on Stack Overflow: https://stackoverflow.com/questions/31009582/
