sql - Find the max value before a change between n categories in a window, when there are m > n changes between categories

I have a dataset similar to the following test data:

create table #colors (mon int, grp varchar(1), color varchar(5)) 
insert #colors values 
(201501,'A','Red'),
(201502,'A','Red'),
(201503,'A','Red'),
(201504,'A','Red'),
(201505,'A','Red'),
(201506,'A','Red'),
(201501,'B','Red'),
(201502,'B','Red'),
(201503,'B','Blue'),
(201504,'B','Blue'),
(201505,'B','Blue'),
(201506,'B','Blue'),
(201501,'C','Red'),
(201502,'C','Red'),
(201503,'C','Blue'),
(201504,'C','Green'),
(201505,'C','Green'),
(201506,'C','Green'),
(201501,'D','Red'),
(201502,'D','Red'),
(201503,'D','Blue'),
(201504,'D','Blue'),
(201505,'D','Red'),
(201506,'D','Red')

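I'd like to know the path each group has taken in terms of color, as well as the most recent month in which a group was a particular color before the color changed. In this way, the month associated with a color serves as the upper time bound for that group-color combination.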

I have tried to do this with a CTE and the row_number() function, as shown in the code below, but it does not quite work.

Here is the sample code:

; with colors (grp, color, mon, rn) as (
    select  grp
        ,   color
        ,   mon
        ,   row_number() over (partition by grp order by mon asc) rn
    from    (
        select  grp
            ,   color
            ,   max(mon) mon
        from    #colors
        group by grp, color
        ) as z
    )
    select  grp
        ,   firstColor
        ,   firstMonth
        ,   secondColor
        ,   secondMonth
        ,   thirdColor
        ,   thirdMonth
    from    (
        select  c1.grp
            ,   c1.color firstColor
            ,   c1.mon firstMonth
            ,   c2.color secondColor
            ,   c2.mon secondMonth
            ,   c3.color thirdColor
            ,   c3.mon thirdMonth
            ,   row_number() over (partition by c1.grp order by c1.mon asc) rn
        from    colors c1 left outer join colors c2 on (
                        c1.grp = c2.grp
                    and c1.color <> c2.color
                    and c1.rn = c2.rn - 1
                ) left outer join colors c3 on (
                        c1.grp = c3.grp
                    and c2.color <> c3.color
                    and c2.rn = c3.rn - 1
                )
        ) as d
    where   rn = 1
    order by grp

This produces the following (incorrect) result set:

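As you can see, there is no indication that group D's original color was Red. The path should be Red (201502) -> Blue (201504) -> Red (201506). This happens because of the max() function, but removing it would require changing the join logic in a way I have not been able to work out.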

I have tried removing the max() function and changing the partition in row_number() to include color, but I think that logically reduces to the same set.
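To make the collapse concrete, here is a minimal sketch of the inner aggregate from the query above, run against group D only. The rows are inlined through a hypothetical CTE named d so the snippet runs on its own, and the alias lastMonth is introduced just for this illustration.

```sql
-- Group D's rows from the test data, inlined so the snippet is self-contained.
WITH d (mon, grp, color) AS (
  SELECT * FROM (VALUES
    (201501,'D','Red'),  (201502,'D','Red'),
    (201503,'D','Blue'), (201504,'D','Blue'),
    (201505,'D','Red'),  (201506,'D','Red')
  ) AS v (mon, grp, color)
)
-- Same aggregate as the derived table z in the question's query.
SELECT grp, color, MAX(mon) AS lastMonth
FROM d
GROUP BY grp, color
ORDER BY lastMonth;
-- Two rows: (D, Blue, 201504) and (D, Red, 201506).
-- Both Red runs fall into one (grp, color) bucket, so the run ending
-- in 201502 is gone before row_number() is ever applied.
```

Because grouping by (grp, color) yields at most one row per color, no later numbering or join can bring back a color that appears in two separate runs.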

How do I handle the scenario where there are fewer categories than there are changes between those categories?

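Best answer

I took a different approach. Usually I would avoid "predefining" the number of months as columns (if possible). Here is a solution that can return the months in separate rows, although as written it combines the results into the expected output format: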

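WITH nCTE (mon, grp, color, n) AS (
  -- Number each group's rows in month order.
  -- #colors is the temp table from the question.
  SELECT *, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY mon) n
  FROM #colors
), monthsCTE (mon, grp, color, n) AS (
  -- Keep only the last month of each color run (the next row has a different
  -- color, or there is no next row), then number the runs within each group.
  SELECT l.mon, l.grp, l.color, ROW_NUMBER() OVER(PARTITION BY l.grp ORDER BY l.mon) n
  FROM nCTE l LEFT JOIN nCTE r
    ON l.grp = r.grp AND l.n = r.n - 1
  WHERE l.color != r.color OR r.color IS NULL
)

-- Pivot the first three runs of each group into columns.
SELECT m1.grp, m1.color, m1.mon, m2.color, m2.mon, m3.color, m3.mon
FROM monthsCTE m1 LEFT JOIN monthsCTE m2
  ON m1.grp = m2.grp AND m2.n = 2 LEFT JOIN monthsCTE m3
  ON m1.grp = m3.grp AND m3.n = 3
WHERE m1.n = 1
ORDER BY 1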

A fiddle

You can use the "inside" of monthsCTE instead of the outer SELECT to get the results in separate rows (then you don't need the ROW_NUMBER … part), or leave it like this…
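A minimal sketch of that row-per-run variant might look like the following. It assumes the same numbering CTE as the query above. The CTE named src is a hypothetical stand-in that inlines group D's rows from the question so the snippet runs on its own; point nCTE at the real table instead to cover every group.

```sql
WITH src (mon, grp, color) AS (
  -- Group D from the question's test data (stand-in for the real table).
  SELECT * FROM (VALUES
    (201501,'D','Red'),  (201502,'D','Red'),
    (201503,'D','Blue'), (201504,'D','Blue'),
    (201505,'D','Red'),  (201506,'D','Red')
  ) AS v (mon, grp, color)
), nCTE (mon, grp, color, n) AS (
  -- Number each group's rows in month order.
  SELECT mon, grp, color, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon)
  FROM src
)
-- A row survives when the next row in its group has a different color,
-- or when there is no next row: it is the last month of a color run.
SELECT l.grp, l.color, l.mon
FROM nCTE l LEFT JOIN nCTE r
  ON l.grp = r.grp AND l.n = r.n - 1
WHERE l.color <> r.color OR r.color IS NULL
ORDER BY l.grp, l.mon;
-- For group D: (Red, 201502), (Blue, 201504), (Red, 201506)
```

This shape returns one row per color run, so it does not cap the number of changes at three the way the pivoted column layout does.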

EDIT: It’s actually easier to do what you REALLY wanted. Just remove the GROUP BY clause (and the interrupting MAX() functions).

EDIT2: As noted by Me.Name, old solution would fail over years. Corrected code fragment & fiddle.
