I have a dataset similar to the following test data:
create table #colors (mon int, grp varchar(1), color varchar(5))
insert #colors values
(201501,'A','Red'),
(201502,'A','Red'),
(201503,'A','Red'),
(201504,'A','Red'),
(201505,'A','Red'),
(201506,'A','Red'),
(201501,'B','Red'),
(201502,'B','Red'),
(201503,'B','Blue'),
(201504,'B','Blue'),
(201505,'B','Blue'),
(201506,'B','Blue'),
(201501,'C','Red'),
(201502,'C','Red'),
(201503,'C','Blue'),
(201504,'C','Green'),
(201505,'C','Green'),
(201506,'C','Green'),
(201501,'D','Red'),
(201502,'D','Red'),
(201503,'D','Blue'),
(201504,'D','Blue'),
(201505,'D','Red'),
(201506,'D','Red')
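I want to know the path each group has taken in terms of color, along with the most recent month in which the group had a given color before the color changed. In other words, the month associated with a color serves as the upper time bound of that group-color combination.
I have tried to do this with a CTE and the row_number() function, as shown in the code below, but it does not work correctly.
Here is the sample code: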
; with colors (grp, color, mon, rn) as (
select grp
, color
, mon
, row_number() over (partition by grp order by mon asc) rn
from (
select grp
, color
, max(mon) mon
from #colors
group by grp, color
) as z
)
select grp
, firstColor
, firstMonth
, secondColor
, secondMonth
, thirdColor
, thirdMonth
from (
select c1.grp
, c1.color firstColor
, c1.mon firstMonth
, c2.color secondColor
, c2.mon secondMonth
, c3.color thirdColor
, c3.mon thirdMonth
, row_number() over (partition by c1.grp order by c1.mon asc) rn
from colors c1 left outer join colors c2 on (
c1.grp = c2.grp
and c1.color <> c2.color
and c1.rn = c2.rn - 1
) left outer join colors c3 on (
c1.grp = c3.grp
and c2.color <> c3.color
and c2.rn = c3.rn - 1
)
) as d
where rn = 1
order by grp
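This produces the following (incorrect) result set:
As you can see, there is no indication that group D's original color was Red; it should be Red (201502) -> Blue (201504) -> Red (201506). This is caused by the max() function, but removing it requires changing the join logic in a way I have not been able to work out.
I have also tried removing the max() function and changing the partition of row_number() to include color, but I believe that logically reduces to the same set.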
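To see why the aggregation loses group D's first Red run, here is a minimal sketch that reproduces the inner `group by` on group D alone. It is a port to SQLite driven from Python, an assumption made only so the snippet is self-contained; the table is named `colors` rather than `#colors`, and `last_mon` is an illustrative alias.

```python
import sqlite3

# Group D from the test data: Red, Red, Blue, Blue, Red, Red.
rows = [
    (201501, "D", "Red"), (201502, "D", "Red"),
    (201503, "D", "Blue"), (201504, "D", "Blue"),
    (201505, "D", "Red"), (201506, "D", "Red"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table colors (mon int, grp varchar(1), color varchar(5))")
conn.executemany("insert into colors values (?, ?, ?)", rows)

# Same aggregation as the inner query of the question's CTE.
collapsed = conn.execute(
    """
    select grp, color, max(mon) as last_mon
    from colors
    group by grp, color
    order by last_mon
    """
).fetchall()

print(collapsed)
```

Grouping by (grp, color) leaves a single Red row for group D, dated 201506, so the earlier Red run that ended in 201502 can no longer be recovered from this intermediate set, whatever the joins do afterwards.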
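How can I compute the path when there are fewer distinct categories than there are changes between those categories?
Best answer: I took a different approach; in general I would avoid "predefining" the number of months as columns if possible. The core of this solution returns the months as rows, but it then combines those rows into the expected output format: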
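-- nCTE: number each group's rows in month order
WITH nCTE (mon, grp, color, n) AS (
    SELECT mon, grp, color, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY mon) n
    FROM colors  -- use #colors for the temp table from the question
),
-- monthsCTE: keep the last month of each run of a color, then number the runs
monthsCTE (mon, grp, color, n) AS (
    SELECT l.mon, l.grp, l.color, ROW_NUMBER() OVER (PARTITION BY l.grp ORDER BY l.mon) n
    FROM nCTE l LEFT JOIN nCTE r
        ON l.grp = r.grp AND l.n = r.n - 1
    WHERE l.color != r.color OR r.color IS NULL
)
-- pivot the first three runs of each group into columns
SELECT m1.grp, m1.color firstColor, m1.mon firstMonth,
       m2.color secondColor, m2.mon secondMonth,
       m3.color thirdColor, m3.mon thirdMonth
FROM monthsCTE m1 LEFT JOIN monthsCTE m2
    ON m1.grp = m2.grp AND m2.n = 2 LEFT JOIN monthsCTE m3
    ON m1.grp = m3.grp AND m3.n = 3
WHERE m1.n = 1
ORDER BY 1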
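A fiddle
You can use the "inside" of monthsCTE instead of the outer SELECT to get the results as separate rows (in that case you do not need the ROW_NUMBER ... part), or keep it as it is.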
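The row-per-run variant mentioned above might look like the following sketch. It ports the inner query of monthsCTE to SQLite and runs it from Python so that it is self-contained; this assumes SQLite 3.25 or later for window functions, and the table is named `colors` here (in T-SQL, against the question's data, it would be `#colors`). The `data` dictionary is just a compact way of rebuilding the question's test rows.

```python
import sqlite3

# Rebuild the question's test data: six months (201501..201506) per group.
data = {
    "A": ["Red"] * 6,
    "B": ["Red"] * 2 + ["Blue"] * 4,
    "C": ["Red"] * 2 + ["Blue"] + ["Green"] * 3,
    "D": ["Red"] * 2 + ["Blue"] * 2 + ["Red"] * 2,
}
rows = [
    (201501 + i, grp, color)
    for grp, colors in data.items()
    for i, color in enumerate(colors)
]

conn = sqlite3.connect(":memory:")
conn.execute("create table colors (mon int, grp varchar(1), color varchar(5))")
conn.executemany("insert into colors values (?, ?, ?)", rows)

# Keep a row only if the next row in its group has a different color,
# or if there is no next row. That row is the last month of a run.
runs = conn.execute(
    """
    with nCTE as (
        select mon, grp, color,
               row_number() over (partition by grp order by mon) as n
        from colors
    )
    select l.grp, l.color, l.mon
    from nCTE l
    left join nCTE r on l.grp = r.grp and l.n = r.n - 1
    where l.color <> r.color or r.color is null
    order by l.grp, l.mon
    """
).fetchall()

for run in runs:
    print(run)
```

Each group yields one row per run of a color, so group D comes back as Red (201502), Blue (201504), Red (201506) without the number of runs having to be fixed in advance as columns.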
EDIT: It’s actually easier to do what you REALLY wanted. Just remove the GROUP BY clause (and the interrupting MAX() functions).
EDIT2: As noted by Me.Name, the old solution would fail across year boundaries. Corrected the code fragment and the fiddle.
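The pre-edit code is not shown here, so the following only illustrates an assumed cause of a failure across years: treating a YYYYMM integer as if adding 1 always gives the next month. The helper `next_month` is hypothetical and is not part of the answer; joining on ROW_NUMBER, as the code above does, sidesteps the issue entirely.

```python
def next_month(yyyymm: int) -> int:
    """Return the month after a YYYYMM integer, rolling over in December."""
    year, month = divmod(yyyymm, 100)
    if month == 12:
        return (year + 1) * 100 + 1
    return yyyymm + 1


# Plain integer arithmetic breaks at the year boundary:
naive = 201512 + 1            # not a valid month
correct = next_month(201512)  # January of the following year

print(naive, correct)
```

A join condition of the form `l.mon = r.mon - 1` would therefore never pair December with the following January, whereas consecutive row numbers always pair adjacent rows.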