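I have been reading the following Microsoft article on recursive queries using CTEs,
and I can't work out how to use it to group common items.
I have a table with the following columns: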
> ID
> FirstName
> LastName
> DateOfBirth
> BirthCountry
> GroupID
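What I need to do is start with the first person in the table and walk through the table, finding everyone who has either the same (LastName and BirthCountry) or the same (DateOfBirth and BirthCountry).
The tricky part is that I have to assign them all the same GroupID, and then, for every person in that GroupID, check whether anyone else shares the same information with them and put those people in the same GroupID too.
I think I could do this with multiple cursors, but it gets tricky.
Here is the sample data and the expected output.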
ID FirstName LastName DateOfBirth BirthCountry GroupID
----------- ---------- ---------- ----------- ------------ -----------
1 Jonh Doe 1983-01-01 Grand 100
2 Jack Stone 1976-06-08 Grand 100
3 Jane Doe 1982-02-08 Grand 100
4 Adam Wayne 1983-01-01 Grand 100
5 Kay Wayne 1976-06-08 Grand 100
6 Matt Knox 1983-01-01 Hay 101
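> John Doe and Jane Doe are in the same group (100) because they have the same (LastName and BirthCountry).
> Adam Wayne is in group (100) because he has the same (DateOfBirth and BirthCountry) as John Doe.
> Kay Wayne is in group (100) because she has the same (LastName and BirthCountry) as Adam Wayne, who is already in group (100).
> Matt Knox is in a new group (101) because he does not match anyone in the previous groups.
> Jack Stone is in group (100) because he has the same (DateOfBirth and BirthCountry) as Kay Wayne, who is already in group (100).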
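The rule described above is transitive, so the expected groups are the connected components of a graph in which two people are linked when they share (LastName and BirthCountry) or (DateOfBirth and BirthCountry). As a cross-check on the expected output, and not as a replacement for a SQL solution, here is a minimal Python sketch using a union-find structure. The function name `assign_groups` and the `first_group_id` parameter are made up for this illustration, and it assumes ID is unique.

```python
from collections import defaultdict

# Sample rows from the question: (ID, FirstName, LastName, DateOfBirth, BirthCountry)
rows = [
    (1, "Jonh", "Doe", "1983-01-01", "Grand"),
    (2, "Jack", "Stone", "1976-06-08", "Grand"),
    (3, "Jane", "Doe", "1982-02-08", "Grand"),
    (4, "Adam", "Wayne", "1983-01-01", "Grand"),
    (5, "Kay", "Wayne", "1976-06-08", "Grand"),
    (6, "Matt", "Knox", "1983-01-01", "Hay"),
]


def assign_groups(rows, first_group_id=100):
    """Hypothetical helper: map each ID to a GroupID (assumes unique IDs)."""
    parent = {row[0]: row[0] for row in rows}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the root of the merged component.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Everyone sharing one of the two matching keys belongs together.
    buckets = defaultdict(list)
    for person_id, _first, last, dob, country in rows:
        buckets[("name", last, country)].append(person_id)
        buckets[("dob", dob, country)].append(person_id)
    for ids in buckets.values():
        for other in ids[1:]:
            union(ids[0], other)

    # Number the components in order of their lowest ID.
    roots = sorted({find(person_id) for person_id in parent})
    group_of_root = {root: first_group_id + i for i, root in enumerate(roots)}
    return {person_id: group_of_root[find(person_id)] for person_id in parent}


groups = assign_groups(rows)
for person_id in sorted(groups):
    print(person_id, groups[person_id])
```

With the sample rows, IDs 1 through 5 end up in one component and ID 6 in another, which matches the GroupID column in the table above.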
Data script:
CREATE TABLE #Tbl(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO #Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL);
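Best answer: This is what I came up with. I rarely write recursive queries, so this was good practice for me. By the way, Kay and Adam do not share a birth country in your sample data.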
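-- T is the source table (assumed to correspond to #Tbl in the question's script).
with data as (
    -- One row per distinct (LastName, DateOfBirth, BirthCountry), numbered arbitrarily.
    select
        LastName, DateOfBirth, BirthCountry,
        row_number() over (order by LastName, DateOfBirth, BirthCountry) as grpNum
    from T
    group by LastName, DateOfBirth, BirthCountry
), r as (
    -- Anchor: every combination starts its own chain; equ lists the numbers visited so far.
    select
        d.LastName, d.DateOfBirth, d.BirthCountry, d.grpNum,
        cast('|' + cast(d.grpNum as varchar(8)) + '|' as varchar(1024)) as equ
    from data as d
    union all
    -- Recursive step: extend a chain to a matching combination it has not visited yet,
    -- keeping the grpNum of the combination the chain started from.
    select
        d.LastName, d.DateOfBirth, d.BirthCountry, r.grpNum,
        cast(r.equ + cast(d.grpNum as varchar(8)) + '|' as varchar(1024))
    from r inner join data as d
        on d.grpNum > r.grpNum
        and charindex('|' + cast(d.grpNum as varchar(8)) + '|', r.equ) = 0
        and (d.LastName = r.LastName or d.DateOfBirth = r.DateOfBirth)
        and d.BirthCountry = r.BirthCountry
), g as (
    -- The smallest starting number that reaches a combination identifies its block.
    select LastName, DateOfBirth, BirthCountry, min(grpNum) as grpNum
    from r
    group by LastName, DateOfBirth, BirthCountry
)
-- dense_rank() starts at 1, so "+ 100" yields 101, 102, ...;
-- use "+ 99" to start at 100 as in the question's sample output.
select t.*, dense_rank() over (order by g.grpNum) + 100 as GroupID
from T as t
inner join g
    on g.LastName = t.LastName
    and g.DateOfBirth = t.DateOfBirth
    and g.BirthCountry = t.BirthCountry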
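For the recursion to terminate, the equivalences have to be tracked (via string concatenation) so that at each level only newly discovered equivalences (or connections, transitive links, etc.) are considered. Note that I avoid the word "group" so that it does not bleed into the GROUP BY concept.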
http://rextester.com/edit/TVRVZ10193
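To make the termination argument concrete, here is a rough Python sketch that mimics the recursive CTE level by level. It is a simulation for illustration only, not the query itself. The names `expand` and `smallest_chain_start` are invented here, and the numbering assumes that Python's tuple sort gives the same order as `row_number()` would for this sample, which depends on the database collation.

```python
# Distinct (LastName, DateOfBirth, BirthCountry) combinations from the sample data.
people = [
    ("Doe", "1983-01-01", "Grand"),
    ("Stone", "1976-06-08", "Grand"),
    ("Doe", "1982-02-08", "Grand"),
    ("Wayne", "1983-01-01", "Grand"),
    ("Wayne", "1976-06-08", "Grand"),
    ("Knox", "1983-01-01", "Hay"),
]

# Like the "data" CTE: number each distinct combination starting at 1.
data = [
    (last, dob, country, num)
    for num, (last, dob, country) in enumerate(sorted(set(people)), start=1)
]


def expand(data):
    """Hypothetical stand-in for the recursive CTE "r"."""
    # Anchor member: each combination starts a chain with equ = "|n|".
    level = [(last, dob, country, num, f"|{num}|") for last, dob, country, num in data]
    all_rows = []
    while level:  # the recursion ends when a level produces no new rows
        all_rows.extend(level)
        next_level = []
        for r_last, r_dob, r_country, r_num, equ in level:
            for d_last, d_dob, d_country, d_num in data:
                if (
                    d_num > r_num
                    and f"|{d_num}|" not in equ  # skip numbers already in this chain
                    and (d_last == r_last or d_dob == r_dob)
                    and d_country == r_country
                ):
                    # Take the new combination's columns but keep the chain's start number.
                    next_level.append((d_last, d_dob, d_country, r_num, equ + f"{d_num}|"))
        level = next_level
    return all_rows


def smallest_chain_start(all_rows):
    """Like the "g" CTE: the smallest start number that reaches each combination."""
    best = {}
    for last, dob, country, num, _equ in all_rows:
        key = (last, dob, country)
        best[key] = min(num, best.get(key, num))
    return best


all_rows = expand(data)
group_start = smallest_chain_start(all_rows)
for key in sorted(group_start):
    print(key, group_start[key])
```

Because a chain never revisits a number that is already in its `equ` string, every chain is at most as long as the number of distinct combinations, so the loop has to stop. The trade-off is that every distinct path is enumerated, which may become expensive for large, densely connected data.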
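Edit: I used nearly arbitrary numbers for the equivalences, but if you want them to appear in a sequence based on the lowest ID in each block, that is easy to do. Instead of row_number(), use min(ID) as grpNum, assuming ID is unique.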