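This code is mainly based on the code at the URL below. To make it easier to print progress information in MATLAB, I merged everything into a single file. The original's overall logic is very clear, and I strongly recommend reading it.
1. First, the pseudocode for the core idea of DBSCAN. The pseudocode below is taken from Wikipedia (https://en.wikipedia.org/wiki/DBSCAN):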
DBSCAN(D, eps, MinPts)
    C = 0
    for each unvisited point P in dataset D
        mark P as visited
        N = getNeighbors(P, eps)
        if sizeof(N) < MinPts
            mark P as NOISE
        else
            C = next cluster
            expandCluster(P, N, C, eps, MinPts)

expandCluster(P, N, C, eps, MinPts)
    add P to cluster C
    for each point P' in N
        if P' is not visited
            mark P' as visited
            N' = getNeighbors(P', eps)
            if sizeof(N') >= MinPts
                N = N joined with N'
        if P' is not yet member of any cluster
            add P' to cluster C
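The pseudocode above can be sketched as a direct Python translation. This is a minimal sketch, not the author's code: the function names `get_neighbors`, `expand_cluster` and `dbscan`, and the choice of `-1` as the NOISE marker, are hypothetical. It assumes points are tuples of numbers and uses Euclidean distance (`math.dist`, Python 3.8+). "N joined with N'" is implemented as appending the not-yet-queued neighbors, and the loop over N uses an index because N grows while it is being walked.

```python
import math

NOISE = -1  # hypothetical noise marker; cluster labels start at 1


def get_neighbors(data, p, eps):
    """Indices of all points within distance eps of point p (p itself included)."""
    return [q for q in range(len(data)) if math.dist(data[p], data[q]) <= eps]


def expand_cluster(data, p, neighbors, c, eps, min_pts, visited, labels):
    labels[p] = c  # add P to cluster C
    i = 0
    while i < len(neighbors):  # N may grow while we walk it
        q = neighbors[i]
        if q not in visited:
            visited.add(q)
            q_neighbors = get_neighbors(data, q, eps)
            if len(q_neighbors) >= min_pts:
                # q is a core point: N = N joined with N'
                neighbors.extend([n for n in q_neighbors if n not in neighbors])
        # "not yet member of any cluster" covers unlabeled points and earlier NOISE
        if labels.get(q, NOISE) == NOISE:
            labels[q] = c
        i += 1


def dbscan(data, eps, min_pts):
    """Return (labels, number_of_clusters); labels[i] is NOISE or a cluster id >= 1."""
    visited, labels, c = set(), {}, 0
    for p in range(len(data)):
        if p in visited:
            continue
        visited.add(p)
        neighbors = get_neighbors(data, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE
        else:
            c += 1
            expand_cluster(data, p, neighbors, c, eps, min_pts, visited, labels)
    return [labels[p] for p in range(len(data))], c
```

A point first marked as NOISE can still join a cluster later as a border point, because the membership check treats NOISE as "not in any cluster", exactly as in the pseudocode.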
2. Illustrated explanation
For an illustrated walkthrough, I refer to an excellent answer on Stack Overflow.
3. The modified, consolidated code
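clc;
% 'data' is assumed to be already loaded in the workspace (one instance per row)
DataSet = data;
% Get the data size ('dims' avoids shadowing the built-in length function)
[instance, dims] = size(DataSet);
% Feature columns only: assumes the last column holds a class label
inputinfo = DataSet(1:instance, 1:dims-1);
fprintf(['Number of instances: ' num2str(instance) '\n']);
fprintf(['Number of columns: ' num2str(dims) '\n']);
fprintf('Data loaded...\n');
epsilon = 1.5;  % neighborhood radius (the author also noted 0.8)
MinPts = 14;    % minimum number of points in an epsilon-neighborhood (the author also noted 22)
C = 0;
fprintf(['Initial cluster label: ' num2str(C) '\n']);
n = size(inputinfo, 1);
IDX = zeros(n, 1);
fprintf('Created IDX to store the cluster label of each instance (0 = noise)...\n');
% pdist2 (Statistics and Machine Learning Toolbox) returns the n-by-n Euclidean distance matrix
Distance = pdist2(inputinfo, inputinfo);
fprintf('Created Distance to store the pairwise Euclidean distances between instances...\n');
visited = false(n, 1);
fprintf('Created visited to record whether each instance has been visited...\n');
isnoise = false(n, 1);
fprintf('Created isnoise to record whether each instance is noise...\n');
for i = 1:n
    if ~visited(i)
        visited(i) = true;
        Neighbors = find(Distance(i,:) <= epsilon);
        fprintf(['Instance ' num2str(i) ' has ' num2str(numel(Neighbors)) ' instances within distance ' num2str(epsilon) '\n']);
        if numel(Neighbors) < MinPts
            isnoise(i) = true;
            fprintf(['Instance ' num2str(i) ' is marked as noise...\n']);
        else
            C = C + 1;
            IDX(i) = C;
            k = 1;
            % while loop with an explicit index, because Neighbors grows inside the loop
            while true
                j = Neighbors(k);
                if ~visited(j)
                    visited(j) = true;
                    Neighbors2 = find(Distance(j,:) <= epsilon);
                    if numel(Neighbors2) >= MinPts
                        % j is a core point: append its neighborhood to the queue
                        Neighbors = [Neighbors Neighbors2];
                    end
                end
                if IDX(j) == 0
                    IDX(j) = C;  % not yet in any cluster (may have been marked as noise earlier)
                end
                k = k + 1;
                if k > numel(Neighbors)
                    break;
                end
            end
        end
    end
end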
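When the loop finishes, the data have been grouped into C clusters, and the cluster label of each instance is stored in IDX (instances left at 0 are noise). Clustering is complete!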
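For readers without MATLAB, here is a minimal Python sketch that mirrors the script above under stated assumptions: it is not the author's code, the function name `dbscan_matlab_style` is hypothetical, only the standard library is used, and Euclidean distance is assumed. It precomputes the full distance matrix (the role `pdist2` plays), walks the growing neighbor list with an explicit index `k` like the MATLAB `while` loop, and returns `IDX`, `C` and `isnoise` with the same conventions (cluster labels from 1, 0 for noise). The MATLAB variable names are kept on purpose so the two versions can be compared line by line.

```python
import math


def dbscan_matlab_style(X, epsilon, min_pts):
    """Mirror of the MATLAB script: returns (IDX, C, isnoise).

    IDX holds 1-based cluster labels with 0 for noise, as in the MATLAB code;
    the list positions themselves are ordinary 0-based Python indices.
    """
    n = len(X)
    # Full pairwise Euclidean distance matrix (what pdist2 computes in MATLAB)
    distance = [[math.dist(X[a], X[b]) for b in range(n)] for a in range(n)]
    IDX = [0] * n
    visited = [False] * n
    isnoise = [False] * n
    C = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = [j for j in range(n) if distance[i][j] <= epsilon]
        if len(neighbors) < min_pts:
            isnoise[i] = True
            continue
        C += 1
        IDX[i] = C
        k = 0  # explicit index, like k in the MATLAB while loop, because the list grows
        while k < len(neighbors):
            j = neighbors[k]
            if not visited[j]:
                visited[j] = True
                neighbors2 = [m for m in range(n) if distance[j][m] <= epsilon]
                if len(neighbors2) >= min_pts:
                    # duplicates are harmless: visited[] stops them from expanding again
                    neighbors += neighbors2
            if IDX[j] == 0:
                IDX[j] = C
            k += 1
    return IDX, C, isnoise
```

As in the MATLAB version, `isnoise` is never reset: a point first rejected as noise and later reached from a core point keeps `isnoise` set to True while `IDX` gives it the cluster label, so `IDX` is the result to read.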