Algorithm idea:
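Choose a measure of similarity (distance) between two samples, a threshold theta, and the maximum number of clusters q. The first sample forms the first cluster. Each subsequent sample is compared with the representatives of the clusters formed so far. If its distance to the nearest representative exceeds the threshold and fewer than q clusters exist, it starts a new cluster. Otherwise it is assigned to the nearest cluster. The distance to a cluster is the distance to that cluster's mean m, which is updated every time a sample joins. In the MBSAS variant implemented below, these two decisions are made in separate passes: the first pass only opens new clusters, and the second pass assigns every remaining sample to its nearest cluster and updates that cluster's mean: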
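$$m_{C_k}^{\text{new}} = \frac{\left(n_{C_k}^{\text{new}} - 1\right) m_{C_k}^{\text{old}} + x}{n_{C_k}^{\text{new}}}$$
where $x$ is the sample being added and $n_{C_k}^{\text{new}}$ is the size of cluster $C_k$ after $x$ joins it.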
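To check the incremental update rule, here is a minimal Python sketch. The helper name `update_mean` is hypothetical and not part of the original code; it applies the rule one sample at a time and shows that the result equals the ordinary mean of all samples.

```python
def update_mean(m_old, n_new, x):
    """Incremental mean: m_new = ((n_new - 1) * m_old + x) / n_new.

    m_old: current cluster mean (list of floats)
    n_new: cluster size *after* x has been added
    x:     the new sample (list of floats)
    """
    return [((n_new - 1) * mi + xi) / n_new for mi, xi in zip(m_old, x)]


# Feeding the samples in one at a time reproduces the batch mean.
samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
m = samples[0]
for n, x in enumerate(samples[1:], start=2):
    m = update_mean(m, n, x)
print(m)  # [3.0, 2.0]
```

Because only the old mean and the new count are needed, the cluster's previous members never have to be stored.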
function [bel,m]=MBSAS(X,threshold,q,order)
%Input
% :each column of X is one sample (X is l x N)
% :threshold is the distance above which a sample may start a new cluster
% :q is the maximum number of clusters
% :order represents the order of presentation of the vectors of X
%Output:
% :bel is the cluster label of each sample (in presentation order)
% :m contains the cluster representatives (means), one per column
%---------------------------------Ordering the data------------------------
[l,N]=size(X);
if(length(order)==N)
X=X(:,order); % reorder all columns at once; bel will follow this order
end
%--------------------------------Cluster determining phase-----------------
n_clust=1;
[l,N]=size(X);
bel=zeros(1,N);
bel(1)=n_clust;
m=X(:,1);
for i=2:N
[m1,m2]=size(m);
%Determine the closest cluster representative (Euclidean distance);
%summing along dim 1 keeps this correct for one-dimensional data (l=1)
[s1,s2]=min(sqrt(sum((m-X(:,i)*ones(1,m2)).^2,1)));
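%Open a new cluster only if the sample is farther than threshold from every
%representative and fewer than q clusters exist; otherwise bel(i) stays 0
if (s1>threshold)&&(n_clust<q)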
n_clust=n_clust+1;
bel(i)=n_clust;
m=[m X(:,i)];
end
end
[m1,m2]=size(m);%m2 is the number of clusters
%----------------------------Pattern classification phase-------------------
for i=1:N
if(bel(i)==0)
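%Assign the sample to its closest cluster
[s1,s2]=min(sqrt(sum((m-X(:,i)*ones(1,m2)).^2,1)));
bel(i)=s2;
%Incremental mean update; sum(bel==s2) already counts X(:,i)
m(:,s2)=((sum(bel==s2)-1)*m(:,s2)+X(:,i))/sum(bel==s2);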
end
end
end
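For readers without MATLAB, the same two-pass procedure can be sketched in Python. This is a hypothetical port, not the original code: the function name `mbsas`, the row-per-sample list layout, and 0-based `order` indices are my own choices. Like the MATLAB version, it uses Euclidean distance, opens clusters only in the first pass, and updates means only in the second pass. Ties go to the lowest cluster index, as MATLAB's `min` does.

```python
import math


def mbsas(X, theta, q, order=None):
    """Two-pass MBSAS sketch mirroring the MATLAB function.

    X:     list of samples, each a list of floats (one sample per entry,
           unlike the column-per-sample MATLAB layout)
    theta: distance threshold for opening a new cluster
    q:     maximum number of clusters
    order: optional presentation order (0-based indices into X)
    Returns (bel, m): 1-based labels in presentation order, cluster means.
    """
    if order is not None and len(order) == len(X):
        X = [X[i] for i in order]
    N = len(X)
    bel = [0] * N          # 0 means "not assigned yet"
    bel[0] = 1
    m = [list(X[0])]       # cluster representatives (means)

    def nearest(x):
        # (distance, 0-based index) of the closest representative;
        # tuple comparison breaks ties by the smaller index
        return min((math.dist(c, x), k) for k, c in enumerate(m))

    # Phase 1: decide which samples open new clusters.
    for i in range(1, N):
        d, _ = nearest(X[i])
        if d > theta and len(m) < q:
            m.append(list(X[i]))
            bel[i] = len(m)

    # Phase 2: assign the remaining samples and update the means.
    for i in range(N):
        if bel[i] == 0:
            _, k = nearest(X[i])
            bel[i] = k + 1
            n = bel.count(k + 1)  # cluster size including X[i]
            m[k] = [((n - 1) * mk + xk) / n for mk, xk in zip(m[k], X[i])]
    return bel, m


X = [[0.0], [0.2], [5.0], [5.4], [0.4]]
bel, m = mbsas(X, theta=1.0, q=3)
print(bel)  # [1, 1, 2, 2, 1]
```

Calling `mbsas(X, theta=1.0, q=3, order=[2, 0, 1, 3, 4])` on the same data returns labels `[1, 2, 2, 1, 2]`: the labels are reported in presentation order, and the cluster numbering itself changes with the order.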
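Drawbacks:
The clustering result depends on the order in which the samples are presented, and it is very sensitive to the choice of the threshold.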
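Both drawbacks are easy to reproduce. The Python sketch below (the helper `count_clusters` is hypothetical) runs only the cluster-determining pass, which is where MBSAS decides how many clusters to open, on five evenly spaced one-dimensional points.

```python
import math


def count_clusters(X, theta, q):
    """Phase 1 of MBSAS only: number of clusters opened for this order."""
    reps = [X[0]]
    for x in X[1:]:
        d = min(math.dist(r, x) for r in reps)  # distance to closest rep
        if d > theta and len(reps) < q:
            reps.append(x)
    return len(reps)


points = [[0.0], [1.0], [2.0], [3.0], [4.0]]
reordered = [points[i] for i in (1, 3, 0, 2, 4)]

print(count_clusters(points, theta=1.5, q=10))     # 3
print(count_clusters(reordered, theta=1.5, q=10))  # 2
print(count_clusters(points, theta=0.5, q=10))     # 5
print(count_clusters(points, theta=5.0, q=10))     # 1
```

Presenting the same points in a different order changes the number of clusters from 3 to 2, and moving theta from 0.5 to 5.0 changes it from 5 to 1.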