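I have recently been reading about naive Bayes, and along the way I ended up at the EM clustering algorithm (K-means can be seen as a hard-assignment special case of it).
So I tried writing a program myself: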
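% EM algorithm: cluster 2-D points drawn from two Gaussians
clc;
clear;
sigma = 1.5;   % common standard deviation of both clusters
miu1 = 3;      % true mean of cluster 1 (both coordinates)
miu2 = 7;      % true mean of cluster 2 (both coordinates)
N = 1000;      % number of samples
x = zeros(1,N);   % x is the coordinate the clustering is run on
y = zeros(1,N);   % y is only used for plotting; preallocate it as well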
for i = 1:N
if rand>0.5
x(1,i) = randn*sigma + miu1;
y(1,i) = randn*sigma + miu1;
else
%sigma = 0.5;
x(1,i) = randn*sigma + miu2;
y(1,i) = randn*sigma + miu2;
end
end
plot(x,y,'o');
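k = 2;   % number of clusters
%miu = rand(1,k)*40;
miu(1) = 4;   % initial guesses for the two means
miu(2) = 6;
cov(1) = 2;   % initial standard deviations (note: this name shadows the built-in cov)
cov(2) = 2;
%cov = rand(1,k)*6;
a(1) = 0.5;   % mixing weights; they must sum to 1
a(2) = 0.5;
% expectations = zeros(N,k);
num = [0,0];
n = 1;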
for step = 1:10000
n = 1;
m = 1;
x1 = [];
y1 = [];
x2 = [];
y2 = [];
num = [1 1];
for i = 1:N
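% Gaussian densities; cov(k) holds a standard deviation, so it belongs in the denominator
p1 = exp(-(x(i)-miu(1))^2/(2*cov(1)^2))/(sqrt(2*pi)*cov(1));
p2 = exp(-(x(i)-miu(2))^2/(2*cov(2)^2))/(sqrt(2*pi)*cov(2));
p(i) = a(1)*p1+a(2)*p2;   % mixture density at x(i)
if a(1)*p1>a(2)*p2   % hard assignment to the component with the larger weighted density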
x1(n) = x(i);
y1(n) = y(i);
n = n+1;
num(1) = num(1) + 1;
else
x2(m) = x(i);
y2(m) = y(i);
m = m+1;
num(2) = num(2) + 1;
end
end
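oldmiu = miu;
oldcov = cov;
num = [numel(x1), numel(x2)];   % true cluster sizes (the counters above start at 1 and overcount)
if any(num == 0)   % an empty cluster cannot be re-estimated
break;
end
miu(1) = sum(x1)/num(1);
miu(2) = sum(x2)/num(2);
cov(1) = sqrt(sum((x1-miu(1)).^2)/num(1));   % standard deviation of cluster 1
cov(2) = sqrt(sum((x2-miu(2)).^2)/num(2));   % standard deviation of cluster 2
a(1) = num(1)/N;
a(2) = num(2)/N;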
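plot(x1,y1,'ro',x2,y2,'go');
drawnow;   % refresh the figure so each iteration is visible
epsilon = 0.0001;   % convergence tolerance on the means
if sum(abs(oldmiu-miu))<epsilon
break;
end
step   % print the iteration number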
% miu
end
plot(x1,y1,'ro',x2,y2,'go');
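The plot produced by the run is shown below:
I am not sure whether my code is wrong or something else is going on (most likely my code is wrong): the initial parameters cannot be too far from the true values, and if they are,
the final result is completely wrong. I cannot tell whether this is a weakness of the algorithm itself or whether I have misunderstood the algorithm.
I hope someone more experienced can give me some pointers.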
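For comparison, the loop above assigns every point entirely to one cluster, which is closer to K-means than to textbook EM. Below is a minimal sketch of the soft-assignment version, where each point contributes to both clusters in proportion to its posterior probability. It works on a single coordinate only, starts from a deliberately poor guess, and all variable names (data, mu, sd, w, r1, r2, loglik) are illustrative assumptions rather than anything from the script above. Treat it as a sketch under those assumptions, not as a diagnosis of the original program.

```matlab
% Soft-assignment EM for a two-component 1-D Gaussian mixture (illustrative sketch).
N = 1000;
z = rand(1,N) > 0.5;                  % hidden component labels
data = randn(1,N)*1.5 + 3 + 4*z;      % true means 3 and 7, standard deviation 1.5

mu = [0 1];                           % deliberately poor initial means
sd = [2 2];                           % initial standard deviations
w  = [0.5 0.5];                       % mixing weights, sum to 1
maxIter = 500;
tol = 1e-6;
loglik = zeros(1,maxIter);

for it = 1:maxIter
    % E-step: weighted densities and responsibilities
    d1 = w(1) * exp(-(data-mu(1)).^2 / (2*sd(1)^2)) / (sqrt(2*pi)*sd(1));
    d2 = w(2) * exp(-(data-mu(2)).^2 / (2*sd(2)^2)) / (sqrt(2*pi)*sd(2));
    loglik(it) = sum(log(d1 + d2));   % log-likelihood under the current parameters
    r1 = d1 ./ (d1 + d2);             % P(component 1 | data point)
    r2 = 1 - r1;                      % P(component 2 | data point)

    % M-step: responsibility-weighted means, standard deviations and weights
    n1 = sum(r1);
    n2 = sum(r2);
    mu = [sum(r1.*data)/n1, sum(r2.*data)/n2];
    sd = [sqrt(sum(r1.*(data-mu(1)).^2)/n1), sqrt(sum(r2.*(data-mu(2)).^2)/n2)];
    w  = [n1 n2] / N;

    % stop once the log-likelihood has stopped improving
    if it > 1 && loglik(it) - loglik(it-1) < tol
        break;
    end
end
loglik = loglik(1:it);                % keep only the iterations that were run
fprintf('iterations: %d, means: %.2f %.2f\n', it, mu(1), mu(2));
```

The log-likelihood trace is a handy sanity check, because a correct EM update never decreases it. EM only guarantees a local optimum, so some sensitivity to the starting values is expected even when the implementation is right; trying several starting points and keeping the run with the highest final log-likelihood is a common workaround.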