Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing.
Overview
Mean shift is a procedure for locating the maxima of a density function given discrete data sampled from that function. It is useful for detecting the modes of this density. This is an iterative method, and we start with an initial estimate $x$. Let a kernel function $K(x_i - x)$ be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a Gaussian kernel on the distance to the current estimate is used, $K(x_i - x) = e^{-c\|x_i - x\|^2}$. The weighted mean of the density in the window determined by $K$ is

$$m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x)\, x_i}{\sum_{x_i \in N(x)} K(x_i - x)}$$

where $N(x)$ is the neighborhood of $x$, the set of points for which $K(x_i - x) \neq 0$.

The difference $m(x) - x$ is called the mean shift in Fukunaga and Hostetler. The mean-shift algorithm now sets $x \leftarrow m(x)$ and repeats the estimation until $m(x)$ converges.
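As an illustration, here is a minimal sketch of this iteration in Python, assuming NumPy; the Gaussian weights use $c = 1/(2h^2)$, and the bandwidth, tolerance, and toy data are illustrative choices, not part of the original description:

```python
import numpy as np

def mean_shift_step(x, points, h=1.0):
    """One update: the weighted mean m(x) of the samples, with
    Gaussian weights K(x_i - x) = exp(-||x_i - x||^2 / (2 h^2))."""
    dists2 = np.sum((points - x) ** 2, axis=1)
    weights = np.exp(-dists2 / (2 * h ** 2))
    return weights @ points / weights.sum()

def mean_shift_mode(x0, points, h=1.0, tol=1e-6, max_iter=500):
    """Iterate x <- m(x) until the shift falls below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        m = mean_shift_step(x, points, h)
        if np.linalg.norm(m - x) < tol:
            return m
        x = m
    return x

# Toy data: two Gaussian blobs; starting near the first converges to its mode.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
                 rng.normal(4.0, 0.5, (100, 2))])
print(mean_shift_mode([0.5, 0.5], pts, h=1.0))  # approx. [0, 0]
```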
Although the mean shift algorithm has been widely used in many applications, a rigorous proof for the convergence of the algorithm using a general kernel in a high-dimensional space is still missing. Aliyari Ghassabeh showed the convergence of the mean shift algorithm in one dimension with a differentiable, convex, and strictly decreasing profile function. However, the one-dimensional case has limited real-world applications. The convergence of the algorithm in higher dimensions with a finite number of (or isolated) stationary points has also been proven; however, a sufficient condition for a general kernel function to have finitely many (or isolated) stationary points has not been provided.
Details
Let data be a finite set $S$ embedded in the $n$-dimensional Euclidean space, $X$. Let $K$ be a flat kernel that is the characteristic function of the $\lambda$-ball in $X$,

$$K(x) = \begin{cases} 1 & \text{if } \|x\| \le \lambda \\ 0 & \text{if } \|x\| > \lambda \end{cases}$$
In each iteration of the algorithm, $s \leftarrow m(s)$ is performed for all $s \in S$ simultaneously. The first question, then, is how to estimate the density function given a sparse set of samples. One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel of width $h$,

$$f(x) = \sum_i K(x - x_i) = \sum_i k\!\left(\frac{\|x - x_i\|^2}{h^2}\right)$$

where $x_i$ are the input samples and $k(r)$ is the kernel function (or Parzen window). $h$ is the only parameter in the algorithm and is called the bandwidth. This approach is known as kernel density estimation or the Parzen window technique. Once we have computed $f(x)$ from the equation above, we can find its local maxima using gradient ascent or some other optimization technique. The problem with this "brute force" approach is that, for higher dimensions, it becomes computationally prohibitive to evaluate $f(x)$ over the complete search space. Instead, mean shift uses a variant of what is known in the optimization literature as multiple restart gradient descent. Starting at some guess for a local maximum, $y_k$, which can be a random input data point $x_1$, mean shift computes the gradient of the density estimate $f(x)$ at $y_k$ and takes an uphill step in that direction.
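To make the brute-force baseline concrete, here is a short sketch, assuming NumPy; the Gaussian profile, bandwidth, sample distribution, and grid resolution are all illustrative choices. It evaluates the kernel density estimate on a dense 1-D grid and reads off the mode; the cost of such grids is exactly what becomes prohibitive in higher dimensions:

```python
import numpy as np

def kde(x, samples, h=1.0):
    """f(x) = sum_i k(||x - x_i||^2 / h^2), with the unnormalized
    Gaussian profile k(r) = exp(-r / 2)."""
    r = np.sum((samples - x) ** 2, axis=1) / h ** 2
    return np.exp(-r / 2).sum()

rng = np.random.default_rng(1)
samples = rng.normal(2.0, 0.7, (200, 1))

# Evaluate f over a dense grid: feasible in 1-D, prohibitive as the
# dimension grows, which is why mean shift ascends from seed points instead.
grid = np.linspace(-1.0, 5.0, 601).reshape(-1, 1)
densities = np.array([kde(g, samples, h=0.5) for g in grid])
print("grid mode estimate:", grid[np.argmax(densities)][0])  # approx. 2.0
```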
Types of kernels
Kernel definition: Let $X$ be the $n$-dimensional Euclidean space, $\mathbb{R}^n$. Denote the $i$th component of $x$ by $x_i$. The norm of $x$ is a non-negative number such that $\|x\|^2 = x^T x$. A function $K: X \rightarrow \mathbb{R}$ is said to be a kernel if there exists a profile $k: [0, \infty] \rightarrow \mathbb{R}$ such that

$$K(x) = k(\|x\|^2)$$

and

$k$ is non-negative.

$k$ is nonincreasing: $k(a) \ge k(b)$ if $a < b$.

$k$ is piecewise continuous and $\int_0^\infty k(r)\, dr < \infty$.
The two frequently used kernel profiles for mean shift are:

Flat kernel

$$k(x) = \begin{cases} 1 & \text{if } x \le \lambda \\ 0 & \text{if } x > \lambda \end{cases}$$

Gaussian kernel

$$k(x) = e^{-\frac{\|x\|^2}{2 \sigma^2}},$$

where the standard deviation parameter $\sigma$ works as the bandwidth parameter, $h$.
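Both profiles can be written directly as functions of the squared-distance argument, matching the definition $K(x) = k(\|x\|^2)$ above. A small sketch, assuming NumPy; the parameter names `lam` and `sigma` and their default values are placeholders:

```python
import numpy as np

def flat_profile(r, lam=1.0):
    # k(r) = 1 if r <= lambda, 0 otherwise (r is the squared distance)
    return np.where(r <= lam, 1.0, 0.0)

def gaussian_profile(r, sigma=1.0):
    # k(r) = exp(-r / (2 sigma^2)); sigma plays the role of the bandwidth h
    return np.exp(-r / (2 * sigma ** 2))

def kernel(x, profile, **params):
    # K(x) = k(||x||^2), for one or more points x
    x = np.atleast_2d(x)
    return profile(np.sum(x ** 2, axis=1), **params)

print(kernel([0.5, 0.5], flat_profile, lam=1.0))       # [1.0]
print(kernel([0.5, 0.5], gaussian_profile, sigma=1.0))  # approx. [0.78]
```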
Applications
Clustering
Consider a set of points in two-dimensional space. Assume a circular window centered at C and having radius r as the kernel. Mean shift is a hill-climbing algorithm that involves shifting this kernel iteratively to a higher-density region until convergence. Every shift is defined by a mean shift vector, which always points in the direction of the maximum increase in the density. At every iteration the kernel is shifted to the centroid, or mean, of the points within it; the method of calculating this mean depends on the choice of the kernel. If a Gaussian kernel is chosen instead of a flat kernel, every point is first assigned a weight that decays exponentially as its distance from the kernel's center increases. At convergence, there is no direction in which a shift can accommodate more points inside the kernel.
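As a usage sketch, scikit-learn ships a mean shift clustering implementation; this example assumes scikit-learn and NumPy are available, and the synthetic blob data and quantile value are illustrative:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Three synthetic 2-D blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.6, (150, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Heuristic bandwidth from pairwise distances; quantile sets the window size
bandwidth = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bandwidth).fit(X)

print("modes:", ms.cluster_centers_)            # near the three blob centers
print("clusters:", len(np.unique(ms.labels_)))  # expected: 3
```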
Tracking
The mean shift algorithm can be used for visual tracking. The simplest such algorithm would create a confidence map in the new image based on the color histogram of the object in the previous image, and use mean shift to find the peak of a confidence map near the object’s old position. The confidence map is a probability density function on the new image, assigning each pixel of the new image a probability, which is the probability of the pixel color occurring in the object in the previous image. A few algorithms, such as ensemble tracking,[7] CAMshift,[8][9] expand on this idea.
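A sketch of this scheme using OpenCV's built-in mean shift tracker; it assumes opencv-python is installed, and the video path and initial bounding box are hypothetical placeholders:

```python
import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder video source
ok, frame = cap.read()
x, y, w, h = 300, 200, 80, 80        # hypothetical initial object box
track_window = (x, y, w, h)

# Hue histogram of the object in the first frame; back-projecting it onto
# later frames yields the per-pixel confidence map described above.
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves by less than 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # Mean shift moves the window to the confidence peak near its old position
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
```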
Smoothing
Let $x_i$ and $z_i$, $i = 1, \ldots, n$, be the $d$-dimensional input and filtered image pixels in the joint spatial-range domain. For each pixel:

1. Initialize $j = 1$ and $y_{i,1} = x_i$.
2. Compute $y_{i,j+1}$ according to $m(\cdot)$ until convergence, $y = y_{i,c}$.
3. Assign $z_i = (x_i^s, y_{i,c}^r)$. The superscripts $s$ and $r$ denote the spatial and range components of a vector, respectively. The assignment specifies that the filtered data at the spatial location $x_i^s$ will have the range component of the point of convergence, $y_{i,c}^r$.
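Below is a minimal and deliberately unoptimized sketch of this filtering procedure (quadratic in the number of pixels, so illustrative only). It assumes NumPy and a Gaussian kernel; the bandwidths `hs` and `hr` and the stopping tolerance are illustrative:

```python
import numpy as np

def mean_shift_filter(image, hs=8.0, hr=16.0, max_iter=20, tol=1e-2):
    """Run mean shift for every pixel in the joint spatial-range space,
    then keep the converged range component: z_i = (x_i^s, y_{i,c}^r)."""
    rows, cols = image.shape[:2]
    ys, xs = np.mgrid[0:rows, 0:cols]
    # Joint vectors (row, col, range...), each axis scaled by its bandwidth
    X = np.column_stack([ys.ravel() / hs, xs.ravel() / hs,
                         image.reshape(rows * cols, -1) / hr])
    out = np.empty_like(X)
    for i in range(len(X)):
        y = X[i].copy()
        for _ in range(max_iter):  # y_{i,j+1} = m(y_{i,j}) until convergence
            w = np.exp(-np.sum((X - y) ** 2, axis=1) / 2)
            y_new = w @ X / w.sum()
            if np.linalg.norm(y_new - y) < tol:
                break
            y = y_new
        out[i] = y
    # Spatial part stays at x_i^s; the range part comes from y_{i,c}
    return (out[:, 2:] * hr).reshape(image.shape)
```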
Strengths
Mean shift is an application-independent tool suitable for real data analysis.
Does not assume any predefined shape on data clusters.
It is capable of handling arbitrary feature spaces.
The procedure relies on choice of a single parameter: bandwidth.
The bandwidth/window size $h$ has a physical meaning, unlike the $k$ in $k$-means.
Weaknesses
The selection of a window size is not trivial.
Inappropriate window size can cause modes to be merged, or generate additional “shallow” modes.
Often requires an adaptive window size.
The mean shift algorithm generally refers to an iterative procedure: first compute the mean shift of the current point, move the point to its shifted mean, then take that as the new starting point and continue moving until a termination condition is met.
Mean shift derivation
Given $n$ sample points $x_i$, $i = 1, \ldots, n$, in the $d$-dimensional space $\mathbb{R}^d$, and an arbitrarily chosen point $x$ in that space, the basic form of the mean shift vector is defined as:

$$M_h(x) = \frac{1}{k} \sum_{x_i \in S_h} (x_i - x)$$

where $S_h$ is a high-dimensional spherical region of radius $h$, i.e., the set of points $y$ satisfying

$$S_h(x) = \{\, y : (y - x)^T (y - x) < h^2 \,\}$$

and $k$ denotes the number of the $n$ sample points $x_i$ that fall inside the region $S_h$.
Next, take the endpoint of the mean shift vector as the center of a new high-dimensional sphere and repeat the step above, obtaining a new mean shift vector each time. Iterated in this way, the mean shift algorithm converges to the place of maximum probability density, that is, the densest region. (The original post illustrates the iterations and the final result with figures.)
Mean shift derivation with a kernel: add a kernel function to the basic mean shift vector (the properties of kernel functions are introduced in this blog post: http://www.cnblogs.com/liqizhou/archive/2012/05/11/2495788.html). The mean shift density estimate then takes the form

$$f_{h,K}(x) = \frac{C_{k,d}}{n h^d} \sum_{i=1}^{n} k\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \tag{1}$$

To explain: $K(\cdot)$ is the kernel, $h$ is the radius, and $C_{k,d}/(n h^d)$ is the normalizing (unit density) constant. To make $f$ above as large as possible, the most natural idea is to differentiate it, and differentiating this expression is indeed what mean shift does:
$$\nabla f_{h,K}(x) = \frac{2 C_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} (x - x_i)\, k'\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \tag{2}$$

Let $g(x) = -k'(x)$. $K(x)$ is then called the shadow kernel of $g(x)$; the name sounds profound, but it simply refers to the negative direction of the derivative. The expression above can then be written as

$$\nabla f_{h,K}(x) = \frac{2 C_{k,d}}{n h^{d+2}} \left[ \sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \right] \left[ \frac{\sum_{i=1}^{n} x_i\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x \right]$$
For this expression, if a Gaussian kernel is adopted, the first bracketed factor is, up to constants, the density estimate $f_{h,G}(x)$ built from the profile $g$, and the second factor is exactly a mean shift vector:

$$m_h(x) = \frac{\sum_{i=1}^{n} x_i\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x$$

So (2) can be expressed as

$$\nabla f_{h,K}(x) \propto f_{h,G}(x)\, m_h(x).$$

(The original post includes a figure that lays out this decomposition clearly.)
To make $\nabla f_{h,K}(x) = 0$, it is necessary and sufficient that $m_h(x) = 0$, which yields the new center coordinate:

$$x \leftarrow \frac{\sum_{i=1}^{n} x_i\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} \tag{3}$$
The mean shift procedure has now been described, if somewhat loosely; its concrete algorithm flow is given below.
1. Choose a point $x$ in the space as the center and construct a high-dimensional ball of radius $h$; collect all sample points $x_i$ that fall inside the ball.
2. Compute the mean shift vector $M_h(x)$. If $\|M_h(x)\| < \varepsilon$ (a manually set threshold), exit the program; if $\|M_h(x)\| \ge \varepsilon$, compute the new $x$ using (3) and return to step 1 (see the sketch after this list).
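A minimal sketch of this two-step loop, assuming NumPy and a flat kernel so that the update (3) reduces to the mean of the points inside the ball; the bandwidth and threshold values are illustrative:

```python
import numpy as np

def mean_shift_ball(x, points, h=1.0, eps=1e-5, max_iter=500):
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        # Step 1: collect the sample points inside the radius-h ball S_h(x)
        in_ball = np.sum((points - x) ** 2, axis=1) < h ** 2
        if not in_ball.any():
            break
        # Step 2: mean shift vector M_h(x); stop once it is shorter than eps
        shift = points[in_ball].mean(axis=0) - x
        if np.linalg.norm(shift) < eps:
            break
        x = x + shift  # update (3): move the center to the local mean
    return x
```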
Mean shift clustering on images

Truly great researchers create algorithms, such as mean shift and EM; innovation of this kind drives the whole discipline forward. Others apply the algorithms to practical uses, driving industrial, that is, technological, progress. The following describes how the mean shift algorithm is applied to clustering and tracking on images.

An image is generally a matrix whose pixels are uniformly distributed over it, so there is no inherent density of points. How to define a probability density for the points is therefore the key question.
To compute the probability density at a point $x$, one approach is as follows: take $x$ as the center and $h$ as the radius, and for the pixels $x_i$ falling inside the ball define two rules:

(1) The closer the color of pixel $x_i$ is to the color of pixel $x$, the higher the probability density.

(2) The closer the position of pixel $x_i$ is to $x$, the higher the probability density.

The total probability density is then defined as the product of the densities given by the two rules, which can be expressed as (4):
$$K_{h_s, h_r}(x) = \frac{C}{h_s^2\, h_r^p}\, k\!\left( \left\| \frac{x^s}{h_s} \right\|^2 \right) k\!\left( \left\| \frac{x^r}{h_r} \right\|^2 \right) \tag{4}$$

where $x^s$ and $x^r$ are the spatial and range (color) parts of the joint vector, with corresponding bandwidths $h_s$ and $h_r$.
Here the first factor carries the spatial information (the closer a pixel is to the center, the larger its value) and the second carries the color information (the more similar the colors, the larger its value). In the original post, an input image (top left) has its probability density computed via (4) (top right), and applying mean shift clustering to it yields the segmented image (bottom left).
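A small sketch of the product rule in (4), assuming NumPy and Gaussian profiles for both factors; the function name `joint_weight` and the bandwidths `hs` (spatial) and `hr` (range/color) are hypothetical choices for illustration:

```python
import numpy as np

def joint_weight(pos, pos_i, color, color_i, hs=16.0, hr=0.2):
    # Rule (2): pixels closer in position to the center get higher weight
    spatial = np.exp(-np.sum((np.asarray(pos) - np.asarray(pos_i)) ** 2)
                     / (2 * hs ** 2))
    # Rule (1): pixels with more similar color get higher weight
    rng = np.exp(-np.sum((np.asarray(color) - np.asarray(color_i)) ** 2)
                 / (2 * hr ** 2))
    # Equation (4): the joint density weight is the product of the two
    return spatial * rng
```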