A Grand Tour of Clustering Algorithms

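Here is a paper worth reading: "Survey of Clustering Algorithms" by Rui Xu (Student Member, IEEE) and Donald Wunsch II (Fellow, IEEE). It is a survey that rounds up clustering algorithms from nearly every field in one place, which is great. It discusses clustering from many angles (hierarchical, partitional, large data sets, graph-based, text clustering, fuzzy clustering, and so on) and also covers the related problems (how to compute distances, how to determine the number of clusters, how to evaluate clustering results, and so on).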

 

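Abstract: Data analysis plays an indispensable role in understanding phenomena. Cluster analysis, a form of initial exploration that needs little or no prior knowledge, is studied across many research fields. This diversity creates two difficulties: on one hand there are many tools to master, and on the other the abundance of choices makes it easy to get confused. The authors therefore survey clustering algorithms for data sets arising in statistics, computer science, and machine learning, and illustrate their application to some benchmark data sets as well as results in newer areas such as the traveling salesman problem and bioinformatics. Several closely related topics, such as similarity measures and cluster validity, are also discussed.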

 

Below is the basic structure of Section II of the paper:

II. Clustering Algorithms
• A. Distance and Similarity Measures
(See also Table I)

• B. Hierarchical
— Agglomerative: single linkage, complete linkage, group average linkage, median linkage, centroid linkage, Ward’s method, balanced iterative reducing and clustering using hierarchies (BIRCH), clustering using representatives (CURE), robust clustering using links (ROCK)
— Divisive: divisive analysis (DIANA), monothetic analysis (MONA)

• C. Squared Error-Based (Vector Quantization)
— K-means, iterative self-organizing data analysis technique (ISODATA), genetic K-means algorithm (GKA), partitioning around medoids (PAM)

• D. pdf Estimation via Mixture Densities
— Gaussian mixture density decomposition (GMDD), AutoClass

• E. Graph Theory-Based
— Chameleon, Delaunay triangulation graph (DTG), highly connected subgraphs (HCS), clustering identification via connectivity kernels (CLICK), cluster affinity search technique (CAST)

• F. Combinatorial Search Techniques-Based
— Genetically guided algorithm (GGA), tabu search (TS) clustering, simulated annealing (SA) clustering

• G. Fuzzy
— Fuzzy c-means (FCM), mountain method (MM), possibilistic c-means clustering algorithm (PCM), fuzzy c-shells (FCS)

• H. Neural Networks-Based
— Learning vector quantization (LVQ), self-organizing feature map (SOFM), ART, simplified ART (SART), hyperellipsoidal clustering network (HEC), self-splitting competitive learning network (SPLL)

• I. Kernel-Based
— Kernel K-means, support vector clustering (SVC)

• J. Sequential Data
— Sequence similarity
— Indirect sequence clustering
— Statistical sequence clustering

• K. Large-Scale Data Sets (See also Table II)
— CLARA, CURE, CLARANS, BIRCH, DBSCAN, DENCLUE, WaveCluster, FC, ART

• L. Data Visualization and High-Dimensional Data
— PCA, ICA, projection pursuit, Isomap, LLE, CLIQUE, OptiGrid, ORCLUS

• M. How Many Clusters?
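
To make a few of the outline items concrete, here is a minimal sketch that touches items A, C, and M: a squared Euclidean distance, a plain Lloyd-style K-means, and a comparison of the within-cluster squared error for several values of K. This is not code from the paper. The function names (`squared_euclidean`, `mean_point`, `kmeans`), the optional `init` parameter, and the toy data are all assumptions made up for illustration, and comparing the squared error across K is only an informal heuristic, not necessarily one of the criteria the survey recommends.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Lloyd-style K-means sketch; returns (centroids, labels, sse).

    `init` is a hypothetical convenience parameter: a list of k starting
    centroids. If omitted, k data points are sampled at random.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    # Squared-error criterion: total squared distance to assigned centroids.
    sse = sum(squared_euclidean(p, centroids[lab])
              for p, lab in zip(points, labels))
    return centroids, labels, sse


# Toy data (an assumption): two well-separated groups of four points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

for k in (1, 2, 3):
    _, _, sse = kmeans(data, k)
    print(k, sse)
```

With random initialization the result can depend on the starting centroids, since K-means only finds a local minimum of the squared error. Passing explicit starting points through `init` makes a run reproducible.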

 

(To be continued)

 

 

    Original author: 聚类算法
    Original source: https://blog.csdn.net/kdnuggets/article/details/640450