Clustering Algorithm CARDBK Improved from K-means Algorithm (original title: 一种改进K-means算法的聚类算法CARDBK)


Abstract: CARDBK differs from batch K-means in that a point is not assigned to a single cluster; instead, it influences the centroids of several clusters at once. How strongly a point influences a given cluster's centroid depends on the distances between that point and the centroids of the other clusters that lie closer to it. Clustering quality is measured by five indices: entropy, purity, F1, Rand Index, and normalized mutual information (NMI). On these indices, CARDBK gives better results than several other algorithms across multiple data sets. When the algorithms are run on a single data set under many different initializations, CARDBK's results are both better and more stable. Its running time scales linearly with data set size, and it is fast.
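The abstract states only that each point influences several centroids and that the influence depends on the distances to the closer centroids; it does not give the weighting formula. The sketch below therefore uses a hypothetical weighting chosen purely for illustration: the nearest centroid gets weight 1, and every other centroid gets the product of the ratios (distance to a closer centroid) / (distance to this centroid). The function names `influence_weights`, `soft_update`, and `hard_update` are likewise assumptions, not names from the paper. It contrasts one batch K-means centroid update with one soft update of this kind.

```python
import numpy as np


def pairwise_distances(X, centroids):
    """Euclidean distance from every point (rows of X) to every centroid."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)


def influence_weights(dists, eps=1e-12):
    """Hypothetical weighting, NOT the formula from the paper.

    For each point, the nearest centroid gets weight 1. Any other centroid j
    gets the product, over all centroids c closer to the point than j,
    of dist(c) / dist(j), so the weight depends on the closer centroids.
    """
    order = np.argsort(dists, axis=1)                 # nearest-first indices
    sorted_d = np.take_along_axis(dists, order, axis=1)
    n, k = dists.shape
    w_sorted = np.ones((n, k))
    for r in range(1, k):
        ratios = sorted_d[:, :r] / (sorted_d[:, [r]] + eps)
        w_sorted[:, r] = ratios.prod(axis=1)
    weights = np.empty_like(w_sorted)
    np.put_along_axis(weights, order, w_sorted, axis=1)  # undo the sort
    return weights


def hard_update(X, centroids):
    """One batch K-means step: each point counts only for its nearest centroid."""
    labels = pairwise_distances(X, centroids).argmin(axis=1)
    new = centroids.astype(float).copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members):
            new[j] = members.mean(axis=0)
    return new


def soft_update(X, centroids):
    """One soft step: every point pulls on every centroid, scaled by its weight."""
    W = influence_weights(pairwise_distances(X, centroids))
    mass = W.sum(axis=0)                              # total weight per centroid
    updated = (W.T @ X) / np.maximum(mass, 1e-12)[:, None]
    # Keep the old centroid if no point exerts any influence on it.
    return np.where(mass[:, None] > 0, updated, centroids)


# Tiny one-dimensional example: three points, two centroids.
X = np.array([[0.0], [1.0], [10.0]])
C = np.array([[0.0], [10.0]])
hard = hard_update(X, C)
soft = soft_update(X, C)
```

In this example the point at 1 belongs to the left cluster under the hard update, but under the soft update it also pulls the right centroid slightly toward itself. Whether CARDBK behaves the same way depends on the paper's actual weighting, which this sketch does not reproduce.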

Keywords:

Clustering; Text clustering; Document clustering; K-means; Algorithm

Authors:

ZHU Yehang (朱烨行), LI Yanling (李艳玲), CUI Mengtian (崔梦天), YANG Xianwen (杨献文)


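Author affiliations:

School of Economics and Management, Xi'an University of Posts and Telecommunications, Xi'an 710121

Department of Electronic Engineering, Second Artillery Engineering University, Xi'an 710025

School of Computer Science and Technology, Southwest University for Nationalities, Chengdu 610041

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610000

Information and Educational Technology Center, Xi'an University of Finance and Economics, Xi'an 710061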



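Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; China Postdoctoral Science Foundation; Sichuan Province Academic and Technical Leaders Training Fund; Sichuan Province Postdoctoral Research Fund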

Grant numbers:

61379019; 71102149; 2013M540704

Publication year:

2015

DOI:

10.11896/j.issn.1002-137X.2015.3.041

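Journal: Computer Science (计算机科学)

Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD; CSCD; Peking University Core Journals (北大核心)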

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2015, 42(3)

Citations: 13
References: 2