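Cluster analysis adopts different similarity measures for different data characteristics. Real-world data distributions are complex, exhibiting irregular shapes and uneven density, and considering only the attribute similarity of instances or only the connectivity of the distribution structure degrades clustering results. To address this, a clustering algorithm based on attribute similarity and distributed structure connectivity (A Clustering Algorithm Based on Attribute Similarity and Distributed Structure Connectivity, ASDSC) is proposed. First, a complete undirected graph is built from all data instances in the dataset to be clustered, a novel similarity measure that accounts for both attribute similarity and distribution-structure connectivity is defined to compute node similarity, and an adjacency matrix is constructed to update the edge weights. Second, random walks with increasing step lengths are performed on the adjacency matrix; cluster centers are identified and assigned cluster numbers according to the connectivity centrality of the vertices, and the connectivity of the remaining vertices is obtained at the same time. Then, the connectivity is used to compute the dependency relationships between vertices, and cluster numbers are propagated accordingly until clustering is complete. Finally, to verify the clustering performance of the method, comparative experiments against 5 advanced clustering algorithms were conducted on 16 synthetic datasets and 10 real datasets, and the ASDSC algorithm achieved excellent performance.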
Clustering Algorithm Based on Attribute Similarity and Distributed Structure Connectivity
Clustering analysis adopts different similarity measures for different data characteristics. However, real-world data distributions are complex and exhibit phenomena such as irregular distribution and uneven density. Considering attribute similarity or distribution-structure connectivity alone reduces clustering performance. Therefore, this paper proposes a clustering algorithm based on attribute similarity and distributed structure connectivity (ASDSC). Firstly, a complete undirected graph is constructed from all data instances, a novel similarity measure that combines topological structure and attribute similarity is defined to calculate node similarity, and the adjacency matrix is constructed to update the edge weights. Secondly, a random walk with increasing step length is performed on the adjacency matrix. The cluster centers and their cluster numbers are then obtained according to the connectivity centrality of the nodes, and the connectivity of the other nodes is also acquired. Then, the connectivity is used to calculate the dependencies among nodes, and cluster numbers are propagated accordingly until clustering is complete. Finally, comparative experiments with 5 advanced clustering algorithms are conducted on 16 synthetic datasets and 10 real datasets, and the results show that the ASDSC algorithm achieves excellent performance.
Clustering; Similarity measure; Attribute similarity; Distributed structure connectivity; Cluster number propagation
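
The pipeline summarized in the abstract (graph construction, random walks of increasing length, center selection by centrality, cluster number propagation) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration only: the blended similarity (a Gaussian attribute term multiplied by a shared-nearest-neighbour term), the reading of "increasing step" as summing t-step transition probabilities for t = 1..max_steps, the column-sum centrality, the greedy center rule, and the parameters `sigma`, `k`, `max_steps` and `n_clusters` are all hypothetical and are not taken from the ASDSC paper.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product (keeps the sketch dependency-free)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def similarity_matrix(points, sigma=1.0, k=3):
    """Edge weights of the complete graph (hypothetical measure, not the paper's):
    Gaussian attribute similarity times a shared-k-nearest-neighbour score."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of every point, excluding the point itself
    knn = [set(sorted((j for j in range(n) if j != i),
                      key=lambda j: dist[i][j])[:k]) for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            attr = math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2))
            shared = len(knn[i] & knn[j]) + (j in knn[i]) + (i in knn[j])
            w[i][j] = attr * (1 + shared) / (k + 2)  # structural factor in (0, 1]
    return w

def connectivity(w, max_steps=5):
    """Sum of t-step random-walk transition probabilities for t = 1..max_steps
    (an assumed reading of 'random walk with increasing step')."""
    n = len(w)
    p = []
    for row in w:  # row-normalise weights into transition probabilities
        total = sum(row)
        p.append([x / total for x in row])
    conn = [[0.0] * n for _ in range(n)]
    walk = [row[:] for row in p]
    for _ in range(max_steps):
        for i in range(n):
            for j in range(n):
                conn[i][j] += walk[i][j]
        walk = matmul(walk, p)  # extend every walk by one step
    return conn

def asdsc_sketch(points, n_clusters, sigma=1.0, k=3, max_steps=5):
    """Illustrative pipeline only; n_clusters is supplied by the caller here."""
    n = len(points)
    conn = connectivity(similarity_matrix(points, sigma, k), max_steps)

    def link(i, j):  # symmetric connectivity between two vertices
        return conn[i][j] + conn[j][i]

    # Assumed centrality: how much walk probability arrives at each vertex.
    centrality = [sum(conn[i][j] for i in range(n)) for j in range(n)]
    centers = [max(range(n), key=centrality.__getitem__)]
    while len(centers) < n_clusters:
        rest = [j for j in range(n) if j not in centers]
        # Prefer central vertices that are weakly linked to the chosen centers.
        centers.append(max(rest, key=lambda j: centrality[j] /
                           (1e-12 + max(link(j, c) for c in centers))))
    labels = {c: idx for idx, c in enumerate(centers)}
    # Propagate cluster numbers from the most central vertices outwards: each
    # vertex depends on the already-labelled vertex it is best connected to.
    for i in sorted(range(n), key=centrality.__getitem__, reverse=True):
        if i not in labels:
            parent = max(labels, key=lambda j: link(i, j))
            labels[i] = labels[parent]
    return [labels[i] for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(asdsc_sketch(points, n_clusters=2))
```

On two well-separated groups such as the toy `points` above, the sketch assigns one cluster number per group. It is quadratic to cubic in the number of instances and is meant only to make the sequence of steps concrete, not to reproduce the reported results.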