Adaptive K-medoids algorithm for optimizing initial class centers
The traditional K-medoids clustering algorithm requires the initial cluster centers to be selected at random and the number of clusters K to be specified in advance, and its clustering results are unstable. To address these problems, this paper proposes an adaptive K-medoids algorithm that optimizes the initial cluster centers (CH_KD). The idea is to define a feature importance measure, use it to select the best representative feature in each feature cluster and form a feature subset, and to focus on the adaptive optimization and improvement of the traditional partition algorithm. First, feature discrimination is defined from the feature standard deviation, and features with strong discrimination are selected. Second, the Pearson correlation coefficient is used to measure the redundancy of each feature within its feature cluster, and features with low redundancy are selected. Finally, the product of feature discrimination and feature redundancy is taken as the feature importance, which is used to select the best representative feature in each cluster and form the feature subset. The proposed algorithm is compared with other clustering algorithms on 14 UCI datasets, and the results verify the effectiveness and advantages of CH_KD.
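The three selection steps can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions the abstract does not state: the feature clusters are already given as lists of column indices, no feature is constant, discrimination is taken as the plain standard deviation, and the redundancy term is oriented so that larger is better (one minus the mean absolute Pearson correlation with the other features in the same cluster). The function name `select_representative_features` is hypothetical.

```python
import numpy as np


def select_representative_features(X, feature_clusters):
    """Pick one representative feature (column index) per feature cluster.

    Assumptions (not taken from the paper):
      - feature_clusters is a list of lists of column indices,
      - discrimination = standard deviation of the feature,
      - redundancy term = 1 - mean |Pearson r| with the other features
        in the same cluster, so that a larger value means less redundant,
      - importance = discrimination * redundancy term.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: feature discrimination from the per-feature standard deviation.
    discrimination = X.std(axis=0)

    # Step 2: absolute Pearson correlation between every pair of features.
    corr = np.abs(np.corrcoef(X, rowvar=False))

    selected = []
    for cluster in feature_clusters:
        cluster = list(cluster)
        if len(cluster) == 1:
            # A singleton cluster has nothing to be redundant with.
            selected.append(cluster[0])
            continue

        sub = corr[np.ix_(cluster, cluster)]
        # Mean |r| with the other cluster members (drop the diagonal 1.0).
        mean_abs_corr = (sub.sum(axis=1) - 1.0) / (len(cluster) - 1)
        low_redundancy = 1.0 - mean_abs_corr

        # Step 3: importance as the product of the two scores.
        importance = discrimination[cluster] * low_redundancy
        selected.append(cluster[int(np.argmax(importance))])

    return selected


if __name__ == "__main__":
    X = np.array([
        [1.0, 10.0, 0.0],
        [2.0, -10.0, 1.0],
        [3.0, 10.0, 0.0],
        [4.0, -10.0, 1.0],
    ])
    # Columns 0 and 1 form one feature cluster, column 2 another.
    print(select_representative_features(X, [[0, 1], [2]]))  # [1, 2]
```

In the toy data, columns 0 and 1 have the same mean absolute correlation with each other, so the column with the larger standard deviation (column 1) is chosen. With other definitions of discrimination or redundancy, the selected features could differ.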