Feature selection using neighborhood mutual information and feature clustering with K-means

In most neighborhood systems it is difficult to find the optimal neighborhood radius by manual tuning, and traditional K-means clustering requires randomly selected cluster centers and a user-specified number of clusters. To address these problems, this paper proposes a feature selection method using neighborhood mutual information and feature clustering with K-means. First, the average distance between a sample and the other samples under each feature is taken as the adaptive neighborhood radius, which determines the neighborhood set of the sample; on this basis, metrics such as adaptive neighborhood entropy, neighborhood mutual information, and normalized neighborhood mutual information are constructed to reflect the correlation between features. Then, an adaptive K-nearest neighbor set is constructed from the normalized neighborhood mutual information, and a weighted K-nearest neighbor density is defined with feature weights given by the Pearson correlation coefficient, so that the cluster centers of the K-means algorithm are selected automatically and K-means feature clustering is completed. Finally, a weighted average redundancy is defined, and the feature with the largest weighted average redundancy in each feature cluster is selected to form the optimal feature subset. Experimental results show that the proposed algorithm not only effectively improves the classification results of feature selection but also achieves better clustering performance.
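The first step of the abstract (an adaptive radius per sample and feature, then neighborhood entropy and neighborhood mutual information) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, distance on a single feature is taken to be the absolute difference, a multi-feature neighborhood is assumed to be the intersection of the single-feature neighborhoods, and the entropy and mutual information follow the commonly used neighborhood forms, which may differ from the adaptive definitions in the paper. The normalized variant is left out because the abstract does not state its form.

```python
import numpy as np


def adaptive_neighborhoods(X, features):
    """Boolean n x n matrix; entry (i, j) is True when sample j lies inside
    the adaptive neighborhood of sample i on every feature in `features`.
    Assumption: multi-feature neighborhoods are intersections."""
    n = X.shape[0]
    member = np.ones((n, n), dtype=bool)
    for f in features:
        col = X[:, f]
        d = np.abs(col[:, None] - col[None, :])  # pairwise distances on feature f
        radius = d.sum(axis=1) / (n - 1)         # mean distance to the other n - 1 samples
        member &= d <= radius[:, None]           # row i uses the radius of sample i
    return member


def neighborhood_entropy(X, features):
    """Assumed form: -(1/n) * sum_i log(|neighborhood(x_i)| / n)."""
    sizes = adaptive_neighborhoods(X, features).sum(axis=1)
    return -np.mean(np.log(sizes / X.shape[0]))


def neighborhood_mutual_information(X, a, b):
    """Assumed form: -(1/n) * sum_i log(|N_a(x_i)| * |N_b(x_i)| / (n * |N_ab(x_i)|))."""
    n = X.shape[0]
    na = adaptive_neighborhoods(X, [a]).sum(axis=1)
    nb = adaptive_neighborhoods(X, [b]).sum(axis=1)
    nab = adaptive_neighborhoods(X, [a, b]).sum(axis=1)
    return -np.mean(np.log(na * nb / (n * nab)))
```

Because every sample is within distance zero of itself, each neighborhood is non-empty and the logarithms are always defined. Under these assumed forms, the mutual information of a feature with itself reduces to that feature's neighborhood entropy, which gives a quick sanity check.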

Keywords: feature selection; neighborhood mutual information; K-means; feature clustering; adaptive K-nearest neighbor; feature weight; weighted K-nearest neighbor density

Sun Lin (孙林), Liang Na (梁娜), Xu Jiucheng (徐久成)

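College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China

College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China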

Funding: National Natural Science Foundation of China (62076089, 61772176, 61976082); Key Science and Technology Program of Henan Province (212102210136)

2024

CAAI Transactions on Intelligent Systems
Chinese Association for Artificial Intelligence; Harbin Engineering University

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.672
ISSN: 1673-4785
Year, volume (issue): 2024, 19(4)