Feature selection using neighborhood mutual information and feature clustering with K-means

Sun Lin (孙林)¹, Liang Na (梁娜)², Xu Jiucheng (徐久成)²

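Author information

1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
2. College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China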

Abstract

In most neighborhood systems the optimal neighborhood radius is difficult to find by manual tuning, and traditional K-means clustering requires the cluster centers to be chosen at random and the number of clusters to be specified in advance. To address these problems, this paper proposes a feature selection method based on neighborhood mutual information and K-means feature clustering. First, the average distance between a sample and all other samples under each feature is taken as the adaptive neighborhood radius, which determines the sample's neighborhood set. From these neighborhoods, measures such as adaptive neighborhood entropy, neighborhood mutual information, and normalized neighborhood mutual information are constructed to reflect the correlation between features. Second, an adaptive K-nearest-neighbor set is built from the normalized neighborhood mutual information, and a weighted K-nearest-neighbor density is defined using feature weights given by the Pearson correlation coefficient, so that the K-means algorithm selects its cluster centers automatically and then completes the feature clustering. Finally, a weighted average redundancy degree is defined, and the feature with the largest weighted average redundancy in each feature cluster is selected to form the optimal feature subset. Experimental results show that the proposed algorithm not only effectively improves the classification performance of feature selection but also achieves better clustering results.
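The abstract specifies only that the adaptive radius is the mean distance from a sample to the other samples under each feature. The Python/NumPy sketch below fills in the remaining definitions with assumptions that are not confirmed as the paper's own: the neighborhood entropy takes the common form NH = -(1/n) Σ log2(|δ(x_i)|/n) used in neighborhood rough set work, the joint neighborhood of two features is the intersection of their single-feature neighborhoods, and the normalization 2·NMI/(NH(a)+NH(b)) is a symmetric-uncertainty-style choice. All function names are hypothetical.

```python
import numpy as np


def adaptive_radius(x):
    """Adaptive radius of every sample under one feature column x (shape (n,), n >= 2):
    the mean absolute distance from each sample to the other n - 1 samples."""
    d = np.abs(x[:, None] - x[None, :])  # pairwise distances, shape (n, n)
    return d.sum(axis=1) / (x.shape[0] - 1)  # diagonal is 0, so divide by n - 1


def neighborhood(x):
    """Boolean matrix N where N[i, j] is True if sample j falls inside the
    adaptive neighborhood of sample i under feature column x."""
    d = np.abs(x[:, None] - x[None, :])
    return d <= adaptive_radius(x)[:, None]


def neighborhood_entropy(N):
    """Assumed form: NH = -(1/n) * sum_i log2(|delta(x_i)| / n).
    Every sample lies in its own neighborhood, so each size is >= 1."""
    sizes = N.sum(axis=1)
    return float(-np.mean(np.log2(sizes / N.shape[0])))


def neighborhood_mi(xa, xb):
    """NMI(a; b) = NH(a) + NH(b) - NH(a, b); the joint neighborhood is
    assumed to be the intersection of the single-feature neighborhoods."""
    na, nb = neighborhood(xa), neighborhood(xb)
    return (neighborhood_entropy(na) + neighborhood_entropy(nb)
            - neighborhood_entropy(na & nb))


def normalized_nmi(xa, xb):
    """Assumed normalization 2 * NMI / (NH(a) + NH(b)), which is at most 1;
    returns 0.0 when both features have zero neighborhood entropy."""
    h = neighborhood_entropy(neighborhood(xa)) + neighborhood_entropy(neighborhood(xb))
    return 0.0 if h == 0 else 2.0 * neighborhood_mi(xa, xb) / h


def nmi_matrix(X):
    """Pairwise normalized NMI between the columns (features) of X, shape (n, m)."""
    m = X.shape[1]
    S = np.zeros((m, m))
    for a in range(m):
        for b in range(a, m):
            S[a, b] = S[b, a] = normalized_nmi(X[:, a], X[:, b])
    return S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((50, 4))  # toy data: 50 samples, 4 features
    print(np.round(nmi_matrix(X), 3))
```

Because each radius is computed from the data under that feature alone, no radius is tuned by hand. The resulting feature-by-feature matrix is an assumed input for the later adaptive K-nearest-neighbor and K-means feature clustering steps; the Pearson-weighted density and the weighted average redundancy are left out because the abstract does not define them precisely enough to sketch.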

Key words

feature selection/neighborhood mutual information/K-means/feature clustering/adaptive K-nearest neighbor/feature weight/weighted K-nearest neighbor density

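Funding

National Natural Science Foundation of China (62076089)

National Natural Science Foundation of China (61772176)

National Natural Science Foundation of China (61976082)

Henan Province Science and Technology Research Program (2121-02210136)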

Year of publication

2024

CAAI Transactions on Intelligent Systems

Chinese Association for Artificial Intelligence, Harbin Engineering University

CSTPCD, Peking University Core Journal

Impact factor: 0.672

ISSN: 1673-4785

Number of references: 14