
Improved neighborhood space based feature selection algorithm for high-dimensional mixed data

As an important data preprocessing technique in data mining, feature selection can effectively mitigate the "curse of dimensionality" caused by high-dimensional data. Nonetheless, feature selection for high-dimensional mixed data remains one of the focuses and difficulties of current research. Because the neighborhood rough set model, which is built on neighborhood relations, can handle mixed data in which categorical and numerical attributes coexist, it has been successfully applied to feature selection for mixed data. However, the existing neighborhood rough set still measures the neighborhood relation of mixed data by simply fusing a partition of categorical data based on an equivalence relation with a partition of numerical data based on a similarity relation. When the neighborhood space partitioned by this model and a predefined evaluation function are used to select features from high-dimensional mixed data, adaptability is poor. Therefore, an improved method for constructing the neighborhood space is proposed on the basis of the neighborhood rough set model, and a corresponding neighborhood space measure is designed as a discriminant index to adaptively adjust the size of the neighborhood granules in the neighborhood space. To accurately characterize the discrimination ability of the neighborhood space for high-dimensional mixed data, an evaluation function that considers boundary data and the size of the neighborhood space is designed. On this basis, a heuristic feature selection algorithm for high-dimensional mixed data is proposed. The validity and superiority of the proposed algorithm are verified on UCI standard data sets.
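For orientation, the baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration of the classical neighborhood rough set approach (equivalence on categorical attributes fused with a distance threshold on numerical attributes, then greedy forward selection by dependency), not the improved neighborhood space or the evaluation function proposed in the paper. The function names, the radius `delta`, the stopping tolerance `eps`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def neighborhood_mask(X, is_categorical, delta):
    """Boolean n x n matrix; entry (i, j) is True when sample j is a neighbor of i."""
    n = X.shape[0]
    mask = np.ones((n, n), dtype=bool)
    cat = [k for k, c in enumerate(is_categorical) if c]
    num = [k for k, c in enumerate(is_categorical) if not c]
    if cat:
        # Categorical part: equivalence relation (all coded values must match).
        C = X[:, cat]
        mask &= (C[:, None, :] == C[None, :, :]).all(axis=2)
    if num:
        # Numerical part: similarity relation (Euclidean distance within delta).
        N = X[:, num]
        dist = np.sqrt(((N[:, None, :] - N[None, :, :]) ** 2).sum(axis=2))
        mask &= dist <= delta
    return mask

def dependency(X, y, features, is_categorical, delta):
    """Fraction of samples whose whole neighborhood shares their class label."""
    if not features:
        return 0.0
    flags = [is_categorical[k] for k in features]
    mask = neighborhood_mask(X[:, features], flags, delta)
    same_label = y[:, None] == y[None, :]
    consistent = (~mask | same_label).all(axis=1)
    return float(consistent.mean())

def greedy_select(X, y, is_categorical, delta=0.2, eps=1e-9):
    """Forward selection: add the feature that raises dependency the most."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [(dependency(X, y, selected + [k], is_categorical, delta), k)
                  for k in remaining]
        score, k = max(scores, key=lambda s: s[0])
        if score - best <= eps:  # no further gain: stop
            break
        selected.append(k)
        remaining.remove(k)
        best = score
    return selected, best

# Toy mixed data (hypothetical): column 0 is categorical (integer-coded),
# columns 1 and 2 are numerical and assumed scaled to [0, 1].
X = np.array([[0, 0.10, 0.5],
              [0, 0.15, 0.9],
              [1, 0.80, 0.4],
              [1, 0.85, 0.1]])
y = np.array([0, 0, 1, 1])
flags = [True, False, False]
selected, score = greedy_select(X, y, flags, delta=0.2)
```

In this sketch the neighborhood radius is a single fixed value for every sample, which is one source of the poor adaptability the abstract describes; the paper's method instead adjusts the neighborhood granule size adaptively.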

Keywords: feature selection; neighborhood space; high-dimensional mixed data; neighborhood rough set; evaluation function

Zhang Tengfei, Zhang Yudi, Ma Fumin


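College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China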


Funding: National Natural Science Foundation of China (62073173, 61973151); Natural Science Foundation of Jiangsu Province (BK20191376, BK20191406)

2024

Control and Decision (控制与决策)
Northeastern University


Indexed in: CSTPCD; PKU Core (北大核心)
Impact factor: 1.227
ISSN: 1001-0920
Year, volume (issue): 2024, 39(3)