Improved neighborhood space based feature selection algorithm for high-dimensional mixed data
As an important data preprocessing technique in data mining, feature selection can effectively mitigate the "curse of dimensionality" caused by high-dimensional data. Nonetheless, performing feature selection on high-dimensional mixed data remains both a focus and a difficulty of current research. Because it can handle mixed data in which categorical and numerical attributes coexist, the neighborhood rough set model has been widely used for feature selection on mixed data in recent years. However, existing measures of the neighborhood relation for mixed data still simply fuse a partition of the categorical data based on an equivalence relation with a partition of the numerical data based on a similarity relation. When features of high-dimensional mixed data are selected using a neighborhood space partitioned in this way together with a predefined evaluation function, adaptability is poor. Therefore, an improved method for constructing the neighborhood space is proposed on the basis of the neighborhood rough set model. Taking into account data with overlapping boundaries and the size of the neighborhood space, an evaluation function is designed to characterize the discrimination ability of the neighborhood space. On this basis, a heuristic feature selection algorithm for high-dimensional mixed data is proposed. The validity and superiority of the proposed algorithm are verified on UCI standard data sets.
feature selection; neighborhood space; high-dimensional mixed data; neighborhood rough set; evaluation function
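For orientation, the conventional neighborhood relation that the abstract describes as a "simple fusion" can be sketched as follows. This is only an illustration of that baseline, not the improved construction proposed in the paper. The function name `mixed_neighborhood`, the use of Euclidean distance, the threshold `delta`, and the toy data are all assumptions made for the example.

```python
from math import sqrt

def mixed_neighborhood(num, cat, i, delta):
    """Indices of samples in the neighborhood of sample i (baseline sketch).

    num   -- list of numerical attribute vectors, one per sample
    cat   -- list of categorical attribute tuples, one per sample
    delta -- assumed neighborhood radius for the numerical part
    """
    neighbors = []
    for j in range(len(num)):
        # Similarity relation on numerical attributes: distance within delta.
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(num[i], num[j])))
        # Equivalence relation on categorical attributes: exact match.
        if dist <= delta and cat[i] == cat[j]:
            neighbors.append(j)
    return neighbors

# Hypothetical toy data: one numerical and one categorical attribute.
num = [(0.10,), (0.15,), (0.80,), (0.12,)]
cat = [("a",), ("a",), ("a",), ("b",)]
print(mixed_neighborhood(num, cat, 0, delta=0.1))
```

In this toy case, sample 3 is numerically close to sample 0 but is excluded because its categorical value differs, which shows how the two partitions are combined by a plain conjunction.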