To address the high dimensionality, noise, and redundancy of tumor gene data, this paper improves the F-score algorithm using the Spearman correlation coefficient, optimizes the binary grey wolf algorithm, and proposes a gene feature selection algorithm that combines the improved F-score with the optimized binary grey wolf algorithm. First, to account for the correlation between features, the F-score of each feature and the absolute value of the Spearman correlation coefficient between features are calculated. Second, a weight coefficient is computed and used to derive a weight for each feature; the features are ranked by importance according to these weights, and a preliminary feature subset is selected. Finally, the binary grey wolf algorithm is optimized by adjusting the proportion of global search to local search, which strengthens its global search capability and speeds up its local search. This reduces time overhead and yields the optimal feature subset, improving both classification performance and the efficiency of feature selection. The proposed algorithm is tested on nine tumor gene datasets and evaluated on two metrics: classification accuracy and the number of selected features. Compared with four other algorithms, the experimental results show that the proposed algorithm performs well, reduces the dimensionality of the gene data, and achieves better classification accuracy.
Key words
tumor gene/Fisher-score/Spearman correlation coefficient/binary grey wolf optimization algorithm/feature selection
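The filter stage summarized in the abstract (an F-score per feature, the absolute Spearman correlation between features, and a combined weight used for ranking) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the weight formula, so the rule `F-score * (1 - mean |rho|)` used here is hypothetical, as are the function names `f_score`, `ranks`, `spearman`, and `rank_features`. The F-score is written in a common multi-class form, which may differ from the paper's definition.

```python
from statistics import mean, variance


def f_score(column, labels):
    """Multi-class F-score of one feature: squared distances of the class
    means from the overall mean, divided by the summed within-class
    sample variances (assumed form; needs >= 2 samples per class)."""
    overall = mean(column)
    between, within = 0.0, 0.0
    for cls in set(labels):
        values = [x for x, y in zip(column, labels) if y == cls]
        between += (mean(values) - overall) ** 2
        within += variance(values)
    return between / within if within > 0 else 0.0


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(a, b):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    norm = (sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return cov / norm if norm > 0 else 0.0


def rank_features(columns, labels, top_k):
    """Hypothetical weighting (not from the paper): scale each F-score by
    (1 - mean |rho| to the other features), then keep the top_k features."""
    scores = [f_score(col, labels) for col in columns]
    weights = []
    for i, col in enumerate(columns):
        others = [abs(spearman(col, c)) for j, c in enumerate(columns) if j != i]
        redundancy = mean(others) if others else 0.0
        weights.append(scores[i] * (1.0 - redundancy))
    order = sorted(range(len(columns)), key=lambda i: weights[i], reverse=True)
    return order[:top_k], weights


# Toy data: features 0 and 1 separate the classes, feature 2 is noise.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    [1.1, 1.3, 1.0, 3.2, 3.3, 3.1],
    [0.5, 2.0, 1.0, 1.9, 0.6, 1.1],
]
selected, weights = rank_features(columns, labels, top_k=2)
print(selected, weights)
```

In this sketch the ranking only produces a preliminary subset. Strongly correlated features such as 0 and 1 above can both survive it, which is why a wrapper stage such as the binary grey wolf search described in the abstract would still be needed to pick the final subset.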