Feature selection method based on improved slime mould algorithm
[Objective] In the era of big data, high-dimensional datasets are handled routinely, which makes feature selection increasingly important. Owing to its simplicity and efficiency, the slime mould algorithm (SMA) has been widely applied to feature selection and has been improved in various ways. However, existing improvements focus mainly on adding strategies and hybridizing algorithms, without thorough improvements tailored to the characteristics of feature selection problems. To fill this research gap, we propose a feature selection method based on an improved SMA (ISMA).

[Methods] First, to address the imbalance between the exploration and exploitation capabilities of the SMA caused by the small range of fitness function values in feature selection, the parameter governing the update of slime mould positions is modified. Second, to tackle the SMA's tendency to converge towards the origin, we improve its position update formula. Finally, to mitigate the problem of the SMA becoming trapped in local optima, we introduce an equilibrium pool into the position update formula. The Musk1 and Lymphography datasets were selected for comparative experiments assessing the global exploration and local exploitation capabilities of ISMA and SMA. Subsequently, 11 datasets from the UCI repository (3 low-dimensional, 5 medium-dimensional, and 3 high-dimensional) were selected for comparative experiments evaluating ISMA against other metaheuristic feature selection algorithms, with classification accuracy and dimension reduction rate as the evaluation metrics. ISMA is compared against eight other algorithms: SMA, the genetic algorithm (GA), binary particle swarm optimization (VPSO), a binary version of the hybrid grey wolf optimization and particle swarm optimization (BGWOPSO), binary grey wolf optimization 1 (BGWO1), the binary gravitational search algorithm (BGSA), the binary bat algorithm (BBA), and the binary ant lion optimizer-approach 1 (BALO-1).
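To make the position-update changes concrete, the following Python/NumPy sketch shows one way an equilibrium pool can stand in for the single current-best solution in an SMA-style update, and how the exploitation branch can be re-anchored so that positions no longer shrink towards the origin. It is a minimal illustration under stated assumptions, not the paper's implementation: the pool of the k best individuals plus their mean (as used in the equilibrium optimizer), the branch condition p, and the re-anchoring term (1 - vc) * x_eq are all assumptions.

```python
import numpy as np

def build_equilibrium_pool(population, fitness, k=4):
    # Hypothetical pool: the k best individuals plus their mean
    # (as in the equilibrium optimizer); the paper's construction may differ.
    best = population[np.argsort(fitness)[:k]]        # minimisation assumed
    return np.vstack([best, best.mean(axis=0)])

def update_position(x, population, pool, vb, vc, w, p, rng):
    # One SMA-style update in which a randomly drawn pool member replaces
    # the single current-best solution (illustrative only).
    x_eq = pool[rng.integers(len(pool))]              # draw from the pool
    if rng.random() < p:                              # exploration-leaning branch
        xa = population[rng.integers(len(population))]
        xb = population[rng.integers(len(population))]
        return x_eq + vb * (w * xa - xb)
    # Exploitation branch anchored on the pool member rather than on the
    # origin, so vc -> 0 no longer pulls positions towards zero.
    return vc * x + (1.0 - vc) * x_eq
```

Here vb, vc, w, and p would be computed as in the standard SMA; only their interaction with the pool is sketched.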
[Results] Compared with SMA, ISMA exhibits higher population diversity in the early iterations, indicating stronger global exploration. As the number of iterations increases, the population diversity decreases gradually, achieving a smooth transition from exploration to exploitation. In the later iterations, the population diversity of ISMA becomes comparable to, or even lower than, that of SMA, demonstrating strong local exploitation and an effective balance between exploration and exploitation. The experimental results indicate that ISMA is competitive in improving classification performance and reducing feature dimensionality. In terms of average classification accuracy, ISMA outperforms SMA on all datasets, with a maximum improvement of 6.53 percentage points. Compared with the other benchmark algorithms, ISMA achieves the best average classification accuracy on 9 datasets and the second-best on the remaining 2, trailing the best algorithm by only 0.19 and 0.05 percentage points, respectively, while achieving a higher average dimension reduction rate than that first-ranked algorithm. In terms of average dimension reduction rate, ISMA achieves the best result on two datasets and performs satisfactorily overall.

[Conclusions] In this paper, we propose a feature selection method based on an improved SMA. The ISMA incorporates new parameters to balance the exploration and exploitation capabilities of the SMA, enhances the position update formula to avoid ineffective computations, and introduces an equilibrium pool that replaces the current best solution in position updates, thereby improving population diversity and reducing the probability of becoming trapped in local optima. Effectiveness-analysis experiments validate the proposed improvements, and experimental results on 11 UCI datasets demonstrate that the proposed algorithm generalizes better than, and holds certain advantages over, other metaheuristic feature selection algorithms. Future research will focus on further improving the dimension reduction rate while maintaining the high classification accuracy and stability of ISMA. Furthermore, for the imbalance between exploration and exploitation caused by the small range of fitness function values in feature selection problems, better alternatives may exist, such as normalizing the fitness values or replacing the tanh function with other functions. Alternative remedies for the SMA's convergence towards the origin can also be explored, such as incorporating position-update formulas from other algorithms.
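For readers unfamiliar with the fitness-range issue raised above, the sketch below shows a typical wrapper-style fitness function for binary feature selection that combines classification error with the fraction of retained features; because such a function is bounded in [0, 1] and usually varies within a narrow band, the fitness gaps fed into the SMA's tanh-based parameter stay small. The k-NN classifier, 5-fold cross-validation, and the weight alpha = 0.99 are illustrative assumptions, not the paper's exact experimental protocol.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def feature_subset_fitness(mask, X, y, alpha=0.99):
    # Wrapper fitness to be minimised: weighted sum of classification error
    # and the fraction of retained features (1 - dimension reduction rate).
    # Classifier, CV scheme, and alpha are illustrative assumptions.
    selected = np.flatnonzero(mask)
    if selected.size == 0:                    # an empty subset is invalid
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, selected], y, cv=5).mean()
    ratio = selected.size / X.shape[1]
    return alpha * (1.0 - acc) + (1.0 - alpha) * ratio
```

With alpha close to 1, almost all of the fitness value comes from the error term, so fitness values across the population typically differ by only a few hundredths, which is the narrow range that motivates the modified parameter in ISMA.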