Feature selection method based on improved slime mould algorithm
[Objective] In the era of big data, high-dimensional datasets are handled routinely, which makes feature selection increasingly important. Owing to its simplicity and efficiency, the slime mould algorithm (SMA) has been widely applied to feature selection and has been improved in various ways. However, existing improvements focus mainly on adding strategies and hybridizing algorithms, without thorough improvements tailored to the characteristics of feature selection problems. To fill this research gap, we propose a feature selection method based on an improved SMA (ISMA).

[Methods] First, to address the imbalance between the exploration and exploitation capabilities of the SMA caused by the small range of fitness function values in feature selection, the parameter governing the update of slime mould positions is modified. Second, to tackle the SMA's tendency to converge towards the origin, we improve its position update formula. Finally, to mitigate the problem of the SMA becoming trapped in local optima, we introduce an equilibrium pool into the position update formula. The Musk1 and Lymphography datasets were selected for comparative experiments assessing the global exploration and local exploitation capabilities of ISMA and SMA. Subsequently, 11 datasets from the UCI repository (3 low-dimensional, 5 medium-dimensional, and 3 high-dimensional) were selected for comparative experiments evaluating ISMA against other metaheuristic feature selection algorithms, with classification accuracy and dimension reduction rate as the evaluation metrics. ISMA is compared against eight other algorithms: SMA, the genetic algorithm (GA), binary particle swarm optimization (VPSO), a binary version of the hybrid grey wolf optimization and particle swarm optimization (BGWOPSO), binary grey wolf optimization 1 (BGWO1), the binary gravitational search algorithm (BGSA), the binary bat algorithm (BBA), and the binary ant lion optimizer-approach 1 (BALO-1).
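To make the position-update changes concrete, the following Python/NumPy sketch shows one way an equilibrium pool can stand in for the single current-best solution in an SMA-style update, and how the exploitation branch can be re-anchored so that positions no longer shrink towards the origin. It is a minimal illustration under stated assumptions, not the paper's implementation: the pool of the k best individuals plus their mean (as used in the equilibrium optimizer), the branch condition p, and the re-anchoring term (1 - vc) * x_eq are all assumptions.

```python
import numpy as np

def build_equilibrium_pool(population, fitness, k=4):
    # Hypothetical pool: the k best individuals plus their mean
    # (as in the equilibrium optimizer); the paper's construction may differ.
    best = population[np.argsort(fitness)[:k]]        # minimisation assumed
    return np.vstack([best, best.mean(axis=0)])

def update_position(x, population, pool, vb, vc, w, p, rng):
    # One SMA-style update in which a randomly drawn pool member replaces
    # the single current-best solution (illustrative only).
    x_eq = pool[rng.integers(len(pool))]              # draw from the pool
    if rng.random() < p:                              # exploration-leaning branch
        xa = population[rng.integers(len(population))]
        xb = population[rng.integers(len(population))]
        return x_eq + vb * (w * xa - xb)
    # Exploitation branch anchored on the pool member rather than on the
    # origin, so vc -> 0 no longer pulls positions towards zero.
    return vc * x + (1.0 - vc) * x_eq
```

Here vb, vc, w, and p would be computed as in the standard SMA; only their interaction with the pool is sketched.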
[Results] Compared with SMA, ISMA exhibits higher population diversity in the early iterations, indicating stronger global exploration. As the number of iterations increases, the population diversity decreases gradually, achieving a smooth transition from exploration to exploitation. In the later iterations, the population diversity of ISMA becomes comparable to, or even lower than, that of SMA, demonstrating strong local exploitation and an effective balance between exploration and exploitation. The experimental results indicate that ISMA is competitive in improving classification performance and reducing feature dimensionality. In terms of average classification accuracy, ISMA outperforms SMA on all datasets, with a maximum improvement of 6.53 percentage points. Compared with the other benchmark algorithms, ISMA achieves the best average classification accuracy on 9 datasets and the second-best on the remaining 2, trailing the best algorithm by only 0.19 and 0.05 percentage points, respectively, while achieving a higher average dimension reduction rate than that first-ranked algorithm. In terms of average dimension reduction rate, ISMA achieves the best result on two datasets and performs satisfactorily overall.

[Conclusions] In this paper, we propose a feature selection method based on an improved SMA. The ISMA incorporates new parameters to balance the exploration and exploitation capabilities of the SMA, enhances the position update formula to avoid ineffective computations, and introduces an equilibrium pool that replaces the current best solution in position updates, thereby improving population diversity and reducing the probability of becoming trapped in local optima. Effectiveness-analysis experiments validate the proposed improvements, and experimental results on 11 UCI datasets demonstrate that the proposed algorithm generalizes better than, and holds certain advantages over, other metaheuristic feature selection algorithms. Future research will focus on further improving the dimension reduction rate while maintaining the high classification accuracy and stability of ISMA. Furthermore, for the imbalance between exploration and exploitation caused by the small range of fitness function values in feature selection problems, better alternatives may exist, such as normalizing the fitness values or replacing the tanh function with other functions. Alternative remedies for the SMA's convergence towards the origin can also be explored, such as incorporating position-update formulas from other algorithms.
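For readers unfamiliar with the fitness-range issue raised above, the sketch below shows a typical wrapper-style fitness function for binary feature selection that combines classification error with the fraction of retained features; because such a function is bounded in [0, 1] and usually varies within a narrow band, the fitness gaps fed into the SMA's tanh-based parameter stay small. The k-NN classifier, 5-fold cross-validation, and the weight alpha = 0.99 are illustrative assumptions, not the paper's exact experimental protocol.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def feature_subset_fitness(mask, X, y, alpha=0.99):
    # Wrapper fitness to be minimised: weighted sum of classification error
    # and the fraction of retained features (1 - dimension reduction rate).
    # Classifier, CV scheme, and alpha are illustrative assumptions.
    selected = np.flatnonzero(mask)
    if selected.size == 0:                    # an empty subset is invalid
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, selected], y, cv=5).mean()
    ratio = selected.size / X.shape[1]
    return alpha * (1.0 - acc) + (1.0 - alpha) * ratio
```

With alpha close to 1, almost all of the fitness value comes from the error term, so fitness values across the population typically differ by only a few hundredths, which is the narrow range that motivates the modified parameter in ISMA.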