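Near-infrared (NIR) spectroscopy is simple, rapid, and non-destructive, and has become a widely used method for the qualitative and quantitative analysis of complex systems. However, NIR spectra usually contain many redundant wavelengths that are unrelated to the target components, which degrades the predictive performance of the model, so spectral variables need to be selected before modeling. In this study, the mayfly algorithm (MA) was discretized and applied to quantitative NIR analysis for the first time. MA simulates the courtship and mating behavior of mayflies. First, equal numbers of female and male mayflies are set, and their positions are updated and discretized. Male mayflies attract female mayflies, and offspring are produced through "well-matched" mating and mutation, with the number of offspring fixed at 20. The offspring are added to the original population, and the best individuals are retained according to the total population size, so that the population size stays unchanged after each iteration. The resulting new generation then enters the next iteration. This process is repeated until the maximum number of iterations is reached. NIR spectral data of corn and adulterated vegetable oils were used to verify the performance of MA. Three parameters of MA were optimized: the gravity coefficient, the number of iterations, and the population size. Partial least squares (PLS) models were built from the variables selected by MA and the contents of the target components, and were compared with the full-spectrum PLS model. The results show that, relative to the PLS model, the root mean square error of prediction (RMSEP) of the MA-PLS model for oil, moisture, protein, and starch contents in the corn dataset decreased by 30.59%, 40.24%, 36.96%, and 27.93%, respectively, and the RMSEP for perilla seed oil, soybean oil, corn oil, and cottonseed oil contents in the adulterated vegetable oil dataset decreased by 83.85%, 90.90%, 81.60%, and 92.18%, respectively. In addition, MA-PLS used significantly fewer variables than the PLS model. Therefore, MA can effectively reduce the complexity of the PLS model and improve the accuracy of its predictions.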
A Variable Selection Method Based on Mayfly Algorithm for Near-infrared Spectroscopy
Near-infrared (NIR) spectroscopy has become a widely used technique for the qualitative and quantitative analysis of complex systems owing to its simplicity, rapidity, and non-destructive nature. However, NIR spectra often contain numerous redundant wavelengths that are uncorrelated with the target components, which reduces the prediction accuracy of the model. Therefore, it is necessary to select spectral variables before modeling. In this research, the mayfly algorithm (MA) was discretized and applied to the quantitative analysis of NIR spectra for the first time. The MA simulates the courtship and mating behavior of mayflies. Initially, equal numbers of male and female mayflies were set, and the positions of the mayflies were updated and discretized. The mayflies produced 20 offspring through mating and mutation. These offspring were added to the original population, and the best individuals were retained so that the population size remained unchanged after each iteration. This process was repeated until the maximum number of iterations was reached. To evaluate the performance of the MA, NIR data of corn and adulterated vegetable oils were used for partial least squares (PLS) modeling. The influence of the gravity coefficient, the number of iterations, and the population size of the MA was investigated. The MA-PLS model was compared with the full-spectrum PLS model. The results showed that, compared with PLS, the root mean square error of prediction (RMSEP) of the MA-PLS model for oil, moisture, protein, and starch contents in the corn dataset decreased by 30.59%, 40.24%, 36.96%, and 27.93%, respectively, and the RMSEP for perilla seed oil, soybean oil, corn oil, and cottonseed oil in the adulterated vegetable oil dataset decreased by 83.85%, 90.90%, 81.60%, and 92.18%, respectively. In addition, MA-PLS used fewer variables than PLS. Therefore, MA can effectively reduce the complexity of the PLS model and improve its prediction accuracy.
Near-infrared spectroscopy; Variable selection; Mayfly algorithm; Partial least squares; Swarm intelligence optimization
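The population update and the RMSEP comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the sigmoid transfer rule used for discretization is an assumption (the abstract does not say which rule is used), and the fitness function is left to the caller; in practice it might be a cross-validated PLS error.

```python
import math
import random


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a prediction set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def percent_decrease(rmsep_full, rmsep_selected):
    """Relative RMSEP reduction of the selected-variable model, in percent."""
    return 100.0 * (rmsep_full - rmsep_selected) / rmsep_full


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def discretize(position, rng):
    """Map a continuous position vector to a 0/1 wavelength mask.

    Assumption: a sigmoid transfer function, a common choice in binary
    swarm optimizers; the abstract does not specify the rule.
    """
    return [1 if rng.random() < _sigmoid(x) else 0 for x in position]


def keep_best(population, offspring, fitness):
    """Merge offspring into the population and keep the best individuals,
    so the population size is the same after the iteration as before it."""
    merged = sorted(population + offspring, key=fitness)  # lower is better
    return merged[: len(population)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy example: 4 masks over 6 "wavelengths"; the fitness simply counts
    # selected variables, standing in for a real model-error criterion.
    population = [discretize([rng.uniform(-2, 2) for _ in range(6)], rng)
                  for _ in range(4)]
    offspring = [discretize([rng.uniform(-2, 2) for _ in range(6)], rng)
                 for _ in range(2)]
    survivors = keep_best(population, offspring, fitness=sum)
    print(len(survivors), survivors)
    print(percent_decrease(2.0, 0.5))
```

Sorting the merged population and truncating it to the original size is one simple way to keep the population size constant; the fitness used here is only a placeholder and does not reflect the criterion used in the study.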