Study on Sugar Content Detection of Kiwifruit Using Near-Infrared Spectroscopy Combined With Stacking Ensemble Learning


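Abstract (translated from Chinese): This study used near-infrared spectroscopy combined with Stacking ensemble learning for the non-destructive detection of sugar content in kiwifruit. Taking "Yunhai No.1" kiwifruit from Hubei as the research object, spectra of 280 samples were acquired with an infrared analyzer, comprising 1 557 wavelength points in the 4 000–10 000 cm⁻¹ range, and sugar content was measured with a refractometer. Outlier samples were removed with a singular-sample identification algorithm based on Monte Carlo random sampling combined with a t-test. The SPXY algorithm was used to split the samples into training and test sets at a 4:1 ratio. Five preprocessing methods were applied: multiplicative scatter correction (MSC), Savitzky-Golay smoothing (SG), detrending (DT), vector normalization (VN), and standard normal variate (SNV). Feature wavelengths were extracted with uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS), and the interval variable iterative space shrinkage approach (iVISSA), and the successive projections algorithm (SPA) was then used for a secondary extraction to eliminate collinear variables. Because a single model has limited generalization ability, an ensemble learning model based on the Stacking algorithm was designed to extend modeling capability. Bayesian ridge regression (BRR), partial least squares regression (PLSR), support vector regression (SVR), and an artificial neural network (ANN) were selected as base learners, with linear regression (LR) as the meta-learner, and the performance of ensemble models built from different combinations was compared. The Pearson correlation coefficient was used to analyze the relationship between the base learners and the ensemble model. The results showed that vector normalization performed best among the five preprocessing methods. After feature wavelength extraction on the preprocessed spectra, the VN-CARS-PLSR model performed best, with R²p of 0.805 and RMSEP of 0.498 on the test set. This model retained 177 feature wavelengths, reducing the data volume by 88.6% relative to the original spectra. Fusing the base learners with the Stacking algorithm and comparing different combinations showed that the PLS+SVR+ANN ensemble model achieved the highest prediction accuracy, with R²p reaching 0.853 and RMSEP falling to 0.433. The influence of the base learners on ensemble performance was analyzed with the Pearson correlation coefficient. The study shows that, compared with single models, the Stacking ensemble model enables more comprehensive modeling and higher generalization ability, and the method provides technical support for the non-destructive detection of kiwifruit sugar content.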
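SPXY (sample set partitioning based on joint x–y distances) extends Kennard-Stone selection by adding the distance between reference values to the spectral distance, so the training set covers both the spectral and the sugar-content range. A minimal sketch, assuming Euclidean distances, each distance matrix scaled by its maximum, and `train_fraction=0.8` for the 4:1 split; the function name `spxy_split` is hypothetical:

```python
import numpy as np
from scipy.spatial.distance import cdist


def spxy_split(X, y, train_fraction=0.8):
    """SPXY partitioning: Kennard-Stone selection on a distance that sums the
    max-scaled spectral (X) and response (y) distances.
    Returns (train_indices, test_indices)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = X.shape[0]
    n_train = int(round(train_fraction * n))

    dx = cdist(X, X)  # Euclidean distances between spectra
    dy = cdist(y, y)  # absolute differences between reference values
    d = dx / dx.max() + dy / dy.max()

    # Start with the two samples that are farthest apart in the joint metric.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    while len(selected) < n_train:
        # Distance from each candidate to its nearest already-selected sample;
        # the candidate farthest from the current training set is added next.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)

    return np.array(selected), np.array(remaining)
```

Because selection is deterministic, the same data always yield the same split, which makes model comparisons across preprocessing and feature-selection choices reproducible.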
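The five preprocessing methods are standard row-wise transforms of a spectral matrix `X` (samples × wavelengths). The sketch below shows one common formulation of each; the Savitzky-Golay window and polynomial order, the use of the mean spectrum as the MSC reference, and the VN variant (mean-centering, then scaling to unit Euclidean norm) are assumptions rather than the authors' reported settings:

```python
import numpy as np
from scipy.signal import detrend, savgol_filter


def msc(X, reference=None):
    """Multiplicative scatter correction: fit each spectrum as
    offset + slope * reference (mean spectrum by default) and undo that fit."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X, dtype=float)
    for i, spectrum in enumerate(X):
        slope, offset = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - offset) / slope
    return corrected


def sg(X, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (parameters assumed)."""
    return savgol_filter(X, window_length=window_length, polyorder=polyorder, axis=1)


def dt(X):
    """Detrending: subtract a least-squares straight line from each spectrum."""
    return detrend(X, axis=1, type="linear")


def snv(X):
    """Standard normal variate: centre each spectrum, divide by its standard deviation."""
    centred = X - X.mean(axis=1, keepdims=True)
    return centred / X.std(axis=1, ddof=1, keepdims=True)


def vn(X):
    """Vector normalization (assumed variant): centre each spectrum, then
    scale it to unit Euclidean norm."""
    centred = X - X.mean(axis=1, keepdims=True)
    return centred / np.linalg.norm(centred, axis=1, keepdims=True)
```

MSC, SNV, and VN all act on one spectrum at a time to suppress multiplicative and additive scatter effects, while SG and DT address noise and baseline drift; which one works best is an empirical question, as the comparison in the abstract reflects.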
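The SPA step that removes collinear variables builds a chain of wavelengths in which each new wavelength has the largest component orthogonal to those already chosen. The sketch below shows only the projection chain for one starting column; the full algorithm also tries different starting columns and chain lengths and keeps the subset with the lowest validation error, which is omitted here. The name `spa_chain` is hypothetical:

```python
import numpy as np


def spa_chain(X, start, n_select):
    """Successive projections algorithm (projection chain only).

    Starting from column `start`, repeatedly project every column onto the
    orthogonal complement of the last selected (projected) column and pick
    the column with the largest remaining norm. Returns indices in order.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        last = Xp[:, selected[-1]].copy()
        # Remove from every column its component along the last selected column.
        Xp = Xp - np.outer(last, last @ Xp) / (last @ last)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-select a column
        selected.append(int(np.argmax(norms)))
    return selected
```

A column that is nearly a copy of an already selected one keeps almost no norm after projection, so it is skipped; this is how SPA discards collinear wavelengths.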

English abstract: In this study, we employ near-infrared spectroscopy with Stacking ensemble learning to perform non-destructive analysis of sugar content in kiwifruit. The research focuses on the "Yunhai No.1" kiwifruit variety from Hubei. Using an infrared analyzer, we collected spectra from 280 samples, spanning 1 557 wavelengths in the 4 000–10 000 cm⁻¹ range, and measured sugar content with a refractometer. Outliers were identified and excluded using a singular-sample identification algorithm that combines Monte Carlo random sampling with a t-test. The SPXY algorithm was then used to split the data into training and test sets at a 4:1 ratio. Data preprocessing involved multiplicative scatter correction (MSC), Savitzky-Golay smoothing (SG), detrending (DT), vector normalization (VN), and standard normal variate (SNV) transformation. Feature wavelengths were first selected using uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS), and the interval variable iterative space shrinkage approach (iVISSA), followed by a secondary selection with the successive projections algorithm (SPA) to remove collinear variables. To address the limited generalization of single models, we designed an ensemble learning model based on the Stacking algorithm. This model incorporated Bayesian ridge regression (BRR), partial least squares regression (PLSR), support vector regression (SVR), and artificial neural networks (ANN) as base learners, with linear regression (LR) serving as the meta-learner. We assessed the performance of ensemble models built from different base-learner combinations and analyzed the influence of the base learners on ensemble performance using the Pearson correlation coefficient. Experimental results indicated that vector normalization was the most effective of the five preprocessing methods. Among the feature-selection models, VN-CARS-PLSR performed best, with an R²p of 0.805 and an RMSEP of 0.498 on the test set; it retained 177 feature wavelengths, reducing the data volume by 88.6% compared with the original spectrum. Comparing base-learner combinations in the Stacking algorithm showed that the PLS+SVR+ANN ensemble achieved the highest predictive accuracy, with an R²p of 0.853 and an RMSEP of 0.433. The study concludes that the Stacking ensemble model offers more comprehensive modeling and better generalization than single models, providing technical support for the non-destructive detection of sugar content in kiwifruit.
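For reference, the figures of merit are assumed here to follow their standard definitions over the $n_p$ test-set samples, and the reported data reduction and error improvement follow directly from the numbers in the abstract:

```latex
\mathrm{RMSEP} = \sqrt{\frac{1}{n_p}\sum_{i=1}^{n_p}\left(y_i - \hat{y}_i\right)^2},
\qquad
R_p^2 = 1 - \frac{\sum_{i=1}^{n_p}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n_p}\left(y_i - \bar{y}\right)^2}

1 - \frac{177}{1557} \approx 1 - 0.1137 = 0.8863 \approx 88.6\%

\frac{0.498 - 0.433}{0.498} \approx 0.131
```

That is, the PLS+SVR+ANN ensemble lowers RMSEP by about 13.1% relative to the best single model, VN-CARS-PLSR.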

Keywords:

Kiwifruit; Near-infrared spectroscopy; Sugar content; Stacking ensemble learning; Model fusion

Authors:

Guo Zhiqiang (郭志强), Zhang Botao (张博涛), Zeng Yunliu (曾云流)


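Affiliations:

School of Information Engineering, Wuhan University of Technology, Wuhan, Hubei 430070

National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops (Fruits and Vegetables) · National R&D Center for Citrus Preservation, Huazhong Agricultural University, Wuhan, Hubei 430070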


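Funding:

Key R&D Project of the Jiangxi Academy of Sciences; Kiwifruit Quality Safety and Processing/Preservation Post Project; Key R&D Project of Hubei Province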

Grant numbers:

2021YSBG21019; CARS26; 2023BBB064

Year of publication:

2024

DOI:

10.3964/j.issn.1000-0593(2024)10-2932-09

Spectroscopy and Spectral Analysis

Chinese Optical Society


Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.897

ISSN: 1000-0593

Year, volume (issue): 2024, 44(10)