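When visible/near-infrared diffuse reflectance spectroscopy is used to measure the soluble solids content (SSC) of apples, the distance from the spectral acquisition probe to the apple surface varies randomly and uncontrollably, which reduces detection accuracy. In addition, when a characteristic-wavelength selection algorithm is used to optimize the prediction model, the information related to component content that remains in the discarded spectral data is ignored, so spectral information is lost. To address these problems, a distance correction (DC) method was proposed by investigating how detection distance affects diffuse reflectance spectra, and a data fusion method was used to combine the useful information in the characteristic and non-characteristic wavelength data, with the aim of improving the predictive performance of the apple SSC model. To verify the proposed methods, apple spectra were preprocessed with multiplicative scatter correction (MSC), standard normal variate transform (SNV), and DC, and partial least squares regression (PLSR) models of apple SSC were then built. The results showed that DC improved the predictive performance of the PLSR model more effectively than MSC or SNV. To reduce the amount of model data and remove collinear and uninformative variables from the spectra, characteristic wavelengths were selected from the DC-preprocessed spectra with competitive adaptive reweighted sampling (CARS), bootstrapping soft shrinkage (BOSS), and the interval variable iterative space shrinkage approach (iVISSA). The modeling results showed that the DC-CARS-PLSR model gave good predictions while greatly reducing the amount of spectral data. To make full use of the SSC-related information in both the characteristic and non-characteristic wavelength data, the latent variable scores of the characteristic-wavelength and non-characteristic-wavelength PLSR models were fused to build a fused PLSR prediction model. The results showed that the proposed data fusion method further improved predictive performance. The fusion model built from the characteristic and non-characteristic wavelength data selected by CARS performed best, with a correlation coefficient of calibration (Rc), root mean square error of calibration (RMSEC), correlation coefficient of prediction (Rp), root mean square error of prediction (RMSEP), and relative percentage difference (RPD) of 0.981, 0.297%, 0.957, 0.469%, and 3.424, respectively.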
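The two baseline preprocessing methods named above, MSC and SNV, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the DC method itself is not shown because the abstract does not give its formula. The toy spectra, and the assumption that a change in probe distance acts as a simple gain plus offset on the spectrum, are hypothetical and used only for illustration.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    ref = [mean(col) for col in zip(*spectra)]          # reference = mean spectrum
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        # Least-squares fit x ≈ intercept + slope * ref, then invert it.
        slope = sum((r - ref_mean) * (v - x_mean) for r, v in zip(ref, x)) / sxx
        intercept = x_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in x])
    return corrected

# Hypothetical toy data: one base spectrum seen under three (gain, offset) settings.
base = [0.20, 0.50, 0.90, 0.40, 0.30]
settings = [(1.0, 0.00), (0.7, 0.05), (1.3, -0.02)]
spectra = [[g * v + o for v in base] for g, o in settings]

snv_out = [snv(x) for x in spectra]
msc_out = msc(spectra)
```

Under this idealized gain-plus-offset assumption both methods map the three spectra onto the same curve. Real distance effects may not be this simple, which is presumably why a dedicated correction is studied here.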
Optimization of Apple Soluble Solids Content Prediction Models Based on Distance Correction and Data Fusion
When visible/near-infrared diffuse reflectance spectroscopy is used to detect the soluble solids content (SSC) of apples, the distance from the spectral acquisition probe to the sample surface varies randomly and uncontrollably, which reduces detection accuracy. Moreover, when prediction models are built from characteristic wavelengths only, the contribution of non-characteristic wavelengths to the prediction is often neglected, so spectral information is lost. Therefore, a distance correction (DC) method was proposed by investigating how detection distance affects diffuse reflectance spectra, and prediction models for apple SSC were established by fusing characteristic and non-characteristic wavelength data. The results showed that DC improved the prediction performance of the partial least squares regression (PLSR) model more effectively; that characteristic wavelength selection with the competitive adaptive reweighted sampling (CARS) algorithm, applied after DC preprocessing, simplified the model and improved its prediction performance; and that the fusion model built from the characteristic and non-characteristic wavelength data of the CARS algorithm had the best prediction performance, with a correlation coefficient of calibration (Rc), root mean square error of calibration (RMSEC), correlation coefficient of prediction (Rp), root mean square error of prediction (RMSEP), and relative percentage difference (RPD) of 0.981, 0.297%, 0.957, 0.469%, and 3.424, respectively.
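The fusion of characteristic and non-characteristic wavelength data can be sketched as follows, under several assumptions. The data are synthetic, the index list `char_idx` is a hypothetical stand-in for a CARS selection, the component counts are arbitrary, and a textbook NIPALS PLS1 routine is used in place of whatever PLSR implementation the authors used. One PLS model is fitted per wavelength subset, their latent-variable scores are concatenated, and a final model is fitted on the fused scores.

```python
import random

def pls1(X, y, n_components):
    """PLS1 via NIPALS. Returns (scores, fitted_y) for the calibration data."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                  # centered y
    scores = [[] for _ in range(n)]
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]                                # unit weight vector
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * loading[j]                     # deflate X
            f[i] -= q * t[i]                                     # deflate y
            scores[i].append(t[i])
    fitted = [yi - fi for yi, fi in zip(y, f)]
    return scores, fitted

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

# Synthetic stand-in data (not apple spectra): 40 samples, 12 "wavelengths".
random.seed(0)
n_samples, n_wavelengths = 40, 12
X = [[random.gauss(0, 1) for _ in range(n_wavelengths)] for _ in range(n_samples)]
y = [12 + 2.0 * r[0] - 1.0 * r[1] + 0.5 * r[7] + random.gauss(0, 0.1) for r in X]

char_idx = [0, 1, 2]  # hypothetical "characteristic" wavelengths
rest_idx = [j for j in range(n_wavelengths) if j not in char_idx]
X_char = [[r[j] for j in char_idx] for r in X]
X_rest = [[r[j] for j in rest_idx] for r in X]

T_char, fit_char = pls1(X_char, y, 2)              # characteristic-only model
T_rest, _ = pls1(X_rest, y, 2)                     # non-characteristic model
T_fused = [a + b for a, b in zip(T_char, T_rest)]  # concatenate scores per sample
_, fit_fused = pls1(T_fused, y, 4)                 # model on fused scores

rmse_char = rmse(y, fit_char)
rmse_fused = rmse(y, fit_fused)
```

Because all four fused components are kept, the final step here coincides with ordinary least squares on the fused scores, so its calibration error cannot exceed that of the characteristic-only model. That says nothing about prediction error. In practice the number of components would be chosen by cross-validation and the comparison made on an independent prediction set, as with the RMSEP reported above.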