Prediction of soluble solids content in kiwifruit based on hyperspectral imaging technology and machine learning
[Objective] Soluble solids content (SSC) is a key quality indicator of Miliang No.1 kiwifruit. It reflects the concentration of soluble sugars, which are important determinants of the sweetness and maturity of the fruit, so accurate and timely SSC assessment is crucial for both consumer satisfaction and market pricing. Traditional methods such as refractometry and liquid chromatography are accurate but time-consuming, costly and destructive, which makes them unsuitable for large-scale or real-time monitoring. To address these limitations, this study aims to develop a non-destructive SSC prediction model based on hyperspectral imaging that integrates multiple preprocessing methods, feature extraction algorithms and machine learning models. Rather than optimizing a single step such as preprocessing or feature extraction, which has been the focus of many previous studies, the study optimizes the entire prediction process to improve the robustness and generalization of SSC predictions.

[Methods] A total of 150 Miliang No.1 kiwifruit samples were randomly divided into a training set of 120 samples and a test set of 30 samples. Hyperspectral images were acquired with a Rikola portable hyperspectral imager covering the spectral range from 500 nm to 900 nm at a wavelength interval of 2 nm, yielding 194 spectral bands. Imaging was performed in a controlled dark-box laboratory environment to ensure data consistency and minimize external interference. After image acquisition, SSC was measured with an ATAGO PAL-BX/ACID 8 refractometer; three readings were taken for each sample, and their arithmetic mean was used as the measured SSC value. To improve the quality of the spectral data, four preprocessing methods were applied to enhance data consistency and eliminate noise: multiplicative scatter
correction (MSC), Savitzky-Golay smoothing (SG), SG combined with MSC (SG-MSC) and SG combined with standard normal variate (SG-SNV). The optimal preprocessing method was selected according to the performance of a partial least squares regression (PLSR) model, and MSC proved the most effective at reducing noise and correcting baseline drift. On this basis, competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA) and random frog (RF) were used to extract the spectral bands most relevant to SSC. The extracted bands were then used as inputs to four machine learning models: PLSR, support vector regression (SVR), random forest regression (RFR) and backpropagation neural network (BPNN). The models relating the spectral data to the measured SSC were evaluated and their predictive performances compared. Particle swarm optimization (PSO) was then applied to tune the parameters of the best-performing model, with the aim of further improving its prediction accuracy and generalization ability.

[Results] Among the four preprocessing methods, MSC was the most effective at eliminating noise and baseline drift, so that the corrected spectral curves overlapped closely. The MSC-CARS-PLSR, MSC-SPA-PLSR and MSC-RF-PLSR models outperformed the full-band PLSR model: their R² values increased by 0.01 to 0.092, and their RMSEC values decreased by 0.0383 to 0.1341. The three feature extraction methods were thus particularly successful in removing interfering variables and improving the predictive power of the models. Most of the feature bands they selected were concentrated in the 750 nm to 900 nm range, indicating that this is the most sensitive interval for predicting the SSC of
kiwifruit. After feature extraction, the four machine learning models were compared, and the MSC-CARS-SVR model showed the best predictive performance. After PSO parameter optimization, the MSC-CARS-PSO-SVR model achieved the best prediction results overall, with a calibration-set coefficient of determination Rc² = 0.949, a prediction-set Rp² = 0.913, a root mean square error of calibration RMSEC = 0.3412 and a root mean square error of prediction RMSEP = 0.3649. These results indicate that SVR, especially when optimized with PSO, handles complex, high-dimensional, small-sample data effectively, making it well suited to predicting SSC and other quality indicators in kiwifruit. The BPNN model performed worst: the CARS-BPNN model achieved a test-set R² of only 0.633, with RMSEP = 1.2308. This suggests that the dataset used in this experiment may not suit neural network models, because its complexity or size may be insufficient to prevent overfitting, which limits prediction performance and the accuracy of the results.

[Conclusion] The MSC-CARS-PSO-SVR model is highly effective at predicting the internal quality indicators of kiwifruit, particularly SSC, and provides a scientific basis for non-destructive quality inspection of agricultural products. By combining data preprocessing, feature extraction and machine learning with hyperspectral imaging, the study presents a rapid, non-destructive method for SSC detection in kiwifruit. The findings offer technical support for intelligent fruit quality monitoring, grading and sorting systems, and the approach could potentially be applied to a wide range of fruits and agricultural products in related industries.
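The four preprocessing variants compared in the study (MSC, SG, SG-MSC and SG-SNV) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: it assumes each spectrum is a list of reflectance values (one list per sample), uses the mean spectrum as the MSC reference, and fixes the Savitzky-Golay filter to a 5-point quadratic window with the end points left unsmoothed. The window size and polynomial order used in the study are not stated in the abstract, and all function names are hypothetical.

```python
import statistics


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on a reference spectrum m (the mean
    spectrum by default) as x ~ a + b*m and corrected to (x - a) / b.
    """
    n_bands = len(spectra[0])
    if reference is None:
        # Assumption: the mean spectrum of the data set is the reference
        reference = [statistics.fmean(s[j] for s in spectra) for j in range(n_bands)]
    m_mean = statistics.fmean(reference)
    m_ss = sum((v - m_mean) ** 2 for v in reference)
    corrected = []
    for x in spectra:
        x_mean = statistics.fmean(x)
        # Least-squares slope b and intercept a of x against the reference
        b = sum((mv - m_mean) * (xv - x_mean) for mv, xv in zip(reference, x)) / m_ss
        a = x_mean - b * m_mean
        corrected.append([(xv - a) / b for xv in x])
    return corrected


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its standard deviation."""
    mu = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


# Classical 5-point Savitzky-Golay smoothing weights (quadratic fit), normalised by 35
SG5_WEIGHTS = (-3, 12, 17, 12, -3)


def savgol5(spectrum):
    """5-point quadratic Savitzky-Golay smoothing.

    Simplification: the two points at each end are copied unchanged.
    """
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / 35
    return out


def sg_msc(spectra):
    """SG smoothing of every spectrum, followed by MSC on the smoothed set."""
    return msc([savgol5(s) for s in spectra])


def sg_snv(spectra):
    """SG smoothing followed by SNV, applied spectrum by spectrum."""
    return [snv(savgol5(s)) for s in spectra]
```

Because MSC removes an additive offset and a multiplicative scale per sample, spectra that differ only by such scatter effects collapse onto the reference curve, which is consistent with the closely overlapping curves reported after MSC.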
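Of the three feature extraction methods, the successive projections algorithm (SPA) is the simplest to illustrate: starting from one band, it repeatedly projects the remaining band vectors onto the subspace orthogonal to the most recently selected band and picks the band with the largest residual norm, so the chosen bands carry minimally collinear information. The sketch below is a simplified, hypothetical version that assumes a fixed starting band and a fixed number of bands; the complete algorithm usually also searches over starting bands and band counts using a validation error, which is omitted here. CARS and random frog are not shown.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spa_select(X, n_select, start):
    """Simplified successive projections algorithm.

    X is a list of samples (rows), each a list of band values (columns).
    Returns the indices of the selected bands, beginning with `start`.
    Assumes n_select <= number of bands.
    """
    n_bands = len(X[0])
    # One column vector per band (values across all samples)
    cols = [[row[j] for row in X] for j in range(n_bands)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = dot(ref, ref)
        if ref_norm2 == 0:
            break  # the last selected band adds no new direction
        best, best_norm = None, -1.0
        for j in range(n_bands):
            if j in selected:
                continue
            # Remove the component of band j along the last selected band
            coef = dot(cols[j], ref) / ref_norm2
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm2 = dot(cols[j], cols[j])
            if norm2 > best_norm:
                best, best_norm = j, norm2
        selected.append(best)
    return selected
```

Because every projected column stays orthogonal to all previously selected bands, this is effectively Gram-Schmidt orthogonalisation used as a greedy band selector: a band that is a scaled copy of an already selected one is projected to zero and never chosen.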
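The evaluation metrics in the abstract are reported separately for the calibration (training) set (Rc², RMSEC) and the prediction (test) set (Rp², RMSEP). The abstract does not give its formulas, so the sketch below assumes the conventional definitions R² = 1 - SS_res / SS_tot and RMSE = sqrt(mean squared residual), with no degrees-of-freedom correction; the function names are hypothetical.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error: RMSEC on the calibration set, RMSEP on the prediction set."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

For example, with measured values [1, 2, 3] and predictions [1, 2, 4], SS_res = 1 and SS_tot = 2, so R² = 0.5 and RMSE = sqrt(1/3), about 0.577. Applying the same two functions to the training and test predictions gives the calibration and prediction statistics respectively.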
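PSO was used to fine-tune the parameters of the best model (SVR). The sketch below is a basic global-best PSO that minimizes an arbitrary objective over box-bounded parameters. In the study's setting the objective would presumably be a validation error such as cross-validated RMSE of an SVR as a function of its penalty factor C and kernel parameter gamma, but which parameters were tuned, their search ranges and the swarm settings are assumptions here. The inertia weight 0.7298 and acceleration coefficients 1.49618 are common textbook defaults, not values from the study, and `pso_minimize` is a hypothetical name.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimization over a box.

    objective: maps a list of parameter values to a score to minimize.
    bounds: list of (low, high) pairs, one per parameter.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

As a usage example, a quadratic bowl with its minimum at (1.5, -0.5) can stand in for an error surface over, say, (log10 C, log10 gamma): `pso_minimize(lambda p: (p[0] - 1.5) ** 2 + (p[1] + 0.5) ** 2, [(-5, 5), (-5, 5)])`. For a real SVR, the objective would fit the model on the training spectra and return its validation RMSE; searching C and gamma on a log scale is a common choice because their useful values span several orders of magnitude.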