Identification of Types of Tobacco Leaf Diseases Using Near-Infrared Spectroscopy and Random Forest Algorithm
In this study, spectral data from samples of tobacco leaves with various diseases were collected using a handheld near-infrared spectrometer. The original spectra were preprocessed with Savitzky-Golay (SG) smoothing and a first-derivative transform. Classification models were then trained using the random forest (RF) algorithm and evaluated on test samples. For comparison, traditional classification algorithms, such as the support vector machine (SVM), back propagation (BP) neural network, and partial least squares discriminant analysis (PLS-DA), were also applied and their performance was evaluated. The experimental results show that the classification accuracy, sensitivity, and specificity of the RF algorithm are higher than those of the SVM, BP neural network, and PLS-DA algorithms. The F1-score and area under the curve (AUC) values obtained with the RF algorithm also exceed those of the other algorithms. These results indicate that the RF model has the highest prediction accuracy and the best overall performance among the models tested. A rapid detection method based on a handheld near-infrared spectrometer and the RF algorithm is thus shown to identify tobacco leaf diseases efficiently, non-destructively, and accurately. This method provides a new technical reference for the detection and identification of tobacco leaf disease types.
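As an illustration of how the reported evaluation metrics relate to one another, the following is a minimal sketch of computing accuracy, sensitivity, specificity, and F1-score for a multi-class problem. It assumes a one-vs-rest treatment of each disease class, which the abstract does not specify. The function name `one_vs_rest_metrics`, the class labels, and the label lists are hypothetical placeholders, not data or code from the study. AUC is omitted because it requires class scores rather than hard predictions.

```python
def one_vs_rest_metrics(y_true, y_pred):
    """Return overall accuracy and per-class sensitivity, specificity, and F1.

    Each class is scored one-vs-rest: samples of that class are positives,
    all other samples are negatives (an assumption of this sketch).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / n

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        per_class[label] = {
            # Sensitivity (recall): share of true positives that were found.
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            # Specificity: share of true negatives that were correctly rejected.
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
            # F1: harmonic mean of precision and recall, written in count form.
            "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        }
    return accuracy, per_class


# Hypothetical labels for ten test samples across three placeholder classes.
y_true = ["class_a"] * 3 + ["class_b"] * 3 + ["class_c"] * 4
y_pred = ["class_a", "class_a", "class_b",
          "class_b", "class_b", "class_b",
          "class_c", "class_c", "class_a", "class_c"]

accuracy, per_class = one_vs_rest_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for label, scores in per_class.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Per-class values like these could be averaged across classes (macro-averaging) to give a single figure per algorithm for comparison, although the averaging scheme used in the study is not stated here.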