OBJECTIVE To establish an origin classification model for Angelica dahurica with imbalanced sample sizes, based on near-infrared spectroscopy combined with a data-enhanced convolutional neural network (CNN) algorithm. METHODS In this study, 95 samples of Angelica dahurica were collected, and near-infrared spectra of the samples were acquired over the wavenumber range of 12 500 to 4 000 cm⁻¹. The resulting near-infrared dataset is small, and the samples are unevenly distributed across origins. To improve the generalizability of the model, three data augmentation algorithms were proposed: spectral shifting, spectral noise addition, and spectral combination. In addition, to address the sample imbalance, Focal Loss was used as the loss function for training the CNN model. RESULTS The three data augmentation algorithms were applied to the SVM model. Of these, adding Gaussian noise at a signal-to-noise ratio of 20 to the spectral data performed best, raising the model's accuracy to 84.2%. With Focal Loss as the loss function to counter the sample imbalance, the CNN model reached an accuracy of 94.7%. CONCLUSION Near-infrared spectroscopy combined with the data-enhanced CNN algorithm provides a rapid, non-destructive detection method and a reliable data analysis method for the origin traceability of Radix Angelicae Dahuricae, and offers a new methodological reference for the origin traceability of Chinese medicinal materials.
near infrared spectroscopy; Angelica dahurica; origin traceability; data enhancement; convolutional neural network
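The three augmentation operations and the Focal Loss named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: the signal-to-noise ratio of 20 is read as decibels, spectral shifting is taken to be a shift by a whole number of points along the wavenumber axis, spectral combination is taken to be a weighted average of two spectra from the same origin, and `gamma=2.0` is a commonly used Focal Loss default rather than a value reported here. All function names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(spectrum, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target SNR (assumed to be in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.asarray(spectrum, dtype=float)
    signal_power = np.mean(spectrum ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=spectrum.shape)
    return spectrum + noise


def shift_spectrum(spectrum, shift):
    """Shift by `shift` points along the axis, padding with the edge value."""
    spectrum = np.asarray(spectrum, dtype=float)
    if shift == 0:
        return spectrum.copy()
    shifted = np.roll(spectrum, shift)
    if shift > 0:
        shifted[:shift] = spectrum[0]   # overwrite values wrapped from the end
    else:
        shifted[shift:] = spectrum[-1]  # overwrite values wrapped from the start
    return shifted


def combine_spectra(a, b, weight=0.5):
    """Weighted average of two spectra (assumed to share the same origin label)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return weight * a + (1.0 - weight) * b


def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    labels = np.asarray(labels, dtype=int)
    # Predicted probability of the true class for each sample.
    p_t = probs[np.arange(len(labels)), labels]
    # Optional per-class weights; uniform when alpha is not given.
    weights = np.ones_like(p_t) if alpha is None else np.asarray(alpha)[labels]
    return float(np.mean(-weights * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With `gamma=0` and no `alpha`, `focal_loss` reduces to ordinary cross-entropy. Larger `gamma` down-weights samples the model already classifies confidently, which is why this loss is a common choice when some classes (here, origins) have few samples.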