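To build a high-quality recognition model for the six major tea categories, 370 samples were collected in this study and their near-infrared spectra were acquired by near-infrared spectroscopy (NIRS). Rapid recognition models for the six tea categories were then established by combining spectral pre-processing, feature extraction, and data-mining classifier algorithms. The results showed that: 1) both the support vector machine (SVM) and random forest (RF) classifiers were suitable for building rapid recognition models for the six tea categories; 2) the SVM classifier was better suited to modeling with the original spectrum (OS), and pre-processing tended to weaken the discriminative performance of models built on this classifier; 3) the RF classifier was better suited to modeling with pre-processed spectra, and the resulting models showed markedly higher recognition accuracy (RA) and area under the receiver operating characteristic curve (AUC) than the OS models; 4) among the feature extraction algorithms, linear discriminant analysis (LDA) performed best, giving models with markedly higher RA than the OS models. The best model, OS-LDA-SVM, reached an RA of 100.00% and an AUC of 1.00, showing high recognition accuracy, strong generalization ability, and excellent performance, and is suitable for industrial application. In summary, building models from NIRS combined with pre-processing, feature extraction algorithms, and classifiers is a highly feasible approach to identifying the six tea categories. The models have high recognition accuracy and excellent performance, can provide scientific, accurate, and efficient technical support for rapid tea category identification in the tea trade, and lay a foundation for the industrial application of international tea category recognition models.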
Rapid Identification of Six Major Tea Categories Based on Near-Infrared Spectroscopy
To construct a high-quality recognition model for the six major tea categories, this study collected 370 samples and acquired their near-infrared spectra by near-infrared spectroscopy (NIRS). Rapid recognition models for the six tea categories were developed by combining these spectra with spectral pre-processing, feature extraction, and data-mining classifier algorithms. The results indicated that: 1) Support vector machine (SVM) and random forest (RF) classifiers were both suitable for constructing rapid identification models for the six tea categories. 2) The SVM classifier was more suitable for modeling with the original spectrum (OS), and pre-processing tended to weaken the discriminatory performance of models based on this classifier. 3) The RF classifier was more suitable for modeling with pre-processed spectra, and the resulting models showed markedly higher recognition accuracy (RA) and area under the receiver operating characteristic curve (AUC) than the OS models. 4) Among the feature extraction algorithms, linear discriminant analysis (LDA) performed best, yielding models with markedly higher RA than the OS models. The optimal model, OS-LDA-SVM, achieved an RA of 100.00% and an AUC of 1.00, demonstrating high recognition accuracy, strong generalization ability, excellent model performance, and potential for industrial application. In summary, combining NIRS with pre-processing, feature extraction algorithms, and classifiers is a highly feasible way to build models that identify the six tea categories. The models have high recognition accuracy and excellent performance, providing scientific, accurate, and efficient technical support for the rapid identification of tea categories in the tea trade, and could lay the foundation for the industrial application of international tea category identification models.
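The two evaluation metrics named above, RA and AUC, can be illustrated with a minimal sketch. The abstract does not state how AUC was computed for six classes, so the sketch assumes a one-vs-rest, macro-averaged AUC. The function names, the English category labels, and the toy scores are all hypothetical and are not taken from the study.

```python
# Minimal sketch of the two metrics, using only the standard library.
# Assumption: multi-class AUC is the macro average of one-vs-rest AUCs.
# All names and scores below are illustrative, not the study's data.

TEA_CLASSES = ["green", "white", "yellow", "oolong", "black", "dark"]


def recognition_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted category equals the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as one half (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, score_rows, classes):
    """Average the one-vs-rest AUC over all classes.

    score_rows[i][k] is the classifier score of sample i for classes[k].
    """
    aucs = []
    for k, cls in enumerate(classes):
        labels = [t == cls for t in y_true]
        scores = [row[k] for row in score_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy data: one sample per category, each scored highest for its own class.
y_true = list(TEA_CLASSES)
score_rows = [
    [0.9 if k == i else 0.02 for k in range(len(TEA_CLASSES))]
    for i in range(len(TEA_CLASSES))
]
# Predict the category with the highest score for each sample.
y_pred = [TEA_CLASSES[row.index(max(row))] for row in score_rows]

ra = recognition_accuracy(y_true, y_pred)
auc = macro_ovr_auc(y_true, score_rows, TEA_CLASSES)
print(f"RA = {ra:.2%}, macro one-vs-rest AUC = {auc:.2f}")
```

In this toy case every sample is ranked and classified correctly, so both metrics reach their maximum. With real predictions the two can diverge, because RA depends only on the top-scoring class while AUC depends on how the scores rank positives against negatives.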