Identification of Tortoiseshell Species Based on Random Forest and UHPLC-QTOF-MSE
Objective: To construct a data identification model by combining ultra-high performance liquid chromatography tandem quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MSE) analysis and digital quantization with the Random Forest (RF) algorithm, in order to digitally identify the species origin of shells from Chinese tortoises, Brazilian tortoises, Taiwanese tortoises, alligator tortoises, and soft-shelled turtles. Methods: After sample pretreatment, tortoiseshells from different sources and batches were analyzed by UHPLC-QTOF-MSE. Using a mixed sample as the reference, peaks were aligned, extracted, and quantified to obtain Exact Mass-Retention Time (EMRT) data pairs reflecting peptide ion information. Important peptide ions were then selected by feature screening based on the information gain ratio, and the selected features were modeled with the RF algorithm. The models were evaluated by accuracy (Acc), precision (P), area under the curve (AUC), and other parameters under internal cross-validation. Finally, the optimal model was used for identification validation of tortoiseshell species. Results: Feature screening based on the information gain ratio yielded 71 characteristic peptide features. The resulting RF model showed excellent identification performance, with accuracy, precision, and AUC all greater than 0.950, and the correct rate in external identification validation was 100.0%. Conclusion: UHPLC-QTOF-MSE analysis combined with the RF algorithm can efficiently and accurately achieve digital identification of the species origin of tortoiseshells from different sources, and can provide a reference for the quality control and species verification of tortoiseshell.
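The feature screening step named in the Methods can be illustrated with a minimal sketch of the information gain ratio (information gain divided by the split information of the feature). The abstract does not say how EMRT intensities were discretized or which software was used, so the median split in `binarize`, the feature names `emrt_a` and `emrt_b`, the species labels, and all intensity values below are hypothetical and chosen only to make the example self-contained.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_bins, labels):
    """Information gain ratio of one discretized feature w.r.t. class labels."""
    total = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_bins)
    return info_gain / split_info if split_info > 0 else 0.0


def binarize(intensities):
    """Assumed discretization: 1 if intensity >= upper median, else 0."""
    ordered = sorted(intensities)
    median = ordered[len(ordered) // 2]
    return [int(x >= median) for x in intensities]


def rank_features(table, labels):
    """Return (feature name, gain ratio) pairs sorted by decreasing gain ratio."""
    scores = {name: gain_ratio(binarize(col), labels)
              for name, col in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented): two EMRT features measured on four samples.
labels = ["species_1", "species_1", "species_2", "species_2"]
table = {
    "emrt_a": [10.0, 12.0, 90.0, 95.0],  # separates the two classes
    "emrt_b": [5.0, 50.0, 6.0, 51.0],    # unrelated to the class
}
ranking = rank_features(table, labels)
```

In this toy case `emrt_a` ranks first because its median split reproduces the class labels exactly, while `emrt_b` carries no class information. In a workflow like the one described, the top-ranked features (71 in the reported results) would then be passed to a Random Forest classifier and assessed by cross-validation.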