Comparison of Metabolomics Peak-Picking Parameter Optimization Algorithms Based on Chromatographic Peak Shape

Peak picking is one of the key steps in preprocessing non-targeted metabolomics data acquired by liquid chromatography-mass spectrometry. For high-resolution mass spectrometry, the most widely used peak-picking algorithm is centWave, which is based on the continuous wavelet transform. In this study, two centWave parameter optimization algorithms, IPO (Isotopologue Parameter Optimization) and centWave Sweep, were compared comprehensively on two datasets, metabolite standards and urine, using three evaluation indicators: the proportion of peaks with good shape, the proportion of reliable peaks, and the proportion of repeatable peaks. To distinguish good from bad peak shapes quickly and accurately, three ensemble learning algorithms, random forest, AdaBoost, and gradient boosting decision tree, were compared on this classification task. Based on accuracy and on the F1 score, which measures the precision of a binary classification model, random forest was selected to build the discrimination model (accuracy 93.5%, F1 score 0.938). Compared with the parameters recommended by XCMS Online, the parameters optimized by IPO and by centWave Sweep increased the proportions of reliable peaks and of repeatable peaks in both datasets. However, the proportion of peaks with good shape showed no clear difference between the optimized and the recommended parameters in either dataset, and it was relatively low under all three parameter settings, indicating that current parameter optimization algorithms do not improve the proportion of peaks with good shape. An excessive number of peaks with poor shape may reduce the statistical power of downstream analyses, or produce false positive results because candidate feature peaks cannot be integrated accurately. Potential biomarkers identified in metabolomics studies therefore require further confirmation.
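The abstract selects the peak-shape classifier by accuracy and F1 score. As a reminder of how these two metrics are obtained for a binary good/bad classifier, here is a minimal sketch in plain Python. The function name `accuracy_and_f1` and the example labels are hypothetical and purely illustrative; they are not the study's data or code.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` is the label treated as the positive class
    (assumed here to mean "good peak shape").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels: 1 = good peak shape, 0 = bad peak shape.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when good and bad peaks are unevenly represented, which is one reason to report both.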

Keywords: Metabolomics; Peak picking; centWave; Ensemble learning

Sheng Yanghao (盛阳昊), Wang Jue (王珏), Jiang Yueping (蒋跃平)

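Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008

Hunan Key Laboratory for Bioanalysis of Complex Matrix Samples, Changsha 410000

National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008

School of Pharmacy, Changsha Medical University, Changsha 410200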

Funding: Hunan Key Laboratory for Bioanalysis of Complex Matrix Samples Fund (2017TP1037)

2024

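Chinese Journal of Analytical Chemistry (分析化学)
Chinese Chemical Society; Changchun Institute of Applied Chemistry, Chinese Academy of Sciences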

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.423
ISSN:0253-3820
Year, volume (issue): 2024, 52(1)