Analysis Strategy for Metabolic Biomarkers of Deep Vein Thrombosis Based on Machine Learning Algorithms

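Deep vein thrombosis (DVT) is a common peripheral vascular disease with an insidious onset and a complex etiology. Because accurate and effective early diagnostic methods are lacking, DVT is easily missed or misdiagnosed, and screening for reliable biomarkers is a key problem that urgently needs to be solved. This study used metabolomics based on gas chromatography-mass spectrometry (GC-MS) to investigate changes in endogenous metabolites in the urine of DVT rats, combined multivariate statistical analysis with multiple feature selection algorithms to screen characteristic metabolites, and built machine learning diagnostic models for DVT. A rat model of inferior vena cava ligation was established, urine was collected in metabolic cages during the thrombus formation period (48-72 h), and GC-MS was used to acquire the urinary metabolic profiles of the DVT and control groups. By matching against the FiehnLib GC-MS database, 176 endogenous metabolites were identified in rat urine. Orthogonal partial least squares discriminant analysis (OPLS-DA) combined with the Mann-Whitney U test yielded 26 differential metabolites for DVT, and a combination of multiple feature selection algorithms narrowed these further to 13 key metabolites closely associated with the occurrence of DVT. Gaussian naive Bayes (GNB), support vector machine (SVM), logistic regression (LR), and linear discriminant analysis (LDA) machine learning models were built and used for DVT diagnosis. Evaluation and comparison of the models showed that the diagnostic models built with the 13 key metabolites had high accuracy and good stability, and that their predictive performance was better than that of the models built with all 176 metabolites or with the 26 differential metabolites. The results indicate that combining multiple feature selection algorithms to analyze metabolite information in the urine of DVT rats can effectively screen out reliable potential markers of DVT, and that the resulting machine learning models may provide a new technical means for the automated diagnosis of DVT.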
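The univariate part of the screening step can be illustrated with a minimal sketch. It assumes a two-sided Mann-Whitney U test with a normal approximation (tie and continuity corrections) and a 0.05 significance threshold. The function names, the threshold, and the intensity values are hypothetical and are not taken from the study, and the OPLS-DA criterion that the study combined with this test is not reproduced here.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic of sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    # Continuity correction of 0.5, clipped so z never goes negative.
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(z))
    return u_x, min(p, 1.0)


def screen_metabolites(control, dvt, alpha=0.05):
    """Return names of metabolites whose p-value is below alpha."""
    return [name for name in control
            if mann_whitney_u(control[name], dvt[name])[1] < alpha]


# Hypothetical peak intensities for two made-up metabolites.
control = {"metabolite_A": [1.0, 1.2, 0.9, 1.1, 1.3, 1.0],
           "metabolite_B": [2.0, 2.1, 1.9, 2.2, 2.0, 2.1]}
dvt = {"metabolite_A": [2.0, 2.4, 1.9, 2.2, 2.5, 2.1],
       "metabolite_B": [2.1, 2.0, 2.0, 2.2, 1.9, 2.1]}
selected = screen_metabolites(control, dvt)
```

For small groups an exact test is usually preferable to the normal approximation; `scipy.stats.mannwhitneyu` provides one. Screening many metabolites at once also normally calls for a multiple-testing correction, which this sketch omits.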
Analysis Strategy of Deep Vein Thrombosis Metabolomic Biomarkers Based on Machine Learning Algorithms
Deep vein thrombosis (DVT) is a common peripheral vascular disease in clinical practice. Because precise and efficient early diagnostic techniques are lacking, DVT is easily overlooked or misdiagnosed, so identifying trustworthy biomarkers is a major issue that needs to be resolved. In this study, endogenous metabolites in the urine of DVT rats were profiled by metabolomics based on gas chromatography-mass spectrometry (GC-MS), and characteristic metabolites were identified by multivariate statistical analysis and multiple feature selection algorithms, in order to develop a machine learning-based diagnostic model for DVT. A rat model of inferior vena cava ligation was established, and urine samples were collected in metabolic cages during the thrombus development phase (between 48 and 72 h). The metabolic profiles of the DVT and control groups were obtained using GC-MS. A total of 176 endogenous metabolites were identified in rat urine through comparison with the FiehnLib database, 26 differential metabolites associated with DVT were screened through a combination of the Mann-Whitney U test and orthogonal partial least squares discriminant analysis (OPLS-DA), and 13 key metabolites strongly correlated with DVT were further selected by combining several feature selection algorithms. For DVT diagnosis, four machine learning models were developed: Gaussian naive Bayes (GNB), support vector machine (SVM), logistic regression (LR), and linear discriminant analysis (LDA). A comparison of model performance showed that the diagnostic models built with the 13 key metabolites were accurate and stable, and that they outperformed the models built with the 176 metabolites and with the 26 differential metabolites. The study showed that integrating multiple feature selection algorithms to analyze metabolite information in DVT rat urine could effectively identify reliable potential markers of DVT. Furthermore, the developed machine learning models offer a novel technical approach for the automated diagnosis of DVT.
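The model comparison described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (`make_classification` stands in for a metabolite intensity matrix), the sample size of 40, the linear SVM kernel, the standardization step, and 5-fold cross-validation are all choices made for the example, and the "selected" subset is simply the columns known to be informative in the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 40 samples x 176 features, of which the first 13
# columns are informative (shuffle=False keeps them at the front).
X, y = make_classification(n_samples=40, n_features=176, n_informative=13,
                           n_redundant=0, shuffle=False, random_state=0)

# The four classifier families named in the abstract; hyperparameters
# here are assumptions for the sketch.
models = {
    "GNB": GaussianNB(),
    "SVM": SVC(kernel="linear"),
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def evaluate(features, labels):
    """Mean cross-validated accuracy for each model on the given features."""
    return {
        name: cross_val_score(make_pipeline(StandardScaler(), model),
                              features, labels, cv=cv,
                              scoring="accuracy").mean()
        for name, model in models.items()
    }


scores_all = evaluate(X, y)             # all 176 features
scores_subset = evaluate(X[:, :13], y)  # the 13 informative columns

for name in models:
    print(f"{name}: all={scores_all[name]:.2f} subset={scores_subset[name]:.2f}")
```

With real data, the feature selection step would need to be run inside each cross-validation fold; selecting features on the full dataset first leaks label information into the accuracy estimate.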

Deep vein thrombosis; Machine learning; Metabolomics; Gas chromatography-mass spectrometry; Feature selection

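Liu Mingfeng (刘明锋), Wu Yanjuan (吴妍娟), Zhou Shidong (周世栋), Dang Lihong (党丽虹), Li Jian (李健), Du Yan (杜艳), Sun Junhong (孙俊红), Cao Jie (曹洁)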

School of Forensic Medicine, Shanxi Medical University, Jinzhong 030600

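Shanghai Key Laboratory of Forensic Medicine, Key Laboratory of Forensic Science, Ministry of Justice (Academy of Forensic Science), Shanghai 200063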

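Open Fund of the Shanghai Key Laboratory of Forensic Medicine and Key Laboratory of Forensic Science, Ministry of Justice (Academy of Forensic Science) (KF202002); Shanxi Province Science and Technology Innovation Talent Team Special Project (202204051001025); Natural Science Foundation of Shanxi Province (20210302123302)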

2024

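Chinese Journal of Analytical Chemistry (分析化学)
Chinese Chemical Society; Changchun Institute of Applied Chemistry, Chinese Academy of Sciences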

CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.423
ISSN:0253-3820
Year, volume (issue): 2024, 52(7)