
Gut Microbiota Signature for Disease Type Prediction Built on the Abundance of Ruminococcus Microbiota

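To explore the value of gut microbiota in predicting disease type, a non-invasive disease assessment model was built with machine learning from Ruminococcus abundance. Data were taken from the ExperimentHub R library repository: Ruminococcus abundance in human feces from different studies was downloaded together with the experimental protocol, disease status, age, sex, antibiotic use, region, smoking status and other information. Random forest, decision tree and AdaBoost models were used to build assessment models for disease screening; parameters were tuned with GridSearchCV (grid search), and the external validation results were evaluated with a confusion matrix. During data processing, 12 Ruminococcus species and 7 diseases were extracted and given standardized names, and 25 variables were converted to dummy variables. Three assessment models were built from the abundance of multiple Ruminococcus taxa and general sample information such as sex and age. The random forest model had the highest accuracy (0.884), and with n_estimators set to 220 its score was 0.892, making it the best model. External validation also showed relatively few misclassifications, indicating good model performance. Based on metagenomic data from fecal samples, the random forest algorithm can effectively predict disease type from Ruminococcus abundance.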
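Two of the steps named in the abstract, dummy-variable conversion and confusion-matrix evaluation, can be illustrated with a minimal standard-library sketch. The column names, class labels and records below are hypothetical placeholders (the abstract does not list the 25 variables or name the 7 diseases), and the helper functions `dummy_encode`, `confusion_matrix` and `accuracy` are written for illustration only; they are not the authors' code.

```python
def dummy_encode(records, column):
    """Replace one categorical column with 0/1 indicator columns."""
    levels = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        # Copy every other field unchanged, then add one indicator per level.
        row = {k: v for k, v in r.items() if k != column}
        for level in levels:
            row[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(row)
    return encoded


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical sample metadata (not the study data).
samples = [
    {"sex": "female", "smoker": "no", "age": 34},
    {"sex": "male", "smoker": "yes", "age": 61},
    {"sex": "female", "smoker": "yes", "age": 47},
]
encoded = dummy_encode(dummy_encode(samples, "sex"), "smoker")

# Hypothetical class labels and predictions (not the study results).
labels = ["control", "disease_A", "disease_B"]
y_true = ["control", "disease_A", "disease_B", "disease_A", "control", "disease_B"]
y_pred = ["control", "disease_A", "disease_A", "disease_A", "control", "disease_B"]
cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
```

Off-diagonal cells of `cm` are the misclassifications; a model with "relatively few" errors in the sense of the abstract would concentrate its counts on the diagonal.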
Bacterial Signature for Prediction of Disease Type Based on Abundance of Ruminococcus
To explore the value of intestinal flora in predicting disease type, this study used machine learning to build a non-invasive disease evaluation model based on the abundance of Ruminococcus. Data from different studies were downloaded from an R library repository. The abundance of Ruminococcus, study condition, disease state, age, sex, antibiotic use, region, smoking status, and other information on human samples were selected, and evaluation models for disease screening were established with machine learning classifiers such as random forest, decision tree and AdaBoost. The parameters were tuned with GridSearchCV, and the external validation results were evaluated with a confusion matrix. Three evaluation models were established based on the abundance of Ruminococcus and general sample information such as sex and age. The random forest model had the highest accuracy (0.884). In addition, when n_estimators was 220, its score was 0.892, making it the best model. The external validation results also showed that the classifier made relatively few prediction errors and that the model performed well. Based on the metagenomic data of fecal samples, the random forest algorithm can effectively predict disease type from the abundance of Ruminococcus.
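The workflow described in the abstract (three classifiers, a GridSearchCV search over n_estimators, and a confusion matrix on held-back data) might look roughly like the following scikit-learn sketch. It makes several assumptions not found in the source: the data are synthetic (generated with `make_classification`, not the study's abundance table), the number of classes and features is arbitrary, the grid values other than 220 are illustrative, and a simple train/test split stands in for the external validation set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the abundance + covariate table (NOT the study data).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the three model families named in the abstract with default settings.
models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Grid search over the number of trees; 220 is the value reported in the
# abstract, the other candidates are assumptions for illustration.
param_grid = {"n_estimators": [50, 100, 220]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# Evaluate the tuned forest on the held-back split with a confusion matrix.
best_model = search.best_estimator_
cm = confusion_matrix(y_test, best_model.predict(X_test))
print(scores, search.best_params_)
print(cm)
```

On synthetic data the scores and the selected n_estimators carry no meaning for the study; the sketch only shows how the pieces fit together.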

modeling prediction; intestinal flora; Ruminococcus; machine learning

徐婷、沈佳豪、赵康、黄鹭、董恩惠、曾可心、卞新为、季明辉、许勤


School of Nursing, Nanjing Medical University, Nanjing 211166

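School of Chinese Medicine and School of Integrated Chinese and Western Medicine, Nanjing University of Chinese Medicine, Nanjing 210023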

The First Clinical Medical College, Nanjing Medical University, Nanjing 211166

modeling prediction; intestinal flora; Ruminococcus; machine learning

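National Natural Science Foundation of China; Priority Academic Program Development of Jiangsu Higher Education Institutions; Jiangsu Provincial Key Discipline Construction Project (13th Five-Year Plan)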

82073407; Su Zheng Ban Fa [2018] No. 87; Su Jiao Yan [2016] No. 9

2024

Current Biotechnology (生物技术进展)
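Tea Research Institute, Chinese Academy of Agricultural Sciences; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences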


CSTPCD
Impact factor: 0.554
ISSN:2095-2341
Year, volume (issue): 2024, 14(2)