Military Medical Sciences (军事医学), 2024, Vol. 48, Issue 3: 213-218. DOI: 10.7644/j.issn.1674-9960.2024.03.008

A convolutional neural network-based method for differentiating between Escherichia coli and Shigella genomes

Meng Renjie (孟人杰)¹, Luo Nan (罗楠)¹, Jin Yuan (靳远)², Yue Junjie (岳俊杰)², Wang Boqian (王博千)², Gao Yuanming (高沅铭)³

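Author information

  • 1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073; State Key Laboratory of Pathogen and Biosecurity, Institute of Biotechnology, Academy of Military Medical Sciences, Academy of Military Sciences, Beijing 100071
  • 2. State Key Laboratory of Pathogen and Biosecurity, Institute of Biotechnology, Academy of Military Medical Sciences, Academy of Military Sciences, Beijing 100071
  • 3. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073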

Abstract

Objective: To differentiate Escherichia coli from Shigella spp., whose genomes are highly similar, using deep learning, and thereby to provide a reference for clinical diagnosis and for epidemic prevention and control. Methods: A convolutional neural network (CNN) built by transfer learning from a large-scale pre-trained protein language model is proposed for identifying bacterial types rapidly and accurately at the genus level. To validate the reliability of the model, whole-genome data for the relevant bacteria were downloaded from the National Center for Biotechnology Information (NCBI), and the whole-genome protein sequences of the highly similar E. coli and Shigella were selected as experimental samples. Results: In classification experiments on 2960 strains with high assembly quality and on 4945 strains that include low-assembly-quality genomes, the method reached genus-level classification accuracies of 97.13% and 95.56%, respectively, outperforming existing methods. Conclusion: By combining self-supervised pre-training with transfer learning, this deep learning-based method can learn high-dimensional feature differences that humans cannot readily observe or tabulate, and it shows great potential. In addition, the method places low demands on the assembly completeness of the genome sequences used, so it is widely applicable and of greater practical value.
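The abstract does not specify the network architecture, so the following is only a minimal sketch of the general idea in the Methods sentence: a genome is represented as a sequence of per-protein embedding vectors (assumed here to come from a frozen pre-trained protein language model, which is not reproduced), and a small CNN turns that sequence into a genus-level probability. All function names, layer sizes, the random stand-in embeddings, and the untrained random weights are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution along the protein axis, followed by ReLU.

    x:       list of L embedding vectors, each of length D (requires L >= K)
    kernels: list of F filters, each a list of K weight vectors of length D
    bias:    list of F floats
    returns: list of (L - K + 1) feature vectors, each of length F
    """
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        feats = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for vec, w in zip(window, kern):
                s += sum(v * wi for v, wi in zip(vec, w))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out

def global_max_pool(feature_maps):
    """Collapse the position axis by keeping each filter's maximum."""
    return [max(column) for column in zip(*feature_maps)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_params(dim, n_filters, kernel_size, seed=0):
    """Random, untrained weights (hypothetical sizes, illustration only)."""
    rng = random.Random(seed)
    return {
        "kernels": [[[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                     for _ in range(kernel_size)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "out_weights": [rng.uniform(-0.5, 0.5) for _ in range(n_filters)],
        "out_bias": 0.0,
    }

def predict_shigella_probability(embeddings, params):
    """Forward pass: conv -> ReLU -> global max pool -> linear -> sigmoid."""
    maps = conv1d_relu(embeddings, params["kernels"], params["conv_bias"])
    pooled = global_max_pool(maps)
    logit = params["out_bias"] + sum(
        p * w for p, w in zip(pooled, params["out_weights"]))
    return sigmoid(logit)

if __name__ == "__main__":
    # Stand-in for language-model embeddings: 6 proteins, 4 dimensions each.
    rng = random.Random(1)
    genome = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
    params = init_params(dim=4, n_filters=3, kernel_size=2)
    print(predict_shigella_probability(genome, params))
```

In a transfer-learning setup of this kind, the language model would typically stay frozen and only the CNN weights would be fitted to labeled genomes; the global max pooling step is what lets genomes with different numbers of proteins map to a fixed-size feature vector.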

Key words

Escherichia coli / Shigella / bacterial identification / whole genome protein / convolutional neural network


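Funding

National Natural Science Foundation of China (82003519)

National Natural Science Foundation of China (32070025)

National Natural Science Foundation of China (62102439)

Research Project of the State Key Laboratory of Pathogen and Biosecurity (SKLPBS1807)

Research Project of the State Key Laboratory of Pathogen and Biosecurity (SKLPBS2214)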

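Publication year

2024
Military Medical Sciences (军事医学)
Academy of Military Medical Sciences (军事医学科学院)

CSTPCD
Impact factor: 0.586
ISSN: 1674-9960
Number of references: 22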