Virsearcher: Identifying Bacteriophages from Metagenomes by Combining Convolutional Neural Network and Gene Information

Metagenome sequencing provides an unprecedented opportunity to discover unknown microbes and viruses. In metagenomes, large numbers of phages and prokaryotes are mixed together. To study the influence of phages on the human body and the environment, it is of great significance to isolate phages from metagenomes. However, novel phages are difficult to identify because their sequences are diverse and metagenomes frequently contain short contigs. Here, virSearcher is developed to identify phages from metagenomes by combining a convolutional neural network (CNN) with the gene information of input sequences. First, an input sequence is encoded according to the different functions of its coding and non-coding regions, and is then converted into a word embedding code by a word embedding layer placed before a convolutional layer. Meanwhile, the hit ratio of virus genes is combined with the output of the CNN to further improve the performance of the network. The genes used by virSearcher include both complete and incomplete genes. Experiments on several metagenomes have shown that, compared with other methods, virSearcher significantly improves the identification of short sequences while maintaining performance on long ones. The source code of virSearcher is freely available from http://github.com/DrJackson18/virSearcher.
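The abstract does not give the exact encoding or fusion rule, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that coding regions are tokenised as codons and non-coding regions as single bases, and that the gene hit ratio is fused with the CNN score by a weighted average. The vocabulary layout and the names `encode`, `hit_ratio`, and `combine` are hypothetical and are not taken from the virSearcher source code.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# Hypothetical vocabulary: 0 = unknown, 1-4 = non-coding bases, 5-68 = codons.
BASE_TOKENS = {b: i + 1 for i, b in enumerate(NUCLEOTIDES)}
CODON_TOKENS = {
    "".join(c): i + 5 for i, c in enumerate(product(NUCLEOTIDES, repeat=3))
}


def encode(seq, coding_regions):
    """Tokenise a sequence: codons inside coding regions, single bases elsewhere.

    coding_regions is a sorted list of non-overlapping half-open (start, end)
    intervals. Leftover bases at the end of a region whose length is not a
    multiple of three are dropped (an assumption made for this sketch).
    """
    seq = seq.upper()
    tokens, pos = [], 0
    for start, end in coding_regions:
        # Non-coding stretch before this region: one token per base.
        tokens += [BASE_TOKENS.get(b, 0) for b in seq[pos:start]]
        # Coding stretch: one token per complete codon.
        for i in range(start, end - 2, 3):
            tokens.append(CODON_TOKENS.get(seq[i:i + 3], 0))
        pos = end
    # Non-coding tail after the last coding region.
    tokens += [BASE_TOKENS.get(b, 0) for b in seq[pos:]]
    return tokens


def hit_ratio(n_genes, n_virus_hits):
    """Fraction of predicted genes that match a virus gene (0.0 if no genes)."""
    return n_virus_hits / n_genes if n_genes else 0.0


def combine(cnn_score, ratio, weight=0.5):
    """Assumed fusion rule: weighted average of CNN score and gene hit ratio."""
    return (1 - weight) * cnn_score + weight * ratio


if __name__ == "__main__":
    tokens = encode("ATGAAACC", [(0, 6)])
    print(tokens, combine(0.6, hit_ratio(4, 3)))
```

In a real pipeline the integer tokens would be fed to a word embedding layer followed by convolutional layers, and the hit ratio would more likely enter the network as an extra feature than through a fixed average; the average is used here only to keep the sketch free of deep learning dependencies.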

Genomics; Convolutional neural networks; Codes; Viruses (medical); Encoding; Training; Feature extraction

Qiaoliang Liu, Fu Liu, Yan Miao, Jiaxue He, Tian Dong, Tao Hou, Yun Liu

College of Communication Engineering, Jilin University, Changchun, China

Genetic Diagnosis Center, First Hospital of Jilin University, Changchun, China

2023

IEEE/ACM Transactions on Computational Biology and Bioinformatics

EI
ISSN:1545-5963
Year, volume (issue): 2023, 20(1)