Objective To characterize the distribution of prophages within the group A Streptococcus (GAS) population by curating published GAS whole-genome data, extracting prophage information through bioinformatic analysis, and examining the state of prophages in the genome and the gene content of selected prophages.
Methods This was a retrospective study. GAS genome assemblies released in GenBank up to May 2020 were downloaded, and key background information on the strains was curated to build a local genome database. Bioinformatics software was used to construct a whole-genome phylogenetic tree of GAS, to perform core-genome analysis, and to predict potential prophages and their completeness in each genome, yielding the prophage distribution characteristics. The number of genotypes, the number of core genes, and the number, length and carriage rate of prophages in the database were tallied.
Results A database of 2 529 GAS genome sequences was built, covering 140 serotypes (emm genotypes). The strains were isolated mainly in 19 countries and regions across East Asia, Europe, the Americas and Oceania. By disease background, the isolates fell into three main categories: invasive infection, non-invasive infection and immune sequelae. A total of 1 005 core genes were identified, each present in more than 95% of strains. Of the 1 798 sequences analyzed, 1 366 carried one or more complete prophages, a carriage rate of 76.0%. Each strain carried 0 to 6 complete prophages, ranging from 32.8 to 62.6 kb in length and mostly falling within 30-40 kb. The prophages found in the dominant clones among recent Chinese strains were mainly phiHKUssa, phiHKUvir and phiHKU488, which mainly carried three virulence genes: speC, spd1 and ssa.
Conclusions Prophages are widely distributed in GAS genomes and may play an important role in the evolution and expansion of dominant clones, thereby reshaping the population structure within particular emm genotypes. The GAS genome database provides important data support for GAS pathogen surveillance.
Construction of a genomic database for Group A Streptococcus and analysis of prophage distribution
Objective To characterize the distribution of prophages in group A Streptococcus (GAS) by mining existing GAS whole-genome sequence data, performing bioinformatic analyses, extracting prophage information, and analyzing the state of prophages in the genome and the genetic composition of selected prophages.
Methods This was a retrospective study. Genome assemblies of GAS released in GenBank up to May 2020 were collected, and important background information on the strains was curated to create a local genome database. A whole-genome phylogenetic tree of GAS was constructed using bioinformatics software. The core genome was analyzed, and potential prophages and their completeness in each genome were predicted to characterize prophage distribution. The number of genotypes, the number of core genes, and the number, length and carriage rate of prophages in the database were summarized.
Results A database containing the genome sequences of 2 529 GAS strains was established, covering 140 emm genotypes. The strains were isolated in 19 countries across East Asia, Europe, the Americas and Oceania. By disease background, the strains were mainly divided into invasive infection, non-invasive infection and immune sequelae. Prophage analysis of 1 798 genomes detected at least one complete prophage in 1 366 (76.0%) genomes. Each strain carried 0 to 6 complete prophages, which ranged from 32.8 to 62.6 kb in length and were mostly 30-40 kb. phiHKUssa, phiHKUvir and phiHKU488 were the most common prophages in the dominant clones circulating in China in recent years, and they mainly carried the virulence genes speC, spd1 and ssa.
Conclusions Prophages are widely distributed in GAS genomes and may play an important role in the evolution and expansion of dominant clones, thereby reshaping the population structure within emm genotypes. The establishment of a local genome database provides important baseline data for molecular epidemiological surveillance.
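The two summary statistics in the Results (the prophage carriage rate and the core-gene set) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, since the abstract does not name the software used. The function names, the toy presence/absence table, and the strict "more than 95%" cutoff are assumptions made for illustration; only the counts 1 366 and 1 798 come from the abstract.

```python
from typing import Dict, List, Set


def carriage_rate(n_carriers: int, n_analyzed: int) -> float:
    """Percentage of analyzed genomes carrying at least one complete prophage."""
    if n_analyzed <= 0:
        raise ValueError("n_analyzed must be positive")
    return 100.0 * n_carriers / n_analyzed


def core_genes(presence: Dict[str, Set[str]], n_strains: int,
               threshold: float = 0.95) -> List[str]:
    """Genes present in more than `threshold` of strains.

    `presence` maps a gene name to the set of strain IDs that carry it.
    The strict '>' comparison is an assumption about how the cutoff is applied.
    """
    return sorted(
        gene for gene, strains in presence.items()
        if len(strains) / n_strains > threshold
    )


# Counts reported in the abstract: 1 366 carriers among 1 798 analyzed genomes.
rate = carriage_rate(1366, 1798)
print(f"Carriage rate: {rate:.1f}%")

# Hypothetical toy data: 20 strains and three made-up genes.
strain_ids = [f"s{i}" for i in range(20)]
toy = {
    "geneA": set(strain_ids),       # present in 20/20 strains
    "geneB": set(strain_ids[:18]),  # present in 18/20 strains
    "geneC": set(strain_ids[:10]),  # present in 10/20 strains
}
print(core_genes(toy, n_strains=len(strain_ids)))
```

With the reported counts the rate rounds to 76.0%, which matches the figure given in the Results. In the toy table only geneA clears the cutoff.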