
Improvement of the sequences and functional annotations of the Apis cerana reference genome with the nanopore long-read data of the gut transcriptome of larval A. cerana cerana workers

[Aim] To align previously obtained nanopore long-read transcriptome data of Apis cerana cerana to the reference genome of A. cerana, optimize the structures of annotated genes, identify and functionally annotate unannotated novel genes and novel transcripts, and predict and analyze their SSR loci, complete ORFs, and transcription factor (TF) families and members, so as to improve the sequence and functional annotations of the current A. cerana reference genome.

[Methods] Based on high-quality nanopore transcriptome sequencing data from the guts of 4-, 5- and 6-day-old A. cerana cerana worker larvae inoculated with Ascosphaera apis, the identified full-length transcripts were compared with the A. cerana reference genome using gffcompare to optimize the structures of annotated genes. Novel genes and novel transcripts unannotated in the reference genome were identified with gffcompare and then functionally annotated by alignment against the Nr, KOG, eggNOG, GO and KEGG databases. MISA, TransDecoder v3.0.0 and AnimalTFDB 2.0 were used to predict SSR loci, complete ORFs, and TF families and members, respectively.

[Results] The structures of 4 648 annotated genes in the A. cerana reference genome were optimized: both the 5' UTR and the 3' UTR were extended for 1 336 genes, the 5' UTR was extended for 1 688 genes, and the 3' UTR was extended for 1 624 genes. A total of 2 148 novel genes were identified, of which 818, 298, 587, 359 and 333 could be annotated to the Nr, KOG, eggNOG, GO and KEGG databases, respectively. A total of 35 432 novel transcripts were identified, of which 30 974, 21 222, 29 025, 19 852 and 9 214 could be annotated to the same five databases, respectively. A total of 22 541 SSR loci were detected; the numbers of mono-, di-, tri- and hexanucleotide repeat SSRs were 12 078, 7 140, 2 825 and 43, respectively, and the number of compound (mixed) SSRs was 2 964. Mononucleotide repeats were the type with the highest distribution frequency (153.37/Mb). A total of 58 TF families comprising 1 611 members were predicted. A total of 28 775 complete ORFs were predicted, among which ORFs encoding 100 to 200 amino acids were the most abundant (38.99%).

[Conclusion] These results optimize the structures of the annotated genes in the A. cerana reference genome and supplement it with novel genes, novel transcripts, SSRs, complete ORFs and TFs that were previously unannotated.
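To make the SSR statistics in the abstract concrete, here is a minimal Python sketch of how perfect simple sequence repeats can be found in a sequence and how a per-megabase frequency is computed. It is not MISA's implementation. The function names and the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, since the abstract does not state the thresholds used, and compound (mixed) SSRs are not handled.

```python
import re

# Minimum number of repeats per motif length (unit size -> min repeats).
# Illustrative assumption only; the abstract does not give the thresholds.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AT', not 'AA')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_count - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
                pos = match.end()
            else:
                # Skip motifs such as 'AA', which belong to a shorter repeat class.
                pos = match.start() + 1
    return sorted(hits)


def density_per_mb(count, total_bp):
    """SSR frequency expressed as loci per megabase of sequence."""
    return count / (total_bp / 1_000_000)
```

A frequency such as the 153.37/Mb reported for mononucleotide repeats corresponds to `density_per_mb(count, total_bp)`, where `total_bp` is the total length of the sequences scanned. The abstract does not state that length.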

Apis cerana; A. cerana cerana; third-generation sequencing technology; nanopore sequencing; full-length transcript; transcriptome; genome

Li Kunze, Song Yuxuan, Zang He, Jing Xin, Fan Xiaoxue, Chen Ying, Na Zhihao, Chen Dafu, Fu Zhongmin, Guo Rui


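College of Bee Science and Biomedicine, Fujian Agriculture and Forestry University, Fuzhou 350002

National and Local United Engineering Laboratory of Natural Biotoxins, Fuzhou 350002

Apitherapy Research Institute of Fujian Province, Fuzhou 350002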


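National Natural Science Foundation of China; National Natural Science Foundation of China; Earmarked Fund for the China Agriculture Research System; Natural Science Foundation of Fujian Province (General Program); Master's Supervisor Team Project of Fujian Agriculture and Forestry University; Science and Technology Innovation Fund of Fujian Agriculture and Forestry University; Undergraduate Innovation and Entrepreneurship Training Program of Fujian Province; Undergraduate Innovation and Entrepreneurship Training Program of Fujian Province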

32172792; 32372943; CARS-4-KXJ7; 2022J01131334; KFb22060XA; 202310389027; S202310389076

2024

Acta Entomologica Sinica
Institute of Zoology, Chinese Academy of Sciences; Entomological Society of China


CSTPCD; Peking University Core Journals
Impact factor: 0.756
ISSN: 0454-6296
Year, volume (issue): 2024, 67(3)