Robotics & Machine Learning Daily News, 2024, Issue (Jun. 24): 4-5.

Yunnan University Researchers Describe Recent Advances in Sequencing Technology (NmTHC: a hybrid error correction method based on a generative neural machine translation model with transfer learning)

Abstract

By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News. Researchers detail new data in sequencing technology. According to news reporting from Yunnan University by NewsRx journalists, the research stated: "Background: The single-pass long reads generated by third-generation sequencing technology exhibit a higher error rate. However, circular consensus sequencing (CCS) produces shorter reads." Funders for this research include the National Natural Science Foundation of China.

The news reporters obtained a quote from the research from Yunnan University: "Thus, it is effective to manage the error rate of long reads algorithmically with the help of the homologous high-precision and low-cost short reads from Next Generation Sequencing (NGS) technology. In this work, a hybrid error correction method (NmTHC) based on a generative neural machine translation model is proposed to automatically capture discrepancies within the aligned regions of long reads and short reads, as well as the contextual relationships within the long reads themselves, for error correction. Akin to natural language sequences, a long read can be regarded as a special 'genetic language' and processed with the idea of generative neural networks. The algorithm builds a sequence-to-sequence (seq2seq) framework with a Recurrent Neural Network (RNN) as the core layer. The long reads before and after correction are regarded as sentences in the source and target languages of translation, and the alignment information of long reads with short reads is used to create the special corpus for training. The well-trained model can then be used to predict the corrected long read. NmTHC outperforms the latest mainstream hybrid error correction methods on real-world datasets from two mainstream platforms, PacBio and Nanopore. Our experimental evaluation results demonstrate that NmTHC can align more bases with the reference genome without any segmenting in the six benchmark datasets, proving that it enhances alignment identity without sacrificing any length advantages of long reads."
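
The abstract describes the training corpus only at a high level: uncorrected and corrected long reads act as source and target "sentences", built from long-read/short-read alignment information. As a minimal, hypothetical sketch (not the NmTHC implementation), the Python below shows one way such a sentence pair could be formed. It assumes the short reads are already aligned to the long read without indels and are given as `(offset, sequence)` tuples, takes a per-position majority vote of short-read bases as the corrected target, and splits both reads into non-overlapping k-mer "words". The function names, the majority-vote rule, and the k-mer tokenization are all illustrative assumptions.

```python
from collections import Counter


def consensus_target(long_read: str, short_reads: list[tuple[int, str]]) -> str:
    """Build a hypothetical 'corrected' target for a long read.

    Assumption: each short read is already aligned to the long read without
    indels and is given as (offset, sequence). Each covered position takes the
    majority short-read base (ties go to the base seen first); positions with
    no short-read coverage keep the original long-read base.
    """
    columns = [Counter() for _ in long_read]
    for offset, seq in short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):  # ignore bases that overhang the long read
                columns[pos][base] += 1
    target = []
    for pos, base in enumerate(long_read):
        votes = columns[pos]
        target.append(votes.most_common(1)[0][0] if votes else base)
    return "".join(target)


def to_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a read into non-overlapping k-mer 'words'; the last may be shorter."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def make_training_pair(long_read: str, short_reads: list[tuple[int, str]], k: int = 3):
    """Return (source_tokens, target_tokens) for one hypothetical corpus entry."""
    target = consensus_target(long_read, short_reads)
    return to_tokens(long_read, k), to_tokens(target, k)


if __name__ == "__main__":
    # Toy example: position 3 of the long read carries an error (T instead of A).
    src, tgt = make_training_pair("ACGTTGCA", [(0, "ACGATG"), (2, "GATGCA")])
    print(src)  # ['ACG', 'TTG', 'CA']
    print(tgt)  # ['ACG', 'ATG', 'CA']
```

In a full seq2seq setup, token pairs like these would be mapped to vocabulary indices and fed to an RNN encoder-decoder. Fixed offsets only model substitutions; insertions and deletions, which are common in long reads, would require real alignment information and would also shift every later k-mer token.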

Key words

Yunnan University/Emerging Technologies/Machine Learning/Machine Translation/Sequencing Technology/Technology

Publication year

2024
Robotics & Machine Learning Daily News
