Yunnan University Researchers Describe Recent Advances in Sequencing Technology (NmTHC: a hybrid error correction method based on a generative neural machine translation model with transfer learning)

Abstract: By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News. Researchers detail new data in sequencing technology. According to news reporting from Yunnan University by NewsRx journalists, the research stated: "Background: The single-pass long reads generated by third-generation sequencing technology exhibit a higher error rate. However, circular consensus sequencing (CCS) produces shorter reads." Funders for this research include the National Natural Science Foundation of China.
The news reporters obtained a quote from the research from Yunnan University: "Thus, it is effective to manage the error rate of long reads algorithmically with the help of the homologous, high-precision and low-cost short reads from Next Generation Sequencing (NGS) technology. In this work, a hybrid error correction method (NmTHC) based on a generative neural machine translation model is proposed to automatically capture discrepancies within the aligned regions of long reads and short reads, as well as the contextual relationships within the long reads themselves, for error correction. Akin to natural language sequences, a long read can be regarded as a special 'genetic language' and processed with the idea of generative neural networks. The algorithm builds a sequence-to-sequence (seq2seq) framework with a Recurrent Neural Network (RNN) as the core layer. The pre- and post-correction long reads are regarded as the sentences in the source and target languages of translation, and the alignment information of long reads with short reads is used to create the special corpus for training. The well-trained model can then be used to predict the corrected long read. NmTHC outperforms the latest mainstream hybrid error correction methods on real-world datasets from two mainstream platforms, PacBio and Nanopore.
Our experimental evaluation results demonstrate that NmTHC can align more bases with the reference genome without any segmenting on the six benchmark datasets, proving that it enhances alignment identity without sacrificing any length advantages of long reads."
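The abstract does not say how reads are tokenized or how the target ("post-corrected") sentences are derived from the short-read alignments. The sketch below is only an illustration of the source/target pairing that a seq2seq model would be trained on, under loudly stated assumptions: each base is one token, short reads are aligned gaplessly at known offsets, and the target is a simple per-position majority vote over covering short reads. The function names (`tokenize`, `consensus_target`, `make_training_pair`, `identity`) are hypothetical and are not part of NmTHC, and the RNN model itself is not implemented here.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical type: a short read placed on the long read at a known 0-based offset.
# Assumption: alignments are gapless (substitutions only) to keep the sketch short.
Alignment = Tuple[int, str]


def tokenize(read: str) -> List[str]:
    """Treat each base as one 'word' of the genetic language (assumed tokenization)."""
    return list(read)


def consensus_target(long_read: str, alignments: List[Alignment]) -> str:
    """Derive a corrected 'target sentence' by per-position majority vote over the
    short reads covering each position (an assumed labelling rule, not NmTHC's)."""
    columns: Dict[int, Counter] = {}
    for start, short_read in alignments:
        for offset, base in enumerate(short_read):
            pos = start + offset
            if 0 <= pos < len(long_read):  # ignore overhang past the long read's ends
                columns.setdefault(pos, Counter())[base] += 1
    corrected = []
    for pos, base in enumerate(long_read):
        votes = columns.get(pos)
        # Positions with no short-read coverage keep the original long-read base.
        corrected.append(votes.most_common(1)[0][0] if votes else base)
    return "".join(corrected)


def make_training_pair(
    long_read: str, alignments: List[Alignment]
) -> Tuple[List[str], List[str]]:
    """Return (source, target) token sequences: the pre- and post-correction read."""
    return tokenize(long_read), tokenize(consensus_target(long_read, alignments))


def identity(a: str, b: str) -> float:
    """Fraction of positions where two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    truth = "ACGTCAGCTA"      # hypothetical true sequence
    long_read = "ACGTTAGCTA"  # one substitution error at position 4
    shorts = [(0, "ACGTCA"), (3, "TCAGC"), (4, "CAGCTA")]
    source, target = make_training_pair(long_read, shorts)
    print("".join(source), "->", "".join(target))  # ACGTTAGCTA -> ACGTCAGCTA
    print(identity(long_read, truth), identity("".join(target), truth))  # 0.9 1.0
```

In this toy example the three overlapping short reads outvote the single error at position 4, so identity to the assumed true sequence rises from 0.9 to 1.0. The identity here is a plain match fraction; aligners define alignment identity in several ways, and real long-read errors are mostly insertions and deletions, which would need gapped alignments and variable-length target sentences that this gapless sketch does not handle.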

Keywords:

Yunnan University; Emerging Technologies; Machine Learning; Machine Translation; Sequencing Technology; Technology

Publication year:

2024

Robotics & Machine Learning Daily News

ISSN:

Year, volume (issue): 2024 (Jun. 24)