首页|基于Contig的单面基因组框架填充2-近似算法

基于Contig的单面基因组框架填充2-近似算法

扫码查看
随着基因测序技术的持续发展,基因组框架填充问题受到广泛关注。该文针对基于contig的单面含重复基因的基因组框架填充问题开展研究。通过设计有效的近似算法,完成根据参照基因组,将缺失基因填充至基因测序获得的不完整框架中,提高基因组框架的完整性。前期研究的基因组框架填充问题,缺失基因可以插入到不完整序列的任意两个基因之间,而基于片段重叠群(contig)的基因组框架填充,缺失基因的插入位置被限制在两个contig之间,更具一般性,该问题已被证明是NP完全问题。现有的近似算法中,2-近似算法处理的实例具有特殊性,2。57-近似算法针对一般实例,但近似性能比不够理想。该文以缺失基因、基因位点和断点三者之间的对应关系为基础,采用贪婪策略和最大匹配相结合的方式避免在填充过程中出现冗余公共邻接,并通过生成新的contig增加外邻接的数量,将针对一般实例的算法近似性能比提高到2,完成了基于Python的可视化程序开发,进一步验证了算法的有效性。
A 2-Approximation Algorithm for Contig-based One-sided Genome Scaffold Filling
The problem of genome scaffold filling gets more and more attention with the advancement of genome sequencing technology.The contig-based one-sided genomic scaffold filling problem is researched.Effective approximation algorithms are devised to enhance the integrity of the genome scaffold.The objective of algorithms is to insert the missing genes derived from genome sequencing into the scaffold according to the reference genome.In previous research on genome scaffold filling,missing genes can be inserted between any two genes in an incomplete sequence.While in genome scaffold based on contigs,the insertion position of missing genes is limited to between two contigs.This problem has been proven to be NP-complete and more general.Two related algorithms are analyzed.The 2-approximation isnot for general but special case,and the approximation performance ratio of the 2.57-approximation algorithm is considered unsatisfactory.A new algorithm focusing on the correspondence between missing genes,gene slots and breakpoints is designed by using greedy method and maximal matching.The algorithm not only solves the redundant common adjacencies problem,but also increases the number of external adjacencies by generating new contigs.As a result,the approximation performance for general case is improved to a ratio of 2.The effectiveness of the algorithm is further verified by developing a visualization program based on Python.

genomescaffold fillingapproximation algorithmgreedy strategymaximum matching

柳楠、卞忠勇、李洋、朱永琦

展开 >

山东建筑大学 计算机科学与技术学院,山东 济南 250101

基因组 框架填充 近似算法 贪婪策略 最大匹配

国家自然科学基金项目山东省自然科学基金项目

61902221ZR2018MF012

2024

计算机技术与发展
陕西省计算机学会

计算机技术与发展

CSTPCD
影响因子:0.621
ISSN:1673-629X
年,卷(期):2024.34(2)
  • 16