A 2-Approximation Algorithm for Contig-based One-sided Genome Scaffold Filling
The problem of genome scaffold filling gets more and more attention with the advancement of genome sequencing technology.The contig-based one-sided genomic scaffold filling problem is researched.Effective approximation algorithms are devised to enhance the integrity of the genome scaffold.The objective of algorithms is to insert the missing genes derived from genome sequencing into the scaffold according to the reference genome.In previous research on genome scaffold filling,missing genes can be inserted between any two genes in an incomplete sequence.While in genome scaffold based on contigs,the insertion position of missing genes is limited to between two contigs.This problem has been proven to be NP-complete and more general.Two related algorithms are analyzed.The 2-approximation isnot for general but special case,and the approximation performance ratio of the 2.57-approximation algorithm is considered unsatisfactory.A new algorithm focusing on the correspondence between missing genes,gene slots and breakpoints is designed by using greedy method and maximal matching.The algorithm not only solves the redundant common adjacencies problem,but also increases the number of external adjacencies by generating new contigs.As a result,the approximation performance for general case is improved to a ratio of 2.The effectiveness of the algorithm is further verified by developing a visualization program based on Python.