生物信息学中,大规模的生物基因序列比对是最重要的基础问题.针对主流的BWT(burrows-wheeler transform)索引技术的研究,提出一种新的多阶混合BWT索引方法MD-BWT(multi difference cover mod3 burrows-wheeler transform),根据待比对序列的长度,动态选取适合的多位索引查找.实验结果表明,改进后的方法可以有效减少序列比对算法中的比对次数和计算次数,降低序列比对算法中索引算法的时间复杂度,明显提高序列比对的效率.在构造BWT(S)字符串过程中,通过DC3(difference cover mod 3)算法来构造后缀数组,实验表明DC3算法构造后缀数组比倍增算法的时间复杂度更低,时间性能更优.
Research on Sequence Alignment Algorithm Based on Optimized BWT Indexing Technique
Large-scale gene sequence alignment is the most important fundamental problem in bioinformatics.Based on the mainstream research of BWT index technology,the paper proposes a new multi-order mixed BWT index method,which dynamically selects the appropiate multi-bit indexing according to the length of the sequence to be compared.The experimental results show that the improved method can effectively reduce the number of comparison and calculation times,reduce the time complexity of the index algorithm,and significantly improve the efficiency of sequence comparison.In the process of constructing BWT(S)string,this paper uses DC3(difference cover mod 3)algorithm to construct the suffix array.Experiments show that DC3 algorithm has better time performance compared to Binary Lifting.
sequence alignmentBWT indexdifference cover mod 3suffix array