Research on the Identification of DNA Replication Origin Based on Machine Learning
DNA replication, the process of generating two identical copies from a single original DNA molecule, occurs in all organisms and is the basis of biological inheritance. To understand this important biological process thoroughly, and to apply that understanding to the development of new strategies against genetic disorders, it is necessary to study the mechanism of DNA replication. In the post-genomic era, with the explosive growth of DNA sequence data, there is an urgent need for high-throughput computational tools that can identify DNA replication origins purely from sequence information. This paper proposes a new predictor, iROI-PCM, in which each DNA sequence sample is represented by its physicochemical attribute matrix, encoded through a series of auto-covariance and cross-covariance terms, and a support vector machine is used for classification. Under strict cross-validation, the proposed predictor performs significantly better than the existing predictor in sensitivity, specificity, accuracy, and stability, and may therefore be helpful for related research.
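The auto-covariance and cross-covariance encoding mentioned above can be illustrated with a minimal sketch. This is not the iROI-PCM implementation: the two properties (`gc_count`, `purine_count`), the choice of dinucleotides as the unit, the `max_lag` value, and all function names are assumptions made for illustration, since the abstract does not specify the property set or the lag range.

```python
# Toy stand-ins for physicochemical properties (assumption: the actual
# property set used by the predictor is not given in the abstract).
def gc_count(dinuc):
    """Number of G or C bases in a dinucleotide."""
    return sum(base in "GC" for base in dinuc)


def purine_count(dinuc):
    """Number of purines (A or G) in a dinucleotide."""
    return sum(base in "AG" for base in dinuc)


PROPERTIES = [gc_count, purine_count]


def property_matrix(seq):
    """Rows are properties, columns are the overlapping dinucleotides of seq."""
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [[float(prop(d)) for d in dinucs] for prop in PROPERTIES]


def covariance(row_u, row_v, lag):
    """Lagged covariance of two property rows.

    With row_u and row_v from the same property this is an auto-covariance
    term; with different properties it is a cross-covariance term.
    """
    n = len(row_u)
    mean_u = sum(row_u) / n
    mean_v = sum(row_v) / n
    terms = [(row_u[i] - mean_u) * (row_v[i + lag] - mean_v)
             for i in range(n - lag)]
    return sum(terms) / (n - lag)


def acc_features(seq, max_lag=2):
    """Fixed-length feature vector of auto- and cross-covariance terms."""
    matrix = property_matrix(seq)
    if len(matrix[0]) <= max_lag:
        raise ValueError("sequence too short for the requested max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        # Auto-covariance: each property against itself at this lag.
        for row in matrix:
            features.append(covariance(row, row, lag))
        # Cross-covariance: each ordered pair of distinct properties.
        for u, row_u in enumerate(matrix):
            for v, row_v in enumerate(matrix):
                if u != v:
                    features.append(covariance(row_u, row_v, lag))
    return features


if __name__ == "__main__":
    print(len(acc_features("ACGTTGCAACGGTA")))
```

With N properties and a maximum lag of λ, the vector has N × N × λ entries regardless of sequence length, which is what allows sequences of different lengths to be passed to a fixed-input classifier such as a support vector machine.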