The Pattern of 8-mer Spectrum Distribution of Genome Sequences in Higher Organisms and Its Application in the Study of Species Evolution
The 8-mer spectrum of a genome sequence is species-specific, and deciphering its internal regularities is of great significance for revealing the sequence composition rules and evolutionary model of the genome. The 8-mer spectrum distributions of 66 species were analyzed. The spectra of higher mammals mainly showed three peaks, those of birds and reptiles mainly two peaks, and those of fish and invertebrates mainly one peak. To further investigate the makeup of the genomic 8-mer spectrum, 16 XY dinucleotide classification methods were applied. Only the CG classification showed the following two characteristics: (1) the 8-mer spectra of the CG0, CG1, and CG2 subsets each presented a unimodal distribution, and the three peaks were separated from each other; (2) relative to the random center location, the spectra of the CG1 and CG2 subsets lay far from the random center, whereas the spectrum of the CG0 subset was distributed around it. To further verify the relationship between the spectrum distributions of the CG0, CG1, and CG2 subsets and species evolution, a phylogenetic tree of the 66 species was constructed using the separability of the spectra of the three CG subsets. The phylogenetic tree divided the species into four clusters: higher mammals, birds and reptiles, fish, and invertebrates. These results show that the spectrum distributions of the three CG subsets are closely related to the information on species genome evolution.
Genome sequence; 8-mer spectrum; Separability; Phylogenetic tree
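
The decomposition of an 8-mer spectrum into CG subsets can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: the spectrum is taken to be the number of distinct 8-mers observed at each occurrence count, CG0, CG1, and CG2 are taken to mean 8-mers containing zero, one, and two or more CG dinucleotides, and the random center is taken to be the expected count of an 8-mer in a uniformly random sequence of the same length. All function names are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter
from itertools import product

K = 8
BASES = frozenset("ACGT")

def count_kmers(seq, k=K):
    """Count overlapping k-mers; windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if BASES.issuperset(kmer):
            counts[kmer] += 1
    return counts

def cg_class(kmer):
    """Assumed subset label: 0, 1, or 2 (two or more) CG dinucleotides."""
    return min(kmer.count("CG"), 2)

def cg_spectra(seq, k=K):
    """For each CG subset, map occurrence count -> number of k-mers.

    All 4**k possible k-mers are enumerated, so k-mers that never
    occur in the sequence contribute to the bin at count 0.
    """
    counts = count_kmers(seq, k)
    spectra = {0: Counter(), 1: Counter(), 2: Counter()}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        spectra[cg_class(kmer)][counts.get(kmer, 0)] += 1
    return spectra

def random_center(seq_len, k=K):
    """Assumed random center: expected count of one k-mer in a uniform random sequence."""
    return (seq_len - k + 1) / 4 ** k
```

Calling `cg_spectra(genome)` on a genome string returns three histograms whose peak positions can be compared with `random_center(len(genome))`. For real genomes, occurrence counts would normally be binned or normalized before peaks are compared; that step is omitted here.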