Prediction of C2H2 zinc finger proteins based on a support vector machine
Transcription is the first step in the transmission of genetic information, and it is regulated by a variety of transcription factors. Transcription factors bind to specific nucleotide sites upstream of genes and thereby influence the transcription process. Zinc finger proteins form the largest category of transcription factors. Because the zinc finger motifs in these proteins differ, they can bind to different sites and carry out different regulatory processes. C2H2 zinc finger proteins are the largest class of zinc finger proteins. In this paper, a data set of C2H2 zinc finger proteins is established, and three types of feature information are extracted: amino acid composition, auto-covariance average chemical shift, and dipeptide composition. The zinc finger proteins are predicted using a support vector machine, and the accuracy is 87.86% in the jackknife test. Different methods are then used to reduce the dimension of the dipeptide composition, and the accuracy is 90.21% after dimension reduction. Finally, multi-feature information is used for prediction, and the accuracy is 92.55%. Predicting zinc finger proteins helps to better understand their structure, function, and regulation mechanisms.
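As an illustration of two of the sequence features named above, the following minimal Python sketch computes amino acid composition (20 values) and dipeptide composition (400 values) using their usual frequency-based definitions. The abstract does not give the exact formulas, so the normalization, the handling of non-standard residues, and the function names (`amino_acid_composition`, `dipeptide_composition`, `feature_vector`) are assumptions made for illustration only. The auto-covariance average chemical shift feature is omitted because it requires a chemical-shift table that is not given here.

```python
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 residue frequencies (assumed definition: count / total counted)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:  # assumption: skip non-standard symbols such as X
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 frequencies of overlapping adjacent residue pairs."""
    seq = seq.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Toy sequence for demonstration, not a real zinc finger protein.
    vec = feature_vector("ACAC")
    print(len(vec))
```

Vectors of this kind could then be passed to a support vector machine and evaluated with a jackknife (leave-one-out) test, in which each protein in turn is held out and predicted by a model trained on all the others.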