Analysis and Prediction of Binding Regions of Disease-related Proteins and DNA Molecules
The interactions of specific DNA molecules with proteins are involved in many cellular activities, and these interactions are closely related to many human diseases. To understand the molecular mechanism by which proteins bind to DNA molecules, it is important to identify which residues in the biomolecular structure bind to DNA. However, accurately identifying the DNA-binding residues of proteins is difficult. In this study, we use machine learning algorithms to predict the binding regions of disease-associated proteins to DNA molecules, which lays the foundation for the subsequent precise identification of binding sites. The datasets used in the prediction models were extracted from the UniProt and PDB databases, and the position-specific scoring matrix (PSSM) and the physicochemical indices of amino acids were extracted as features. Using the random forest algorithm, 5-fold cross-validation showed that the overall accuracy reaches 94% when 103 physicochemical indices are used as features, and the precision, recall and Matthews correlation coefficient (MCC) are 88%, 75% and 0.78, respectively. These results indicate that the model has a good ability to recognize the binding regions of disease-related proteins and DNA molecules.
Disease-associated proteins; Position-specific scoring matrix; Proteins bind to DNA molecules; Machine learning algorithm
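The evaluation measures named in the abstract (accuracy, precision, recall and the Matthews correlation coefficient) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the study's data.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and MCC from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives (binding = positive class).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = binary_metrics(tp=75, fp=10, tn=300, fn=25)
print(example)
```

Because MCC uses all four counts, it remains informative when binding regions are much rarer than non-binding regions, a situation in which accuracy alone can look high even for a weak classifier.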