Analysis and Prediction of Binding Regions of Disease-related Proteins and DNA Molecules

Many cellular processes involve interactions between specific DNA molecules and proteins, and these interactions are closely related to the onset of many human diseases. To understand the molecular mechanism by which proteins bind DNA, it is important to determine which residues in a protein sequence bind to DNA. At present, however, accurately identifying DNA-binding residues remains difficult. In this study, we use machine learning algorithms to predict the DNA-binding regions of disease-related proteins, which lays a foundation for the subsequent precise identification of binding sites. The datasets used in the prediction models were drawn from the UniProt and PDB databases. We extracted the position-specific scoring matrix (PSSM) and the physicochemical indices of amino acids as features and applied the random forest algorithm. Under 5-fold cross-validation, the best overall accuracy, 94%, was obtained when 103 physicochemical indices were used as features, with precision, recall, and Matthews correlation coefficient (MCC) of 88%, 75%, and 0.78, respectively. The model therefore shows good ability to recognize the DNA-binding regions of disease-related proteins.
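The abstract reports four evaluation measures: overall accuracy, precision, recall, and the Matthews correlation coefficient. As a minimal sketch of how these are conventionally computed for a binary task (binding vs. non-binding), the Python below derives them from confusion-matrix counts. The function names `confusion_counts` and `binary_metrics` and the toy counts are illustrative assumptions, not the authors' code or data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = binding, 0 = non-binding)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and MCC from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }


# Toy counts chosen only to illustrate the arithmetic (not from the paper).
metrics = binary_metrics(tp=6, fp=2, tn=10, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a 5-fold cross-validation setting, one common choice is to pool the confusion counts from the five held-out folds before computing these measures. Whether the paper pools counts or averages per-fold scores is not stated in the abstract.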

Disease-associated proteins; Position-specific scoring matrix; Protein-DNA binding; Machine learning algorithm

Feng Yong'e (冯永娥), Sun Pengzhe (孙鹏哲)

College of Science, Inner Mongolia Agricultural University, Hohhot 010018, China

2024

Journal of Inner Mongolia Agricultural University (Natural Science Edition)
Inner Mongolia Agricultural University

Peking University Core Journal (北大核心)
Impact factor: 0.384
ISSN: 1009-3575
Year, volume (issue): 2024, 45(1)