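A protein is a substance that has a spatial structure. The main goal of protein structure prediction is to extract useful information from existing large-scale protein datasets and use it to predict the structures of proteins found in nature. One problem currently facing protein structure prediction experiments is the lack of datasets that further reflect the spatial structural features of proteins. The mainstream PDB protein dataset is measured experimentally, but it does not make use of the spatial features of proteins, and it also contains mixed-in nucleic acid data and some incomplete entries. To address these problems, protein prediction is studied here from the perspective of protein spatial structure. Building on the original PDB dataset, the Hohai Graphic Protein Data Bank (HohaiGPDB) is proposed. This dataset is based on a graph structure and expresses the spatial structural features of proteins. Protein structure prediction experiments were run on the new dataset with a conventional Transformer network model, and the prediction accuracy on HohaiGPDB reached 59.38%, which demonstrates the research value of the HohaiGPDB dataset. HohaiGPDB can serve as a general-purpose dataset for protein-related research.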
Hohai Graphic Protein Data Bank and Prediction Model
Proteins are substances with a spatial structure. The main goal of protein structure prediction is to extract useful information from existing large-scale protein datasets so as to predict the structures of proteins in nature. At present, one problem in protein structure prediction experiments is the lack of datasets that further reflect the spatial structure of proteins. Although the current mainstream PDB (Protein Data Bank) is experimentally measured, it does not utilize the spatial characteristics of proteins, and it suffers from contamination with nucleic acid data and from partially incomplete entries. In view of these problems, this paper studies protein prediction from the perspective of spatial structure. Based on the original PDB, the Hohai Graphic Protein Data Bank (HohaiGPDB) is proposed. The dataset expresses the spatial structure characteristics of proteins on the basis of a graph structure. Using the conventional Transformer network model, protein structure prediction experiments are carried out on the new dataset, and the prediction accuracy on HohaiGPDB reaches 59.38%, which demonstrates the research value of HohaiGPDB. HohaiGPDB could be used as a general dataset for protein-related studies.
Hohai graphic protein data bank; Protein spatial structure; Protein structure prediction; Transformer model