Identification of intrinsically disordered proteins based on persistent homology sequence features
Intrinsically disordered proteins (IDPs) affect a wide range of important physiological processes, including molecular recognition, molecular assembly, regulation of transcription and translation, protein phosphorylation, and cell signal transduction. Accurate identification of IDPs is therefore critical. Previous methods suffer from cumbersome operation, high cost, and low efficiency. To improve cost-effectiveness, a deep neural network combining multilayer perceptrons (MLPs) and ResNet50 was developed using machine learning techniques. In this network, the ResNet50, with its fully connected layer removed and all other layers retained, is placed between two MLPs and serves as the feature extractor. In addition, two new sequence features are used for the first time: the persistent entropy of the sequence, computed with the persistent homology algorithm, and the correlation odds of consecutive amino acids. The simulation results show that the proposed network outperforms other prediction methods on the same test set and generalizes well even to shorter sequences.
intrinsically disordered protein; persistent homology; correlation odds of amino acids; deep neural networks
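
To illustrate the persistent entropy feature named in the abstract, the following minimal Python sketch computes the standard persistent entropy of a persistence barcode, that is, the Shannon entropy of the normalized bar lengths. It is a sketch under stated assumptions, not the authors' implementation: the function name `persistent_entropy` and the input format (a list of `(birth, death)` pairs) are hypothetical, and the abstract does not specify how a barcode is derived from a protein sequence, so the barcode is taken as given.

```python
import math


def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a persistence barcode.

    `barcode` is assumed to be an iterable of (birth, death) pairs.
    Bars with infinite death or non-positive length are skipped (an assumption
    made here so that the normalization is well defined).
    """
    lengths = [
        death - birth
        for birth, death in barcode
        if math.isfinite(death) and death > birth
    ]
    total = sum(lengths)
    if total == 0:
        # No finite bars: define the entropy as zero.
        return 0.0
    # p_i = l_i / L, E = -sum(p_i * ln(p_i))
    return -sum((l / total) * math.log(l / total) for l in lengths)
```

For a barcode with n bars of equal length the value is ln(n), its maximum for n bars; a barcode dominated by a single long bar gives a value close to zero.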