
Identification of disordered protein regions based on persistent homology sequence features

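Intrinsically disordered proteins (IDPs) broadly influence important physiological processes such as molecular recognition, molecular assembly, transcriptional and translational regulation, protein phosphorylation, and cell signal transduction. Identifying IDPs quickly, reliably, and accurately is therefore essential. Earlier physicochemical methods are cumbersome, costly, and inefficient. To improve efficiency, a deep neural network architecture was developed using machine learning techniques: a combination of multilayer perceptron (MLP) networks and a ResNet50 network, in which a ResNet50 variant, with its original fully connected layer removed and the remainder kept, sits between two MLP networks and extracts IDP features. New sequence features are also used, namely the persistent entropy of the sequence, based on the persistent homology algorithm, and the correlation odds of consecutive amino acids. Simulation results show that, on the same test set, the deep neural network designed in this study outperforms other known prediction methods and generalizes well even to shorter sequences.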
Identification of intrinsically disordered proteins based on persistent homology sequence features
Intrinsically disordered proteins (IDPs) have wide-ranging effects on important physiological processes such as molecular recognition, molecular assembly, transcription and translation regulation, protein phosphorylation, and cell signal transduction. It is therefore critical to identify IDPs accurately. Previous methods suffer from cumbersome operation, high cost, and low efficiency. To improve cost-effectiveness, a deep neural network was developed using machine learning techniques. It combines multilayer perceptron (MLP) networks with ResNet50: the ResNet50, with its fully connected layer removed and the rest retained, sits between two MLPs and performs feature extraction. In addition, new sequence features are used for the first time: the persistent entropy of the sequence, based on the persistent homology algorithm, and the correlation odds of consecutive amino acids. Simulation results show that, on the same test set, the network designed in this study outperforms other prediction methods and generalizes well even to shorter sequences.
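
Persistent entropy, one of the sequence features named above, is commonly defined as the Shannon entropy of the normalized bar lengths of a persistence barcode. The following is a minimal sketch of that standard definition only, assuming the barcode is already available as finite (birth, death) pairs. The abstract does not state how the barcode is built from an amino acid sequence, so that step is omitted; the function name and the sample intervals are hypothetical.

```python
import math


def persistent_entropy(intervals):
    """Shannon entropy of the normalized bar lengths of a persistence barcode.

    intervals: iterable of (birth, death) pairs with finite values.
    Returns 0.0 when no bar has positive length.
    """
    lengths = [death - birth for birth, death in intervals]
    # Bars of zero (or invalid negative) length carry no weight and would
    # hit log(0), so they are dropped before normalizing.
    lengths = [length for length in lengths if length > 0]
    total = sum(lengths)
    if total == 0:
        return 0.0
    # p_i = l_i / L, entropy = -sum(p_i * ln p_i)
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Hypothetical barcode for illustration only; not data from the paper.
example_barcode = [(0.0, 1.0), (0.0, 1.0), (0.5, 2.5)]
print(persistent_entropy(example_barcode))
```

Under this definition the entropy is largest when all bars have equal length and is zero for a single bar, so the value summarizes how evenly the topological features persist.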

intrinsically disordered protein; persistent homology; correlation odds of amino acids; deep neural networks

王煜民、赵加祥、王增科


College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, China

intrinsically disordered proteins; persistent homology algorithm; correlation odds of amino acids; deep neural networks

2024

Microelectronics & Computer
The 771st Research Institute of the Ninth Academy, China Aerospace Science and Technology Corporation


CSTPCD
Impact factor: 0.431
ISSN: 1000-7180
Year, volume (issue): 2024, 41(10)