微电子学与计算机 (Microelectronics & Computer), 2024, Vol. 41, Issue 10: 13-20. DOI: 10.19304/J.ISSN1000-7180.2023.0854

Identification of intrinsically disordered protein based on persistent homology sequence features

王煜民 (Wang Yumin)¹, 赵加祥 (Zhao Jiaxiang)¹, 王增科 (Wang Zengke)¹

Author information

  • 1. College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, China

Abstract

Intrinsically disordered proteins (IDPs) affect a wide range of important physiological processes, including molecular recognition, molecular assembly, transcriptional and translational regulation, protein phosphorylation, and cell signal transduction. Identifying IDPs quickly, reliably, and accurately is therefore critical. Earlier physicochemical methods are cumbersome, costly, and inefficient. To improve cost-effectiveness, this work applies machine learning and develops a deep neural network that combines multilayer perceptron (MLP) networks with ResNet50: a ResNet50 variant, with its original fully connected layer removed and the rest retained, sits between two MLP networks and performs feature extraction for IDPs. In addition, new sequence features are used for the first time: the persistent entropy of the sequence, computed with the persistent homology algorithm, and the correlation odds of consecutive amino acids. Simulation results show that, on the same test set, the proposed network outperforms other known prediction methods and generalizes well even to shorter sequences.
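For readers unfamiliar with persistent entropy, the sketch below shows the standard definition: the Shannon entropy of the normalized bar lengths in a persistence barcode. It is an illustration only. The abstract does not say how the barcode is built from a protein sequence, so the function takes the intervals as given, and the name persistent_entropy and the example barcode are hypothetical rather than taken from the paper.

```python
import math


def persistent_entropy(intervals):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `intervals` is an iterable of (birth, death) pairs with finite death
    values. How the paper obtains these pairs from a sequence is not
    stated in the abstract; this function assumes they are supplied.
    """
    # Keep only bars with positive length; zero-length bars carry no weight.
    lengths = [death - birth for birth, death in intervals if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    # p_i = l_i / L, entropy = -sum(p_i * log(p_i))
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Hypothetical barcode, for illustration only.
barcode = [(0.0, 1.0), (0.0, 1.0), (0.5, 2.5)]
print(persistent_entropy(barcode))
```

Under this definition, a barcode whose bars all have equal length gives the maximum value, log(n) for n bars, and a single bar gives zero.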

Key words

intrinsically disordered protein/persistent homology/correlation odds of amino acids/deep neural networks

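Publication year

2024
微电子学与计算机 (Microelectronics & Computer)
The 771st Research Institute, Ninth Academy of China Aerospace Science and Technology Corporation

CSTPCD
Impact factor: 0.431
ISSN: 1000-7180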