A multi-label prediction model of nucleic acid-binding proteins based on deep learning
Wei Zhisen (魏志森)¹, Deng Cheng (邓城)²
Author information
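- 1. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000; Key Laboratory of Data Science and Intelligent Applications (Fujian Provincial Universities), Zhangzhou, Fujian 363000
- 2. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000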
Abstract
Nucleic acid-binding proteins (NABPs) include DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), and accurately identifying NABPs helps elucidate the mechanisms by which proteins act. To address the cross-prediction problem in NABP prediction, in which DBPs are misidentified as RBPs and vice versa, we propose DeepDPRP, a novel deep learning model that predicts DBPs and RBPs simultaneously and is trained with a multi-label learning approach. DeepDPRP uses a bidirectional long short-term memory (BiLSTM) network to extract global protein sequence features from position-specific scoring matrices (PSSMs), followed by a convolutional neural network (CNN) that captures more complex features from them; a convolutional module based on structural motifs is also incorporated to make effective use of previously discovered protein structural features. Experimental results on two independent test datasets show that DeepDPRP significantly outperforms existing NABP predictors, achieving higher predictive performance and lower cross-prediction rates. Extensive ablation experiments confirm the effectiveness of the proposed model.
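To make "multi-label learning" and "cross-prediction" concrete, the following is a minimal plain-Python sketch, not the authors' implementation. It assumes one independent sigmoid output per label, so a protein may be predicted as DBP, RBP, both, or neither, and it averages a binary cross-entropy term over the two labels. The label order, the 0.5 threshold, the cross-prediction definition (the fraction of proteins carrying only the other label that are still predicted with this label), and the names `multilabel_bce`, `decode`, and `cross_prediction_rate` are all hypothetical. The BiLSTM, CNN, and structural-motif modules that would produce the logits are not reproduced; the logits below are made up.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order (not from the paper): index 0 = DBP, index 1 = RBP.
DBP, RBP = 0, 1


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over independent labels, computed from raw logits.

    Each label has its own sigmoid, so DBP and RBP are not mutually exclusive.
    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    losses = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
              for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


def decode(logits: Sequence[float], threshold: float = 0.5) -> Tuple[int, ...]:
    """Threshold each label's probability independently (0.5 is an assumed cut-off)."""
    return tuple(int(sigmoid(z) >= threshold) for z in logits)


def cross_prediction_rate(preds: List[Tuple[int, ...]],
                          targets: List[Tuple[int, ...]],
                          label: int, other: int) -> float:
    """Assumed definition: among proteins that carry `other` but not `label`,
    the fraction the model nonetheless predicts as `label`."""
    pool = [p for p, t in zip(preds, targets) if t[other] == 1 and t[label] == 0]
    if not pool:
        return 0.0
    return sum(p[label] for p in pool) / len(pool)


if __name__ == "__main__":
    # Toy labels: DBP only, RBP only, RBP only, both (DRBP), neither.
    targets = [(1, 0), (0, 1), (0, 1), (1, 1), (0, 0)]
    # Made-up logits standing in for the network's two outputs.
    logits = [(2.0, -1.0), (0.5, 3.0), (-1.5, 2.0), (1.0, 1.0), (-2.0, -2.0)]

    loss = sum(multilabel_bce(z, y) for z, y in zip(logits, targets)) / len(targets)
    preds = [decode(z) for z in logits]
    print(f"mean multi-label loss: {loss:.3f}")
    print(f"DBP cross-prediction rate: {cross_prediction_rate(preds, targets, DBP, RBP):.2f}")  # 0.50
    print(f"RBP cross-prediction rate: {cross_prediction_rate(preds, targets, RBP, DBP):.2f}")  # 0.00
```

Using independent sigmoids rather than a softmax over {DBP, RBP} is what lets a single model flag a protein as both or neither. This is the usual reading of multi-label learning, but the paper's actual loss, thresholds, and evaluation formula may differ.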
Key words
nucleic acid-binding protein prediction/convolutional neural network/bidirectional long short-term memory/multi-label learning
Funding
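Natural Science Foundation of Fujian Province (2022J01913)
Young and Middle-aged Teachers Project of the Fujian Provincial Department of Education (JAT190362)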
Publication year
2024