iRSC-PseAAC: Predicting Redox-sensitive Cysteine Sites in Proteins Based on the Effective Dimension Reduction Algorithm LDA
Wei Xin (魏欣) 1, Liu Chunsheng (刘春生) 2, Lü Zhe (吕哲) 3, Lin Gang (林刚) 4, Hu Siqin (胡思亲) 4, Jia Jianhua (贾建华) 5
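Author information
- 1. Mathematical Statistics Teaching and Research Section, School of Business, Jiangxi Institute of Fashion Technology, Nanchang 330201
- 2. Smart Logistics Teaching and Research Section, School of Business, Jiangxi Institute of Fashion Technology, Nanchang 330201
- 3. Information Engineering Teaching and Research Section, School of Big Data, Jiangxi Institute of Fashion Technology, Nanchang 330201
- 4. Data Science Teaching and Research Section, School of Big Data, Jiangxi Institute of Fashion Technology, Nanchang 330201
- 5. Bioinformatics Laboratory, School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, Jiangxi 333403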
Abstract
Redox-sensitive cysteine (RSC) thiols play an important role in many biological processes, such as photosynthesis, cellular metabolism, and transcription. It is therefore necessary to identify redox-sensitive cysteines accurately. However, traditional identification of redox-sensitive cysteines is expensive and time-consuming. There is an urgent need for a computational method that identifies redox-sensitive cysteines quickly and accurately from sequence information. Here, we developed an effective predictor called iRSC-PseAAC, which uses the dimension reduction algorithm linear discriminant analysis (LDA) combined with a support vector machine to predict redox-sensitive cysteine sites. In cross-validation, the specificity (Sp), sensitivity (Sn), accuracy (Acc), and Matthews correlation coefficient (MCC) were 0.841, 0.868, 0.859, and 0.692, respectively. On the independent dataset, the Sp, Sn, Acc, and MCC were 0.906, 0.882, 0.890, and 0.767, respectively. Compared with existing prediction methods, iRSC-PseAAC showed a clear improvement. The method proposed in this study can also be applied to many problems in computational proteomics.
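For readers less familiar with the four reported metrics, the following is a minimal sketch of how Sn, Sp, Acc, and MCC are conventionally computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute Sn, Sp, Acc and MCC from binary confusion-matrix counts.

    Assumes at least one positive and one negative sample, so that
    the denominators of Sn and Sp are non-zero.
    """
    sn = tp / (tp + fn)                      # sensitivity: recall on positive sites
    sp = tn / (tn + fp)                      # specificity: recall on negative sites
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only (not data from the study).
example = binary_metrics(tp=80, tn=90, fp=10, fn=20)
print({name: round(value, 3) for name, value in example.items()})
```

Unlike Acc, MCC uses all four cells of the confusion matrix, so it remains informative when positive and negative sites are imbalanced, which is presumably why it is reported alongside the other three.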
Key words
redox-sensitive cysteine (RSC) / feature extraction / word embedding / linear discriminant analysis / machine learning
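Funding
National Natural Science Foundation of China (61761023)
Natural Science Foundation of Jiangxi Province (20202BABL202004)
Scientific Research Project of the Jiangxi Provincial Department of Education (GJJ212419)
Scientific Research Project of the Jiangxi Provincial Department of Education (GJJ2202814)
Scientific Research Project of the Jiangxi Provincial Department of Education (GJJ2202813)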
Publication year
2024