A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction

Post-translational modifications (PTMs) have key roles in extending the functional diversity of proteins and, as a result, regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize the body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. First, we comprehensively review all related databases and introduce all steps regarding dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which are divided into two computational groups: algorithmic and machine learning (ML). Additionally, we show that there are two main approaches to p-site prediction by ML, conventional and end-to-end deep learning methods, and give an overview of both. Moreover, this review introduces the most important feature extraction techniques that have mostly been used in p-site prediction. Finally, we create three test sets from new proteins related to the 2022 release of the database of protein post-translational modifications (dbPTM), based on general and human species. Evaluating online p-site prediction tools on proteins newly added in the dbPTM 2022 release, distinct from those in the dbPTM 2019 release, reveals their limitations. In other words, the actual performance of these online p-site prediction tools on unseen proteins is notably lower than the results reported in their respective research papers.

Keywords: Phosphorylation; Machine learning; Deep learning; Post-translational modification; Database

Farzaneh Esmaili, Mahdi Pourmirzaei, Shahin Ramazi, Seyedehsamaneh Shojaeilangari, Elham Yavari

Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran

Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 14115-111, Iran

Biomedical Engineering Group, Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology (IROST), Tehran 33535-111, Iran

2023

Genomics, Proteomics & Bioinformatics (English edition)
Beijing Institute of Genomics, Chinese Academy of Sciences

Indexed in: CSTPCD, CSCD
Impact factor: 0.495
ISSN: 1672-0229
Year, volume (issue): 2023.6(6)