A computational model to identify fertility-related proteins using sequence information

Fertility is the most crucial step in the development process and is controlled by many fertility-related proteins, including spermatogenesis-, oogenesis- and embryogenesis-related proteins. Identifying fertility-related proteins can provide important clues to the roles these proteins play in development. In this study, we therefore constructed a two-layer classifier to identify fertility-related proteins. We first encoded the sequences of these three classes of fertility-related proteins using amino acid (AA) composition and the physical and chemical properties of amino acids. The feature set was then optimized by analysis of variance (ANOVA) and incremental feature selection (IFS) to obtain the optimal feature subset. The performance of models built with different machine learning (ML) methods was evaluated and compared using five-fold cross-validation (CV) and an independent test set. Finally, based on a support vector machine (SVM), we obtained a two-layer model that classifies the three classes of fertility-related proteins. On the independent test set, the accuracy (ACC) and the area under the receiver operating characteristic curve (AUC) were 81.95% and 0.89, respectively, for the first-layer classifier, and 84.74% and 0.90, respectively, for the second-layer classifier. These results show that the proposed model has stable performance and satisfactory prediction accuracy, and that it can serve as a powerful tool for identifying further fertility-related proteins.
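The abstract names four steps: composition plus physicochemical encoding, ANOVA-based feature ranking, incremental feature selection (IFS), and five-fold cross-validation. The following is a minimal, dependency-free sketch of how those steps fit together, not the authors' implementation. Several assumptions are made: the Kyte-Doolittle hydropathy scale is a hypothetical stand-in for the unspecified physicochemical properties, a nearest-centroid classifier replaces the SVM so the example runs with the standard library only, folds are assigned by index modulo 5 rather than stratified, and the function names (`encode`, `anova_f`, `rank_features`, `ifs`, `nearest_centroid_cv`) and toy sequences are illustrative.

```python
"""Hypothetical sketch of AAC/property encoding, ANOVA ranking and IFS.

Not the authors' code: the hydropathy scale and the nearest-centroid
classifier (standing in for the SVM) are illustrative assumptions.
"""
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used here as one example physicochemical
# property (assumed; the study's exact property set is not given here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Return 20 AA-composition fractions plus mean hydropathy (21 features)."""
    residues = [a for a in seq.upper() if a in KYTE_DOOLITTLE]  # drop X, B, etc.
    n = len(residues)
    if n == 0:
        return [0.0] * (len(AMINO_ACIDS) + 1)
    composition = [residues.count(a) / n for a in AMINO_ACIDS]
    return composition + [sum(KYTE_DOOLITTLE[a] for a in residues) / n]


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        return 0.0
    grand = mean(values)
    means = {c: mean(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, y):
    """Feature indices ordered by decreasing F score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def ifs(X, y, ranking, evaluate):
    """Incremental feature selection: grow the subset one ranked feature at a
    time and keep the subset with the best `evaluate(X_subset, y)` score."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        score = evaluate([[row[j] for j in cols] for row in X], y)
        if score > best_score:  # strict: ties keep the smaller subset
            best_k, best_score = k, score
    return ranking[:best_k], best_score


def nearest_centroid_cv(X, y, folds=5):
    """k-fold CV accuracy of a nearest-centroid classifier (SVM stand-in).
    Sample i goes to fold i % folds; a real study would stratify."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        centroids = {
            c: [mean(col) for col in zip(*[X[i] for i in train if y[i] == c])]
            for c in {y[i] for i in train}
        }
        for i in range(f, len(X), folds):  # held-out samples of fold f
            pred = min(centroids, key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(X[i], centroids[c])))
            correct += pred == y[i]
    return correct / len(X)


if __name__ == "__main__":
    # Made-up toy sequences and labels, not data from the study.
    seqs = ["MKKLLA", "MKKLLV", "MKRLLA", "MKKILA", "MKKLIA",
            "MDEEST", "MDEESS", "MDDEST", "MEEEST", "MDEQST"]
    labels = [0] * 5 + [1] * 5
    X = [encode(s) for s in seqs]
    subset, acc = ifs(X, labels, rank_features(X, labels), nearest_centroid_cv)
    print(f"kept {len(subset)} features, 5-fold CV accuracy {acc:.2f}")
```

To move closer to the setup the abstract describes, the `evaluate` argument could wrap an SVM scored by five-fold CV (for example scikit-learn's `SVC` with `cross_val_score`), and `anova_f` corresponds to what scikit-learn's `f_classif` computes per feature. Because `ifs` only replaces the best subset on a strictly higher score, ties favour fewer features.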

Keywords: fertility-related proteins; machine learning; sequence information; feature selection

Yan LIN, Jiashu WANG, Xiaowei LIU, Xueqin XIE, De WU, Junjie ZHANG, Hui DING

Key Laboratory for Animal Disease-Resistance Nutrition of the Ministry of Agriculture,Animal Nutrition Institute,Sichuan Agricultural University,Chengdu 611130,China

School of Life Science and Technology and Center for Informational Biology,University of Electronic Science and Technology of China,Chengdu 610054,China

Funding: Sichuan Major Science and Technology Project; National Natural Science Foundation of China

Grant numbers: 2021ZDZX0009; 035Z2060

2024

Frontiers of Computer Science
Higher Education Press

Indexed in: CSTPCD, EI
Impact factor: 0.303
ISSN:2095-2228
Year, volume (issue): 2024, 18(1)