Research on Incremental Learning-Based Prediction of Human Infection by Coronaviruses

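Since late 2019, the novel coronavirus pandemic has severely affected public health and social order, and machine learning-based prediction methods can determine the infectivity phenotype and pandemic risk of coronaviruses. Six classes of coronaviruses that infect humans have been identified so far. Their genome sequences differ markedly, and continual genetic variation degrades the performance of machine learning models and can cause them to forget previously learned knowledge. Within an incremental learning framework, this study uses a One-class SVM algorithm to continually identify new coronavirus groups. It then adapts a backpropagation (BP) neural network with a combined parameter-sharing and knowledge-distillation strategy to continually learn and predict the human-infection phenotype of coronaviruses. The One-class SVM achieved the best group classification for the six virus classes with the trade-off parameter ν set to 0.92, 0.81, 0.24, 0.11, 0.55, and 0.20. The prediction model performed best when the number of hidden-layer node batches increased to 6. At that point IAC reached its maximum of 0.9035 and BT reached its maximum of -0.0399, effectively suppressing the forgetting trend of the neural network. The model's predictive performance approached that of joint training (IAC: 0.9236) and was clearly better than that of the neural network without knowledge distillation (IAC: 0.7764). Compared with other incremental methods, it also outperformed the sample-based ESRIL method (IAC: 0.8662) and the model-parameter-based CCLL method (IAC: 0.8853). The approach has significant value for public health applications.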
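The abstract does not give the kernel or feature map. Assuming the study uses the standard ν-formulation of the One-class SVM, the per-class trade-off parameter ν reported above is the ν in the problem below. Here φ maps a sequence-derived feature vector x_i of one virus class into the kernel feature space, and n is the number of training samples in that class:

```latex
\begin{aligned}
\min_{w,\,\xi,\,\rho}\quad & \frac{1}{2}\lVert w \rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho \\
\text{s.t.}\quad & \langle w, \phi(x_i) \rangle \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n \\
& f(x) = \operatorname{sgn}\bigl(\langle w, \phi(x) \rangle - \rho\bigr)
\end{aligned}
```

Under this formulation, ν ∈ (0, 1] is an upper bound on the fraction of a class's training samples that fall outside the learned boundary (f(x) = -1). It is also a lower bound on the fraction of support vectors. A class fitted with ν = 0.92 therefore tends to get a much tighter boundary than one fitted with ν = 0.11. Presumably, a sequence rejected by every class's model would be flagged as a possible new group, but the abstract does not spell out this rule.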
Using incremental learning to identify human infection of zoonotic coronavirus
Since late 2019, the widespread outbreak of the novel coronavirus has severely affected public health and social order. Machine learning-based prediction methods can determine the infectivity phenotype and pandemic risk of coronaviruses. Six classes of coronaviruses that infect humans have been identified so far. Their genome sequences differ markedly, and continual genetic variation degrades the performance of machine learning models and can cause them to forget previously learned knowledge. Based on an incremental learning framework, this study employed a One-class SVM algorithm to continually identify new coronavirus groups. It further adapted a backpropagation (BP) neural network with a combined strategy of parameter sharing and knowledge distillation for the continual learning and prediction of the human-infecting phenotype of coronaviruses. The results indicate that the One-class SVM achieved the optimal classification performance for the six virus classes with trade-off parameters ν of 0.92, 0.81, 0.24, 0.11, 0.55, and 0.20. The prediction model performed best when the number of hidden-layer node batches increased to 6, reaching a maximum Index of Agreement (IAC) of 0.9035 and a maximum Bias Total (BT) of -0.0399. This effectively suppressed the forgetting trend of the network, bringing its predictive performance close to that of joint training (IAC: 0.9236). It was also significantly better than that of the neural network without knowledge distillation (IAC: 0.7764). Compared with other incremental methods, our approach also outperformed the sample-based ESRIL method (IAC: 0.8662) and the model parameter-based CCLL method (IAC: 0.8853). This research has important implications for public health applications.
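The abstract names knowledge distillation as one half of the strategy that keeps the BP network from forgetting earlier virus groups, but it gives no loss formula. The sketch below assumes the common temperature-softened distillation loss. When the network is updated on a new group, a frozen snapshot of the previous model acts as the teacher, and the loss adds T²·KL(teacher ‖ student) to the usual cross-entropy. The helper names, the temperature `temperature`, the weight `lam`, and the two-class encoding are all hypothetical. The parameter-sharing half (how hidden-node batches are added) is not modeled.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: the source does not give its loss function,
# temperature, or weighting; temperature=2.0 and lam=1.0 are hypothetical defaults.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class (clipped to avoid log(0))."""
    return -math.log(max(probs[label], 1e-12))


def distillation_term(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float) -> float:
    """T^2 * KL(p_teacher || p_student) computed on temperature-softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(max(pt, 1e-12)) - math.log(max(ps, 1e-12)))
             for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl


def incremental_loss(student_logits: Sequence[float], label: int,
                     teacher_logits: Sequence[float],
                     temperature: float = 2.0, lam: float = 1.0) -> float:
    """Cross-entropy on a sample from the new virus group plus a distillation term
    that keeps the updated network's outputs close to the frozen previous model's."""
    hard = cross_entropy(softmax(student_logits), label)
    return hard + lam * distillation_term(teacher_logits, student_logits, temperature)


if __name__ == "__main__":
    # Hypothetical binary encoding: index 0 = infects humans, index 1 = does not.
    student = [2.0, -1.0]   # logits from the network being updated
    teacher = [1.5, -0.5]   # logits from the snapshot trained on earlier groups
    print(incremental_loss(student, 0, teacher))
```

Setting `lam` to 0 reduces the loss to plain cross-entropy on the new data. That is roughly the loss-level analogue of the "without knowledge distillation" baseline the abstract compares against.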

Keywords: incremental learning; coronavirus; spike protein; human infection

Authors: Yang Xiaoyu (杨晓宇), Shen Ao (沈骜), Shen Jiahao (沈嘉豪), Liao Yuqiong (廖玉琼), Qiang Xiaoli (强小利), Kou Zheng (寇铮)


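Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong 510006

School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, Guangdong 510006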


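Funding: National Natural Science Foundation of China (61972109, 62172114); Guangdong Basic and Applied Basic Research Foundation (2022A1515011468); Guangzhou Science and Technology Plan (202201020237, SL2022A03J01035)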

2024

Journal of Guangzhou University (Natural Science Edition)
Guangzhou University


Impact factor: 0.293
ISSN: 1671-4229
Year, volume (issue): 2024, 23(2)