Using incremental learning to identify human infection by zoonotic coronaviruses
Since late 2019, the widespread outbreak of the novel coronavirus has had a severe impact on public health and social order. Machine learning-based prediction methods can determine the infectivity phenotype and pandemic risk of coronaviruses. At present, six classes of coronaviruses that infect humans have been identified. These viruses differ significantly in their genomic sequences, and their continuous genetic variation degrades the performance of machine learning models, which can forget previously learned knowledge when trained on new data. Working within an incremental learning framework, this study used a One-class SVM algorithm for continuous discrimination of novel coronavirus subgroups. It also combined parameter sharing with knowledge distillation to adapt a backpropagation (BP) neural network for continuous learning and prediction of the human-infecting phenotype of coronaviruses. The results indicate that the One-class SVM achieved the best classification performance for the six virus classes with the balancing parameter ν set to 0.92, 0.81, 0.24, 0.11, 0.55, and 0.2, respectively. The prediction model performed best when the number of hidden layer nodes was increased to 6, with a maximum Index of Agreement (IAC) value of 0.9035 and a maximum Bias Total (BT) value of -0.0399. This effectively suppressed the network's tendency to forget earlier knowledge, and the model's predictive performance was close to that of joint data training (IAC: 0.9236) and considerably better than that of neural networks without knowledge distillation (IAC: 0.7764). In comparison with other incremental methods, the approach also outperformed sample-based methods such as ESRIL (IAC: 0.8662) and model parameter-based methods such as CCLL (IAC: 0.8853). These results suggest that incremental learning can keep prediction of the human-infecting phenotype effective as coronaviruses continue to vary, which is relevant to public health applications.
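To illustrate how knowledge distillation can limit forgetting during incremental training, the following is a minimal sketch of a generic distillation loss. It is not the loss used in this study, whose exact form is not given here. The function names, the weighting factor `alpha`, and the softmax `temperature` are assumptions introduced for illustration only. The idea is that the updated network (student) is penalized both for misclassifying the new sample and for drifting away from the outputs of the previously trained network (teacher).

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of a hard-label loss and a soft-target (distillation) loss.

    alpha and temperature are hypothetical hyperparameters, not values
    reported in the study.
    """
    # Hard-label term: fit the true class of the new sample.
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    # Soft-target term: stay close to the old model's softened outputs.
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return (1.0 - alpha) * hard + alpha * soft


if __name__ == "__main__":
    teacher = [2.0, -1.0]  # outputs of the previously trained network
    agreeing = distillation_loss([2.0, -1.0], teacher, label=0)
    drifting = distillation_loss([-1.0, 2.0], teacher, label=0)
    print(agreeing < drifting)
```

In this sketch, a student whose outputs agree with the teacher incurs a smaller loss than one that has drifted, which is the mechanism that discourages overwriting earlier knowledge. Setting `alpha` to 0 recovers plain training without distillation.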