Journal of Computer Research and Development, 2024, Vol. 61, Issue 1: 172-183. DOI: 10.7544/issn1000-1239.202220835

Concept Drift Convergence Method Based on Adaptive Deep Ensemble Networks

Guo Husheng 1, Sun Ni 2, Wang Jiahao 2, Wang Wenjian 1

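Author information

  • 1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006
  • 2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006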

Abstract

Concept drift is an important and challenging problem in streaming data mining. However, most existing methods can handle only linear or simple nonlinear mappings. Deep neural networks have strong nonlinear fitting ability, but in streaming data mining only one sample or one batch of samples is available for training at a time, so the learning model is hard to adjust in real time to a dynamically changing data stream. To address this problem, the error-correction idea of the gradient boosting algorithm is introduced into streaming data mining with concept drift, and a concept drift convergence method based on adaptive deep ensemble networks (CD_ADEN) is proposed. The model integrates multiple shallow neural networks as base learners, and each subsequent base learner corrects the output of the preceding base learners, which gives the final output high real-time generalization performance. In addition, because shallow neural networks converge quickly, the proposed model can recover quickly from the accuracy drop caused by concept drift. Experimental results on multiple datasets show that CD_ADEN improves average real-time accuracy by 1%-5% over the comparison methods and ranks first in average rank among 7 typical comparison algorithms. These results indicate that the proposed method can correct the errors of preceding outputs and that the learning model recovers quickly from the accuracy drop caused by concept drift, which improves the real-time generalization performance of the online learning model.
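The error-correction idea summarized in the abstract can be illustrated with a minimal sketch. It is not the CD_ADEN implementation: the abstract does not specify the loss, the network sizes, or how base-learner outputs are combined, so the squared loss, the additive combination, the ±1 label encoding, the synthetic stream, and all names (`ShallowNet`, `BoostedEnsemble`, `run_stream`) are assumptions made only for illustration. Each shallow network is updated online toward the residual left by the learners before it, and the stream is evaluated test-then-train with an abrupt drift halfway through.

```python
import math
import random


class ShallowNet:
    """One-hidden-layer tanh network with a scalar output, trained by plain SGD."""

    def __init__(self, n_in, n_hidden, lr, rng):
        self.lr = lr
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def update(self, x, target):
        """One SGD step on the squared error between the output and `target`."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            grad_h = err * self.w2[j] * (1.0 - hj * hj)  # uses w2 before its update
            self.w2[j] -= self.lr * err * hj
            self.b1[j] -= self.lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * grad_h * xi
        self.b2 -= self.lr * err


class BoostedEnsemble:
    """Additive ensemble: each learner fits what the earlier learners still miss."""

    def __init__(self, n_in, n_learners=3, n_hidden=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.learners = [ShallowNet(n_in, n_hidden, lr, rng)
                         for _ in range(n_learners)]

    def predict(self, x):
        score = sum(net.predict(x) for net in self.learners)
        return 1 if score > 0.0 else 0

    def learn_one(self, x, y):
        target = 1.0 if y == 1 else -1.0   # assumed +1/-1 encoding of the label
        partial = 0.0
        for net in self.learners:
            # For squared loss the negative gradient is exactly this residual.
            net.update(x, target - partial)
            partial += net.predict(x)


def run_stream(n_per_concept=1500, seed=1):
    """Test-then-train on a synthetic stream whose labelling rule flips midway."""
    rng = random.Random(seed)
    model = BoostedEnsemble(n_in=2)
    hits = []
    for t in range(2 * n_per_concept):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = int(x[0] > x[1])
        if t >= n_per_concept:          # abrupt concept drift
            y = 1 - y
        hits.append(int(model.predict(x) == y))  # predict first
        model.learn_one(x, y)                    # then learn from the sample
    return hits
```

Averaging `run_stream()` over a sliding window gives a rough real-time accuracy curve; in this toy setting the curve is expected to dip at the drift point and then climb back as the shallow learners readapt.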

Key words

streaming data/concept drift/gradient boosting/deep learning/quick adaptation


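Funding

National Natural Science Foundation of China (62276157)

National Natural Science Foundation of China (U21A20513)

National Natural Science Foundation of China (62076154)

Key Research and Development Program of Shanxi Province (202202020101003)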

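Publication year

2024
Journal of Computer Research and Development
Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation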

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 2.649
ISSN: 1000-1239
References: 7