Acta Electronica Sinica (电子学报), 2024, Vol. 52, Issue 9: 3228-3239. DOI: 10.12263/DZXB.20230124

Concept Drift Adaptive Prediction Method Based on Dynamic Sample Selection

Dai Jin (1), Li Hao (2), Wang Guoyin (3)

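Author information

  • 1. School of Software, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Key Laboratory of Computational Intelligence, Chongqing 400065
  • 2. School of Computer Science, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Key Laboratory of Computational Intelligence, Chongqing 400065
  • 3. Chongqing Key Laboratory of Computational Intelligence, Chongqing 400065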

Abstract

Concept drift is a major factor affecting the performance of stream data mining. It is currently handled mainly by incrementally updating or retraining models, an approach that does not make full use of existing knowledge. Starting from the comprehensive use of all samples, this paper proposes a concept drift adaptive prediction method based on dynamic sample selection. When new samples arrive, the method performs drift detection based on local consistency. When drift is detected, it removes the noisy samples in the affected region, and when a new concept is detected, it reuses historically similar concepts. Finally, multiple representative points are summarized for each class of samples in the region, and the prediction model is updated at the same time. The denoising effect is verified on synthetic datasets containing different drift types, and prediction tasks are carried out on real datasets. The experimental results show that the method effectively removes the drift noise caused by concept drift and improves the performance of the prediction model, and that its overall prediction performance is better than that of popular concept drift adaptive models.

Key words

concept drift / local drift detection / stream data / sample selection / sample denoising / adaptive prediction

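Funding

National Natural Science Foundation of China (61936001)

National Natural Science Foundation of China (62002037)

Natural Science Foundation of Chongqing (cstc2021jcyjmsxmX0849)

Natural Science Foundation of Chongqing (cstb2023nscq-LZX0006)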

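Publication year

2024
Acta Electronica Sinica (电子学报)
Chinese Institute of Electronics

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 1.237
ISSN: 0372-2112
Number of references: 27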