
Online Active Learning Method for Imbalanced Data Stream

Data stream classification is an important research task in data stream mining; its goal is to capture evolving class structures from massive, continuously changing data. At present, almost no framework can simultaneously handle the problems common in data streams: multi-class imbalance, concept drift, outliers, and the high cost of labeling samples. To address this, we propose an online active learning method for imbalanced data stream (OALM-IDS). AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier; AdaBoost.M2 further introduces the confidence of weak classifiers, but methods of this kind are usually applied to static data. First, we define an importance measure for training samples based on the imbalance ratio and an adaptive forgetting factor, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of the ensemble classifier. Second, we propose an adaptive adjustment method for the margin threshold matrix, which optimizes the label request strategy. Finally, we incorporate the degree of concept drift into model construction by defining an adaptive forgetting factor based on a concept drift index, which enables the model to be rebuilt after a drift. Comparative experiments on six artificial and four real data streams show that OALM-IDS achieves better classification performance than five other learning methods for imbalanced data streams.
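The abstract names three ingredients (an imbalance-ratio sample weight with a forgetting factor, a margin threshold matrix that decides when to request a label, and a forgetting factor that adapts to the degree of drift) but does not give their formulas. The Python sketch below is therefore only a generic illustration under stated assumptions, not the OALM-IDS method. The class `ToyStreamState`, its parameters (`base_threshold`, `min_decay`, `max_decay`, `step`), the linear drift-to-decay mapping and the fixed-step threshold update are all hypothetical choices made for this example. The boosted ensemble itself (AdaBoost.M2 adapted to streams) is not shown; the ensemble scores are taken as given.

```python
from typing import Dict, Hashable, Iterable, Tuple


class ToyStreamState:
    """Toy online bookkeeping for an imbalanced, drifting data stream.

    Illustrative assumptions only: the forgetting rule, the sample weight and
    the threshold update are NOT the formulas defined for OALM-IDS.
    """

    def __init__(self, classes: Iterable[Hashable], base_threshold: float = 0.2,
                 min_decay: float = 0.90, max_decay: float = 0.999) -> None:
        self.classes = list(classes)  # needs at least two classes
        self.min_decay, self.max_decay = min_decay, max_decay
        self.decay = max_decay  # forgetting factor; 1.0 would mean "never forget"
        # Exponentially time-decayed count of labeled samples per class.
        self.counts: Dict[Hashable, float] = {c: 0.0 for c in self.classes}
        # One margin threshold per ordered (top class, runner-up class) pair,
        # i.e. a class-by-class threshold matrix without its diagonal.
        self.theta: Dict[Tuple[Hashable, Hashable], float] = {
            (a, b): base_threshold
            for a in self.classes for b in self.classes if a != b
        }

    def set_decay_from_drift(self, drift_index: float) -> None:
        """Hypothetical rule: a larger drift index in [0, 1] gives faster forgetting."""
        d = min(max(drift_index, 0.0), 1.0)
        self.decay = self.max_decay - (self.max_decay - self.min_decay) * d

    def observe_label(self, label: Hashable) -> None:
        """Fade every class count, then add the newly labeled sample."""
        for c in self.classes:
            self.counts[c] *= self.decay
        self.counts[label] += 1.0

    def sample_weight(self, label: Hashable) -> float:
        """Imbalance-ratio weight: largest class count / count of this label.

        Majority-class samples get 1.0, minority-class samples get more;
        classes not yet seen fall back to 1.0.
        """
        own = self.counts[label]
        return max(self.counts.values()) / own if own > 0 else 1.0

    def should_request_label(self, scores: Dict[Hashable, float]) -> bool:
        """Ask the oracle when the top-two ensemble score margin is below threshold."""
        top, second = sorted(scores, key=scores.get, reverse=True)[:2]
        return scores[top] - scores[second] < self.theta[(top, second)]

    def adjust_threshold(self, top: Hashable, second: Hashable,
                         was_correct: bool, step: float = 0.01) -> None:
        """Hypothetical update: widen the query band after an error, narrow it after a hit."""
        key = (top, second)
        delta = -step if was_correct else step
        self.theta[key] = min(max(self.theta[key] + delta, 0.0), 1.0)


if __name__ == "__main__":
    state = ToyStreamState(["a", "b", "c"])
    for y in ["a"] * 9 + ["b"]:
        state.observe_label(y)
    print(state.sample_weight("a"), state.sample_weight("b"))  # 1.0 and a value above 1
    print(state.should_request_label({"a": 0.50, "b": 0.45, "c": 0.05}))  # True: margin below 0.2
```

In a full stream learner, `sample_weight` would scale how strongly a newly labeled sample updates the boosted ensemble, and `set_decay_from_drift` would be driven by whatever drift detector supplies the drift index; both hookups are assumptions of this sketch.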

Keywords: active learning; data stream classification; multi-class imbalance; concept drift

Li Yanhong (李艳红), Ren Lin (任霖), Wang Suge (王素格), Li Deyu (李德玉)


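School of Computer and Information Technology, Shanxi University, Taiyuan 030006

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006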


Funding: National Natural Science Foundation of China (62076158, 62072294, 41871286); Key Research and Development Program of Shanxi Province (201903D421041)

2024

Acta Automatica Sinica (自动化学报)
Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.762
ISSN: 0254-4156
Year, Volume (Issue): 2024, 50(7)