Semi-supervised Online Classification Method for Multi-label Data Stream Based on Kernel Extreme Learning Machine

Streaming data in practical applications arrives at high speed, in massive volume, and changes dynamically. Such streams often carry multiple labels while only a small amount of the data is labeled, which raises the problems of concept drift and missing labels in the multi-label setting. To address these problems, this paper proposes a semi-supervised online classification method for multi-label data streams based on the kernel extreme learning machine. First, to handle missing labels in the multi-label data stream, the stream is divided into k blocks using a sliding window; a feature similarity matrix and a label similarity matrix are constructed for each block and incorporated into the training of the kernel extreme learning machine. To suit the characteristics of streaming data, an incremental update mechanism is also designed, yielding a semi-supervised online kernel extreme learning machine. Second, to cope with concept drift in the data stream, a timestamp-based discard-and-update mechanism is adopted: a data size is preset, and once the data reaches that size, the oldest unlabeled data is discarded and the new data is added for updating. Finally, experiments on 10 multi-label datasets show that the proposed method adapts well to missing labels and concept drift while maintaining good classification performance.
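
The abstract gives no formulas, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It stands in a Laplacian-regularized kernel least-squares model (in the style of LapRLS) for the semi-supervised kernel extreme learning machine, and it keeps a fixed-size buffer that drops the oldest unlabeled sample once the preset size is reached. Several things here are assumptions: the RBF kernel, reusing the kernel affinities as the feature similarity graph, omitting the label similarity matrix, refitting on every chunk instead of updating incrementally, the 0.5 decision threshold, and every name (`WindowedSemiSupervisedKELM`, `max_size`, `reg`, `lap`), all of which are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


class WindowedSemiSupervisedKELM:
    """Illustrative stand-in only; all names and defaults are assumptions."""

    def __init__(self, max_size=200, gamma=1.0, reg=1e-2, lap=1e-2):
        self.max_size = max_size  # preset data size of the buffer
        self.gamma = gamma        # RBF kernel width
        self.reg = reg            # ridge regularization weight
        self.lap = lap            # graph (manifold) regularization weight
        self.buffer = []          # (features, label vector or None), oldest first
        self.X = None
        self.alpha = None

    def _evict(self):
        # Discard the oldest unlabeled sample; if none exists, the oldest sample.
        for i, (_, y) in enumerate(self.buffer):
            if y is None:
                self.buffer.pop(i)
                return
        self.buffer.pop(0)

    def partial_fit(self, X_chunk, Y_chunk):
        # Y_chunk holds one 0/1 label vector per sample, or None if unlabeled.
        for x, y in zip(X_chunk, Y_chunk):
            if len(self.buffer) >= self.max_size:
                self._evict()
            y = None if y is None else np.asarray(y, dtype=float)
            self.buffer.append((np.asarray(x, dtype=float), y))
        self._refit()
        return self

    def _refit(self):
        labeled = np.array([y is not None for _, y in self.buffer])
        if not labeled.any():
            return  # nothing to learn from yet
        X = np.stack([x for x, _ in self.buffer])
        Y_lab = np.stack([y for _, y in self.buffer if y is not None])
        Y = np.zeros((len(X), Y_lab.shape[1]))
        Y[labeled] = Y_lab                       # unlabeled rows stay zero
        K = rbf_kernel(X, X, self.gamma)
        L = np.diag(K.sum(axis=1)) - K           # graph Laplacian of the affinities
        J = np.diag(labeled.astype(float))       # selects the labeled samples
        A = J @ K + self.reg * np.eye(len(X)) + self.lap * (L @ K)
        self.alpha = np.linalg.solve(A, Y)
        self.X = X

    def predict(self, X_query, threshold=0.5):
        Xq = np.asarray(X_query, dtype=float)
        scores = rbf_kernel(Xq, self.X, self.gamma) @ self.alpha
        return (scores >= threshold).astype(int)
```

Each call to `partial_fit` plays the role of one block from the sliding window: the chunk is appended, old unlabeled samples are evicted as needed, and the model is refit on whatever remains in the buffer. A faithful implementation would replace the full refit with the paper's incremental update and add the label similarity term.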

Data Stream Classification; Semi-supervised Classification; Multi-label Classification; Kernel Extreme Learning Machine; Concept Drift

Wang Yuchen (王雨晨), Qiu Shiyuan (邱士远), Li Peipei (李培培), Hu Xuegang (胡学钢)

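School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601

Key Laboratory of Knowledge Engineering with Big Data (Ministry of Education), Hefei University of Technology, Hefei 230009

Institute of Health Big Data and Population Medicine, Institute of Health and Medicine, Hefei Comprehensive National Science Center, Hefei 230032

Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009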

2024

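Pattern Recognition and Artificial Intelligence
Chinese Association of Automation; National Research Center for Intelligent Computing Systems; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences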

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.954
ISSN: 1003-6059
Year, volume (issue): 2024, 37(8)