Time series classification method for distributed system label noise (面向分布式系统标签噪声的时间序列分类方法)

Abstract: Time series data are widely present on distributed edge devices in industrial, healthcare, and other application fields. Because such data often have features that humans cannot readily recognize, time series classification tasks based on real-world data commonly suffer from data silos ("data islands") and labeling errors. To address this difficulty in distributed data environments, a federated time-series filtering framework is proposed. The framework exploits the strength of self-supervised contrastive learning in extracting representations of complex time series data and combines it with federated learning to address the privacy and security issues of distributed systems while reducing communication cost. First, a set of benchmark samples is maintained on the server, and a time-series-augmentation pre-supervised strategy based on a discriminative contrastive loss and a predictive contrastive loss is applied; a pre-training and fine-tuning procedure then yields a pre-supervised model whose time series representations generalize well. Next, a new label noise filtering method is introduced: pseudo-labels guided by the pre-supervised model are used together with the locally annotated labels to filter noisy data on each device, and the resulting clean dataset is used to train the global model. Finally, the framework's effectiveness is validated under various types of label noise, the influence of different proportions of benchmark data on the framework is examined, and ablation experiments verify the filtering effect of each loss in the pre-supervised model.
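The label noise filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact filtering rule, so the version below assumes the simplest form of agreement: a sample is kept only when the pseudo-label from the pre-supervised model matches the locally annotated label. The names `filter_noisy_samples`, `predict_proba`, `min_confidence`, and `toy_predict_proba` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]


def filter_noisy_samples(
    samples: List[Sample],
    local_labels: List[int],
    predict_proba: Callable[[Sample], Sequence[float]],
    min_confidence: float = 0.0,
) -> Tuple[List[Sample], List[int]]:
    """Keep samples whose local label agrees with the model's pseudo-label.

    Assumption: agreement-based filtering; the paper's actual rule may differ.
    """
    clean_x: List[Sample] = []
    clean_y: List[int] = []
    for x, y in zip(samples, local_labels):
        probs = predict_proba(x)
        # Pseudo-label = index of the most probable class.
        pseudo = max(range(len(probs)), key=probs.__getitem__)
        if pseudo == y and probs[pseudo] >= min_confidence:
            clean_x.append(x)
            clean_y.append(y)
    return clean_x, clean_y


def toy_predict_proba(x: Sample) -> Sequence[float]:
    # Hypothetical stand-in for the pre-supervised model:
    # predicts class 1 when the series mean is positive, else class 0.
    mean = sum(x) / len(x)
    return [0.1, 0.9] if mean > 0 else [0.9, 0.1]


samples = [
    [1.0, 2.0, 3.0],
    [-1.0, -2.0, -3.0],
    [0.5, 1.5, 0.5],
    [-0.5, -1.0, -2.0],
]
local_labels = [1, 0, 0, 1]  # the last two labels are deliberately wrong

clean_x, clean_y = filter_noisy_samples(samples, local_labels, toy_predict_proba)
print(clean_y)  # [1, 0]
```

In a federated setting, each device would run such a filter locally and train only on the retained samples, so raw data never leave the device. The optional `min_confidence` threshold is an added assumption that allows low-confidence agreements to be discarded as well.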

Keywords:

federated learning; self-supervised learning; time series classification; label noise; distributed system

Authors:

Lin Ziqian (林子谦), Zhang Kun (张坤), Fan Chongjun (樊重俊), Yang Xiajie (杨夏洁)

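Author affiliations:

School of Management, University of Shanghai for Science and Technology, Shanghai 200093

School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433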

Publication year:

2024

DOI:

10.13195/j.kzyjc.2023.1576

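Journal: Control and Decision (控制与决策), Northeastern University (东北大学)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.227

ISSN: 1001-0920

Year, volume (issue): 2024, 39(12)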