Research on Data Sharing for Federated Semi-Supervised Learning with Non-IID Data

Federated Learning (FL) is a distributed machine-learning method that protects the privacy and security of local data by training a shared model on decentralized devices. Typically, FL is performed when all data are labeled; in reality, however, the availability of labeled data is not always guaranteed, which motivates Federated Semi-Supervised Learning (FSSL). FSSL faces two major challenges: utilizing unlabeled data to improve system performance, and mitigating the negative effects of data heterogeneity. For the scenario in which labeled data exist only on the server, a method called Share&Mark is designed based on the concept of sharing and can be applied to FSSL systems. In Share&Mark, data shared by the clients are annotated by experts and then participate in federated training. In addition, to fully leverage the shared data, the ServerLoss aggregation algorithm dynamically adjusts the proportion of each client model during federated aggregation according to that model's loss on the server dataset. Experimental results under different sharing ratios are analyzed with respect to three factors: privacy sacrifice, communication overhead, and manual annotation cost. The analysis shows that a sharing ratio of approximately 3% balances all three. At this ratio, the Share&Mark method improves the accuracy of models trained by the FSSL system FedMatch by more than 8% on both the CIFAR-10 and Fashion-MNIST datasets, while also demonstrating strong robustness.
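The abstract states that ServerLoss weights each client model in the federated aggregation step according to its loss on the server's labeled dataset, with lower loss implying a larger share. The sketch below illustrates that idea in plain Python; the softmax-of-negative-loss weighting and the function names are illustrative assumptions, since the abstract does not give the exact formula.

```python
import math

def serverloss_weights(losses, temperature=1.0):
    """Map each client's loss on the server-held labeled dataset to an
    aggregation weight: lower loss -> larger weight, weights sum to 1.
    The softmax-of-negative-loss form is an assumption for illustration."""
    scaled = [-l / temperature for l in losses]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (lists of floats),
    as in federated averaging but with ServerLoss-style weights."""
    return [sum(w * params[i] for w, params in zip(weights, client_params))
            for i in range(len(client_params[0]))]
```

For example, clients with server-side losses `[0.2, 1.0, 3.0]` would receive strictly decreasing weights, so the client that fits the server's labeled data best dominates the aggregated model.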

Federated Semi-Supervised Learning (FSSL); Federated Learning (FL); non-Independent and Identically Distributed (non-IID) data; robustness; aggregation algorithm; data sharing

GU Yonggen, GAO Lingxuan, WU Xiaohong, TAO Jie


School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang, China


Funded by the Key Laboratory of Smart Management and Application of Modern Agricultural Resources of Zhejiang Province (2020E10017)

2024

Computer Engineering
East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428
Year, Volume (Issue): 2024, 50(6)