Research on Data Sharing for Federated Semi-Supervised Learning with Non-IID Data

Federated Learning (FL) is a distributed machine-learning method that protects the privacy and security of local data by training a shared model on decentralized devices. Typically, FL is performed when all data are labeled; in reality, however, the availability of labeled data is not always guaranteed, which motivates Federated Semi-Supervised Learning (FSSL). FSSL faces two major challenges: utilizing unlabeled data to improve system performance, and mitigating the negative effects of data heterogeneity. For the scenario in which labeled data exist only on the server, a method called Share&Mark is designed based on the concept of sharing and can be applied to FSSL systems. In Share&Mark, data shared by the clients are annotated by experts and then participate in federated training. In addition, to fully leverage the shared data, the ServerLoss aggregation algorithm dynamically adjusts the proportion of each client model during federated aggregation according to that model's loss on the server dataset. Experimental results under different sharing ratios are analyzed with respect to three factors: privacy sacrifice, communication overhead, and manual annotation cost. The analysis shows that a sharing ratio of approximately 3% balances all three. At this ratio, the Share&Mark method improves the accuracy of models trained by the FSSL system FedMatch by more than 8% on both the CIFAR-10 and Fashion-MNIST datasets, while also demonstrating strong robustness.
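The abstract states that ServerLoss weights each client model in the federated aggregation step according to its loss on the server's labeled dataset, with lower loss implying a larger share. The sketch below illustrates that idea in plain Python; the softmax-of-negative-loss weighting and the function names are illustrative assumptions, since the abstract does not give the exact formula.

```python
import math

def serverloss_weights(losses, temperature=1.0):
    """Map each client's loss on the server-held labeled dataset to an
    aggregation weight: lower loss -> larger weight, weights sum to 1.
    The softmax-of-negative-loss form is an assumption for illustration."""
    scaled = [-l / temperature for l in losses]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (lists of floats),
    as in federated averaging but with ServerLoss-style weights."""
    return [sum(w * params[i] for w, params in zip(weights, client_params))
            for i in range(len(client_params[0]))]
```

For example, clients with server-side losses `[0.2, 1.0, 3.0]` would receive strictly decreasing weights, so the client that fits the server's labeled data best dominates the aggregated model.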

Federated Semi-Supervised Learning (FSSL); Federated Learning (FL); non-Independent and Identically Distributed (non-IID) data; robustness; aggregation algorithm; data sharing

GU Yonggen, GAO Lingxuan, WU Xiaohong, TAO Jie


School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang, China


Funded by the Key Laboratory of Smart Management and Application of Modern Agricultural Resources of Zhejiang Province (2020E10017)

2024

Computer Engineering
East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428
Year, Volume (Issue): 2024, 50(6)