Federated Learning (FL) is a distributed machine-learning method that protects the privacy and security of local data by training a shared model on decentralized devices. Typically, FL assumes that all data are labeled. In reality, however, labeled data are not always available, which has motivated Federated Semi-Supervised Learning (FSSL). FSSL faces two major challenges: exploiting unlabeled data to improve system performance and mitigating the negative effects of data heterogeneity. To address the scenario in which labeled data exist only on the server, a sharing-based method called Share&Mark is designed, which can be applied to FSSL systems. In Share&Mark, experts annotate data shared from client devices, and these annotated data then participate in federated training. In addition, to exploit the shared data fully, the ServerLoss aggregation algorithm dynamically adjusts the weight of each client model during federated aggregation according to its loss on the server dataset. Taking privacy sacrifice, communication cost, and manual annotation cost into account, experimental results for different sharing ratios are analyzed; a sharing ratio of approximately 3% is found to be a balanced choice across all factors. With the Share&Mark method, the FSSL system FedMatch achieves an accuracy improvement of more than 8% on the CIFAR-10 and Fashion-MNIST datasets, and it also demonstrates high robustness.
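The core idea of ServerLoss aggregation described above (clients whose models perform better on the server's labeled dataset receive larger aggregation weights) can be sketched as follows. This is a minimal illustration only: the softmax-of-negative-loss weighting and the `temperature` parameter are assumptions for the sketch, not the paper's exact formula.

```python
import math

def serverloss_weights(server_losses, temperature=1.0):
    # Hypothetical weighting: lower loss on the server dataset -> larger
    # aggregation weight. The exact form used by ServerLoss is not given
    # here; softmax over negative losses is one plausible choice.
    exps = [math.exp(-loss / temperature) for loss in server_losses]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(client_models, weights):
    # Weighted average of client model parameters; models are represented
    # as flat lists of floats for simplicity.
    n_params = len(client_models[0])
    return [sum(w * m[i] for w, m in zip(weights, client_models))
            for i in range(n_params)]

# Example: three clients; the second has the lowest server-side loss,
# so its parameters dominate the aggregated model.
losses = [0.9, 0.3, 0.6]
weights = serverloss_weights(losses)
models = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
global_model = aggregate(models, weights)
```

In this sketch the aggregation reduces to standard FedAvg when all server losses are equal, since the weights then become uniform.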
Key words
Federated Semi-Supervised Learning (FSSL) / Federated Learning (FL) / data non-Independent and Identically Distributed (non-IID) / robustness / aggregation algorithm / data sharing