Research on Data Sharing of Federated Semi-Supervised Learning with Non-IID
Federated Learning (FL) is a distributed machine-learning method that protects the privacy and security of local data by training a shared model on decentralized devices. Typically, FL is performed when all data are labeled. However, in reality, the availability of labeled data is not always guaranteed; therefore, Federated Semi-Supervised Learning (FSSL) has been proposed. FSSL faces two major challenges: utilizing unlabeled data to improve system performance and mitigating the negative effects of data heterogeneity. To address the scenario in which labeled data exist only on the server, a method called Share&Mark is designed based on the concept of data sharing; it can be applied to FSSL systems. In Share&Mark, experts annotate data shared from client devices, and the annotated data then participate in federated training. In addition, to leverage the shared data fully, the ServerLoss aggregation algorithm dynamically adjusts the weights of the client models during federated aggregation based on their respective loss values on the server dataset. Considering the privacy sacrifice, communication cost, and manual annotation cost, experimental results for different sharing ratios are analyzed; a sharing ratio of approximately 3% is found to be a balanced choice across all factors. With the Share&Mark method, the FSSL system FedMatch achieves an accuracy improvement of more than 8% on the CIFAR-10 and Fashion-MNIST datasets and also demonstrates high robustness.
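The ServerLoss idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the softmax-over-negative-losses weighting, and the flattened parameter representation are all assumptions; the paper only states that client weights are adjusted according to each client model's loss on the server dataset, with lower loss implying a larger share in the aggregate.

```python
import math

def serverloss_aggregate(client_models, server_losses):
    """Hypothetical ServerLoss-style aggregation step.

    client_models : list of flattened parameter vectors (lists of floats)
    server_losses : loss of each client model on the server's labeled
                    dataset; a lower loss yields a larger weight
    """
    # Assumed weighting choice: softmax over negative losses, so that
    # weights are positive and sum to 1.
    exps = [math.exp(-loss) for loss in server_losses]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the client parameter vectors.
    n_params = len(client_models[0])
    return [sum(w * model[k] for w, model in zip(weights, client_models))
            for k in range(n_params)]

# Example: the client with the lower server loss dominates the aggregate.
models = [[1.0, 1.0], [0.0, 0.0]]
agg = serverloss_aggregate(models, server_losses=[0.2, 2.0])
```

In this sketch the first client's low server loss (0.2 vs. 2.0) gives it most of the aggregation weight, which is the qualitative behavior the abstract attributes to ServerLoss.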
Keywords: Federated Semi-Supervised Learning (FSSL); Federated Learning (FL); non-Independent and Identical Distribution (non-IID) data; robustness; aggregation algorithm; data sharing