Dual-Client Selection Algorithm Based on Model Similarity and Local Loss
Federated learning is a distributed machine learning technique that collaboratively builds a global model by aggregating local model parameters from clients. Existing client selection algorithms for federated learning act either before or after local training. With statistically heterogeneous client data, pre-training selection algorithms may admit poorly performing clients into aggregation, reducing model accuracy, whereas post-training selection algorithms require all clients to participate in training, incurring substantial communication overhead. To address these issues, this study proposes a Dual-Client Selection (DCS) algorithm. DCS first selects a subset of clients before local training to reduce the number of global-model downloads and, after the subset has trained, chooses some of those clients for aggregation to reduce the number of local-model uploads. Before local training, the server performs hierarchical clustering based on the cosine similarity between the local and global models; the resulting clusters yield different selection probability distributions from which an unbiased training subset is drawn, allowing the algorithm to better adapt to the statistical heterogeneity of client data. After subset training, the server considers not only the local loss but also the cosine similarity between the local and global models when screening the aggregation subset, thereby improving the accuracy of the global model. Experimental results on the Fashion-MNIST and CIFAR-10 datasets show that DCS improves test accuracy by up to 8.55 percentage points compared with the baseline algorithm, while the uplink and downlink communication overheads are O(mn+2d) and O(dn+m), respectively.
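To make the two selection stages described above concrete, the following is a minimal Python sketch, not the paper's exact formulation. It assumes cosine similarity is computed over flattened model parameters and uses SciPy's agglomerative (hierarchical) clustering; the per-cluster sampling rule, the loss/similarity weighting alpha, and all function and variable names are illustrative assumptions.

# Minimal sketch of the two DCS selection stages (assumptions noted in comments).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def flatten(params):
    """Concatenate a list of parameter arrays into one vector."""
    return np.concatenate([p.ravel() for p in params])


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def select_training_subset(local_models, global_model, m, num_clusters=3, rng=None):
    """Pre-training stage: cluster clients by their cosine similarity to the
    global model, then sample m clients across clusters (one possible reading
    of the 'unbiased' training subset)."""
    rng = rng or np.random.default_rng()
    g = flatten(global_model)
    sims = np.array([cosine_similarity(flatten(w), g) for w in local_models])
    # Hierarchical (agglomerative) clustering on the 1-D similarity values.
    Z = linkage(sims.reshape(-1, 1), method="average")
    labels = fcluster(Z, t=num_clusters, criterion="maxclust")
    # Assumed sampling rule: draw from each cluster in proportion to its size.
    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        k = max(1, round(m * len(members) / len(sims)))
        selected.extend(rng.choice(members, size=min(k, len(members)), replace=False))
    return sorted(selected)[:m]


def select_aggregation_subset(trained, global_model, k, alpha=0.5):
    """Post-training stage: rank trained clients by a combined score of local
    loss and cosine similarity to the global model, and keep the best k.
    `trained` maps client id -> (local_params, local_loss)."""
    g = flatten(global_model)
    scores = {}
    for cid, (params, loss) in trained.items():
        sim = cosine_similarity(flatten(params), g)
        # Assumed combination: lower loss and higher similarity are both preferred.
        scores[cid] = alpha * loss + (1.0 - alpha) * (1.0 - sim)
    return sorted(scores, key=scores.get)[:k]

In a full federated round, select_training_subset would decide which clients receive the global model, and select_aggregation_subset would decide whose updates are averaged into it; how the paper actually assigns per-cluster probabilities and weights loss against similarity is not specified in the abstract.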

federated learning; client selection; model similarity; clustering; local loss

李红娇、王宝金、王朝晖、胡仁豪


College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China


National Natural Science Foundation of China (61702321)

2024

Computer Engineering (计算机工程)
East China Institute of Computing Technology; Shanghai Computer Society


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(8)