An Active Semi-Supervised Short Text Classification Method Based on Federated Learning

Short-text classification is widely used and is an active research topic. However, its performance is hampered by the scarcity of annotated short-text data and by data privacy requirements that make centralized training impractical. To address these issues, we propose Fed-ASSL-HGAT (Active Semi-Supervised Learning empowered Heterogeneous Graph ATtention network model based on Federated learning). The model uses a novel active semi-supervised learning (ASSL) framework to generate high-quality labeled samples that empower the heterogeneous graph attention network (HGAT) model. Federated learning is introduced to jointly train the models deployed on different nodes, thereby satisfying data privacy protection requirements. The proposed ASSL framework substantially reduces annotation difficulty by transforming the multi-class annotation task of active learning into a binary annotation task. To prevent information loss, a selection strategy based on information gain is used to filter soft and hard labels. Semi-supervised learning then selects positive and negative samples with high accuracy and high stability for pseudo-labeling, which ensures labeling quality. Experimental results show that the proposed ASSL-HGAT (Active Semi-supervised Learning Empowered Heterogeneous Graph Attention Network) model improves F1 scores over the HGAT baseline by 2.45%, 8.11%, and 7.46% on the AGNews, Snippets, and TagMyNews datasets, respectively. With federated learning incorporated, the Fed-ASSL-HGAT model meets the performance requirements without leaking private data.
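The abstract states that models deployed on different nodes are trained jointly through federated learning, but it does not say which aggregation rule is used. As a minimal sketch only, assuming a FedAvg-style weighted average of client parameters (an assumption, not confirmed by the abstract), the server-side step might look like the following. The function name `federated_average` and the flat dictionary-of-lists parameter layout are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    This is a FedAvg-style rule used as an assumption; the abstract does not
    specify the aggregation actually used by Fed-ASSL-HGAT.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Only parameters leave the clients; the raw short texts stay on each node.
node_a = {"w": [1.0, 3.0]}   # trained on 1 local sample
node_b = {"w": [3.0, 5.0]}   # trained on 3 local samples
global_weights = federated_average([node_a, node_b], [1, 3])
```

In this sketch the node with more local samples pulls the global parameters further toward its own values, which is the usual motivation for size-weighted averaging.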

Keywords: heterogeneous graph neural network; active learning; semi-supervised learning; federated learning

Kong Deyan, Ji Zhenyan, Yang Yanyan, Liu Yang, Liu Jiqiang

School of Software Engineering, Beijing Jiaotong University, Beijing 100044

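Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, School of Cyberspace Security, Beijing Jiaotong University, Beijing 100044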

National Natural Science Foundation of China (52175493, 51935002)

2024

Acta Electronica Sinica
Chinese Institute of Electronics

CSTPCD; Peking University Core Journal
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(10)