Computer and Digital Engineering (计算机与数字工程), 2024, Vol. 52, Issue 1: 18-22, 42. DOI: 10.3969/j.issn.1672-9722.2024.01.003

Multi-label Text Classification Based on Relationship Mining and Adversarial Training

杨冬菊 ¹  程伟飞 ¹

Author Information

  • 1. School of Information, North China University of Technology, Beijing 100144; Beijing Key Laboratory of Integration and Analysis Technology on Large-scale Stream Data (North China University of Technology), Beijing 100144


Abstract

Traditional multi-label text classification methods ignore label semantics and do not fully exploit the relationships between text and labels or among the labels themselves. In this paper, a multi-label text classification model based on relationship mining and adversarial training is proposed to solve these problems. The BERT model and the Graph Attention Network (GAT) are used to extract the semantic information of the text and to mine the relationships between labels, respectively. First, the text is encoded with the BERT model to obtain its semantic information. Then, a GAT is used to mine the relationships between labels and better capture their dependencies. To further mine the relationship between the text and learnable label embeddings, the model employs a multi-head self-attention mechanism. Moreover, to improve the robustness of the model, the R-drop strategy is used for training. Experimental results on the AAPD and RCV1 datasets show that, compared with current mainstream multi-label text classification models, the proposed model not only attends to textual information but also effectively captures the dependencies between text and labels as well as the relationships among labels, thereby achieving better performance.
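The R-drop strategy mentioned in the abstract regularizes training by running each input through the network twice with independent dropout masks and penalizing the divergence between the two resulting output distributions. A minimal, framework-free sketch of that penalty term (function names are illustrative, not from the paper):

```python
import math

def kl_div(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def r_drop_regularizer(p1, p2, alpha=1.0):
    """Symmetric KL penalty used by R-drop: (alpha / 2) * (KL(p1||p2) + KL(p2||p1)).

    p1 and p2 are the output distributions produced by two forward passes of
    the SAME input under independent dropout masks; this term is added to the
    ordinary classification loss during training.
    """
    return 0.5 * alpha * (kl_div(p1, p2) + kl_div(p2, p1))
```

When the two passes agree exactly, the penalty is zero; the more dropout makes their predictions diverge, the larger the added loss, which pushes the model toward dropout-invariant (and hence more robust) predictions.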


Key words

BERT / attention mechanism / R-drop / graph attention network / normalization


Funding

Key Program of the National Natural Science Foundation of China (61832004)

Guangzhou Science and Technology Plan, Key R&D Program (202206030009)

Publication Year

2024
Journal: Computer and Digital Engineering (计算机与数字工程)
Publisher: The 709th Research Institute of China Shipbuilding Industry Corporation
Indexed in: CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
References: 22