Computer Engineering and Design, 2024, Vol. 45, Issue 10: 3111-3119. DOI: 10.16208/j.issn1000-7024.2024.10.030

Hierarchical multi-label text classification model based on contrastive learning and BERT

Dai Linlin (代林林)¹, Zhang Chaoqun (张超群)², Tang Weidong (汤卫东)¹, Liu Chengxing (刘成星)¹, Zhang Longhao (张龙昊)¹

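Author information

  • 1. School of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530006
  • 2. School of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530006; Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Guangxi Minzu University, Nanning, Guangxi 530006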

Abstract

To address the difficulty that existing text classification models have in modeling semantic relationships among labels, this paper proposes SampleHCT, a hierarchical multi-label text classification model that combines contrastive learning with a self-attention mechanism. A label feature extraction module is designed to extract both the semantic and the hierarchical structural features of labels. The self-attention mechanism is used to construct positive samples that carry mixed label information, and contrastive learning is used to train the label awareness of the text encoder. Experimental results show that SampleHCT achieves higher classification scores than 19 baseline models, which supports the effectiveness of its approach to modeling label information.
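The abstract does not state which contrastive objective SampleHCT uses, so the following is only a minimal sketch of a generic InfoNCE-style loss, not the authors' formulation. It illustrates the general idea of pulling a text representation toward a positive sample and away from other samples. The function names, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor: negative log-softmax of the positive."""
    # The positive pair goes first, followed by every negative pair.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]

# Toy vectors standing in for encoder outputs (made-up numbers, assumption).
text_vec = [1.0, 0.2, 0.0]
label_aware_positive = [0.9, 0.3, 0.1]   # hypothetical positive sample
other_texts = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.5]]

# A well-matched positive gives a small loss ...
loss_good = info_nce(text_vec, label_aware_positive, other_texts)
# ... while treating an unrelated vector as the positive gives a large one.
loss_bad = info_nce(text_vec, other_texts[0],
                    [label_aware_positive, other_texts[1]])
```

Minimizing such a loss pushes the encoder to place a text close to its positive sample in representation space; how the positive sample is actually built from label information is specific to the paper and is not reproduced here.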

Key words

text classification / contrastive learning / self-attention mechanism / hierarchical structure / multi-label / label information / global features


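Funding

National Natural Science Foundation of China (62062011)

Guangxi Natural Science Foundation (2019GXNSFAA185017)

Guangxi Minzu University Graduate Education Innovation Program (gxunchxs2022094)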

Publication year

2024
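Computer Engineering and Design
The 706th Institute of the Second Academy of China Aerospace Science and Industry Corporation

CSTPCD, Peking University Core Journals
Impact factor: 0.617
ISSN: 1000-7024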