Multi-label text classification method based on a label interaction Seq2Seq model

王嫄¹, 胡鹏², 鄢艳玲², 王佳帅², 赵婷婷², 杨巨成²

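Author information

1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457; 普迈康(天津)精准医疗科技有限公司, Tianjin 300000
2. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457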

Abstract

Multi-label text classification can be modeled as a mapping from a text sequence to a label sequence. However, existing sequence-to-sequence (Seq2Seq) models extract only coarse-grained, text-level representations from noisy text and ignore the fine-grained interaction cues between labels and words, which biases the model's understanding of the classes. To address this, a label semantic interaction Seq2Seq model based on an encoder-decoder structure is proposed. In the text semantic extraction stage, a gating mechanism fuses the coarse-grained text-level representation with the fine-grained interaction cues, yielding a text representation with corrected class understanding. On two standard datasets, the model is compared experimentally with six algorithms, including LEAM, LSAN, and SGM; the results show that the proposed model improves significantly on both main evaluation metrics.
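The gated fusion described in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no equations, so everything below is an assumption made for illustration: the function names `label_word_interaction` and `gated_fusion`, the single-head dot-product attention (the keywords mention multi-head attention, which is simplified away here), the mean-pooled text representation, and the convex-combination form of the gate. It should not be read as the authors' actual model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def label_word_interaction(words, labels):
    """Hypothetical fine-grained cue: weight each word by its best label match.

    words:  (n_words, d) word representations
    labels: (n_labels, d) label embeddings
    returns a (d,) label-aware summary of the words
    """
    scores = words @ labels.T / np.sqrt(words.shape[1])  # (n_words, n_labels)
    word_weights = softmax(scores.max(axis=1))           # (n_words,)
    return word_weights @ words

def gated_fusion(text_repr, interaction_repr, W, b):
    """Assumed gate: element-wise convex combination of the two views."""
    gate = sigmoid(W @ np.concatenate([text_repr, interaction_repr]) + b)
    return gate * text_repr + (1.0 - gate) * interaction_repr

# Toy inputs; in a real model these would come from a trained encoder.
rng = np.random.default_rng(0)
d, n_words, n_labels = 8, 5, 3
words = rng.normal(size=(n_words, d))
labels = rng.normal(size=(n_labels, d))
W = rng.normal(size=(d, 2 * d)) * 0.1  # gate weights (randomly initialised here)
b = np.zeros(d)

text_repr = words.mean(axis=0)                             # coarse-grained view
interaction_repr = label_word_interaction(words, labels)   # fine-grained view
fused = gated_fusion(text_repr, interaction_repr, W, b)
print(fused.shape)
```

Because the gate lies strictly between 0 and 1, each component of the fused vector falls between the corresponding components of the two inputs, so the model can lean on whichever view is more informative for each dimension.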

Key words

multi-label text classification / sequence to sequence (Seq2Seq) / adaptive gate / multi-head attention / label embedding

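Funding

National Natural Science Foundation of China (61702367)

National Natural Science Foundation of China (61976156)

Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560)

Natural Science Foundation of Tianjin (19JCYBJC15300)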

Publication year

2024

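传感器与微系统 (Transducer and Microsystem Technologies)

The 49th Research Institute of China Electronics Technology Group Corporation

Indexed in: CSTPCD, Peking University Core Journals

Impact factor: 0.61

ISSN: 1000-9787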
