
Multi-label text classification method based on a label-interaction Seq2Seq model

Multi-label text classification can be modeled as a mapping from a text sequence to a label sequence. However, existing sequence-to-sequence (Seq2Seq) models extract only coarse-grained, text-level representations from noisy text and ignore the fine-grained interaction cues between labels and words, which biases the model's understanding of categories. To address this, a label semantic interaction Seq2Seq model based on an encoder-decoder structure is proposed. In the text semantic extraction stage, a gating mechanism fuses the coarse-grained text-level representation with the fine-grained interaction cues, yielding a text representation in which the category understanding is corrected. On two standard datasets, the model is compared experimentally with six algorithms, including LEAM, LSAN, and SGM, and the results show that the proposed model improves significantly on both of the two main evaluation metrics.
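
The gating step mentioned in the abstract can be illustrated with a generic gated fusion of two vectors. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the gate's formula, so the sigmoid gate over two linear projections, the function name `gated_fusion`, the parameter names, and the toy vectors are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here to keep each gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gated_fusion(text_repr, interaction_repr, w_text, w_inter, bias):
    """Fuse a coarse text-level vector with a fine-grained interaction vector.

    Assumed form (hypothetical, not taken from the paper):
        gate  = sigmoid(W_text @ text + W_inter @ interaction + bias)
        fused = gate * text + (1 - gate) * interaction
    """
    projected_text = matvec(w_text, text_repr)
    projected_inter = matvec(w_inter, interaction_repr)
    gate = [
        sigmoid(a + b + c)
        for a, b, c in zip(projected_text, projected_inter, bias)
    ]
    fused = [
        g * t + (1.0 - g) * c
        for g, t, c in zip(gate, text_repr, interaction_repr)
    ]
    return fused, gate


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
text_repr = [0.2, -0.1, 0.7, 0.4]          # toy coarse-grained representation
interaction_repr = [0.9, 0.3, -0.5, 0.0]   # toy label-word interaction cues
w_text = random_matrix(dim, dim, rng)
w_inter = random_matrix(dim, dim, rng)
bias = [0.0] * dim

fused, gate = gated_fusion(text_repr, interaction_repr, w_text, w_inter, bias)
print("gate: ", gate)
print("fused:", fused)
```

Because each gate value lies strictly between 0 and 1, every component of the fused vector is a convex combination of the two inputs, so the gate decides per dimension how much to trust the text-level view versus the interaction cues.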

multi-label text classification; sequence to sequence (Seq2Seq); adaptive gate; multi-head attention; label embedding

Wang Yuan, Hu Peng, Yan Yanling, Wang Jiashuai, Zhao Tingting, Yang Jucheng


College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457

Pumaikang (Tianjin) Precision Medical Technology Co., Ltd., Tianjin 300000


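National Natural Science Foundation of China (61702367); National Natural Science Foundation of China (61976156); Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560); Natural Science Foundation of Tianjin (19JCYBJC15300)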

2024

Transducer and Microsystem Technologies
The 49th Research Institute of China Electronics Technology Group Corporation


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.61
ISSN: 1000-9787
Year, volume (issue): 2024, 43(8)