
Multi-Label Text Classification Method Based on Label Concept

Multi-label text classification is an important and challenging task in natural language processing. Existing methods concentrate on text representation learning and predict labels from the information inside each text, but they ignore the key information shared across all instances that belong to the same label. To address this, we propose a multi-label text classification method based on label concepts. Word frequency and latent Dirichlet allocation (LDA) are used to extract the keywords corresponding to each label from all instances in the training set, and these keywords are then encoded in the same way as the text to obtain a label concept representation. During training and prediction, the label concept most similar to the text representation is retrieved to assist classification, and a contrastive loss between the label concept representations and the text representation is added so that the text encoder can fully learn global label concept information. Experiments in which the method is integrated into commonly used multi-label text classification models show that it effectively improves the performance of those models.
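
The pipeline described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the LDA step is omitted and only word frequency is used, a normalised bag-of-words vector stands in for the neural text encoder, and the InfoNCE-style loss is an assumed form of the contrastive loss, since the abstract does not give its formula. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def extract_label_keywords(docs, doc_labels, top_k=2):
    """Pick the top_k most frequent words over all documents of each label.

    Assumption: frequency only; the paper also uses LDA, which is omitted here.
    """
    counts = {}
    for tokens, labels in zip(docs, doc_labels):
        for label in labels:
            counts.setdefault(label, Counter()).update(tokens)
    return {lab: [w for w, _ in c.most_common(top_k)] for lab, c in counts.items()}


def encode(tokens, vocab):
    """Toy encoder: L2-normalised bag-of-words (stand-in for a neural encoder)."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_concept(text_vec, concept_vecs):
    """Return the label whose concept vector is most similar to the text."""
    return max(concept_vecs, key=lambda lab: dot(text_vec, concept_vecs[lab]))


def contrastive_loss(text_vec, concept_vecs, positive_labels, temperature=0.1):
    """Assumed InfoNCE-style loss: pull the text toward its labels' concepts."""
    logits = {lab: dot(text_vec, v) / temperature for lab, v in concept_vecs.items()}
    log_z = math.log(sum(math.exp(s) for s in logits.values()))
    return sum(log_z - logits[lab] for lab in positive_labels) / len(positive_labels)


# Hypothetical toy training set: tokenised documents with their label sets.
docs = [
    ["match", "team", "goal", "team"],
    ["team", "coach", "match"],
    ["market", "stock", "market", "bank"],
    ["market", "stock", "team"],
]
doc_labels = [["sports"], ["sports"], ["finance"], ["finance", "sports"]]

keywords = extract_label_keywords(docs, doc_labels, top_k=2)
vocab = {w: i for i, w in enumerate(sorted({t for d in docs for t in d}))}

# Label concepts are encoded with the same encoder as the text.
concepts = {lab: encode(kws, vocab) for lab, kws in keywords.items()}

text_vec = encode(["team", "match", "coach"], vocab)
predicted = retrieve_concept(text_vec, concepts)
loss_pos = contrastive_loss(text_vec, concepts, ["sports"])
loss_neg = contrastive_loss(text_vec, concepts, ["finance"])
```

In this toy setting the loss is small when the text is paired with the label whose concept it resembles and large otherwise, which is the signal that would push an actual trainable encoder toward global label concept information.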

Keywords: label concept; global key information; contrastive loss; multi-label text classification

WANG Lele (汪乐乐), ZHANG Xiankun (张贤坤)


College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China


Funding: Tianjin Science and Technology Program (21ZYQCSY00050)

2024

Journal of Tianjin University of Science and Technology
Tianjin University of Science and Technology


Impact factor: 0.269
ISSN:1672-6510
Year, volume (issue): 2024, 39(1)
  • 25