To address the problem that existing text classification models cannot model semantic relationships among labels, a hierarchical multi-label text classification model named SampleHCT was proposed, combining contrastive learning with a self-attention mechanism. In SampleHCT, a label feature extraction module was designed to extract both the semantic and the hierarchical structural features of labels. The self-attention mechanism was adopted to construct positive samples that mix label information, and contrastive learning was used to train the label awareness of the text encoder. Experimental results demonstrate that SampleHCT achieves higher classification scores than 19 benchmark models, verifying that SampleHCT models label information more effectively.
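The core idea of the abstract can be illustrated with a minimal sketch: a text embedding attends over its labels' embeddings to build an attention-weighted positive sample, and an InfoNCE-style contrastive loss pulls the text encoder toward that label-mixed positive. This is an assumption-laden illustration in NumPy, not the authors' implementation; all function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mix_positive(text_emb, label_embs):
    """Self-attention over the labels assigned to a text: the text embedding
    is the query, label embeddings are keys and values, and the output is a
    positive sample that mixes label information (hypothetical sketch)."""
    scores = label_embs @ text_emb            # (num_labels,)
    weights = softmax(scores)                 # attention weights over labels
    return weights @ label_embs               # (dim,) mixed positive sample

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: pull the anchor (text embedding)
    toward its label-mixed positive and away from negative samples."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    return -np.log(softmax(logits)[0])

# Toy usage: a 2-D text embedding with two candidate label embeddings.
anchor = np.array([1.0, 0.0])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
positive = mix_positive(anchor, labels)
loss = info_nce(anchor, positive, [np.array([0.0, 1.0])])
```

Minimizing this loss over a corpus would make the text encoder "label-aware" in the sense the abstract describes; the hierarchical-structure features of labels would, under this reading, enter through how `label_embs` is produced.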
Key words
text classification/contrastive learning/self-attention mechanism/hierarchical structure/multiple labels/label information/global features