Multi-label Classification Model of Chinese Short Texts Based on Deep Learning
Conventional multi-label classification algorithms cannot effectively distinguish short Chinese texts, because such texts are short, structurally diverse, and lack context. To address these problems, this paper proposes CRC-MHA, a deep-learning-based multi-label classification model for Chinese short texts. In the text representation layer, CRC-MHA abandons the conventional use of Word2vec for static word embedding and instead uses BERT to produce dynamic word embeddings for the input sentence. Because BERT is pre-trained on massive text corpora, it can better characterize the contextual semantics of the text. In the feature extraction layer, the model adopts a parallel feature extraction strategy that combines CNN, RCNN, and a multi-head self-attention mechanism. This strengthens the capture of key features within the text and thereby improves multi-label classification performance. Experimental results show that the weighted average F1 score of CRC-MHA is 1.95% higher than that of the BERT model, 0.42% higher than that of the BERT-CNN model, and 0.34% higher than that of the BERT-RCNN model, which verifies the effectiveness of the model.
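The abstract reports results as a weighted average F1 score. The following minimal sketch shows one common way that metric is computed for multi-label outputs; it is not the authors' evaluation code. It assumes that each label is decided independently by applying a sigmoid to a per-label logit with a 0.5 threshold (an assumption, since the abstract does not state the decision rule), and that per-label F1 scores are weighted by each label's support, i.e. the number of texts that truly carry that label. This follows the convention of scikit-learn's `average="weighted"`. The helper names `predict_labels` and `weighted_f1` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Assumed decision rule: each label is switched on independently,
    so one text may receive zero, one, or several labels."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def weighted_f1(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Per-label F1, averaged with weights equal to each label's support
    (count of positive gold instances), as in average='weighted'."""
    n_labels = len(y_true[0])
    weighted_sum, total_support = 0.0, 0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn          # F1 = 2TP / (2TP + FP + FN)
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn                 # number of texts truly having label j
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


if __name__ == "__main__":
    # Toy data: 3 texts, 3 labels (illustrative only, not from the paper).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
    print(predict_labels([2.0, -1.0, 0.0]))        # [1, 0, 1]
    print(round(weighted_f1(y_true, y_pred), 4))   # 0.6667
```

Weighting by support means that errors on frequent labels move the score more than errors on rare labels. Under this convention, small differences such as the 0.34% to 1.95% gains reported above largely reflect performance on the more common labels.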
Keywords: multi-label classification; Chinese short text; dynamic word embedding; feature extraction