Very Short Texts Hierarchical Classification Combining Semantic Interpretation and DeBERTa
Text hierarchy classification has important applications in scenarios such as social comment topic classification and search term classification.The data in these scenarios often exhibits short text features,which is reflected in the sparsity and sen-sitivity of information.It poses great challenges for model feature representation and classification performance.The complexity and associativity of the hierarchical label space further exacerbate the difficulties.In view of this,a method fusing semantic inter-pretation and DeBERTa model is proposed,and the core idea of the method is as follows:introducing the semantic interpretation of individual words or phrases in specific contexts to supplement and optimize the content information acquired by the model;combining the disentangled attention and enhanced mask decoder of the DeBERTa model to better grasp the location information and improve the feature extraction ability.The method firstly performs grammatical disambiguation and lexical annotation on the training text,and then constructs the GlossDeBERTa model to perform semantic disambiguation with high accuracy to obtain the semantic interpreted sequence.Then the SimCSE framework is used to make the interpreted sequence vectorized to better charac-terize the sentence information in the interpreted sequence.Finally,the training text passes through the DeBERTa model neural network to get the feature vector representations of the original text,which is then summed up with the corresponding feature vector in the interpreted sequence,and passed into the multi-class classifier.The experiments select the very short text portion of the short text hierarchical categorization dataset TREC and expand the data,resulting in a dataset with an average length of 12 words.Multiple sets of comparison experiments show that the DeBERTa model proposed in this paper with fused semantic inter-pretation has the best performance,and the Accuracy,F1-micro,and F1-macro values on the validation and test sets are much bet-ter than other algorithmic models,which can well cope with the task of hierarchical categorization of very short texts.
Very short textHierarchical classificationSemantic interpretationDeBERTaGlossDeBERTaSimCSE