Short Text Feature Extension and Classification Method Based on Semantic Embedding of Tags and Graph Convolution Network
In short text classification, very short text length, few keywords, and underuse of label information lead to severe problems of sparse features and ambiguous semantics, which degrade classification performance. To address this problem, a graph convolution network model based on label semantic embedding is proposed. First, building on TF-IDF, a new word frequency method is proposed that comprehensively considers the inter-class and intra-class distribution of words in the global corpus. Then, through a label embedding method, each training text and its corresponding label are mapped into one feature space in the text graph. After filtering and aggregation in this feature space, synonyms embedded with label information can highlight the category representation. Finally, the text graph is input into the graph convolution neural network to learn new features. Both the newly learned features and the features from the pre-training model can improve the classification accuracy of short texts and the generalization ability of the whole model. We choose four short text datasets, MR, web_snippets, R8 and R52, to evaluate the performance of the proposed algorithm against fourteen benchmark models. The experimental results show that the proposed model is superior to the others in classification accuracy, recall and F1-score.
short text; semantics of label; feature space; graph convolution network; pre-training model
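The abstract does not state the exact layer formulation, so the following is only a minimal sketch of one graph convolution step, assuming the widely used propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The toy three-node graph, the one-hot node features, and the weight values are hypothetical and chosen purely for illustration; they are not the paper's text graph or its trained parameters.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    z = matmul(matmul(a_norm, features), weights)
    return [[max(v, 0.0) for v in row] for row in z]


# Hypothetical toy graph: node 1 is linked to nodes 0 and 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
# One-hot node features (assumed here for simplicity).
features = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
# Hypothetical weights mapping 3 input dimensions to 2 hidden dimensions.
weights = [[0.5, -0.2],
           [0.1, 0.3],
           [-0.4, 0.6]]

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)
```

The sketch is dependency-free to keep it self-contained; in practice the same step would usually be written with sparse matrices in a numerical or deep learning library.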