A News Topic Text Classification Method Based on XLNet and Multi-granularity Contrastive Learning
News topic text is typically concise but rich in meaning. However, most existing methods consider only one granularity of vector, either word-level or sentence-level, and fail to fully exploit the correlated information among the different granularity vectors of news topic text. To address this issue and explore the dependency between word vectors and sentence vectors in a text, a news topic classification method based on XLNet and multi-granularity feature contrastive learning is proposed. First, XLNet is used to extract features from the news topic text, yielding the feature representations and latent spatial relationships of the words and sentences in the text. Then, positive and negative sample pairs of features at different granularities are generated using the R-Drop strategy in contrastive learning. Feature similarity learning is performed on the word-word, word-sentence, and sentence-sentence embeddings, each with a certain weight, allowing the model to explore more deeply the related information between character attributes and sentence attributes and thereby enhancing its expressive ability. Experiments were conducted on the THUCNews, Toutiao, and SHNews datasets. The results show that the proposed method outperforms other methods in terms of accuracy and F1 value, with F1 values reaching 93.88%, 90.08%, and 87.35% respectively, which verifies the effectiveness and rationality of the proposed method.
natural language processing; text classification; news topic; XLNet; contrastive learning
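
The weighted combination of word-word, word-sentence, and sentence-sentence similarity learning described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formula, so everything here is an assumption made for illustration: an InfoNCE-style objective with cosine similarity, a temperature of 0.1, equal default weights, one pooled vector per text at each granularity, and the hypothetical function names `info_nce` and `multi_granularity_loss`. The two "views" stand for the outputs of two dropout-perturbed forward passes over the same batch, in the spirit of R-Drop; in each pair, the same text in the other view is the positive and the other texts in the batch are negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss (assumed form, not taken from the paper).

    anchors[i] should be most similar to candidates[i];
    every other row of candidates acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def multi_granularity_loss(word_v1, word_v2, sent_v1, sent_v2,
                           weights=(1.0, 1.0, 1.0), temperature=0.1):
    """Weighted sum of three contrastive terms across two dropout views.

    word_v1 / word_v2: pooled word-level vectors from pass 1 / pass 2.
    sent_v1 / sent_v2: sentence-level vectors from pass 1 / pass 2.
    weights: (word-word, word-sentence, sentence-sentence); values are assumed.
    """
    w_ww, w_ws, w_ss = weights
    return (w_ww * info_nce(word_v1, word_v2, temperature)
            + w_ws * info_nce(word_v1, sent_v2, temperature)
            + w_ss * info_nce(sent_v1, sent_v2, temperature))


# Toy batch of two texts with 2-dimensional vectors (made-up numbers).
word_view1 = [[1.0, 0.0], [0.0, 1.0]]
word_view2 = [[0.9, 0.1], [0.1, 0.9]]
sent_view1 = [[0.8, 0.2], [0.2, 0.8]]
sent_view2 = [[0.7, 0.3], [0.3, 0.7]]
loss = multi_granularity_loss(word_view1, word_view2, sent_view1, sent_view2)
```

In a real training setup this term would presumably be added to the classification loss, with the three weights tuned as hyperparameters; the abstract only states that "certain weights" are used, so the values above are placeholders.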