A News Topic Text Classification Method Based on XLNet and Multi-granularity Contrastive Learning

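Abstract (translated from Chinese): News topic texts are short but rich in meaning. Traditional methods usually consider only one kind of vector, either word-granularity or sentence-granularity, and therefore fail to fully exploit the correlations between vectors of different granularities in news topic texts. To mine the dependencies between the word vectors and sentence vectors of a text more deeply, a news topic classification method based on XLNet and multi-granularity feature contrastive learning is proposed. First, XLNet extracts features from the news topic text, yielding word- and sentence-granularity feature representations together with their latent spatial relationships. Then, the R-Drop strategy from contrastive learning generates positive and negative sample pairs of features at different granularities, and feature similarity is learned, with set weights, for word-word, word-sentence, and sentence-sentence vector pairs. This lets the model uncover the correlations between character-level and sentence-level attributes and improves its representational ability. Experiments on the THUCNews, Toutiao, and SHNews datasets show that, compared with the baseline models, the proposed method performs better in both accuracy and F1, reaching F1 scores of 93.88%, 90.08%, and 87.35% on the three datasets respectively, which verifies the effectiveness and soundness of the method.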

English title: A News Topic Text Classification Method Based on XLNet and Multi-granularity Contrastive Learning

English abstract: News topic texts are typically concise but rich in meaning. However, most traditional methods consider only one granularity of vector, either word-level or sentence-level, and fail to fully utilize the correlated information among vectors of different granularities in news topic texts. To address this issue and explore the dependencies between word vectors and sentence vectors in texts, a news topic classification method based on XLNet and multi-granularity feature contrastive learning was proposed. First, features were extracted from the news topic text using XLNet to obtain the feature representations of words and sentences and their latent spatial relationships. Then, positive and negative sample pairs of features at different granularities were generated using the R-Drop strategy in contrastive learning. Feature similarity learning was conducted, with set weights, on word-word, word-sentence, and sentence-sentence embedding pairs, allowing the model to explore the relationships between character attributes and sentence attributes more deeply and thereby enhancing its representational ability. Experiments were conducted on the THUCNews, Toutiao, and SHNews datasets. The results showed that the proposed method outperformed the baseline methods in both accuracy and F1 value, with F1 values reaching 93.88%, 90.08%, and 87.35% respectively, thus verifying the effectiveness and rationality of the proposed method.
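The abstract describes the training objective only at a high level. The sketch below is a minimal, hypothetical illustration of one way such an objective could look, assuming an InfoNCE-style loss with in-batch negatives (the abstract does not name the loss): two dropout views of the same text form a positive pair (the R-Drop-style step), the other texts in the batch serve as negatives, and word-word, word-sentence, and sentence-sentence terms are combined with weights. Every function name, the mean pooling of token vectors into a word-level vector, the temperature of 0.05, and the default weights are assumptions for illustration. Dropout is applied to precomputed feature vectors as a stand-in for two forward passes through XLNet, and in a real model this term would presumably be added to the classification cross-entropy loss.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; the epsilon guards against all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.05) -> float:
    """In-batch InfoNCE (assumed loss): (anchors[i], positives[i]) is the positive
    pair, and positives[j] for j != i act as negatives for anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        top = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def dropout(vec: Sequence[float], p: float, rng: random.Random) -> Vector:
    """Inverted dropout; a stand-in for dropout inside the encoder."""
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vec]


def mean_pool(token_vecs: List[Vector]) -> Vector:
    """Hypothetical word-granularity summary: element-wise mean of token vectors."""
    dim = len(token_vecs[0])
    return [sum(t[k] for t in token_vecs) / len(token_vecs) for k in range(dim)]


def multi_granularity_loss(
    batch: List[Tuple[List[Vector], Vector]],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # placeholder weights
    p: float = 0.1,
    seed: int = 0,
) -> float:
    """Weighted word-word, word-sentence and sentence-sentence InfoNCE terms.
    Each batch item is (token vectors, sentence vector) for one text."""
    rng = random.Random(seed)
    word_a, word_b, sent_a, sent_b = [], [], [], []
    for token_vecs, sent_vec in batch:
        # Two independent dropout "passes" give two views of the same text.
        word_a.append(mean_pool([dropout(t, p, rng) for t in token_vecs]))
        word_b.append(mean_pool([dropout(t, p, rng) for t in token_vecs]))
        sent_a.append(dropout(sent_vec, p, rng))
        sent_b.append(dropout(sent_vec, p, rng))
    w_ww, w_ws, w_ss = weights
    return (w_ww * info_nce(word_a, word_b)     # word vs word
            + w_ws * info_nce(word_a, sent_b)   # word vs sentence
            + w_ss * info_nce(sent_a, sent_b))  # sentence vs sentence


if __name__ == "__main__":
    # Toy batch: 4 texts, 5 tokens each, 8-dimensional random features.
    rng = random.Random(42)
    toy_batch = [
        ([[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)],
         [rng.gauss(0, 1) for _ in range(8)])
        for _ in range(4)
    ]
    print(f"multi-granularity contrastive loss: {multi_granularity_loss(toy_batch):.4f}")
```

Using the other texts in the batch as negatives avoids a separate negative-mining step; how the authors actually select negatives and set the weights is not stated in the abstract, so both choices here are assumptions.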

Keywords:

natural language processing; text classification; news topic; XLNet; contrastive learning

Authors:

陈敏 (Chen Min), 王雷春 (Wang Leichun), 徐瑞 (Xu Rui), 史含笑 (Shi Hanxiao), 徐渺 (Xu Miao)

Affiliation:

School of Computer Science, Hubei University, Wuhan, Hubei 430062

Year of publication:

2025

DOI:

10.13705/j.issn.1671-6841.2023164

Journal of Zhengzhou University (Natural Science Edition)

Zhengzhou University

PKU Core Journal (Peking University core journal list)

Impact factor: 0.437

ISSN: 1671-6841

Year, volume (issue): 2025, 57(2)