Text Sentiment Analysis Model Incorporating Topic Features


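Abstract: With the rapid development of large language models, reducing the number of model parameters while preserving model performance has become an important challenge in natural language processing. Existing parameter-compression techniques, however, often struggle to maintain both the stability and the generalization ability of the model. To address this, a new sentiment analysis architecture that incorporates topic features is proposed, aiming to use topic information to strengthen the model's judgment of text sentiment polarity. Specifically, a method combining LDA and K-means extracts the topic features of a text, which are concatenated with the word embeddings as a fixed-dimensional vector to obtain a new word vector representation. Average pooling is then used to build a sentence-level representation vector, which is fed into a fully connected layer for sentiment classification. To verify the effectiveness of the proposed model, comparative experiments against several baseline algorithms were conducted on public sentiment analysis datasets. The results show that the proposed model clearly outperforms ALBERT on multiple datasets, improving accuracy by about 3.5%, and maintains high stability and generalization ability with only a slight increase in the number of parameters.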

English title: Text Emotional Analysis Model Fusing Theme Characteristics

English abstract: With the rapid development of large-scale language models, how to reduce the number of model parameters while ensuring model performance has become an important challenge in the field of natural language processing. However, existing parameter compression techniques often find it difficult to balance the stability and generalization ability of the model. To this end, this paper proposes a new framework for sentiment analysis that integrates topic features, aiming to use topic information to enhance the model's ability to judge text sentiment polarity. Specifically, a method combining LDA and K-means is used to extract the topic features of the text, which are concatenated with the word embeddings as a fixed-dimensional vector to obtain a new word vector representation. Sentence-level representation vectors are then constructed using average pooling and fed into a fully connected layer for sentiment classification. To verify the effectiveness of the proposed model, comparative experiments with multiple benchmark algorithms are carried out on public sentiment analysis datasets. Experimental results show that the proposed model is significantly better than ALBERT on multiple datasets, with accuracy increased by about 3.5%, and that it maintains high stability and generalization ability with only a small increase in the number of parameters.
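The fusion, pooling, and classification steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the topic vector is assumed to be already available (the abstract does not detail how LDA and K-means are combined), the embeddings and weights are random placeholders rather than ALBERT outputs or trained parameters, and all function names and dimensions are hypothetical. Plain Python lists are used so the sketch has no dependencies.

```python
import math
import random


def fuse_topic(word_embeddings, topic_vector):
    """Append the same fixed-size topic vector to every word embedding."""
    return [list(w) + list(topic_vector) for w in word_embeddings]


def average_pool(vectors):
    """Average token vectors element-wise to get one sentence vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence_vector, weights, bias):
    """Fully connected layer followed by softmax over sentiment classes."""
    logits = [sum(w * x for w, x in zip(row, sentence_vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy, made-up inputs: 4 tokens with 6-dim embeddings and a 3-dim topic vector.
rng = random.Random(0)
embed_dim, topic_dim, num_classes = 6, 3, 2
word_embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)]
                   for _ in range(4)]
topic_vector = [0.7, 0.2, 0.1]  # hypothetical output of the topic-extraction step

fused = fuse_topic(word_embeddings, topic_vector)   # 4 vectors of length 9
sentence_vector = average_pool(fused)               # 1 vector of length 9

# Untrained random weights; in a real model these would be learned.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim + topic_dim)]
           for _ in range(num_classes)]
bias = [0.0] * num_classes
probs = classify(sentence_vector, weights, bias)
print(len(fused[0]), len(sentence_vector), probs)
```

In this sketch the only parameters added by the topic features are the extra `topic_dim` input weights per class in the fully connected layer, which is one way a design like this could keep the parameter increase small.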

English keywords:

Emotional analysis; ALBERT model; Latent Dirichlet allocation; Theme features; Average pooling

Authors:

Yang Junzhe (杨俊哲), Song Ying (宋莹), Chen Yifei (陈逸菲)


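Author affiliations:

School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044

School of Automation, Wuxi University, Wuxi, Jiangsu 214105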

Keywords:

Sentiment analysis; ALBERT model; LDA model; Topic features; Average pooling

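Funding:

General Program of Natural Science Research of Jiangsu Higher Education Institutions; Jiangsu Postgraduate Practice Innovation Program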

Project numbers:

19KJB520044; SJCX23_0392

Publication year:

2024

DOI:

10.11896/jsjkx.230600111

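Computer Science (计算机科学)

Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)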


Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(z1)

Number of references: 29