Micro-Blog Fine-Grained Sentiment Analysis Based on Multi-Feature Fusion (基于多特征融合的微博细粒度情感分析)


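Chinese abstract: [Objective] To address the shortcomings of existing Weibo sentiment analysis models in extracting text-related features and mining the sentiment information in post content, this paper proposes the RB-LCM model to improve fine-grained sentiment analysis of Weibo texts. [Methods] First, RoBERTa dynamically encodes the character- and sentence-level features of Weibo texts. Next, a Bi-LSTM and a capsule network capture deeper global and local features of Weibo sentences. On this basis, multi-head self-attention feature fusion fuses the relevant multi-dimensional features of the sentences. Training uses an improved Focal Loss and FGM to address label imbalance in the datasets and model robustness. [Results] The RB-LCM model achieves accuracy and F1 of 80.64% and 77.41% on the SMP2020-EWECT dataset, 67.17% and 51.08% on the NLPCC2013 Task 2 dataset, and 71.27% and 58.25% on the NLPCC2014 Task 1 dataset; on the binary sentiment dataset weibo_senti_100k, accuracy and F1 reach 98.45% and 98.44%, respectively. On every dataset it outperforms the advanced sentiment analysis models it is compared with. [Limitations] The sentiment analysis uses text only and does not yet incorporate related images, video, or audio. [Conclusions] The proposed RB-LCM model effectively improves fine-grained sentiment analysis of Weibo posts.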
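The abstract names an improved Focal Loss as the remedy for label imbalance but does not give its form. As a rough illustration only, the sketch below implements the standard focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single example. It is not the authors' improved variant, and the function names, the `gamma` default, and the optional per-class `alpha` weights are assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return [value / total for value in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Standard focal loss for one example (hypothetical helper).

    logits: raw scores, one per class
    target: index of the true class
    gamma:  focusing parameter; gamma = 0 reduces to cross-entropy
    alpha:  optional per-class weights (assumed here, e.g. to upweight
            rare emotion classes)
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    # so training focuses on hard or minority-class examples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and no `alpha`, the function returns plain cross-entropy, which makes it easy to see how much the focusing term discounts confidently classified examples.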

English title: Micro-Blog Fine-Grained Sentiment Analysis Based on Multi-Feature Fusion

English abstract: [Objective] This paper proposes an RB-LCM model to improve the fine-grained sentiment analysis of Weibo texts. [Methods] First, we used RoBERTa to encode the character- and sentence-level features of Weibo posts. Then, we used a Bi-LSTM and a capsule network to capture in-depth global and local features of Weibo sentences. Third, we applied multi-head self-attention feature fusion to fuse the relevant multi-dimensional features. Finally, we trained the model with an improved Focal Loss and FGM to address label imbalance in the datasets and to improve the model's robustness. [Results] The accuracy and F1 value of the proposed model reached 80.64% and 77.41% on the SMP2020-EWECT dataset, 67.17% and 51.08% on the NLPCC2013 Task 2 dataset, and 71.27% and 58.25% on the NLPCC2014 Task 1 dataset. On the binary sentiment dataset weibo_senti_100k, accuracy and F1 value reached 98.45% and 98.44%, respectively. All results were better than those of the advanced sentiment analysis models on each dataset. [Limitations] Our model did not include related pictures, videos, voice, or other information for sentiment analysis. [Conclusions] The proposed model can effectively improve the fine-grained sentiment analysis of Weibo posts.
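The abstract cites FGM (Fast Gradient Method) as the robustness measure. In its usual form, FGM adds a perturbation r = epsilon * g / ||g||_2 to the input embeddings, where g is the gradient of the loss with respect to those embeddings. The sketch below shows that step on a toy logistic model whose gradient can be written by hand. The model, the function names, and the `epsilon` value are assumptions for illustration and are not taken from the paper.

```python
import math


def logistic_loss(w, x, y):
    """Loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def loss_grad_wrt_input(w, x, y):
    """Analytic gradient of logistic_loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]


def fgm_perturbation(grad, epsilon=1.0):
    """FGM step: scale the gradient to L2 norm epsilon (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal, so no direction to perturb in.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


# Toy stand-ins: w plays the role of model weights, x of an embedding.
w, x, y = [1.0, -2.0], [0.5, 0.1], 1
r = fgm_perturbation(loss_grad_wrt_input(w, x, y), epsilon=0.5)
x_adv = [xi + ri for xi, ri in zip(x, r)]
clean_loss = logistic_loss(w, x, y)
adv_loss = logistic_loss(w, x_adv, y)  # higher than clean_loss
```

In adversarial training the model is then updated on both the clean and the perturbed input, so it learns to keep its prediction stable under small shifts in embedding space.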

English keywords:

RoBERTa; Multi-Head Self-Attention Fusion; Bi-LSTM; Microblog Sentiment Analysis; Capsule Network

Authors:

Wu Xuxu (吴旭旭), Chen Peng (陈鹏), Jiang Huan (江欢)


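Affiliations:

School of Information Network Security, People's Public Security University of China, Beijing 100045

School of E-Business and Logistics, Beijing Technology and Business University, Beijing 100048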

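Keywords:

RoBERTa; multi-head self-attention fusion; bidirectional long short-term memory network; microblog sentiment analysis; capsule network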

Funding:

Fundamental Research Funds of People's Public Security University of China

Project number:

2022JKF02018

Publication year:

2023

DOI:

10.11925/infotech.2096-3467.2022.1028

Data Analysis and Knowledge Discovery (数据分析与知识发现)

National Science Library, Chinese Academy of Sciences


Indexed in: CSTPCD; CSSCI; CSCD; CHSSCD; Peking University Core Journals (北大核心); EI

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2023, 7(12)

Number of references: 10