
Long Text Multi-entity Sentiment Analysis Based on Multi-task Joint Training

Multi-entity sentiment analysis aims to identify the core entities in a text and determine their corresponding sentiment. It is a current research hotspot in fine-grained sentiment analysis, but research on long-text multi-entity sentiment analysis is still in its early stages. This paper proposes a long-text multi-entity sentiment analysis model, PAM, based on multi-task joint training. First, the TF-IDF algorithm extracts the sentences most similar to the article title, eliminating redundant information and shortening the text. Two BiLSTM networks then learn the core entity recognition and sentiment analysis tasks separately, each acquiring the features it needs. Next, a multi-head attention mechanism that incorporates relative position information transfers the knowledge learned by the entity recognition task to the sentiment analysis task, enabling joint learning of the two tasks. Finally, the proposed Entity_Extract algorithm determines the core entities among the model's predicted candidates, based on how often and where each entity appears in the text, and obtains their corresponding sentiment. Experimental results on the Sohu news dataset demonstrate the effectiveness of the PAM model.
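Two of the steps the abstract describes can be sketched in code: keeping the sentences most similar to the title under a TF-IDF bag-of-words model, and ranking candidate entities by mention count with earlier first occurrence breaking ties. The sketch below is illustrative only; the tokenization, smoothed-IDF weighting, and the function names `title_similar_sentences` and `select_core_entities` are assumptions, not the paper's actual Entity_Extract implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF weight dicts for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))               # document frequency per token
    n = len(tokenized)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: (c / len(toks)) * math.log((1 + n) / (1 + df[t]))
                     for t, c in tf.items()})
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def title_similar_sentences(title, sentences, top_k=2):
    """Keep the top_k sentences most similar to the title, in document order."""
    vecs = tfidf_vectors([title] + sentences)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[0], vecs[i + 1]), reverse=True)
    keep = sorted(ranked[:top_k])          # restore original sentence order
    return [sentences[i] for i in keep]


def select_core_entities(text_tokens, candidates, top_k=1):
    """Rank candidates by mention count, then by earliest first occurrence."""
    stats = []
    for ent in candidates:
        count = text_tokens.count(ent)
        first = text_tokens.index(ent) if count else len(text_tokens)
        stats.append((-count, first, ent))
    return [ent for _, _, ent in sorted(stats)[:top_k]]
```

For example, given a title and three candidate sentences, `title_similar_sentences` drops the sentence sharing no informative terms with the title, which is the redundancy-pruning effect the abstract attributes to the TF-IDF step.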

Keywords: Long text; Multi-entity; Fine-grained sentiment analysis; Multi-task learning

张昊妍、段利国、王钦晨、郜浩


College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China

College of Information Technology Application Innovation, Shanxi Electronic Science and Technology College, Linfen, Shanxi 041000, China


Supported by the General Program of the Natural Science Foundation of Shanxi Province

202203021221234

2024

Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)


Indexed by CSTPCD and the Peking University Core Journals list
Impact factor: 0.944
ISSN:1002-137X
Year, Volume (Issue): 2024, 51(6)