Data Augmentation Method for Sentiment Analysis Based on ChatGLM

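Chinese abstract (translated): Sentiment analysis is one of the most active tasks in natural language processing. Because annotating training data is difficult and costly, few-shot sentiment analysis has attracted attention, and data augmentation is one of the main approaches to few-shot learning. However, traditional data augmentation methods do not account for the characteristics of sentiment analysis, so the augmented data may suffer from semantic inconsistency, sentiment bias, and over-generation. To address these problems, a multi-stage data augmentation strategy for sentiment analysis based on the ChatGLM model is proposed. First, the EDA method performs simple word-level augmentation of the text; next, a sentiment lexicon filters the generated data; finally, the ChatGLM model performs sentence-level augmentation. Experimental results show that, compared with the best traditional data augmentation method, the proposed method improves accuracy by 1.9%, 2.1%, and 2.2% on three datasets, respectively, demonstrating its effectiveness for few-shot sentiment analysis.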
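The first two stages described above (word-level EDA, then sentiment-lexicon filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it applies the four standard EDA operations (synonym replacement, random insertion, random swap, random deletion) to whitespace-tokenized English text, and it uses a hypothetical toy synonym table (`SYNONYMS`) and a hypothetical toy lexicon (`POSITIVE`, `NEGATIVE`). The filtering rule, which keeps a candidate only if its lexicon polarity equals the original label, is one plausible reading of "filtering with a sentiment lexicon". Chinese input would additionally need word segmentation.

```python
import random
from typing import List

# Hypothetical toy resources; the paper's synonym source and sentiment
# lexicon are not specified in the abstract.
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"], "bad": ["poor", "awful"]}
POSITIVE = {"good", "great", "fine", "excellent"}
NEGATIVE = {"bad", "poor", "awful", "terrible"}


def synonym_replace(words: List[str], n: int, rng: random.Random) -> List[str]:
    """EDA: replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    idx = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(idx)
    for i in idx[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insert(words: List[str], n: int, rng: random.Random) -> List[str]:
    """EDA: n times, insert a synonym of a random word at a random position."""
    out = list(words)
    for _ in range(n):
        donors = [w for w in out if w in SYNONYMS]
        if not donors:
            break
        out.insert(rng.randint(0, len(out)), rng.choice(SYNONYMS[rng.choice(donors)]))
    return out


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """EDA: n times, swap the words at two random positions."""
    out = list(words)
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(words: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA: drop each word with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


def polarity(words: List[str]) -> int:
    """Sign of (#positive - #negative) lexicon hits: +1, 0, or -1."""
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return (score > 0) - (score < 0)


def eda_then_filter(sentence: str, label: int, num_aug: int = 8,
                    alpha: float = 0.1, seed: int = 0) -> List[str]:
    """Stage 1 (EDA) followed by stage 2 (lexicon filter); label is +1 or -1.

    A candidate is kept only if its lexicon polarity equals the label, so
    variants that lose or flip the sentiment word are discarded, as are
    duplicates and unchanged copies of the input.
    """
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))
    ops = [
        lambda w: synonym_replace(w, n, rng),
        lambda w: random_insert(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_delete(w, alpha, rng),
    ]
    seen, kept = {sentence}, []
    for k in range(num_aug):
        cand = " ".join(ops[k % len(ops)](words))
        if cand not in seen and polarity(cand.split()) == label:
            seen.add(cand)
            kept.append(cand)
    return kept
```

For example, `eda_then_filter("the movie was good", 1)` rejects any variant in which random deletion removes "good", because that variant's lexicon polarity becomes neutral; this is the kind of sentiment drift the filtering stage is meant to catch.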

English title: Data Augmentation Method for Sentiment Analysis Based on ChatGLM

English abstract: Sentiment analysis is one of the popular tasks in natural language processing. Due to the difficulty and high cost of annotating training data, sentiment analysis with limited samples has drawn attention. Data augmentation is one of the primary approaches to limited-sample learning. However, traditional data augmentation methods do not take into account the characteristics of sentiment analysis, which can lead to issues such as semantic inconsistency, sentiment bias, and excessive generation in the augmented data. To address these problems, a multi-stage data augmentation strategy based on the ChatGLM model is proposed specifically for sentiment analysis. It starts with simple word-level data augmentation using the EDA method, then filters the generated data using a sentiment lexicon, and finally augments it at the sentence level using the ChatGLM model. Experimental results show that this method improves accuracy by 1.9%, 2.1%, and 2.2% on three datasets compared with the best traditional data augmentation method, confirming its effectiveness for limited-sample sentiment analysis.
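The final, sentence-level stage and its position after the word-level stages can be sketched as below. Everything here is an assumption for illustration rather than the paper's method: the prompt wording in `PROMPT`, the one-rewrite-per-line output format, the cap of `k` rewrites per input (a simple guard against excessive generation), and the reading that each filtered word-level candidate is rewritten rather than only the original sentence. `generate` is a hypothetical stand-in for whatever ChatGLM inference call is used (prompt string in, completion string out), and `word_level` is any callable that returns the already lexicon-filtered word-level variants.

```python
import re
from typing import Callable, List

# Hypothetical prompt; the paper's actual ChatGLM prompt is not given in the abstract.
PROMPT = (
    "Rewrite the following {label} review in {k} different ways. "
    "Keep its meaning and its sentiment unchanged. "
    "Output one rewrite per line.\n\nReview: {text}"
)

# Leading list markers such as "1.", "2)", "3、", "-", "*" or "•".
MARKER = re.compile(r"^\s*(?:\d+[.)、]|[-*•])\s*")


def llm_augment(text: str, label: str, generate: Callable[[str], str],
                k: int = 3) -> List[str]:
    """Stage 3 sketch: sentence-level rewrites obtained by prompting an LLM.

    `generate` is a hypothetical stand-in for a ChatGLM call. The reply is
    split into lines, list markers are stripped, and empty lines, copies of
    the input, duplicates, and anything beyond k rewrites are dropped.
    """
    reply = generate(PROMPT.format(label=label, k=k, text=text))
    seen = {text.strip()}
    rewrites: List[str] = []
    for line in reply.splitlines():
        if len(rewrites) >= k:
            break
        cand = MARKER.sub("", line).strip()
        if cand and cand not in seen:
            seen.add(cand)
            rewrites.append(cand)
    return rewrites


def multi_stage(text: str, label: str,
                word_level: Callable[[str], List[str]],
                generate: Callable[[str], str], k: int = 2) -> List[str]:
    """Chain the stages: each word-level candidate (assumed already
    lexicon-filtered) followed by its LLM rewrites, deduplicated overall."""
    out: List[str] = []
    seen = set()
    for cand in word_level(text):
        for s in [cand] + llm_augment(cand, label, generate, k):
            if s not in seen:
                seen.add(s)
                out.append(s)
    return out
```

Passing the model call in as a function keeps the sketch testable offline: a stub that returns a fixed multi-line string is enough to exercise the parsing, deduplication, and cap.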

English keywords:

few-shot learning; sentiment analysis; data augmentation; pre-trained models; natural language processing

Authors:

高新周 (Gao Xinzhou), 叶宁 (Ye Ning), 徐康 (Xu Kang), 王甦 (Wang Su), 王汝传 (Wang Ruchuan)

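Affiliations:

School of Computer Science, Nanjing University of Posts and Telecommunications

Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, Jiangsu 210023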

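Keywords (Chinese, translated):

few-shot learning; sentiment analysis; data augmentation; pre-trained models; natural language processing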

Publication year:

2024

DOI:

10.11907/rjdk.232292

Journal: 软件导刊 (Software Guide)

Hubei Information Society

Impact factor: 0.524

ISSN: 1672-7800

Year, volume (issue): 2024, 23(12)