Short-Text Sentiment Classification Based on a SimCSE and BERT Hybrid Model


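Abstract: BERT's training effectiveness is degraded by the anisotropy of its text vectors. To address this, the paper combines contrastive learning (SimCSE) with BERT to build a hybrid model (SimCSE-BERT). The classifier expands the amount of training data through the contrastive-learning idea, and it also uses the SimCSE model to obtain text vectors with good "alignment" and "uniformity", which are used to optimize the base BERT model and improve classification. Experiments show that, compared with the base BERT model, the hybrid model's accuracy improves by 0.562, 0.584, and 0.734 percentage points on the takeout, Ctrip hotel, and Taobao datasets, respectively. The model clearly improves classification on short-text sentiment classification datasets and generalizes well.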

English title (as published): Short Text Emotion Classification Based on SimCSE and BERT Hybrid Model

English abstract (as published, lightly corrected): In order to solve the problem that the training effect of the BERT model is affected by the anisotropy of the text vector, this paper combines contrastive learning (SimCSE) and BERT to build a model (SimCSE-BERT). The classifier not only expands the amount of training data through the idea of contrastive learning, but also obtains text vectors with good "alignment" and "uniformity" based on the SimCSE model to optimize the basic BERT model and improve the classification effect. Experimental results are as follows: compared with the basic BERT model, the accuracy of the hybrid model increases by 0.562, 0.584, and 0.734 percentage points on the takeout, Ctrip hotel, and Taobao data sets, respectively. The classification effect of this model on short-text emotion classification data sets is significantly improved, and the model has good generalization ability.
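
The abstract names three ideas without defining them: a contrastive objective, "alignment", and "uniformity". The sketch below illustrates the definitions commonly used in the contrastive-learning literature, written in plain NumPy. It is not the authors' implementation; the function names, the temperature of 0.05, and the parameters `alpha=2` and `t=2` are assumptions chosen for illustration, and the abstract does not say how SimCSE-BERT computes any of these quantities.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce_loss(z1, z2, temperature=0.05):
    """In-batch contrastive loss: z1[i] and z2[i] are two views of sentence i.

    Every other row of z2 serves as a negative for z1[i].
    The temperature value is an illustrative assumption.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    sim = (z1 @ z2.T) / temperature              # (N, N) scaled cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))    # positives sit on the diagonal


def alignment(x, y, alpha=2):
    """Mean distance between positive pairs (x[i], y[i]); lower is better."""
    x, y = l2_normalize(x), l2_normalize(y)
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))


def uniformity(x, t=2):
    """Log of the mean Gaussian potential over all distinct pairs; lower is better."""
    x = l2_normalize(x)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(x), k=1)         # each unordered pair once
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))
```

In this sketch, embeddings that collapse to a single point give a uniformity of 0 (the worst case), while embeddings spread over the unit sphere give a negative value. Identical positive pairs give an alignment of 0.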

English keywords:

Emotion classification; Hybrid model; Text vector

Authors:

Liu Ji (刘继), Li Shuaiwen (李帅文)

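Author affiliations:

School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi, Xinjiang 830012

Research Center for Xinjiang Socio-Economic Statistics and Big Data Application, Xinjiang University of Finance and Economics, Urumqi, Xinjiang 830012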

Keywords:

sentiment classification; hybrid model; text vector

Funding:

National Natural Science Foundation of China (two grants)

Grant numbers:

72164034; 71762028

Publication year:

2024

Computer Simulation (计算机仿真)

The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD

Impact factor: 0.518

ISSN: 1006-9348

Year, volume (issue): 2024, 41(5)

Number of references: 7