Semi-Supervised Self-Training Sentiment Classification Algorithm Based on TS-Aug Architecture

With the spread of online teaching resources, the volume of text data from resource evaluations has steadily increased. Traditional supervised text classification relies heavily on labeled data and needs data of sufficient quantity and quality to achieve good results. For evaluation texts about online teaching resources, labeled data is hard to obtain and uneven in quality, which makes the task increasingly difficult. To address this difficulty, this paper proposes a semi-supervised self-training scheme named TS-Aug. By adding unlabeled data and training on pseudo-labels, the scheme significantly expands the sample set under strong data augmentation while addressing the overfitting risk that data augmentation carries. Training proceeds in three stages: supervised initialization on labeled data with a weak augmentation strategy, then semi-supervised training on unlabeled data with a strong augmentation strategy, and finally supervised fine-tuning on labeled data. On the authors' self-built online course review dataset, the scheme improves the classification F1-score from 0.88 to 0.95, which indicates that TS-Aug has good application prospects in text classification tasks.
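The three training stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, the augmentation operations, or how pseudo-labels are filtered, so the Naive Bayes stand-in model, the token-dropping and shuffling augmentations, the confidence `threshold`, and every function name here are hypothetical choices made only to show the flow of the stages.

```python
import math
import random
from collections import Counter


def weak_augment(tokens, rng):
    """Assumed weak augmentation: drop at most one token."""
    if len(tokens) > 1 and rng.random() < 0.5:
        i = rng.randrange(len(tokens))
        return tokens[:i] + tokens[i + 1:]
    return list(tokens)


def strong_augment(tokens, rng):
    """Assumed strong augmentation: drop about 30% of tokens, then shuffle."""
    kept = [t for t in tokens if rng.random() > 0.3] or list(tokens)
    rng.shuffle(kept)
    return kept


class NaiveBayes:
    """Stand-in classifier (an assumption, not the paper's model)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.word_counts = {y: Counter() for y in self.labels}
        self.doc_counts = Counter()
        self.vocab = set()

    def update(self, tokens, label):
        # Incremental training step: accumulate counts for one sample.
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def predict_proba(self, tokens):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab) + 1  # +1 leaves room for unseen tokens
        log_scores = {}
        for y in self.labels:
            n_y = sum(self.word_counts[y].values())
            score = math.log((self.doc_counts[y] + 1) / (total_docs + len(self.labels)))
            for t in tokens:
                score += math.log((self.word_counts[y][t] + 1) / (n_y + v))
            log_scores[y] = score
        top = max(log_scores.values())
        exp = {y: math.exp(s - top) for y, s in log_scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}


def ts_aug_train(labeled, unlabeled, threshold=0.8, seed=0):
    rng = random.Random(seed)
    model = NaiveBayes(sorted({y for _, y in labeled}))
    # Stage 1: supervised initialization on labeled data with weak augmentation.
    for tokens, y in labeled:
        model.update(weak_augment(tokens, rng), y)
    # Stage 2: pseudo-label unlabeled data, train on strongly augmented views.
    for tokens in unlabeled:
        proba = model.predict_proba(tokens)
        y_hat = max(proba, key=proba.get)
        if proba[y_hat] >= threshold:  # keep only confident pseudo-labels (assumed filter)
            model.update(strong_augment(tokens, rng), y_hat)
    # Stage 3: supervised fine-tuning on the clean labeled data.
    for tokens, y in labeled:
        model.update(tokens, y)
    return model


# Toy English data for illustration only; the paper uses course reviews.
LABELED = [
    ("good great course".split(), "pos"),
    ("clear helpful teacher".split(), "pos"),
    ("bad boring course".split(), "neg"),
    ("confusing useless lecture".split(), "neg"),
]
UNLABELED = ["great helpful clear".split(), "boring useless bad".split()]

model = ts_aug_train(LABELED, UNLABELED)
print(model.predict_proba("great helpful".split()))
```

In this sketch, the final pass over the clean labeled data plays the role of the fine-tuning stage: it anchors the model on trusted labels after it has absorbed noisier pseudo-labeled samples.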

few-shot learning; semi-supervised training; data augmentation; sentiment classification

Guo Ka (郭卡), Wang Fang (王芳)

School of Information and Mathematics, Anhui International Studies University, Hefei, Anhui 231200, China

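Anhui Provincial University Natural Science Research Project (KJ2021A1197); Anhui Provincial Quality Engineering Project, Curriculum Ideological and Political Education Teaching Team (2020kcszjxtd34); Anhui International Studies University Quality Engineering Project, Teaching Innovation Team (aw2023jxcxtd06)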

2024

Journal of Nanjing Normal University (Engineering and Technology Edition)
Nanjing Normal University

Impact factor: 0.313
ISSN: 1672-1292
Year, volume (issue): 2024, 24(1)