Semi-Supervised Self-Training Sentiment Classification Algorithm Based on TS-Aug Architecture
With the growing popularity of online teaching resources, the volume of text data for resource evaluation has steadily increased. Traditional supervised text classification relies heavily on labeled data and requires sufficient high-quality data to achieve good results. The task has become increasingly difficult because labeled data are hard to obtain and uneven in quality. To address this difficulty, this paper proposes a semi-supervised self-training scheme named TS-Aug. By adding unlabeled data and pseudo-labels to training, the scheme significantly expands the sample set through aggressive data augmentation while also addressing the overfitting risk that data augmentation carries. Specifically, the process consists of initial supervised training on labeled data with weak augmentation strategies, followed by semi-supervised training on unlabeled data with strong augmentation strategies, and finally fine-tuning of the model through supervised training on labeled data. On our self-built dataset of online course comments, the scheme improves the classification F1-score from 0.88 to 0.95. This indicates that the TS-Aug semi-supervised self-training scheme has good application prospects in text classification tasks.
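
The three-stage process described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the classifier, the augmentation operators, or the pseudo-label selection rule, so the sketch substitutes a tiny bag-of-words logistic regression, uses token dropout at two rates as the weak and strong augmentations, and keeps a pseudo-label only when its confidence exceeds a fixed threshold. All names (`drop_tokens`, `LogReg`, `ts_aug_train`, `threshold`) and the toy comments are hypothetical.

```python
import math
import random


def drop_tokens(tokens, p_drop, rng):
    """Token dropout (assumed augmentation): drop each token with prob. p_drop, keep >= 1."""
    kept = [t for t in tokens if rng.random() >= p_drop]
    return kept or [rng.choice(tokens)]


class LogReg:
    """Tiny bag-of-words logistic regression (labels 0/1), a stand-in for the real model."""

    def __init__(self, lr=0.5):
        self.w, self.b, self.lr = {}, 0.0, lr

    def prob(self, tokens):
        z = self.b + sum(self.w.get(t, 0.0) for t in tokens)
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, docs, labels, epochs, rng):
        data = list(zip(docs, labels))
        for _ in range(epochs):
            rng.shuffle(data)
            for tokens, y in data:
                g = self.prob(tokens) - y  # log-loss gradient w.r.t. the logit
                self.b -= self.lr * g
                for t in tokens:
                    self.w[t] = self.w.get(t, 0.0) - self.lr * g


def ts_aug_train(labeled, unlabeled, threshold=0.8, epochs=20, seed=0):
    rng = random.Random(seed)
    docs = [d for d, _ in labeled]
    ys = [y for _, y in labeled]
    model = LogReg()

    # Stage 1: supervised initialization on weakly augmented labeled data.
    model.train([drop_tokens(d, 0.1, rng) for d in docs], ys, epochs, rng)

    # Stage 2: pseudo-label the unlabeled data, keep confident predictions,
    # and train on strongly augmented versions of those samples.
    pseudo = []
    for d in unlabeled:
        p = model.prob(d)
        if max(p, 1.0 - p) >= threshold:  # assumed confidence filter
            pseudo.append((drop_tokens(d, 0.5, rng), int(p >= 0.5)))
    if pseudo:
        model.train([d for d, _ in pseudo], [y for _, y in pseudo], epochs, rng)

    # Stage 3: supervised fine-tuning on the original labeled data only.
    model.train(docs, ys, epochs, rng)
    return model, len(pseudo)


# Toy course comments (invented for illustration); 1 = positive, 0 = negative.
labeled = [
    ("great course clear explanations".split(), 1),
    ("excellent teacher very helpful".split(), 1),
    ("boring lectures poor audio".split(), 0),
    ("terrible pacing confusing slides".split(), 0),
]
unlabeled = [
    "great teacher helpful explanations".split(),
    "poor slides boring pacing".split(),
    "clear excellent course".split(),
    "confusing terrible audio".split(),
]

model, n_pseudo = ts_aug_train(labeled, unlabeled)
print("pseudo-labeled samples used:", n_pseudo)
print("P(positive | 'great helpful course'):", model.prob("great helpful course".split()))
```

Ending on the clean labeled data in stage 3 is the design choice that the abstract ties to controlling overfitting: any noise introduced by strong augmentation or incorrect pseudo-labels in stage 2 is followed by a pass over trusted labels.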