This paper proposes a text classification method for imbalanced short text datasets that combines data augmentation, dilated convolution, and ProbSparse self-attention. The method addresses sample imbalance through Roformer-Sim. In the embedding layer, character embedding vectors are obtained using RoBERTa, and the TextRCNN structure is used to extract features from the text. In addition, dilated convolution is used in the pooling layer to prevent the loss of important information, and ProbSparse self-attention is used to obtain weights for the different word embedding vectors. The proposed model reaches a classification F1 score of 96.31% on the Dataset of Inspection Records of Civil Aviation Regulatory Matters. Comparative experiments with other classic deep learning algorithms show that the proposed model performs well on the short text dataset.
Key words
Imbalanced short text/Text classification/Data augmentation/Dilated convolution/ProbSparse self-attention
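
As an illustration of the dilated convolution named in the abstract, the following is a minimal sketch in plain Python, not the authors' layer. The function names `dilated_conv1d` and `receptive_field`, the kernel, and the input values are all hypothetical. It shows the general idea that spacing kernel taps `dilation` positions apart widens the span of input each output sees, without discarding positions the way a pooling step would.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no-padding) 1-D dilated convolution in cross-correlation form.

    Hypothetical helper for illustration; kernel taps are applied to
    input positions that are `dilation` steps apart.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    # Number of input positions spanned by one application of the kernel.
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(signal) - span + 1
    if out_len <= 0:
        return []
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(out_len)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        # Each layer adds (kernel_size - 1) * dilation input positions.
        field += (kernel_size - 1) * d
    return field


# Toy per-token feature values (assumed, not from the paper).
features = [1, 2, 3, 4, 5, 6, 7]
dense = dilated_conv1d(features, [1, 1, 1], dilation=1)
dilated = dilated_conv1d(features, [1, 1, 1], dilation=2)
```

With a kernel of size 3, a dilation of 2 makes each output cover five input positions instead of three, and stacking layers with dilations 1, 2, and 4 covers 15 positions with no downsampling.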