Text Classification of Civil Aviation Supervision Based on Dilated Convolution and Self-Attention

This paper proposes a classification method for imbalanced short-text datasets that combines data augmentation, dilated convolution, and probabilistic sparse self-attention (ProbSparse Self-Attention). First, RoFormer-Sim is used to address class imbalance among the samples. Second, RoBERTa produces character embedding vectors in the embedding layer. A TextRCNN structure then extracts features from the text. In the pooling layer, dilated convolution is used to prevent the loss of important information, and ProbSparse Self-Attention assigns weights to the different character embedding vectors. On a dataset of inspection records for civil aviation regulatory matters, the proposed model reaches a classification F1 score of 96.31%. Comparative experiments against other classic deep learning algorithms show that the model performs well on short-text datasets.
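As a rough illustration of why dilated convolution can cover a wider context without discarding positions, the sketch below implements a plain 1-D dilated convolution in pure Python. It is not the paper's layer: the function names `dilated_conv1d` and `receptive_field`, the kernel values, and the dilation rate are assumptions chosen for the example, since the abstract does not give the model's actual configuration.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(seq, kernel, dilation=1):
    """Unpadded 1-D dilated convolution (cross-correlation) over a list of numbers.

    Each output sums kernel[k] * seq[start + k * dilation], so the kernel
    taps are spaced `dilation` positions apart instead of being adjacent.
    """
    span = receptive_field(len(kernel), dilation)
    if len(seq) < span:
        return []  # the sequence is too short for even one output
    outputs = []
    for start in range(len(seq) - span + 1):
        total = sum(w * seq[start + k * dilation] for k, w in enumerate(kernel))
        outputs.append(total)
    return outputs


if __name__ == "__main__":
    features = [1, 2, 3, 4, 5, 6, 7]  # hypothetical per-position feature values
    kernel = [1, 1, 1]                # hypothetical kernel weights
    print(dilated_conv1d(features, kernel, dilation=1))
    print(dilated_conv1d(features, kernel, dilation=2))
```

With the same three-weight kernel, a dilation of 2 spans five input positions rather than three, which is the property the abstract relies on when it uses dilated convolution to avoid losing information.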

Imbalanced short text; Text classification; Data augmentation; Dilated convolution; ProbSparse self-attention

王欣、干镞锐、许雅玺、史珂

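School of Computer Science, Civil Aviation Flight University of China, Guanghan, Sichuan 618307

School of Economics and Management, Civil Aviation Flight University of China, Guanghan, Sichuan 618307

Civil Aviation Inspector Training College, Civil Aviation Flight University of China, Guanghan, Sichuan 618307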

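2024

Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation (中国航天科工集团公司第十七研究所)

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348
Year, volume (issue): 2024, 41(11)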