Short text classification for intelligent civil aviation supervision based on character-word vector fusion

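To address the low efficiency of classifying and analyzing inspection records from civil aviation supervision items purely by hand, a short text classification model is proposed that combines data augmentation with dual-channel feature extraction over fused character and word vectors. The model classifies civil aviation supervision items into problems related to people, to equipment, facilities and environment, to rules and procedures, and to organizational responsibilities. To address class imbalance, a data augmentation algorithm transforms the original texts to generate new samples, so that the number of samples per class becomes more balanced. Character vectors and word vectors are fused by concatenation at the character level, yielding character vectors that carry word-level feature information. The fused vectors are fed separately into a text convolutional neural network (TextCNN) and a bidirectional long short-term memory (BiLSTM) model, which extract features from local and global perspectives respectively, and experiments are run on a dataset of civil aviation supervision inspection records. The results show that the model reaches an accuracy of 0.9837 and an F1 score of 0.9836. Compared with several character embedding and word embedding models, accuracy improves by 0.4%. Compared with several commonly used single-channel models, accuracy improves by 3%, which supports the claim that the features extracted by the dual-channel model are comprehensive and effective.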
Short text classification of civil aviation intelligent supervision based on character-word fusion
To address the inefficiency of manually classifying and analyzing inspection records from civil aviation supervision, a dual-channel feature extraction model for short text classification is proposed. The model combines data augmentation with character-word vector fusion and classifies supervision items into issues related to people, equipment and facilities, institutional procedures, and institutional responsibilities. To tackle class imbalance, data augmentation algorithms generate new samples by transforming the original texts, thereby balancing the sample sizes across categories. Word vectors and character vectors are fused by concatenation at the character level, producing character vectors that retain word-level features. The fused character vectors are then fed into TextCNN and BiLSTM, which extract features from local and global perspectives respectively, so that the dual-channel approach captures comprehensive and effective information from the inspection records. Experimental results on the civil aviation supervision inspection record dataset show that the proposed model achieves an accuracy of 0.9837 and an F1 score of 0.9836. Compared with some existing word embedding and character embedding models, accuracy improves by 0.4%. Compared with commonly used single-channel models, accuracy improves by 3%, which validates the effectiveness and comprehensiveness of the features extracted by the dual-channel model.
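The character-level fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the sentence has already been segmented into words and that each character is paired with the vector of the word that contains it; the function name `fuse_char_word_vectors`, the toy 2-dimensional embeddings, and the zero-vector fallback for unknown tokens are all hypothetical choices for illustration, not details confirmed by the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse_char_word_vectors(
    words: Sequence[str],
    char_vecs: Dict[str, Vector],
    word_vecs: Dict[str, Vector],
    char_dim: int,
    word_dim: int,
) -> List[Vector]:
    """Return one fused vector per character of a pre-segmented sentence.

    Each character vector is concatenated with the vector of the word that
    contains the character, so every position carries word-level context.
    Unknown characters or words fall back to zero vectors (an assumption).
    """
    fused: List[Vector] = []
    for word in words:
        w_vec = word_vecs.get(word, [0.0] * word_dim)
        for ch in word:
            c_vec = char_vecs.get(ch, [0.0] * char_dim)
            # List concatenation: result has length char_dim + word_dim.
            fused.append(c_vec + w_vec)
    return fused


# Toy example: "机场" (airport) + "检查" (inspection), hypothetical embeddings.
words = ["机场", "检查"]
char_vecs = {
    "机": [0.1, 0.2],
    "场": [0.3, 0.4],
    "检": [0.5, 0.6],
    "查": [0.7, 0.8],
}
word_vecs = {"机场": [1.0, 2.0], "检查": [3.0, 4.0]}

fused = fuse_char_word_vectors(words, char_vecs, word_vecs, char_dim=2, word_dim=2)
for ch, vec in zip("".join(words), fused):
    print(ch, vec)
```

Under this reading, the resulting sequence has one row per character, and the same sequence would be passed to both the convolutional channel and the recurrent channel before their outputs are combined for classification.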

character-word vector fusion; civil aviation supervision; short text; text convolutional neural networks (TextCNN); bi-directional long short-term memory (BiLSTM)

王欣、干镞锐、许雅玺、史珂、郑涛

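College of Computer Science, Civil Aviation Flight University of China, Guanghan, Sichuan 618307

School of Economics and Management, Civil Aviation Flight University of China, Guanghan, Sichuan 618307

Civil Aviation Inspector Training College, Civil Aviation Flight University of China, Guanghan, Sichuan 618307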

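character-word vector fusion; civil aviation supervision; short text; text convolutional neural network (TextCNN); bidirectional long short-term memory (BiLSTM)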

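National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities; Fundamental Research Funds for the Central Universities

U2033213; J2022-048; J2019-045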

2024

China Safety Science Journal
China Occupational Safety and Health Association

CSTPCD; Peking University Core Journal
Impact factor: 1.548
ISSN: 1003-3033
Year, volume (issue): 2024, 34(2)