Text Data Mining of Air Traffic Control Hazard Sources Based on the BERT Model
杨昌其 1, 姜美岑 1, 林灵 1
Author information
- 1. Civil Aviation Flight University of China, Guanghan, Sichuan 618000, China
Abstract
In civil aviation safety management, hazard sources and safety hazards are easily confused conceptually and recorded inconsistently, yet the management regulations of the dual prevention mechanism require that the two be distinguished. This paper takes the ATC hazard source control list collected from the ASIS system as its research object and applies text data mining to it. A text classification model is built around the characteristics of hazard sources and safety hazards: first, the ATC hazard source control list is preprocessed through text cleaning, stop-word removal, and Jieba word segmentation; next, word vectors are generated with the BERT model, starting from the BERT-Base-Chinese pre-trained model, whose hyperparameters are then fine-tuned; finally, a Softmax classifier produces the classification results.
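The pipeline described in the abstract can be illustrated with a minimal sketch of its first and last stages. It rests on several assumptions that are not taken from the paper: the stop-word list is a hypothetical placeholder, the input tokens are assumed to have already been segmented (the paper uses Jieba for this step), and the two logits stand in for the output of the fine-tuned BERT-Base-Chinese encoder, which is not reproduced here.

```python
import math
import re

# Hypothetical stop-word list; the paper does not publish the one it used.
STOP_WORDS = {"的", "了", "和", "在", "与"}

# The two target classes named in the abstract.
LABELS = ("hazard source", "safety hazard")


def clean_text(text):
    """Drop everything except CJK characters, ASCII letters, and digits."""
    return re.sub(r"[^\u4e00-\u9fffA-Za-z0-9]+", "", text)


def preprocess(tokens):
    """Clean each segmented token, then drop empty tokens and stop words."""
    cleaned = (clean_text(tok) for tok in tokens)
    return [tok for tok in cleaned if tok and tok not in STOP_WORDS]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Map two class logits to a (label, probability) pair."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Assumed pre-segmented input; punctuation and the stop word are removed.
print(preprocess(["管制员", "的", "疲劳", ",", "值勤"]))  # ['管制员', '疲劳', '值勤']

# Placeholder logits standing in for the classifier head's output.
label, prob = classify([2.0, 0.0])
print(label, round(prob, 3))  # hazard source 0.881
```

In a full implementation, the logits would come from a classification layer placed on top of the fine-tuned encoder; the sketch only shows how Softmax turns those scores into class probabilities.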
Key words
text categorization/data mining/BERT model/hazard sources/safety hazards
Funding
Horizontal research project of the Air Traffic Management Bureau, Civil Aviation Administration of China (H2023-100)
Publication year
2024