Named Entity Recognition of Aviation Unsafe Events Embedded with Fusion Domain Dictionary
XU Yaxi (许雅玺)¹, MENG Tianyu (孟天宇)², WANG Xin (王欣)³, LIU Bingnan (刘炳南)⁴
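Author information
1. School of Economics and Management, Civil Aviation Flight University of China, Guanghan 618307
2. Sichuan Tengden Technology Co., Ltd., Chengdu 610037
3. College of Computer Science, Civil Aviation Flight University of China, Guanghan 618307
4. Air China Limited, Chongqing 401120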
Abstract
For the task of named entity recognition (NER) in the field of aviation unsafe events, aviation safety information weekly reports were used as the data source to analyze and construct an NER dataset and a domain dictionary of aviation unsafe events. To address the poor performance of traditional NER models in capturing the boundaries of domain entities, a method that enhances domain semantic information by embedding a domain dictionary was proposed, built on the pre-trained language model BERT (bidirectional encoder representations from transformers). Several comparative experiments were carried out on the self-built dataset. The results show that the proposed method further improves the recognition rate of entity boundaries and, compared with the traditional bi-directional long short-term memory-conditional random field (BiLSTM-CRF) NER model, improves performance by about 5%.
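The abstract does not say how dictionary matches are turned into embeddings. As a minimal sketch, assuming one common approach (forward maximum matching of the domain dictionary over the character sequence, with B/M/E/S/O labels marking where a matched dictionary word begins, continues, ends, or stands alone), the Python below computes one dictionary-match label per character. The dictionary entries and the names `DOMAIN_DICT`, `lexicon_tags` and `lexicon_feature_ids` are hypothetical. In a BERT-based model, such label ids could be looked up in a trainable embedding table and combined with BERT's character vectors (for example by concatenation) before a CRF layer, but that fusion step is an assumption and is not shown.

```python
from typing import Dict, List, Set

# Hypothetical example entries; the paper's actual domain dictionary is not reproduced here.
DOMAIN_DICT: Set[str] = {"鸟击", "发动机", "发动机喘振", "跑道侵入", "复飞"}

# Dictionary-match label vocabulary: B/M/E = begin/middle/end of a multi-character
# dictionary word, S = single-character dictionary word, O = no match.
LEXICON_TAGS: Dict[str, int] = {"O": 0, "B": 1, "M": 2, "E": 3, "S": 4}


def lexicon_tags(text: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Label each character of `text` by forward maximum matching against `lexicon`."""
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = 0
        # Try the longest candidate first, so "发动机喘振" wins over its prefix "发动机".
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                matched = length
                break
        if matched == 0:
            i += 1  # no dictionary word starts here; keep "O" and move on
            continue
        if matched == 1:
            tags[i] = "S"
        else:
            tags[i] = "B"
            for j in range(i + 1, i + matched - 1):
                tags[j] = "M"
            tags[i + matched - 1] = "E"
        i += matched  # skip past the matched word
    return tags


def lexicon_feature_ids(text: str, lexicon: Set[str], max_len: int = 8) -> List[int]:
    """Map dictionary-match labels to integer ids.

    Assumption: these ids would index a trainable embedding table whose vectors are
    combined with BERT's per-character outputs; that fusion is not implemented here.
    """
    return [LEXICON_TAGS[t] for t in lexicon_tags(text, lexicon, max_len)]


if __name__ == "__main__":
    sentence = "飞机起飞后发生鸟击，发动机喘振后复飞"
    for char, tag in zip(sentence, lexicon_tags(sentence, DOMAIN_DICT)):
        print(char, tag)
```

In the sample sentence, 鸟击, 发动机喘振 and 复飞 are tagged as dictionary spans, and trying longer candidates first makes 发动机喘振 win over its prefix 发动机, which is the kind of entity-boundary signal the domain dictionary is meant to supply.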
Key words
aviation unsafe events/domain dictionary/named entity recognition/pre-trained language model