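To address the scarcity of named entity corpora in the satellite domain and the low recognition performance of existing algorithms, a satellite-domain entity annotation method that accounts for fuzzy boundaries is proposed, and a corpus covering 8 common types of satellite-domain entities is constructed; compared with existing corpora in this field, it has finer granularity and wider coverage. On this basis, a satellite-domain entity recognition algorithm combining transfer learning and multi-network fusion is proposed. The algorithm uses a pretrained bidirectional encoder to smoothly transfer the semantics of the corpus and obtain subword-level features, a bidirectional long short-term memory (BiLSTM) neural network to capture contextual information and determine entity boundaries, and a conditional random field as the decoder for label prediction. Experimental results show that the algorithm outperforms traditional models such as BiLSTM, with F1 values above 92% on all 8 entity types and a micro-averaged F1 value of 96.10%.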
Satellite domain corpus construction and named entity recognition
To address the lack of named entity corpora in the satellite domain and the low recognition performance of existing algorithms, a satellite domain entity labeling method that considers fuzzy boundaries was proposed, and a corpus containing 8 common types of satellite domain entities was constructed, with finer granularity and wider coverage than the existing corpora in this field. Based on this corpus, a satellite domain entity recognition algorithm combining transfer learning and multi-network fusion was proposed. The algorithm used pretrained bidirectional encoder representations from transformers (BERT) to smoothly transfer the semantics of the corpus and obtain subword-level features, a bi-directional long short-term memory (BiLSTM) network to capture contextual information and determine entity boundaries, and a conditional random field as the decoder for label prediction. Experimental results show that, compared with traditional models such as BiLSTM, the proposed algorithm achieves better recognition performance: the F1-score on each of the 8 entity types is above 92%, and the micro-average F1-score reaches 96.10%.
named entity recognition; transfer learning; neural networks; data scarcity
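
The role of the conditional random field as a decoder can be illustrated with a minimal sketch of Viterbi decoding. This is not the authors' implementation: the label set (`O`, `B-SAT`, `I-SAT`), the per-token emission scores (standing in for the outputs of the encoder and BiLSTM layers), and the transition scores are all hypothetical values chosen for illustration. The example shows how a transition penalty can prevent an invalid label sequence that a per-token argmax would produce.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[t][y] is the score of label y at token t (hypothetical
    stand-in for BiLSTM outputs); transitions[(p, y)] is the score of
    moving from label p to label y (missing pairs default to 0.0).
    """
    # best[y] = best score of any path ending at the current token with label y
    best = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Choose the previous label that maximizes path score into y.
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, y), 0.0))
            new_best[y] = best[prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical label set and scores, for illustration only.
LABELS = ["O", "B-SAT", "I-SAT"]
EMISSIONS = [
    {"O": 1.0, "B-SAT": 0.8, "I-SAT": 0.0},
    {"O": 0.2, "B-SAT": 0.1, "I-SAT": 1.5},
    {"O": 1.2, "B-SAT": 0.0, "I-SAT": 0.1},
]
# Strongly penalize the invalid transition O -> I-SAT.
TRANSITIONS = {("O", "I-SAT"): -10.0}

if __name__ == "__main__":
    greedy = [max(LABELS, key=lambda y: e[y]) for e in EMISSIONS]
    print("per-token argmax:", greedy)
    print("Viterbi decoding:", viterbi_decode(EMISSIONS, TRANSITIONS, LABELS))
```

With these assumed scores, the per-token argmax yields `O, I-SAT, O`, which violates the BIO scheme, whereas Viterbi decoding yields `B-SAT, I-SAT, O`. In a trained model the transition scores would be learned jointly with the rest of the network rather than set by hand.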