
Satellite domain corpus construction and named entity recognition

To address the scarcity of named entity corpora in the satellite domain and the low recognition performance of existing algorithms, a satellite-domain entity annotation method that accounts for fuzzy entity boundaries is proposed. With it, a corpus covering 8 common types of satellite-domain entities is constructed; compared with existing corpora in this field, it is finer-grained and has wider coverage. On this basis, a satellite-domain entity recognition algorithm combining transfer learning and multi-network fusion is proposed. The algorithm uses pretrained Bidirectional Encoder Representations from Transformers (BERT) to smoothly transfer the semantics of the corpus and obtain subword-level features, a bi-directional long short-term memory (BiLSTM) network to capture contextual information and determine entity boundaries, and a conditional random field (CRF) as the decoder for label prediction. Experimental results show that the algorithm outperforms traditional models such as BiLSTM: its F1-score is above 92% on each of the 8 entity types, and its micro-average F1-score reaches 96.10%.
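The CRF decoding step described in the abstract can be illustrated with Viterbi decoding over a linear-chain CRF. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the emission scores stand in for the per-token outputs a BiLSTM layer would produce, the transition matrix stands in for learned CRF parameters, and the tag set O/B-SAT/I-SAT (with the entity label SAT) is hypothetical. It shows how a large negative score on the O to I-SAT transition keeps the decoder from emitting an invalid BIO sequence.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][k]: score of tag k at token t (assumed BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j (assumed CRF weights).
    """
    num_tags = len(transitions)
    # score[k]: best total score of any path that ends in tag k at the current token
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i when the current tag is j
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers to the first token
    path = [max(range(num_tags), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


TAGS = ["O", "B-SAT", "I-SAT"]  # hypothetical BIO tag set
NEG = -1e4
transitions = [[0.0, 0.0, NEG],  # O -> I-SAT is effectively forbidden
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 0.5, 0.1],
             [0.3, 0.9, 1.0],
             [0.1, 0.1, 1.5]]
print([TAGS[k] for k in viterbi_decode(emissions, transitions)])
# → ['O', 'B-SAT', 'I-SAT']
```

A greedy per-token argmax over the same emissions gives O, I-SAT, I-SAT, which is not a valid BIO sequence; the transition scores let the decoder trade a slightly lower emission score at the second token for a consistent label sequence.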
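The micro-average F1-score reported in the abstract pools true positives (TP), false positives (FP) and false negatives (FN) over all entity types before computing precision P = ΣTP/(ΣTP+ΣFP) and recall R = ΣTP/(ΣTP+ΣFN), then F1 = 2PR/(P+R). Frequent entity types therefore weigh more than in a macro average. The sketch below assumes exact-span, entity-level matching (a common convention; the abstract does not state the matching criterion) and uses hypothetical entity types SAT and ORG. It is not the paper's evaluation script.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (sentence_id, start, end, entity_type); the type names used below are hypothetical
Entity = Tuple[int, int, int, str]


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def entity_scores(gold: Set[Entity], pred: Set[Entity]) -> Tuple[Dict[str, float], float]:
    """Per-type F1 and micro-average F1 under exact span-and-type matching."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for e in pred:
        if e in gold:
            tp[e[3]] += 1
        else:
            fp[e[3]] += 1
    for e in gold - pred:
        fn[e[3]] += 1
    types = set(tp) | set(fp) | set(fn)
    per_type = {t: f1(tp[t], fp[t], fn[t]) for t in types}
    # Micro average: sum the counts over all types first, then compute F1 once
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_type, micro


gold = {(0, 0, 2, "SAT"), (0, 5, 6, "ORG"), (1, 1, 3, "SAT")}
pred = {(0, 0, 2, "SAT"), (0, 5, 7, "ORG"), (1, 1, 3, "SAT")}  # ORG span boundary is off by one
per_type, micro = entity_scores(gold, pred)
print(sorted(per_type.items()))  # → [('ORG', 0.0), ('SAT', 1.0)]
print(round(micro, 4))           # → 0.6667
```

The wrongly bounded ORG span counts as both a false positive and a false negative, which is why boundary errors weigh heavily on entity-level scores.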

Keywords: named entity recognition; transfer learning; neural networks; data scarcity

Xu Cong, Shi Huipeng, Chen Zhimin, Zhang Xinyu, Wang Jing, Yang Jiasen


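Key Laboratory of Electronic Information Technology for Complex Aerospace Systems, National Space Science Center, Chinese Academy of Sciences, Beijing 100190

University of Chinese Academy of Sciences, Beijing 100049

State Radio Monitoring Center Testing Center, Beijing 100041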


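Supported by the Merit-Based Fund of the Key Laboratory of Electronic Information Technology for Complex Aerospace Systems, Chinese Academy of Sciences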

Y42613A32S

2024

Journal of National University of Defense Technology
National University of Defense Technology


CSTPCD, Peking University Core Journal
Impact factor: 0.517
ISSN: 1001-2486
Year, volume (issue): 2024, 46(4)