Named Entity Recognition for Textual Intelligence Based on RoBERTa_BiLSTM_CRF

With the explosive growth of online information, threat intelligence analysis, an important part of military intelligence analysis and strategic decision-making, faces increasingly diverse sources and complex information structures. Traditional manual information extraction is inefficient and limited in accuracy when handling large volumes of structured and unstructured information. To address this challenge, this paper proposes a named entity recognition algorithm that combines RoBERTa, BiLSTM, and Conditional Random Fields (CRF). The RoBERTa model extracts semantic features from the text, the BiLSTM model captures sequence context, and the CRF layer assigns the entity labels, which together improve the accuracy and efficiency of information extraction. Based on an open-source intelligence corpus, we constructed a named entity recognition dataset of missile launch events and ran experiments on it. The results show that the method significantly outperforms mainstream deep learning methods on precision, recall, and F1 score, reaching an F1 score of 94.21%.
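The abstract names the three stages but gives no implementation detail. As a rough illustration of the final stage only, the sketch below shows Viterbi decoding, the standard inference step of a linear-chain CRF layer. Everything in it is an assumption for illustration: the tag set (`B-WPN`/`I-WPN` for a hypothetical weapon entity), the emission scores (stand-ins for what a BiLSTM over RoBERTa features might output), and the transition scores are invented, not taken from the paper.

```python
from typing import Dict, List, Sequence

def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag] : score of `tag` at token t (stand-in for BiLSTM output)
    transitions[a][b] : score of moving from tag a to tag b (CRF parameters)
    """
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical tag set and scores, invented for this example
TAGS = ["O", "B-WPN", "I-WPN"]
transitions = {
    "O":     {"O": 0.0, "B-WPN": 0.0, "I-WPN": -10.0},  # O -> I is invalid in BIO
    "B-WPN": {"O": 0.0, "B-WPN": 0.0, "I-WPN": 0.5},
    "I-WPN": {"O": 0.0, "B-WPN": 0.0, "I-WPN": 0.5},
}
emissions = [
    {"O": 2.0, "B-WPN": 0.5, "I-WPN": 0.1},
    {"O": 0.2, "B-WPN": 1.0, "I-WPN": 1.5},
    {"O": 0.1, "B-WPN": 0.3, "I-WPN": 2.0},
]

# Per-token argmax ignores transitions and yields an invalid BIO sequence
greedy = [max(TAGS, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, TAGS)
print(greedy)   # ['O', 'I-WPN', 'I-WPN']
print(decoded)  # ['O', 'B-WPN', 'I-WPN']
```

In this toy case the per-token argmax starts an entity with `I-WPN`, while the transition penalty steers the decoded path to a well-formed `B-WPN`, `I-WPN` span. This is the kind of label consistency a CRF layer is generally added for.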

threat intelligence analysis; named entity recognition; RoBERTa; BiLSTM; CRF

Lu Zejian (陆泽健), Zhao Wen (赵文), Yin Ganggang (尹港港)

China Academy of Electronics and Information Technology, Beijing 100041, China

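Journal of China Academy of Electronics and Information Technology (中国电子科学研究院学报)
China Academy of Electronics and Information Technology

Impact factor: 0.663
ISSN: 1673-5692
Year, volume (issue): 2024, 19(5)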