As network environments grow increasingly complex, the security landscape is becoming more severe, making the protection of networks from external attacks a crucial task. Cyber Threat Intelligence (CTI) has emerged to shift cybersecurity from reactive to proactive defense: by analyzing CTI and gathering intelligence evidence from it, defenders can prevent potential attacks. Sharing CTI to defend against cyber-attacks has therefore become increasingly important. However, CTI is often shared in unstructured form, so converting it into semi-structured or structured data is essential for many downstream tasks, and Named Entity Recognition (NER) can facilitate this conversion. Although NER has achieved considerable success in general domains, many challenges remain in the CTI field. This article first introduces the background of threat intelligence and its connection to NER. It then reviews NER techniques in chronological order, covering rule-based and dictionary-based NER, unsupervised learning methods, feature-based supervised learning methods, and deep learning-based NER, and provides a comprehensive overview of the current research status and future directions of NER in the CTI field. Lastly, it compares the corpora used for NER in CTI and runs experiments with state-of-the-art (SOTA) deep learning methods; the analysis identifies issues present in existing CTI datasets. The proposed BBC (BERT-BiGRU-CRF) deep learning entity recognition model achieves the best experimental results, with F1 scores of 97.36%, 90.40%, 82.87%, and 73.91% on the AutoLabel, DNRTI, CTIReports, and APTNER datasets, respectively.
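The F1 scores above measure how closely predicted entities match annotated ones. As a minimal sketch of how such scores are commonly computed for NER, assuming BIO tagging, exact span matching, and micro-averaging (the abstract does not state the tagging scheme or evaluation protocol used for these datasets), the Python below decodes tag sequences into typed spans, which is also how NER output turns unstructured CTI text into structured records, and then scores predicted spans against gold spans. The functions `bio_to_spans` and `entity_f1`, the entity labels, and the example tags are hypothetical illustrations, not the paper's code.

```python
from typing import List, Optional, Set, Tuple

# An entity is (type, start, end) over token indices, end exclusive.
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from a BIO tag sequence.

    An I- tag that does not continue an entity of the same type is
    treated as the start of a new entity (a common lenient convention).
    """
    spans: Set[Span] = set()
    ent_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if ent_type is not None and not continues:
            spans.add((ent_type, start, i))  # close the open entity
            ent_type = None
        if prefix == "B" or (prefix == "I" and not continues):
            ent_type, start = label, i  # open a new entity
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # a span counts only if type and both boundaries match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical sentence and labels, for illustration only.
tokens = ["The", "HypoGroup", "actor", "deployed", "Fake", "Loader", "via", "spear-phishing"]
gold_tags = ["O", "B-HackOrg", "O", "O", "B-Tool", "I-Tool", "O", "B-Way"]
pred_tags = ["O", "B-HackOrg", "O", "O", "B-Tool", "O", "O", "B-Way"]

# Structured record extracted from the gold tags, e.g. ('Tool', 'Fake Loader').
records = sorted((t, " ".join(tokens[s:e])) for t, s, e in bio_to_spans(gold_tags))
p, r, f = entity_f1([gold_tags], [pred_tags])
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # → P=0.6667 R=0.6667 F1=0.6667
```

Under exact matching, the truncated `Tool` prediction counts as both a false positive and a false negative; a lenient, partial-overlap criterion would score it differently, so reported F1 values are only comparable when the matching rule is the same.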
Keywords
named entity recognition/Cyber Threat Intelligence (CTI)/deep learning/CTI datasets