针对中文网络安全领域缺乏公开数据集和有效的命名实体识别(Named Entity Recognition,NER)方法,提出一种融合汉字多源信息的网络安全NER方法.通过构建数据集中所有字符的偏旁和字频向量表,增强了中文字向量的特征表达能力,嵌入到改进的词汇融合模型中进行字向量与词向量的融合,输入到条件随机场(Conditional Random Fields,CRF)进行解码.实验结果表明,该方法在保持较快解码速度和占用较低计算机资源的情况下,在网络安全数据集上,其准确率、召回率和F1值分别为0.864 9、0.840 2和0.852 3,均优于现有模型,能够为后续网络安全知识图谱的构建提供支撑.
Network Security Named Entity Recognition Method Based on Deep Learning
To address the lack of public datasets and effective Named Entity Recognition (NER) methods in the field of Chinese network security, a network security NER method that fuses multi-source information of Chinese characters is proposed. A radical and character-frequency vector table is constructed for all characters in the dataset, which enhances the feature expression ability of the Chinese character vectors. These vectors are embedded in an improved vocabulary fusion model that fuses character vectors with word vectors, and the result is input to Conditional Random Fields (CRF) for decoding. Experimental results show that, while maintaining a fast decoding speed and low computing resource usage, the method achieves precision, recall and F1 values of 0.8649, 0.8402 and 0.8523 respectively on the network security dataset, all better than those of existing models, and can therefore support the subsequent construction of network security knowledge graphs.
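The radical and character-frequency table mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how radicals are looked up or how frequencies are encoded, so the function name `build_feature_table`, the toy dictionary `TOY_RADICALS`, the `<UNK>` fallback and the equal-width frequency bucketing are all assumptions made for illustration. The resulting integer ids would typically index trainable embedding tables whose outputs are concatenated with the character vector.

```python
from collections import Counter

# Hypothetical toy radical lookup (assumption). A real system would load a
# complete radical dictionary; characters missing here fall back to "<UNK>".
TOY_RADICALS = {"漏": "氵", "洞": "氵", "病": "疒"}


def build_feature_table(sentences, radicals, num_buckets=4):
    """Map each character to a (radical_id, frequency_bucket) pair.

    The bucketing scheme is an assumption: raw counts are scaled into
    num_buckets equal-width bins relative to the most frequent character.
    """
    counts = Counter(ch for sentence in sentences for ch in sentence)
    max_count = max(counts.values())
    radical_ids = {"<UNK>": 0}
    table = {}
    for ch, count in counts.items():
        radical = radicals.get(ch, "<UNK>")
        # Assign the next free id the first time a radical is seen.
        radical_id = radical_ids.setdefault(radical, len(radical_ids))
        # Integer bin in the range [0, num_buckets - 1].
        bucket = count * num_buckets // (max_count + 1)
        table[ch] = (radical_id, bucket)
    return table, radical_ids


if __name__ == "__main__":
    corpus = ["漏洞利用", "病毒利用漏洞"]
    table, radical_ids = build_feature_table(corpus, TOY_RADICALS)
    for ch, features in table.items():
        print(ch, features)
```

Characters that share a radical, such as 漏 and 洞, receive the same radical id, which is the kind of shared signal a radical embedding is meant to expose to the model.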