情报杂志 (Journal of Intelligence), 2024, Vol. 43, Issue 8: 84-91. DOI: 10.3969/j.issn.1002-1965.2024.08.011

基于预训练模型的漏洞信息检索系统研究

Research on Vulnerability Information Retrieval System Based on Pre-Training Model

刘烨 (Liu Ye) 1, 杨良斌 (Yang Liangbin) 1

Author information

  • 1. School of Cyberspace Security, University of International Relations, Beijing 100091

Abstract

[Research purpose] In threat intelligence, vulnerability information refers to information about vulnerabilities present in networks, systems, applications, or supply chains. Current search engines fall short in vulnerability information retrieval, and building a vulnerability retrieval system on a pre-trained model can improve retrieval efficiency. [Research method] Using publicly available vulnerability information as the data source, we construct a question-answering dataset and incrementally pre-train TinyBERT. The model vectorizes each query, the vulnerability information is built into a Faiss vector database, and an HNSW index is used for multi-channel and single-channel recall retrieval. The model is then fine-tuned with contrastive learning to produce two-tower and single-tower models, and a simple knowledge retrieval system is built from two-tower recall and single-tower fine ranking. [Research conclusion] The experimental results show that the pre-trained model significantly improves retrieval performance: the two-tower model fine-tuned with contrastive learning reaches a top-1 recall of 92.17% on the constructed vulnerability information test set. This retrieval practice in the vulnerability information domain provides a reference for building threat intelligence retrieval systems.
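The recall-then-rank pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation and rests on several assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the fine-tuned two-tower encoder, exhaustive inner-product search replaces the Faiss HNSW index, a token-overlap score stands in for the single-tower ranker, and the documents, queries, and gold labels are invented toy data.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # toy embedding size (assumption, not from the paper)


def toy_embed(text):
    """Hypothetical encoder stand-in: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def recall(query, doc_vecs, k):
    """Two-tower stage: the query is encoded independently of the documents
    and compared by inner product. This is an exhaustive search; an ANN
    index such as HNSW would approximate the same top-k result."""
    q = toy_embed(query)
    scores = [(sum(a * b for a, b in zip(q, d)), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]


def rerank(query, docs, candidates):
    """Single-tower stage stand-in: score each (query, document) pair jointly.
    Here the joint score is the Jaccard overlap of the two token sets."""
    q_tokens = set(query.lower().split())

    def pair_score(i):
        d_tokens = set(docs[i].lower().split())
        return len(q_tokens & d_tokens) / len(q_tokens | d_tokens)

    return sorted(candidates, key=pair_score, reverse=True)


def top1_recall(queries, gold, docs, doc_vecs, k=3):
    """Fraction of queries whose first ranked document is the gold document."""
    hits = sum(
        1
        for q, g in zip(queries, gold)
        if rerank(q, docs, recall(q, doc_vecs, k))[0] == g
    )
    return hits / len(queries)


# Invented toy corpus; these are not real vulnerability records.
DOCS = [
    "buffer overflow in image parser allows remote code execution",
    "sql injection in login form exposes user database",
    "cross site scripting in comment field of web forum",
]
DOC_VECS = [toy_embed(d) for d in DOCS]  # documents are embedded once, offline
QUERIES = ["sql injection login form", "buffer overflow image parser"]
GOLD = [1, 0]  # index of the relevant document for each query

if __name__ == "__main__":
    print(top1_recall(QUERIES, GOLD, DOCS, DOC_VECS, k=3))
```

The split mirrors the usual trade-off: the recall stage is cheap because document vectors are precomputed, while the ranking stage is applied only to the few recalled candidates because it scores every query-document pair jointly.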

Key words

threat intelligence/pre-training model/vulnerability information/multi-channel search technique/information retrieval system

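Funding

Fundamental Research Funds for the Central Universities (3262024T01)

Commissioned project of the National Science Library, Chinese Academy of Sciences (H20230021)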

Publication year

2024
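情报杂志 (Journal of Intelligence)
陕西省科学技术信息研究所 (Shaanxi Institute of Scientific and Technical Information)

Indexed in: CSTPCD / CSSCI / CHSSCD / 北大核心 (Peking University Core Journals)
Impact factor: 1.502
ISSN: 1002-1965
Reference count: 1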