Research on a Vulnerability Information Retrieval System Based on Pre-trained Models

[Research purpose] In threat intelligence, vulnerability information refers to information about vulnerabilities present in networks, systems, applications, or supply chains. Current search engines fall short in vulnerability information retrieval, and building a vulnerability retrieval system on a pre-trained model can improve retrieval efficiency.

[Research method] Using publicly available vulnerability information as the data source, we construct a question-answering dataset and perform incremental pre-training on Tiny Bert. The model vectorizes each query, the vulnerability information is built into a faiss vector database, and an HNSW index is used for multi-channel and single-channel recall. The model is then fine-tuned with contrastive learning to produce two-tower and single-tower models, and a simple knowledge retrieval system is built from two-tower recall followed by single-tower fine ranking.

[Research conclusion] Experimental results show that pre-trained models significantly improve retrieval performance: the two-tower model fine-tuned with contrastive learning achieves a Top-1 recall of 92.17% on the constructed vulnerability information test set. This retrieval practice in the vulnerability information domain offers a reference for building threat intelligence retrieval systems.
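The recall-then-rank pipeline and the Top-k recall metric described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: `embed` is a toy bag-of-words stand-in for the fine-tuned two-tower encoder, an exhaustive cosine search stands in for the faiss HNSW index, `rerank_stage` uses a simple token-overlap score in place of the single-tower model, and the documents and queries are invented rather than taken from the paper's dataset.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for the two-tower encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recall_stage(query, doc_vectors, k):
    """Recall: documents are embedded once, the query separately, then compared.
    An exhaustive scan is used here where the paper uses an HNSW index."""
    q = embed(query)
    ranked = sorted(range(len(doc_vectors)),
                    key=lambda i: cosine(q, doc_vectors[i]), reverse=True)
    return ranked[:k]

def rerank_stage(query, docs, candidates):
    """Fine ranking: score each (query, document) pair jointly.
    Toy scorer: fraction of query tokens that also occur in the document."""
    q_tokens = set(query.lower().split())
    def pair_score(i):
        return len(q_tokens & set(docs[i].lower().split())) / len(q_tokens)
    return sorted(candidates, key=pair_score, reverse=True)

def top_k_recall(queries, gold, docs, k=1, n_candidates=3):
    """Share of queries whose gold document appears in the final top-k list."""
    doc_vectors = [embed(d) for d in docs]
    hits = 0
    for query, gold_id in zip(queries, gold):
        candidates = recall_stage(query, doc_vectors, n_candidates)
        hits += gold_id in rerank_stage(query, docs, candidates)[:k]
    return hits / len(queries)

# Invented example data (hypothetical, for illustration only).
DOCS = [
    "buffer overflow in the image parser allows remote code execution",
    "sql injection in the login form exposes user credentials",
    "cross-site scripting in the comment field allows session hijacking",
    "privilege escalation through a misconfigured sudo rule",
]
QUERIES = [
    "which flaw allows remote code execution in the image parser",
    "how can user credentials leak from the login form",
    "sudo misconfigured privilege issue",
]
GOLD = [0, 1, 3]

print(top_k_recall(QUERIES, GOLD, DOCS, k=1))
```

Splitting the work this way keeps the expensive pairwise scorer off most of the corpus: the recall stage narrows the search to a few candidates, and only those are scored jointly with the query.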

Keywords: threat intelligence; pre-training model; vulnerability information; multi-channel search technique; information retrieval system

Liu Ye, Yang Liangbin

School of Cyberspace Security, University of International Relations, Beijing 100091

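Funding: Fundamental Research Funds for the Central Universities; commissioned project of the National Science Library, Chinese Academy of Sciences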

3262024T01H20230021

2024

Journal of Intelligence
Shaanxi Institute of Scientific and Technical Information

Indexed in: CSTPCD; CSSCI; CHSSCD; Peking University Core Journals
Impact factor: 1.502
ISSN: 1002-1965
Year, volume (issue): 2024, 43(8)