Research on a Vulnerability Information Retrieval System Based on Pre-Trained Models

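Abstract: [Research purpose] In threat intelligence, vulnerability information refers to information about vulnerabilities present in networks, systems, applications, or supply chains. Current search engines fall short at retrieving vulnerability information, and building a vulnerability retrieval system on a pre-trained model can improve retrieval efficiency. [Research method] Using publicly available vulnerability information as the data source, a question-answering dataset was constructed and Tiny Bert was incrementally pre-trained on it. The model vectorizes each query, the vulnerability information is built into a faiss vector database, and an HNSW index is used for multi-channel and single-channel recall. The model was then fine-tuned with contrastive learning to produce two-tower and single-tower models, and a simple knowledge retrieval system was built from two-tower recall followed by single-tower fine ranking. [Research conclusion] Experimental results show that the pre-trained model significantly improves retrieval performance: the two-tower model fine-tuned with contrastive learning reaches a Top-1 recall of 92.17% on the constructed vulnerability information test set. This retrieval practice in the vulnerability information domain offers a reference for building threat intelligence retrieval systems.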

English title: Research on Vulnerability Information Retrieval System Based on Pre-Training Model

English abstract: [Research purpose] Vulnerability information in threat intelligence refers to information about the presence of vulnerabilities in networks, systems, applications, or supply chains. Current search engines have shortcomings in vulnerability information retrieval. Using a pre-trained model to build a vulnerability retrieval system can improve retrieval efficiency. [Research method] Using public vulnerability information as the data source, we construct a question-answering dataset and incrementally pre-train Tiny Bert. The model is used to vectorize each query, the vulnerability information is built into a faiss vector database, and the HNSW index is used for multi-channel and single-channel recall retrieval. Then, two-tower and single-tower models are produced by contrastive learning fine-tuning, and a simple knowledge retrieval system is constructed using two-tower recall and single-tower fine ranking. [Research conclusion] The experimental results show that retrieval performance can be significantly improved by using the pre-trained model, and the Top-1 recall rate of the two-tower model fine-tuned by contrastive learning is 92.17% on the constructed vulnerability information test set. This retrieval practice in the field of vulnerability information provides a reference for building retrieval systems for threat intelligence.
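The recall-then-rank design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a bag-of-words vector stands in for the fine-tuned Tiny Bert two-tower encoder, a brute-force cosine search stands in for the faiss HNSW index, and a token-overlap score stands in for the single-tower ranking model. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a two-tower encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recall(query, doc_vectors, k):
    """Stage 1: documents are embedded offline, the query at search time.
    Brute-force search here; an approximate index such as HNSW would replace it."""
    q = embed(query)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(doc_vectors)), reverse=True)
    return [i for _, i in scored[:k]]

def cross_score(query, doc):
    """Toy stand-in for a single-tower model: scores the (query, doc) pair jointly
    as the fraction of query tokens that occur in the document."""
    q_tokens = query.lower().split()
    d_tokens = set(doc.lower().split())
    return sum(t in d_tokens for t in q_tokens) / len(q_tokens)

def search(query, docs, doc_vectors, k=3):
    """Stage 1 recall of k candidates, then stage 2 reranking of those candidates."""
    candidates = recall(query, doc_vectors, k)
    return sorted(candidates, key=lambda i: cross_score(query, docs[i]), reverse=True)

def top1_recall(queries, gold, docs, doc_vectors, k=3):
    """Share of queries whose top-ranked document is the labeled answer."""
    hits = sum(search(q, docs, doc_vectors, k)[0] == g for q, g in zip(queries, gold))
    return hits / len(queries)

# Hypothetical vulnerability descriptions and labeled queries, for illustration only.
DOCS = [
    "buffer overflow in image parser allows remote code execution",
    "sql injection in login form exposes user database",
    "cross-site scripting in comment field of web forum",
    "privilege escalation through misconfigured sudo rule",
]
DOC_VECTORS = [embed(d) for d in DOCS]
QUERIES = [
    "sql injection login",
    "remote code execution buffer overflow",
    "sudo privilege escalation",
]
GOLD = [1, 0, 3]

if __name__ == "__main__":
    print(search("sql injection login", DOCS, DOC_VECTORS))
    print(top1_recall(QUERIES, GOLD, DOCS, DOC_VECTORS))
```

The split reflects a common trade-off: the first stage compares precomputed vectors and so scales to a large collection, while the second stage scores each query and document pair jointly and is therefore applied only to the few recalled candidates.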

English keywords:

threat intelligence; pre-training model; vulnerability information; multi-channel search technique; information retrieval system

Authors:

Liu Ye (刘烨), Yang Liangbin (杨良斌)

Affiliation:

School of Cyberspace Security, University of International Relations, Beijing 100091

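Keywords:

threat intelligence; pre-trained model; vulnerability information; multi-channel search technique; information retrieval system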

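Funding:

Fundamental Research Funds for the Central Universities; commissioned project of the National Science Library, Chinese Academy of Sciences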

Project numbers:

3262024T01; H20230021

Publication year:

2024

DOI:

10.3969/j.issn.1002-1965.2024.08.011

Journal of Intelligence (情报杂志)

Shaanxi Institute of Scientific and Technical Information

Indexed in: CSTPCD; CSSCI; CHSSCD; PKU Core

Impact factor: 1.502

ISSN: 1002-1965

Year, volume (issue): 2024, 43(8)

Number of references: 1